Data analytics case


Rick

Sep 2, 2015, 11:29:38 AM
to Data Management - watson, David Schuff
Hi

Over the last few years, I have added R to the book to support the demand for data analytics.

Here is a case that can be used as a capstone to the R material.


The core R code follows. This is not the answer, just code for checking that the data are available, plus examples of how to use the distance functions.

I plan to use this as a group assignment. 

Cheers  

Rick
 

library(readr)
library(DBI)
library(geosphere)
library(ggmap)
# get the employee data (url must first be set to the location of the employee file)
t <- read_delim(url, delim = ',')

# access the zip code databases
conn <- dbConnect(RMySQL::MySQL(), "wallaby.terry.uga.edu", dbname="zipcode", user="student", password="student")

# Query the database for the latitude and longitude of each zip code
loc1 <-  dbGetQuery(conn,"SELECT zipLat, zipLon from zip where zip = 30606;")
loc2 <-  dbGetQuery(conn,"SELECT zipLat, zipLon from zip where zip = 08889;")

# compute the great circle (geodesic) distance in miles
# distGeo takes each point as c(longitude, latitude) and returns meters
distGeo(c(loc1$zipLon, loc1$zipLat), c(loc2$zipLon, loc2$zipLat)) * 0.000621371

# compute the road distance in miles
# (recent ggmap versions need a Google API key registered with register_google() first)
mapdist("Athens, GA", "Whitehouse Station, NJ", mode = "driving", output = "simple")$miles
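
For checking the straight-line figure without any packages, the great-circle calculation can be sketched in base R with the haversine formula. This is only a sketch and not what distGeo does internally: distGeo works on an ellipsoid, so its result will differ slightly. The function name haversine_miles, the mean Earth radius of 3958.8 miles, and the approximate coordinates are assumptions for illustration, not values from the zip code database.

```r
# Haversine great-circle distance in miles, base R only.
# Inputs are decimal degrees; radius_miles is an assumed mean Earth radius.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_miles * asin(pmin(1, sqrt(a)))
}

# Illustrative, approximate coordinates (not taken from the zip database)
athens     <- c(lat = 33.95, lon = -83.40)
whitehouse <- c(lat = 40.61, lon = -74.77)

d <- haversine_miles(athens[["lat"]], athens[["lon"]],
                     whitehouse[["lat"]], whitehouse[["lon"]])
print(d)
```

The road distance from mapdist should come out noticeably larger than this figure, which makes a quick sanity check for the group assignment.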

Rick

Dec 17, 2016, 4:48:59 PM
to Data Management - watson
Hi

David Eargle suggested some improvements to the exercise I posted some months back.

The R code I provided requires access to a UGA server. David found a CSV of US postal addresses that can be used instead.


He revised the R code and highlighted the changes:

library(readr)
library(DBI)
library(geosphere)
library(ggmap)
library(sqldf)


# get the employee data (url must first be set to the location of the employee file)
t <- read_delim(url, delim = ',')


# load the zip code data from the downloaded file
col.names <- c('country','zip','place_name','state_name','state_code','county_name','county_code','subdivision_name','subdivision_code','zipLat','zipLon','acc')
zip <- read.delim("<path to extracted US.txt>", header=FALSE, col.names=col.names)

#conn <- dbConnect(RMySQL::MySQL(), "wallaby.terry.uga.edu", dbname="zipcode", user="student", password="student")

# Look up the latitude and longitude of each zip code (sqldf runs the SQL against the zip data frame)
#loc1 <-  dbGetQuery(conn,"SELECT zipLat, zipLon from zip where zip = 30606;")
#loc2 <-  dbGetQuery(conn,"SELECT zipLat, zipLon from zip where zip = 08889;")
loc1 <-  sqldf("SELECT zipLat, zipLon from zip where zip = 30606;")
loc2 <-  sqldf("SELECT zipLat, zipLon from zip where zip = 08889;") 

# compute the great circle (geodesic) distance in miles
# distGeo takes each point as c(longitude, latitude) and returns meters
distGeo(c(loc1$zipLon, loc1$zipLat), c(loc2$zipLon, loc2$zipLat)) * 0.000621371

# compute the road distance in miles
# (recent ggmap versions need a Google API key registered with register_google() first)
mapdist("Athens, GA", "Whitehouse Station, NJ", mode = "driving", output = "simple")$miles
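
One thing to watch with the file-based lookup is zip codes that start with a zero, such as 08889. A minimal sketch of the issue follows, assuming a cut-down three-column layout and placeholder coordinates that are not taken from the real file. With default type guessing the zip column is read as an integer, so 08889 is stored as 8889; forcing the column to character keeps the code as written, in which case the comparison value has to be quoted.

```r
# Two illustrative tab-separated rows: zip, latitude, longitude
# (placeholder coordinates, not taken from the real file)
sample_txt <- "30606\t33.9\t-83.4\n08889\t40.6\t-74.8"
cols <- c("zip", "zipLat", "zipLon")

# Default type guessing: the zip column becomes integer, so 08889 turns into 8889
as_number <- read.delim(text = sample_txt, header = FALSE, col.names = cols)

# Forcing zip to character keeps the leading zero
as_text <- read.delim(text = sample_txt, header = FALSE, col.names = cols,
                      colClasses = c(zip = "character"))

print(as_number$zip)
print(as_text$zip)

# Base R lookup by the five-character code
nj <- as_text[as_text$zip == "08889", c("zipLat", "zipLon")]
print(nj)
```

Either representation can work for the exercise, as long as the lookup value is written in the same form as the column.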

Cheers  

Rick
 