---
title: "Pragmatic Approach to Census Analysis"
author: "Jamaal Green"
date: "September 14, 2017"
output:
  revealjs::revealjs_presentation:
    theme: "black"
    transition: slide
    self_contained: false
    reveal_plugins: ["zoom"]
    incremental: true
---
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.4/jquery-ui.min.js"></script>
<link href="https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.4/themes/smoothness/jquery-ui.min.css" rel="stylesheet" type="text/css"/>
#Introduction
```{r, message=FALSE, warning=FALSE, include=FALSE}
library(knitr)
```
##About Me
* PhD Candidate in Urban Studies and Planning
* My dissertation is examining industrial zoning and labor market change
* I use a pretty wide array of census products for work (ACS, PUMS, LEHD)
##A Pragmatic Approach
"Let the question guide your method"
Likewise...
Let your problems guide your tools
##What does your workflow normally look like?
The steps many social data analysts and GIS users have to take:
* Tabular data collection/import (for us, clearly, varied Census products)
* Spatial data collection (TIGER/Line files, anyone?)
* Tabular data cleaning, munging, and joins
* Table-to-spatial data joins (we've all done this in Arc with moderate success)
* If making maps... spatial processing (clips, intersections, spatial joins)
* Other visualizations and report writing
#Enter R
##What is R?
- A powerful language
- Application
- "Do it all" workbench
##But why R?
- It's free
- It's fast
- It's data type agnostic (reads many kinds of text files, .shp, GeoJSON, GeoTIFF)
- Massive number of packages for statistical or spatial analysis and visualization
- Many things that are hard or slow in other applications (table joins in Arc, anyone?) are fast in R
- Large, helpful online community and growing variety of books/guides/courses
- IT'S FREE
##But why should I?
Has the following ever happened to you?
- Needed to download multiple variables over multiple years and ended up with a data folder filled with ambiguously named tables that you deleted anyway?
- Had to change your geography of interest on short notice and then go through the time-consuming process of redownloading and reprocessing?
- Attempted to rename a column in ArcMap (yes, I know this is now available in ArcGIS Pro)?
##Let's Be Pragmatic
<section style="text-align: left;">
These recurring challenges can be better addressed (saving yourself precious time) by learning a little bit of R.
Let's take an example...
</section>
#Tidycensus...A Better Way
##Tidycensus...a one-stop shop for ACS data
- R package authored by Prof. Kyle Walker at TCU to make gathering and visualizing census data easier
- The package uses the Census API to request ACS and decennial census data, as well as ACS Data Profile tables
- Data is returned in either wide or long format, with an option to join the data to its matching TIGER/Line geometry
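
To make the wide/long distinction concrete, here is a minimal base R sketch that needs no API call. The column names follow the layouts tidycensus returns (`variable`, `estimate`, `moe` in the long format; `E` and `M` suffixes in the wide format), but every number is invented for illustration, and the second variable (`B25064_001`, median gross rent) is simply an assumed companion to MHI.

```r
# Toy data with invented values, not real ACS estimates.
# Long ("tidy") layout: one row per geography per variable.
long <- data.frame(
  GEOID    = c("41051", "41051", "41067", "41067"),
  NAME     = c("Multnomah County, Oregon", "Multnomah County, Oregon",
               "Washington County, Oregon", "Washington County, Oregon"),
  variable = c("B19013_001", "B25064_001", "B19013_001", "B25064_001"),
  estimate = c(54000, 1000, 66000, 1150),
  moe      = c(900, 25, 1100, 30),
  stringsAsFactors = FALSE
)

# Wide layout: one row per geography, with an estimate and a margin of
# error column for each variable.
wide <- reshape(long, idvar = "GEOID", timevar = "variable",
                v.names = c("estimate", "moe"), direction = "wide")

# Rename reshape()'s "estimate.<var>" / "moe.<var>" columns to the
# "<var>E" / "<var>M" convention used in wide ACS tables.
names(wide) <- sub("^estimate\\.(.+)$", "\\1E", names(wide))
names(wide) <- sub("^moe\\.(.+)$", "\\1M", names(wide))

print(long)
print(wide)
```

The long layout tends to suit grouped summaries and faceted plots, while the wide layout is convenient when each variable should be its own column, as in the renaming step of the example that follows.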
##A quick example...
Our assignment: get the latest 5-year median household income (MHI) estimates for Oregon counties and graph the results with their margins of error (as an added bonus, and in the interest of transparency, let's also calculate CVs)
```
# Install pacman if needed, then load the packages used below
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(ggplot2, tidycensus, dplyr)

# tidycensus looks up the key in the CENSUS_API_KEY environment variable
acs_key <- Sys.getenv("CENSUS_API_KEY")

# Enter the variables and geographies below
census_title <- "Median Household Income by County:\nCoefficient of Variation"
census_var   <- "B19013_001"  # wide output appends E (estimate) and M (MOE)
census_geog  <- "county"
census_state <- "OR"

# Pin the 2011-2015 5-year ACS so the data match the plot subtitle
acs_data <- get_acs(geography = census_geog, variables = census_var,
                    state = census_state, year = 2015, output = "wide")

# Make more readable column names
acs_data <- acs_data %>%
  rename(MHI_est = B19013_001E, MHI_moe = B19013_001M)

# Calculate the SE and CV for future reference
acs_data <- acs_data %>%
  mutate(se = MHI_moe / 1.645, cv = (se / MHI_est) * 100)

# Plot the estimates with their margins of error
acs_plot <- acs_data %>%
  ggplot(aes(x = MHI_est, y = reorder(NAME, MHI_est))) +
  geom_point(color = "black", size = 2) +
  geom_errorbarh(aes(xmin = MHI_est - MHI_moe, xmax = MHI_est + MHI_moe)) +
  labs(title = census_title,
       subtitle = "Oregon 2011-2015 American Community Survey",
       y = "",
       x = "Median Household Income") +
  scale_x_continuous(labels = scales::dollar) +
  theme_minimal() +
  theme(panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank())

print(acs_plot)
```
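
The SE and CV step is easy to check by hand. The sketch below works it through on two hypothetical counties with invented numbers; it assumes, as the code above does, that the margins of error are at the 90 percent confidence level the ACS publishes, so dividing by 1.645 recovers the standard error.

```r
# Invented values for illustration only, not real ACS estimates.
toy <- data.frame(
  NAME    = c("County A", "County B"),
  MHI_est = c(50000, 40000),
  MHI_moe = c(3290, 6580)
)

# A 90 percent margin of error is 1.645 standard errors wide,
# so dividing the MOE by 1.645 gives the standard error.
toy$se <- toy$MHI_moe / 1.645

# Coefficient of variation: the standard error as a percentage
# of the estimate it belongs to.
toy$cv <- toy$se / toy$MHI_est * 100

print(toy)
```

County B has the lower estimate and the larger margin of error, so its CV comes out at roughly 10 percent against roughly 4 percent for County A. That is why the CV is a handy single number for comparing reliability across geographies.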
##Our Output
```{r, echo=FALSE, message=FALSE, warning=FALSE}
# Install pacman if needed, then load the packages used below
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(ggplot2, tidycensus, dplyr)

# Run once to store your own key: census_api_key("YOUR_API_KEY", install = TRUE)
acs_key <- Sys.getenv("CENSUS_API_KEY")

# Enter the variables and geographies below
census_title <- "Median Household Income by County:\nCoefficient of Variation"
census_var   <- "B19013_001"  # wide output appends E (estimate) and M (MOE)
census_geog  <- "county"
census_state <- "OR"

# Pin the 2011-2015 5-year ACS so the data match the plot subtitle
acs_data <- get_acs(geography = census_geog, variables = census_var,
                    state = census_state, year = 2015, output = "wide")

# Make more readable column names
acs_data <- acs_data %>%
  rename(MHI_est = B19013_001E, MHI_moe = B19013_001M)

# Calculate the SE and CV for future reference
acs_data <- acs_data %>%
  mutate(se = MHI_moe / 1.645, cv = (se / MHI_est) * 100)

# Plot the estimates with their margins of error
acs_plot <- acs_data %>%
  ggplot(aes(x = MHI_est, y = reorder(NAME, MHI_est))) +
  geom_point(color = "black", size = 2) +
  geom_errorbarh(aes(xmin = MHI_est - MHI_moe, xmax = MHI_est + MHI_moe)) +
  labs(title = census_title,
       subtitle = "Oregon 2011-2015 American Community Survey",
       y = "",
       x = "Median Household Income") +
  scale_x_continuous(labels = scales::dollar) +
  theme_minimal() +
  theme(panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank())

print(acs_plot)
```
#Mapping It Out
Your assignment: map median household income at the census tract level for the Oregon counties of the Portland (PDX) MSA, and at the county level for the state as a whole
##R as a GIS: tigris and sf
- tigris: a package that downloads TIGER/Line shapefiles
- sf (simple features): stores geometry, expressible as well-known text (WKT), in a column of a data frame, so spatial objects can be treated as data frames
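
A small sketch of what "spatial objects as data frames" means in practice. It assumes the sf package is installed; the two square "zones", their names, and their income values are all hypothetical, chosen only so the example runs without downloading anything.

```r
library(sf)

# Two hypothetical square zones written as well-known text (WKT).
wkt <- c(
  "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))",
  "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"
)

# Build an sf object: ordinary attribute columns plus a geometry column.
zones <- st_sf(
  name     = c("zone_a", "zone_b"),
  income   = c(52000, 61000),
  geometry = st_as_sfc(wkt, crs = 4326)
)

# An sf object is still a data frame...
class(zones)

# ...so ordinary row filtering works, and the geometry comes along.
high_income <- zones[zones$income > 55000, ]

# The surviving geometry can be written back out as WKT.
st_as_text(st_geometry(high_income))
```

Because the geometry travels with the rows, the same filter, mutate, and join habits used on tables carry over to spatial data.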
##Tract Processing tidyverse style
```
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(sf, tigris, viridis, ggthemes, ggplot2, tidycensus, stringr, dplyr)
# Have tigris return sf objects and cache downloaded shapefiles
options(tigris_class = "sf", tigris_use_cache = TRUE)
acs_key <- Sys.getenv("CENSUS_API_KEY")
mhi_tables <- c("B19013_001")

# Download tracts with geometry; pin the 2011-2015 ACS to match the map caption
mhi_tract <- get_acs(geography = "tract", variables = mhi_tables,
                     state = "OR", year = 2015, geometry = TRUE)
# The first five GEOID characters are the state + county FIPS code
mhi_tract <- mhi_tract %>% mutate(CountyFIPS = str_sub(GEOID, 1, 5))

# Oregon counties in the PDX metro: Multnomah, Clackamas, Columbia, Washington, Yamhill
metro_counties <- c("41051", "41005", "41009", "41067", "41071")
metro_tract <- mhi_tract %>% filter(CountyFIPS %in% metro_counties)

# County-level estimates for the whole state
or_mhi_county <- get_acs(geography = "county", variables = mhi_tables,
                         state = "OR", year = 2015, geometry = TRUE)
```
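
The county filter relies on how tract GEOIDs are built: two digits for the state, three for the county, six for the tract. The base R sketch below shows the same idea offline, with `substr()` and `%in%` standing in for `str_sub()` and `filter()`. The GEOID strings and estimates are made up for the example and have not been checked against the real tract list.

```r
# Illustrative 11-digit tract GEOIDs: state (2) + county (3) + tract (6).
tracts <- data.frame(
  GEOID    = c("41051000100", "41005020100", "41029000200", "41067030100"),
  estimate = c(48000, 72000, 41000, 69000),
  stringsAsFactors = FALSE
)

# The first five characters are the state + county FIPS code.
tracts$CountyFIPS <- substr(tracts$GEOID, 1, 5)

# The Oregon metro counties listed in the slides.
metro_counties <- c("41051", "41005", "41009", "41067", "41071")

# Keep only the tracts whose county code is in the metro list.
metro <- tracts[tracts$CountyFIPS %in% metro_counties, ]

print(metro)
```

Here the tract coded to county `41029` drops out because that county is not in the metro list. Keeping GEOID as a character column matters, since a numeric conversion would strip leading zeros in states with FIPS codes below 10.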
##Our Tract Map Set Up
```
# theme_minimal() goes before theme(): a complete theme added last would reset the tweaks
p1 <- ggplot() +
  geom_sf(data = metro_tract, aes(fill = estimate)) +
  coord_sf(datum = NA) +
  scale_fill_viridis(labels = scales::dollar, name = "MHI Estimate") +
  labs(caption = "Source: US Census Bureau ACS (2011-2015)",
       title = "Median Household Income for PDX Metro\nat the census tract level",
       subtitle = "An R 'sf' Example") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10)),
        plot.subtitle = element_text(size = 14, margin = margin(b = -20)),
        plot.caption = element_text(size = 9, margin = margin(t = -15), hjust = 0))
```
##Our Output
```{r, echo=FALSE, message=TRUE, warning=FALSE}
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(sf, tigris, viridis, ggthemes, ggplot2, tidycensus, stringr, dplyr)
# Have tigris return sf objects and cache downloaded shapefiles
options(tigris_class = "sf", tigris_use_cache = TRUE)

# Stretch example: MHI at the census tract level for the metro counties
# Collect census tracts with geometry, plus county-level estimates for the state
# Run once to store your own key: census_api_key("YOUR_API_KEY", install = TRUE)
acs_key <- Sys.getenv("CENSUS_API_KEY")
# Variable lookup table for the 2011-2015 5-year ACS
vartabs <- load_variables(year = 2015, dataset = "acs5")
mhi_tables <- c("B19013_001")

# Download tracts with geometry; pin the 2011-2015 ACS to match the map caption
mhi_tract <- get_acs(geography = "tract", variables = mhi_tables,
                     state = "OR", year = 2015, geometry = TRUE)
# The first five GEOID characters are the state + county FIPS code
mhi_tract <- mhi_tract %>% mutate(CountyFIPS = str_sub(GEOID, 1, 5))

# Oregon counties in the PDX metro: Multnomah, Clackamas, Columbia, Washington, Yamhill
metro_counties <- c("41051", "41005", "41009", "41067", "41071")
metro_tract <- mhi_tract %>% filter(CountyFIPS %in% metro_counties)

# County-level estimates for the whole state
or_mhi_county <- get_acs(geography = "county", variables = mhi_tables,
                         state = "OR", year = 2015, geometry = TRUE)

# Metro tract map
# theme_minimal() goes before theme(): a complete theme added last would reset the tweaks
p1 <- ggplot() +
  geom_sf(data = metro_tract, aes(fill = estimate)) +
  coord_sf(datum = NA) +
  scale_fill_viridis(labels = scales::dollar, name = "MHI Estimate") +
  labs(caption = "Source: US Census Bureau ACS (2011-2015)",
       title = "Median Household Income for PDX Metro\nat the census tract level",
       subtitle = "An R 'sf' Example") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10)),
        plot.subtitle = element_text(size = 14, margin = margin(b = -20)),
        plot.caption = element_text(size = 9, margin = margin(t = -15), hjust = 0))

print(p1)
```