1 Introduction

We are going to practice using ggplot today, focusing on the data, aesthetic, and geom layers. We are going to use data from the TidyTuesday project. For this recitation, we are going to use the Giant Pumpkins data which is collected from the Great Pumpkin Commonwealth.

At the end of of this module you will create of of this descriptive plots

1.1 Libraries

library(tidyverse)
library(lubridate)

1.2 Read in data

pumpkins_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-19/pumpkins.csv')

## Rows: 28065 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): id, place, weight_lbs, grower_name, city, state_prov, country, gpc...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2 Full pipe and plot

pumpkins_raw %>% 
  separate(col = "id", into = c("year", "type")) %>%
  filter(type == "P" & place == "1") %>% 
  mutate(weight_lbs = str_remove(weight_lbs, ",") ) %>% 
  mutate(weight_lbs = as.numeric(weight_lbs)) %>% 
  mutate(year = ymd(year, truncated = 2)) %>% 
    ggplot(aes(year, weight_lbs)) + 
    geom_point() + 
    geom_line()

3 Playing around

3.0.1 Try using different geoms besides `geom_point()` and `geom_line()`. Which might make sense in this situation?

Showing wrangling first then will plot after.

pumpkins_raw %>% 
  separate(col = "id", into = c("year", "type")) %>%
  filter(type == "P" & place == "1") %>% 
  mutate(weight_lbs = str_remove(weight_lbs, ",") ) %>% 
  mutate(weight_lbs = as.numeric(weight_lbs)) %>% 
  mutate(year = ymd(year, truncated = 2L))

## # A tibble: 9 × 15
##   year       type  place weight_…¹ growe…² city  state…³ country gpc_s…⁴ seed_…⁵
##   <date>     <chr> <chr>     <dbl> <chr>   <chr> <chr>   <chr>   <chr>   <chr>  
## 1 2013-01-01 P     1         2032  Mathis… Napa  Califo… United… Uesugi… "2009 …
## 2 2014-01-01 P     1         2324. Meier,… Pfun… Other   Switze… Europa… "2009 …
## 3 2015-01-01 P     1         2230. Wallac… Gree… Rhode … United… SNGPG … "2009 …
## 4 2016-01-01 P     1         2625. Willem… Deur… East F… Belgium Europa… "2145 …
## 5 2017-01-01 P     1         2363  Hollan… Sumn… Washin… United… Safewa… "2145.…
## 6 2018-01-01 P     1         2528  Geddes… Bosc… New Ha… United… Deerfi… "1911 …
## 7 2019-01-01 P     1         2517  Haist,… Clar… New Yo… United… Ohio V… "2005 …
## 8 2020-01-01 P     1         2594. Paton,… Ever… England United… Royal … "1875 …
## 9 2021-01-01 P     1         2703. Cutrup… Radd… Tuscany Italy   Campio… "1885.…
## # … with 5 more variables: pollinator_father <chr>, ott <chr>,
## #   est_weight <chr>, pct_chart <chr>, variety <chr>, and abbreviated variable
## #   names ¹weight_lbs, ²grower_name, ³state_prov, ⁴gpc_site, ⁵seed_mother
## # ℹ Use `colnames()` to see all variable names

Can also parse the date in a slightly different way.

(pumpkins_to_plot <- pumpkins_raw %>%
  separate(col = "id", into = c("year", "type")) %>%
  filter(type == "P" & place == "1") %>% 
  mutate(weight_lbs = str_remove(weight_lbs, ",") ) %>% 
  mutate(weight_lbs = as.numeric(weight_lbs)) %>% 
  mutate(year = as.POSIXct(year, format = "%Y")) %>%
  mutate(year = as.Date(year, format = "%Y")))

## # A tibble: 9 × 15
##   year       type  place weight_…¹ growe…² city  state…³ country gpc_s…⁴ seed_…⁵
##   <date>     <chr> <chr>     <dbl> <chr>   <chr> <chr>   <chr>   <chr>   <chr>  
## 1 2013-09-20 P     1         2032  Mathis… Napa  Califo… United… Uesugi… "2009 …
## 2 2014-09-20 P     1         2324. Meier,… Pfun… Other   Switze… Europa… "2009 …
## 3 2015-09-20 P     1         2230. Wallac… Gree… Rhode … United… SNGPG … "2009 …
## 4 2016-09-20 P     1         2625. Willem… Deur… East F… Belgium Europa… "2145 …
## 5 2017-09-20 P     1         2363  Hollan… Sumn… Washin… United… Safewa… "2145.…
## 6 2018-09-20 P     1         2528  Geddes… Bosc… New Ha… United… Deerfi… "1911 …
## 7 2019-09-20 P     1         2517  Haist,… Clar… New Yo… United… Ohio V… "2005 …
## 8 2020-09-20 P     1         2594. Paton,… Ever… England United… Royal … "1875 …
## 9 2021-09-20 P     1         2703. Cutrup… Radd… Tuscany Italy   Campio… "1885.…
## # … with 5 more variables: pollinator_father <chr>, ott <chr>,
## #   est_weight <chr>, pct_chart <chr>, variety <chr>, and abbreviated variable
## #   names ¹weight_lbs, ²grower_name, ³state_prov, ⁴gpc_site, ⁵seed_mother
## # ℹ Use `colnames()` to see all variable names

3.0.2 Can you color all the lines blue?

pumpkins_to_plot %>%
  ggplot(aes(x = year, y = weight_lbs)) +
  geom_line(color = "blue") +
  geom_point()

3.0.3 Can you color the data based on year?

pumpkins_to_plot %>%
  ggplot(aes(x = year, y = weight_lbs)) +
  geom_line() +
  geom_point(aes(color = year))

Because date is a continuous variable, we are getting a continuous color scale, which might not be what we want. We can get around it by setting date as a factor.

pumpkins_to_plot %>%
  ggplot(aes(x = year, y = weight_lbs)) +
  geom_line() +
  geom_point(aes(color = as.factor(year)))

3.0.4 Can you color and change shape based on country?

pumpkins_to_plot %>%
  ggplot(aes(x = year, y = weight_lbs)) +
  geom_line() +
  geom_point(aes(color = as.factor(year), shape = country))

3.0.5 Can you make a plot showing the distribution of weights of all giant pumpkins entered in 2021?

pumpkins_2021 <- pumpkins_raw %>% 
  separate(col = "id", into = c("year", "type")) %>%
  filter(type == "P" & year == 2021) %>% 
  mutate(weight_lbs = str_remove(weight_lbs, ",") ) %>% 
  mutate(weight_lbs = as.numeric(weight_lbs)) %>% 
  mutate(year = ymd(year, truncated = 2L))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

pumpkins_2021 %>%
  ggplot(aes(x = weight_lbs)) +
  geom_density()

## Warning: Removed 1 rows containing non-finite values (stat_density).

3.0.6 Can you make a boxplot showing the distribution of weights of all giant pumpkins across all years?

Also can you add all the datapoints on top of the boxplot? Is this a good idea? Might there be a better geom to use than a boxplot?

pumpkins_all <- pumpkins_raw %>% 
  separate(col = "id", into = c("year", "type")) %>%
  filter(type == "P") %>% 
  mutate(weight_lbs = str_remove(weight_lbs, ",") ) %>% 
  mutate(weight_lbs = as.numeric(weight_lbs)) %>% 
  mutate(year = ymd(year, truncated = 2L))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

pumpkins_all %>%
  ggplot(aes(x = as.factor(year), y = weight_lbs)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.1)

## Warning: Removed 9 rows containing non-finite values (stat_boxplot).

## Warning: Removed 9 rows containing missing values (geom_point).

pumpkins_all %>%
  ggplot(aes(x = as.factor(year), y = weight_lbs)) +
  geom_violin(draw_quantiles = 0.5)

## Warning: Removed 9 rows containing non-finite values (stat_ydensity).

LS0tCnRpdGxlOiAiZ2dwbG90IDEwMSBwcmFjdGljZSBzb2x1dGlvbnMiCmF1dGhvcjogIkRhbmllbCBRdWlyb3ogYW5kIHlvdSIKZGF0ZTogIjkvMTMvMjAyMiIKb3V0cHV0OgogIGh0bWxfZG9jdW1lbnQ6CiAgICB0b2M6IHRydWUKICAgIHRvY19kZXB0aDogNAogICAgbnVtYmVyX3NlY3Rpb25zOiB0cnVlCiAgICB0b2NfZmxvYXQ6IHRydWUKICAgIHRoZW1lOiBmbGF0bHkKICAgIGNvZGVfZG93bmxvYWQ6IHRydWUKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFKQpgYGAKCiMgSW50cm9kdWN0aW9uCldlIGFyZSBnb2luZyB0byBwcmFjdGljZSB1c2luZyBnZ3Bsb3QgdG9kYXksIGZvY3VzaW5nIG9uIHRoZSBkYXRhLAphZXN0aGV0aWMsIGFuZCBnZW9tIGxheWVycy4gV2UgYXJlIGdvaW5nIHRvIHVzZSBkYXRhIGZyb20gdGhlIFtUaWR5VHVlc2RheV0oaHR0cHM6Ly93d3cudGlkeXR1ZXNkYXkuY29tLykgcHJvamVjdC4gRm9yIHRoaXMgcmVjaXRhdGlvbiwgd2UgYXJlIGdvaW5nIHRvIHVzZSB0aGUgW0dpYW50IFB1bXBraW5zXShodHRwczovL2dpdGh1Yi5jb20vcmZvcmRhdGFzY2llbmNlL3RpZHl0dWVzZGF5L3RyZWUvbWFzdGVyL2RhdGEvMjAyMS8yMDIxLTEwLTE5KQpkYXRhIHdoaWNoIGlzIGNvbGxlY3RlZCBmcm9tIHRoZSBbR3JlYXQgUHVtcGtpbiBDb21tb253ZWFsdGhdKGh0dHBzOi8vZ3BjMS5vcmcvKS4gCgpBdCB0aGUgZW5kIG9mIG9mIHRoaXMgbW9kdWxlIHlvdSB3aWxsIGNyZWF0ZSBvZiBvZiB0aGlzIGRlc2NyaXB0aXZlIHBsb3RzCgojIyBMaWJyYXJpZXMKCmBgYHtyIGxpYnJhcmllcywgd2FybmluZyA9IEZBTFNFLCBtZXNzYWdlID0gRkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KGx1YnJpZGF0ZSkKYGBgCgojIyBSZWFkIGluIGRhdGEKYGBge3J9CnB1bXBraW5zX3JhdyA8LSByZWFkcjo6cmVhZF9jc3YoJ2h0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9yZm9yZGF0YXNjaWVuY2UvdGlkeXR1ZXNkYXkvbWFzdGVyL2RhdGEvMjAyMS8yMDIxLTEwLTE5L3B1bXBraW5zLmNzdicpCmBgYAoKIyBGdWxsIHBpcGUgYW5kIHBsb3QKCmBgYHtyIHBpcGUgYW5kIHBsb3R9CnB1bXBraW5zX3JhdyAlPiUgCiAgc2VwYXJhdGUoY29sID0gImlkIiwgaW50byA9IGMoInllYXIiLCAidHlwZSIpKSAlPiUKICBmaWx0ZXIodHlwZSA9PSAiUCIgJiBwbGFjZSA9PSAiMSIpICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IHN0cl9yZW1vdmUod2VpZ2h0X2xicywgIiwiKSApICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IGFzLm51bWVyaWMod2VpZ2h0X2xicykpICU+JSAKICBtdXRhdGUoeWVhciA9IHltZCh5ZWFyLCB0cnVuY2F0ZWQgPSAyKSkgJT4lIAogICAgZ2dwbG90KGFlcyh5ZWFyLCB3ZWlnaHRfbGJzKSkgKyAKICAgIGdlb21fcG9pbnQoKSArIAogICAgZ2VvbV9saW5lKCkKYGBgCgojIFBsYXlpbmcgYXJvdW5kCiMjIyBUcnkgdXNpbmcgZGlmZmVyZW50IGdlb21zIGJlc2lkZXMgYGdlb21fcG9pbnQoKWAgYW5kIGBnZW9tX2xpbmUoKWAuIFdoaWNoIG1pZ2h0IG1ha2Ugc2Vuc2UgaW4gdGhpcyBzaXR1YXRpb24/CgpTaG93aW5nIHdyYW5nbGluZyBmaXJzdCB0aGVuIHdpbGwgcGxvdCBhZnRlci4KYGBge3J9CnB1bXBraW5zX3JhdyAlPiUgCiAgc2VwYXJhdGUoY29sID0gImlkIiwgaW50byA9IGMoInllYXIiLCAidHlwZSIpKSAlPiUKICBmaWx0ZXIodHlwZSA9PSAiUCIgJiBwbGFjZSA9PSAiMSIpICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IHN0cl9yZW1vdmUod2VpZ2h0X2xicywgIiwiKSApICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IGFzLm51bWVyaWMod2VpZ2h0X2xicykpICU+JSAKICBtdXRhdGUoeWVhciA9IHltZCh5ZWFyLCB0cnVuY2F0ZWQgPSAyTCkpIApgYGAKCkNhbiBhbHNvIHBhcnNlIHRoZSBkYXRlIGluIGEgc2xpZ2h0bHkgZGlmZmVyZW50IHdheS4KYGBge3J9CihwdW1wa2luc190b19wbG90IDwtIHB1bXBraW5zX3JhdyAlPiUKICBzZXBhcmF0ZShjb2wgPSAiaWQiLCBpbnRvID0gYygieWVhciIsICJ0eXBlIikpICU+JQogIGZpbHRlcih0eXBlID09ICJQIiAmIHBsYWNlID09ICIxIikgJT4lIAogIG11dGF0ZSh3ZWlnaHRfbGJzID0gc3RyX3JlbW92ZSh3ZWlnaHRfbGJzLCAiLCIpICkgJT4lIAogIG11dGF0ZSh3ZWlnaHRfbGJzID0gYXMubnVtZXJpYyh3ZWlnaHRfbGJzKSkgJT4lIAogIG11dGF0ZSh5ZWFyID0gYXMuUE9TSVhjdCh5ZWFyLCBmb3JtYXQgPSAiJVkiKSkgJT4lCiAgbXV0YXRlKHllYXIgPSBhcy5EYXRlKHllYXIsIGZvcm1hdCA9ICIlWSIpKSkKYGBgCgojIyMgQ2FuIHlvdSBjb2xvciBhbGwgdGhlIGxpbmVzIGJsdWU/CmBgYHtyfQpwdW1wa2luc190b19wbG90ICU+JQogIGdncGxvdChhZXMoeCA9IHllYXIsIHkgPSB3ZWlnaHRfbGJzKSkgKwogIGdlb21fbGluZShjb2xvciA9ICJibHVlIikgKwogIGdlb21fcG9pbnQoKQpgYGAKCiMjIyBDYW4geW91IGNvbG9yIHRoZSBkYXRhIGJhc2VkIG9uIHllYXI/CmBgYHtyfQpwdW1wa2luc190b19wbG90ICU+JQogIGdncGxvdChhZXMoeCA9IHllYXIsIHkgPSB3ZWlnaHRfbGJzKSkgKwogIGdlb21fbGluZSgpICsKICBnZW9tX3BvaW50KGFlcyhjb2xvciA9IHllYXIpKQpgYGAKCkJlY2F1c2UgZGF0ZSBpcyBhIGNvbnRpbnVvdXMgdmFyaWFibGUsIHdlIGFyZSBnZXR0aW5nIGEgY29udGludW91cyBjb2xvciBzY2FsZSwgd2hpY2ggbWlnaHQgbm90IGJlIHdoYXQgd2Ugd2FudC4gV2UgY2FuIGdldCBhcm91bmQgaXQgYnkgc2V0dGluZyBkYXRlIGFzIGEgZmFjdG9yLgpgYGB7cn0KcHVtcGtpbnNfdG9fcGxvdCAlPiUKICBnZ3Bsb3QoYWVzKHggPSB5ZWFyLCB5ID0gd2VpZ2h0X2xicykpICsKICBnZW9tX2xpbmUoKSArCiAgZ2VvbV9wb2ludChhZXMoY29sb3IgPSBhcy5mYWN0b3IoeWVhcikpKQpgYGAKCiMjIyBDYW4geW91IGNvbG9yIGFuZCBjaGFuZ2Ugc2hhcGUgYmFzZWQgb24gY291bnRyeT8KYGBge3J9CnB1bXBraW5zX3RvX3Bsb3QgJT4lCiAgZ2dwbG90KGFlcyh4ID0geWVhciwgeSA9IHdlaWdodF9sYnMpKSArCiAgZ2VvbV9saW5lKCkgKwogIGdlb21fcG9pbnQoYWVzKGNvbG9yID0gYXMuZmFjdG9yKHllYXIpLCBzaGFwZSA9IGNvdW50cnkpKQpgYGAKCiMjIyBDYW4geW91IG1ha2UgYSBwbG90IHNob3dpbmcgdGhlIGRpc3RyaWJ1dGlvbiBvZiB3ZWlnaHRzIG9mIGFsbCBnaWFudCBwdW1wa2lucyBlbnRlcmVkIGluIDIwMjE/CmBgYHtyfQpwdW1wa2luc18yMDIxIDwtIHB1bXBraW5zX3JhdyAlPiUgCiAgc2VwYXJhdGUoY29sID0gImlkIiwgaW50byA9IGMoInllYXIiLCAidHlwZSIpKSAlPiUKICBmaWx0ZXIodHlwZSA9PSAiUCIgJiB5ZWFyID09IDIwMjEpICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IHN0cl9yZW1vdmUod2VpZ2h0X2xicywgIiwiKSApICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IGFzLm51bWVyaWMod2VpZ2h0X2xicykpICU+JSAKICBtdXRhdGUoeWVhciA9IHltZCh5ZWFyLCB0cnVuY2F0ZWQgPSAyTCkpIAoKcHVtcGtpbnNfMjAyMSAlPiUKICBnZ3Bsb3QoYWVzKHggPSB3ZWlnaHRfbGJzKSkgKwogIGdlb21fZGVuc2l0eSgpCmBgYAoKIyMjIENhbiB5b3UgbWFrZSBhIGJveHBsb3Qgc2hvd2luZyB0aGUgZGlzdHJpYnV0aW9uIG9mIHdlaWdodHMgb2YgYWxsIGdpYW50IHB1bXBraW5zIGFjcm9zcyBhbGwgeWVhcnM/IApBbHNvIGNhbiB5b3UgYWRkIGFsbCB0aGUgZGF0YXBvaW50cyBvbiB0b3Agb2YgdGhlIGJveHBsb3Q/IElzIHRoaXMgYSBnb29kIGlkZWE/IE1pZ2h0IHRoZXJlIGJlIGEgYmV0dGVyIGdlb20gdG8gdXNlIHRoYW4gYSBib3hwbG90PwpgYGB7cn0KcHVtcGtpbnNfYWxsIDwtIHB1bXBraW5zX3JhdyAlPiUgCiAgc2VwYXJhdGUoY29sID0gImlkIiwgaW50byA9IGMoInllYXIiLCAidHlwZSIpKSAlPiUKICBmaWx0ZXIodHlwZSA9PSAiUCIpICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IHN0cl9yZW1vdmUod2VpZ2h0X2xicywgIiwiKSApICU+JSAKICBtdXRhdGUod2VpZ2h0X2xicyA9IGFzLm51bWVyaWMod2VpZ2h0X2xicykpICU+JSAKICBtdXRhdGUoeWVhciA9IHltZCh5ZWFyLCB0cnVuY2F0ZWQgPSAyTCkpIAoKcHVtcGtpbnNfYWxsICU+JQogIGdncGxvdChhZXMoeCA9IGFzLmZhY3Rvcih5ZWFyKSwgeSA9IHdlaWdodF9sYnMpKSArCiAgZ2VvbV9ib3hwbG90KG91dGxpZXIuc2hhcGUgPSBOQSkgKwogIGdlb21faml0dGVyKGFscGhhID0gMC4xKQoKcHVtcGtpbnNfYWxsICU+JQogIGdncGxvdChhZXMoeCA9IGFzLmZhY3Rvcih5ZWFyKSwgeSA9IHdlaWdodF9sYnMpKSArCiAgZ2VvbV92aW9saW4oZHJhd19xdWFudGlsZXMgPSAwLjUpCmBgYAoKCgo=

ggplot 101 practice solutions

Daniel Quiroz and you

9/13/2022

1 Introduction

1.1 Libraries

1.2 Read in data

2 Full pipe and plot

3 Playing around

3.0.1 Try using different geoms besides `geom_point()` and `geom_line()`. Which might make sense in this situation?

3.0.2 Can you color all the lines blue?

3.0.3 Can you color the data based on year?

3.0.4 Can you color and change shape based on country?

3.0.5 Can you make a plot showing the distribution of weights of all giant pumpkins entered in 2021?

3.0.6 Can you make a boxplot showing the distribution of weights of all giant pumpkins across all years?

ggplot 101 practice solutions

Daniel Quiroz and you

9/13/2022

1 Introduction

1.1 Libraries

1.2 Read in data

2 Full pipe and plot

3 Playing around

3.0.1 Try using different geoms besides geom_point() and geom_line(). Which might make sense in this situation?

3.0.2 Can you color all the lines blue?

3.0.3 Can you color the data based on year?

3.0.4 Can you color and change shape based on country?

3.0.5 Can you make a plot showing the distribution of weights of all giant pumpkins entered in 2021?

3.0.6 Can you make a boxplot showing the distribution of weights of all giant pumpkins across all years?

3.0.1 Try using different geoms besides `geom_point()` and `geom_line()`. Which might make sense in this situation?