Introduction
Today you will practice what you learned in the wrangling lesson. The more you practice
modifying your data, the easier it becomes. Remember, there are many ways
to accomplish the same outcome. In the recitation solutions, I will show
you a few different ways to answer each prompt so you can see how they
differ and use the ones that resonate with you.
Load data
To practice, we will use some data I have extracted from Gapminder. I am linking to two
files that you can download to your computer and then read in like
we learned in class. When you go to the links below, click on the
Download Raw File icon at the top right of the file.
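- Data on the happiness index for many countries for many years: https://github.com/jcooperstone/dataviz-site/blob/master/2_04_wrangling/data/hapiscore_whr.csv
- Data on the life expectancy for many countries for many years: https://github.com/jcooperstone/dataviz-site/blob/master/2_04_wrangling/data/life_expectancy.csv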
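As a reminder of the general read-in pattern, here is a minimal sketch. It assumes you are using `read_csv()` from readr. So that it runs anywhere, it first writes a tiny made-up CSV to a temporary file and then reads it back. The `data/` folder and the file names in the closing comments are assumptions, so swap in wherever you actually saved your downloads.

```r
library(readr)

# Stand-in for a downloaded file: a tiny CSV written to a temporary location.
# The columns and values here are made up for illustration only.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    country = c("A", "B"),
    year = c(2021, 2022),
    value = c(5.1, 6.3)
  ),
  toy_path
)

# The pattern itself: point read_csv() at the file and store the result
toy_data <- read_csv(toy_path, show_col_types = FALSE)

# For the real files, use your own download location, for example (assumed paths):
# happiness <- read_csv("data/hapiscore_whr.csv")
# life_expectancy <- read_csv("data/life_expectancy.csv")
```

If `read_csv()` cannot find your file, check your working directory with `getwd()` and compare it to where the download landed.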
Explore your data
Write some code that lets you explore what is in these two
datasets.
How many observations are there in each dataset?
What years do the data contain information for?
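If you are not sure where to start, here is a sketch of a few exploring functions on a small made-up dataset. The object name `toy_happiness`, its columns, and its long layout (one row per country per year) are all assumptions for illustration. Your real files may be laid out differently, so look at them first.

```r
library(dplyr)

# Made-up data in a long layout: one row per country per year
toy_happiness <- tibble(
  country = c("A", "A", "B", "B", "C"),
  year = c(2021, 2022, 2021, 2022, 2022),
  happiness = c(5.1, 5.4, 6.8, 6.9, 4.2)
)

glimpse(toy_happiness)                            # column names, types, first values
n_rows <- nrow(toy_happiness)                     # number of observations (rows)
dims <- dim(toy_happiness)                        # rows, then columns
year_range <- range(toy_happiness$year)           # earliest and latest year
n_countries <- n_distinct(toy_happiness$country)  # how many different countries
```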
Modifying data
Create a new dataset for life_expectancy that only includes observed
data (i.e., remove the projected data after 2022).
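One possible approach, sketched on made-up data: if the years are stored in a `year` column, `filter()` can keep only the rows up to 2022. The names `toy_life` and `toy_life_observed` and the values are hypothetical. If your file instead has one column per year, you would be choosing columns rather than rows.

```r
library(dplyr)

# Made-up long-format data that runs past 2022
toy_life <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year = c(2021, 2022, 2023, 2021, 2022, 2023),
  life_expectancy = c(70.1, 70.3, 70.6, 81.0, 81.2, 81.4)
)

# Keep only rows from 2022 or earlier and save them as a new object,
# leaving the original data untouched
toy_life_observed <- toy_life %>%
  filter(year <= 2022)
```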
Calculating summaries
Which country has the highest average happiness index in 2022?
What about the highest average index overall?
How many countries had an average life expectancy over 80 years in
2022?
Which countries are in the top 10 percent for happiness? What about
the bottom 10 percent? What about for life expectancy? You can calculate this for
the most recent data, for the mean, or really for whatever you want.
Remember there are lots of ways to do this.
Hint: try using the functions in the `slice_*()` family (for example, `slice_max()` and `slice_min()`).
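To make the hint concrete, here is a sketch on 20 made-up countries showing how the `prop` argument of `slice_max()` and `slice_min()` picks a fraction of rows, along with one alternative using `quantile()`. The object and column names are hypothetical, and this is only one of several reasonable ways to define "top 10 percent".

```r
library(dplyr)

# 20 made-up countries with evenly spaced scores (no ties)
toy_scores <- tibble(
  country = LETTERS[1:20],
  happiness = seq(from = 3, by = 0.2, length.out = 20)
)

# prop = 0.1 keeps 10% of the rows: here, 2 of 20
top_10pct <- toy_scores %>% slice_max(happiness, prop = 0.1)
bottom_10pct <- toy_scores %>% slice_min(happiness, prop = 0.1)

# An alternative: compare each score to the 90th percentile cutoff
top_by_quantile <- toy_scores %>%
  filter(happiness >= quantile(happiness, 0.9))
```

Note that the `slice_*()` functions keep ties by default, so with real data you may get a few more rows than you expect.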
Which country's happiness index increased the most from
2012 to 2022? Which dropped the most?
Joining data
Try joining the happiness and life_expectancy datasets together and
use the different `*_join()` functions so you can see how
they differ. Check their dimensions and look at them. Think about how
you might want to do different joins in different situations.
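Before trying this on the real data, it can help to see the joins behave on something tiny. The sketch below uses two made-up tables that only partly overlap. The table names, column names, and values are all hypothetical, and the join keys in your real data may differ.

```r
library(dplyr)

# Two made-up tables that share countries B and C only
toy_happiness <- tibble(
  country = c("A", "B", "C"),
  year = 2022,
  happiness = c(5.4, 6.9, 4.2)
)
toy_life <- tibble(
  country = c("B", "C", "D"),
  year = 2022,
  life_expectancy = c(81.2, 70.5, 76.0)
)

keys <- c("country", "year")

joined_inner <- inner_join(toy_happiness, toy_life, by = keys) # only B and C
joined_left  <- left_join(toy_happiness, toy_life, by = keys)  # A, B, C (A gets NA)
joined_right <- right_join(toy_happiness, toy_life, by = keys) # B, C, D (D gets NA)
joined_full  <- full_join(toy_happiness, toy_life, by = keys)  # A, B, C, D

# Compare the number of rows each join produced
join_rows <- sapply(
  list(inner = joined_inner, left = joined_left,
       right = joined_right, full = joined_full),
  nrow
)
```

Looking at where the `NA` values appear in each result is a quick way to see which table each join treats as the one to keep.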
If you wanted to create a plot that allowed you to see the
correlation between happiness score and life expectancy in 2022, which
joined dataset would you use and why?