Cleaning SFO Weather Data

Helpful Data Wrangling Notes
  • month.abb is a built-in object in R that holds the 3-letter month abbreviations.
  • You can create your own data frame with the tibble() function. Look up the documentation for this function by typing ?tibble::tibble in the Console.
  • You can create regular sequences in R with :, e.g., 3:5 generates the sequence c(3, 4, 5).
  • You can also create regular sequences in R with seq(), e.g., seq(from = 3, to = 5, by = 1) generates the sequence c(3, 4, 5). Look up the documentation for this function by typing ?seq in the Console.
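
The notes above can be tried out together in a short sketch. The object name demo and its column names are made up for illustration and are not part of the exercise.

```r
library(tibble)

# month.abb is a character vector of length 12; index it like any other vector
first_three <- month.abb[1:3]

# Two ways to build the same regular sequence
colon_seq <- 3:5
seq_seq <- seq(from = 3, to = 5, by = 1)

# A small data frame built by hand (hypothetical column names)
demo <- tibble(
  id = colon_seq,
  label = month.abb[colon_seq]
)
demo
```

The two sequences hold the same values, although 3:5 is stored as integers and the seq() call above returns doubles.
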
Practicing Keyboard Shortcuts

Try out the following as you work on this exercise:

  • Tab completion (try this out when writing your file paths: typing a partial path pulls up a mini file explorer)
  • Insert a code chunk
  • Run a code chunk
  • Navigating around words and lines (selecting and deleting them)
  • Run selected lines (not a whole code chunk)
  • Insert the assignment operator (<-)
  • Insert the pipe operator (|>)

Exercise

Carry out the following steps to clean and save the San Francisco weather data. Make sure to download the data file and add it to your portfolio repository as instructed.

  1. Read in the weather data in this file with the correct relative file path after you move it to the instructed location.
library(tidyverse) # Attaches readr, which provides read_csv()

# Step 1
# Read in the raw weather data using a relative path
weather_data <- read_csv("../../data/raw/weather.csv")
  2. There is a variable that has values that don’t make sense in the data context. Figure out which variable this is and clean it up by making those values missing using na_if().
# Step 2
# Replace the implausible value 99999 in PrecipYr with NA
weather_clean <- weather_data |>
  mutate(PrecipYr = na_if(PrecipYr, 99999))
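
To see what na_if() does without touching the real file, here is a minimal sketch on a toy data frame. The toy values are invented. The sentinel 99999 and the column name PrecipYr follow the step above. Inside mutate(), a column is created or overwritten with the name = value form.

```r
library(dplyr)
library(tibble)

# Toy data; the values are made up for illustration
toy <- tibble(
  Day = 1:4,
  PrecipYr = c(0.5, 99999, 1.2, 99999)
)

# Overwrite PrecipYr, turning every 99999 into NA
toy_clean <- toy |>
  mutate(PrecipYr = na_if(PrecipYr, 99999))

toy_clean
```

A quick check such as names(toy_clean) or summary(toy_clean$PrecipYr) confirms that the column was overwritten in place rather than added under a new name.
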
  3. Create a variable called dateInYear that indicates the day of the year (1-365) for each case. (Jan 1 should be 1, and Dec 31 should be 365).
# Step 3
# Sort by date, then number the days of the year from 1 to 365
# (assumes exactly one row per day)
weather_clean <- weather_clean |>
  arrange(Month, Day) |>
  mutate(dateInYear = seq(from = 1, to = 365, by = 1))
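
The sort-then-number approach can be checked on a stand-in data set. This sketch assumes one row per day of a non-leap year. The year 2011 and the deliberately scrambled row order are arbitrary choices for illustration and say nothing about the real file.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: one row per day of a non-leap year
dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
toy <- tibble(
  Month = as.integer(format(dates, "%m")),
  Day = as.integer(format(dates, "%d"))
) |>
  arrange(desc(Month), desc(Day)) # scramble the order on purpose

# Sort back into calendar order, then number the rows 1 to 365
toy_days <- toy |>
  arrange(Month, Day) |>
  mutate(dateInYear = seq(from = 1, to = 365, by = 1))

# Spot checks: Jan 1, Feb 1, and Dec 31
toy_days |>
  filter((Month == 1 & Day == 1) | (Month == 2 & Day == 1) | (Month == 12 & Day == 31))
```

Numbering rows this way only gives the right day of the year when no days are missing or duplicated, so it is worth confirming nrow() is 365 first.
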
  4. Create a variable called month_name that shows the 3-letter abbreviation for each case.
# Step 4
# Look up each month's 3-letter abbreviation by its month number
weather_clean <- weather_clean |>
  mutate(month_name = month.abb[Month])
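
Indexing month.abb with a vector of month numbers returns one abbreviation per element, which is what makes it usable inside mutate(). A minimal sketch with invented month numbers:

```r
library(dplyr)
library(tibble)

# Toy data; month numbers chosen for illustration
toy <- tibble(Month = c(1, 2, 12))

# month.abb[Month] looks up each row's abbreviation by position
toy_named <- toy |>
  mutate(month_name = month.abb[Month])

toy_named$month_name
```
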
  5. Save the wrangled data to the data/processed/ folder using write_csv(). Name this file weather_clean.csv. Look up the documentation for this function by typing ?write_csv in the Console. You’ll need to write an appropriate relative path.
# Step 5
# Save the wrangled data to the data/processed/ folder
write_csv(weather_clean, file = "../../data/processed/weather_clean.csv")
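
One way to confirm that a save worked is to read the file back in and compare. The sketch below writes a toy data frame to a temporary folder rather than to the portfolio repository. The file name weather_clean_demo.csv and the toy values are assumptions made for illustration.

```r
library(readr)
library(tibble)

# Toy data standing in for the wrangled weather data
toy <- tibble(Month = c(1, 12), month_name = c("Jan", "Dec"))

# Write to a temporary location (hypothetical file name)
out_path <- file.path(tempdir(), "weather_clean_demo.csv")
write_csv(toy, file = out_path)

# Read the file back and inspect it
check <- read_csv(out_path, show_col_types = FALSE)
check
```
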