Cleaning SFO Weather Data

Helpful Data Wrangling Notes

  • month.abb is a built-in object in R with 3-letter month abbreviations
  • You can create your own data frame with the tibble() function. Look up the documentation for this function by typing ?tibble::tibble in the Console.
  • You can create regular sequences in R with :, e.g., 3:5 generates the sequence c(3, 4, 5).
  • You can also create regular sequences with seq(), e.g., seq(from = 3, to = 5, by = 1) generates the sequence c(3, 4, 5). Look up the documentation for this function by typing ?seq in the Console.
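
The notes above can be tried out together in a short sketch. The data frame toy and its values are hypothetical, made up only for illustration; month.abb, tibble(), : and seq() are used as described in the notes.

```r
library(tibble)

# month.abb is a built-in character vector with 12 elements
month.abb[c(1, 12)]

# Two ways to build the same regular sequence of values
short_form <- 3:5
long_form  <- seq(from = 3, to = 5, by = 1)
all.equal(short_form, long_form)

# A hypothetical three-row data frame, built by hand with tibble()
toy <- tibble(
  Month = c(1, 6, 12),
  Day   = c(1, 16, 31)
)

# Indexing month.abb by month number looks up each abbreviation
toy$month_name <- month.abb[toy$Month]
toy
```

One detail worth knowing: 3:5 returns integers while seq() with by = 1 returns doubles, so the two results hold equal values but are not identical() to each other.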

Practicing Keyboard Shortcuts

Try out the following as you work on this exercise:

  • Tab completion (Try this out when writing your file paths! Typing out a partial path will pull up a mini file-explorer)
  • Insert a code chunk
  • Run a code chunk
  • Navigating around words and lines (selecting and deleting them)
  • Run selected lines (not a whole code chunk)
  • Insert the assignment operator (<-)
  • Insert the pipe operator (|>)

Exercise

Carry out the following steps to clean and save the San Francisco weather data. Make sure to download the data file and add it to your portfolio repository as instructed.

  1. Move the weather data file to the instructed location, then read it in using the correct relative file path.
  2. There is a variable that has values that don’t make sense in the data context. Figure out which variable this is and clean it up by making those values missing using na_if().
  3. Create a variable called dateInYear that indicates the day of the year (1-365) for each case. (Jan 1 should be 1, and Dec 31 should be 365).
  4. Create a variable called month_name that shows the 3-letter month abbreviation for each case.
  5. Save the wrangled data to the data/processed/ folder using write_csv(). Name this file weather_clean.csv. Look up the documentation for this function by typing ?write_csv in the Console. You’ll need to write an appropriate relative path.
Code
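# Load readr, dplyr, and the rest of the tidyverse
library(tidyverse)
# Relative path; assumes the working directory is this document's folder
weather <- read_csv("../../data/raw/weather.csv")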
Code
# 99999 is a placeholder rather than a real year, so recode it as missing
weather_clean <- weather %>%
  mutate(PrecipYr = na_if(PrecipYr, 99999))

weather_clean
# A tibble: 365 × 18
   Month   Day   Low  High NormalLow NormalHigh RecordLow LowYr RecordHigh
   <dbl> <dbl> <dbl> <dbl>     <dbl>      <dbl>     <dbl> <dbl>      <dbl>
 1    11    20    48    55        48         62        35  1964         69
 2     6    16    52    68        53         70        46  1952         90
 3     5     9    47    63        50         66        41  1950         88
 4    10    26    47    69        52         69        39  1954         89
 5     9    27    55    82        55         73        47  1955         96
 6     7     6    52    70        54         71        47  1953         86
 7    11     3    48    60        51         66        40  1971         84
 8     3    26    47    58        47         62        38  1980         79
 9    10     4    57    66        55         72        47  1989         95
10    11    26    49    59        47         60        36  1952         76
# ℹ 355 more rows
# ℹ 9 more variables: HiYear <dbl>, Precip <dbl>, RecordPrecip <dbl>,
#   PrecipYr <dbl>, date <chr>, Record <lgl>, RecordText <chr>, RecordP <lgl>,
#   CulmPrec <dbl>
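
One possible way to find the variable with out-of-context values is to look at the range of each numeric column and see which maximum looks implausible. The sketch below runs on a hypothetical toy tibble (the rows are invented; only the column names and the 99999 placeholder come from the code above), so treat it as an illustration rather than the intended solution.

```r
library(dplyr)

# Hypothetical rows standing in for the real weather data
toy <- tibble(
  High     = c(55, 68, 63),
  PrecipYr = c(1983, 99999, 1995)
)

# Min and max of every numeric column: an impossible year stands out
toy %>%
  summarise(across(where(is.numeric), list(min = min, max = max)))

# Count how many rows carry the suspected placeholder
n_flagged <- sum(toy$PrecipYr == 99999)

# Recode only the placeholder as NA; other values are left untouched
toy_clean <- toy %>%
  mutate(PrecipYr = na_if(PrecipYr, 99999))
toy_clean
```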
Code
# Sort into calendar order, then number the rows (one row per day, so 1 to 365)
weather_clean <- weather_clean %>%
  arrange(Month, Day) %>%
  mutate(dateInYear = row_number())
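
Numbering the sorted rows relies on the data having exactly one row per day. As a cross-check, the day of the year can also be computed from Month and Day directly. The sketch below is one way to do that in base R plus dplyr; the toy rows are hypothetical, and 2023 is an assumed non-leap year picked only so that Dec 31 maps to 365.

```r
library(dplyr)

# Hypothetical rows in scrambled order, standing in for the real data
toy <- tibble(
  Month = c(12, 1, 3, 2),
  Day   = c(31, 1, 1, 28)
)

toy_doy <- toy %>%
  mutate(
    # 2023 is an assumption: any non-leap year gives the same numbering
    date_tmp   = as.Date(paste(2023, Month, Day, sep = "-")),
    # "%j" formats a date as its day of the year
    dateInYear = as.integer(format(date_tmp, "%j"))
  ) %>%
  select(-date_tmp) %>%
  arrange(dateInYear)

toy_doy
```

Because this version does not depend on row order or on every day being present, it keeps working if rows are missing or duplicated.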
Code
# Index the built-in month.abb vector by month number (1 gives "Jan")
weather_clean <- weather_clean %>%
  mutate(month_name = month.abb[Month])
Code
# Relative path; assumes the working directory is this document's folder
write_csv(weather_clean, "../../data/processed/weather_clean.csv")
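
After saving, it can be reassuring to read the file back and confirm that the new columns and the missing values survived. The sketch below does this with a hypothetical toy tibble and a temporary file from tempfile(), which stands in for the real data/processed/ path.

```r
library(readr)
library(tibble)

# Hypothetical cleaned rows, including one missing value
toy <- tibble(
  dateInYear = c(1, 2),
  month_name = c("Jan", "Jan"),
  PrecipYr   = c(1983, NA)
)

# tempfile() is a stand-in for the real output path
out_path <- tempfile(fileext = ".csv")
write_csv(toy, out_path)

# Read the file back; NA is written as "NA" and parsed as missing again
toy_back <- read_csv(out_path, show_col_types = FALSE)
toy_back
```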