Calculating Water Year and Moving Mean

Estimating Annual Precipitation and Moving Averages using R

This script walks through producing a water plot of water years from precipitation data and plotting a moving average across that period. Some amount of data cleaning is required to use the raw data from National Oceanic and Atmospheric Administration (NOAA).

Hydrologists in the United States of America often define a water year as the dates between October 1st to September 30th with the year being at the end of the period (September 30th). This script walks you through pulling data from the NOAA Climate Data search tool and building a graph showing annual precipitation over each water year in the dataset.

Start by visiting NOAA’s Climate Data Search Tool and downloading a CSV formated file containing daily precipitation.

You can download an example data file here.

It is important to look at your data and understand it. I reviewed my data in Excel using a pivot chart and noticed that in 1950 and 51 the station has very few records. I assume there was an equipment failure during that time. I include a small piece of code near the bottom of this tutorial to disregard those two years. Excluding those two years which had abnormally low observations actually increases the mean near the beginning of the record.

Load the necessary libraries

library(ggplot2)
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(tools)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(gtools)
library(plyr)
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:plotly':
## 
##     arrange, mutate, rename, summarise

Write a function to add a water year column (for use later in the script)

wtr_yr <- function(dates, start_month=10) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

Load your data file (downloaded from NOAA)

noaa_stations <- read.csv("noaa_stations.csv", header=TRUE)

Calculate precipitation in inches if data was downloaded in milimeters

noaa_stations$PRCP_in <- noaa_stations$PRCP/25.4 #Multiply by 25.4 to make mm into inches (if desired)

Choose what stations you want to use (useful if you downloaded multiple stations)

noaa_station <- subset(noaa_stations, noaa_stations$STATION == "USC00045941")

Ensure the dates are stored as dates by using as.Date()

noaa_station$DATE <- as.Date(noaa_station$DATE)

Add a column for water year

noaa_station$water_year <- wtr_yr(noaa_station$DATE)

Remove incomplete water years 1950 and 1951

noaa_station <- subset(noaa_station,  !(noaa_station$water_year ==1950))
noaa_station <- subset(noaa_station,  !(noaa_station$water_year ==1951))

Calculate annual sums by water year

noaa_station_wateryear <- aggregate( PRCP_in ~ water_year , noaa_station , sum )

Check what this output looks like

head(noaa_station_wateryear)
##   water_year   PRCP_in
## 1       1942  1.464567
## 2       1943  9.755906
## 3       1944 10.043307
## 4       1945 11.732283
## 5       1946 11.129921
## 6       1947  8.204724

Remove na values

noaa_station_wateryear <- na.omit(noaa_station_wateryear)

Make a plot showing a 10 year rolling mean

p <- ggplot(noaa_station_wateryear, aes(x = water_year, y = PRCP_in)) +
  geom_bar(stat = "identity", color="black", fill="lightsteelblue1")+
  
  labs(title = "Annual Water Year Precipitation",
       subtitle = toTitleCase(tolower(as.character(subset(noaa_stations, noaa_stations$STATION == "USC00045941")$NAME[1]))),
       y = "Annual Precipitation in Inches",
       x = "Water Year (October 1 to September 30)") + 
  geom_path(aes(y=rollmean(PRCP_in, 
                           k=10, 
                           na.pad = TRUE)),
            color="midnightblue", 
            na.rm=TRUE)+
    geom_hline(aes(yintercept=mean(PRCP_in)),
            linetype="dashed")+
  theme_minimal()
  
  p

This can be more interactive using ggplotly

ggplotly(p)

Now lets consider the data by month

noaa_station$month <- months(noaa_station$DATE)
noaa_station_month <- aggregate( PRCP_in ~ water_year + month , noaa_station , sum )


head(noaa_station_month)
##   water_year month   PRCP_in
## 1       1943 April 1.2007874
## 2       1944 April 1.0314961
## 3       1945 April 0.6535433
## 4       1946 April 0.2677165
## 5       1947 April 0.2086614
## 6       1948 April 0.8779528
noaa_station_month <- na.omit(noaa_station_month)

noaa_station_monthly <- ddply(noaa_station_month, c("month"), summarise,
                              N=length(PRCP_in), 
                              mean=mean(PRCP_in),
                              sd=sd(PRCP_in),
                              se=sd/sqrt(N))

head(noaa_station_monthly)
##      month  N      mean        sd         se
## 1    April 73 0.6069464 0.5744971 0.06723980
## 2   August 73 0.3611800 0.5788964 0.06775470
## 3 December 74 1.4109917 1.2683320 0.14744061
## 4 February 74 0.8466695 0.7218087 0.08390856
## 5  January 75 1.0298688 0.8097520 0.09350210
## 6     July 73 0.3797325 0.4730753 0.05536927
water_year_months <- c("October", "November", "December", "January", "February", "March", "April", "May", "June", "July", "August", "September")

p <- ggplot(noaa_station_monthly,
            aes(x = factor(month,
                           levels=water_year_months))) +
  geom_errorbar(aes(ymin = mean - se,
                ymax = mean + se),
                width = .3)+
    geom_point(aes(x = factor(month,levels=water_year_months),
                    y=mean), shape = 15, size=2) +
  labs(title = "Monthly Precipitation By Month, with Std Error and Mean Monthly Precipitation",
       subtitle = toTitleCase(tolower(as.character(subset(noaa_stations, noaa_stations$STATION == "USC00045941")$NAME[1]))),
       y = "Monthly Precipitation in Inches",
       x = paste(min(noaa_station_wateryear$water_year),
                 "to",
                 max(noaa_station_wateryear$water_year))) + 
      geom_hline(aes(yintercept=mean(mean)),
            linetype="dashed")+
  theme_minimal()+
  scale_y_continuous(limits = c(0, 1.75))+
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
  
  p