Calculating Water Year and Moving Mean
Bill Rice
6/6/2019
Estimating Annual Precipitation and Moving Averages using R
This script walks through producing a water plot of water years from precipitation data and plotting a moving average across that period. Some amount of data cleaning is required to use the raw data from National Oceanic and Atmospheric Administration (NOAA).
Hydrologists in the United States of America often define a water year as the dates between October 1st to September 30th with the year being at the end of the period (September 30th). This script walks you through pulling data from the NOAA Climate Data search tool and building a graph showing annual precipitation over each water year in the dataset.
Start by visiting NOAA’s Climate Data Search Tool and downloading a CSV formated file containing daily precipitation.
You can download an example data file here.
It is important to look at your data and understand it. I reviewed my data in Excel using a pivot chart and noticed that in 1950 and 51 the station has very few records. I assume there was an equipment failure during that time. I include a small piece of code near the bottom of this tutorial to disregard those two years. Excluding those two years which had abnormally low observations actually increases the mean near the beginning of the record.
Load the necessary libraries
library(ggplot2)
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(tools)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(gtools)
library(plyr)
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:plotly':
##
## arrange, mutate, rename, summarise
Write a function to add a water year column (for use later in the script)
wtr_yr <- function(dates, start_month=10) {
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
Load your data file (downloaded from NOAA)
noaa_stations <- read.csv("noaa_stations.csv", header=TRUE)
Calculate precipitation in inches if data was downloaded in milimeters
noaa_stations$PRCP_in <- noaa_stations$PRCP/25.4 #Multiply by 25.4 to make mm into inches (if desired)
Choose what stations you want to use (useful if you downloaded multiple stations)
noaa_station <- subset(noaa_stations, noaa_stations$STATION == "USC00045941")
Ensure the dates are stored as dates by using as.Date()
noaa_station$DATE <- as.Date(noaa_station$DATE)
Add a column for water year
noaa_station$water_year <- wtr_yr(noaa_station$DATE)
Remove incomplete water years 1950 and 1951
noaa_station <- subset(noaa_station, !(noaa_station$water_year ==1950))
noaa_station <- subset(noaa_station, !(noaa_station$water_year ==1951))
Calculate annual sums by water year
noaa_station_wateryear <- aggregate( PRCP_in ~ water_year , noaa_station , sum )
Check what this output looks like
head(noaa_station_wateryear)
## water_year PRCP_in
## 1 1942 1.464567
## 2 1943 9.755906
## 3 1944 10.043307
## 4 1945 11.732283
## 5 1946 11.129921
## 6 1947 8.204724
Remove na values
noaa_station_wateryear <- na.omit(noaa_station_wateryear)
Make a plot showing a 10 year rolling mean
p <- ggplot(noaa_station_wateryear, aes(x = water_year, y = PRCP_in)) +
geom_bar(stat = "identity", color="black", fill="lightsteelblue1")+
labs(title = "Annual Water Year Precipitation",
subtitle = toTitleCase(tolower(as.character(subset(noaa_stations, noaa_stations$STATION == "USC00045941")$NAME[1]))),
y = "Annual Precipitation in Inches",
x = "Water Year (October 1 to September 30)") +
geom_path(aes(y=rollmean(PRCP_in,
k=10,
na.pad = TRUE)),
color="midnightblue",
na.rm=TRUE)+
geom_hline(aes(yintercept=mean(PRCP_in)),
linetype="dashed")+
theme_minimal()
p
This can be more interactive using ggplotly
ggplotly(p)
Now lets consider the data by month
noaa_station$month <- months(noaa_station$DATE)
noaa_station_month <- aggregate( PRCP_in ~ water_year + month , noaa_station , sum )
head(noaa_station_month)
## water_year month PRCP_in
## 1 1943 April 1.2007874
## 2 1944 April 1.0314961
## 3 1945 April 0.6535433
## 4 1946 April 0.2677165
## 5 1947 April 0.2086614
## 6 1948 April 0.8779528
noaa_station_month <- na.omit(noaa_station_month)
noaa_station_monthly <- ddply(noaa_station_month, c("month"), summarise,
N=length(PRCP_in),
mean=mean(PRCP_in),
sd=sd(PRCP_in),
se=sd/sqrt(N))
head(noaa_station_monthly)
## month N mean sd se
## 1 April 73 0.6069464 0.5744971 0.06723980
## 2 August 73 0.3611800 0.5788964 0.06775470
## 3 December 74 1.4109917 1.2683320 0.14744061
## 4 February 74 0.8466695 0.7218087 0.08390856
## 5 January 75 1.0298688 0.8097520 0.09350210
## 6 July 73 0.3797325 0.4730753 0.05536927
water_year_months <- c("October", "November", "December", "January", "February", "March", "April", "May", "June", "July", "August", "September")
p <- ggplot(noaa_station_monthly,
aes(x = factor(month,
levels=water_year_months))) +
geom_errorbar(aes(ymin = mean - se,
ymax = mean + se),
width = .3)+
geom_point(aes(x = factor(month,levels=water_year_months),
y=mean), shape = 15, size=2) +
labs(title = "Monthly Precipitation By Month, with Std Error and Mean Monthly Precipitation",
subtitle = toTitleCase(tolower(as.character(subset(noaa_stations, noaa_stations$STATION == "USC00045941")$NAME[1]))),
y = "Monthly Precipitation in Inches",
x = paste(min(noaa_station_wateryear$water_year),
"to",
max(noaa_station_wateryear$water_year))) +
geom_hline(aes(yintercept=mean(mean)),
linetype="dashed")+
theme_minimal()+
scale_y_continuous(limits = c(0, 1.75))+
theme(axis.text.x = element_text(angle = 30, hjust = 1))
p