
by Nate C. '22

People around the world crave information about the weather. Many wonder how humans, with all our ingenuity, can climb the highest peaks, scout the ocean abyss, and land on the literal moon, yet cannot reliably predict the weather more than a few hours in advance. Is it possible to predict future meteorological data from prior trends? My project for the 2020 Lakeside Summer Research Institute (LSRI) was to analyze weather data sets and identify patterns and trends that could serve as a model for weather around the Lakeside area and support forecasts for the Lakeside maintenance department.

My biggest problem came before any analysis started: the data needed to be thoroughly cleaned and corrected before use. Because clean data is integral to data analysis, it was crucial that the time series in each data set be as accurate as possible. In other words, before I could start comparing the data sets, I needed to fix each data point to its correct timestamp and know exactly when each point occurred relative to the others. This giant task took up most of the first two weeks of my work and produced most of the project's learning moments, frustrations, and issues to overcome.

My work involved a WS1400 IPObserver mounted on the south side of Allen-Gates Hall at Lakeside. This piece of equipment detects and records numerous weather-related parameters approximately every five minutes. One such parameter, outdoor temperature, posed some difficulty during the process and is the best example of the data cleaning I did at LSRI.

Upon receiving my first instructions and data spreadsheets, I realized that this data cleaning task wasn’t going to be as easy as I had expected. Looking at my 2019 IPObserver data, it was clear that multiple things were wrong. At times, the temperature data would jump an abnormal amount in a five-minute interval; at other times the data would seem to flatline or go blank for several hours. My first goal was to write a computer program to flag when atypical data behavior started occurring, and to gradually filter data points out until the data started to match its timestamps. Doing so eventually got the data to a decent place for analysis, so I pivoted soon after and started my second, larger goal of comparing Observer data with data from Seattle-Tacoma International Airport.
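The kind of flagging described above can be sketched in a few lines of Python. This is only an illustration, not the program I actually wrote: the function name `flag_readings` and every threshold below are hypothetical, chosen just to show the three checks (abnormal jumps, flatlines, and blank or missing stretches).

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; the values used in the real project are not stated.
MAX_JUMP_C = 3.0                  # largest plausible change between readings
MAX_GAP = timedelta(minutes=15)   # anything longer between readings is a gap
FLATLINE_RUN = 12                 # identical readings in a row (about an hour)

def flag_readings(readings):
    """Flag suspicious points in a time-sorted list of (timestamp, temp_c).

    temp_c may be None for a blank reading. Returns (timestamp, reason) pairs.
    """
    flags = []
    run_length = 1
    prev_time, prev_temp = None, None
    for time, temp in readings:
        if temp is None:
            flags.append((time, "blank"))
            continue  # keep comparing against the last valid reading
        if prev_time is not None:
            if time - prev_time > MAX_GAP:
                flags.append((time, "gap"))
            elif abs(temp - prev_temp) > MAX_JUMP_C:
                flags.append((time, "jump"))
            # Count how long the temperature has stayed exactly the same.
            run_length = run_length + 1 if temp == prev_temp else 1
            if run_length == FLATLINE_RUN:
                flags.append((time, "flatline"))
        prev_time, prev_temp = time, temp
    return flags
```

Flagged points would still need a human look before being filtered out, since a real cold front can also produce a fast temperature change.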

My first set of cleaned data seemed right when I plotted it alongside a similar dataset from Seattle-Tacoma International Airport (Fig. 1). It looks almost right. Almost. The peaks are off by what looks like a week or so around days 60 to 210, and then seem to align again toward the end of the year. This indicated to me that days were over-counted earlier in the year and under-counted after the middle of the year, before the two series synced again toward the end.
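I judged the offset by eye from the graph, but a timing offset like this can also be estimated numerically. The sketch below is an assumed alternative, not what I did in the project: it slides one daily series against the other and keeps the shift with the highest Pearson correlation. The names `pearson` and `best_lag` and the 14-day search window are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_lag(station, reference, max_lag=14):
    """Return (lag, correlation) for the best-matching shift.

    A positive lag means features in `station` show up that many
    samples later than the same features in `reference`.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = station[lag:], reference[:len(reference) - lag]
        else:
            a, b = station[:lag], reference[-lag:]
        n = min(len(a), len(b))
        r = pearson(a[:n], b[:n])
        if r > best[1]:
            best = (lag, r)
    return best
```

Because the offset in Fig. 1 changes over the year, a check like this would make more sense on a window of days (for example, only days 60 to 210) than on the whole year at once.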

Following a second round of value filtering, I converted both datasets to daily averages for smoother, less busy graphs, and to make the load easier on my computer. This second incarnation (Fig. 2) looked a lot better, and to me looked like quality presentation material, until Dr. Town pointed out that the original lag at the beginning was still faintly visible. As he noted, this would be something that the people at the UW Glaciology Lunch would notice. I also thought it made the data a bit less convincing. Though I knew the data was not going to be perfect, after one last filtering session and again compiling everything to consistent daily average intervals, the spreadsheets and code produced the third graph (Fig. 3), which looked good enough for my final UW presentation. This graph shows an almost constant 1-2 °C difference throughout the year, and a clear correlation between outdoor temperature at SeaTac and at Allen-Gates.
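As a rough illustration of the daily-averaging step, here is a minimal Python sketch. It assumes each dataset is a list of (timestamp, temperature in °C) pairs with blanks stored as `None`; the helper names `daily_means` and `mean_offset` are hypothetical and are not taken from my actual spreadsheets or code.

```python
from collections import defaultdict
from datetime import datetime

def daily_means(readings):
    """Average (timestamp, temp_c) readings by calendar day, skipping blanks."""
    by_day = defaultdict(list)
    for time, temp in readings:
        if temp is not None:
            by_day[time.date()].append(temp)
    return {day: sum(temps) / len(temps) for day, temps in sorted(by_day.items())}

def mean_offset(station_daily, reference_daily):
    """Mean station-minus-reference difference over days present in both."""
    shared = sorted(station_daily.keys() & reference_daily.keys())
    diffs = [station_daily[day] - reference_daily[day] for day in shared]
    return sum(diffs) / len(diffs)

# Tiny made-up example, only to show the expected input shape.
example = [
    (datetime(2019, 1, 2, 0, 0), 4.0),
    (datetime(2019, 1, 2, 12, 0), 6.0),
    (datetime(2019, 1, 3, 0, 0), None),  # blank reading is ignored
]
example_daily = daily_means(example)
```

Averaging by day shrinks a year of five-minute readings to at most 365 points per dataset, which is why the graphs get smoother and the computer has less to do.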

Figure 1: Relationship Graph between Outdoor Temperature from Allen-Gates and SeaTac Airport vs. Time From 01/02/19 to 12/31/19

Figure 2: Updated Relationship Graph between Daily Outdoor Temperature from Allen-Gates and SeaTac Airport vs. Time From 01/02/19 to 12/31/19

Figure 3: Final Relationship Graph between Hourly Outdoor Temperature from Allen-Gates and SeaTac Airport vs. Time From 01/02/19 to 12/31/19