by Vishnu I. ’22
This spring semester, under the guidance of Lakeside science and engineering teacher Dr. Town, I embarked on an independent study with my brother Varun in the hopes of learning more about machine learning and its applications in geoscience. Our goal was to use machine learning and data from Mount Baker to find patterns in various weather variables and ultimately gauge the effects of climate change through a local lens. To accomplish this, we had to go through a three-step process: learn about machine learning, process the data from Mount Baker, and combine the two to find patterns in weather variables. In this blog, I will cover the first two steps of this journey; after reading this, I encourage you to look at Varun’s blog to find out about the patterns in weather variables we discovered.
I began my work this semester by getting acquainted with machine learning and its various forms. While it is commonly perceived (including by me prior to this semester) that machine learning is a “silver bullet” that solves any problem given enough data, I’ve learned that this belief is highly misleading. In the data science world, machine learning is simply a way of building and understanding models of data. The “learning” enters the picture when we give these models tunable parameters that can be adapted to observed data. Once the models have been fit to previously seen data, they can be used to predict and understand aspects of new data. However, it is still our job to verify the model’s conclusions against other, previously understood models of the world. At the fundamental level, machine learning can be categorized into two main types: supervised learning and unsupervised learning. Supervised learning involves modeling the relationship between measured features of the data and a label associated with the data, while unsupervised learning involves modeling the structure of the features themselves, without reference to any label.
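To make the distinction concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration and are not data from this study, and the two helper functions (`fit_line` and `two_means`) are hypothetical names chosen for this example. The first function is supervised: it tunes a slope and intercept so that the features predict known labels. The second is unsupervised: it is given only the values and splits them into two groups on its own.

```python
def fit_line(xs, ys):
    """Supervised: fit y = slope * x + intercept to labeled pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled numbers into two clusters (1D k-means, k = 2)."""
    low, high = min(values), max(values)  # initial guesses for the two centers
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:  # all values identical: nothing to split
            break
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


if __name__ == "__main__":
    # Made-up labeled data: the "tunable parameters" are the slope and intercept.
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print("fitted line:", slope, intercept)

    # Made-up unlabeled data: the model finds two group centers with no labels.
    print("cluster centers:", two_means([1.0, 1.2, 0.8, 9.0, 9.4, 8.6]))
```

In both cases the model only summarizes the data it was given, which is why its conclusions still need to be checked against what we already understand about the world.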
Once we were comfortable with machine learning on a conceptual level, we transitioned to gathering and processing weather data at Mount Baker. To accomplish this, we turned to the Weather Research and Forecasting (WRF) model. The data from this model were provided by the Puget Sound Clean Air Agency at 1.3 km resolution over Washington and Oregon. This model produces simulations based on actual atmospheric conditions across the entirety of the western U.S. daily, storing the values of many weather variables in files. Although each file represents only one hour of the day, the model stores so much information that each file is 1 GB. Therefore, our first obstacle was to write a program to efficiently download the data from three months: January 2020, March 2020, and February 2021. Because the files were so large, we needed our program to resume after an error; for example, if it had downloaded five files and then abruptly stopped, it needed to pick up at the sixth file rather than start over and waste time. After we overcame this barrier, we sent our program to our data contact, who downloaded the data from the three months and gave it back to us. Now that we had the data, we could process it. This is where I hit my second roadblock: because the model was so complex, the data stored in each file were also very complex. I had to spend hours guessing and checking to find the point in the array corresponding to the coordinates of Mount Baker, and then pull out the variables we wanted. In fact, in going through the grueling process of data munging, I have realized that collecting, processing, and optimizing the data can be (and in this case was) the hardest part of data science. While machine learning can seem daunting, getting accurate and reliable data is paramount to the end goal of finding patterns; this process ultimately took me four weeks.
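The resume-after-error behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual program from the study: the function name `download_all`, the `fetch` callback, and the `.part` suffix are all hypothetical. The idea is to write each file under a temporary name and rename it only once it is complete, so a file with its final name is always a finished download and can be skipped on the next run.

```python
import os


def download_all(names, dest_dir, fetch):
    """Download every file in `names` into `dest_dir`, skipping finished ones.

    `fetch(name, out_file)` is a hypothetical callback that writes the
    contents of one remote file into the open binary file `out_file`.
    Returns the list of names downloaded during this call.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for name in names:
        final_path = os.path.join(dest_dir, name)
        if os.path.exists(final_path):
            continue  # finished on a previous run, so do not download again
        part_path = final_path + ".part"
        # Opening with "wb" discards any partial data left by an earlier crash.
        with open(part_path, "wb") as out_file:
            fetch(name, out_file)
        # Rename only after the whole file has been written successfully.
        os.replace(part_path, final_path)
        downloaded.append(name)
    return downloaded
```

If the program stops partway through the sixth file, the first five keep their final names and are skipped when the program is run again; only the interrupted file is fetched from scratch. For files of about 1 GB, a real `fetch` would stream the data in chunks, for example by passing the response from `urllib.request.urlopen` to `shutil.copyfileobj`, rather than holding a whole file in memory.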
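The search for the array position that corresponds to Mount Baker can also be automated rather than done by guessing and checking. The sketch below is one possible approach, not the method used in the study. It assumes the file provides a 2D array of latitudes and a matching 2D array of longitudes, one value per grid cell (in WRF output these are typically the `XLAT` and `XLONG` variables), and it picks the cell closest to the target by great-circle distance. The tiny 3 x 3 grid and the summit coordinates (roughly 48.777°N, 121.813°W) are illustrative values.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_grid_index(lats, lons, target_lat, target_lon):
    """Return ((row, col), distance_km) for the grid cell nearest the target."""
    best_index, best_dist = None, math.inf
    for i, (lat_row, lon_row) in enumerate(zip(lats, lons)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            dist = haversine_km(lat, lon, target_lat, target_lon)
            if dist < best_dist:
                best_index, best_dist = (i, j), dist
    return best_index, best_dist


if __name__ == "__main__":
    # Hypothetical 3 x 3 grid around the mountain, not real model coordinates.
    lats = [[48.70] * 3, [48.78] * 3, [48.86] * 3]
    lons = [[-121.9, -121.8, -121.7]] * 3
    index, dist = nearest_grid_index(lats, lons, 48.777, -121.813)
    print("nearest cell:", index, "distance (km):", round(dist, 2))
```

Once the row and column are known, the same pair of indices can be used to pull each weather variable for that cell out of every hourly file.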
To me, science is a journey of constantly solving problems; when tackling a complex issue, you will inevitably encounter barriers along the way. When you overcome these barriers, a new set will emerge, but these obstacles will be deeper and more worthwhile. Throughout this independent study, I have done just that. From the barriers I overcame this semester to the challenges that remain, I know I have unlocked the portal of the unknown and propelled science forward.