I hate the hot, humid summers in Tallahassee. One day, after a particularly torrid week, I decided to figure out exactly how long this weather would last.
So I downloaded average temperature data for Tallahassee from the National Weather Service and analyzed it. I expected the temperatures to roughly follow a sine curve, so I planned to use a sinusoidal regression. However, I ran into a couple of problems. First, look at the graph:
As you can see, the temperatures are asymmetrical: it takes much longer to heat up in the spring than it does to cool down in the fall. So I decided to split the data into two sections, one for the spring and one for the fall, and run a separate sinusoidal regression on each.
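Here is one way the split could be done. This is a minimal sketch, not the script from this post. It assumes the data is one year of daily average temperatures starting on January 1, and it splits at the hottest day so the first half is the warm-up and the second half is the cool-down. The function name `split_seasons` is hypothetical.

```python
import numpy as np


def split_seasons(temps):
    """Split one year of daily average temperatures at the hottest day.

    Assumes temps[0] is January 1, so the warming half comes first.
    Returns (warming, cooling), each a (days, temps) pair. The peak day
    is included in both halves so each fit sees the maximum.
    """
    temps = np.asarray(temps, dtype=float)
    days = np.arange(len(temps))
    peak = int(np.argmax(temps))  # index of the hottest day
    warming = (days[: peak + 1], temps[: peak + 1])
    cooling = (days[peak:], temps[peak:])
    return warming, cooling
```

Splitting at the coldest day as well would be more precise, since early January is still cooling, but a single split keeps the idea simple.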
This brings me to my second problem: how on earth do you do a sinusoidal regression? Most spreadsheet programs can't. SciPy, the Python scientific library, includes nonlinear least-squares fitting, which in theory can fit data to any kind of equation, including a sine wave. However, for it to work, you need a reasonable initial guess for the parameters.
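As an illustration, here is a sketch of a sinusoidal regression using `scipy.optimize.curve_fit`, which does nonlinear least squares. This is an assumption about how such a fit could be set up, not necessarily the approach in the original script. The model has the same form as the equations below, y = A * sin(w*x + phi) + C, and `p0` is where the initial guess goes. The names `sine_model` and `fit_sine` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def sine_model(x, amplitude, omega, phase, offset):
    # Same form as the fitted equations: y = A*sin(w*x + phi) + C
    return amplitude * np.sin(omega * x + phase) + offset


def fit_sine(x, y, guess):
    """Nonlinear least-squares sine fit.

    guess is an initial (A, w, phi, C); curve_fit refines it and can
    converge to a poor local minimum if the guess is far off.
    """
    params, _covariance = curve_fit(sine_model, x, y, p0=guess)
    return params
```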
The big problem was the frequency. Coming up with a guess for it really had me stumped. Fortunately, I found a post on Stack Exchange about sinusoidal regression that recommended using a Fourier transform to estimate the frequency (the fast Fourier transform is also in SciPy).
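A rough sketch of the Fourier-transform trick, assuming evenly spaced x values: subtract the mean, take the FFT with `scipy.fft.rfft`, and use the tallest non-zero-frequency peak as the frequency guess. The amplitude and offset guesses come from the standard deviation and mean. The phase guess of 0 is a simplification, and `guess_sine_params` is a hypothetical name; this is not taken from the Stack Exchange post itself.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq


def guess_sine_params(x, y):
    """Initial guess (A, w, phi, C) for y = A*sin(w*x + phi) + C.

    Assumes x is evenly spaced. The frequency comes from the tallest
    non-DC peak of the FFT magnitude spectrum.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    spacing = x[1] - x[0]
    offset = y.mean()
    spectrum = np.abs(rfft(y - offset))
    freqs = rfftfreq(len(y), d=spacing)  # cycles per unit of x
    peak = 1 + int(np.argmax(spectrum[1:]))  # skip the zero-frequency bin
    omega = 2 * np.pi * freqs[peak]  # convert to angular frequency
    amplitude = np.std(y) * np.sqrt(2)  # a pure sine has std = A / sqrt(2)
    return amplitude, omega, 0.0, offset
```

One caveat: the FFT's frequency resolution is one cycle per span of the data, so for a stretch covering less than a full cycle it can only give a coarse guess.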
I extended the data and then loaded it into this Python script. The two equations are:
Spring: y = 16.16 * sin(0.01466x + 4.747) + 66.92
Fall: y = 15.97 * sin(0.01981x + 3.521) + 66.79
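To get a feel for what the two equations say, here is a small sketch that evaluates them. The coefficients come straight from the post. Everything else is an assumption: the post does not state the units, so reading x as days and y as degrees Fahrenheit is a guess, and `FITS`, `predict`, and `period` are hypothetical names. The period of each curve is 2π divided by the coefficient on x, and each curve peaks at its offset plus its amplitude.

```python
import math

# Coefficients from the fitted equations: y = A*sin(w*x + phi) + C
FITS = {
    "spring": (16.16, 0.01466, 4.747, 66.92),
    "fall": (15.97, 0.01981, 3.521, 66.79),
}


def predict(season, x):
    """Evaluate the fitted curve for one season at x."""
    a, w, phi, c = FITS[season]
    return a * math.sin(w * x + phi) + c


def period(season):
    """Full cycle length implied by the angular frequency, in units of x."""
    _, w, _, _ = FITS[season]
    return 2 * math.pi / w


def peak_value(season):
    """Highest value the curve reaches: offset + amplitude."""
    a, _, _, c = FITS[season]
    return c + a
```

Working it out by hand, the spring curve has a period of about 428.6 and the fall curve about 317.2 (in days, if x is days), which is the asymmetry from the first graph showing up in the fitted frequencies. Both curves top out near 83 (83.08 for spring, 82.76 for fall).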
Here is the graph: