Tallahassee Temperature Graphs

I hate the hot, humid summers in Tallahassee. One day, after a particularly torrid week, I decided to figure out exactly how long this weather would last.

So I downloaded average temperature data for Tallahassee from the National Weather Service and analyzed it. I expected the temperatures to roughly follow a sine curve over the year, so I planned to fit them with a sinusoidal regression. There were a couple of problems, however. First, look at the graph:

[Figure: line graph of the average temperature data in Tallahassee]

As you can see, the temperatures are asymmetrical: it takes much longer to heat up in the spring than to cool down in the fall. So I decided to split the data into two sections, one for the spring (warming) and one for the fall (cooling), and then do a separate sinusoidal regression on each.
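
Here is a minimal sketch of one way to make that split, assuming the data is one cycle of evenly spaced daily averages in a NumPy array. The function name `split_seasons` and the choice of the coldest and hottest days as the boundaries are my assumptions, not necessarily how the original data was divided. Whichever segment wraps past the end of the year gets its wrapped day numbers shifted forward by the record length, so each section has a continuous x axis for fitting.

```python
import numpy as np


def split_seasons(temps):
    """Split one cycle of evenly spaced temperatures into a warming segment
    (coldest sample to hottest) and a cooling segment (hottest to coldest).

    Returns (rise_x, rise_temps, fall_x, fall_temps). Indices that wrap past
    the end of the record get len(temps) added, so each segment's x values
    increase continuously.
    """
    temps = np.asarray(temps, dtype=float)
    n = len(temps)
    idx = np.arange(n)
    i_min = int(np.argmin(temps))  # coldest sample
    i_max = int(np.argmax(temps))  # hottest sample
    if i_min < i_max:
        rise = idx[i_min:i_max + 1]
        fall = np.concatenate([idx[i_max:], idx[:i_min + 1]])
    else:
        rise = np.concatenate([idx[i_min:], idx[:i_max + 1]])
        fall = idx[i_max:i_min + 1]
    # Indices smaller than a segment's first index belong to the next cycle.
    rise_x = rise + n * (rise < rise[0])
    fall_x = fall + n * (fall < fall[0])
    return rise_x, temps[rise], fall_x, temps[fall]
```

Splitting at the extremes means the two segments share the hottest day, and each one covers only the rising or the falling half of a cycle.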

This brings me to my second problem: how on earth do you do a sinusoidal regression? Most spreadsheet programs don't support it. SciPy's optimization module has a nonlinear curve fitter that can, in theory, do a regression against almost any kind of equation, including a sine wave. However, for it to work, you need a reasonable first guess for the parameters.
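
A minimal sketch of such a fit, assuming `scipy.optimize.curve_fit` as the fitter and a model of the form y = a*sin(b*x + c) + d, the same form as the final equations in this post. The helper names and starting values (half the data range for the amplitude, the mean for the offset, several evenly spaced phase starts) are illustrative assumptions; the frequency guess `b_guess` still has to be supplied.

```python
import numpy as np
from scipy.optimize import curve_fit


def sine_model(x, a, b, c, d):
    """y = a * sin(b*x + c) + d"""
    return a * np.sin(b * x + c) + d


def fit_sine(x, y, b_guess, n_phase_starts=8):
    """Fit sine_model to (x, y) by nonlinear least squares.

    curve_fit only finds a local minimum near its starting point, so this
    tries several starting phases and keeps the fit with the smallest
    sum of squared residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a0 = (y.max() - y.min()) / 2  # amplitude guess: half the range
    d0 = y.mean()                 # offset guess: the mean value
    best_params, best_sse = None, np.inf
    for c0 in np.linspace(0, 2 * np.pi, n_phase_starts, endpoint=False):
        try:
            params, _ = curve_fit(sine_model, x, y, p0=[a0, b_guess, c0, d0])
        except RuntimeError:
            continue  # this start did not converge; try the next phase
        sse = np.sum((sine_model(x, *params) - y) ** 2)
        if sse < best_sse:
            best_params, best_sse = params, sse
    if best_params is None:
        raise RuntimeError("no starting phase converged")
    return best_params
```

Trying several phases is cheap insurance: amplitude and offset are easy to eyeball from the data, but a bad phase or frequency guess can leave the fitter stuck in the wrong local minimum.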

The big problem was the frequency. Coming up with a guess for it really had me stumped. Fortunately, I found a post on Stack Exchange about sinusoidal regressions that recommended using a Fourier transform to guess the frequency (SciPy also includes a fast Fourier transform).
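
A sketch of that Fourier-transform trick, assuming evenly spaced samples: subtract the mean, take the real FFT with `scipy.fft.rfft`, find the strongest nonzero bin, and convert its frequency from `scipy.fft.rfftfreq` (cycles per unit of x) into an angular frequency by multiplying by 2π. The function name `guess_angular_frequency` is hypothetical.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq


def guess_angular_frequency(y, dx=1.0):
    """Estimate the dominant angular frequency (radians per unit of x)
    of samples y taken at even spacing dx."""
    y = np.asarray(y, dtype=float)
    # Subtract the mean so the zero-frequency (constant) term doesn't dominate.
    spectrum = np.abs(rfft(y - y.mean()))
    freqs = rfftfreq(len(y), d=dx)  # frequencies in cycles per unit of x
    k = 1 + int(np.argmax(spectrum[1:]))  # strongest bin, skipping index 0
    return 2 * np.pi * freqs[k]  # cycles -> radians
```

The FFT bins are spaced 1/(N*dx) cycles apart, so a short record gives only a coarse estimate. That is fine here, because the result only needs to be close enough for the curve fitter to refine.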

I extended the data and then loaded it into a Python script. The two equations are:

      (spring) y = 16.16 * sin(0.01466x + 4.747) + 66.92
      (fall)   y = 15.97 * sin(0.01981x + 3.521) + 66.79
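
The fitted coefficients can be read off directly. For y = a*sin(b*x + c) + d, the curve swings between d - a and d + a, one full cycle takes 2π/b units of x, and the peak falls where b*x + c = π/2 + 2πk. This small script computes those quantities for both equations; the helper names are mine, and since the post doesn't state what x counts (presumably days), the results are in whatever units x uses.

```python
import math

# Coefficients (a, b, c, d) of y = a*sin(b*x + c) + d from the two fits.
MODELS = {
    "spring": (16.16, 0.01466, 4.747, 66.92),
    "fall": (15.97, 0.01981, 3.521, 66.79),
}


def period(b):
    """Length of one full cycle, in the same units as x."""
    return 2 * math.pi / b


def first_peak(b, c):
    """Smallest x >= 0 where b*x + c = pi/2 + 2*pi*k, i.e. sin(...) = 1."""
    x = (math.pi / 2 - c) / b
    # Python's % with a positive modulus always returns a non-negative value.
    return x % period(b)


for name, (a, b, c, d) in MODELS.items():
    print(f"{name}: period {period(b):.1f}, peak at x = {first_peak(b, c):.1f}, "
          f"max {d + a:.2f}, min {d - a:.2f}")
```

If x is in days, the spring curve's period is about 429 days and the fall curve's about 317. Since each curve only covers its rising or falling half, that works out to roughly 214 days of warming and 159 days of cooling, which matches the asymmetry in the graph.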
      

Here is the graph:

[Figure: line graph of the temperature data with the model prediction overlaid; the model closely fits the data]