# Mark Gilbert's Blog

## 100% chance of rain (at some point in the future)

For the longest time, I’ve maintained that meteorologists just throw darts to figure out what tomorrow’s weather is going to be.  It’s amusing, but it’s an unfair characterization.  Meteorologists can tell you a tremendous amount about how weather works, but it’s such a complicated subject that there is a very steep drop-off in the accuracy of predictions even a few hours into the future.  Case in point – why do we still have tornado sirens?  Those are only good to tell you when one’s actually been sighted.  Why can’t we predict even 15 minutes from now when one will form?

At any rate, on the few occasions when I’ve actually paid attention to the weather forecasts in the week or so leading up to some outdoor event, it seems like the "it’s gonna precipitate / it’s gonna shine" call switches at least twice before the day arrives.  I write software for a living, so I’m all for updating estimates when it’s clear that they are no longer valid.  However, when a weather forecast changes daily – or at least it seems to – what’s the value in anything other than a 4 HOUR, not day, forecast?  Or, to put it another way, how accurate ARE 1-day, 2-day, 1-week, etc. forecasts?

To find out, I propose an application that pulls together forecasts and actuals from a variety of sources, and then does a statistical analysis to see how well each source actually does at forecasting the weather.  Here are some questions I think an application like this could answer:

1. How accurate are source X’s 1-day, 2-day, 3-day, 1-week (etc.) forecasts?
2. How closely does X get to the actual high and low temperature for the day?
3. How well do they predict when the sun will shine versus when it will rain/snow?
4. How does source X compare to other sources?
5. Are source X’s 2-day forecasts more accurate than source Y’s?
6. Are source Y’s 1-week forecasts more accurate than source X’s?
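
One way to put a number on questions like these is the mean absolute error of the forecast high, grouped by source and by how many days ahead the forecast was made. The sketch below is only an illustration: the record layout, the function name `mean_absolute_error_by_lead`, and the sample values are all made up for the example, and a real analysis would probably want more than one error measure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: (source, lead_days, forecast_high, actual_high).
# The numbers are invented purely to illustrate the calculation.
records = [
    ("X", 1, 70.0, 72.0),
    ("X", 1, 65.0, 64.0),
    ("X", 7, 60.0, 70.0),
    ("Y", 1, 71.0, 72.0),
    ("Y", 7, 66.0, 70.0),
]


def mean_absolute_error_by_lead(records):
    """Return {(source, lead_days): mean absolute error of the forecast high}."""
    errors = defaultdict(list)
    for source, lead_days, forecast_high, actual_high in records:
        errors[(source, lead_days)].append(abs(forecast_high - actual_high))
    return {key: mean(values) for key, values in errors.items()}


mae = mean_absolute_error_by_lead(records)
for (source, lead_days), error in sorted(mae.items()):
    print(f"source {source}, {lead_days}-day forecasts: off by {error:.1f} degrees on average")
```

Comparing `mae[("X", 7)]` against `mae[("Y", 7)]` then answers the "whose 1-week forecasts are better" question for the high temperature; the same grouping would work for lows or for rain/no-rain hits and misses.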

The forecasts and current conditions are available from several places now in the form of RSS and other feeds. The bulk of the data-gathering piece of this application would be building the interfaces to those sources, and merging the data together into a common storage structure.
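
As a rough idea of what that common storage structure might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the `ForecastRecord` fields, the `from_celsius_feed` helper, and the shape of the feed item (keys like `issued`, `for`, `high_c`) do not come from any real feed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ForecastRecord:
    """One forecast from one source, normalized to common units (hypothetical layout)."""
    source: str
    issued_on: date          # day the forecast was published
    target_date: date        # day the forecast is about
    high_f: float
    low_f: float
    precip_probability: Optional[float]  # 0.0 to 1.0, or None if the feed omits it

    @property
    def lead_days(self) -> int:
        """How many days ahead the forecast was made."""
        return (self.target_date - self.issued_on).days


def from_celsius_feed(source: str, item: dict) -> ForecastRecord:
    """Normalize one item from a made-up feed that reports temperatures in Celsius."""
    return ForecastRecord(
        source=source,
        issued_on=date.fromisoformat(item["issued"]),
        target_date=date.fromisoformat(item["for"]),
        high_f=item["high_c"] * 9 / 5 + 32,
        low_f=item["low_c"] * 9 / 5 + 32,
        precip_probability=item.get("pop"),
    )


record = from_celsius_feed(
    "X",
    {"issued": "2024-05-01", "for": "2024-05-04", "high_c": 20, "low_c": 10, "pop": 0.3},
)
print(record.lead_days, record.high_f, record.low_f)
```

Each real source would get its own small adapter like `from_celsius_feed`, so that unit conversions and date parsing stay at the edges and the analysis only ever sees one record shape.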

This kind of data gathering could lead to habits such as "if you have to look more than 3 days out, look at Source Z’s forecasts – they are the most accurate.  Once you get to the day before, however, then Source W’s forecasts are actually more accurate than Z’s."  Then, you could use the data and analysis from this first program to feed into a second that always displays the most accurate information at any given time.
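
The "which source should I trust at this range" lookup could be as simple as picking the lowest average error per lead time. This is a sketch under the assumption that the first program produces a dictionary of average errors keyed by `(source, lead_days)`; the function name `best_source_by_lead` and the numbers are invented for the example.

```python
def best_source_by_lead(average_errors):
    """Given {(source, lead_days): average error}, return {lead_days: best source}."""
    best = {}
    for (source, lead_days), error in average_errors.items():
        # Keep the source with the smallest error seen so far for this lead time.
        if lead_days not in best or error < best[lead_days][1]:
            best[lead_days] = (source, error)
    return {lead_days: source for lead_days, (source, _) in best.items()}


# Hypothetical average errors, in degrees, for two sources at two lead times.
average_errors = {
    ("Z", 3): 2.0,
    ("W", 3): 3.5,
    ("Z", 1): 1.8,
    ("W", 1): 1.2,
}

best = best_source_by_lead(average_errors)
print(best)
```

With these made-up numbers the lookup says to use Z three days out and W the day before, which is exactly the kind of habit described above.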

Now, to spice things up a bit, let’s introduce a forecast of our own, and see how it stacks up to the professional ones.  These forecasts would be based on some relatively arbitrary formula, taking into account things like month of year, the previous day’s weather, and some random number used to vary the temperatures and probability of precipitation.  In other words – would throwing darts be just as accurate after all?
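
Here is one possible shape for the dart-throwing forecaster, as a sketch only. The formula is deliberately arbitrary, and every constant in it (the monthly averages, the size of the random wobble, the baseline rain chances) is a made-up assumption rather than real climate data.

```python
import random

# Hypothetical average high per month in degrees Fahrenheit (invented values).
MONTHLY_AVG_HIGH_F = {
    1: 30, 2: 33, 3: 44, 4: 57, 5: 69, 6: 78,
    7: 82, 8: 80, 9: 73, 10: 60, 11: 47, 12: 35,
}


def dart_forecast(month, prev_high_f, prev_precip, rng):
    """Make a 'forecast' from the month, yesterday's weather, and a random wobble."""
    # Split the difference between the seasonal norm and yesterday's high.
    base = (MONTHLY_AVG_HIGH_F[month] + prev_high_f) / 2
    high = base + rng.uniform(-5, 5)
    low = high - rng.uniform(10, 20)
    # Assume rain is a bit more likely if it rained yesterday, then wobble it.
    chance = (0.5 if prev_precip else 0.2) + rng.uniform(-0.2, 0.2)
    chance = min(1.0, max(0.0, chance))
    return {
        "high_f": round(high, 1),
        "low_f": round(low, 1),
        "precip_probability": round(chance, 2),
    }


# Seeding the generator makes a given "throw" repeatable, which helps when
# comparing the darts against the professional forecasts later.
forecast = dart_forecast(month=6, prev_high_f=75.0, prev_precip=True, rng=random.Random(42))
print(forecast)
```

Records produced this way could be stored alongside the professional forecasts under a source name of their own, so the same accuracy analysis applies to the darts without any special handling.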