Wednesday, February 22, 2006

In medias res...


Sorry, middle of a forecasting round. I do so wish that statistical offices world-wide would just plain get their sh*t together and learn, finally, that they really, really, really need to simply do a couple of quality control checks. This is causing me, right now, no end of aggravation: a number of data updates have been, bluntly, less than worthless. Hard to forecast when the data is bent. It's being addressed, but it shouldn't really be happening in the first place. And no, I am NOT going to name names and show the data: got to get a forecast out.


For everyone involved: statistical quality control is actually very, very simple.

You need to have two databases: the one you've just updated and the archive copy without the update. Let's call the updated base T and the archive base T1.

Take the standard deviation of the absolute values of the year/year percentage changes of the updated time series and store it as SDT; do the same with the archive and call it SDT1. Do NOT seasonally adjust.
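In code, the statistic is just a few lines. Here is a minimal Python sketch (the function name, the numpy dependency, and the monthly default are my own illustrative choices, not anyone's production tool):

    import numpy as np

    def sd_abs_yoy(series, periods_per_year=12):
        # Standard deviation of the absolute year/year percentage changes,
        # computed on raw, NOT seasonally adjusted levels.
        # periods_per_year: 12 for monthly data, 4 for quarterly.
        x = np.asarray(series, dtype=float)
        yoy = (x[periods_per_year:] - x[:-periods_per_year]) / x[:-periods_per_year] * 100.0
        return np.std(np.abs(yoy))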

And this is the key: if |SDT1 - SDT| > 0.01*SDT, then you have a change in the database that needs to be reviewed (the absolute value matters: an error can push the standard deviation either up or down). The value 0.01 is arbitrary, but it's a darn good starting point, empirically speaking. It can be fine-tuned if you find you have too many false positives, but it should capture virtually all genuine errors. What it says is that any new data that causes more than a 1% change in the standard deviation of the absolute values of the year/year non-seasonally-adjusted changes is something that needs to be reviewed. This has been empirically tested by yours truly on more than 100,000 time series, mostly industrial statistics, which have a significant amount of white noise (i.e. month-to-month changes with no underlying trend), from more than 20 countries.
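The whole check then reads as follows (again a sketch, reusing sd_abs_yoy from above; the absolute difference makes the flag trip whether the update pushes the standard deviation up or down):

    def needs_review(updated, archived, threshold=0.01, periods_per_year=12):
        # Flag the series if the SD of the absolute Y/Y changes has moved
        # by more than `threshold` (1% by default) between T1 and T.
        sdt = sd_abs_yoy(updated, periods_per_year)    # SDT: updated base T
        sdt1 = sd_abs_yoy(archived, periods_per_year)  # SDT1: archive base T1
        return abs(sdt1 - sdt) > threshold * sdt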

It works. We use it here where I work to check no fewer than 1,000,000 time series and it is, for our poor data people, a godsend, since our database processing tools run the checks automagically and then plot them on the screen for review: it gets everything. Did the statistical office revise data from 10 years ago? It catches it. Did they change the data for the last 3 months? It catches it. Did they change the base year and forget to tell us? It catches it. Did they punch in a rate of change instead of a level? It catches it. Did they punch in a 0 instead of the proper value? It catches it. Did they make a mistake? It catches it. It has given us a reputation with several data suppliers of being extraordinarily on the ball: their own quality control didn't catch the errors.
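To make that concrete, here is a toy run (made-up numbers, not real data: a smooth trend with a stable seasonal wiggle, then an update with a plausible new value versus one where someone punched in a 0):

    i = np.arange(240)  # 20 years of monthly levels: trend plus seasonal wiggle
    archive = 100 + i + 3 * np.sin(i * np.pi / 6)
    good = np.append(archive, 340.0)     # a plausible new observation
    botched = np.append(archive, 0.0)    # a punched-in zero

    print(needs_review(good, archive))     # False: the SD barely moves
    print(needs_review(botched, archive))  # True: flagged for review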

What you capture with this are two things: first, a data point so completely out of the ordinary that it is more likely to be an error than a real datum; second, revisions that the statistical offices usually neglect to mention or, more likely, have mentioned, but buried somewhere so abstruse as to qualify for a mention in The Hitchhiker's Guide to the Galaxy.

What you test for in the above is not merely how meaningful the new data point(s) are, but also whether there is a significant change in the seasonal pattern: because the year/year change on unadjusted data nets out a stable seasonal pattern, any shift in that pattern shows up directly as extra variance in the statistic. One data point does not a pattern make (except for Democrats clutching desperately at straws or German politicians trying to whitewash how badly they have mauled their economy), but you'd be surprised how often statistical offices make mistakes.

Unless, of course, you do industrial forecasting and see it every day. :-(

But that is why posting is otherwise parsimonious; the forecast will be done shortly...

