(As a side note, this is the problem with not having enough time. I caught up on that story very early and wrote most of this post, but I was offline. By now, it’s done the rounds and more. Still, here’s another take.)
Regular readers (yes, you two, I’m talking to you) will remember that I wrote an article (What happened to the scientific process?) expressing some dismay at what can apparently be published these days, even when it breaks the simplest rules of data analysis, the kind you learn in a two-day course if you work in a lowish operational job.
OK, maybe I should have phrased that as “what can apparently be published when it happens to suit what the powerful want to read”, but that would have been a stronger statement, and all I was concerned with was the (very) poor quality of the analysis.
Still, it seems that this was small fry. Those who follow macroeconomics at all (or who follow politics beyond poll-tracking) will probably have heard of Carmen Reinhart and Kenneth Rogoff, or at any rate of one of their results, even if the authors were not named. Despite being highly controversial in academic circles, that result has been treated as consensus by the mainstream media, aka Very Serious People, who say things like “90% of GDP, the level of debt that economists recognise as strongly hampering economic prospects”.
Actually, since I’ve also been posting about chess, some of you may have heard of them without following economics or politics at all: Kenneth Rogoff is a chess grandmaster, and articles about him and his research have appeared in the chess media.
Anyway. It turns out that their most discussed result (by no means their only one, and they seem to have done some properly conducted and interesting research as well) rests on something even worse than what I had been discussing. Before I briefly discuss what they did (others on the web have done it very well, and it’s only fair to link to their work there, there, or there for the original refutation paper), let’s state their “conclusion”: when the ratio of public debt to GDP goes over 90%, growth is strongly hampered and in fact probably turns negative.
The study was embarrassingly poor in any case: the 90% threshold was arbitrary, the data samples were extremely small, and everything was done through a simple correlation, even though causation is well established to run the other way too (slow growth pushes debt up) and they did not even try to correct for it. But it gets worse when you simply try to replicate their results: if you’re honest and careful, you can’t.
It appears that there are no fewer than three howlers (one of them possibly a simple clicking mistake; I do mean possibly, since intent is hard to prove but even harder to disprove), none of which was mentioned in their description of the study.
First, they eliminated quite a few data points, for reasons we can only guess, since they don’t even mention having done it. ALL of the points they took out would have gone against their claim. In fact, the 6 points they took out for New Zealand would, had they been included, have added 1.5% per year to the calculated growth worldwide, annihilating their claims. What they did instead was eliminate all the years that had average growth despite high debt, and keep the final year, which had plummeting growth. This is beyond ineptitude. If a country brings its debt back down through one heroic effort at cutting public spending (which will tank the economy), you’d expect hugely negative growth until the debt falls back below the 90% threshold, at which point the country stops appearing in the data. But then it’s the silly massive spending cut that is the cause, not the initial debt, as the previous years of high debt and average growth show.
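To make that first point concrete, here is a minimal Python sketch. The numbers are invented for illustration and are NOT the actual dataset: a run of high-debt years with roughly average growth, followed by one crash year. Averaging every year gives a modest positive figure; quietly keeping only the crash year turns it into a disaster.

```python
from statistics import mean

# Invented growth rates (% per year) for one country during a run of
# high-debt years. Illustrative only, NOT taken from the actual dataset.
growth_by_year = [2.0, 2.5, 1.8, 2.2, 2.1, -7.0]  # last year: sharp crash

# Honest version: every high-debt year counts.
honest = mean(growth_by_year)

# Selective version: drop the first five years and keep only the crash year.
selective = mean(growth_by_year[-1:])

print(f"All high-debt years: {honest:+.2f}% per year")
print(f"Crash year only:     {selective:+.2f}% per year")
```

Same country, same debt, and the "effect of high debt" swings from mildly positive to catastrophic purely through which rows were kept.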
Second, they then used one point per “episode” of high debt, disregarding the number of years (or indeed the size of the country affected, a major factor in estimating the effectiveness of Keynesian stimulus). So, having thrown out all the years of good growth but high debt for several countries, they kept just the one year of tanking economy and called it an episode. For instance, New Zealand was deemed to have had an episode of high debt that resulted in -7.4% growth.
And that single year carried the same weight as the 25 straight years of healthy growth in the UK despite much higher debt. The reasoning behind that extraordinary choice is… well, about to be published, I should think.
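The difference between "one point per episode" and counting each year can be sketched in a few lines of Python. The one-year -7.4% figure is the New Zealand number quoted above; the 2.5% per year for the 25-year episode is an ASSUMED placeholder, not the actual UK figures.

```python
from statistics import mean

# Each episode: (number of years, average growth in % per year).
# -7.4 over 1 year is the figure quoted above; 2.5 over 25 years is an
# ASSUMED placeholder, not the real UK data.
episodes = [
    (1, -7.4),
    (25, 2.5),
]

# "One point per episode": each episode counts the same, however long it lasted.
per_episode = mean(g for _, g in episodes)

# Year-weighted: every year of high debt counts once.
total_years = sum(n for n, _ in episodes)
per_year = sum(n * g for n, g in episodes) / total_years

print(f"Episode-weighted mean: {per_episode:+.2f}%")
print(f"Year-weighted mean:    {per_year:+.2f}%")
```

With these placeholder numbers, equal weighting gives a negative average while weighting by years gives a clearly positive one: one bad year outvotes 25 good ones.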
Third, having obtained a nice set of by now completely unrepresentative numbers, they… forgot to include the last 5 countries in their average. Had they included them, the figure would have come out at +0.2% rather than -0.1%. And of course, the sign change made a big difference to the overall impact.
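A hypothetical sketch of how an averaging range that stops five rows short can flip the sign of the result. The ten per-country figures are invented and are NOT the real data; only the mechanism (silently leaving the last five rows out of the average) mirrors what happened.

```python
from statistics import mean

# Invented per-country averages (% per year), one per spreadsheet row.
# Illustrative numbers only, NOT the actual dataset.
rows = [-1.0, 0.5, -0.8, 0.3, -0.6, 1.2, 0.9, 0.4, 1.1, 0.7]

intended = mean(rows)        # the formula covers every row
truncated = mean(rows[:-5])  # the range stops five rows short

print(f"All rows:       {intended:+.2f}%")
print(f"Last 5 missing: {truncated:+.2f}%")
```

Nothing in the truncated result looks obviously broken, which is exactly why this kind of mistake survives unless someone re-runs the calculation.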
Now, this time, I’m not going to say that we ran similar exercises in introductory classes and would not move on until everyone understood. No, it’s much worse. It’s not that a simple thing went unnoticed; it’s coming up with very creative ways of causing grievous bodily harm to your dataset. We would not even have thought to discuss selectively removing data, or averaging samples whose sizes differ by a factor of 25 (or 500 if you take the size of the country into account).
It’s just a whole new dimension. So next time you talk to your banker, insist that the statistics on your account leave out every day that is not a payday. Then give the one month in which you get your bonus the same weight as the other 11 put together. Then remove the less flattering years. Then state that the result is your typical daily income. I’m not sure you’ll get a much bigger mortgage, but you may become one of the most famous macroeconomists of your time.
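For anyone tempted to try this on their banker, here is the arithmetic as a small Python sketch, using an invented pay slip: 3,000 a month plus a 12,000 bonus in one month. Every figure is hypothetical.

```python
from statistics import mean

# Invented pay slip: 3,000 a month, plus a 12,000 bonus in the last month.
monthly_pay = [3000] * 11 + [3000 + 12000]

# Honest average monthly income.
honest_monthly = mean(monthly_pay)  # 4000

# "Equal weight": the bonus month counts as much as the other 11 put together.
bonus_month = monthly_pay[-1]
other_months = mean(monthly_pay[:-1])
inflated_monthly = (bonus_month + other_months) / 2  # 9000

# "Only count paydays": with one payday a month, the "typical daily income"
# becomes the whole inflated monthly figure.
typical_daily_claimed = inflated_monthly

# What you actually earn per calendar day over the year.
honest_daily = sum(monthly_pay) / 365
```

The "typical daily income" ends up dozens of times higher than what actually lands in the account each day, without a single number being made up, only dropped or reweighted.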