Finance’s “replication dilemma” is a well-kept secret


“It’s a huge issue,” Harvey says. “Step one in dealing with the replication crisis in finance is to accept that there is a crisis. And right now, many of my colleagues are not there yet.”

It may sound like a low-budget Blade Runner rip-off, but the scientific world has been gripped by a “replication crisis” over the past decade, with major ramifications for the findings of many landmark studies. Is investment suffering from the same ailment? Campbell Harvey, a Duke University finance professor, makes exactly that explosive claim. At least half of the 400 supposedly market-beating strategies identified in leading financial journals over the years, he believes, are bogus. Worse, he is concerned that many of his colleagues are unaware of the problem.

Highlights

  • Harvey is not some obscure outsider or performative contrarian attempting to gain attention through needless controversy. He is the former editor of the Journal of Finance, a former president of the American Finance Association, and an adviser to investment firms like Research Affiliates and Man Group.

  • He has written more than 150 papers on finance, several of which have won prestigious prizes. In fact, Harvey’s 1986 PhD thesis first showed how the shape of the bond market’s yield curve can predict recessions. In other words, this is not like a child saying the emperor has no clothes. Harvey’s escalating criticism of the rigour of financial academia since 2015 is more akin to the emperor regretfully proclaiming his own nudity.

To understand what the “replication crisis” is, how it came about and what it implies for finance, it helps to start at its broader genesis.

In 2005, Stanford medical professor John Ioannidis published a bombshell essay titled “Why Most Published Research Findings Are False”, which noted that the results of many medical research papers could not be replicated by other researchers. Subsequently, several other fields have turned a harsh eye on themselves and come to similar conclusions. The heart of the issue is a phenomenon that researchers call “p-hacking”.

In statistics, a p-value is the probability that a result at least as strong as the one observed would arise by pure chance if there were no real effect, as with a simple data oddity like the correlation of Nicolas Cage films to US swimming pool drownings. A sufficiently low p-value is conventionally taken to mean that a finding is “statistically significant”. P-values are used to judge whether a certain drug really does help, or whether cheap stocks do outperform over time.
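To make the idea concrete, here is a minimal sketch of how a p-value might be computed for a trading strategy's average return. The function names are illustrative, and the calculation assumes a large sample so that a normal approximation is reasonable; it is not a description of any particular study's method.

```python
import math
import statistics


def t_statistic(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    mean = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / math.sqrt(n)
    return mean / std_err


def two_sided_p_value(t_stat):
    """Two-sided p-value under a standard normal approximation.

    Assumption: the sample is large enough for the normal
    approximation to the t-distribution to be acceptable.
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


if __name__ == "__main__":
    # Show how the p-value shrinks as the t-statistic grows.
    for t in (1.0, 1.96, 3.0):
        print(f"t = {t:.2f} -> p = {two_sided_p_value(t):.4f}")
```

A t-statistic of about 1.96 corresponds to the conventional 5 per cent significance threshold under this approximation.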

P-hacking is when researchers overtly or subconsciously twist the data to find a superficially compelling but ultimately spurious relationship between variables. It can be done by cherry-picking which metrics to measure, or by subtly changing the time period used. A result that is narrowly statistically significant is not necessarily meaningful. A trading strategy that looks golden on paper might turn up nothing but lumps of coal when actually implemented.
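The mechanics can be illustrated with a small simulation. The sketch below generates 400 hypothetical strategies whose returns are pure noise, so none has any real edge, and counts how many still clear the 5 per cent significance bar. The function name, the 240-month sample and the normal approximation are all assumptions made for illustration. For comparison it also applies a Bonferroni correction, a standard (and conservative) adjustment for multiple testing that is not drawn from the article.

```python
import math
import random
import statistics


def count_false_discoveries(n_strategies=400, n_months=240, alpha=0.05, seed=0):
    """Test many pure-noise strategies and count the 'significant' ones.

    Returns (naive, bonferroni): the number passing the raw alpha
    threshold, and the number passing alpha / n_strategies.
    """
    rng = random.Random(seed)
    naive = 0
    bonferroni = 0
    for _ in range(n_strategies):
        # Monthly returns with a true mean of zero: no real edge exists.
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
        std_err = statistics.stdev(returns) / math.sqrt(n_months)
        t_stat = statistics.fmean(returns) / std_err
        # Two-sided p-value, normal approximation (assumes a large sample).
        p_value = math.erfc(abs(t_stat) / math.sqrt(2))
        if p_value < alpha:
            naive += 1
        if p_value < alpha / n_strategies:
            bonferroni += 1
    return naive, bonferroni


if __name__ == "__main__":
    naive, corrected = count_false_discoveries()
    print(f"'Significant' at 5% with no correction: {naive} of 400")
    print(f"Still significant after Bonferroni:     {corrected} of 400")
```

With a 5 per cent threshold, roughly one in twenty worthless strategies would be expected to look significant by luck alone, which is why testing many candidates and reporting only the winners can produce impressive-looking but spurious results.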

Harvey attributes the scourge of p-hacking to incentives in academia. Getting a paper with a sensational finding published in a prestigious journal can earn an ambitious young professor the ultimate prize: tenure. Wasting months of work on a theory that does not hold up to scrutiny would frustrate anyone. It is therefore tempting to torture the data until it yields something interesting, even if other researchers are later unable to duplicate the results.

Obviously, the stakes of the replication crisis are much higher in medicine, where lives can be in play. But it is not something that remains confined to the ivory towers of business schools, as investment groups often smell an opportunity to sell products based on apparently market-beating factors, Harvey argues. “It filters into the real world,” he says. “It definitely makes it into people’s portfolios.”

AQR, a prominent quant investment group, is also sceptical that there are hundreds of durable and successful factors that can help investors beat markets, but argues that the “replication crisis” brouhaha is overdone. Earlier this year it published a paper concluding that not only could the majority of the studies it examined be replicated, but they also still worked “out of sample” (in actual live trading) and were further corroborated by international data.