Are all the companies that supposedly go from good to great really that great after all? Or are they merely part of a pack of pretty average performers? As these authors state, those supposedly great companies may be only putatively great: successful, but only in a middling sort of way. The authors studied the popular literature on “great” performing companies and came to their own, original conclusions.
It’s not uncommon for managers to gain insight into their challenges by drawing examples from the world of sports. Rock climbing, chess, golf, hockey, football and basketball, among others, have all served as sources of wisdom. Typically, the advice that is drawn from the mooted similarities is based on the players or coaches and draws on their experiences and insights into strategy, leadership, motivation, perseverance, teamwork, overcoming adversity, and so on.
Here’s yet another domain to add to the list: how to think about what really counts as exceptional performance, in business or sports. The source is the sport that is the granddaddy of all useful and arcane sports statistics, baseball. For while aficionados of other sports need to debate what the greatest achievement in their favourite pastime might be, sabermetricians¹ are of one voice when it comes to baseball: Joe DiMaggio’s 56-game hitting streak in 1941.
What makes this streak so remarkable has much to teach us about what really constitutes exceptional performance in business. For in a culture obsessed with extraordinary results and the study of those who achieve them, it is surprising how little effort goes into determining whether or not a given achievement is worthy of our attention. If we think we have something to learn from the greats of business, you’d think we would by now have developed generally accepted ways of determining what counts as greatness. But we have not. Enter Joltin’ Joe.
Streak of streaks
Every batter can have a hitting streak. The likelihood of a streak of a specified length is a function of that player’s batting average. To make the math easy, imagine a .500 hitter. The probability of getting two hits in two at-bats is $(0.5)^2$, or 0.25; three hits in three at-bats, $(0.5)^3$, or 0.125. In other words, for every set of X at-bats, there’s a $(0.5)^X$ chance of getting X hits.
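The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simplified model in the text (independent at-bats with a fixed success probability); a real hitting streak counts games rather than at-bats, and the function names here are invented for the example.

```python
def prob_all_hits(p: float, n: int) -> float:
    """Probability that n independent attempts, each succeeding with
    probability p, all succeed: p to the power n."""
    return p ** n


def prob_run_within(p: float, run: int, trials: int) -> float:
    """Probability of at least one run of `run` consecutive successes
    somewhere within `trials` independent attempts."""
    # state[j] = probability of currently sitting on j consecutive
    # successes without having completed a full run yet.
    state = [1.0] + [0.0] * (run - 1)
    reached = 0.0
    for _ in range(trials):
        new_state = [0.0] * run
        for j, mass in enumerate(state):
            new_state[0] += mass * (1 - p)   # a miss resets the run
            if j + 1 == run:
                reached += mass * p          # the run is completed
            else:
                new_state[j + 1] += mass * p
        state = new_state
    return reached


if __name__ == "__main__":
    print(prob_all_hits(0.5, 2))        # two hits in two at-bats
    print(prob_all_hits(0.5, 3))        # three hits in three at-bats
    print(prob_run_within(0.5, 3, 20))  # a run of three within 20 at-bats
```

The second function shows why long careers matter: the more attempts a player has, the more chances the system has to produce a streak of a given length.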
What about “hot hands,” you ask, in which a basketball player goes on a “streak” and the likelihood of, say, making the next shot in a game is a function of having made the last one? We might like the idea, but the data show that there’s no such thing in practically any sport: there are almost no documented streaks that go beyond what one would have expected, given players’ performances over the long run.² Understanding someone’s average performance perfectly predicts their seemingly exceptional performance.
With one shining, indisputable exception: DiMaggio’s streak. Ed Purcell, a Nobel laureate in physics, has concluded that nothing has ever happened in baseball with a frequency beyond what would be expected, given the nature of the underlying system itself.³ Except for DiMaggio, no one has ever managed to rise above the peaks of performance that would be expected, given the players’ abilities and the rules of the game. For example, the longest runs of team wins or losses are as long as they should be, and occur about as often as they ought to, given the winning percentages and, critically, the variability in those percentages for each team. In other words, most records are merely the right tails of the distribution that defines the system. They are, to borrow a term from statistical process control, “common cause” variations, and so, strictly speaking, unexceptional.
This framing puts DiMaggio’s streak in a class of one. Purcell calculated that to make it likely (probability greater than 50 percent) that a run of even fifty games will occur once in the history of baseball (up to the late 1980s when Purcell did this work) there would have had to have been either four lifetime .400 batters, or fifty-two lifetime .350 batters, with careers of over one thousand games. In fact, only three men have had sufficiently long careers with lifetime batting averages in excess of .350 (Ty Cobb at .367, Rogers Hornsby at .358, and Shoeless Joe Jackson at .356), and no one is anywhere near .400 over that many games. Other hitting streaks, such as the 44 games of Wee Willie Keeler and Pete Rose, fall within what one would have expected. So although those men were no doubt working hard, and were indisputably in the right tail of the distribution, their greatness – if greatness it be – is hidden behind common cause variation: that is, it was in the nature of baseball to produce such streaks, and those men were simply the ones who happened to have produced them. They were a predictable product of their own abilities and the system within which they operated.
In contrast, DiMaggio’s streak is so remarkable that it can be heard above that white noise. It is truly exceptional because it is sufficiently unlikely. There is “special cause” variation at work – something (we still don’t know what) allowed or enabled or inspired DiMaggio to go beyond the nature of the system – indeed, beyond what would have been expected, even given DiMaggio’s long-run batting average.
Fooled by Randomness
Why should this matter to managers? Because many who lead organizations rely on the study of allegedly remarkably successful companies. What we’ve come to refer to as “success studies” constitute a genre of business book that has a specific formula: identify high-performing firms and isolate their defining characteristics. The brand names in this space are well-known: In Search of Excellence and Good to Great are the reigning champions, but there are many others, including What Really Works, Stall Points, and Profit from the Core.
Of late, these studies have come under attack for failings in their clinical method. It’s been argued that, in particular, the data used to describe the behaviors of these putatively exceptional firms are biased by “halos” – that is, they do not capture the attributes in question independently of the performance one seeks to explain.⁴
We have a different point to make, one informed by an analysis of DiMaggio’s hitting streak: The companies held up as exemplars of success just might not be so unambiguously exemplary after all. What if the “great” or “excellent” companies have performance profiles that are indistinguishable from common cause variation? In other words, what if we’re studying Pete Rose – or worse, your run-of-the-mill .250 hitter – instead of Joe DiMaggio? Studying even the right tail of a distribution doesn’t tell you how to break free of the distribution. In short, if you want to use inferential methods to get outside the box, you have to look at someone who is outside the box!
To see the importance of this step in the analysis, ask yourself this question: If two firms in the same industry had differences in shareholder returns (or sales growth or return on assets…it doesn’t really matter) of 0.1 percent over one year, would you think they were behaving in fundamentally different ways? What about 1 percent? 5 percent? 10 percent? Over two years? Or ten? How much of a difference do you need, and for how long, before the performance differences are marked enough that you’re willing to believe that the two firms behaved differently? In every success study we’ve reviewed, what constitutes great performance is simply asserted. Wouldn’t it be useful instead to know – really know – which companies have delivered truly great performances?
The Power of 10
To answer this question, we first characterized the nature of the underlying system in which all publicly traded firms compete. We looked at all the companies traded on U.S. exchanges for the longest time period for which we could get reliable data: 1966-2006.⁵ This gave us just over 220,000 firm-year observations. We then ranked each firm’s performance as measured by return on assets in each year, and ascribed a decile ranking to each firm, controlling for the confounding effects of industry, year, size, and share, among other variables.⁶ We then observed the frequency with which firms moved from any given decile to every other decile. So, for example, we counted the number of times a firm in the 4th decile in year X moved to each of deciles 0 through 9 in year X+1. The result was the “decile transition matrix”, or DTM (Table 1). These frequencies are proxies for the probabilities associated with given movements in competitive “space.” Perhaps the most noteworthy observation is that performance can be very “sticky”: the likeliest outcome next year is that your relative performance will be the same as this year’s.
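The counting step described above might look something like the following sketch. It assumes the decile rankings (0 through 9) have already been assigned; the adjustments for industry, year, size and so on are not reproduced here, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def decile_transition_matrix(observations):
    """Build a 10x10 matrix of year-over-year transition frequencies.

    `observations` is an iterable of (firm_id, year, decile) tuples with
    decile in 0..9. Entry [i][j] of the result is the share of firm-years
    in decile i that were followed by decile j in the next year.
    """
    by_firm = defaultdict(dict)
    for firm, year, decile in observations:
        by_firm[firm][year] = decile

    counts = [[0] * 10 for _ in range(10)]
    for years in by_firm.values():
        for year, start in years.items():
            nxt = years.get(year + 1)   # only consecutive years count
            if nxt is not None:
                counts[start][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Toy data, invented purely to show the input format.
    toy = [("A", 2000, 4), ("A", 2001, 4), ("A", 2002, 5),
           ("B", 2000, 4), ("B", 2001, 9)]
    dtm = decile_transition_matrix(toy)
    print(dtm[4])  # where firms starting in decile 4 ended up
```

Normalising each row turns raw counts into the frequencies that stand in for transition probabilities.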
Since we know how many firms we have, and we know the life span of each of those firms, we can use the DTM as the foundation for repeated simulations of the last 41 years of business activity. Conceptually, each firm gets a ten-sided die loaded according to the DTM’s probabilities. It gets to roll that die the same number of times that it appeared in our database. We can observe its performance profile, characterized by how many times it appears in each decile over its lifespan. We repeat this 1,000 times for every company (over 17,000 unique firms over our 41-year period). What emerges is a distribution of the kind of performance we might “expect,” given the nature of the system and the actual population of firms.
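A minimal sketch of the loaded-die simulation for a single firm follows. It is not the authors' implementation: the uniform choice of starting decile, the function name, and the toy "sticky" matrix in the demo are all assumptions made for illustration.

```python
import random


def simulate_top_decile_counts(dtm, lifespan, runs=1000, seed=0):
    """Simulate one firm's life `runs` times and return, for each run,
    the number of years spent in the top decile (decile 9).

    `dtm` is a 10x10 matrix whose row i holds the probabilities of moving
    from decile i to each decile next year; every row must have positive
    total weight.
    """
    if lifespan < 1:
        raise ValueError("lifespan must be at least 1 year")
    rng = random.Random(seed)
    deciles = list(range(10))
    counts = []
    for _ in range(runs):
        # Assumption: the first year's decile is drawn uniformly.
        current = rng.choice(deciles)
        top_years = int(current == 9)
        for _ in range(lifespan - 1):
            # Roll the die loaded by the current decile's row of the DTM.
            current = rng.choices(deciles, weights=dtm[current])[0]
            top_years += int(current == 9)
        counts.append(top_years)
    return counts


if __name__ == "__main__":
    # Hypothetical sticky matrix: 50% chance of staying put, the rest
    # spread evenly over the other nine deciles.
    sticky_dtm = [[0.5 if i == j else 0.5 / 9 for j in range(10)]
                  for i in range(10)]
    counts = simulate_top_decile_counts(sticky_dtm, lifespan=41)
    print(max(counts), sum(counts) / len(counts))
```

Repeating this for every firm, each with its own lifespan, would give the distribution of "expected" performance profiles the text describes.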
Now we can quantify the likelihood that the system in which firms function can be expected to generate a particular profile for firms with specified life spans. In the same way that Purcell revealed that Pete Rose’s streak was predictable, but DiMaggio’s wasn’t, we’re in a position to assess whether any given firm’s performance is unlikely enough to be plausibly more than the predictable outcome of common causes.
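Given a set of simulated outcomes for firms of a particular lifespan, the likelihood in question can be estimated as a simple tail frequency. The sketch below assumes the simulated counts of top-decile years are already in hand; the function names and the sample numbers are hypothetical, and the article's separate false-positive test is not covered.

```python
def tail_probability(simulated_counts, observed):
    """Share of simulated firms with at least `observed` top-decile years,
    i.e. how often the system alone produces a profile this good."""
    hits = sum(1 for c in simulated_counts if c >= observed)
    return hits / len(simulated_counts)


def min_years_to_stand_out(simulated_counts, alpha=0.10):
    """Smallest number of top-decile years whose tail probability falls
    below `alpha`. Returns None if no count qualifies (alpha <= 0)."""
    for years in range(max(simulated_counts) + 2):
        if tail_probability(simulated_counts, years) < alpha:
            return years
    return None


if __name__ == "__main__":
    # Invented counts standing in for a real simulation's output.
    simulated = [0, 1, 1, 2, 2, 2, 3, 3, 4, 8]
    print(tail_probability(simulated, 4))
    print(min_years_to_stand_out(simulated, alpha=0.25))
```

A firm whose actual record clears the threshold is one whose performance the system rarely generates on its own, which is what makes it a candidate for closer study.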
A Tower of Babel
To test whether or not success studies have been studying unambiguously successful firms we looked at the companies adduced by 11 credible or popular success studies. As shown in Table 2, by the most generous interpretations of our method, only 30 out of 228 different firms held out as exemplars of successful companies are successful enough to stand above the merely fortunate ranks of the hoi polloi.
A “so what” is not an uncommon reaction at this stage; after all, many managers have found any given study profoundly helpful. If the advice “works”, why should it matter whether it’s based on true beliefs? The problem is that these studies don’t build on each other, or for that matter even disagree with each other, in any meaningful way. Consequently, each framework is a form of an omnibus resolution, an integrated recipe for achieving a particular outcome. The difficulty of synthesizing the findings of different studies is born of the incompatibility of the frameworks each develops. For example, we are told that companies need “Level 5 Leadership” (Good to Great), “Inside Outside Leadership” (Blueprint to a Billion), and (to paraphrase) ‘Company First Leadership’ (The Breakthrough Company). Which is it? And is there any real difference between these prescriptions, since the various definitions of leadership do not appear to be mutually exclusive? To leadership, one could add M&A, strategy, diversification, and a host of other alleged determinants of success. The defining attributes of successful companies seem to be both amorphous and ever-changing. In short, on almost every dimension of management practice, this genre of work is frustratingly non-cumulative.
Our analysis goes a long way to explaining why, in aggregate, the advice generated by the success-study approach has been so diverse: there’s every reason to believe that each study is looking, for the most part, at little more than a selection of random walkers that have nothing material in common save their horseshoes. With that as a sample, behavioral patterns don’t so much emerge as are imposed by the researchers. However, that’s not science, it’s astrology.
This matters a lot, because it changes fundamentally how one should think about the advice offered in success studies. The authors of these works are savvy observers of the business world, but their advice should be used in the way we use fables, not as evidence-based conclusions. For example, no one reads The Tortoise and the Hare and, faced with a chance to bet on such a race, chooses the tortoise. Rather, people take from this tale the idea that there is merit in perseverance, while arrogance can lead to a downfall. Similarly, because the prescriptions of most success studies lack an empirical foundation, they should not be treated as how-to manuals, but as a source of inspiration and fuel for introspection. In short, their value is not what you read in them, but what you read into them.
Sorting out the “unambiguously great” from the “possibly merely lucky” is a prerequisite for putting any success study on a solid foundation. This simulation method is one way to quantify the degree to which a given company’s performance is truly remarkable and so potentially worthy of further investigation.
For our own success study effort we are using this statistical machinery to define “Miracle Workers” (MWs), “Long Runners” (LRs) and “Average Joes” (AJs). MWs are companies that have delivered a sufficient number of 9th decile years (i.e., top 10 percent annual performances), given their lifespan, to have a less than 10 percent chance of being merely lucky and a less than 10 percent chance of being a false positive. Long Runners have delivered sufficient numbers of years in the 6th-8th deciles, conditional on their life spans, to meet those cutoffs, while Average Joes are average in their performance, in the variability of that performance, and in their life spans.
Since whether a firm is remarkable enough to warrant closer analysis is a function of the unlikely nature of its performance profile as assessed by the simulations, we no longer need insist on a specific window of time or life span – e.g., 10 years. Instead, we can now see through the idiosyncrasies among firms to the underlying structure of their performance profiles considered as a whole. Figure 1 shows the percentage of its lifetime over which a firm must deliver 9th decile performances in order to be sufficiently unlikely to be plausibly exceptional.
By way of illustration, consider our analysis of the pharmaceutical industry. Our analysis of all the data reveals that a firm observed in every year of our database must deliver at least 15 years in the 9th decile to be a Miracle Worker. Merck delivers 22 such performances – leaving a vanishingly small probability that Merck’s performance was a function of the system. This is why we are justified in attributing Merck’s performance to Merck. Eli Lilly is exceptional as well, but in a slightly different way: it delivers 40 out of 41 years in the 6th-9th decile, but only nine years in the 9th. So we label Lilly a Long Runner. We can now search for behavioral differences between these two firms confident that there is a material and precisely defined difference in the outcomes of those behaviors.
Better still, we can quantify the magnitude of the difference we’re trying to explain: Merck’s return on assets (ROA) advantage over Eli Lilly is equivalent to about $17 billion in “excess” net income over the 41 years of our study. (That is, had Merck earned Eli Lilly’s ROA over that same time period, it would have earned $17 billion less than it actually did.) And so any explanation we find must be able to account for at least that much money. Finally, we can also identify when Merck generated this “excess” income, allowing us to narrow down when those behavioral differences needed to have manifested themselves in corporate performance.
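One way to write this counterfactual down, assuming ROA is net income divided by total assets and that the comparison is made year by year (the article does not spell out either choice, and the symbols below are introduced only for illustration):

$$
\text{Excess income} \;=\; \sum_{t=1966}^{2006} \left(\mathrm{ROA}^{\text{Merck}}_t - \mathrm{ROA}^{\text{Lilly}}_t\right) \times A^{\text{Merck}}_t \;\approx\; \$17\ \text{billion}
$$

Here $A^{\text{Merck}}_t$ stands for Merck’s total assets in year $t$. Each term is the income Merck earned in that year beyond what it would have earned at Eli Lilly’s ROA on the same assets, so the yearly terms also indicate when the gap opened up.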
Now, instead of examining an admixture of great and lucky firms in unknown proportion and looking for consistent behavioral differences, we can identify firms with statistically significantly different profiles and target our search for the influence of key behavioral differences with a known economic impact. We’ve gone from dynamite fishing in randomly chosen bodies of water to fly-fishing in breeding grounds during spawning season. Still no guarantees of coming home with salmon…but which approach would you bet on?
Luck, skill and story telling
“Luck” and “randomness” for many people connote “inexplicable”. That’s why we’ve invoked the term “common cause variation”: no company’s performance profile is causeless, and those causes are unlikely to be beyond our grasp. Pick any firm – no matter how magnificent or mediocre or miserable – and there is a fascinating tale of inspiration, hard work, luck (both good and bad), and some sort of outcome. As the songstress Amanda Marshall put it, “everybody’s got a story.”
Perhaps part of the reason all these stories have yet to cumulate into a coherent picture of the management practices associated with success is that we have been applying the tools of special-cause analysis – case studies – to outputs generated by common causes. The examination of companies with performance that is most likely a result of properties of the system doesn’t tell you anything about how to create great company-level performance. What you end up with, if the fruits of this approach are any indication, are post hoc explanations with no predictive power. If we’re researching the behaviors that drive exceptional performance, it is perhaps much more effective to study companies with exceptional performance. That requires finding “corporate DiMaggios”.⁷
But taking this seriously is very difficult. Attributing the outcomes of individuals to properties of the system is a very-nearly paradigmatic shift for many. The subtle point upon which so much turns is not that luck “causes” greatness; conceptually, at least, “greatness” is an attribute of the company that is reflected in its performance rather than a guarantee of great performance: at the limit, a company could be “great” and have terrible performance thanks to very bad luck – after all, as the saying goes, “bad things happen to good people.” And, likewise, entirely average firms could appear great thanks to a sufficiently long streak of good fortune.
The key insight is that luck can hide the “true nature” of any company: companies that are behaviorally entirely average can have what appears to be remarkable performance, while great firms might be buried under adversity. Our analysis doesn’t say that the companies identified by others as great are merely lucky; it implies that we can’t say with any confidence that they aren’t merely lucky.
How did these researchers create such compelling narratives, then, if their samples are suspect? The human mind being what it is, the reality is that wherever we go looking for patterns, we find them. The smarter and more insightful the searcher, the more compelling the story behind the pattern, so we can’t count on distinguishing patterns revealed from patterns imposed based simply on the results of the quest. And so, whether or not particular prescriptions are true depends on whether or not they were generated through a rigorous application of the scientific method, not on how much we like the outcome.
To be sure, when it comes to communicating findings, science, like everything else, is ultimately an exercise in story telling: it’s how the mind is wired. But the confidence we have in any theory must ultimately lie in the confidence we have in the process that generated it. And so until one can distinguish between the fabulous and the fortuitous, we can’t really say much at all. In short, we can’t explore what Joe DiMaggio did until we first discover him.
Table 1: Decile Transition Matrix
Probabilities of moving from starting decile (t-1) to outcome decile (t), along with expected values of decile outcomes for each starting decile
Table 2: What are many studies of success most likely studying?
|Study|# High Performers|# We Categorize|Special Cause Probability > 90% (# firms)|Special Cause Probability > 90% (% of categorized)|False Positive Probability < 10% (# firms)|False Positive Probability < 10% (% of categorized)|
|---|---|---|---|---|---|---|
|Big Winners / Big Losers|9|8|3|38%|2|25%|
|Blueprint to a Billion|26|24|15|63%|3|13%|
|Built to Last|18|14|7|50%|2|14%|
|Good to Great|11|8|5|63%|0|0%|
|In Search of Excellence|14|13|8|62%|3|23%|
|Profit from the Core|32|18|13|72%|7|39%|
|What Really Works|14|13|9|69%|6|46%|
Figure 1: Percentage of total lifespan required in the 9th decile of performance to meet various combinations of special cause variation and false positive benchmarks
For firms with ten years of data, companies with 4 years (or 40% of their lifespan) in the 9th decile can attribute their performance to special cause variation with 90% confidence. However, there are so many firms with ten years of data that any given 10-year firm with 4 or more years in the 9th decile has a greater than 50% chance of being a “false positive.”
In contrast, if we insist on a probability of false positives of less than 10% when studying 10-year firms, we must insist on 10 years out of 10 in the 9th decile, which also raises our confidence in having identified special cause variation to more than 99%.
The patterns of the columns reveal that as a firm’s lifespan increases it need spend a decreasing proportion of it in the 9th decile to be considered exceptional (columns are shorter for longer lifespans), and that insisting on a higher likelihood of special cause variation also decreases the likelihood of false positives (red columns, which indicate a false positive likelihood of less than 10%, appear only when the likelihood of special cause variation is 99% or greater).
1. A neologism derived from SABR, the Society for American Baseball Research.
2. See www.thehothand.blogspot.com
3. This summary of Purcell’s work is adapted from “The Streak of Streaks” by Stephen Jay Gould, New York Review of Books, Vol. 35 No. 13 (1988).
4. The Halo Effect and the Eight Other Delusions that Deceive Managers, by Phil Rosenzweig (2007).
5. There is, of course, sampling bias here, since we’re only looking at firms traded on US exchanges. This element of our design is driven by the availability of reliable data. This has more than a whiff of “looking where the light is brightest”, but it is a constraint for which we know of no viable solution.
6. The method is described in detail at www.deloitte.com/persistence.
7. By which we mean statistically significant performance generally, not unbroken streaks.