Oops, Were We Wrong? Rethinking How to Measure CSR


When we buy shares in a company, we expect to earn a fair return on our equity. But nowadays, many also hope to send a message to management along with their money: Compete hard, earn a profit, and do it ethically. By favoring socially responsible firms, more and more investors hope to reward good corporate citizens and make a point to those that fall short.

And now an entire industry has grown to help accomplish that. In 2014 there were nearly 1,000 investment funds using ESG criteria — environmental, social, and governance factors — in managing their portfolios, with assets totaling $4.3 trillion. But to create the right incentives, these funds have to reward the right firms, and that’s not as easy as it sounds. It presumes that we can accurately distinguish varying levels of social commitment in companies.

The industry has long relied on a database called KLD STATS that evaluates firms each year on whether or not they meet more than 80 specific positive and negative social criteria, in areas ranging from environmental impact to diversity to labor practices. Then, adding up the point scores, the KLD Index ranks companies on a continuum from least to most responsible, making it easy to identify or rule out investment candidates.

But just how reliable is that ranking? In a new study, Brian Richter, assistant professor at the McCombs School of Business, along with Robert Carroll of Florida State and David Primo of the University of Rochester, tested the KLD Index to see how well it actually predicts social responsibility. The results weren’t good. “It takes a fundamentally flawed approach,” Richter says.

Instead, the researchers propose a new measure based on item response theory that generates far superior rankings even when applied to the same underlying data, giving investors a more realistic idea of which companies are actually leaders in corporate social responsibility.

Oops. Sorry, Walmart
A fund manager applying the KLD Index rankings a decade ago would have had good reason to shift money out of Walmart stock and into a high-tech firm like Apple. Starting from a low score in 1991, Walmart declined further in subsequent years to become one of the “worst” corporations in America.

Applying the new measurement technique, however, Richter and his coauthors found just the opposite: Walmart had rapidly and steadily improved, overtaking Apple by the early 2000s and going on to become one of the most socially conscious firms. Of the large tech firms in the S&P 500, on the other hand, two-thirds rank lower under the new measure than the existing one.

How could this happen? The KLD database itself offers a rich portrait of corporate behavior. The problem, Richter says, is in how the Index combines that information to come up with an overall rating. By simply adding up point totals for each question (+1 or 0 for “strengths,” –1 or 0 for “concerns”) like a quiz in a magazine, it implicitly assumes they’re all equally good proxies for the underlying social commitment we’re trying to measure.

“That’s pretty hard to justify,” Richter says. “Does having a female CEO offset illegal dumping of toxic waste?” Likewise, it ignores the fact that some criteria are harder to satisfy than others, and the difficulty might vary from industry to industry: A bank will always pollute less than a manufacturer. Does that really mean banks are more responsible?
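To make the objection concrete, here is a minimal sketch of the additive scoring described above. The indicator names and values are invented for illustration and are not drawn from KLD STATS; only the +1/–1 counting rule comes from the description in this article.

```python
# Hypothetical indicator data: 1 means the item was flagged for the firm.
# These names are illustrative only, not actual KLD STATS criteria.
strengths = {"female_ceo": 1, "recycling_program": 1, "solar_conversion": 0}
concerns = {"toxic_dumping": 1, "labor_violation": 0}


def additive_score(strengths, concerns):
    """Add +1 for each flagged strength and -1 for each flagged concern."""
    return sum(strengths.values()) - sum(concerns.values())


score = additive_score(strengths, concerns)
print(score)  # 2 strengths - 1 concern = 1
```

Because every item carries the same weight, the toxic-dumping concern is fully cancelled by either one of the two strengths, which is exactly the equivalence Richter questions.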

Testing the Test
In fact, Richter says, the data itself can tell us which indicators are more meaningful. “It’s like on a calculus exam: If there’s a question everyone gets right, it doesn’t distinguish varying levels of knowledge. Maybe recycling is like that today. It has lost meaning over time.” But in the KLD Index, it would have the same value as something more ambitious like converting to solar power.

The new metric solves that problem by evaluating the test-takers and the test at the same time, seeing which questions provide the most insight. Then it recalculates the scores, applying different weights to the questions to improve the results.

“The basic insight is that there’s more information in the data — between the lines, as it were,” Richter says. “That’s how we’re able to generate better results.”

To be sure, it’s not as simple as adding up points in a spreadsheet. The analysis begins with a mathematical model of how a company’s (unobservable) level of social commitment gives rise to its (observable) choices. That’s what item response theory does: It measures hidden, or “latent,” attributes by way of proxy indicators. (It’s also been used to gauge political ideology, rating legislators on a conservative-liberal continuum based on their votes.)
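The article does not say which item response specification the researchers use. As a rough illustration of the general idea, the sketch below evaluates a common textbook form, the two-parameter logistic (2PL) model, in which each criterion has a "difficulty" and a "discrimination." All parameter values are made up, and a real analysis would estimate them jointly with each firm's latent score rather than assume them.

```python
import math


def prob_meets_criterion(theta, discrimination, difficulty):
    """2PL model: chance that a firm with latent commitment theta meets an item."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def item_information(theta, discrimination, difficulty):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = prob_meets_criterion(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


# Hypothetical items: an easy one that nearly every firm meets,
# and a harder one that separates firms more sharply.
easy_item = {"discrimination": 0.5, "difficulty": -3.0}
hard_item = {"discrimination": 2.0, "difficulty": 1.0}

for theta in (-1.0, 0.0, 1.0):
    p_easy = prob_meets_criterion(theta, **easy_item)
    p_hard = prob_meets_criterion(theta, **hard_item)
    print(f"theta={theta:+.1f}  easy={p_easy:.2f}  hard={p_hard:.2f}")
```

Under these assumed values, the easy item is met by most firms regardless of their commitment level, so it carries little information, while the hard item's probability swings widely across the same range. That is the sense in which some questions on the "test" tell us more than others.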

Richter and his research colleagues ran 5,000 simulations on their model, ending up with a final data matrix of 109 million elements. “This would have challenged computer processors 10 years ago,” Richter admits. “Now we can do it on a laptop, but it still takes days.”

New Insights
To show that the new approach yields more accurate rankings, the researchers took the ratings generated by each metric on 2009 data and used them to predict how those same firms would score on new questions that were added to the database the following year. In each case, the new measure was superior; on some, the KLD Index performed worse than random guessing.
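The article does not name the accuracy measure behind this comparison. One simple way to check whether a rating beats random guessing, offered here only as an assumed stand-in, is a concordance score: across all pairs of firms where one met a new criterion and the other did not, how often did the firm that met it have the higher prior rating? The ratings and outcomes below are invented.

```python
def concordance(scores, outcomes):
    """Share of (met, not-met) firm pairs in which the firm that met the
    criterion has the higher score. Ties count as half.

    A value of 0.5 matches random guessing; below 0.5 is worse than random.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one firm in each outcome group")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: one year's ratings, and whether each firm met a
# criterion that was newly added the following year.
ratings = [2.1, 0.4, -1.3, 1.7, -0.2]
met_new_item = [1, 0, 0, 1, 1]
print(concordance(ratings, met_new_item))
```

A rating that ranks firms in the wrong direction on a new question would land below 0.5 on this measure, which is one way a metric can end up doing worse than chance.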

This means the researchers drew different conclusions from the same data. For instance, where the KLD Index shows that corporations generally became less socially responsible over the past 25 years, the new measure shows steady improvement. “That’s more in line with what we hear anecdotally,” Richter says. “Managers today often say this is a necessary part of doing business.”

They also find that most of the gain in social responsibility is coming from the very largest firms. For instance, contrary to the popular notion that Google has drifted from its early “Don’t be evil” idealism as it grew, the results show that it started out as middling and has rapidly improved.

Apple also improved over the long haul, though less consistently: There were marked dips whenever Steve Jobs returned from a hiatus. “That fits in with theories that social performance is driven by top management,” Richter says.

While their research focuses on social responsibility, the authors stress that it’s just one example of the power of their approach. Poor measurement has been called “one of the most serious threats to strategic management research,” and item-response modeling can improve results in many other areas.

But with trillions of well-intentioned dollars trying to hold firms to account and make a difference, the real-world value of this new metric is that it can help make sure the money gets to those companies that deserve it.

View companies’ CSR data at http://socialscores.org.


“Using Item Response Theory to Improve Measurement in Strategic Management Research: An Application to Corporate Social Responsibility” is published in the Strategic Management Journal.


Faculty in this Article

Brian Richter

Assistant Professor, BG&S McCombs School of Business

Brian Richter holds degrees from MIT, UCSD, and UCLA, where he earned a Ph.D. in global economics and management. Prior to joining the faculty at...

About The Author

Lee Simmons

Lee is a writer and editor at Wired magazine. He studied at Harvard Business School and MIT and taught economics at Harvard College. He was later...
