## Big data and social science: Mullainathan’s Edge talk

The embedded video links to an Edge talk with Sendhil Mullainathan on the implications of big data for social science. His thoughts come out of research he is doing with computer scientist Jon Kleinberg [website] applying methods for big data to questions in behavioral economics.

Mullainathan focuses on how inference is affected when datasets increase widthwise in the number of features measured—that is, increasing “K” (or “P” for you ML types). The length of the dataset (“N”) is, essentially, just a constraint on how effectively we can work with K. From this vantage point, the big data “revolution” is the fact that we can very cheaply construct datasets that are very deep in K. He proposes that with really big K, such that we have data on “everything,” we can switch to more “inductive” forms of hypothesis testing. That is, we can dump all those features into a machine learning algorithm to produce a rich predictive model for the outcome of interest. Then, we can test an hypothesis about the importance of some variable by examining the extent to which the model relies on that variable for generating predictions.
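One way to make "the extent to which the model relies on that variable" concrete is permutation importance: shuffle one feature and see how much the prediction error rises. The sketch below is only an illustration under assumptions of its own (simulated data, a two-feature linear model fit by least squares, hypothetical function names). It is not the procedure described in the talk.

```python
import random

def fit_ols2(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (c - my) for u, c in zip(x1, y))
    s2y = sum((v - m2) * (c - my) for v, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def mse(params, x1, x2, y):
    """Mean squared prediction error of the fitted linear model."""
    a, b1, b2 = params
    return sum((c - (a + b1 * u + b2 * v)) ** 2
               for u, v, c in zip(x1, x2, y)) / len(y)

def permutation_importance(params, x1, x2, y, which, rng):
    """Increase in MSE after shuffling feature 1 or 2 (in-sample, for brevity)."""
    base = mse(params, x1, x2, y)
    shuffled = list(x1 if which == 1 else x2)
    rng.shuffle(shuffled)
    if which == 1:
        return mse(params, shuffled, x2, y) - base
    return mse(params, x1, shuffled, y) - base

# Simulated data (an assumption of this sketch): x2 is irrelevant by construction.
rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * u + rng.gauss(0, 1) for u in x1]

params = fit_ols2(x1, x2, y)
imp_x1 = permutation_importance(params, x1, x2, y, 1, rng)
imp_x2 = permutation_importance(params, x1, x2, y, 2, rng)
print("importance of x1:", imp_x1)
print("importance of x2:", imp_x2)
```

Shuffling `x1` should raise the error substantially, while shuffling `x2` should barely move it. Note that this measures predictive reliance only, which is exactly the property questioned in the next paragraph.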

I see three problems with this approach. First, just like traditional null hypothesis testing, it is geared toward up or down judgments about “significance” rather than parameter (or “effect size”) estimation. That leaves the inductive approach just as vulnerable to fishing, p-hacking, and related problems that occur with current null hypothesis testing.* It also greatly limits what we really learn from an analysis (statistical significance is not substantive significance, and so on). Second, scientific testing is typically some form of causal inference, and yet the inductive-predictive approach that Mullainathan described in his talk is oddly blind to questions of causal identification. (To be fair, it is a point that Mullainathan admits in his talk.) The possibilities of post-treatment bias and bias amplification are two reasons that including more features does not always yield better results when doing causal inference (although bias amplification problems would typically diminish as one approaches having data on “everything”). Thus, without careful attention to post-treatment bias, for example, the addition of features in an analysis can lead you to conclude mistakenly that a variable of interest has no causal effect when in fact it does. The third reason goes along with a point that Daniel Kahneman makes toward the end of the video: the predictive strength of a variable relative to other variables is not an appropriate criterion for testing an hypothesized cause-effect relationship. But the inductive approach that Mullainathan describes would be based, essentially, on measuring relative predictive strength.
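The post-treatment bias point can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: a randomized binary treatment `d`, a mediator `m` that `d` shifts by one unit, and an outcome `y` that depends on `d` only through `m`. The function names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals of y after a simple regression on x (with intercept)."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    d = [rng.randint(0, 1) for _ in range(n)]   # randomized treatment
    m = [di + rng.gauss(0, 1) for di in d]      # post-treatment mediator
    y = [mi + rng.gauss(0, 1) for mi in m]      # true total effect of d on y is 1
    total = slope(d, y)
    # Coefficient on d when m is added as a feature, computed by
    # partialling m out of both d and y (Frisch-Waugh-Lovell).
    controlled = slope(residuals(d, m), residuals(y, m))
    return total, controlled

total, controlled = simulate()
print("effect of d, no controls:   ", total)
print("effect of d, controlling m: ", controlled)
```

Without the mediator the estimate should land near the true effect of 1. With the mediator included, the coefficient on `d` should collapse toward 0, even though the treatment does have a causal effect.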

Nonetheless, the talk is thought provoking and well worth watching. I also found the comments by Nicholas Christakis toward the end of the talk to be very thoughtful.

## jobs: reducing gender inequality research consultancy with World Bank

Working in collaboration with various partners, [the consultant] will focus mainly on the design, implementation, and data analysis for a set of rigorous impact evaluation studies, also working to design innovative development interventions to address gender inequality.

See the attached terms of reference for more details including how to apply: [PDF].

## Meta-analysis and effect synthesis: what, exactly, is the deal?

Suppose we have perfectly executed and perfectly consistent, balanced randomized control trials for a binary treatment applied to populations 1 and 2. Suppose that even the sample sizes are the same in each trial ($n$). We obtain consistent treatment effect estimates $\hat \tau_1$ and $\hat \tau_2$ from each, respectively, with consistent estimates of the asymptotic variances of $\hat \tau_1$ and $\hat \tau_2$ computed as $\hat v_1$ and $\hat v_2$, respectively. As far as asymptotic inference goes, suppose we are safe to assume that $\sqrt{n}(\hat \tau_1 - \tau_1) \overset{d}{\rightarrow} N(0, V_1)$ and $\sqrt{n}(\hat \tau_2 - \tau_2) \overset{d}{\rightarrow} N(0, V_2)$, with $n\hat v_1 \overset{p}{\rightarrow} V_1$ and $n\hat v_2 \overset{p}{\rightarrow} V_2$.* (This is pretty standard notation, where $\overset{d}{\rightarrow}$ is convergence in distribution, and $\overset{p}{\rightarrow}$ is convergence in probability, as the sample size for each experiment grows large.) Even with the same sample sizes in both populations, we may have that $V_1 > V_2$, because outcomes are simply noisier in population 1. Suppose this is the case.

A standard meta-analytical effect synthesis will compute a synthesized effect by taking a weighted average where the weights are functions, either in part or in their totality, of the inverses of the estimated variances. That is, weights will be close to or equal to $1/\hat v_1$ and $1/\hat v_2$. Of course, if $\tau_1 = \tau_2 = \tau$, then this inverse variance weighted mean is the asymptotic variance-minimizing estimator for $\tau$. This is the classic minimum distance estimation result. The canonical econometrics reference for the optimality of the inverse variance weighted estimator for general problems is Hansen (1982) [link], although it is covered in any graduate econometrics textbook.
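As a minimal sketch of the synthesis step, here is the pure inverse variance weighted mean with weights exactly $1/\hat v_j$. The function name and the numbers in the example are made up for illustration.

```python
def inverse_variance_pool(estimates, variances):
    """Pool effect estimates with weights 1 / v_j.

    Returns the pooled estimate, its variance under the common-effect
    assumption, and the normalized weights.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, estimates)) / total
    pooled_var = 1.0 / total
    return pooled, pooled_var, [w / total for w in weights]

# Hypothetical inputs: trial 1 is noisier (variance 4.0) than trial 2 (variance 1.0).
pooled, pooled_var, norm_weights = inverse_variance_pool([1.0, 3.0], [4.0, 1.0])
print(pooled, pooled_var, norm_weights)
```

With these inputs the less noisy second trial receives four times the weight of the first, so the pooled value sits much closer to the second estimate.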

But what if there is no reason to assume $\tau_1 = \tau_2 = \tau$? Then, how should we interpret the inverse variance weighted mean, which for finite samples would tend to give more weight to $\hat \tau_2$? Perhaps one could interpret it in Bayesian terms. From a frequentist perspective though, which would try to relate this to stable population parameters, it seems to be interpretable only as “a good estimate of what you get when you compute the inverse variance weighted mean from the results of these two experiments,” which of course gets us nowhere.
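To see what the weighted mean is estimating in that case, one can work out its probability limit. This is a sketch under the setup above, taking each $\hat \tau_j$ to be consistent for its own $\tau_j$ and $n \hat v_j \overset{p}{\rightarrow} V_j$; the label $\hat \tau_{IVW}$ is notation introduced only for this aside.

$$\hat \tau_{IVW} = \frac{\hat \tau_1/\hat v_1 + \hat \tau_2/\hat v_2}{1/\hat v_1 + 1/\hat v_2} = \frac{\hat \tau_1/(n\hat v_1) + \hat \tau_2/(n\hat v_2)}{1/(n\hat v_1) + 1/(n\hat v_2)} \overset{p}{\rightarrow} \frac{\tau_1/V_1 + \tau_2/V_2}{1/V_1 + 1/V_2} = \frac{V_2 \tau_1 + V_1 \tau_2}{V_1 + V_2}.$$

With $V_1 > V_2$ the limit sits closer to $\tau_2$. The mixing weights are set by how noisy outcomes are in each population, not by anything like the populations' relative sizes, which is why the limit is hard to read as a substantively meaningful parameter.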

Now, I know that meta-analysis textbooks talk about how, when it doesn’t make sense to assume $\tau_1 = \tau_2$, one should seek to explain the heterogeneity rather than produce synthesized effects. But the standard approaches for doing so rely on assumptions of conditional exchangeability, that is, replacing $\tau_1 = \tau_2$ with $\tau_1(x) = \tau_2(x)$, where these are effects for subpopulations defined by a covariate profile $x$. Then, we effectively apply the same minimum distance estimation logic, using inverse variance weighting to compute the common $\tau(x) = \tau_1(x) = \tau_2(x)$, most typically with an inverse variance weighted linear regression on the components of $x$. The modeling assumptions are barely any weaker than what one assumes to produce the synthesized estimate. So does this really make any sense either?
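For concreteness, an inverse variance weighted linear regression of study-level estimates on a single covariate can be sketched as follows. The one-covariate restriction, the function name, and the example numbers are all assumptions of this sketch.

```python
def weighted_meta_regression(x, tau_hat, v_hat):
    """Weighted least squares of tau_hat on x with weights 1 / v_hat.

    Returns (intercept, slope) of the fitted line tau(x) = intercept + slope * x.
    """
    w = [1.0 / v for v in v_hat]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    tbar = sum(wi * ti for wi, ti in zip(w, tau_hat)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxt = sum(wi * (xi - xbar) * (ti - tbar)
              for wi, xi, ti in zip(w, x, tau_hat))
    slope = sxt / sxx
    return tbar - slope * xbar, slope

# Hypothetical study-level data lying exactly on tau(x) = 1 + 2x.
intercept, slope = weighted_meta_regression(
    x=[0.0, 1.0, 2.0, 3.0],
    tau_hat=[1.0, 3.0, 5.0, 7.0],
    v_hat=[0.5, 1.0, 2.0, 4.0],
)
print(intercept, slope)
```

The fit is only as good as the assumed linear form for $\tau(x)$. If that form is wrong, the weights again determine which studies dominate the fitted line.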

It seems pretty clear to me that the meta-analysis literature is in need of a “credibility revolution” along the same lines as we’ve seen in the broader causal inference literature. That means (i) thinking harder about the estimands that are the focus of the analysis, (ii) entertaining an assumption of rampant effect heterogeneity, and (iii) understanding the properties and robustness of estimators under (likely) misspecification of the relationship between variables that characterize the populations we study (the $X_j$s for populations indexed by $j$) and the estimates we obtain from them (the $\hat \tau_j$s).

*Edited based on Winston’s corrections!

## What does The Economist know about affirmative action?

This week the cover feature of The Economist magazine argues that it is “time to scrap affirmative action” (link). In the US, the article anticipates the Supreme Court’s hearing cases from Michigan and Texas that may have a major impact on the practice of affirmative action in college admissions. (See this post for a summary of affirmative action’s evolution in the US: link.) The magazine also considers affirmative action in the US against experiences elsewhere, with separate stories focusing on South Africa and Malaysia. This comparative perspective is welcome. I recently taught an undergraduate seminar studying affirmative action policies around the world (course page). The seminar considered arguments on either side of affirmative action debates in different countries, trying to develop for students a nuanced perspective.

As is often the case with Economist features, I came away annoyed by the superficiality of its journalists’ treatment of the issue. First, the journalists point to the fact that while affirmative action policies are targeted toward historically “disadvantaged” (a euphemism…) groups, the beneficiaries are often relatively better-off members of those groups. This is taken to mean that affirmative action is broken, insofar as it does not necessarily help the neediest. This kind of argument often comes up, and a commonly proposed “solution” is to do away with group based affirmative action and pursue instead policies aimed at improving prospects for the poor—that is, to replace “race” with “class” as the basis for preferential policies.

This line of argument is highly problematic to me. Why should families that are members of historically disadvantaged groups become disqualified for reparation because of their success? Reparation is still meaningful for such families: a family that is middle class despite discrimination may well have been upper middle class had there been no discrimination. Many reasonable theories of justice would take such a gap to be quite worthy of redress (cf. the work by Lowry, Weisskopf, and Fryer on the syllabus for my seminar linked above). Arguments going back over a century to DuBois’s “Talented Tenth” essay (link) go even further to propose that redress of this variety restores positions among society’s elite to members of groups that have been discriminated against. Note that a natural implication of such restoration is an increase in income inequality within the formerly discriminated-against group. So long as this is a reflection of the income “ceiling” being lifted rather than an income “floor” falling, such an increase in inequality poses no ethical dilemma with respect to concerns related to redressing legacies of discrimination. (Whether inequality per se nonetheless deserves attention is a separate issue.)

In a nutshell, there is a problem with conflating redress of legacies of racial discrimination on the one hand with poverty relief or removing barriers to class-based mobility on the other. A switch from race-based to class-based affirmative action would inevitably dilute reparation of race-based discrimination in favor of transfer to groups who were not victims of institutionalized discrimination. In the US, say, the problem is that group- rather than class-based preferences are a very hard position to sustain with the mass electorate, because they conflict with the self-interest of the (non-black) majority. As such, sustaining this position would require keeping it out of the hands of the mass electorate, following the usual minority protection arguments. In places like South Africa and, now, India, the situation is different, as “disadvantaged” status applies to majorities.

The other problem I had with the Economist’s treatment of the issue was their failure to engage adequately with the deep empirical literature on this topic. In discussing affirmative action in the US, the journalists spend a lot of time on Sander and Taylor’s work on the so-called “mismatch” hypothesis, without considering solid critiques of these findings by some of the sharpest minds in social science: link. The fact that none of these critiques were mentioned is a sad statement either of how journalists abuse scientific evidence to make points that they find appealing on the basis of taste or ideology, how little research is actually done in the production of pieces for highly influential venues like the Economist, or how catchy but potentially fallacious arguments generate buzz that drowns out critiques (cf. Rogoff-Reinhart).

This is not to say that I am an uncritical believer in the expansion of affirmative action policies. I take the potential for perverse incentives seriously and understand how thorny it can be to design mechanisms for redress in a worse-than-second-best world. What I do hope for is that such policies are considered on the basis of relevant considerations and a serious review of the evidence.

## Northeast Methodology Program Annual Meeting: Methods for Text, Friday, May 3

For those of you in the greater NYC area, we will be hosting the annual Northeast Methodology Program meeting at NYU on Friday, May 3. The program starts at noon with lunch, followed by an afternoon of presentations. This year’s presentations focus on methods for analyzing text. The lineup includes the following:

• Justin Grimmer (Stanford University), “The Impression of Influence: How Legislator Communication and Government Spending Cultivate a Personal Vote”
• Burt L. Monroe, Eitan Tzelgov, and Douglas R. Rice (Penn State University), “Measurement of Topics and Topicky Concepts in Text”
• Nick Beauchamp (Columbia University), “Someone is Wrong on the Internet: Political Argument as the Exchange of Conceptually Networked Ideas”