
Nap Time!!!

Sunday, September 09, 2007
Explosively Obvious

I'm important!!!
Last spring, Cal graduate student Mandy Johnson wrote a paper looking at why parents picked certain schools in the choice-based San Francisco district.

"I just thought it would be interesting," says Johnson, who is now a policy analyst for the district. "I realized that it could be explosive if I could prove this."
What was this explosive result?
The top factors correlated with low demand were the prevalence of low-income students and - here's the really troubling one - race. Specifically, Johnson found, "as the percentage of African American students in the school increases, kindergarten demand decreases."
Er... I didn't realize that wasn't obvious.
By the way, for those assuming this is something that can be explained away by the interplay of race and poverty, it isn't. Johnson said she used a statistical tool called regression analysis, which allowed her to isolate factors such as income and skin color. For example, the researcher found no correlation between school choice and the number of Latino students, who are disproportionately lower-income.
Oh, hey, a statistical tool. Of all the statistical tools to use in order to isolate factors, regression analysis is the one you don't use. Indeed, regression analysis requires independent predictors. If they are highly correlated (as race and poverty are), then regression analysis is going to give you ridiculous results.
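To see what near-collinearity actually does to a regression, here's a toy simulation in plain Python — made-up "race", "poverty" and "demand" numbers, nothing from the actual study. When the two predictors track each other almost exactly, the individual coefficients swing wildly from sample to sample, even though their sum stays pinned down:

```python
import random

random.seed(0)
n = 200

def simulate():
    # Nearly collinear predictors: "poverty" tracks "race" almost exactly.
    race = [random.uniform(0, 1) for _ in range(n)]
    poverty = [r + random.gauss(0, 0.01) for r in race]
    # True model: demand depends equally on both, plus noise.
    demand = [r + p + random.gauss(0, 0.1) for r, p in zip(race, poverty)]
    return race, poverty, demand

def center(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]

def ols2(x1, x2, y):
    """Two-predictor OLS via the normal equations (variables centered,
    so no intercept term is needed)."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12   # near zero when predictors are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

b1a, b2a = ols2(*simulate())
b1b, b2b = ols2(*simulate())
print("sample 1 coefficients:", round(b1a, 2), round(b2a, 2))
print("sample 2 coefficients:", round(b1b, 2), round(b2b, 2))
print("sums (stable):", round(b1a + b2a, 2), round(b1b + b2b, 2))
```

The combined effect is well-determined; how it splits between the two nearly identical predictors is basically noise — which is exactly the problem with reporting the race coefficient on its own.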

The statistical tool you're looking for is called "control." You compare schools of similar socioeconomic status (according to some measure) and look for racial correlation. This is how you "isolate" factors. Still, the measurement needs to control for a lot of other things. The comparison to Latino students may also serve as a control of sorts, but that's not regression analysis.
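Here's a sketch of what "control" means in this sense — again synthetic, hypothetical numbers, not the study's data. Demand is generated to depend only on poverty, with %black correlated with poverty. The naive comparison shows a big spurious "race gap"; comparing only within narrow poverty strata makes it mostly vanish:

```python
import random

random.seed(1)

# Hypothetical synthetic schools (not from the study): demand truly
# depends only on poverty, but %black is correlated with poverty.
schools = []
for _ in range(2000):
    poverty = random.uniform(0, 1)
    pct_black = min(1.0, max(0.0, 0.7 * poverty + random.gauss(0, 0.15)))
    demand = 10 - 5 * poverty + random.gauss(0, 0.5)
    schools.append((poverty, pct_black, demand))

def mean(v):
    return sum(v) / len(v)

# Naive comparison on %black alone: silently picks up the poverty effect.
hi = [d for p, b, d in schools if b > 0.5]
lo = [d for p, b, d in schools if b <= 0.5]
naive_gap = mean(lo) - mean(hi)

# Controlled comparison: only compare schools within the same narrow
# poverty stratum, then average the within-stratum gaps.
gaps = []
for k in range(10):
    stratum = [(b, d) for p, b, d in schools if k / 10 <= p < (k + 1) / 10]
    s_hi = [d for b, d in stratum if b > 0.5]
    s_lo = [d for b, d in stratum if b <= 0.5]
    if len(s_hi) >= 5 and len(s_lo) >= 5:
        gaps.append(mean(s_lo) - mean(s_hi))
controlled_gap = mean(gaps)

print("naive gap:", round(naive_gap, 2))            # large, but spurious
print("controlled gap:", round(controlled_gap, 2))  # close to zero
```

Of course, with real data you'd have to worry about whether there are enough comparable schools in each stratum — which is part of why this is harder than just throwing everything into a regression.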

The conclusion is blindingly obvious, of course, so there wasn't any need for even a bad analysis. I don't know if the analysis was good, by the way, but the reporting by C.W. Nevius at the very least suggests he doesn't know what he's talking about.

Update: Actual study here. The study suggests both directions of causality (i.e. black people and poor people may also just suck at picking good schools in S.F.'s school selection process) are possible. But it looks like the study just went ahead and did the regression analysis that C.W. Nevius reports, which seems downright stupid.

posted by Beetle Aurora Drake 9/09/2007 11:53:00 AM #
Comments (4)
Actually, regression analysis does control for correlation between variables. When you include a variable in a regression, you call it "controlling" for the effect of that variable. This method works as long as the two aren't exactly the same (or linear combinations of each other).
If they're closely correlated, you get a very flimsy regression that'll be dramatically wrong even with very little random error. Unless the whole independent variable space is well-covered, you can't reasonably extend your linear approximation to the rest of the space.
Define closely correlated. Just because two regressors (race and income) tend to move in the same direction doesn't mean they can't both be accounted for in a regression. If every African American student was poor and every non-African American student was non-poor, then we'd have problems.
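For what it's worth, here's the same kind of toy sketch from this side of the argument (hypothetical numbers again): with only moderately correlated regressors (r around 0.7), the normal-equations fit recovers both coefficients just fine; it's only as the determinant of the cross-product matrix heads toward zero that things fall apart:

```python
import random

random.seed(2)
n = 500

# Moderately correlated regressors (r ~ 0.7), synthetic data.
x1 = [random.uniform(0, 1) for _ in range(n)]
x2 = [0.5 * a + 0.5 * random.uniform(0, 1) for a in x1]
# True model: y = x1 + x2 + noise.
y = [a + b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

def center(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]

x1c, x2c, yc = center(x1), center(x2), center(y)
s11 = sum(a * a for a in x1c)
s22 = sum(a * a for a in x2c)
s12 = sum(a * b for a, b in zip(x1c, x2c))
s1y = sum(a * b for a, b in zip(x1c, yc))
s2y = sum(a * b for a, b in zip(x2c, yc))
det = s11 * s22 - s12 * s12   # comfortably far from zero here
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
print("estimates:", round(b1, 2), round(b2, 2))  # both close to 1
```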

Also, now that I actually looked at her regression, the results for the middle column (Total Requests Per Seat) imply that "% White students" decreases demand significantly and more than "% African American". But she didn't put it in bold so it doesn't count.
The correct statement of when we'd have problems is at the school level, not the student level: if every black-dominated school were also a poor school. And that's actually almost true.

Correlation isn't itself problematic, but if all your data falls pretty much on a hyperplane of lower dimension than the whole space (and I think it does, here), then it just doesn't work. I'm not surprised to find that result concerning whites, for this very reason.

It's the duty of a statistical report to justify its validity. I don't think this one does.
