## Monday, May 26, 2008

Percentages don't always do what you expect

I was reading an article at Dailykos (with the ironic title "Montana's huge black population gives Obama the edge"), which mostly debunks the myth that Obama has a problem with blue-collar whites. (I am not normally given to reading Dailykos, but I do pop in occasionally.)

At one point in the article I was reminded of Simpson's Paradox, and I thought I'd discuss it here, since it lets me talk about elections and simple mathematics at the same time. As far as I know, the paradox doesn't actually occur in the story; the article just reminded me of it.

Imagine the following situation. Let's say you're looking at how much likely voters approve of a particular candidate. Suppose each person is asked about exactly one candidate: whether they agree with a statement something like "the candidate would make a good president".

[NB: These numbers are completely made up! They're not real, but are just there to illustrate a point.]

Imagine we ended up with the following table (of percentages):

Percentage of people surveyed that agree that candidate would make a good president:

| Candidate | Agree (%) |
|---|---|
| McCain (R) | 38 |
| Clinton (D) | 41 |

Note that in this case, different people are being asked about Clinton and McCain; this would not usually be the case with a typical survey. Also, these figures don't have to add to 100% - all candidates might have 80% approval, or all might have 30%.

So on these figures Clinton is slightly leading McCain in approval (maybe not outside the margin of error, but let's ignore that issue). Let's say you want to figure out whether McCain's biggest problem is with males or females. Fortunately, it turns out that this information is available.

Percentage of people surveyed that agree that candidate would make a good president:

| Candidate | Men (%) | Women (%) |
|---|---|---|
| McCain (R) | 46 | 36 |
| Clinton (D) | 43 | 34 |

Hang on a minute! McCain leads Clinton on men and on women!? Did we make a mistake somewhere?

Actually, no, this is possible. Look at these numbers (counts of people):

Number of people surveyed that agree that candidate would make a good president:

| Candidate | Count | Men | Women | Total |
|---|---|---|---|---|
| McCain | Approved | 80 | 189 | 269 |
| | Number asked | 174 | 526 | 700 |
| Clinton | Approved | 233 | 55 | 288 |
| | Number asked | 540 | 160 | 700 |

If you check those numbers out (unless I made a mistake somewhere), they give the percentages I quote above (to the nearest whole percent).
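For anyone who'd rather not check by hand, here's a small Python sketch that recomputes the percentages from the counts table. The dictionary layout and the function names are just my own choices for illustration.

```python
# Counts from the table above: (number who approved, number asked) per group.
counts = {
    "McCain": {"Men": (80, 174), "Women": (189, 526)},
    "Clinton": {"Men": (233, 540), "Women": (55, 160)},
}


def percent(approved, asked):
    """Approval as a percentage, rounded to the nearest whole percent."""
    return round(100 * approved / asked)


def summarize(groups):
    """Return ({group: percent}, overall percent) for one candidate."""
    by_group = {g: percent(a, n) for g, (a, n) in groups.items()}
    total_approved = sum(a for a, _ in groups.values())
    total_asked = sum(n for _, n in groups.values())
    return by_group, percent(total_approved, total_asked)


for candidate, groups in counts.items():
    by_group, overall = summarize(groups)
    print(candidate, by_group, "overall:", overall)
```

It should give 46, 36 and 38 overall for McCain, and 43, 34 and 41 overall for Clinton, which matches both percentage tables.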

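This is called Simpson's Paradox: a comparison that goes one way within every subgroup can reverse when the subgroups are pooled. It's worth knowing about whenever you're comparing rates or percentages.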

What's going on? Well, as it turns out, in our sample women were less approving than men of both candidates. Far more women than men were asked about McCain (526 versus 174), while far more men than women were asked about Clinton (540 versus 160).
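One way to see the mechanism is to notice that each overall rate is a weighted average of the two group rates. The weights are the shares of that candidate's sample made up by men and by women. Using the counts in the table above, rounded to three decimals:

$$
\begin{aligned}
\text{McCain:}\quad \frac{269}{700} &= \frac{174}{700}\cdot\frac{80}{174} + \frac{526}{700}\cdot\frac{189}{526} \approx 0.249 \times 0.460 + 0.751 \times 0.359 \approx 0.384 \\
\text{Clinton:}\quad \frac{288}{700} &= \frac{540}{700}\cdot\frac{233}{540} + \frac{160}{700}\cdot\frac{55}{160} \approx 0.771 \times 0.431 + 0.229 \times 0.344 \approx 0.411
\end{aligned}
$$

McCain's overall figure gets dragged toward his lower women's rate because three quarters of his sample are women. Clinton's gets pulled toward her higher men's rate because more than three quarters of her sample are men.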

So which figure matters? Well, if the approval rates within each sub-sample are representative of the voting population, you'd need to weight each group's rate by that group's share of the population ... and then McCain would come out ahead, since he leads among both men and women.
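To make the weighting concrete, here's a minimal Python sketch of it (statisticians call this direct standardization). The 50/50 split between men and women is purely a hypothetical assumption for illustration. It is not a real figure for the electorate.

```python
# Exact (unrounded) approval rates within each group, from the counts table.
rates = {
    "McCain": {"Men": 80 / 174, "Women": 189 / 526},
    "Clinton": {"Men": 233 / 540, "Women": 55 / 160},
}

# HYPOTHETICAL population shares, assumed for illustration only;
# substitute real figures for the voting population.
population_share = {"Men": 0.5, "Women": 0.5}


def weighted_rate(group_rates, shares):
    """Weight each group's approval rate by that group's population share."""
    return sum(group_rates[g] * shares[g] for g in shares)


for candidate, group_rates in rates.items():
    print(candidate, f"{100 * weighted_rate(group_rates, population_share):.1f}%")
```

With that assumed split, it gives McCain about 41.0% and Clinton about 38.8%. Because McCain leads among both men and women, he comes out ahead whatever shares you plug in. The weights only change the size of his lead.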

Dana Hunter said...

Wow... I very nearly understood that.

(Note that I am the kind of person who has to think about things like 2 + 19 = 21.)

Amazing how interesting stories can be hidden away inside those dull ol' numbers, innit? Thanks for this!

Efrique said...

I know a mathematics professor who would have to pull out a calculator for that. Indeed, in one discussion I had with him, he did pull out his calculator (which he always carried) to add a one-digit number to a two-digit number, because he had no sense of what they added to. The one-digit number was larger than your 2 and the two-digit number smaller than your 19, so it was about as difficult as your sum.

He's a good mathematician, too - far better than me; he just can't do simple arithmetic in his head.

[My six-year-old daughter, on the other hand, blows me away. One evening she asked for more and more "hard" addition problems, and after a while I ran out of inspiration, so I taught her the rule for the Fibonacci numbers (start with 0 and 1, and each next number is the sum of the previous two). I said "just keep finding the next one until you think you've done enough", expecting her to stop at 21 or 34.

(Oh, and she doesn't yet know how to carry in multidigit addition, so she does multidigit addition entirely in her head.)

Well, a couple of minutes later, she comes back ... with a page of numbers running down the page.

All correct, down to the 17th Fibonacci number (1597**).

She was adding three-digit numbers to three-digit numbers in her freaking head (without knowing how to carry, so she was using some pretty sophisticated tricks to do them).]

** just in case anyone works them out, the 0 at the start is not the first Fibonacci number, it's the zeroth (yes, mathematicians are odd like that), so she actually had 18 numbers written in her list.
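If you'd rather check her list by machine than in your head, here's a small Python sketch of the rule above. It counts from zero as in the footnote, and the function name is my own.

```python
def fibonacci_up_to(n):
    """Return [F(0), F(1), ..., F(n)], starting from F(0) = 0 and F(1) = 1."""
    fibs = [0, 1]
    # Each new number is the sum of the previous two.
    while len(fibs) <= n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[: n + 1]


fibs = fibonacci_up_to(17)
print(fibs[-1], len(fibs))
```

Asking for everything up to F(17) gives a last entry of 1597 and a list of 18 numbers, counting the zeroth.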

Dana Hunter said...

I think I'm going to have to borrow your daughter to do the complex math for my worldbuilding... wow. You have some incredible kids, Efrique - which doesn't surprise me a bit, considering who their father is.

Thanks for sharing the story about the math prof - gives me hope, that does! If I can get past the mundane crap, calculus and I may turn out to be friends. One never knows!