If you want to understand the world…

you must not be afraid of statistics. Robert Basic quoted the following figures yesterday:

In 2005, total worldwide revenue from SMS exceeded 74 billion dollars. For comparison: Hollywood took in just under 30 billion dollars, worldwide music industry revenue was 35 billion dollars, and video games, consoles, and software brought in around 40 billion dollars. The value of all laptops sold in 2005 was 65 billion dollars. SMS alone brings in more than any of these industries (…) and SMS still means more than 90% profit. We should love this industry! (Tomi T. Ahonen, English-language specialist author)

And because statistics is usually such a dry subject, here is an important pointer to the most wonderful statistics site of all (that I know of): it is truly almost unbelievable how Hans Rosling and his colleagues have brought boring UN statistics to life.

There is now also (perhaps not all that surprisingly) a Google-specific Flash version of it. I can also recommend that everyone watch the webcast of Hans Rosling's talk; it is really very illuminating.

compulsory reading

Jane Galt on Signal And Noise

Megan McArdle has an interesting post about the difficulties of gaining useful data through polls. She correctly states that, however carefully chosen your sample may be, some people will always lie; in fact, she implies, a lot of people will lie.

Personally, I think the bigger problem for getting a reliable idea of what’s going on in people’s minds is questionnaire framing, in conjunction with the reality-constructing effect of increased statistical attention to a particular issue: if you increase statistical research on domestic violence, chances are you’re going to see a huge (statistical) increase simply because you’re asking more often. Framing matters all the more since it is usually done with the result in mind.

But Megan is right: people will lie for the pure fun of it, or because they want to comply with what they believe are societal expectations or those of the person asking them, even in anonymous surveys. They will “lie” because they don’t really get what the pollster is asking and find it embarrassing not to understand. And they will lie for a whole lot of other reasons I haven’t thought of and for which a poll would probably not be helpful. On the other hand, sometimes “lying” is too big a word for a different perception of reality… just ask the PR people in the US Department of Defense, they would probably agree these days ;-) Megan writes:

Surveys are bad because people lie. And the more important/interesting the subject, the more they lie. Imagine you did a survey: would you hide Jews in your basement if you lived in Nazi Germany? You’d probably get a “yes” response 90+ percent of the time. Yet if you transported all the people you surveyed to Nazi Germany, where they would actually have the opportunity to dedicate an unknown number of years to hiding a dangerous person in their basement, feeding and clothing them, emptying their chamberpots, and putting their entire family in danger from the kind of people who roll up to your door in the middle of the night and carry away even your smallest children to a location where their fingernails may be pulled out, their eyes gouged and their bones broken without disturbing the neighbors, you would find that your “yes” response dropped to a tiny fraction of 1%. The 90% yes response is what we call stated preference, and it doesn’t correlate very well with revealed preference, which is what we call what people actually do, rather than what they say they would do.

Interestingly, I remember reading about a recent poll in which Germans were asked about their ancestors’ behavior during the Nazi period (unfortunately, I can’t remember where I read that). Such an astoundingly large share of their ancestors had supposedly been actively involved in saving oppressed people that someone commenting on the figures wondered what had happened to all the people who looked the other way when Nazi thugs came to pick up their neighbours. Some respondents were clearly lying for one or another of the reasons above; others just stressed the part of reality they found more convenient to convey. Think about it: could “my granny told me how she did not report someone/something to the authorities” not be interpreted as “active” protection? As reality is so complex, there’s no clear-cut way to differentiate what’s true and what’s not. If you define cut-off points, i.e. categories, you once again run the risk of creating reality instead of reporting it. Gosh, Werner Heisenberg was a wise guy indeed…

A statistical problem relating to more recent German politics is the poll results for fringe parties in any election. Believe it or not, the figures reported for fringe parties on TV are basically just made up. They are informed guesses, but if my information is correct, they are essentially made up. The reason is that the polling institutes simply do not get a useful number of responses for these parties, because of small “representative” samples as well as the reasons cited above. They know that these parties do get some votes; you can count them later on, after all.

But in the meantime, they’re just guessing. And as it’s only fringe parties, it usually doesn’t make a real difference. But it’s nonetheless interesting to know how the TV presenter arrived at “other parties: 3.9 percent”.
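
For the curious, here is a rough sketch of why a small sample says so little about a fringe party. It is not how any polling institute actually works; the sample size of 1,000 and the vote shares are illustrative assumptions of mine, and the formula is the textbook normal approximation for a simple random sample, which is itself only rough for very small shares.

```python
import math

def proportion_margin(p, n, z=1.96):
    """Approximate 95% margin of error for a share p
    in a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # assumed sample size (illustrative, not from any real poll)

# Assumed "true" vote shares: a big party, a small party, a fringe party.
for share in (0.35, 0.05, 0.01):
    margin = proportion_margin(share, n)
    expected = share * n  # respondents you would expect to name this party
    print(f"share {share:.0%}: about {expected:.0f} respondents, "
          f"margin ±{margin:.1%}, relative error {margin / share:.0%}")
```

Under these assumptions, a party at 1 percent is named by only about ten respondents, and the margin of error (roughly ±0.6 percentage points) is more than half the share itself. And that is before anyone has lied to the pollster.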