Sunday, June 29, 2008

Not fooling ourselves (I) - the unmeasuring of asymmetry

In order to impose some kind of structure on our understanding of complex phenomena, we tend to use simple terms to describe what may be surprisingly nonsimple.

In order to make progress, we often attempt to quantify those descriptives. This is not just useful, it's often unavoidable, but the act carries with it a special danger, because we then (almost universally) invest the nonunique quantification of the concept with a "reality" that it doesn't merit - and that can lead to nonsense.

[It may be this is another version of the common phenomenon of believing mathematical models when at the beginning we knew them to be at best a rough approximation. Models tend to take on a life of their own, and their conclusions are often treated with a respect we did not accord the original model when it was first tentatively adopted. We need to step back and remember the model was never exactly the thing it was used to describe.]

I think this may be the flip side of what Blake Stacey was talking about when he was discussing the confusions that come in when an inherently mathematical concept is translated into a nonmathematical description.

Let me take a concrete example with which I am familiar. It relates to descriptions of probability distributions.

We begin with something that is inherently mathematical - symmetry. Symmetry is an extraordinarily useful, almost universal concept in mathematics. For what I'll be talking about you can just use the more common senses of reflection symmetry and rotational symmetry.

In the case of symmetry of distributions, there are several ways to define it (if you have a continuous or a discrete random variable, you can define it in the usual "mirror symmetry" sense), but the more general definition would have something like "m is the centre of symmetry if Prob(X ≤ m-a) = Prob(X ≥ m+a) for all a" (which effectively corresponds to rotational symmetry of the distribution function about the point (m, 1/2), if you tidy up details the right way).

Anyway, "symmetry" is both easily understood and fairly easy to pin down. Of course, almost all distributions are not symmetric (but symmetry arises in a natural way in particular circumstances, so it's far from a useless concept).

Aside from "unimodal" (one "hump"), or the overused and badly abused "bell-shaped", one of the most common descriptions of a distributional shape that is applied would probably be "skewed". An elementary book might have a diagram like the top one below (see for example, the diagram at Wikipedia's entry on skewness). The corresponding distribution function is underneath.

Diagram of a right (positive) skewed density and distribution function.

If you flipped the above density left-to-right (or rotated the distribution function half a turn about the point (median, 1/2)), it would have left (negative) skewness.

Notice that "right skewed" means the long tail is to the right (this is often the opposite of a beginner's intuition about what the term should mean).

The problem comes when this seemingly clear but actually vague notion is quantified. There are numerous quantities that have been called "skewness". By far the most popular is the standardized third moment (or, equivalently, the standardized third cumulant) - so much so that it is frequently called "the" skewness. Equivalent sample statistics are used for samples.
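For reference, the standardized third moment can be written as below. The symbols are conventional notation assumed here rather than anything fixed by this post: μ is the mean, σ the standard deviation, μ_3 the third central moment, κ_2 and κ_3 the second and third cumulants, and γ_1 the measure itself.

```latex
\gamma_1
  = \mathrm{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
  = \frac{\mu_3}{\sigma^{3}}
  = \frac{\kappa_3}{\kappa_2^{3/2}}
```

The cumulant form is equivalent because κ_2 = σ² and κ_3 = μ_3.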

Now for a distribution like the one above, this quantity is positive (when it exists). If you flip that density shape left to right, the skewness measure is negative. Importantly, when the density is symmetric and the first three moments exist, the skewness is 0. So far so good - left and right skewness and symmetry in pictures generally correspond to negative, positive and zero quantities on the measurement.
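A minimal sketch of that correspondence in Python, for discrete distributions. The function name `skewness` and the three probability tables are made up for illustration; they are not taken from any diagram in this post.

```python
def skewness(values, probs):
    """Standardized third moment of a discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    third = sum(p * (x - mean) ** 3 for x, p in zip(values, probs))
    return third / var ** 1.5

values = [0, 1, 2, 3, 4, 5]

# Illustrative (assumed) probabilities: long tail to the right.
right_skewed = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
# The same shape flipped left to right.
mirrored = right_skewed[::-1]
# A shape that is symmetric about 2.5.
symmetric = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

print("right skewed:", skewness(values, right_skewed))  # positive
print("mirrored:    ", skewness(values, mirrored))      # same size, negative
print("symmetric:   ", skewness(values, symmetric))     # zero up to rounding
```

This only shows the direction "picture implies sign". It says nothing about going from the sign back to the picture.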

However, the problem comes when interpreting a standardized third moment back in terms of the density.

A positive number will lead people to call the distribution "right skew" without looking at it. A number near zero will cause people to call the distribution "symmetric". In the first case, the distribution will be asymmetric but it may not appear to be skewed with a tail to the right. And the value can be exactly zero without the distribution being symmetric. While symmetry implies zero third moment (if it exists), the implication does not go back the other way.

Consider a fair six-sided die with the following labels on its faces: 0, 0, 5, 5, 5, 9. The distribution of outcomes is not symmetric, yet its "skewness" measure is zero. Unfortunately, based only on the information that the third central moment was zero, many people would in fact describe it as symmetric. [Added later: here's one that most people would say on inspection was "right skew" - an ordinary die labelled 0, 0, 5, 5, 7, 10. But the third central moment is zero.]
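Both dice are easy to check by hand, or with a few lines of Python. The sketch below uses exact rational arithmetic so that the zeros are exact zeros and not rounding accidents. The helper names `central_moment` and `is_symmetric` are mine, and the ordinary 1 to 6 die is included only as a symmetric point of comparison.

```python
from fractions import Fraction

def central_moment(faces, k):
    """k-th central moment of a fair die with the given face labels."""
    n = len(faces)
    mean = Fraction(sum(faces), n)
    return sum((Fraction(x) - mean) ** k for x in faces) / n

def is_symmetric(faces):
    """True if reflecting the labels about the midpoint of their range
    gives back the same collection of labels."""
    lo, hi = min(faces), max(faces)
    return sorted(lo + hi - x for x in faces) == sorted(faces)

for faces in ([0, 0, 5, 5, 5, 9], [0, 0, 5, 5, 7, 10], [1, 2, 3, 4, 5, 6]):
    print(faces,
          "third central moment:", central_moment(faces, 3),
          "symmetric:", is_symmetric(faces))
```

For the first die the mean is 4, so the deviations are -4, -4, 1, 1, 1, 5 and their cubes are -64, -64, 1, 1, 1, 125, which sum to zero.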

Other examples, both continuous and discrete, abound. Continuous asymmetric distributions exist for which all odd central moments are zero.

Other measures of skewness (and there have been many) also have their problems, though some are very useful.

The sort of problem described here is essentially unavoidable - there are so many ways a distribution may be asymmetric that a single measure of asymmetry cannot possibly suffice, except perhaps within the framework of a particular family of distributions.

This cautionary tale is not an argument against using measures like the third central moment to attempt to capture something of deviations from symmetry - it's an argument against investing them with more than the limited meaning they possess.


Anne M. Archibald said...

"Continuous asymmetric distributions exist for which all odd central moments are zero."

Really? I thought that the moments were like Taylor coefficients, sufficient to reassemble the PDF. Do you have an example, or a link to a paper with one?

Efrique said...

It is not the case that you can always go from moments to a distribution, even when all moments exist.

Kendall and Stuart, Advanced Theory of Statistics 2nd Edition, Vol 1 has an example (it would also be in 3rd edition, it might be in 4th Edition). See Example 3.12, p91, which gives a family of distributions on [0,∞) that depend on a parameter λ, but whose moment sequences are all the same (the moments exist but do not depend on λ).

Consequently, if one were to take random variables X1 and X2, from that family (based on two different values of λ), and construct a new random variable Y which is an equal mixture of X1 and -X2, it will have an asymmetric distribution on (-∞, ∞) whose odd moments are all zero.

Further, see exercise 6.21 on p179 in the same volume, which establishes that the lognormal distribution is not uniquely determined by its moments.
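A rough numerical sketch of the mixture construction, using the lognormal rather than the Kendall and Stuart family. It assumes the standard perturbation of the standard lognormal density f, namely f(x)(1 + a sin(2π ln x)) with |a| ≤ 1, which is a well-known example of a different density with the same moments; I am not claiming this is what either textbook reference uses. The function names, the value a = 0.5 and the integration grid are all arbitrary choices for illustration.

```python
import math

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def moment(k, a, lo=-15.0, hi=25.0, n=8000):
    """E[X^k] when X has density f(x) * (1 + a*sin(2*pi*ln x)),
    f being the standard lognormal density. Substituting y = ln x
    gives an integral against the normal density, which is
    approximated here with the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += (weight * math.exp(k * y) * normal_pdf(y)
                  * (1.0 + a * math.sin(2.0 * math.pi * y)))
    return total * h

def mixture_odd_moment(k, a):
    """E[Y^k] for odd k, where Y is X1 (plain lognormal) with
    probability 1/2 and -X2 (perturbed lognormal) with probability 1/2."""
    return 0.5 * (moment(k, 0.0) - moment(k, a))

def mixture_density(y, a):
    """Density of Y at y: half the lognormal density on the right,
    half the perturbed density (reflected) on the left."""
    if y == 0:
        return 0.0
    x = abs(y)
    base = normal_pdf(math.log(x)) / x
    if y > 0:
        return 0.5 * base
    return 0.5 * base * (1.0 + a * math.sin(2.0 * math.pi * math.log(x)))

a = 0.5
for k in (1, 3, 5):
    print("k =", k,
          "lognormal moment:", moment(k, 0.0),
          "odd moment of mixture:", mixture_odd_moment(k, a))

x0 = math.exp(0.25)
print("density at +x0:", mixture_density(x0, a))
print("density at -x0:", mixture_density(-x0, a))
```

The odd moments of the mixture come out as zero to within integration error, while the density at -x0 is visibly different from the density at +x0, so the mixture is not symmetric.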

[I seem to vaguely recall that there might also be something on this issue in Romano and Siegel's book "Counterexamples in Probability and Statistics", but I can't be sure, not having a copy to hand.]

So, anyway, a given sequence of moments is not necessarily unique to a distribution - two different distributions may share a moment sequence.

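What's going on? There is a one-to-one correspondence between characteristic functions and distribution functions, but even when moments of all orders exist, the MGF may not exist (the lognormal is one such case), and without an MGF the moment sequence need not pin down the distribution.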

[If the MGF exists, you can generally invert the transform to obtain the distribution - note the connection between MGFs and Laplace transforms, and so existence issues correspond: the same kind of underlying connection exists between characteristic functions and Fourier transforms - the characteristic function of X is the Fourier transform of the density of -X.]