|
kohelet
|
 |
« on: October 18, 2006, 02:14:18 PM » |
|
I have a pretty specific stats question. I've hunted and hunted for an answer to this question ever since I took my first statistics course, and I'm thinking about it now as I prepare a review of descriptive statistics for class tomorrow night: Why, oh why, is the standard deviation the preferred measure of spread instead of the average absolute deviation?
Standard deviation: Square the distance between each value and the mean, add them up, divide by N, and take the square root of that. Affected a lot by outliers, which get more weight than values closer to the mean; needs a correction for sample size, so you actually end up dividing by the mysterious "N-1," which I don't entirely get, either.
Average absolute deviation: Add up the absolute values of the distances between each value and the mean and divide by N to get the average distance. Simple.
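As a rough illustration of the two recipes just described, here is a minimal Python sketch. The function names and the eight data values are made up for the example and do not come from any textbook or package.

```python
import math

def standard_deviation(values, sample=False):
    """Root of the average squared distance from the mean.

    With sample=True the sum of squares is divided by N-1 instead of N.
    """
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_of_squares / (n - 1 if sample else n))

def average_absolute_deviation(values):
    """Average absolute distance from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5
sd = standard_deviation(data)           # divides by N
aad = average_absolute_deviation(data)
print(sd, aad)
```

For this data the SD is 2.0 and the AAD is 1.5; the SD is never smaller than the AAD, because squaring gives the larger distances more weight.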
The AAD is so much more intuitive to me, and I can't figure out why everyone--and every statistical test--uses SD and not AAD. You would put a longstanding pestering question to rest for me if you could give me an answer or point me to one.
(Man, I'm a nerd: I just about have goosebumps in anticipation of maybe getting an answer to this!)
|
placid_casual
|
 |
« Reply #1 on: October 18, 2006, 03:29:55 PM » |
|
I am not a statistician and this is not my field, so I may bungle this terribly. But my (shaky) understanding is: Punishing you for big outliers is the point of using SD, rather than an unintended consequence. It has to do with the criterion for a good slope estimate, which assumes equal errors (at zero) and so should be sensitive to non-equality (i.e., large outliers).
N-1 has to do with degrees of freedom. In a set of data with the cumulative total known, every value is 'free to vary' (could assume a lot of different values) as they are revealed sequentially *except* the last one, which must take whatever value makes the overall total come out right. So if we have n=3 and the total is 10, once we know a=3 and b=5, then c *must* equal 2, i.e., it is not "free to vary".
|
psychle
|
 |
« Reply #2 on: October 18, 2006, 05:21:29 PM » |
|
Interesting question. I don't have an answer for you, but I'm thinking that SD may have some desirable mathematical properties that make it better than AAD. Or maybe it's just an arbitrary choice that was made in the history of statistics.
Regarding division by N-1, what placid_casual said is correct. N-1 refers to your degrees of freedom (i.e., the number of values that are free to vary to obtain a given measure). If you're interested in the mean for a single group of size N, then all but one value can vary, hence N-1 degrees of freedom.
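The "free to vary" idea can be sketched in a few lines of Python. This reuses the n=3, total=10 example from the earlier reply; the helper name `last_value` is hypothetical. The second half shows the same constraint for deviations from the mean, which always sum to zero.

```python
from fractions import Fraction

def last_value(total, known_values):
    """The one remaining value is forced once the total and the others are known."""
    return total - sum(known_values)

c = last_value(10, [3, 5])  # a=3, b=5, total=10

# Same constraint seen through deviations from the mean:
values = [3, 5, 2]
mean = Fraction(sum(values), len(values))      # exact arithmetic, mean = 10/3
deviations = [v - mean for v in values]
forced_last = -sum(deviations[:-1])            # last deviation is fixed by the rest
print(c, deviations, forced_last)
```

Only n-1 of the deviations carry independent information, which is one way to read the N-1 in the sample formula.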
|
kohelet
|
 |
« Reply #3 on: October 25, 2006, 12:47:05 PM » |
|
Thanks for the replies, psychle and placid. I'm guessing the lack of other replies reflects that (1) this really is a very boring question and (2) your answers did the trick. Despite (1), I think it's interesting that SD is used almost exclusively, even for the sole purpose of describing distribution (even when NOT used in additional analyses). I can't think why AAD wouldn't be more desirable for description. Does everyone just use SD without thinking what it means? Do we use SD out of habit? Do we use SD just in case we need to plug it into some formula later? Does anyone ever actually do that?--don't we all just let STATA or SPSS or whatever do the work? Hmmm. Things that keep me awake at night.
I thought I understood the degrees of freedom thing, but I didn't see that it applied here, and now I'm not sure I understand it very well after all. I can remember at least three stats courses where the instructor sort of skipped over the whole issue by saying it's too complicated and you don't really need to understand it to do it, etc. I'll try to wrap my brain around that later.
|
flyguy
I can't believe they let me be a
Senior member
   
Posts: 548
Proving once again quantity rules over quality
|
 |
« Reply #4 on: October 25, 2006, 01:21:05 PM » |
|
Try this: http://www.itl.nist.gov/div898/handbook/eda/section3/eda356.htm
This sentence is key for your question: "In summary, the variance, standard deviation, average absolute deviation, and median absolute deviation measure both aspects of the variability; that is, the variability near the center and the variability in the tails. They differ in that the average absolute deviation and median absolute deviation do not give undue weight to the tail behavior." So if your data set is noisy (lots of outliers) then the AAD would be a poor estimator of the variation around the mean.
In my field (ecology) the standard error is used much more than the standard deviation, as SE estimates the grand mean, not the population mean.
As far as the N-1 goes, this is from Sokal and Rohlf (pg 53): "The sample mean Y-bar is an unbiased estimator of the parametric mean (mu). The sample variance is not unbiased. On the average it will underestimate the magnitude of the population variance. To overcome this bias, mathematical statisticians have shown that when sums of squares are divided by n-1 rather than n, the resulting sample variances are unbiased estimators of the population variance." It's all about the bias in the estimate.
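The bias described in the Sokal and Rohlf passage can be checked exactly on a toy case. The sketch below is only an illustration under stated assumptions: the "population" is the six faces of a die, samples of size 3 are drawn with replacement, and every possible sample is enumerated rather than simulated. All names are made up for the example.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]          # assumed toy population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((v - mu) ** 2 for v in population) / N   # population variance

def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((v - m) ** 2 for v in sample)

n = 3
samples = list(product(population, repeat=n))   # all 216 equally likely samples

# Average the two candidate estimators over every possible sample.
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```

Dividing by n averages out to (n-1)/n of the population variance, while dividing by n-1 averages out to the population variance itself.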
|
"I don't accessorize. I'm Howard Moon. There's a simple truth to me." Howard Moon
|
alsorun
|
 |
« Reply #5 on: October 25, 2006, 04:01:28 PM » |
|
Ok, hopefully I can satisfy you. There are several reasons. First, the normal distribution, the most important distribution, is fully determined by its mean and standard deviation. The AAD does not factor in. Because of this, the standard deviation (and similarly the standard error of an estimate) is an important quantification of a distribution. Second, the standard deviation (the variance, to be more precise) is easier to manipulate mathematically.
The AAD, however, is more robust against outliers. It is used, just not as often. For example, instead of least squares regression (L2 distance), you can also use L1 distance as the measure of fit. I have heard there have been some important new developments on the L1 measure for handling a large number of predictors with a small number of subjects.
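One way to see the L1 versus L2 difference on the simplest possible "fit" (a single center for a list of numbers) is the brute-force Python sketch below. The data and function names are invented for the example, and the search over whole-number candidates is a simplification, not how regression software works.

```python
data = [1, 2, 3, 4, 100]   # illustrative values with one large outlier

def squared_loss(center):
    """L2 criterion: sum of squared distances to the candidate center."""
    return sum((v - center) ** 2 for v in data)

def absolute_loss(center):
    """L1 criterion: sum of absolute distances to the candidate center."""
    return sum(abs(v - center) for v in data)

candidates = range(0, 101)                     # whole numbers only, for simplicity
best_l2 = min(candidates, key=squared_loss)    # lands on the mean
best_l1 = min(candidates, key=absolute_loss)   # lands on the median
print(best_l2, best_l1)
```

The squared criterion is pulled to 22 (the mean) by the outlier, while the absolute criterion stays at 3 (the median), which is the robustness being described.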
|
psychle
|
 |
« Reply #6 on: October 25, 2006, 07:46:51 PM » |
|
In my field (ecology) the standard error is used much more than the standard deviation, as SE estimates the grand mean, not the population mean.
I appreciate your post, but I think the above sentence is misleading (unless we're using the same terms to refer to different concepts). SE is an estimate of the variability of the sampling distribution, not of a measure of central tendency like the grand mean. Moreover, you never need to estimate the grand mean because it's simply the mean of all your data points.
|
flyguy
|
 |
« Reply #7 on: October 25, 2006, 09:16:14 PM » |
|
In my field (ecology) the standard error is used much more than the standard deviation, as SE estimates the grand mean, not the population mean.
I appreciate your post, but I think the above sentence is misleading (unless we're using the same terms to refer to different concepts). SE is an estimate of the variability of the sampling distribution, not of a measure of central tendency like the grand mean. Moreover, you never need to estimate the grand mean because it's simply the mean of all your data points.
Unqualified SE refers to the standard error of the mean, whereas standard deviation would refer to the items within a population or sample. SEs are usually obtained from a single sample, not from repeated sampling to create a frequency distribution (the way we use them in most ecological work).
I don't agree with your last sentence. It's my understanding that the grand or parametric mean is unknowable, and that's why we need parameters (i.e., parametric stats) to estimate it. We can know a sample mean, and assume it is reflective of the parametric mean.
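For reference, the usual single-sample calculation of the standard error of the mean is the sample SD divided by the square root of n. A minimal Python sketch, with made-up data, using the standard library's `statistics.stdev` (which divides by n-1):

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative values

sd = statistics.stdev(sample)            # sample SD: spread of the items themselves
se = sd / math.sqrt(len(sample))         # SE of the mean: spread of the sample mean
print(sd, se)
```

The SD describes how much individual observations vary; the SE describes how much the sample mean would be expected to vary from sample to sample.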
|
« Last Edit: October 25, 2006, 09:17:00 PM by flyguy »
|
acrimone
The Red Queen's Court Assassin
Distinguished Senior Member
    
Posts: 4,049
I am not a professor at all, despite what I say.
|
 |
« Reply #8 on: October 25, 2006, 10:41:52 PM » |
|
I'm a philosopher, and I've never taken a stats course in my life. But that said, I'm pretty good with Algebra and I'm familiar with the formulas so I sat down with a pen and paper and here's what I could figure out. Apologies for not knowing the proper ASCII notation... I'm not a math guy, but if you know the formulae you should be able to follow.
Both SD and AAD can be broken down essentially into a two-part operation. I think this is the key, because you want the two equations to "look" as similar as they possibly can.
Assume:
x = number of objects in the set
v = mean variance
Vn = V sub n
SD breaks down to:
rad(1/x) * rad((sum n=1-->x) Vn^2)
AAD breaks down to:
1/x * ((sum n=1-->x) rad(Vn^2))
I can introduce the radical into the AAD equation because (unless I'm horribly misremembering eighth grade) taking the absolute value of something is the same as taking the root of its square. Now the two equations look really, really similar.
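Written in conventional notation, the two breakdowns would look roughly as follows. This is a transcription under two assumptions: that V_n stands for the distance of the n-th value from the mean (what the post calls "variance"), and that the divide-by-x (population) form is intended.

```latex
\[
\mathrm{SD} = \sqrt{\frac{1}{x}} \cdot \sqrt{\sum_{n=1}^{x} V_n^{2}},
\qquad
\mathrm{AAD} = \frac{1}{x} \sum_{n=1}^{x} \sqrt{V_n^{2}}
             = \frac{1}{x} \sum_{n=1}^{x} \lvert V_n \rvert
\]
```

The last equality uses the identity that, for any real number, the square root of its square equals its absolute value.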
Now, one way to look at this is to see the expression ((sum n=1-->x) Vn^2) ("the right side") as representative of the strength of the mean variance in determining the ultimate outcome, and seeing 1/x ("the left side") as representing the strength of the sample size in determining the outcome.
As the sample size increases (i.e. as X goes up) the proportionate share of AAD that is determined by ((sum n=1-->x) Vn^2) goes up immensely (because 1/x shrinks dramatically). In other words, as sample size goes up, your value will be more sensitive to small changes in average variance (compared to SD).
As the sample size decreases (i.e. as X goes down) the same thing happens with the rad(1/x) portion of the SD equation. The smaller X gets, the larger the resultant fraction is. (Indeed, saying rad(1/x) is essentially the same as saying 1/rad(x), and since the denominator is shrinking, the value is going up). So really large sample sizes are going to downplay the importance of outlying variance (relative to AAD), while smaller sample sizes will exacerbate it.
Looking at v instead, we see the following differences in the right side of those formulae:
AAD ends up looking like this:
rad(V1^2) + rad(V2^2) + rad(V3^2) + rad(V4^2) + rad(V5^2)
While SD ends up looking like this:
rad (V1^2 + V2^2 + V3^2 + V4^2 + V5^2)
Or, written much more simply...
AAD depends on rad(x) + rad(y) + rad(z)
SD depends on rad (x+y+z)
So in SD, the bigger your outliers, the more those outliers "swallow" up the other variance values. If z is really huge, then the values of x and y are going to get "lost" in the sum, as their values will make less of a difference in pushing up the root of the total. Where the three values are rooted separately, every value makes its own contributions. In concrete terms, if your values are 1, 2, 3, 4, and 10,000, the addition of 1+2+3+4 (i.e. 10) to the 10,000 isn't going to budge its root (100) much at all. In fact, it's only going to be 100.04 -- a difference of .04, that's it. But if you are treating them separately, you get to add 1 + rad(2) + rad(3) + 2 to 100, which makes the sum 106.14 instead of 100.04.
Now of course you have to add the "left side" of the equation back into things...
so AAD takes 25% of 106.14, giving 26.535
and SD takes one half of 100.04, or 50.02
So, in addition to the sample size issues mentioned above (which egregiously affect the concrete example I gave immediately above), that is the mathematical reason that SD is more susceptible to outliers.
OK, that took me like 35 minutes... that's the longest I've ever spent on a post here. I'm done for the evening. Feel free to check my work and call me a dumbass... I'm WAAAAY out of my field here.
|
"All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?"
|
acrimone
|
 |
« Reply #9 on: October 26, 2006, 11:34:34 AM » |
|
Ack... I was just looking at this again and realized I was using "4" as the value of X in the concrete example, when it should have been five. So instead of taking 25% and 50%, it should be something closer to 20% and 45%.
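The corrected arithmetic can be re-run in a few lines of Python. This sketch follows the example as posted, treating 1, 2, 3, 4, and 10,000 as the already-squared values (the Vn^2 terms) and using x = 5; the variable names are invented for the illustration.

```python
import math

squared_devs = [1, 2, 3, 4, 10_000]   # the Vn^2 terms from the example
x = len(squared_devs)                 # 5, per the correction

# AAD-style: root each term separately, then take 1/x of the sum.
aad_like = sum(math.sqrt(s) for s in squared_devs) / x

# SD-style: sum first, then take the root; same as rad(1/x) * rad(sum).
sd_like = math.sqrt(sum(squared_devs) / x)

print(round(aad_like, 2), round(sd_like, 2))
```

With x = 5 the AAD-style figure comes out near 21.2 and the SD-style figure near 44.7, so the single large value still dominates the SD-style result far more.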
|
psychle
|
 |
« Reply #10 on: October 26, 2006, 05:40:36 PM » |
|
Moreover, you never need to estimate the grand mean because it's simply the mean of all your data points.
I don't agree with your last sentence. It's my understanding that the grand or parametric mean is unknowable, and that's why we need parameters (i.e., parametric stats) to estimate it. We can know a sample mean, and assume it is reflective of the parametric mean.
We seem to be using the same term (grand mean) for different concepts. I agree that the parametric mean is unknown and we're interested in estimating it, but I consider the parametric mean to be the same thing as the population mean, not the grand mean. My understanding of the grand mean is as I described, which is perhaps more easily understood with reference to the linear composition of sample scores in an ANOVA:
score = grand mean + (level mean - grand mean) + (score - level mean)
If you move the lone "grand mean" to the other side, square each term, and summate:
sum(score - grand mean)^2 = sum(level mean - grand mean)^2 + sum(score - level mean)^2
which represents:
SS_total = SS_between-groups + SS_within-groups (SS = sum of squares)
That's how I understand things. Of course, if you have only a single factor with a single level, then the grand mean equals the sample mean.
I guess the meaning of statistical terms differs across fields. For example, "parametric stats" seems odd to me, because my understanding is that a parameter refers to something about the population and a statistic refers to something about the sample, such that a statistic is used to estimate a parameter (i.e., the sample is used to learn something about the population).
(Apologies to the OP for this digression)
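The sum-of-squares decomposition above can be checked numerically. The Python sketch below uses two invented groups of three scores each; the group labels and values are assumptions for the illustration only.

```python
import math

groups = {"A": [1, 2, 3], "B": [4, 6, 8]}   # illustrative levels of one factor

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = sum(all_scores) / len(all_scores)   # mean of all data points

ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

ss_between = 0.0
ss_within = 0.0
for scores in groups.values():
    level_mean = sum(scores) / len(scores)
    # (level mean - grand mean)^2, counted once per score in the level
    ss_between += len(scores) * (level_mean - grand_mean) ** 2
    # (score - level mean)^2 for each score in the level
    ss_within += sum((s - level_mean) ** 2 for s in scores)

print(ss_total, ss_between, ss_within)
```

Here the total of 34 splits into 24 between groups and 10 within groups. The grand mean in this sense is computed directly from the data, which is the usage being described.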
|
adhoc
|
 |
« Reply #11 on: October 26, 2006, 09:13:48 PM » |
|
I'm a philosopher, and I've never taken a stats course in my life. ... Apologies for not knowing the proper ASCII notation... I'm not a math guy... taking the absolute value of something is the same as taking the root of its square.
I'm not quite sure where to start, in part because I'm not sure what you are trying to say. I realize that I have selected only a small part of your post on which to comment but ...
1) Your post includes no ASCII notation (which has nothing to do with math or statistics, anyway). So it is unclear why you are apologizing.
2) The absolute value of a number is its distance from 0. In other words, for example, -5 and 5 both have an absolute value of 5.
3) Taking "the root" of a number has no real meaning. You must specify what root it is that is to be taken. For example, the square root, the cube root, the 25th root, and so on. In this case, you want the square root.
|
acrimone
|
 |
« Reply #12 on: October 26, 2006, 10:28:40 PM » |
|
1) There are standard conventions for representing mathematical concepts in ASCII code. I know they exist, but I do not know what they are.
2) No sh*t, sherlock. But the (square) root of the square of a number IS the absolute value... unless, as I said, I'm somehow wrong about that.
3) Because none of the equations we're dealing with is more than a second degree equation, square roots are the only ones that matter.
Thanks for the constructive input.
|
adhoc
|
 |
« Reply #13 on: October 27, 2006, 08:59:31 AM » |
|
Acrimone,
1) I guess I understand what you are trying to do now. ASCII does not include math symbols, in general, but the extended code set does often represent graphics symbols, including the radical sign. If that is what you mean and you are trying to incorporate that in your posting, try code 251. If it works, I'd appreciate it if you would share how you do that.
2) You are wrong about this. The square root of a number is not its absolute value (unless the number in question is 1). For example, as I said before, the absolute values of 5 and -5 are both 5. The square root of 5, on the other hand, is approximately 2.236 and the square root of -5 is 2.236i, where i is the square root of -1.
3) While I agree that the meaning of your use of "root" was fairly clear, it is not the case that square roots apply only to second degree equations, or vice versa.
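The two statements being traded in this exchange are different claims, and both can be checked separately. A small Python sketch, using only the standard `math` and `cmath` modules:

```python
import cmath
import math

# Claim A: the square root of the SQUARE of a number equals its absolute value.
for v in (5, -5):
    print(v, math.sqrt(v ** 2), abs(v))      # both columns show 5

root_of_square_matches = all(math.sqrt(v ** 2) == abs(v) for v in (5, -5))

# Claim B: the square root of the number ITSELF is generally not its absolute value.
root_of_5 = math.sqrt(5)            # about 2.236
root_of_minus_5 = cmath.sqrt(-5)    # about 2.236j, an imaginary number
print(root_of_5, root_of_minus_5)
```

Claim A is the one used to rewrite the AAD formula earlier in the thread; claim B is about a different quantity.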
|
acrimone
|
 |
« Reply #14 on: October 27, 2006, 09:13:00 AM » |
|
2) No sh*t, sherlock. But the (square) root of the square of a number IS the absolute value... unless, as I said, I'm somehow wrong about that.
Acrimone,
* * * *
2) You are wrong about this. The square root of a number is not its absolute value (unless the number in question is 1). For example, as I said before, the absolute values of 5 and -5 are both 5. The square root of 5, on the other hand, is approximately 2.236 and the square root of -5 is 2.236i, where i is the square root of -1.
Why do I feel like Sigourney Weaver in that disciplinary hearing at the beginning of Aliens?
Any evidence of this hostile organism on LV-426?
No, it's a rock, no indigenous life.
Did IQs just drop sharply while I was away? Ma'am, I already said it was not indigenous, it was a derelict, it was an alien spacecraft, it was not from there, do you get it? We homed in on its beacon.