Six Sigma Quality Resources for Healthcare, in association with GE Medical Systems

Sample Size... Why 30?

Message: 16002
Posted by: DT
Posted on: Thursday, 18th July 2002


We recently concluded a GB training, and the question came up of why a sample size of 30 is suitable and where it came from. (What if my population is smaller than this, or the testing is destructive in nature?) I'm lost, as there is conflicting information. One reference says determining the sample size depends on (1) the level of confidence, (2) the margin of error tolerated, and (3) the variability in the population studied. Another says...

n = (z*s/E)^2. These do not take the population size into consideration, do they? Suppose I have a transactional example with 100,000 accounts (the population) in default, and I want to sample them to determine how many have credit scores below 615. What sample size would reflect the population without the time spent gathering data offsetting the benefit?

When I pose this question (thinking 30 may not reflect my population) I remain unsatisfied. I'm told that it's due to large samples related to the central limit theorem, or that a typical run chart that is in control is stable after 30 data points and that's why 30 is used. I can't find this in any reference materials. Can anyone help? Please provide an example if needed. I need layman's terms, as I am not too statistically inclined.
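A minimal sketch of the two ingredients in the question above, assuming simple random sampling and the normal approximation. The function names, the example standard deviation, and the +/- 3% margin are illustrative assumptions, not values from the thread. The first function is the n = (z*s/E)^2 formula; the second is its counterpart for a proportion (such as the share of accounts with a score below 615), with an optional finite population correction.

```python
import math
from statistics import NormalDist

def n_for_mean(s, E, confidence=0.95):
    """n = (z*s/E)^2: sample size to estimate a mean within +/- E."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * s / E) ** 2)

def n_for_proportion(E, p=0.5, confidence=0.95, population=None):
    """Sample size to estimate a proportion within +/- E.

    p=0.5 is the worst case when the true proportion is unknown.
    If population is given, the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / E ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Hypothetical inputs: std dev 10, margin 2 for a mean; +/- 3% for a proportion
print(n_for_mean(s=10, E=2))
print(n_for_proportion(E=0.03))
print(n_for_proportion(E=0.03, population=100_000))
```

Under these assumptions the population of 100,000 barely changes the answer (1068 without the correction, 1056 with it), which is why the usual formulas leave population size out unless the sample is a large fraction of the population.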


Message: 16005
Posted by: Sambuddha
Posted on: Thursday, 18th July 2002

DT,

You ask a very good question. The answer depends on the situation, the tool you are using, and the type of data.

One reference says determining the sample size depends on (1) the level of confidence (2) margin of error tolerated, and (3) variability in population studied. 

The above reference is right in general. Formulae for sample size calculation vary depending on the test you are going to conduct.

The parameters/issues you need to address are:

  1. Type of test e.g. 2 sample T, Z, ANOVA etc
  2. Standard Deviation (variability) of the process
  3. Delta that is significant in distinguishing 2 or more effects
  4. Alpha (level of significance of the test)
  5. Power of the test (1-beta). Beta is the probability of type-II error
  6. Number of levels (ANOVA), in case you know how many levels/effects you are aiming to distinguish
  7. Sample size

The interesting part is that Minitab (assuming you use it) lets you specify any two of delta, power, and sample size for a given number of levels and solves for the third. Try Stat>Power and Sample Size>ANOVA or the tool you want to use.

That lets you know the error (or lack thereof, since you are measuring power) associated with each sample size and delta for any given setting. So you can do a trade-off study and see where your sweet spot lies. In cases where testing involves capital and consumables, this is a great tool. In your case you already have the data, so it is not resource intensive in that way. Still, this is better than using 30 blindly.
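For readers without Minitab, here is a rough sketch of the same kind of trade-off study for the simplest case, a two-sided comparison of two means. It uses the normal approximation, so it is not Minitab's algorithm and will give slightly smaller numbers than an exact t-based calculation. The function name and the grid of delta and power values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`
    between two groups with common std dev `sigma` (two-sided test)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Trade-off table: delta is expressed in units of sigma
for delta in (0.25, 0.5, 1.0):
    for power in (0.8, 0.9):
        print(delta, power, n_per_group(delta, sigma=1.0, power=power))
```

Halving the difference you want to detect roughly quadruples the required sample size, so no single number such as 30 can be right for every delta and power.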

There is a reason 30 is widely used. It is a result of simulation studies involving the Central Limit Theorem. If you are interested in the "history" or the reason for the prevalence of 30 samples as a guideline, I could give you a few pointers.

I have a project that is similar in tool usage. There are quite a few neat things one could do with power and sample size studies.

Good luck,

Sambuddha 


Message: 16006
Posted by: DT
Posted on: Thursday, 18th July 2002

I would be VERY interested in the pointers on the history or relevance of 30 samples. If it's easier to email...

darrell_tomlinbcgroup.com


Message: 16007
Posted by: Sambuddha
Posted on: Thursday, 18th July 2002

DT,

Check your email. I have sent some information.

Hope that helps.

Best,

Sambuddha


Message: 16009
Posted by: Hrishi
Posted on: Thursday, 18th July 2002

Sambuddha:

Hi. I am very new to this forum.

Can you send me the information that you sent to DT on why a sample size of 30 is required ? I too am curious.

Thanks.


Message: 16010
Posted by: Sambuddha
Posted on: Thursday, 18th July 2002

Hrishi,

No problem. Post your email address. Either DT or I could email you.

The reason I cannot post it here is that it is a scanned picture attachment, and it is perhaps easier to email.

Best and welcome to this community,

Sam


Message: 16021
Posted by: Gabriel
Posted on: Friday, 19th July 2002

Sambuddha

You can attach it here and share it with all the forum. It would be great!

Just click on the clip here at the right, where you read "Post/attach document". It will lead you to send an email to iSixSigma with the attachment and they will post the attachment here!

Thanks for sharing!


Message: 16022
Posted by: DD
Posted on: Friday, 19th July 2002

Sambuddha

Yes, as Gabriel says, you can post it on this site. I am curious too.

Thanks for sharing

DD


Message: 16025
Posted by: Sambuddha
Posted on: Friday, 19th July 2002

Gabriel, DD

I thought of posting it here. Attaching was a small hassle, but it looks like I have a bigger problem. It is a scanned picture of some graphs and, mea culpa, I cannot find the reference I took it from. I am buried amidst a bunch of books and I can't find it.

I have no problem sharing it with you all individually through email. But I am afraid if I post it in a public manner without credits, I might be in trouble for copyright violation for public distribution of intellectual property.

The good news is that the following website illustrates the same thing.

http://www.statisticalengineering.com/central_limit_theorem.htm

Public domain is great, isn't it? Hopefully that will satisfy your curiosity. The number 30 came from simple sampling simulations from different parent populations (Uniform, Normal, Exponential, Triangular): by the time the sample sizes reached 30-32, the distribution of the means started looking normal. That is the reason for the rule of thumb.
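The kind of simulation described above can be sketched in a few lines. This is not the original study; the parent distributions, the number of trials, and the use of skewness as a stand-in for "looking normal" are all assumptions made for illustration. A skewness near zero means the distribution of the means is roughly symmetric.

```python
import random

def skewness(xs):
    """Population skewness: third standardized moment."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / n

def skew_of_means(draw, n, trials=4000, seed=1):
    """Skewness of the distribution of means of samples of size n."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    return skewness(means)

# Hypothetical parent populations, two of them deliberately skewed
parents = {
    "uniform": lambda rng: rng.random(),
    "exponential": lambda rng: rng.expovariate(1.0),
    "triangular": lambda rng: rng.triangular(0.0, 1.0, 0.0),
}

for name, draw in parents.items():
    print(name, [round(skew_of_means(draw, n), 2) for n in (1, 5, 30)])
```

For the exponential parent the skewness of the means shrinks in proportion to 1/sqrt(n), so it is much smaller at n = 30 than at n = 1 but not zero. The improvement is gradual; nothing special happens exactly at 30.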

I haven't seen any theoretical explanation yet for that i.e. what is so special about 30 from analytical point of view. Shall let you all know if I come across anything to that effect.

Hope it still helps.

Best,

Sambuddha


Message: 16047
Posted by: Aush
Posted on: Friday, 19th July 2002

Sambuddha

Could you also share the info on the sample size of 30 with me? I would appreciate you emailing it to me at

piyush_ahotmail.com


Message: 16060
Posted by: Rajanga Sivakumar
Posted on: Saturday, 20th July 2002

Mr. Sambuddha,

Could you share the sample size 30 information with me too? Thanks

email to rajangasify.com

Rajanga


Message: 16062
Posted by: Ted
Posted on: Saturday, 20th July 2002

Assuming you still wish to sample, even though most software can sort and count your whole population of 100,000 records, there are two questions you have to ask: how many samples do I take, and what risk can I accept of drawing the wrong conclusion from the statistic?

A number of answers here address why 30 samples are needed to approximate a normal distribution, which allows estimation based on the probabilities of the normal curve. However, once the mean and std dev have been estimated, and the cumulative probability found up to and including the critical limit you set, the second question comes into play: how sensitive are you to making an error in assigning that proportion to your population?

As an example, say 6% of your population is expected to fall below your cutoff. How sensitive are you to the true proportion actually being 7%, 8%, or 10%? You would need to calculate the beta risk of assuming the proportion is 6%, given your original sample size and the statistics you calculated. If you use Minitab (or perhaps other software), you can find the minimum sample size you need for the risk you choose. Under Power and Sample Size, 1 Proportion test, you can enter both the calculated proportion (as a percentage) and the critical proportion, along with the level of risk (beta), and it will calculate the number of samples you need to take. Go back, resample to that level, and run the calculation again to find the proportion defective (credit scores below your cutoff), then rerun the beta calculation with the new numbers. The process is iterative until you are satisfied with the number of samples versus the risk you are willing to assume. So you might start out with a sample of 30, find that the beta risk is too high and that you need 400 samples, take them, recalculate, and find that you actually need 435, and so on. Others here might have a better way to adjust for risk and sample size without all the iterations, but that's the only way I've found to do it consistently.
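A rough sketch of the one-proportion calculation described above, using the textbook normal approximation rather than Minitab's own method. The function name, the one-sided alpha of 0.05, and the power of 0.90 are assumptions; the 6%, 8%, and 10% proportions come from the example in the post.

```python
import math
from statistics import NormalDist

def n_one_proportion(p0, p1, alpha=0.05, power=0.90):
    """Approximate n to distinguish a true proportion p1 from an
    assumed proportion p0 with a one-sided test (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)   # one-sided significance level
    z_beta = nd.inv_cdf(power)        # power = 1 - beta
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)

# How many accounts are needed to tell 6% apart from 8% or 10%?
for p1 in (0.08, 0.10):
    print(p1, n_one_proportion(0.06, p1))
```

Under these assumptions, telling 6% apart from 8% takes well over a thousand accounts, and even 6% versus 10% takes several hundred, so 30 is far too few for this kind of attribute question.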

My other question for you, however, is what you plan on using the data for. Be careful if the intent is to show that you get higher numbers of defaults with credit scores below a certain number, using those accounts in default as your population for the hypothesis. Your choice of frame for the population would be wrong for that kind of test.

Hope that helps.


Message: 16066
Posted by: Jay
Posted on: Saturday, 20th July 2002

Hi Sambuddha,

Could you also share the sample size of 30 info with me as well?  Please email morwickjaol.com.  Thanks!

Jay


Message: 16080
Posted by: Lawrence
Posted on: Monday, 22nd July 2002

Dear Sambuddha,

I only read about this discussion topic from the newsletter link today. Could you also send me the 30 sample size information in a separate email? I would appreciate it.

Best regards,

Lawrence.


Message: 16082
Posted by: Glenn Gooding
Posted on: Monday, 22nd July 2002

Sam,

Along with a great many of our colleagues, I would be interested and grateful if you could let me have a copy of the information about the rationale for the 30-piece sample size.

My e-mail address is: -

glenn.goodingbespak.co.uk

regards

Glenn


Message: 16084
Posted by: Ja
Posted on: Monday, 22nd July 2002

Dear Sambuddha,

Could you also send me the 30 sample size information in a separate email? I would appreciate it. jamackallstatestreet.com

Best regards,

JA.


Message: 16087
Posted by: Nicholas L. Squeglia
Posted on: Monday, 22nd July 2002

In layman's terms, if you were to prepare a graph of sample size versus confidence for attribute data, you would see quite a difference between, for example, 2 and 30. This is not the optimum, but more of a minimum. 50 would perhaps be a better choice and is what Dorian Shainin used in his "lot plot" many years ago. The curve keeps rising after 30 to 50, but at a much lower rate. The central limit theorem is somewhat different: it relies on taking averages of data to show a normal (Gaussian) distribution for control chart purposes, even though the underlying data is non-normal.

Nicholas L. Squeglia, author, Zero Acceptance Number (c=0) Sampling Plans


Message: 16088
Posted by: CBetts
Posted on: Monday, 22nd July 2002

Sambuddha,

Could you please forward the Sample size information to me as well.  cedric.bettsscotts.com   Thank you.

Cedric


Message: 16099
Posted by: Janet Hunter
Posted on: Monday, 22nd July 2002

I believe you are correct about the Central Limit Theorem, at least as I recall from my statistics classes a few years ago. You may want to contact the local college and speak to one of the professors in the mathematics department for further direction or confirmation.


Message: 16102
Posted by: Mike Carnell
Posted on: Monday, 22nd July 2002

DT,

I have not read the entire string, so if some of this is redundant I apologize. Sam gave a good answer when he said it was different for different situations.

Assuming that everything works off of 30 is incorrect.

Frequently you will see variable control charts listed with a sample size of 25-30. They are typically speaking of 25-30 subgroups of 5. That makes it 125-150 actual samples. It is the subgrouping, giving you a distribution of averages (Central Limit Theorem), that makes it work.
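To make the "25 subgroups of 5" arithmetic concrete, here is a small sketch of X-bar and R chart limits. The data is simulated (a made-up process with mean 10 and standard deviation 1), and the function name is an assumption; A2, D3, and D4 are the standard Shewhart constants for a subgroup size of 5.

```python
import random
from statistics import mean

# Standard control chart constants for subgroup size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (LCL, center line, UCL) for the X-bar and R charts."""
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    xbarbar, rbar = mean(xbars), mean(ranges)
    return {
        "xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "range": (D3 * rbar, rbar, D4 * rbar),
    }

# Simulated, purely illustrative process data: 25 subgroups of 5
rng = random.Random(7)
data = [[rng.gauss(10.0, 1.0) for _ in range(5)] for _ in range(25)]

limits = xbar_r_limits(data)
print("individual measurements:", sum(len(g) for g in data))
print("X-bar chart:", tuple(round(v, 3) for v in limits["xbar"]))
print("R chart:", tuple(round(v, 3) for v in limits["range"]))
```

The chart plots 25 points, but the limits rest on 125 measurements, which is the distinction being drawn above.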

When you are doing hypothesis testing and using ANOVA the sensitivity of the test is extremely dependent on sample size. I was doing site support and found a guy who could not understand why his 2 sample t test was showing significance. He was sure it should not. His sample size was >400. The test was sensitive to < a .1 sigma shift.

There are sample size implications with virtually every tool. 30 is not a catch-all particularly if you are working with attribute data.

Good luck.


Message: 16106
Posted by: Antero
Posted on: Monday, 22nd July 2002

Sam:

Can you also send me your e-mail response to the question dealing with a sample size of 30?


Message: 16121
Posted by: Ron
Posted on: Monday, 22nd July 2002

Don't confuse population sampling with process sampling; these are two very different animals.

You need population variation, power, etc. when considering population sampling.

When process sampling, the purpose of taking 30 samples is to establish these issues with reasonable certainty and to develop control limits. These limits remain constant unless significant changes are made to the process.

It is common in SS training that the true essence of the mathematics behind these issues is lost.

Process sampling also assumes a process that is in statistical control. If it is not, stop, fix the process, and then proceed.

 


Message: 16185
Posted by: Bahram
Posted on: Tuesday, 23rd July 2002

Dear Sambuddha,

Could I trouble you to send me the information also?

bahram.khyltashmkg.com

Thanks

Bahram


Message: 16187
Posted by: Dewayne
Posted on: Tuesday, 23rd July 2002

Sambuddha,

I, too, would appreciate your sharing/sending the information on the sample size of 30. Thanks.

dburnsverityinst.com


Message: 16339
Posted by: NB
Posted on: Friday, 26th July 2002

I have also not read all the previous replies on this subject, but from my experience in statistics, the reason 30 or 31 has always been the magical number is that the Student's t distribution closely approximates the normal z distribution at around 30 samples. Hope this helps.
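The t-versus-z comparison can be checked numerically. The sketch below uses only the Python standard library, so the t critical values are obtained by integrating the t density with Simpson's rule and bisecting. That is a simplification chosen for self-containment; a statistics package would normally be used instead. The function names are assumptions.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=400):
    """CDF for x >= 0, by Simpson's rule on [0, x] plus the 0.5 below zero."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, p=0.975):
    """Two-sided 95% critical value by bisection on the CDF."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 20, 29, 60, 120):
    print(df, round(t_critical(df), 3), "vs z =", round(z, 3))
```

At 29 degrees of freedom (n = 30) the t critical value is about 2.045 against 1.960 for z. The gap narrows steadily as n grows, and 30 is simply a conventional point at which many textbooks treat it as small enough.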

-NB


Message: 16360
Posted by: H.Kirchhausen
Posted on: Saturday, 27th July 2002

Hi all,

It would be great if you could also send me the information or an example of the magic sample size of 30!

Thanks in advance

send it please to

saolimgmx.de


Message: 16373
Posted by: sw
Posted on: Sunday, 28th July 2002

Hello Sambuddha,

I would appreciate it if you could email the 30 sample size info to me, too.

My email add is: swtan1hotmail.com


Message: 16443
Posted by: Ged Bryant
Posted on: Tuesday, 30th July 2002

Want to test the magic number 30? Here is a good party trick: take any group of 30 people and bet anyone present that two or more of the group share a birthday (month and day). With 30 people your chance of winning is about 71%; it takes roughly 50 people to reach 97%.


Message: 16444
Posted by: Brian
Posted on: Tuesday, 30th July 2002

I did that game in a training class once. It worked!

But how does it work? I'd love to know so I can sound intelligent next time I do it :).
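A short sketch of how the birthday bet works, assuming 365 equally likely birthdays and ignoring leap years. It is easier to compute the chance that everyone has a different birthday and subtract that from 1. The function name is an assumption.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (23, 30, 50, 57):
    print(n, round(p_shared_birthday(n), 3))
```

The bet wins so often because what matters is the number of pairs of people, not the number of people: 30 people form 435 pairs, each with a 1-in-365 chance of matching.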


Message: 17051
Posted by: julio jaime
Posted on: Thursday, 15th August 2002

Sambuddha:

Hi. I am very new to this forum.

Can you send me the information on why a sample size of 30 is required? I too am curious.

 

Thks.


Message: 31121
Posted by: Allen Jacque
Posted on: Wednesday, 6th August 2003

I am interested in receiving the articles identified by and originating from Sambuddha that address the sample size of 30 issue.

My email address is ajacquebkadvice.com


Message: 31245
Posted by: Mark Chockalingam
Posted on: Sunday, 10th August 2003

There are several web references on the Central Limit theorem and 30 that are interesting.  When the sample size approaches 30, we don't have to worry about the distribution of the population since it can be safely assumed to be normal for inference purposes.  Here are some references:

http://www.mathwizz.com/statistics/help/help4.htm

http://www.statisticalengineering.com/central_limit_theorem.htm

Here is a little more technical article on normal distributions and central limit theorem.

http://www.itl.nist.gov/div898/handbook/index.htm


Message: 31250
Posted by: thanachai
Posted on: Sunday, 10th August 2003

Mr. Sambuddha.

Could you please send me the pointers on the sample size of 30? I'm very curious to know.

Thanachai S,

thanachssamarts.com


Message: 31357
Posted by: Statistician
Posted on: Wednesday, 13th August 2003

Mr. Sambuddha,

I am a statistician by profession and as far as I know, sample size is determined by margin of error allowed, the estimate for the population variance, the risk factors (level of confidence, power as a function of the OCC, etc.), and most importantly, the assumed distribution of the population (or the estimable function) in study.

In my own experience, the magic number 30 is being used to approximate the normal distribution using the Central Limit Theorem, as used in regression analysis, factor analysis, etc., but not in sample size determination. 

I am also curious about this article. Would you be kind enough to send me a copy, too? Also, would you happen to know of or recommend Six Sigma training centers in the Philippines?

Thanks,

Beryl

CRomerodoleasia.com


Message: 33249
Posted by: Vicki
Posted on: Tuesday, 23rd September 2003

Hi Sambuddha,

You have been inundated with requests for this information on n=30, I am also a statistician and would really appreciate this information!  Thanks.


Message: 33253
Posted by: pH
Posted on: Tuesday, 23rd September 2003

I would also like to receive the information / pointers. Can you email them to me at the address below?

herrerapsybrondental.com

Thanks!

PH


Message: 33367
Posted by: Mark
Posted on: Thursday, 25th September 2003

I have had the same question arise in the past. Mark L. Crossley wrote a good article titled "Size Matters: How Good Is Your Cpk Really?" located at http://www.qualitydigest.com/may00/html/lastword.html that seems to address your question quite well. When Mr. Crossley's equations are rearranged you can look at plots of Cpk vs sample size with various lines of constant Cpk and specific confidence intervals. For example, after generating the curves, one is able to directly determine sample size required for a desired Cpk of 2.0 with 90% confidence. In playing with the equations it was interesting to note the confidence level obtained for a 2.0 Cpk using a common sample size of 30.
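As a rough illustration of the Cpk point, the sketch below uses a commonly quoted normal-approximation lower confidence bound for Cpk. Whether this matches the exact equations in Mr. Crossley's article is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def cpk_lower_bound(cpk_hat, n, confidence=0.90):
    """One-sided lower confidence bound for Cpk (normal approximation):
    cpk_hat - z * sqrt(1/(9n) + cpk_hat^2 / (2(n-1)))."""
    z = NormalDist().inv_cdf(confidence)
    return cpk_hat - z * math.sqrt(1 / (9 * n) + cpk_hat ** 2 / (2 * (n - 1)))

# An observed Cpk of 2.0 at several sample sizes, 90% confidence
for n in (10, 30, 100, 500):
    print(n, round(cpk_lower_bound(2.0, n), 3))
```

Under this approximation, an observed Cpk of 2.0 from 30 parts only supports a claim of roughly 1.65 at 90% confidence, which shows how much uncertainty remains at the usual sample size.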

I hope that helps.


Message: 33859
Posted by: Tom
Posted on: Sunday, 5th October 2003

I would also like to receive the information / pointers about the sample size of 30. My email address is feterishotmail.com

thanks!

Tom


Message: 34213
Posted by: Stella
Posted on: Monday, 13th October 2003

Hi Sam, add me to the distribution list please! This has been going on for a long time, hasn't it? Thank you in advance!

huang.zhaohuizte.com.cn


Message: 37014
Posted by: Haim
Posted on: Tuesday, 2nd December 2003

Dear Sambuddha,

I too am interested in the "Why 30?" discussion.  Please e-mail me at:

haimthepalace.org

Thank you,

Haim


Message: 37017
Posted by: Rocky Firth
Posted on: Tuesday, 2nd December 2003

I would also like to see the information. I can post it to a web location for others as well.


Message: 37034
Posted by: Stan
Posted on: Tuesday, 2nd December 2003

I don't believe 30 came from simulations involving the CLT. Please post some backup to this assertion. Sounds like Dr. Mikel's proof of the 1.5 shift.

By the way, what advice do you give on choosing sample size when you are interested in reducing sigma instead of moving the mean?


Message: 47487
Posted by: satinder
Posted on: Monday, 7th June 2004

Please let me know why 30

SK


Message: 51173
Posted by: SATTHISH KUMAR
Posted on: Tuesday, 27th July 2004

Dear sambuddha

Thank you for your reply to that query. Now I am interested in how variables like slope, linearity, bias, and uncertainty relate to instrument repeatability and reproducibility.

Can you send the same to my mail ID, sathish_801980yhoo.com

Expecting your reply

REGARDS

R.L.SATTHISH KUMAR


Message: 52418
Posted by: Simon
Posted on: Wednesday, 11th August 2004

I would be VERY interested in the pointers for history or relevance of the 30 to samples


Message: 55057
Posted by: Ganesh
Posted on: Friday, 17th September 2004

Pls email me the info for "why sample = 30"


Message: 56050
Posted by: Surya Gade
Posted on: Thursday, 30th September 2004

Sambuddha:

Hi. I am very new to this forum.

Can you please send me the pointers or the links that you sent to DT on why a sample size of 30 is required? I have had the same question for a long time.

Thanks.

Surya


Message: 56051
Posted by: Surya Gade
Posted on: Thursday, 30th September 2004

Sambuddha:

I forgot to mention my e-mail in my previous message...please e-mail to the following address...lamarcardinalyahoo.com

Thank you for sharing.

Surya


Message: 56599
Posted by: Mark Chockalingam
Posted on: Friday, 8th October 2004

Surya,

There are several web references on the Central Limit theorem and sample size of 30 that are interesting.  When the sample size approaches 30, we don't have to worry about the distribution of the population since it can be safely assumed to be normal for inference purposes.

Remember for interval estimation, the standard error is computed from a Sampling distribution of the mean.  When the sample size approaches 30, the sampling distribution approaches normality.  Here are some references:

http://www.mathwizz.com/statistics/help/help4.htm

http://www.statisticalengineering.com/central_limit_theorem.htm

Here is a little more technical article on normal distributions and central limit theorem.

http://www.itl.nist.gov/div898/handbook/index.htm

Mark Chockalingam


Message: 56602
Posted by: Robert Butler
Posted on: Friday, 8th October 2004

 

Mark,

The central limit theorem applies to the mean, not to individuals: 30 samples from a log normal distribution will not suddenly become normal.  The distribution of 30 averages of data from a log normal distribution, however, will be.  To this end, the first citation you mentioned (and as quoted below) is in error.  The second and third citations, however, are correct.  (Note: some of the text from your citations doesn't copy over to the forum page, so I had to rewrite the equation in the first citation.) I also took the liberty of highlighting in order to emphasize the focus on distributions of means and not individuals.

#1 The Central Limit Theorem says that if you have a random sample and the sample size is large enough (usually bigger than 30), then

Z = (sample avg - pop avg)/(s/sqrt(n))

where Z is the standard Normal distribution with μ = 0 and σ = 1. This comes in really handy when you haven't a clue what the distribution is or it is a distribution you're not used to working with like, for instance, the Gamma distribution.

 

#2 The distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-Normal.

Thus, the Central Limit theorem is the foundation for many statistical procedures, including Quality Control Charts, because the distribution of the phenomenon under study does not have to be Normal because its average will be.

 

#3 The central limit theorem basically states that as the sample size (N) becomes large, the following occur:

  1. The sampling distribution of the mean becomes approximately normal regardless of the distribution of the original variable.
  2. The sampling distribution of the mean is centered at the population mean, μ, of the original variable. In addition, the standard deviation of the sampling distribution of the mean approaches σ/√N.
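The distinction between individuals and means can be illustrated with a small simulation. This is a sketch under assumed settings: log normal data with a log-scale sigma of 0.5 or 1.0, averages of 30 values, and skewness as a rough measure of departure from normality. None of these choices come from the citations.

```python
import random

def skewness(xs):
    """Population skewness: third standardized moment."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / n

def individuals_vs_means(sigma, n=30, trials=4000, seed=42):
    """Skewness of raw log normal values vs. skewness of averages of n values."""
    rng = random.Random(seed)
    individuals = [rng.lognormvariate(0.0, sigma) for _ in range(trials)]
    means = [sum(rng.lognormvariate(0.0, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    return skewness(individuals), skewness(means)

for sigma in (0.5, 1.0):
    raw, avg = individuals_vs_means(sigma)
    print("sigma", sigma, "individuals", round(raw, 2), "means of 30", round(avg, 2))
```

The individual values stay strongly skewed no matter how many are collected, while the averages of 30 are much closer to symmetric. For the more heavily skewed sigma = 1.0 case the averages should still show noticeable skew (in theory about 1.1), which is one reason 30 is a rule of thumb rather than a guarantee.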


Message: 56613
Posted by: Mark Chockalingam
Posted on: Friday, 8th October 2004

Rob,

Thanks for copying and pasting from the source.  However, I humbly submit that it is inappropriate to quote the content without acknowledging the source.  I agree it reads more easily on one page, but it is still important to insert the name of the source.

Now as to your point on the error, I don't see it.  Maybe it is semantics.  The CLT is a statement on the sampling distribution of the mean, NOT on the sample or the original population itself.  When the sample size approaches 30, the sampling distribution approaches normality regardless of the original distribution.

Now for interval estimation, the big leap that is made in practice is to assume that the sample standard deviation is a sufficient estimate of the population standard deviation.  Is that what you disagree with in citation #1, which gives the formula for the standard normal deviate?

Good discussion.

thanks,

Mark


Message: 56941
Posted by: John C
Posted on: Wednesday, 13th October 2004

Hi Sambuddha,

Kindly mail me the same at  john.chandrawipro.com

Thanks,

John C


Message: 59456
Posted by: Paul C
Posted on: Sunday, 21st November 2004

Grateful if you could send on the background to the sample size of 30..

Thanks ..

PC


Message: 59487
Posted by: SemiMike
Posted on: Monday, 22nd November 2004

Those are great URLs for any beginner to study.

One might use some "rules of thumb" based on practical experience, as well as the more rigorous statistical methods.

For example, if the DATA is to be from a MECHANICAL process for making discrete parts, then one should first try sampling the FAMILIES of possible variation, using sample size 2 for each family, per Shainin's recommendations: 2 sites on each of 2 parts, repeated every hour for 2 shifts perhaps, then graphed.  Once it is clear which family of variation is the main problem, then SPC sampling (subgroups measured over time) can be used IF the problem is temporal.  But if the problem is variation WITHIN the parts, then perhaps closer stratification of data is needed, or measuring more sites per part, or comparing that variation for all similar machines, or looking at tool wear trends of this "within-part" spread over time.  Means and ranges both can drift with tool wear.  Sampling for mean data involves the famous sample size of 30 (or 15) for OOC determinations. Sampling for changes in variance requires much larger samples (think 1000 parts).  So a wandering mean is not the same as a wandering variance.

RULE OF THUMB for SPC chart startup is given as 15 to 30, but if the process is non-stationary (drifts, has a wandering mean and unstable variance, for example) then other methods are needed.   Box and Luceno's book (Amazon.com) talks at great length about modern issues with process monitoring and adjustment methods. Assessing non-subgrouped data is another issue.

RULE OF THUMB:  Individual Charts are less sensitive, less powerful, (they give more false alarms for each rule added, for example) than X-bar charts. 

RULE OF THUMB:  Subgroup size of 2 to 4 is common and usually adequate.  But for diagnostic reasons, many engineers use subgroups of 10 to 100 sites per part or parts per subgroup.  ASQ had a paper recently on the effect of large subgroup sizes.  In general, it invalidates the method used by most people to calculate control limits, as many of the subgroup sites or parts are CORRELATED and so the control limits would be wrong.  What is good for diagnostics is often not good for control, given various control models.  With automated gages, more data is cheap.  But how you use it depends on whether the data is really INDEPENDENT and RANDOMLY SAMPLED and IDENTICALLY DISTRIBUTED.  Most important is that INDEPENDENCE.  And if the data is AUTO_CORRELATED, it's also messy (wandering mean, showing predictability instead of randomness).

Then there is the data that comes from CHEMICAL processes, such as continuous refining.  Read Svante Wold or Dr. John McGregor's books on PCA/PLS multivariate methods for sensor-based data, which is HUGE stream of data.   Rule of Thumb:  Get help.

Central Limit Theorem:   Only for subgrouped data!!! 

Shewhart Charts:  Only for stationary processes where samples are independent!!!  (I am not a statistician, and those guys are still arguing about these issues.  See Journal of Quality Technology, Woodall's papers, for example.)

Don't forget:  NIST online handbook.  http://www.itl.nist.gov/div898/handbook/


Message: 62330
Posted by: Chelle
Posted on: Thursday, 13th January 2005

Hello Mr Sambuddha,

Can you also send me the article on the 30 sample size that you sent to DT? I am also interested to know why 30? My email add is, roadcrashhotmail.com

Thanks, Rechel


Message: 62335
Posted by: Kevin Alderson
Posted on: Thursday, 13th January 2005

Regarding the sample size of 30: it is a reasonable amount to measure / analyse.

There is approximately a 10% difference in margin of error between a sample of 30 and a sample of 500.

Of course it would be better to do 500 for accuracy, but you must take into account the difference in cost between 30 and 500, depending on what you are measuring. Remember the std dev (10%) margin and you should be fine.


Message: 62368
Posted by: quality_ab
Posted on: Thursday, 13th January 2005

Could somebody please email the link to me at quality_ab yahoo.com

Thanks,

AB


Message: 62879
Posted by: Glo
Posted on: Saturday, 22nd January 2005

Hi Mr. Sambuddha,

Could you also share the information about sample size 30 with me? I'm very much interested. Kindly email it to glo_blue10yahoo.com

Thanks,

--Glo


Message: 62888
Posted by: DrSeuss
Posted on: Saturday, 22nd January 2005

DT, let me try to answer this from a practical experience approach,

I have also asked this question and have never received a definitive academic answer.  Here is what I have seen from analyzing real process data.  Take a continuous process that produces normally distributed data (near normal is also good enough) and collect your data using a rational subgroup approach. Use the Minitab Six Sigma Process report to calculate short-term and long-term process capability. Look at report #4 or #5; it shows both the Sigma ST and Sigma LT on a graph.  Notice how their values stabilize toward a value as the number of subgroups increases.  You will notice a flattening of the curves at about 10 subgroups, then around 20-25 subgroups the curves are almost horizontal.  By the time you reach 30 subgroups the sigmas have stabilized, and adding any more subgroups will only change the sigmas in the 4th or larger decimal places.  If you are an Excel wizard, you can demonstrate this very easily also.  The idea is that after about 30 subgroups (30 data points) the variance of the data typically stabilizes.
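The stabilizing behaviour described above can be reproduced without Minitab or Excel. The sketch below is an illustration with simulated normal data and individual points rather than subgroups, which is a simplifying assumption. It prints the running standard deviation as points accumulate, next to the approximate relative standard error of a standard deviation estimate for normal data, 1/sqrt(2(n-1)).

```python
import random
from statistics import stdev

def relative_se_of_sd(n):
    """Approximate relative standard error of s for normal data."""
    return 1 / (2 * (n - 1)) ** 0.5

# Simulated, purely illustrative process data with true sigma = 1
rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

for n in (5, 10, 20, 30, 50, 100, 200):
    running_sd = stdev(data[:n])
    print(n, round(running_sd, 3), "+/-", round(relative_se_of_sd(n), 3))
```

How flat the curve looks depends on the data set. By this approximation an estimate from 30 individual points still carries an uncertainty of roughly 13%, so the curve may appear level on a chart even though the estimate has not settled to several decimal places.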


Message: 69027
Posted by: leon
Posted on: Wednesday, 27th April 2005

Dear Sambuddha

could you  please send the reference to me, thanks very much

mzzhangvip.sina.com

Leon


Message: 70893
Posted by: vee
Posted on: Tuesday, 24th May 2005

Dear Sambuddha - -

I'm very interested in your projects.

Please send me mail.

Thank you very much.

Sincerely,

vee


Message: 70894
Posted by: vee
Posted on: Tuesday, 24th May 2005

from vee

my email  wsu_veeyahoo.com

thank again


Message: 75309
Posted by: DEEPAK JAIN
Posted on: Saturday, 23rd July 2005

Sambudha:

Please also email me, because for the last few weeks I have been trying to find the answer.

Please reply to me by email at deepak69748rediffmail.com

D.JAIN

9811564123


Message: 75348
Posted by: Manav
Posted on: Monday, 25th July 2005

Sambuddha/DT,

Please send me the information on sample size. I know this msg is 3 years too late, but would appreciate your or anyone's help in getting this info to me.

Thanks

manavbhalla1yahoo.co.in


Message: 79599
Posted by: Dave
Posted on: Friday, 16th September 2005

When the population size is greater than 100, the normality condition is met when the sample size is greater than 30.  Increase sample size depending on the process being studied and the variability of the data produced.  30 is not a "magic" number applicable to all data sets and processes.


Message: 79607
Posted by: Darth
Posted on: Friday, 16th September 2005

Dave, might I suggest that you check the dates on any post that you respond to.  This is a really old one.  OK, Nick, how did I do????


Message: 85803
Posted by: Rhex
Posted on: Sunday, 1st January 2006

hi sambuddha,

I know that this forum thread has been going on for quite some time now, and I am not sure whether you will receive this message, but I am hoping that you can send the reference materials to me as well.

Here's my email address: rhexryanyahoo.com


Message: 86286
Posted by: Kulanan
Posted on: Tuesday, 10th January 2006

Dear Sambudha, I am looking for the answer about sample size. Please kindly send me the information on why sample size = 30, because I have to use it for my report. If you have more information, please reply to me. Thank you very much.

Best regards, Kulanan

mpranggmail.com


Message: 95224
Posted by: Sue
Posted on: Friday, 9th June 2006

Hi Sambuddha,

I'm keen to know why 30 samples, too. Can you send me a copy as well?

Email: cockatoos2000hotmail.com

sue


Message: 95233
Posted by: Heebeegeebee BB
Posted on: Friday, 9th June 2006

Sue,

This is a FOUR YEAR OLD thread.


Message: 95234
Posted by: Darth
Posted on: Friday, 9th June 2006

Heck, that trumps my measly 18 month one earlier this week. 


Message: 95235
Posted by: Mike Carnell
Posted on: Friday, 9th June 2006

Heebeegeebee,

...and unfortunately we have not seen Sambuddah post on here for a couple of years.

Regards


Message: 95243
Posted by: Heebeegeebee BB
Posted on: Friday, 9th June 2006

Yeah,

Whatever happened to Sambuddah???


Message: 96116
Posted by: Mahesh Kumar S
Posted on: Monday, 26th June 2006

Sambuddha:

Hi. I am very new to this forum.

Can you send me the information that you sent to DT on why a sample size of 30 is required ? I too am curious.

mahesh.kumar.sridharaccenture.com

Thanks.


Message: 96151
Posted by: Heebeegeebee BB
Posted on: Monday, 26th June 2006

Mahesh,

Sambuddah's last post under that nom de plume was 2002.

It is unlikely that you will get a rouse out of a 4-year-old thread.

We are still tied at 4 years folks!


Message: 99944
Posted by: Tan Li Ren
Posted on: Friday, 1st September 2006

Dear Sambuddha,

Could you also send me the information on the 30 sample size? I'd appreciate it. tanlirenhotmail.com

Best regards,
Li Ren


Message: 100438
Posted by: Edmond
Posted on: Sunday, 10th September 2006

Dear Mr Sambuddha,

I work in the research field and have immense interest in knowing more about the sample size of 30, would you also send me articles and reference materials on this topic by e-mail at:

edmondhhfungsinaman.com

Many thanks for your sharing.

Best regards,

Edmond


Message: 100439
Posted by: Andy U
Posted on: Sunday, 10th September 2006

DT,

I first came across the n = 30 rule of thumb during a lecture by Dorian Shainin (1983). Dorian was brought to Scotland by someone called Ted Williams, who was instrumental in bringing Dorian to Motorola in Phoenix some years before.

According to Dorian, if you plot the error of the estimate of sigma as a function of n, the curve becomes asymptotic at around n = 30, where sigma can be estimated with 95% confidence. As Stan has previously pointed out, Dorian always used a 95% confidence.
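A rough way to see the shape of such a curve is the standard large-sample approximation for normally distributed data, in which the relative standard error of the sample standard deviation s is about 1/sqrt(2(n - 1)). The sketch below assumes normal data and uses a hypothetical helper name; it is not a reconstruction of Dorian's own plot.

```python
import math

def rel_se_of_sigma(n):
    """Approximate relative standard error of s for normal data (assumption)."""
    return 1.0 / math.sqrt(2.0 * (n - 1))

for n in (5, 10, 20, 30, 50, 100):
    print(f"n = {n:>3}: relative SE of s ~ {rel_se_of_sigma(n):.1%}")
```

The curve keeps falling past n = 30, only more slowly, so where it "flattens" is a judgement about diminishing returns and not a sharp threshold.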

As Mike Carnell has also noted, typical X-bar and R charts use 30 subgroups of n = 3 or n = 5, which is a sample size of 90 or 150 - a far cry from n = 30.

Another issue lost on many is the use of multiple subgroups, which provide a pessimistic estimate of sigma, since both the data and the subgroup mean vary in small subgroups; so the entropy of multiple subgroups is larger than that of a single subgroup.

No one in their right mind would estimate process capability based on a single subgroup of n = 30.

Regards,

Andy


Message: 100441
Posted by: Hans
Posted on: Sunday, 10th September 2006

DT,

Avoid all of the complications of interpretations of interpretations and opinions of interpretations and interpretations of opinions of interpretations and review Gosset's 1908 article in Biometrika: "The probable error of a mean". From there you can make your own informed judgement about how other statisticians incorporated and adapted his work into theirs. What is it that they say in lean: Go see for yourself :-). Regards.


Message: 107031
Posted by: Nitesh
Posted on: Monday, 27th November 2006

Hi Sambuddha,

Please email me the information on the theory and history behind sample size being 30

niteshrungtagmail.com

Thanks,

Nitesh


Message: 107074
Posted by: steve
Posted on: Monday, 27th November 2006

It is very simple.  Harry picked 30 because it gave 1.5 in his 2003 attempt.  Other numbers give anything between 0 and 50+ for his "correction".


Message: 112384
Posted by: Shon Stewart
Posted on: Wednesday, 14th February 2007

Please forward me information about the history on a sample size of 30 as a rule of thumb.  Your help will be very appreciated.


Message: 112385
Posted by: Confusion about 2 papers
Posted on: Wednesday, 14th February 2007

n = 25 has a truly statistical justification. At n = 25 the Law of Large Numbers will start to show a pronounced symmetric/normal distribution of the sample means around the population mean. This normal distribution becomes more pronounced as n is increased.

n = 30 comes from a quote from Student (Gosset) in a 1908 paper, "Probable error of a correlation coefficient", in Biometrika. In this paper he reviews the error associated with drawing two independent samples from an infinitely large population and their correlation (not the individual errors of each sample relative to the sample mean and the population mean!). The text reviews different corrections to the correlation coefficient given various forms of the joint distribution. In a few sentences, Student says that at n = 30 (which is his own experience) the correction factors don't make a big difference. Later, Fisher showed that the sample for a correlation needs to be determined based on a z-transformation of the correlation. So, Student's argument is only interesting historically. Also, Student wrote his introduction of the t-test in Biometrika during the same year (his prior article). Historically, the n = 30 discussed in his correlation paper has been confused with the t-test paper, which only introduced the t-statistic up to sample size 10.

In sum, the n = 30 is a rule of thumb that accidentally works. But ironically the n = 30 for sampling from a population was confused with the n = 30 observation from correlations.


Message: 117716
Posted by: Pramod Thomas John
Posted on: Tuesday, 8th May 2007

Dear Sambudda,

Could you please mail this information to me (pointers for choosing sample size as 30). I recently had an interview when this question was asked and I drew a blank.

Thank you in advance.

Cheers

Pramod


Message: 125280
Posted by: Phillip
Posted on: Tuesday, 4th September 2007

Hi Sam, or anyone who has gotten his information about why the minimum sample size is 30, can you please forward it to me?

Thanks,

Phillip

pcdwanghotmail.com


Message: 125281
Posted by: Trev
Posted on: Tuesday, 4th September 2007

Did you read any of this thread?

See: http://healthcare.isixsigma.com/forum/showmessage.asp?messageID=112385


Message: 126962
Posted by: aparna
Posted on: Thursday, 27th September 2007

hi sambuddha,

Can you email me that presentation as well, at aparnattrediffmail.com


Message: 127512
Posted by: Robin
Posted on: Monday, 1st October 2007

My background is in mathematical statistics, though from several years back. I tried to read through this thread, which appears to be quite complicated, but the actual question does not seem to be answered.  I am also new to 6 sigma, but did a google on 30/sample size and this appeared to have a good discussion, so I will rephrase the original question and give my opinion. That question is:

1.  Is 30 some magic number that can be used for an adequate sample size for "most" purposes?

I recognize that in use, there are almost always assumptions about the underlying distributions and parameters, but the calculation of power and sample size is well worked out.  I can understand assumptions that the distribution of the mean for a sample size of 30 should "look fairly normal for most distributions", but the power/sample size strongly depends on the underlying variance as well as other variables that the field has not seemed to define. I suppose that we can assume that with a sample size of 30, we have a sampling distribution that is normal with the population mean as the mean and the population variance divided by 30 as the variance.  We assume that the allowable power is .8 (why not .9 or .95?) and that the allowable difference between the true mean and estimated mean is x% of the population standard deviation (again arbitrary). Then, with all these assumptions and the right x%, perhaps a sample size of 30 might arise as a reasonable sample size.  However, we usually are more concerned with the absolute error between the sample mean and population mean, which would completely negate the possibility that there is any unique N that could satisfy an adequate sample size, since population variances have no bounds that I know of.  If, however, there is some consensus that we are making all these assumptions, it should be spelled out.
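The dependence on these assumptions can be made concrete with the textbook normal-approximation formula for a two-sided one-sample z-test, n = ((z_(1-alpha/2) + z_power) / d)^2, where d is the detectable shift in units of the population standard deviation. The sketch below is only an illustration: the function name required_n and the default alpha = 0.05 and power = 0.80 are assumptions, and a t-based calculation would give slightly larger n.

```python
import math
from statistics import NormalDist

def required_n(effect_in_sd, alpha=0.05, power=0.80):
    """Sample size for a two-sided one-sample z-test (normal approximation).

    effect_in_sd is the shift to detect, expressed in population standard
    deviations; alpha and power defaults are illustrative assumptions.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / effect_in_sd) ** 2)

for d in (0.25, 0.5, 1.0):
    print(f"shift of {d} SD: n = {required_n(d)}")
```

With these assumed inputs, a shift of half a standard deviation gives n = 32, which is close to 30, while a quarter-SD shift needs 126 and a full-SD shift only 8. A value near 30 appears only for one particular combination of assumptions.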

Reading several of the comments, I contend that the number 30 is just some number that has nestled into the literature without any true mathematical/statistical verification.   It's small enough to be practical to do, but is an arbitrary number without true mathematical significance.  What bothers me is that if we are talking about 6 sigma, it would appear that to accept 30 as a magic number for sample size rather than using the standard known statistical procedures to estimate the proper sample size is anathema to the underlying concept of precision which I am assuming that 6 sigma represents.


Message: 127516
Posted by: Historiography
Posted on: Tuesday, 2nd October 2007

I posted this response earlier. It is based on a review of Fisher's early work.

Overall, rules of thumb were heavily introduced into statistics when it became commercialized and therefore entered the engineering field. The rules of thumb regarding estimation of parameters are only one example where classical statisticians gave up and gave way to the more pragmatically oriented statisticians. The history of the magical numbers 22, 25 and 30 is replicated below. But other rules of thumb emerged to make the science more usable.

n = 22 was proposed by Fisher in Statistical Methods, p. 44, where he reviewed how often a deviation exceeds the standard deviation: about once in every three trials. Twice the standard deviation is exceeded about once in 22 trials: "For p-value = 0.05, or 1 in 20, the value is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty."
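The "once in three" and "once in 22" figures in that passage can be checked against the two-sided tail of a standard normal distribution. The sketch below assumes normality and uses a hypothetical helper name; it is a check of the arithmetic, not of the historical argument.

```python
from statistics import NormalDist

def one_in(k_sigma):
    """Average number of trials per deviation beyond +/- k_sigma (normal data)."""
    tail = 2 * (1 - NormalDist().cdf(k_sigma))
    return 1 / tail

for k in (1, 1.96, 2):
    print(f"beyond {k} SD: about once in {one_in(k):.1f} trials")
```

Note that the 22 here counts trials between false indications at the two-sigma criterion; the quoted passage does not present it as a sample size.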

n = 25 has a truly statistical justification. At n = 25 the Law of Large Numbers will start to show a pronounced symmetric/normal distribution of the sample means around the population mean. This normal distribution becomes more pronounced as n is increased.

n = 30 comes from a quote from Student (Gosset) in a 1908 paper, "Probable error of a correlation coefficient", in Biometrika. In this paper he reviews the error associated with drawing two independent samples from an infinitely large population and their correlation (not the individual errors of each sample relative to the sample mean and the population mean!). The text reviews different corrections to the correlation coefficient given various forms of the joint distribution. In a few sentences, Student says that at n = 30 (which is his own experience) the correction factors don't make a big difference. Later, Fisher showed that the sample for a correlation needs to be determined based on a z-transformation of the correlation. So, Student's argument is only interesting historically. Also, Student wrote his introduction of the t-test in Biometrika during the same year (his prior article). Historically, the n = 30 discussed in his correlation paper has been confused with the t-test paper, which only introduced the t-statistic up to sample size 10.

In sum, the n = 30 is a rule of thumb that accidentally works. But ironically the n = 30 for sampling from a population was confused with the n = 30 observation from correlations.

So, to your point, yes, there are historical reasons, but the true reason is the need for Statistics to establish itself as a useful field. Now, rules of thumb have taken over the critical thinking about statistics. Six Sigma accelerated this movement.


Message: 127517
Posted by: Grasshopper
Posted on: Tuesday, 2nd October 2007

Aren't you clever...oh yes you are...now reread your post and update with some additional builds to support your argument.

Grasshopper


Message: 127518
Posted by: Statistician
Posted on: Tuesday, 2nd October 2007

you're making progress, you can actually read now. great accomplishment!


Message: 128047
Posted by: Robin
Posted on: Sunday, 7th October 2007

So does the number 30 have significance in the use of a sample for an arbitrary population, or is it just a number that "seems" to work because no one has actually tested it?


Message: 128050
Posted by: Vincent
Posted on: Sunday, 7th October 2007

Hello,
I would be interested to have a better understanding of how this sample size issue relates to SPC charts.

Usually, the sample size of an SPC chart is 5, but my understanding is that the sample size should be determined according to the 'normality' of the underlying distribution.
If the underlying distribution is 'absolutely not normal', the sample size required might be around 30, and if the underlying data is normal, there is no need to use samples and individual data can be used.
Am I correct?
Thanks
 
Vincent
 


Message: 140195
Posted by: Danny Carballo
Posted on: Tuesday, 22nd April 2008

Can you also "E" mail this attachment.

Thanks in advance.


Message: 140225
Posted by: Tiffany Lian
Posted on: Wednesday, 23rd April 2008

Hi, Sambuddha:

I am new to this forum, and very curious: "why 30"? Could you please also send me the info? Thank you very much.

txlianhotmail.com

Tiffany Lian


Message: 141051
Posted by: Devie
Posted on: Thursday, 15th May 2008

Hey Sambuddha,

again, I'm a newbie here, could you please send me the info about why 30 sample size... pleasee... thank you so much.

Please send it to me to devie_cynthiayahoo.com, as i will need it for my final paper.

Thanks again!


Message: 141071
Posted by: Sid
Posted on: Thursday, 15th May 2008

Hi Sambuddha,

I'm new to this forum and was interested to know more about 30pc. Could you send me this project when you get a chance?

Sid


Message: 141072
Posted by: Sid
Posted on: Thursday, 15th May 2008

Hi sambuddha,

forgot to write my email id, quadrisyedhotmail.com

appreciate your help!

Thanks

Syed


Message: 141121
Posted by: Sid
Posted on: Saturday, 17th May 2008

If any one in this group has this information please do send it to me.....

Thanks,

quadrisyedhotmail.com


Message: 141237
Posted by: BelowTheBelt Certified
Posted on: Wednesday, 21st May 2008

Because the Standard Error of the Mean improves as the sample size increases to 30.
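For reference, the standard error of the mean is sigma / sqrt(n), so its behaviour around n = 30 is easy to tabulate. The sketch below assumes a known population sigma of 1.0, chosen only for illustration, and a hypothetical helper name.

```python
import math

def sem(sigma, n):
    """Standard error of the mean for population standard deviation sigma."""
    return sigma / math.sqrt(n)

for n in (5, 10, 30, 100, 500):
    print(f"n = {n:>3}: SEM = {sem(1.0, n):.3f} x sigma")
```

The standard error keeps shrinking beyond n = 30, just more slowly, so the formula on its own does not single out 30.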


Message: 142795
Posted by: arlene
Posted on: Thursday, 26th June 2008

sam,

I'm a beginner in stat. I'm a math major but taking up a master's in stat. I am also curious about sample size n = 30. I would really appreciate it if you could send me info regarding this matter.

my email:arlene_nisyahoo.com

thanks a lot!

more power and GOD bless always!


Message: 147258
Posted by: source?
Posted on: Wednesday, 1st October 2008

Can you give references for the claim that symmetry/normality at n=25 has a LLN justification?

Thanks!


 
Copyright © 2000-2008 iSixSigma – All Rights Reserved
Reproduction Without Permission Is Strictly Prohibited – Copyright Requests

