An understanding of the statistics behind air cleanliness classification can help in designing and maintaining a cleanroom environment
By Jim Babb, Adams Instruments
Statistics are the essence of cleanroom air cleanliness classification, yet the reasons behind this are often not well understood or appreciated. This is not surprising, as one is normally following a standard operating procedure that offers little scope for interpretation. However, understanding why things are done the way they are, together with the additional information that can be gleaned from collected particle count data, can be of great benefit in designing and maintaining an effective and efficient cleanroom environment.
Air cleanliness is measured by counting the numbers of particles in given size ranges. In the bio/pharmaceutical industry, the sizes are 0.5 µm and greater, while in the semiconductor industry, a size threshold of 0.1 µm or less is used. In bio/pharmaceutical environments, the sizes are determined by regulation; in semiconductor manufacturing, they are determined by the need to maximize product quality. This article explains, in simple terms, the statistics used in classification by particle counting and how the measurements they rely on are made. More detailed explanations are available in other texts (references 1 to 3).
When anything is measured, the result is always subject to random fluctuations as well as systematic (cause and effect) changes. For example, a car’s speed on a flat, straight road might be thought of as constant, yet it is actually changing due to variations in air speed, air density, road surface variation and many other things. When we talk about random fluctuations, we mean unknown or even unknowable effects. Although the variations (uncertainties) in measurements are random individually, when many measurements are made, these variations follow well-defined rules.
The mean value is the sum of the values divided by the number of values, often expressed as:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

where $X_i$ is an individual value and $N$ is the number of values.
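The spread of the values is called the standard deviation. It is often given the symbol σ (sigma) and is calculated using the following formula:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$$

where $X_i$ are the individual values and $\bar{X}$ is their mean. Dividing by $N - 1$ rather than $N$ corrects for the fact that the mean is itself estimated from the same values.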
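Both definitions can be written out directly. The following is a minimal Python sketch; `sample_mean` and `sample_std_dev` are hypothetical helper names, and it assumes the sample form of the standard deviation (dividing by N − 1), which is also what the standard-library `statistics.stdev` uses, so the two can be cross-checked.

```python
import math
import statistics

def sample_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def sample_std_dev(values):
    """Sample standard deviation, dividing by N - 1 (an assumption, see lead-in)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sample_mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

counts = [18, 22, 20, 25, 15]  # hypothetical particle counts
print(sample_mean(counts))               # 20.0
print(round(sample_std_dev(counts), 3))  # 3.808
# Cross-check against the standard library.
assert math.isclose(sample_std_dev(counts), statistics.stdev(counts))
```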
If the variations in a measurement follow the Gaussian distribution (the well-known bell-shaped curve), then about 68 percent of all values are expected to lie in the range mean ± σ. For example, if the mean is 20 and the standard deviation is 5, then about 68 percent of all values lie between 15 and 25. Further, about 95 percent of values lie within the range mean ± 2σ, and about 99.7 percent lie within the range mean ± 3σ.
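These coverage figures can be checked numerically. This short Python sketch uses the standard-library `statistics.NormalDist` class; `fraction_within` is a hypothetical helper name. It computes the share of a Gaussian population lying within ±kσ of the mean and then repeats the worked example with a mean of 20 and σ of 5.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a Gaussian population that lies within mean ± k·sigma."""
    z = NormalDist()  # standard normal distribution: mean 0, sigma 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.1%}")
# within ±1σ: 68.3%
# within ±2σ: 95.4%
# within ±3σ: 99.7%

# The worked example: mean 20, sigma 5, range 15 to 25.
example = NormalDist(mu=20, sigma=5)
print(example.cdf(25) - example.cdf(15))  # about 0.683
```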
The word “significant” is often used in the context of: “Is the difference significant?” This question can be rephrased as: “If the two underlying values were actually the same, would a difference at least this large arise by chance more than 5 percent of the time (1 in 20)?” If the answer is yes, the values are not significantly different. If the answer is no, the values are significantly different.
It must be remembered that every mean has a standard deviation. They are inseparable; one is meaningless without the other. A common mistake in the presentation of statistical data is to forget this rule, which often shows up in statements such as: “There is one count per cubic foot, therefore there must be 35 counts per cubic meter.” In fact, one count per cubic foot has a standard deviation of one count, so the expected count per cubic meter is 35 with a standard deviation of 35. Taking the range mean ± 2σ, and remembering that a count cannot be negative, the ‘true’ count lies anywhere between zero and 105 per cubic meter.
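The key point is that scaling a count to a different volume scales its standard deviation by the same factor. A minimal Python sketch, assuming a single raw count from a one-cubic-foot sample and a conversion factor of 35.3147 cubic feet per cubic meter; `count_per_ft3_to_m3` is a hypothetical helper. With the exact factor the upper end of the range comes out at about 106 rather than the rounded 105.

```python
import math

FT3_PER_M3 = 35.3147  # cubic feet in one cubic meter

def count_per_ft3_to_m3(count):
    """Scale a raw count from a one-cubic-foot sample to particles per cubic meter.

    The Poisson standard deviation sqrt(count) is scaled by the same factor,
    and a mean ± 2σ range is returned, clamped at zero because a
    concentration cannot be negative.
    """
    mean = count * FT3_PER_M3
    sd = math.sqrt(count) * FT3_PER_M3
    return mean, sd, (max(0.0, mean - 2 * sd), mean + 2 * sd)

mean, sd, (low, high) = count_per_ft3_to_m3(1)
print(f"{mean:.0f} ± {sd:.0f} per m³, range {low:.0f} to {high:.0f}")
# 35 ± 35 per m³, range 0 to 106
```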
The statistics of counting
When measuring something that is quantified by counting, such as particles, the statistics follow a “Poisson” distribution. This predicts that, where discrete items are randomly distributed and a subsample is taken (for example, one cubic foot of air from a room), the standard deviation is the square root of the count. That is, the expected value (mean) equals the variance, and a single measured count is the best available estimate of both.
This relationship is very helpful because it lets us compare the standard deviation actually observed in repeated particle counts with the theoretical value predicted from the mean. If the measured and predicted standard deviations differ significantly, we can conclude that non-random factors are affecting the measurement. A standard deviation that is too high could imply a problem with the sampling (for example, particles sedimenting), while one that is too low could suggest an instrumentation problem. In statistics there is nothing more suspicious than perfection.
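One common way to make this comparison is the index-of-dispersion test: for Poisson counts, D = (n − 1)s²/mean approximately follows a chi-square distribution with n − 1 degrees of freedom. The Python sketch below is an illustration, not a method taken from any cleanroom standard. `dispersion_check` and `chi2_quantile` are hypothetical names, the chi-square quantiles use the Wilson-Hilferty approximation so that only the standard library is needed (`scipy.stats.chi2.ppf` would give exact values), and the approximation is rough for very few samples.

```python
import math
import statistics
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty transformation)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    base = max(0.0, 1.0 - a + z * math.sqrt(a))  # clamp: quantiles are non-negative
    return dof * base ** 3

def dispersion_check(counts, alpha=0.05):
    """Compare the spread of repeated counts with the Poisson expectation.

    Under a Poisson model, D = (n - 1) * s^2 / mean approximately follows a
    chi-square distribution with n - 1 degrees of freedom.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("all counts are zero; the test is undefined")
    d = (n - 1) * statistics.variance(counts) / mean  # index-of-dispersion statistic
    lower = chi2_quantile(alpha / 2, n - 1)
    upper = chi2_quantile(1 - alpha / 2, n - 1)
    if d > upper:
        verdict = "too variable (check sampling)"
    elif d < lower:
        verdict = "too uniform (check instrument)"
    else:
        verdict = "consistent with Poisson"
    return d, verdict

# Hypothetical repeated counts at one location.
print(dispersion_check([98, 105, 92, 110, 101, 95, 103, 99, 107, 90])[1])  # consistent with Poisson
print(dispersion_check([100] * 10)[1])  # too uniform (check instrument)
```

Identical readings fail the test on the low side, which is the "nothing more suspicious than perfection" case.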
Because we expect the standard deviation to be the square root of the mean, it follows that a count of four is not significantly different from a count of zero: its standard deviation is 2, so the range mean ± 2σ extends down to zero. Another consequence is that low counts are inherently imprecise: the higher the count, the more precise the measurement. If one measures a count of 25, the standard deviation is 5 (or 20 percent). If one measures 1,000 counts, the standard deviation is about 32 (approximately 3 percent). To put it another way, if the measurement were repeated 100 times, about 68 of the results would be expected to lie within 3 percent of 1,000.
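The figures in this paragraph follow from the square-root rule alone. A minimal Python sketch; `count_precision` is a hypothetical helper that returns the Poisson standard deviation, the relative standard deviation, and the mean ± 2σ range (clamped at zero) for a single count.

```python
import math

def count_precision(count):
    """Poisson standard deviation, relative SD, and mean ± 2σ range for one count."""
    sd = math.sqrt(count)
    relative = sd / count if count > 0 else math.inf
    low = max(0.0, count - 2 * sd)  # counts cannot be negative
    return sd, relative, (low, count + 2 * sd)

for c in (4, 25, 1000):
    sd, rel, (low, high) = count_precision(c)
    print(f"count {c}: SD {sd:.1f} ({rel:.0%}), 2σ range {low:.0f} to {high:.0f}")
# count 4: SD 2.0 (50%), 2σ range 0 to 8
# count 25: SD 5.0 (20%), 2σ range 15 to 35
# count 1000: SD 31.6 (3%), 2σ range 937 to 1063
```

The range for a count of four reaches zero, which is why it cannot be distinguished from a zero count.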
The standard error
The standard error is a measure of how precisely a result represents the true value. In particle counting, the result is the mean of the particle counts measured at the various locations, each of which is often itself the mean of several individual samples.
The uncertainty of a result is expressed by specifying its upper and lower confidence limits. Usually the confidence level is 95 percent (remember the word “significant”). With cleanroom classification, we are interested in the 95 percent upper confidence limit (UCL).
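The standard error (se) is calculated by dividing the standard deviation by the square root of the number of values (here, the number of locations):

$$\mathrm{se} = \frac{\sigma}{\sqrt{N}}$$

The 95 percent upper confidence limit (UCL) can be calculated using the following formula:

$$\mathrm{UCL} = \bar{X} + T_{\mathrm{val}} \times \mathrm{se}$$

where $\bar{X}$ is the mean of the location values, $\sigma$ is their standard deviation, and $N$ is the number of locations.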
The Tval is the one-sided value of Student's t distribution for the chosen confidence level. It depends on the number of values and is normally read from tables rather than calculated, since the underlying formula is complicated. In cleanroom classification, the Tval values are precalculated and built into the prescribed method for calculating the UCL. If the number of values is greater than 9, the Tval is effectively set to 0, so the standard error can be ignored.
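The UCL calculation (mean plus Tval times the standard error) can be sketched in a few lines of Python. Assumptions: `ucl95` is a hypothetical helper; the Tval table holds standard one-sided 95 percent Student's t values for N − 1 degrees of freedom, which we believe the standards tabulate in rounded form, so results may differ slightly in the last digit from a hand calculation using the standard's table; and, as described above, Tval is treated as zero for 10 or more locations.

```python
import math
import statistics

# One-sided 95% Student's t values, keyed by number of locations (dof = locations - 1).
T95 = {2: 6.314, 3: 2.920, 4: 2.353, 5: 2.132, 6: 2.015, 7: 1.943, 8: 1.895, 9: 1.860}

def ucl95(location_means):
    """95% upper confidence limit of the overall mean of the location averages."""
    n = len(location_means)
    if n < 2:
        raise ValueError("need at least two locations")
    mean = statistics.mean(location_means)
    if n >= 10:
        return mean  # Tval taken as zero: the standard error is ignored
    se = statistics.stdev(location_means) / math.sqrt(n)  # standard error
    return mean + T95[n] * se

# Hypothetical location averages (counts per cubic foot) at three locations.
print(round(ucl95([10, 12, 14]), 2))  # 15.37
```

With only three locations, the UCL sits well above the simple mean of 12; this is the penalty for sampling few locations.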
Choosing a sample volume
The ISO 14644-1 (reference 4) and Federal Standard 209 (cancelled 11/29/01) classifications require a minimum sample volume that is the greater of 0.1 cubic feet or the volume that is expected to contain 20 particles at the class limit. For example, if the class limit is 100 counts per cubic foot (one cubic foot is 28.3 liters), then 20 particles are expected in 0.2 cubic feet, so the minimum sample volume is 0.2 cubic feet (about 5.7 liters). Note that the ISO standard also requires sampling for at least one minute, regardless of flow rate.
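The sample-volume rule can be written as a pair of small functions. A minimal Python sketch; `min_sample_volume_ft3` and `min_sample_minutes` are hypothetical names, the default floor of 0.1 cubic feet and the 20-particle target are the figures quoted in the text, and the one-minute minimum models the ISO sampling-time requirement.

```python
LITERS_PER_FT3 = 28.3168  # liters in one cubic foot

def min_sample_volume_ft3(class_limit_per_ft3, floor_ft3=0.1, target_count=20):
    """Larger of a fixed floor and the volume expected to hold target_count particles."""
    return max(floor_ft3, target_count / class_limit_per_ft3)

def min_sample_minutes(volume_ft3, flow_ft3_per_min, min_minutes=1.0):
    """Sampling time needed at a given flow rate, never less than min_minutes."""
    return max(min_minutes, volume_ft3 / flow_ft3_per_min)

v = min_sample_volume_ft3(100)  # class limit: 100 counts per cubic foot
print(v, round(v * LITERS_PER_FT3, 1))              # 0.2 5.7
print(min_sample_minutes(v, flow_ft3_per_min=1.0))  # 1.0
```

At 1 cubic foot per minute, the 0.2 cubic foot volume would take only 12 seconds, so the one-minute rule governs.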
British Standard 5295 (BS5295), on the other hand, specifies a sample volume of one cubic foot at a flow rate of one cubic foot per minute, and varies the total volume sampled by specifying the number of samples per location.
The choice of a minimum of 20 particles per sample volume can be understood from the preceding section on the statistics of counting: a count of 20 has a standard deviation of about 4.5, or roughly 22 percent. Unless the sample volume is large enough, the uncertainties in the measured values are too large, and it may be impossible to tell the difference between, say, a particle concentration of 100 counts per cubic foot and one of 10 counts per cubic foot.
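The effect of sample volume can be illustrated with a rough rule of thumb: treat two concentrations as distinguishable only if the mean ± 2σ ranges of their expected counts do not overlap. This is a simplification for illustration, not the test prescribed by any standard, and `two_sigma_range` and `distinguishable` are hypothetical helpers.

```python
import math

def two_sigma_range(expected_count):
    """Mean ± 2σ range for a Poisson count, clamped at zero."""
    sd = math.sqrt(expected_count)
    return max(0.0, expected_count - 2 * sd), expected_count + 2 * sd

def distinguishable(conc_a, conc_b, volume):
    """True if the ±2σ ranges of the expected counts for two concentrations do not overlap."""
    low_a, high_a = two_sigma_range(conc_a * volume)
    low_b, high_b = two_sigma_range(conc_b * volume)
    return high_a < low_b or high_b < low_a

# 100 vs 10 counts per cubic foot at three sample volumes (cubic feet).
for volume_ft3 in (0.01, 0.2, 1.0):
    print(volume_ft3, distinguishable(100, 10, volume_ft3))
# 0.01 False
# 0.2 True
# 1.0 True
```

At 0.01 cubic feet, the expected counts are only 1 and 0.1, and the two ranges overlap; at 0.2 cubic feet, where 20 particles are expected at the higher concentration, they separate.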
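EU GMP specifies a count of zero per cubic meter for some classes. From the previous discussion, confirming this would require an infinite sample volume: even a measured count of zero carries Poisson uncertainty, and the upper confidence limit on the true concentration falls only in proportion to the volume sampled. In a sense, it is never possible to measure a count of zero and be confident that the true concentration is zero.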
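For a Poisson count, the probability of observing zero when the expected count is μ is e^(−μ), so after counting nothing the one-sided 95 percent upper limit on μ is −ln(0.05), about 3.0 particles. The Python sketch below (`zero_count_upper_limit` is a hypothetical helper) divides that by the sample volume, showing that the limit shrinks with volume but never reaches zero.

```python
import math

def zero_count_upper_limit(volume, confidence=0.95):
    """Upper confidence limit on concentration after counting zero particles in `volume`."""
    # Poisson: P(0 counts | mean mu) = exp(-mu). The largest mu still giving
    # P(0) >= 1 - confidence is mu = -ln(1 - confidence), about 3.0 at 95%.
    mu_upper = -math.log(1.0 - confidence)
    return mu_upper / volume

for v in (1, 10, 100):  # sample volumes in cubic meters
    print(v, round(zero_count_upper_limit(v), 3))
# 1 2.996
# 10 0.3
# 100 0.03
```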
Choosing the number of sample locations
Each standard has its own method for determining the number of locations to sample. Some methods require more sampling locations for cleaner classification levels (FS209 and BS5295 do; ISO does not). Again, this reduces the uncertainties in the measured particle counts by increasing the number of samples.
Ideally, the locations are on a regular grid in order to give good spatial representation of the distribution of particle counts in an area. This ideal arrangement, however, can be limited by the physical layout of an area that prevents measurements at preferred locations. Another problem that can be encountered is the operator’s inclination to avoid, often subconsciously, known problem areas.
With ISO and FS209, the calculation of the 95 percent UCL is simplified by measuring at 10 or more locations. The standard error can then be ignored, and the mean value is taken as the 95 percent UCL.
The number of locations for a general monitoring regimen is determined, as one might expect, by the goals of the monitoring, which must take into account the risks and benefits of the process as well as any need for regulatory compliance.
Although no regulations or guidelines specify the number of locations to be monitored, a good starting estimate is the square root of the floor area in square meters, rounded up to a whole number, as is done in the ISO standard.
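As a starting point, the square-root estimate is a one-liner. A minimal Python sketch; `suggested_locations` is a hypothetical helper, and rounding up to the next whole location is how we understand the 1999 edition of ISO 14644-1 to apply the rule.

```python
import math

def suggested_locations(floor_area_m2):
    """Square root of the floor area in m², rounded up to a whole number of locations."""
    if floor_area_m2 <= 0:
        raise ValueError("floor area must be positive")
    return math.ceil(math.sqrt(floor_area_m2))

for area in (10, 50, 100):
    print(area, suggested_locations(area))
# 10 4
# 50 8
# 100 10
```

A 100 m² room gives exactly 10 locations, which also avoids the UCL calculation for ISO and FS209.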
Jim Babb is the director of optical engineering at Adams Instruments. Over the past 24 years, he has been involved in the development of highly complex laser electro-optical systems and in defining metrology standards for the FDA, defense contractors, and aerospace manufacturers.
1. Hon, Keone. An Introduction to Statistics. http://www.artofproblemsolving.com/LaTeX/Examples/statistics_firstfive.pdf
2. Grinstead, Charles M., and J. Laurie Snell. Introduction to Probability. http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
3. Reichmann, W.J. Use and Abuse of Statistics. Pelican, 1964.
4. “Particle counter.” Wikipedia. http://en.wikipedia.org/wiki/Particle_counter