Data       Deviations from the Median   Absolute Deviations   Absolute Normalized
100.3       0.3                         0.3                   2.02
100.2       0.2                         0.2                   1.35
100.1       0.1                         0.1                   0.67
100         0                           0                     0
100         0                           0                     0
100         0                           0                     0
99.9       −0.1                         0.1                   0.67
99.7       −0.3                         0.3                   2.02
99.5       −0.5                         0.5                   3.37
Median =   100                          0.1
MAD =                                   0.14
n = 9
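The columns of the table above can be reproduced with a short script. This is only a sketch: the scale factor 1.483 (MAD = 1.483 × median absolute deviation) is an assumption, chosen because it is consistent with the normalized values shown in the table, and the function name `mad_table` is hypothetical.

```python
from statistics import median

def mad_table(data, scale=1.483):
    """Return the median, MAD, deviations, and normalized absolute deviations.

    `scale` is an assumed constant (1.483); it is not stated in this section.
    """
    med = median(data)
    deviations = [x - med for x in data]      # deviations from the median
    abs_dev = [abs(d) for d in deviations]    # absolute deviations
    mad = scale * median(abs_dev)             # scaled median absolute deviation
    normalized = [a / mad for a in abs_dev]   # absolute normalized deviations
    return med, mad, deviations, normalized

# The nine results listed in the table.
data = [100.3, 100.2, 100.1, 100, 100, 100, 99.9, 99.7, 99.5]
med, mad, deviations, normalized = mad_table(data)

for x, d, z in zip(data, deviations, normalized):
    print(f"{x:7.1f} {d:6.1f} {abs(d):5.1f} {z:6.2f}")
print(f"median = {med}, MAD = {mad:.4f}, n = {len(data)}")
```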
APPENDIX D: COMPARISON OF PROCEDURES—PRECISION
附录D:方法比较 - 精密度
The following example illustrates the calculation of a 90% confidence interval for the ratio of (true) variances for the purpose of comparing the precision of two procedures. It is assumed that the underlying distribution of the sample measurements is well characterized by a normal distribution. For this example, assume the laboratory will accept the alternative procedure if its precision (as measured by the variance) is no more than fourfold greater than that of the current procedure.
为了比较两种方法的精密度需要计算(真)方差比值的90%置信区间,下面的实例阐述了计算过程。需要假设样本测量值的分布本质上是良好的正态分布。对于本例,如果替代方法的精密度(以方差计算)不大于现行方法的4倍,实验室就可以接受替代方法。
To determine the appropriate sample size for precision, one possible method involves a trial and error approach using the following formula:
对于精密度实验需要确定适当的样本量,使用下列公式的试错法是一种可能的确定方法:
Power = Pr [F > (1/4)Fα, n−1, n−1]
where n is the smallest sample size required to give the desired power, which is the likelihood of correctly claiming the alternative procedure has acceptable precision when in fact the two procedures have equal precision; α is the risk of wrongly claiming the alternative procedure has acceptable precision; and 4 is the allowed upper limit for an increase in variance. F-values are found in commonly available tables of critical values of the F-distribution. Fα, n−1, n−1 is the upper α percentile of an F-distribution with n − 1 numerator and n − 1 denominator degrees of freedom; that is, the value exceeded with probability α. Suppose the laboratory initially guessed that a sample size of 11 per procedure was necessary (10 numerator and 10 denominator degrees of freedom); the power calculation would be as follows⁷:
其中n是获得预期效能所需的最小样本量,效能是指在两种方法实际上精密度相等时,正确地判定替代方法具有可接受精密度的可能性;α是错误地判定替代方法具有可接受精密度的风险;4是方差增长的允许上限。F值通常可以从F分布的临界值表中找到。Fα, n−1, n−1是分子自由度和分母自由度均为n−1的F分布的上α分位数;也就是说,被超过的概率为α的值。假定实验室最初猜测每个方法需要的样本量为11(分子及分母的自由度均为10),按下式计算效能⁷:
Pr [F > (1/4)Fα, n−1, n−1] = Pr [F > (1/4)F.05, 10, 10] = Pr [F > (2.978/4)] = 0.6751
⁷ This could be calculated using a computer spreadsheet. For example, in Microsoft® Excel the formula would be: FDIST((R/A)*FINV(alpha, n - 1, n - 1), n - 1, n - 1), where R is the ratio of variances at which to determine power (e.g., R = 1, which was the value chosen in the power calculations provided in Table 6) and A is the maximum ratio for acceptance (e.g., A = 4). Alpha is the significance level, typically 0.05.
⁷ 可以使用计算表格进行这个运算。比如在Microsoft® Excel中公式应该是:FDIST((R/A)*FINV(alpha, n-1, n-1), n-1, n-1),其中R是用于计算效能的方差比值(如:R=1,这个值可以从效能计算表6中选择),A是最大可接受值(如:A=4)。Alpha是显著性水平,通常为0.05。
In this case the power was only 68%; that is, even if the two procedures had exactly equal variances, with only 11 samples per procedure there is only a 68% chance that the experiment will lead to data that permit a conclusion of no more than a fourfold increase in variance. Most commonly, the sample size is chosen to have at least 80% power, with choices of 90% power or higher also used. To determine the appropriate sample size, various numbers can be tested until a probability is found that exceeds the acceptable limit (e.g., power >0.90). For example, the power determinations for sample sizes of 12–20 are displayed in Table 6. In this case, the initial guess at a sample size of 11 was not adequate for comparing precision, but 15 samples per procedure would provide a large enough sample size if 80% power were desired, or 20 per procedure for 90% power.
在这个例子当中效能仅为68%,这就是说,即使两个方法实际上是等方差的,当每个方法拥有11个样本时,实验仅有68%的机会得出方法没有超过4倍的结论。更常见的,样本量的选择至少能满足80%的效能,也会选择90%的效能或者更高。为了决定适当的样本量,会测试不同的数值直到结果超过了可接受的限度(如,效能>0.90)。例如,样本量为12–20时的效能测定值列在表6当中。本例中,最初的猜测样本量为11,其不具备足够的比较精度,但是每个方法15个样本就可以在预期80%效能时提供足够的样本量,或者在预期90%效能时需要20个样本。
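The trial-and-error search described above can be sketched in code. The following is a minimal, standard-library-only illustration, not the chapter's own method: it evaluates the F-distribution through the regularized incomplete beta function (continued-fraction expansion) and finds critical values by bisection. All function names (`f_sf`, `f_upper_critical`, `power`, `smallest_n`) are hypothetical, and the defaults (alpha = 0.05, maximum ratio 4, true ratio 1) are taken from the example in this Appendix.

```python
import math

def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),            # even term
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),  # odd term
        ):
            d = 1.0 + num * d
            c = 1.0 + num / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_sf(f, df1, df2):
    """Right-tail probability Pr[F > f] (same role as the spreadsheet FDIST)."""
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_upper_critical(alpha, df1, df2):
    """Value exceeded with probability alpha (same role as FINV), by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power(n, alpha=0.05, max_ratio=4.0, true_ratio=1.0):
    """Power for n samples per procedure: Pr[F > (R/A) * F(alpha, n-1, n-1)]."""
    crit = f_upper_critical(alpha, n - 1, n - 1)
    return f_sf((true_ratio / max_ratio) * crit, n - 1, n - 1)

def smallest_n(target_power, n_max=200):
    """Try n = 2, 3, ... until the power reaches the target (trial and error)."""
    for n in range(2, n_max + 1):
        if power(n) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")

for n in range(11, 21):
    print(n, round(power(n), 4))
```

Calling `smallest_n(0.80)` or `smallest_n(0.90)` automates the search; changing `max_ratio` shows how a larger allowable increase in variance reduces the required sample size.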
Table 6. Power Determinations for Various Sample Sizes (Specific to the Example in Appendix D)
Typically the sample size for precision comparisons will be larger than for accuracy comparisons. If the sample size for precision is so large as to be impractical for the laboratory to conduct the study, there are some options. The first is to reconsider the choice of an allowable increase in variance. For larger allowable increases in variance, the required sample size for a fixed power will be smaller. Another alternative is to plan an interim analysis at a smaller sample size, with the possibility of proceeding to a larger sample size if needed. In this case, it is strongly advisable to seek professional help from a statistician.
精密度比较的样本量通常会比准确度比较的大一些。如果精密度的样本量过大,以致实验室无法实际开展研究,这时可以有一些选择。第一个是重新考虑方差增加的允许值。如果对方差增加的允许值大一些,对于相同效能所需的样本量会小一些。另一个选择是计划使用较小样本量进行一次中期分析,必要时再扩大样本量。在这种情况下,强烈建议向统计学家寻求帮助。
Now, suppose the laboratory opts for 90% power and obtains the results presented in Table 7 based on the data generated from 20 independent runs per procedure.
现在,假定基于每个方法20次独立测试所获得的数据,实验室选择了90%效能,表7显示所获得的结果。
Ratio = alternative procedure variance/current procedure variance = 45.0/25.0 = 1.8
Lower limit of confidence interval = ratio/F.05 = 1.8/2.168 = 0.83
Upper limit of confidence interval = ratio/F.95 = 1.8/0.461 = 3.90
Table 7. Example of Measures of Variance for Independent Runs (Specific to the Example in Appendix D)
Procedure     Variance (standard deviation)   Sample Size   Degrees of Freedom
Alternative   45.0 (6.71)                     20            19
Current       25.0 (5.00)                     20            19
For this application, a 90% (two-sided) confidence interval is used when a 5% one-sided test is sought. The test is one-sided, because only an increase in standard deviation of the alternative procedure is of concern. Some care must be exercised in using two-sided intervals in this way, as they must have the property of equal tails; most common intervals have this property. Because the one-sided upper confidence limit, 3.90, is less than the allowed limit, 4.0, the study has demonstrated that the alternative procedure has acceptable precision. If the same results had been obtained from a study with a sample size of 15 (as if 80% power had been chosen), the laboratory would not be able to conclude that the alternative procedure had acceptable precision (upper confidence limit of 4.47).
在这种情况下,当寻求单侧5%区间时,需要使用90%(双侧)置信区间。检验是单侧的,因为只有替代方法标准偏差的增加才是需要考虑的。此时使用双侧区间需要加以小心,因为他们必须具备等尾的特性-通常区间具备这一属性。因为单侧置信上限(3.90)小于允许限度(4.0),研究显示替代方法具有可接受的精密度。如果使用样本量15的研究获得了相同的结果,假设选择了80%效能,实验室不能做出替代方法具有可接受精密度的结论(此时置信上限为4.47)。
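The confidence-limit arithmetic can be checked with a few lines of code. This sketch takes the upper 5% critical value from a table rather than computing it: 2.168 for 19 and 19 degrees of freedom is the value used in the text, while 2.484 for 14 and 14 degrees of freedom is an assumed table value (it is consistent with the quoted upper limit of 4.47 but is not stated in the text). The function name `variance_ratio_ci` is hypothetical. Because both procedures have the same degrees of freedom, the lower critical value F.95 equals 1/F.05.

```python
def variance_ratio_ci(var_alternative, var_current, f_upper_05):
    """90% two-sided (equal-tailed) interval for the ratio of variances.

    f_upper_05 is the tabulated upper 5% critical value of the F-distribution
    with equal numerator and denominator degrees of freedom.
    """
    ratio = var_alternative / var_current
    f_lower_95 = 1.0 / f_upper_05   # valid only when both df are equal
    lower = ratio / f_upper_05      # ratio / F.05
    upper = ratio / f_lower_95      # ratio / F.95
    return ratio, lower, upper

ALLOWED_LIMIT = 4.0  # maximum acceptable ratio of variances in this example

# n = 20 per procedure (19 degrees of freedom), values from Table 7.
ratio_20, lower_20, upper_20 = variance_ratio_ci(45.0, 25.0, 2.168)
print(f"n=20: ratio={ratio_20:.2f}, interval=({lower_20:.2f}, {upper_20:.2f}), "
      f"acceptable={upper_20 < ALLOWED_LIMIT}")

# n = 15 per procedure (14 degrees of freedom); 2.484 is an assumed table value.
ratio_15, lower_15, upper_15 = variance_ratio_ci(45.0, 25.0, 2.484)
print(f"n=15: ratio={ratio_15:.2f}, interval=({lower_15:.2f}, {upper_15:.2f}), "
      f"acceptable={upper_15 < ALLOWED_LIMIT}")
```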
APPENDIX E: COMPARISON OF PROCEDURES—DETERMINING THE LARGEST ACCEPTABLE DIFFERENCE, δ, BETWEEN TWO PROCEDURES
This Appendix describes one approach to determining the difference, δ, between two procedures (alternative − current), a difference that, if achieved, still leads to the conclusion of equivalence between the two procedures. Without any other prior information to guide the laboratory in the choice of δ, it is a reasonable way to proceed. Sample size calculations under various scenarios are discussed in this Appendix.
这是一种改进过的ESD检验,它可以从一个正态分布的总体当中发现预先设定数量(r)的异常值。对于仅检测1个异常值的情况,极端学生化偏离检验也就是常说的Grubb's检验。不建议将Grubb's检验用于多个异常值的检验。设定r=2,而n=10。
Tolerance Interval Determination
Suppose the process mean and the standard deviation are both unknown, but a sample of size 50 produced a mean and standard deviation of 99.5 and 2.0, respectively. These values were calculated using the last 50 results generated by this specific procedure for a particular (control) sample. Given this information, the tolerance limits can be calculated by the following formula:
x̄ ± Ks
in which x̄ is the mean; s is the standard deviation; and K is based on the level of confidence, the proportion of results to be captured in the interval, and the sample size, n. Tables providing K values are available. In this example, the value of K required to enclose 95% of the population with 95% confidence for 50 samples is 2.382⁸. The tolerance limits are calculated as follows:
99.5 ± 2.382 × 2.0
hence, the tolerance interval is (94.7, 104.3).
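As a quick check of the arithmetic, the limits can be computed directly from the sample mean, the sample standard deviation, and a tabulated K factor. This is only a sketch; the function name `tolerance_limits` is hypothetical, and K = 2.382 is simply the value quoted in the text for n = 50.

```python
def tolerance_limits(mean, std_dev, k):
    """Two-sided tolerance limits: mean - k*s and mean + k*s."""
    half_width = k * std_dev
    return mean - half_width, mean + half_width

# Values from the example: n = 50, mean 99.5, standard deviation 2.0, K = 2.382.
lower, upper = tolerance_limits(99.5, 2.0, 2.382)
print(f"tolerance interval: ({lower:.1f}, {upper:.1f})")
```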
Comparison of the Tolerance Limits to the Specification Limits
⁸ There are existing tables of tolerance factors that give approximate values and thus differ slightly from the values reported here.