1010 USP39-NF34 ANALYTICAL DATA INTERPRETATION AND TREATMENT(3)

2019-03-15 19:36

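... the difference. Conversely, even when the population being tested is typical, such results may be atypical because of an error in the analytical system. When an outlying result is obtained, a systematic laboratory investigation, and in certain cases a process investigation, of the result is conducted to determine whether an assignable cause can be established. Factors to consider when investigating an outlying result include, but are not limited to, human error, instrumentation error, calculation error, and product or component deficiency. If an assignable cause that is not related to a product or component deficiency can be identified, retesting may be performed on the same sample, if possible, or on a new sample. The precision and accuracy of the procedure, the USP Reference Standard, process trends, and the specification limits should all be examined. Based on this documented investigation, data may be found invalid and eliminated from subsequent calculations.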

If no documentable, assignable cause for the outlying laboratory result is found, the result may be tested, as part of the overall investigation, to determine whether it is an outlier.

However, careful consideration is warranted when using these tests. Two types of errors may occur with outlier tests: (a) labeling observations as outliers when they really are not; and (b) failing to identify outliers when they truly exist. Any judgment about the acceptability of data in which outliers are observed requires careful interpretation.

"Outlier labeling" is the informal recognition of suspicious laboratory values that should be further investigated with more formal methods. The selection of the correct outlier identification technique often depends on the initial recognition of the number and location of the values. Outlier labeling is most often done visually with graphical techniques. "Outlier identification" is the use of statistical significance tests to confirm that the values are inconsistent with the known or assumed statistical model.

When used appropriately, outlier tests are valuable tools for pharmaceutical laboratories. Several tests exist for detecting outliers. Examples illustrating three of these procedures, the Extreme Studentized Deviate (ESD) Test, Dixon's Test, and Hampel's Rule, are presented in Appendix C.

Choosing the appropriate outlier test will depend on the sample size and distributional assumptions. Many of these tests (e.g., the ESD Test) require the assumption that the data generated by the laboratory on the test results can be thought of as a random sample from a population that is normally distributed, possibly after transformation. If a transformation is made to the data, the outlier test is applied to the transformed data. Common transformations include taking the logarithm or square root of the data. Other approaches to handling single and multiple outliers are available and can also be used. These include tests that use robust measures of central tendency and spread, such as the median and median absolute deviation, and exploratory data analysis (EDA) methods. "Outlier accommodation" is the use of robust techniques, such as tests based on the order or rank of each data value in the data set instead of the actual data value, to produce results that are not adversely influenced by the presence of outliers. The use of such methods reduces the risks associated with both types of error in the identification of outliers.
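
The robust approach mentioned above (median and median absolute deviation) can be sketched in a few lines of Python. This is a minimal illustration of a Hampel-style rule, not a reproduction of Appendix C: the function name `hampel_outliers`, the scaling constant 1.483, the cutoff of 3.5, and the sample data are all assumptions chosen for illustration. The optional `transform` argument shows how the test is applied to transformed data (here, logarithms) rather than to the raw values.

```python
import math
from statistics import median


def hampel_outliers(values, cutoff=3.5, transform=None):
    """Flag values whose scaled distance from the median exceeds `cutoff`.

    Assumptions (illustrative, not taken from the text): the MAD is scaled by
    1.483 so that it is comparable to a standard deviation for normal data,
    and 3.5 is used as the default cutoff.
    """
    # Apply the outlier test to the transformed data when a transform is given.
    data = [transform(v) for v in values] if transform else list(values)
    center = median(data)
    abs_dev = [abs(x - center) for x in data]
    mad = median(abs_dev)
    if mad == 0:
        raise ValueError("MAD is zero; the rule cannot be applied to these data")
    scaled_mad = 1.483 * mad
    # Report the original (untransformed) values that were flagged.
    return [v for v, d in zip(values, abs_dev) if d / scaled_mad > cutoff]


# Hypothetical assay results (% of label claim); the last one looks suspicious.
results = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]

print(hampel_outliers(results))                      # test on the raw scale
print(hampel_outliers(results, transform=math.log))  # test on the log scale
```

A flagged value is only a candidate for further evaluation; the function returns it without removing anything from the data set.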

"Outlier rejection" is the actual removal of the identified outlier from the data set. However, an outlier test cannot be the sole means for removing an outlying result from the laboratory data. An outlier test may be useful as part of the evaluation of the significance of that result, along with other data. Outlier tests have no applicability in cases where the variability in the product is what is being assessed, such as content uniformity, dissolution, or release-rate determination. In these applications, a value determined to be an outlier may in fact be an accurate result of a nonuniform product. All data, especially outliers, should be kept for future review. Unusual data, when seen in the context of other historical data, are often not unusual after all but reflect the influences of additional sources of variation.

In summary, the rejection or retention of an apparent outlier can be a serious source of bias. The nature of the testing as well as scientific understanding of the manufacturing process and analytical procedure have to be considered to determine the source of the apparent outlier. An outlier test can never take the place of a thorough laboratory investigation. Rather, it is performed only when the investigation is inconclusive and no deviations in the manufacture or testing of the product were noted. Even if such statistical tests indicate that one or more values are outliers, they should still be retained in the record. Including or excluding outliers in calculations to assess conformance to acceptance criteria should be based on scientific judgment and the internal policies of the manufacturer. It is often useful to perform the calculations with and without the outliers to evaluate their impact.
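
Performing the calculations with and without the apparent outliers can be as simple as the following Python sketch. The helper names `summarize` and `impact_of_outliers` and the data are hypothetical; the full data set is kept intact and the suspected values are only set aside for the second summary.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the sample size, mean, and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def impact_of_outliers(values, suspected):
    """Summarize the data with and without the suspected outliers.

    The original list is not modified, so every result stays on record.
    """
    kept = [v for v in values if v not in suspected]
    return {"all": summarize(values), "without_suspected": summarize(kept)}


# Hypothetical assay results (% of label claim) with one suspected outlier.
results = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]

report = impact_of_outliers(results, suspected=[95.7])
for label, stats in report.items():
    print(f"{label}: n={stats['n']}, mean={stats['mean']:.2f}, sd={stats['sd']:.2f}")
```

Comparing the two summaries shows how strongly a single value drives the mean and standard deviation; whether to exclude it remains a matter of scientific judgment and internal policy.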

Outliers that are attributed to measurement process mistakes should be reported (i.e., footnoted), but not included in further statistical calculations. When assessing conformance to a particular acceptance criterion, it is important to define whether the reportable result (the result that is compared to the limits) is an average value, an individual measurement, or something else. If, for example, the acceptance criterion was derived for an average, then it would not be statistically appropriate to require individual measurements to also satisfy the criterion because the variability associated with the average of a series of measurements is smaller than that of any individual measurement.

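
The statement that an average is less variable than an individual measurement follows from a standard result. As a sketch, assume (the text does not state this explicitly) that the n individual measurements X_1, ..., X_n are independent and share a common standard deviation σ:

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}} .
\]
```

With n = 4, for example, the standard deviation of the average is half that of a single measurement, so limits derived for an average are too narrow to apply to individual values.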

COMPARISON OF ANALYTICAL PROCEDURES

It is often necessary to compare two procedures to determine if their average results or their variabilities differ by an amount that is deemed important. The goal of a procedure comparison experiment is to generate adequate data to evaluate the equivalency of the two procedures over a range of values. Some of the considerations to be made when performing such comparisons are discussed in this section.

Precision

Precision is the degree of agreement among individual test results when the analytical procedure is applied repeatedly to a homogeneous sample. For an alternative procedure to be considered to have "comparable" precision to that of a current procedure, its precision (see Analytical Performance Characteristics in <1225>, Validation) must not be worse than that of the current procedure by an amount deemed important. A decrease in precision (or increase in variability) can lead to an increase in the number of results expected to fail required specifications. On the other hand, an alternative procedure providing improved precision is acceptable.

One way of comparing the precision of two procedures is by estimating the variance for each procedure (the sample variance, s², is the square of the sample standard deviation) and calculating a one-sided upper confidence interval for the ratio of (true) variances, where the ratio is defined as the variance of the alternative procedure to that of the current procedure. An example, with this assumption, is outlined in Appendix D. The one-sided upper confidence limit should be compared to an upper limit deemed acceptable, a priori, by the analytical laboratory. If the one-sided upper confidence limit is less than this upper acceptable limit, then the precision of the alternative procedure is considered acceptable in the sense that the use of the alternative procedure will not lead to an important loss in precision. Note that if the one-sided upper confidence limit is less than one, then the alternative procedure has been shown to have improved precision relative to the current procedure.
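
The upper confidence limit for the variance ratio can be sketched as follows. This is an illustration under stated assumptions, not the calculation from Appendix D: it assumes independent, normally distributed results for each procedure; the function name, the data, and the acceptable limit of 4.0 are hypothetical; and the F quantile is passed in as a value read from a table (3.18 is the tabulated 0.95 quantile for 9 and 9 degrees of freedom).

```python
from statistics import variance


def variance_ratio_upper_limit(alternative, current, f_quantile):
    """One-sided upper confidence limit for var(alternative) / var(current).

    `f_quantile` is the (1 - alpha) quantile of the F distribution with
    (len(current) - 1, len(alternative) - 1) degrees of freedom, taken from
    a table. Multiplying the observed ratio by it gives the upper limit.
    """
    observed_ratio = variance(alternative) / variance(current)
    return observed_ratio * f_quantile


# Hypothetical results from 10 determinations with each procedure.
current = [99.8, 100.2, 100.1, 99.7, 100.3, 99.9, 100.0, 100.4, 99.6, 100.0]
alternative = [99.9, 100.1, 100.2, 99.8, 100.1, 100.0, 99.9, 100.3, 99.7, 100.0]

F_095_9_9 = 3.18          # tabulated F quantile, alpha = 0.05, df = (9, 9)
ACCEPTABLE_LIMIT = 4.0    # hypothetical limit, set a priori by the laboratory

upper = variance_ratio_upper_limit(alternative, current, F_095_9_9)
print(f"upper confidence limit for the variance ratio: {upper:.2f}")
print("precision acceptable:", upper < ACCEPTABLE_LIMIT)
print("precision shown to be improved:", upper < 1.0)
```

The acceptable limit has to be fixed before the data are examined; choosing it afterwards would defeat the purpose of the comparison.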

The confidence interval method just described is preferred to applying the two-sample F-test to test the statistical significance of the ratio of variances. To perform the two-sample F-test, the calculated ratio of sample variances would be compared to a critical value based on tabulated values of the F distribution for the desired level of confidence and the number of degrees of freedom for each variance. Tables providing F-values are available in most standard statistical textbooks. If the calculated ratio exceeds this critical value, a statistically significant difference in precision is said to exist between the two procedures. However, if the calculated ratio is less than the critical value, this does not prove that the procedures have the same or equivalent level of precision; but rather that there was not enough evidence to prove that a statistically significant difference did, in fact, exist.

Accuracy

Comparison of the accuracy (see Analytical Performance Characteristics in <1225>, Validation) of procedures provides information useful in determining if the new procedure is equivalent, on the average, to the current procedure. A simple method for making this comparison is by calculating a confidence interval for the difference in true means, where the difference is estimated by the sample mean of the alternative procedure minus that of the current procedure.

The confidence interval should be compared to a lower and upper range deemed acceptable, a priori, by the laboratory. If the confidence interval falls entirely within this acceptable range, then the two procedures can be considered equivalent, in the sense that the average difference between them is not of practical concern. The lower and upper limits of the confidence interval only show how large the true difference between the two procedures may be, not whether this difference is considered tolerable. Such an assessment can be made only within the appropriate scientific context. This approach is often referred to as TOST (two one-sided tests; see Appendix F).

The confidence interval method just described is preferred to the practice of applying a t-test to test the statistical significance of the difference in averages. One way to perform the t-test is to calculate the confidence interval and to examine whether or not it contains the value zero. The two procedures have a statistically significant difference in averages if the confidence interval excludes zero. A statistically significant difference may not be large enough to have practical importance to the laboratory because it may have arisen as a result of highly precise data or a larger sample size. On the other hand, it is possible that no statistically significant difference is found, which happens when the confidence interval includes zero, and yet an important practical difference cannot be ruled out. This might occur, for example, if the data are highly variable or the sample size is too small. Thus, while the outcome of the t-test indicates whether or not a statistically significant difference has been observed, it is not informative with regard to the presence or absence of a difference of practical importance.
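
The contrast between the two readings of the same confidence interval can be sketched in Python. This is an illustration, not the procedure from Appendix F: it assumes independent, normally distributed results with equal variances (pooled standard deviation); the function names, the data, and the acceptance range of ±0.5 are hypothetical; and the t quantile is passed in from a table (1.734 is the tabulated 0.95 quantile for 18 degrees of freedom, which gives a 90% two-sided interval).

```python
import math
from statistics import mean, variance


def mean_difference_ci(alternative, current, t_quantile):
    """Confidence interval for mean(alternative) - mean(current).

    Uses a pooled variance (equal-variance assumption). `t_quantile` is read
    from a t table for len(alternative) + len(current) - 2 degrees of freedom.
    """
    n_a, n_c = len(alternative), len(current)
    pooled_var = ((n_a - 1) * variance(alternative)
                  + (n_c - 1) * variance(current)) / (n_a + n_c - 2)
    half_width = t_quantile * math.sqrt(pooled_var * (1 / n_a + 1 / n_c))
    diff = mean(alternative) - mean(current)
    return diff - half_width, diff + half_width


def is_equivalent(ci, lower_limit, upper_limit):
    """Equivalence reading: the whole interval lies inside the acceptable range."""
    low, high = ci
    return lower_limit < low and high < upper_limit


def is_significant(ci):
    """t-test reading: the interval excludes zero."""
    low, high = ci
    return not (low <= 0.0 <= high)


# Hypothetical results from 10 determinations with each procedure.
current = [99.8, 100.2, 100.1, 99.7, 100.3, 99.9, 100.0, 100.4, 99.6, 100.0]
alternative = [100.0, 100.3, 100.2, 99.9, 100.4, 100.1, 100.0, 100.5, 99.8, 100.2]

T_095_18 = 1.734  # tabulated t quantile, df = 18
ci = mean_difference_ci(alternative, current, T_095_18)
print(f"confidence interval for the difference in means: ({ci[0]:.3f}, {ci[1]:.3f})")
print("equivalent within +/-0.5:", is_equivalent(ci, -0.5, 0.5))
print("statistically significant difference:", is_significant(ci))
```

The two functions answer different questions: `is_significant` only reports whether zero is excluded, while `is_equivalent` compares the interval with limits that the laboratory considers tolerable.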
