Applied Stochastic Models in Business and Industry
The use of analysis of variance procedures in biological studies
Byron K. Williams
Volume: 3
Year: 1987
Language: English
Pages: 20
DOI: 10.1002/asm.3150030403
APPLIED STOCHASTIC MODELS AND DATA ANALYSIS, VOL. 3, 207-226 (1987)
Statistical validation
Biometrics, medicine and health care

THE USE OF ANALYSIS OF VARIANCE PROCEDURES IN BIOLOGICAL STUDIES

BYRON K. WILLIAMS
U.S. Fish and Wildlife Service, Patuxent Wildlife Research Center, Laurel, MD 20708, U.S.A.

SUMMARY

The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.

KEY WORDS ANOVA Unbalanced Parametrization Missing cells Hypothesis Analysis of variance

INTRODUCTION

Analysis of variance (ANOVA) procedures are among the most widely used techniques in biological science for the analysis of data. Their extensive use in both experimental and sampling studies suggests that ANOVA is a proven, thoroughly understood and readily interpretable methodology. Yet one quickly finds, in both the consulting forum and the applications literature, that there are substantial differences of opinion among researchers about its use and interpretation.
In part this is because of a highly specialized statistical language, with such terms as orthogonal contrasts, generalized inverses, estimable functions and other technical jargon. In part it is because of the inherent complexity of the statistical subject matter. And in part it is because of a multiplicity of different approaches to ANOVA, each utilizing testing procedures that are specific to a different statistical model.

Procedural difficulties arise whenever sampling designs are unbalanced, i.e., when the number of observations varies across levels of the design factors. A lack of balance generally occurs for one of three reasons. First, it may be a deliberate feature of the design, wherein comparisons among certain factor combinations are emphasized over others. Cost considerations, sample availability, varying requirements for power and other factors can result in such 'planned imbalance'. Second, it may result from sample losses that are attributable to random factors beyond the scope of the design. Data of this sort may be thought of as 'a sample of samples', to which the ANOVA is applicable without violation of assumptions. Third, imbalance may result from a loss of samples that is related to the design itself, for example when certain treatment combinations induce death and others do not. In this case biological information resides in the pattern of cell frequencies as well as the pattern of response measurements. It is often of value to highlight the former by some method of categorical data analysis.

8755-0024/87/040207-20$10.00 © 1987 by John Wiley & Sons, Ltd. Received March 1987

Statistical problems occur with unbalanced data that are not present in balanced designs. For example, with unbalanced data each of several standard computing procedures for ANOVA can result in a different statistical test with a corresponding distinct hypothesis.
This problem is encountered frequently in biological studies, where unbalanced designs are far more common than balanced designs. Uncontrolled metabolic changes in sample organisms, unavailability of rare samples, greatly differing costs associated with different treatments, accidental losses, sample contaminations and a host of other exigencies often prevent the collection of data in equal sample sizes. Consequently researchers who analyse their data with standard textbook procedures, which typically are appropriate only for balanced designs, may get inappropriate or confusing results. This is especially apparent in the unsettling but not uncommon situation wherein design effects are significant according to one testing procedure but non-significant according to another. This kind of ambiguity, and the confusion it engenders, has been discussed by many researchers in the recent statistical literature.

Most of the commonly used computing procedures for fixed effects designs are based, at least implicitly, on one of three statistical models. These are the cell means model (CMM), for which a single parameter is identified for each cell in the design; the fully parametrized model (FPM), with 'main effect' as well as 'interaction' parameters; and the restricted parametrization model (RPM), in which main effect and interaction parameters satisfy constraints known as the 'Σ-restrictions'. Each model is discussed in some detail below, as are certain associations among their parametrizations, hypotheses and testing procedures. It is important to recognize these associations, because the hypotheses addressed by most computing programs are unstated, specific to the implied statistical model and not necessarily the hypotheses of interest to the researcher. These points are elaborated below and procedures are suggested for ensuring that questions addressed in an ANOVA are those that are intended by the researcher.
In the following sections the statistical models mentioned above are described, and methods for estimating their parameters are briefly mentioned. Then some associations of parameters among the different models are discussed, followed by a description of certain relevant hypotheses and the procedures by which they are tested. Finally, I address the problems that arise when not all factor combinations are represented in the data and make recommendations concerning appropriate testing procedures.

Although the focus here is on crossed fixed effects designs and the standard hypotheses arising therefrom, an elegant and quite comprehensive theory for ANOVA is extensively documented elsewhere in the statistical literature. This theory includes considerations on design connectedness, random effects and mixed models, the use of covariates in ANOVA, the nesting of design effects, the design and analysis of specialized hypotheses and a host of other special topics. My purpose here is neither to reproduce these results nor to provide a review of currently available statistical software for ANOVA. It is rather to use simple but easily generalized statistical models to emphasize the importance of a formal specification of biological hypotheses and to highlight the need for careful interpretation of statistical inferences.

PARAMETRIZATIONS FOR A TWO-FACTOR DESIGN MODEL

Parametric expressions for the three models are given in this section. Non-matrix forms are used in the development below, with corresponding matrix forms shown in Appendix I. Throughout the development I focus on the two-factor crossed design with fixed effects and unequal numbers of samples per cell. Thus factor A is assumed to have $a$ levels, factor B has $b$ levels and the cell corresponding to factor levels $i$ and $j$ contains $n_{ij}$ observations.
If all factor combinations are represented in the data, the number $p$ of such factor combinations is simply the product $ab$.

As a biological example we consider the weight of body fat in animals subjected to three diets. Observation units are the individual animals, presumably sampled randomly from a larger population. The response variable is the weight after some predetermined period over which diets are imposed. Factor A might represent sex (two levels) and factor B corresponds to the diets (three levels). A total of six statistical populations is defined, each characterized by specific levels of A and B. Cell means for the 2 x 3 layout, with corresponding sample numbers, are shown in Table I. This layout is used for illustration throughout the development below.

Table I. Cell means and sample sizes for a 2 x 3 crossed design. Row and column totals of cell means are given by $\mu_{i.} = \sum_j \mu_{ij}$ and $\mu_{.j} = \sum_i \mu_{ij}$ respectively. Row and column totals of sample sizes are given by $n_{i.} = \sum_j n_{ij}$ and $n_{.j} = \sum_i n_{ij}$ respectively

Population means (sample sizes)
            B = 1                   B = 2                   B = 3                   Total
A = 1       $\mu_{11}$ ($n_{11}$)   $\mu_{12}$ ($n_{12}$)   $\mu_{13}$ ($n_{13}$)   $\mu_{1.}$ ($n_{1.}$)
A = 2       $\mu_{21}$ ($n_{21}$)   $\mu_{22}$ ($n_{22}$)   $\mu_{23}$ ($n_{23}$)   $\mu_{2.}$ ($n_{2.}$)
Total       $\mu_{.1}$ ($n_{.1}$)   $\mu_{.2}$ ($n_{.2}$)   $\mu_{.3}$ ($n_{.3}$)

Cell means model

Perhaps the simplest expression for the two-factor design is the cell means model

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$$

where $y_{ijk}$ is the $k$th observation for cell $(i, j)$ in the layout, $\mu_{ij}$ is the population mean for cell $(i, j)$ and $\varepsilon_{ijk}$ is a random error term associated with $y_{ijk}$. The model is so named because it describes cell means in terms of individual parameters, one for each cell. Errors are typically assumed to be independently, normally distributed with a mean of 0 and an unknown variance $\sigma^2$. The general matrix form of the CMM is displayed in Appendix I.

If no constraints are imposed on the cell means, the ordinary least squares (OLS) estimation procedure⁶ yields the intuitive estimates

$$\hat{\mu}_{ij} = \bar{y}_{ij.} = \sum_k y_{ijk} / n_{ij} \qquad (1)$$

and

$$s^2 = \sum_{i,j,k} (y_{ijk} - \bar{y}_{ij.})^2 / (n - p) \qquad (2)$$

for cell means $\mu_{ij}$ and common variance $\sigma^2$ respectively. The divisor $n - p$ is simply the number of observations reduced by the number of parameters.
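Equations (1) and (2) can be sketched in a few lines of Python. The data below are hypothetical values invented for illustration (they are not the paper's Table II), and the function name `cell_means_fit` is likewise an assumption of this sketch.

```python
# Hypothetical unbalanced 2 x 3 data set (NOT the paper's Table II).
# Keys are (level of A, level of B); values are the observations in that cell.
data = {
    (1, 1): [18.0, 22.0, 20.0],
    (1, 2): [61.0, 59.0],
    (1, 3): [78.0, 82.0, 80.0, 80.0],
    (2, 1): [79.0, 81.0],
    (2, 2): [58.0, 62.0, 60.0],
    (2, 3): [19.0, 21.0],
}


def cell_means_fit(cells):
    """Unconstrained OLS fit of the cell means model.

    Returns the estimated cell means (equation (1)) and the pooled
    variance estimate with divisor n - p (equation (2)).
    """
    means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}
    n = sum(len(ys) for ys in cells.values())  # total number of observations
    p = len(cells)                             # number of filled cells
    sse = sum((y - means[cell]) ** 2
              for cell, ys in cells.items() for y in ys)
    return means, sse / (n - p)


means, s2 = cell_means_fit(data)
for cell in sorted(means):
    print(cell, means[cell])
print("s^2 =", s2)
```

With these invented observations the residual sum of squares is 30 on 16 - 6 = 10 degrees of freedom, so the sketch reports a pooled variance of 3.0.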
As an example consider the artificial data displayed in Table II for body fat in a 2 x 3 layout as described above. Sample sizes for the six statistical populations vary from two observations of males under treatments 1 and 2 to six observations for females under treatment 3. The matrix form of the CMM, as applied to these data, is shown in Appendix I. A straightforward application of equations (1) and (2) yields the estimates $(\hat{\mu}_{11}, \hat{\mu}_{12}, \hat{\mu}_{13}, \hat{\mu}_{21}, \hat{\mu}_{22}, \hat{\mu}_{23}) = (20, 60, 80, 80, 60, 20)$ and $s^2 = 9.6$.

Table II. Artificial data set for a diet study involving three diets and two sexes in a simple crossed design. Six populations are defined, with varying numbers of observations from each population

Fully parametrized model

An unconstrained parametrization

$$y_{ijk} = \mu + \alpha_i + \beta_j + \Gamma_{ij} + \varepsilon_{ijk}$$

is perhaps the most familiar form for a two-factor design model. In this form $\alpha_i$ expresses an 'overall effect' for level $i$ of factor A, $\beta_j$ expresses an analogous effect for level $j$ of factor B and $\Gamma_{ij}$ expresses an interaction between the two factors. The meaning of these parameters is discussed below. The general matrix form of the FPM is shown in Appendix I along with an application to the data set in Table II.

Though at most $ab$ parameters are required to specify cell means for a two-factor crossed design, the FPM contains $(a + 1)(b + 1)$ parameters. Thus it is said to be 'overparametrized' or 'less than full rank'. An important consequence is that unique OLS estimators of the model parameters cannot be obtained. However, it is possible to uniquely estimate certain linear combinations of parameters, namely those that can be expressed as linear combinations of cell means. Criteria for estimability can also be given in terms of the FPM parameters themselves.⁶ Since individual parameters cannot be estimated, tests of hypotheses about them cannot be obtained.
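The consequence of overparametrization can be made concrete with a small sketch. Two different sets of FPM parameters, both chosen arbitrarily for illustration, reproduce exactly the same six cell means (here the estimates 20, 60, 80, 80, 60, 20 quoted above), while an individual quantity such as $\alpha_1 - \alpha_2$ takes a different value in each set. The helper names are assumptions of the sketch.

```python
# Cell means to be reproduced: the CMM estimates quoted in the text.
TARGET = [[20.0, 60.0, 80.0],
          [80.0, 60.0, 20.0]]


def fpm_cell_means(mu, alpha, beta, gamma):
    """Cell means implied by FPM parameters: mu + alpha_i + beta_j + Gamma_ij."""
    return [[mu + alpha[i] + beta[j] + gamma[i][j]
             for j in range(len(beta))]
            for i in range(len(alpha))]


def fill_interactions(mu, alpha, beta, target):
    """Choose Gamma_ij so that arbitrary mu, alpha, beta reproduce the target means."""
    return [[target[i][j] - mu - alpha[i] - beta[j]
             for j in range(len(beta))]
            for i in range(len(alpha))]


# Parameter set 1: all structure carried by the interaction terms.
mu_1, alpha_1, beta_1 = 0.0, [0.0, 0.0], [0.0, 0.0, 0.0]
gamma_1 = fill_interactions(mu_1, alpha_1, beta_1, TARGET)

# Parameter set 2: arbitrary non-zero main-effect values.
mu_2, alpha_2, beta_2 = 50.0, [10.0, -10.0], [5.0, -5.0, 0.0]
gamma_2 = fill_interactions(mu_2, alpha_2, beta_2, TARGET)

means_1 = fpm_cell_means(mu_1, alpha_1, beta_1, gamma_1)
means_2 = fpm_cell_means(mu_2, alpha_2, beta_2, gamma_2)

a, b = 2, 3
print("FPM parameters:", (a + 1) * (b + 1), "cell means:", a * b)
print("same cell means:", means_1 == means_2 == TARGET)
print("alpha_1 - alpha_2 under set 1:", alpha_1[0] - alpha_1[1])
print("alpha_1 - alpha_2 under set 2:", alpha_2[0] - alpha_2[1])
```

Because both parameter sets fit any data set equally well, the data cannot distinguish between them; only functions of the cell means are determined.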
This limitation on estimability, along with attendant difficulties in interpreting (and explaining) the corresponding statistical tests, is a major drawback to the use of the FPM.

Restricted parametrization model

A third commonly used model for two-factor designs is the parametrization

$$y_{ijk} = \mu^* + \alpha_i^* + \beta_j^* + \Gamma_{ij}^* + \varepsilon_{ijk}$$

where the $\alpha^*$, $\beta^*$ and $\Gamma^*$ are constrained by the 'Σ-restrictions'

$$\sum_i \alpha_i^* = 0 \qquad \sum_j \beta_j^* = 0 \qquad \sum_i \Gamma_{ij}^* = \sum_j \Gamma_{ij}^* = 0$$

This model, which is merely a special case of the constrained general linear model, is emphasized here because of its common use in statistical computing procedures.

The practical effect of the Σ-restrictions is that redundancies in the parameter structure can be eliminated, so that unique OLS estimates of model parameters can be obtained. For a 2 x 3 design, for example, the Σ-restrictions yield

$$\alpha_2^* = -\alpha_1^* \qquad \beta_3^* = -\beta_1^* - \beta_2^*$$
$$\Gamma_{13}^* = -\Gamma_{11}^* - \Gamma_{12}^* \qquad \Gamma_{21}^* = -\Gamma_{11}^* \qquad \Gamma_{22}^* = -\Gamma_{12}^* \qquad \Gamma_{23}^* = \Gamma_{11}^* + \Gamma_{12}^*$$

From these equations it can be seen that there is only one independent $\alpha^*$, only two independent $\beta^*$ and only two independent $\Gamma^*$. Thus the Σ-restrictions reduce to six the number of parameters required to specify the model: once $\mu^*$, $\alpha_1^*$, $\beta_1^*$, $\beta_2^*$, $\Gamma_{11}^*$ and $\Gamma_{12}^*$ have been estimated, the estimates for the remaining parameters are obtained directly from them.

This parameter reduction is an important attribute in the use of the restricted parametrization model. It enables one to specify the model in such a way that the usual methods of ordinary least-squares regression produce unique estimates of parameters. It also results in parameters that can be meaningfully interpreted. This point is argued in greater detail below.

The general matrix form for the RPM is shown in Appendix I along with OLS computing formulae for the parameter estimates. The matrix format for the model is illustrated with data in Table II. Using the computing formulae, parameter estimates for these data are $(\hat{\mu}^*, \hat{\alpha}_1^*, \hat{\beta}_1^*, \hat{\beta}_2^*, \hat{\Gamma}_{11}^*, \hat{\Gamma}_{12}^*) = (53.3, 0, -3.3, 6.7, -30, 0)$ and $s^2 = 9.6$.
It is easy to show that appropriate combinations of these values lead to the same estimates of cell means as those given by the CMM.

A key assumption in ANOVA procedures is that observation variances are all identical. There are at least three circumstances in which this assumption can be violated. First, there may be a mathematical relationship

$$\sigma_{ij}^2 = f(\mu_{ij})$$

between population means and their corresponding variances. It is not uncommon, for example, for variation in organism sizes to be proportional to mean organism size. If such a relationship between mean and variance can be ascertained, a variance-stabilizing transformation of the data can produce unbiased estimation and testing procedures.

Second, heterogeneous variances can be introduced by way of subsampling. If subsamples of a sampling unit are averaged to produce a response value for the sampling unit, the corresponding variance is inversely proportional to subsampling intensity:

$$\mathrm{var}(x_{ijk}) = \sigma^2 / k_{ijk}$$

where $k_{ijk}$ is the number of subsamples used in $x_{ijk}$. A simple corrective for non-constant subsampling is to weight each sampling unit by the number of subsamples included in it and then proceed with the standard estimation and testing procedures.

Third, variance heterogeneity can be introduced in the choice of the sampling frame itself. This occurs, for example, in comparisons of taxa with greatly varying taxonomic diversity, in contrasts across areas of greatly varying geographic extent, in studies involving widely varying amounts of environmental fluctuations and so on. A general expression for such heterogeneity is

$$\sigma_{ij}^2 = w_{ij} \sigma^2$$

where $w_{ij}$ expresses the relative variance for population $(i, j)$. On the assumption that this variance heterogeneity can be ascertained, an appropriate procedure is to weight the cell means in equations (1) and (2) using the terms $w_{ij}$.
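The subsampling correction described above can be sketched as a weighted version of equations (1) and (2), in which each sampling unit is weighted by its number of subsamples. The data, the two-cell layout and the function name `weighted_cell_fit` are hypothetical; the sketch assumes $\mathrm{var}(x_{ijk}) = \sigma^2 / k_{ijk}$ exactly as stated in the text.

```python
# Hypothetical data: each sampling unit is (x, k), where x is the average of
# k subsamples. Under var(x) = sigma^2 / k the natural weight for x is k.
cells = {
    (1, 1): [(10.0, 1), (14.0, 3)],
    (1, 2): [(20.0, 2), (26.0, 2), (23.0, 4)],
}


def weighted_cell_fit(cells):
    """Weighted least-squares cell means and variance estimate.

    Each unit is weighted by its subsample count k; the weighted residual
    sum of squares divided by n - p estimates the per-subsample variance.
    """
    means = {}
    weighted_sse = 0.0
    n_units = 0
    for cell, units in cells.items():
        total_k = sum(k for _, k in units)
        mean = sum(k * x for x, k in units) / total_k
        means[cell] = mean
        weighted_sse += sum(k * (x - mean) ** 2 for x, k in units)
        n_units += len(units)
    return means, weighted_sse / (n_units - len(cells))


w_means, w_s2 = weighted_cell_fit(cells)
print(w_means)          # weighted cell means
print("s^2 =", w_s2)    # estimate of sigma^2
```

For the first cell the weighted mean is 13.0 rather than the unweighted 12.0, because the unit built from three subsamples counts three times as much as the unit built from one.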
There are in fact many sources of variability in biological studies that can result in non-constant variances among sampling units or populations. It is usually wise to test for their occurrence at the outset of an analysis of variance (see, e.g., Brown and Forsythe¹⁰ and Milliken and Johnson for testing procedures).

ASSOCIATIONS AMONG MODELS

Though the three models described above have different mathematical expressions, in fact they are merely alternative representations of the same set of statistical populations. Because the same cell mean can be expressed in three different ways, certain equivalences exist among the parametrizations. For example, the RPM and CMM are related by

$$\mu_{ij} = \mu^* + \alpha_i^* + \beta_j^* + \Gamma_{ij}^*$$

For a 2 x 3 design these equations can be expressed in matrix form by

$$
\begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 0 & 1 & 0 & -1 \\
1 & -1 & -1 & -1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} \mu^* \\ \alpha_1^* \\ \beta_1^* \\ \beta_2^* \\ \Gamma_{11}^* \\ \Gamma_{12}^* \end{pmatrix}
$$

On condition that all factor combinations are included in the design, this reduces to

$$\mu^* = \bar{\mu}_{..} \qquad \alpha_i^* = \bar{\mu}_{i.} - \bar{\mu}_{..} \qquad \beta_j^* = \bar{\mu}_{.j} - \bar{\mu}_{..} \qquad \Gamma_{ij}^* = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..} \qquad (3)$$

(see Appendix I). The relationships expressed in equations (3) provide reasonable interpretation for the parameters in the RPM. Thus the 'main effect' parameters $\alpha_i^*$ and $\beta_j^*$ correspond to row and column averages $\bar{\mu}_{i.}$ and $\bar{\mu}_{.j}$, whereas the interaction $\mu_{ij} - \mu_{ij'} - \mu_{i'j} + \mu_{i'j'}$ is equivalent to $\Gamma_{ij}^* - \Gamma_{ij'}^* - \Gamma_{i'j}^* + \Gamma_{i'j'}^*$. For example, in the two-factor design involving diet and sex factors, the main effect $\alpha_i^*$ for sex $i$ corresponds to the average $(\mu_{i1} + \mu_{i2} + \mu_{i3})/3$ across all three diets. Conversely, $\beta_j^*$ corresponds to the average $(\mu_{1j} + \mu_{2j})/2$ for diet $j$ across both sexes. Finally, contrasts among the $\Gamma_{ij}^*$ are equivalent to contrasts among cell means for the corresponding levels of diet and sex.

In a similar fashion the association between parameters in the RPM and the FPM is given by

$$\mu^* + \alpha_i^* + \beta_j^* + \Gamma_{ij}^* = \mu + \alpha_i + \beta_j + \Gamma_{ij}$$

which, if there are no missing cells, reduces to the simple expressions

$$\mu^* = \mu + \bar{\alpha}_. + \bar{\beta}_. + \bar{\Gamma}_{..}$$
$$\alpha_i^* = \alpha_i + \bar{\Gamma}_{i.} - (\bar{\alpha}_. + \bar{\Gamma}_{..})$$
$$\beta_j^* = \beta_j + \bar{\Gamma}_{.j} - (\bar{\beta}_. + \bar{\Gamma}_{..})$$
$$\Gamma_{ij}^* = \Gamma_{ij} - \bar{\Gamma}_{i.} - \bar{\Gamma}_{.j} + \bar{\Gamma}_{..} \qquad (4)$$

These relationships make explicit the fact that row and column averages, as represented by $\alpha_i^*$ and $\beta_j^*$, correspond to combinations of main effect and interaction parameters in the FPM. They also indicate a one-to-one relationship between interaction parameters $\Gamma_{ij}$ and $\Gamma_{ij}^*$ in the two models.

Finally, the association between parameters in the CMM and FPM is given by

$$\mu_{ij} = \mu + \alpha_i + \beta_j + \Gamma_{ij} \qquad (5)$$

Because the FPM is overparametrized, there is no unique representation of its parameters in terms of the cell means. Indeed, infinitely many combinations of FPM parameters can satisfy equation (5). This lack of uniqueness is a direct consequence of overparametrization.

Though equivalences exist among some commonly tested hypotheses, as indicated by equations (3)-(5), the CMM nevertheless has certain practical advantages over the FPM and RPM. It has the simplest mathematical form, with single parameters for each of the cell means. Any linear combination of these parameters is estimable, including individual cell means. If there are no model restrictions, the estimates themselves have an intuitive form and are easily computed. Finally, the CMM is the simplest model with which to express biologically meaningful hypotheses.

A comparison of hypotheses and testing procedures for the three models is made in the following sections, in part because all three are extensively used in statistical computing programs. It is emphasized that since each model represents the same two-factor design, they all must be related through their mathematical and statistical properties. In effect, information about the structure of cell means is neither lost nor gained in the choice among these models. As shown below, however, that choice can inadvertently result in tests of hypotheses that are quite different and indeed often quite unintended.
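The correspondence between the CMM and the RPM for a design with all cells filled can be checked numerically. The sketch below applies the averaging relations of equations (3) to the cell mean estimates quoted earlier (20, 60, 80, 80, 60, 20) and then rebuilds the cell means from the resulting parameters. The function names are assumptions of the sketch, not part of the paper.

```python
def rpm_from_cell_means(mu):
    """Map a full table of cell means to RPM parameters via equations (3)."""
    a, b = len(mu), len(mu[0])
    grand = sum(sum(row) for row in mu) / (a * b)            # mu*
    row_avg = [sum(row) / b for row in mu]
    col_avg = [sum(mu[i][j] for i in range(a)) / a for j in range(b)]
    alpha = [r - grand for r in row_avg]                      # alpha_i*
    beta = [c - grand for c in col_avg]                       # beta_j*
    gamma = [[mu[i][j] - row_avg[i] - col_avg[j] + grand      # Gamma_ij*
              for j in range(b)] for i in range(a)]
    return grand, alpha, beta, gamma


def cell_means_from_rpm(grand, alpha, beta, gamma):
    """Rebuild cell means as mu* + alpha_i* + beta_j* + Gamma_ij*."""
    return [[grand + alpha[i] + beta[j] + gamma[i][j]
             for j in range(len(beta))] for i in range(len(alpha))]


cell_means = [[20.0, 60.0, 80.0],
              [80.0, 60.0, 20.0]]
grand, alpha, beta, gamma = rpm_from_cell_means(cell_means)

print(round(grand, 1), round(alpha[0], 1), round(beta[0], 1),
      round(beta[1], 1), round(gamma[0][0], 1), round(gamma[0][1], 1))

rebuilt = cell_means_from_rpm(grand, alpha, beta, gamma)
```

To one decimal place the six independent values agree with the RPM estimates reported in the text (53.3, 0, -3.3, 6.7, -30, 0), the remaining parameters satisfy the Σ-restrictions by construction, and the rebuilt table returns the original cell means.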
CONSTRAINTS ON MODEL PARAMETERS

In many biological studies the appropriate ANOVA model for a particular design includes constraints on the model parameters. Such constraints typically arise in one of two ways. First, limitations on time, manpower or some other factor involving the sampling procedure can result in restrictions on the assignment of design factors to sampling units. These restrictions in turn can lead to constraints on model parameters. A simple example is a randomized block design wherein treatments are randomly allocated within blocks. This leads to a standard ANOVA for which there is assumed to be no block x treatment interaction.

The second way constraints arise is by way of information outside the study. In many cases, for example, it is reasonable, based on scientific evidence, to assume that two factors in a completely randomized factorial design have no interaction. The corresponding model includes parameter constraints reflecting this assumption.

In general, constraints are specified by a set of linear equations involving model parameters. As an example, the assumption of no interaction in the 2 x 3 layout corresponds to the pair of constraint equations

$$\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0$$
$$\mu_{12} - \mu_{13} - \mu_{22} + \mu_{23} = 0$$

The effect of such constraints is that not all parameters can be estimated independently. For example, the interaction constraints above can be rewritten as

$$\mu_{21} = \mu_{11} - \mu_{12} + \mu_{22}$$
$$\mu_{13} = \mu_{12} - \mu_{22} + \mu_{23}$$

from which it is readily seen that knowledge of $\mu_{11}$, $\mu_{12}$, $\mu_{22}$ and $\mu_{23}$ is tantamount to knowledge of the full set of six parameters. Appendix II exhibits general matrix forms for model constraints and describes a procedure whereby such constraints can be incorporated into a model to produce the appropriate estimates.

Constraints on the cell means correspond, through equations (3) and (5), to constraints on the fully parametrized and restricted parametrization models.
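The two no-interaction constraint equations for the 2 x 3 layout, and their rewritten form, can be sketched as follows. The four input means passed to `complete_under_no_interaction` are illustrative values only, and both function names are assumptions of the sketch.

```python
def interaction_contrasts(mu):
    """The two no-interaction contrasts for a 2 x 3 table of cell means."""
    first = mu[0][0] - mu[0][1] - mu[1][0] + mu[1][1]
    second = mu[0][1] - mu[0][2] - mu[1][1] + mu[1][2]
    return first, second


def complete_under_no_interaction(mu11, mu12, mu22, mu23):
    """Recover all six cell means from four of them when both contrasts are zero."""
    mu21 = mu11 - mu12 + mu22
    mu13 = mu12 - mu22 + mu23
    return [[mu11, mu12, mu13],
            [mu21, mu22, mu23]]


# Unconstrained cell mean estimates quoted in the text: both contrasts are
# far from zero, so these estimates do not satisfy the constraints.
unconstrained = [[20, 60, 80],
                 [80, 60, 20]]
print(interaction_contrasts(unconstrained))

# Four illustrative means determine the remaining two under the constraints.
table = complete_under_no_interaction(65, 69, 54, 38)
print(table)
print(interaction_contrasts(table))
```

The sketch only manipulates the constraint equations; it does not perform the constrained least-squares fit, which also depends on the cell sample sizes.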
Those of immediate concern here are the 'no-interaction' constraints

$$\mu_{ij} - \mu_{ij'} - \mu_{i'j} + \mu_{i'j'} = 0 \qquad (6)$$

which can be shown to correspond to analogous constraints on the interaction parameters in the FPM and RPM respectively.

The effect of constraints can be illustrated with the data set in Table II. Imposition of the 'no-interaction' constraints (6) results in the estimates $\hat{\mu}' = (65, 69, 53, 50, 54, 38)$ and $s^2 = 751$ for the cell means and variance. This estimate of variance is greatly inflated over that of the unconstrained model ($s^2 = 9.6$). In addition, the estimates for cell means have changed substantially. From this example it should be clear that the use of such constraints can strongly influence the biological inferences made from an ANOVA. Thus the researcher should impose constraints only when the design, subject matter or data analysis suggests it is appropriate to do so.

THE TESTING OF HYPOTHESES

A key component of the conventional scientific method is the testing of hypotheses with experimental or field data. Indeed, the comparison of theoretically interesting hypotheses with a body of relevant observations is a recognized characteristic of all biological sciences. The ANOVA is specifically designed to effect such a comparison.

It is not uncommon, however, for biologists to test hypotheses that are either irrelevant or uninterpretable. This is so primarily for two reasons. First, biologists often fail to specify hypotheses in a way that corresponds to the sampling design of the study. For example, simple expressions such as 'factor A has no effect' and 'there is no main effect for A' are insufficiently specified. As argued below, unambiguous hypotheses must be expressed in terms of model parameters. Second, computing procedures often are used which, though convenient, are inappropriate for the hypothesis of interest. For example, in a 2 x 3 design for the study of effects of sex and diet, the statistic $\sum_j n_{.j} (\bar{y}_{.j.} - \bar{y}_{...})^2$ might be used to test the effect of diet, on the mistaken impression that it tests diet effects free and clear of sex effects. Such errors result from the application of computing procedures without proper attention to their corresponding hypotheses.

In this section I address both these sources of difficulty, first by suggesting some reasonable hypotheses for the two-factor model and then by describing the appropriate testing procedures for them. I also describe some other test statistics that are frequently used (and misused), along with their corresponding hypotheses.

Standard hypotheses for the two-factor ANOVA

General computing algorithms are available for testing hypotheses about parameters in any linear model. It is necessary only that a hypothesis be expressible as a linear constraint on model parameters and that it satisfy a general 'testability criterion' (see, e.g., Graybill⁶ and Searle¹² for comprehensive treatments). In what follows I describe three hypotheses of general interest and identify equivalent expressions for them in terms of parameters in each of the models discussed above.

Average main effect of A

It is often of interest to know whether one factor has an effect when averaged over all levels of the other factor. For factor A the hypothesis of interest is

$$H_0: \bar{\mu}_{i.} = \bar{\mu}_{i'.} \qquad (7)$$

where $\bar{\mu}_{i.}$ is the average of cell means for level $i$ of factor A. Using the 2 x 3 layout as an illustration, we may wish to test whether weights of females differ from those of males when averaged across all diets in the experiment. The test of interest concerns the 'average effect of sex' (averaged over diets), and the hypothesis is expressed as

$$H_0: (\mu_{11} + \mu_{12} + \mu_{13})/3 = (\mu_{21} + \mu_{22} + \mu_{23})/3$$

Differential effects for levels of A across all levels of B address what is typically (but not exclusively) identified as the 'main effect of A'.
It is labelled here as the 'average main effect of A'⁶ to emphasize that it is an average effect. It should also be emphasized that the hypothesis does not 'average out' the effects of B, since the possible effect of interactions between A and B is still present. Nor does the hypothesis specify that factor A has no influence on the pattern of cell means, again because of the possible influence of interactions. In fact the hypothesis specifies nothing more than the equivalence of row averages. Stronger assertions concerning the complete lack of influence of factor A require different, more complex hypotheses. These points often are overlooked by the users of ANOVA.

Hypothesis (7) can be expressed in terms of parameters of the RPM by means of equation (3) as

$$H_0: \alpha_i^* = \alpha_{i'}^* \qquad (8)$$

From equation (4), hypothesis (7) for the FPM is given by

$$H_0: \alpha_i + \bar{\Gamma}_{i.} = \alpha_{i'} + \bar{\Gamma}_{i'.}$$

The latter form expresses directly the influence of interactions, thus alerting the researcher not to 'overinterpret' $H_0$. Again it is emphasized that all three expressions for $H_0$ are equivalent, though their mathematical forms differ. These equivalences are displayed in Table III.

Table III. Equivalent hypotheses for average main effects and interactions, expressed in terms of parameters in each of three standard models. It is assumed that all cells are filled and that $i \neq i'$ and $j \neq j'$

              CMM                                             FPM                                                             RPM
A effect      $\bar{\mu}_{i.} = \bar{\mu}_{i'.}$              $\alpha_i + \bar{\Gamma}_{i.} = \alpha_{i'} + \bar{\Gamma}_{i'.}$   $\alpha_i^* = 0$
B effect      $\bar{\mu}_{.j} = \bar{\mu}_{.j'}$              $\beta_j + \bar{\Gamma}_{.j} = \beta_{j'} + \bar{\Gamma}_{.j'}$     $\beta_j^* = 0$
Interaction   $\mu_{ij} - \mu_{ij'} = \mu_{i'j} - \mu_{i'j'}$   $\Gamma_{ij} = 0$                                               $\Gamma_{ij}^* = 0$

Average main effect of B

An analogous hypothesis can be given for factor B, based on cell means averaged over all levels of factor A. The hypothesis is

$$H_0: \bar{\mu}_{.j} = \bar{\mu}_{.j'} \qquad (9)$$

where $\bar{\mu}_{.j}$ is the average of cell means for level $j$ of factor B. Again using the 2 x 3 layout, the test of interest concerns the average effects of diets when averaged across both males and females.
The corresponding hypothesis may be expressed as

$$H_0: (\mu_{11} + \mu_{21})/2 = (\mu_{12} + \mu_{22})/2 \quad \text{and} \quad (\mu_{11} + \mu_{21})/2 = (\mu_{13} + \mu_{23})/2$$

Hypothesis (9) addresses what is typically called 'the main effect of B' and is labelled here as the 'average main effect of B'. From equations (3) and (4) it also may be expressed with parameters of the RPM and FPM by

$$H_0: \beta_j^* = 0; \qquad H_0: \beta_j + \bar{\Gamma}_{.j} = \beta_{j'} + \bar{\Gamma}_{.j'} \qquad (10)$$

respectively. As before, all three expressions of the hypothesis are equivalent.

Interactions

A third hypothesis of general interest concerns the relative effect of factor B as influenced by the level of A. The hypothesis of interest is

$$H_0: \mu_{ij} - \mu_{ij'} = \mu_{i'j} - \mu_{i'j'} \qquad (11)$$

The issue here is whether differences between means for two levels of factor B are specific to the level of factor A. If so, then the pattern of cell means is said to display interaction. Hypothesis (11) specifies that there is no interaction. From equations (3) and (4) equivalent forms for the RPM and FPM can be derived. They are

$$H_0: \Gamma_{ij}^* = 0 \qquad H_0: \Gamma_{ij} = 0 \qquad (12)$$

respectively.

The hypotheses for average main effects and interactions are, of course, only three of many hypotheses that could be (and often are) addressed with a two-factor design.¹⁰ Because they are quite general, widely recognized and easily tested, these hypotheses nevertheless are appropriate components of many analyses of data. They address structural features of the cell means that are usually of interest to biologists, irrespective of any other characteristics under investigation. They are typically understood to be the hypotheses intended when 'main effect' and 'interaction' tests are conducted.¹³,¹⁴ Finally, they are the hypotheses tested by many (though not all) standard computing packages. The set of equivalences described above for average main effect and interaction hypotheses is displayed in Table III.
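The quantities that hypotheses (7) and (9) refer to are unweighted averages of cell means, which is easy to lose sight of with unbalanced data. The sketch below evaluates them for the cell mean estimates quoted earlier (20, 60, 80, 80, 60, 20). The sample sizes are hypothetical, as are the function names, and the sample-size-weighted row means are computed only as a contrast: they are not the quantities appearing in hypothesis (7). The sketch describes the estimates and performs no significance test.

```python
def row_averages(mu):
    """Unweighted row averages of cell means, as in hypothesis (7)."""
    return [sum(row) / len(row) for row in mu]


def column_averages(mu):
    """Unweighted column averages of cell means, as in hypothesis (9)."""
    a = len(mu)
    return [sum(mu[i][j] for i in range(a)) / a for j in range(len(mu[0]))]


def weighted_row_means(mu, n):
    """Row means weighted by cell sample sizes (shown for contrast only)."""
    return [sum(n_ij * m_ij for m_ij, n_ij in zip(mu_row, n_row)) / sum(n_row)
            for mu_row, n_row in zip(mu, n)]


cell_means = [[20.0, 60.0, 80.0],
              [80.0, 60.0, 20.0]]
sample_sizes = [[3, 2, 6],          # hypothetical n_ij, chosen for illustration
                [2, 2, 3]]

row_avg = row_averages(cell_means)
col_avg = column_averages(cell_means)
row_wtd = weighted_row_means(cell_means, sample_sizes)

print("row averages:", row_avg)            # equal: no average main effect of A
print("column averages:", col_avg)         # unequal across the three columns
print("weighted row means:", row_wtd)      # unequal because of the imbalance
```

With these values the two unweighted row averages coincide, whereas the weighted row means (60 and about 48.6) do not. A statistic built on weighted means would therefore be responding to something other than hypothesis (7).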
Testing procedures

A well documented procedure for testing hypotheses about the linear model is based on likelihood ratio statistics.¹⁵ In this procedure the hypothesis of interest is used as a constraint on model parameters, effectively reducing their number as described above. The test itself consists essentially of a comparison of OLS estimates of $\sigma^2$ for the constrained and unconstrained models. If these estimates are not significantly different, then the hypothesis is said to be confirmed (or, more precisely, not disconfirmed). Comprehensive descriptions of test procedures for the linear model are given by many authors (see, e.g., Graybill,⁶ Kendall and Stuart,¹⁶ Rao¹⁷ and Searle¹²).

Test statistics based on the likelihood ratio procedure are described below for hypotheses about average main effects and interactions. Where appropriate the computing forms are given in terms of the $R(\cdot)$ notation.¹⁸ For a general linear model containing the parameters $\theta$, $R(\theta)$ designates the reduction in sums of squares $y'y$ 'due to' inclusion of $\theta$ in the model:

$$R(\theta) = \hat{\theta}' X' y$$

Average main effects of A

The hypothesis for average main effects of A is given in expression (7) for the CMM, with an equivalent form in expression (8) for the RPM. Because the testing procedure for this hypothesis is easier to describe in terms of the RPM, I use the latter form in what follows. It is assumed for now that all factor combinations occur in the design.

Hypothesis (8) is incorporated into the RPM simply by deleting the $\alpha_i^*$ and the corresponding columns of $X^*$ in the matrix form of the RPM (see Appendix I for the matrix expression of the RPM).
An OLS estimate of variance is obtained for the resulting model and compared with the variance estimate for the unconstrained model. But the difference between the two fits, on which this comparison rests, is just

$$R(\alpha^* \mid \mu^*, \beta^*, \Gamma^*) = R(\mu^*, \alpha^*, \beta^*, \Gamma^*) - R(\mu^*, \beta^*, \Gamma^*)$$

the standard statistic used in regression analysis to test for significance of $\alpha^*$. Thus a test for the average main effect of A could be obtained from any OLS regression program.

The following additional points should be noted:

(a) The standard regression procedure yields a valid test for average main effects if the RPM is used but not if the FPM is used. This is because $R(\mu, \beta, \Gamma) = R(\mu, \alpha, \beta, \Gamma)$, so that $R(\alpha \mid \mu, \beta, \Gamma)$ is identically zero.¹² Thus the addition of $\alpha_i$ in the model, after all other parameters are included, has no effect in reducing the sum of squares $y'y$.

(b) None of the statistics $R(\alpha \mid \mu)$, $R(\alpha \mid \mu, \beta)$ and $R(\alpha \mid \mu, \beta, \Gamma)$ from the FPM provides a valid test of average main effects of A when the sampling design is unbalanced. For example, with the data from Table II a test for average main effects of sex, based on $R(\alpha^* \mid \mu^*, \beta^*, \Gamma^*)$, is non-significant in the extreme ($p > 0.99$). Indeed, for this example the average main effect of sex is identical for males and females. But tests based on $R(\alpha \mid \mu)$ and $R(\alpha \mid \mu, \beta)$, again using the data in Table II, are both highly significant ($p < 0.001$ in each case). Clearly these statistics test different hypotheses from that of the average main effects of sex. Table IV exhibits hypotheses corresponding to $R(\alpha \mid \mu)$ and $R(\alpha \mid \mu, \beta)$.

Table IV. Hypotheses corresponding to some commonly used test statistics. Both fully parametrized and restricted parametrization model forms are displayed. It is assumed that all cells are filled and that $i \neq i'$ and $j \neq j'$ (column headings: Computing form, Hypothesis)

(c) The computing form for $R(\alpha^* \mid \mu^*, \beta^*, \Gamma^*)$ is

[display equation missing in source]

where

$$\tilde{y}_{i..} = \sum_j \bar{y}_{ij.} / b \qquad 1/w_i = \Big( \sum_j 1/n_{ij} \Big) / b^2$$

Though it is somewhat obscure theoretically, this test statistic is computationally straightforward. If necessary, it could be computed without the aid of sophisticated computing equipment.

Average main effects of B

The hypothesis for average main effects of B is given in expression (9), with an equivalent form for the RPM in expression (10). It is tested in a way that is quite analogous to the test for average main effects of A. The appropriate test statistic is

$$R(\beta^* \mid \mu^*, \alpha^*, \Gamma^*) = R(\mu^*, \alpha^*, \beta^*, \Gamma^*) - R(\mu^*, \alpha^*, \Gamma^*)$$

where $R(\mu^*, \alpha^*, \Gamma^*)$ is obtained by the deletion of the $\beta^*$ and the corresponding columns of $X^*$ in the matrix form of the RPM. As with the A effect, this procedure works with the RPM but not with the FPM. Overparametrization of the FPM results in $R(\beta \mid \mu, \alpha, \Gamma)$ being identically zero.¹² In addition, neither $R(\beta \mid \mu)$ nor $R(\beta \mid \mu, \alpha)$ provides valid tests of average main effects of B if the design is unbalanced. Hypotheses corresponding to these statistics are shown in Table IV.

The computing form for $R(\beta^* \mid \mu^*, \alpha^*, \Gamma^*)$ is

[display equation and accompanying definitions missing in source]

Interactions

Hypothesis (11) of no interactions among cell means is incorporated into the RPM by deleting the $\Gamma^*$ and the corresponding columns of $X^*$ from the matrix form of the RPM. The resulting model is simply an RPM including factors A and B but with no interaction between them. Thus the appropriate test statistic is

$$R(\Gamma^* \mid \mu^*, \alpha^*, \beta^*) = R(\mu^*, \alpha^*, \beta^*, \Gamma^*) - R(\mu^*, \alpha^*, \beta^*)$$

the standard regression statistic for testing the significance of $\Gamma^*$. The following points about this test are noteworthy:

(a) $R(\Gamma^* \mid \mu^*, \alpha^*, \beta^*)$ and $R(\Gamma \mid \mu, \alpha, \beta)$ for the RPM and the FPM are identical for the two-factor model. Such an equivalence obtains only for the highest-level interactions in a crossed design.

(b) If the FPM is used, then calculation of $R(\Gamma \mid \mu, \alpha, \beta)$ is not straightforward.
Non-estimable side conditions must be used to compute the appropriate forms, as described by Graybill,⁶ Searle¹² and others. A derivation of $R(\mu, \alpha, \beta, \gamma)$ is provided in Appendix III.

(c) The computing form for $R(\gamma^* \mid \mu^*, \alpha^*, \beta^*)$ is

$$R(\gamma^* \mid \mu^*, \alpha^*, \beta^*) = \sum_i \sum_j n_{ij}\bar{y}_{ij.}^2 - \sum_i n_{i.}\bar{y}_{i..}^2 - r'C^{-1}r$$

where $r$ is a $(b-1) \times 1$ vector with

$$r_j = y_{.j.} - \sum_i n_{ij}\bar{y}_{i..}, \qquad j = 1, \ldots, b-1$$

and $C$ is a $(b-1) \times (b-1)$ matrix with

$$c_{jj} = n_{.j} - \sum_i n_{ij}^2/n_{i.}, \qquad c_{jj'} = -\sum_i n_{ij}n_{ij'}/n_{i.}, \quad j \neq j'$$

Without a computer, the calculation of this nonintuitive and rather complex expression is feasible only for very simple designs.

MISSING CELLS

In the developments above all factor combinations in the design are assumed to be represented in the data. Under this assumption reasonable hypotheses can be tested with standard computing formulae, as outlined in the preceding section. It is not uncommon, however, for there to be missing cells in biological data, for essentially the same reasons that designs are unbalanced.

The problems caused by missing cells can be illustrated by a simple example. Consider the 2 x 3 design described earlier, for which there are no observations for cell (2,3):

$$\begin{array}{ccc} \bar{y}_{11} & \bar{y}_{12} & \bar{y}_{13} \\ \bar{y}_{21} & \bar{y}_{22} & - \end{array}$$

Assume, for example, that because of some unforeseen occurrence all the males on diet 3 die. Then the ANOVA includes no data for cell (2,3). However, cell means for this design may still be expressed in terms of the parameters for each of the statistical models.
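To make the missing-cell layout concrete, the following Python sketch fits the cell means model to the observations that remain when cell (2,3) is dropped and tests a single contrast among the five filled cells. The data are the Appendix I observations with the six cell (2,3) values removed; the contrast matrix `G` (the interaction contrast in the complete 2 x 2 subtable) and all variable names are illustrative assumptions, not hypotheses or code taken from the paper.

```python
import numpy as np

# Filled cells only: cell (2,3) contributes no observations.
filled = {
    (1, 1): [18, 22],
    (1, 2): [57, 63],
    (1, 3): [76, 78, 82, 84],
    (2, 1): [78, 80, 82],
    (2, 2): [58, 60, 62],
}
counts = [len(obs) for obs in filled.values()]
y = np.array([v for obs in filled.values() for v in obs], dtype=float)

# Cell means model y = W mu + e: one indicator column per filled cell.
W = np.repeat(np.eye(len(filled)), counts, axis=0)
mu_hat = np.linalg.solve(W.T @ W, W.T @ y)      # OLS estimates = sample cell means
resid = y - W @ mu_hat
s2 = resid @ resid / (len(y) - len(filled))     # residual variance, 14 - 5 df

# Assumed illustrative hypothesis G mu = 0: mu11 - mu12 - mu21 + mu22 = 0.
G = np.array([[1.0, -1.0, 0.0, -1.0, 1.0]])
est = G @ mu_hat
cov = G @ np.linalg.inv(W.T @ W) @ G.T          # Var(G mu_hat) / sigma^2
ss_h = float(est @ np.linalg.solve(cov, est))   # hypothesis sum of squares
F = ss_h / (G.shape[0] * s2)
print(mu_hat, s2, ss_h, F)
```

Because the hypothesis is written directly in terms of the cell means that were actually observed, its meaning does not depend on how a main-effects parametrization happens to absorb the empty cell.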
The $\Sigma$-restrictions now yield

$$\alpha_2^* = -\alpha_1^* = -\alpha^*$$
$$-\gamma_{12}^* = -\gamma_{21}^* = \gamma_{22}^* = \gamma_{11}^* = \gamma^*, \qquad \gamma_{13}^* = 0$$
$$\beta_3^* = -\beta_1^* - \beta_2^*$$

so that parameter associations between the CMM and the RPM are given by

$$\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & -1 \\ 1 & 1 & -1 & -1 & 0 \\ 1 & -1 & 1 & 0 & -1 \\ 1 & -1 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} \mu^* \\ \alpha^* \\ \beta_1^* \\ \beta_2^* \\ \gamma^* \end{bmatrix}$$

Solving for parameters in the RPM in terms of $\mu$ (see Appendix I) yields

$$\alpha^* = \left[ (\mu_{11} + \mu_{12})/2 - (\mu_{21} + \mu_{22})/2 \right]/2$$
$$\beta_1^* = \left[ (5\mu_{11} + 3\mu_{21}) - (\mu_{12} + 4\mu_{13} + 3\mu_{22}) \right]/12$$
$$\beta_2^* = \left[ (5\mu_{12} + 3\mu_{22}) - (\mu_{11} + 4\mu_{13} + 3\mu_{21}) \right]/12$$
$$\gamma^* = (\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22})/4$$

Analogous associations can be derived for the RPM and the FPM:

$$\alpha^* = (\alpha_1 - \alpha_2)/2 + (\gamma_{11} + \gamma_{12} - \gamma_{21} - \gamma_{22})/4$$
$$\beta_1^* = \beta_1 - \bar{\beta}_. + (5\gamma_{11} - \gamma_{12} - 4\gamma_{13} + 3\gamma_{21} - 3\gamma_{22})/12$$
$$\beta_2^* = \beta_2 - \bar{\beta}_. - (\gamma_{11} - 5\gamma_{12} + 4\gamma_{13} + 3\gamma_{21} - 3\gamma_{22})/12$$
$$\gamma^* = (\gamma_{11} - \gamma_{12} - \gamma_{21} + \gamma_{22})/4$$

The following points are noted:

(a) The pattern of missing cells determines the associations of parameters among models. Therefore the interpretation of parameters for either the RPM or the FPM differs for each pattern. This is a serious limitation to their use in analyses with missing cells.

(b) It often is unclear what structural characteristics of the cell means are being tested when there are missing cells. That is, it is unclear how to interpret such hypotheses as

$$H_0: \alpha^* = 0$$

and

$$H_0: \begin{bmatrix} \beta_1^* \\ \beta_2^* \end{bmatrix} = 0$$

This lack of interpretability becomes more severe as the design increases in size and complexity.

In this circumstance the advantages of the cell means model, with which hypotheses can be structured to conform to the pattern of available data, become obvious. As an example, assume again that data are missing for cell (2,3) of the diet study. While the standard test for diet effects is not directly interpretable, a simplified hypothesis is:

$$H_0: \mu_{11} + \mu_{21} = \mu_{12} + \mu_{22}, \qquad \mu_{11} = \mu_{13}$$

Note that cells are 'matched' in this hypothesis (e.g., the same rows are used within each column when testing for column effects), so that it is easily interpretable as a measure of column effects. There are essentially three costs associated with this gain in interpretability:

(a) The hypotheses are no longer unique. For example, $H_0$ for column effects could as easily have been [equation not recoverable from the source].

(b) The hypotheses are inefficient, in the sense that they do not make as complete use of the data as other hypotheses could. Thus the corresponding tests are not as powerful as other tests could be.

(c) The statistics used for testing main effects and interactions are no longer independent (whereas in balanced designs the standard regression statistics for main effects and interactions are).

Despite these limitations, however, the advantages of specialized hypotheses are obvious: tests that are meaningful are always preferred over tests that are irrelevant, irrespective of independence and power considerations.

Specification of appropriate hypotheses is not, strictly speaking, a statistical problem. Of the many possible hypotheses that may be formulated, some are apt to make more sense to the researcher than others. It remains for him or her to specify the hypotheses of interest, at which point the statistical methods outlined above can be exercised.

CONCLUSIONS

An important source of confusion in the application of ANOVA procedures has to do with the associations among model form, hypothesis form and computing form. I have attempted here to describe the methods of ANOVA in terms of three common model formulations and in terms of the usual parameter estimation and hypothesis testing procedures of general linear model theory.

It is important to realize that the flow of logic can operate in either of two directions:

(1) from hypothesis formulation to computation of the corresponding test statistics
(2) from computation of a test statistic to formulation of the corresponding hypothesis.

The first approach corresponds to the usual application of the scientific method, with hypotheses stated at the outset and data analysed according to an appropriate sampling or experimental design.
The second approach is a matter of computational convenience, wherein available computing procedures essentially determine the hypotheses to be tested. As long as a design is balanced, there is no confusion between these approaches: the 'usual' hypotheses are tested by the standard computing forms. Whenever the design is unbalanced, however, standard computing forms can test very unusual hypotheses indeed. These hypotheses often are left unstated and can differ considerably among ANOVA computer programs.

It is ultimately the responsibility of the researcher to ensure that the hypothesis tested is in fact the hypothesis that is intended (see Milliken and Johnson,¹¹ Searle⁷ and Searle et al.⁴ for hypotheses tested in several commercially available programs). That this correspondence is not assured indicates that at least a minimal amount of inspection of the testing procedure is necessary. It furthermore cautions against any attempt to generalize from balanced designs, for which test statistics are independent, computations are simple and hypotheses are unambiguous, to unbalanced designs, for which none of these attributes is assured.

APPENDIX I

In this appendix examples are provided of the three models described in the text, for the data set shown in Table I. Associations among parameters in these models also are developed.

Specification of the cell means model (CMM) involves a system of 20 equations (one for each observation) in six parameters (the six cell means).
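A matrix form for this specification is

$$
\begin{bmatrix} 18\\22\\57\\63\\76\\78\\82\\84\\78\\80\\82\\58\\60\\62\\16\\17\\19\\21\\23\\24 \end{bmatrix}
=
\begin{bmatrix}
1&0&0&0&0&0\\
1&0&0&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&1&0&0&0\\
0&0&1&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&1&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&1&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1\\
0&0&0&0&0&1\\
0&0&0&0&0&1\\
0&0&0&0&0&1\\
0&0&0&0&0&1\\
0&0&0&0&0&1
\end{bmatrix}
\begin{bmatrix} \mu_{11}\\\mu_{12}\\\mu_{13}\\\mu_{21}\\\mu_{22}\\\mu_{23} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111}\\\varepsilon_{112}\\\varepsilon_{121}\\\varepsilon_{122}\\\varepsilon_{131}\\\varepsilon_{132}\\\varepsilon_{133}\\\varepsilon_{134}\\\varepsilon_{211}\\\varepsilon_{212}\\\varepsilon_{213}\\\varepsilon_{221}\\\varepsilon_{222}\\\varepsilon_{223}\\\varepsilon_{231}\\\varepsilon_{232}\\\varepsilon_{233}\\\varepsilon_{234}\\\varepsilon_{235}\\\varepsilon_{236} \end{bmatrix}
$$

The general matrix form for this model is

$$y = W\mu + \varepsilon \qquad (13)$$

where $y$ is an $n \times 1$ vector of observations, $\mu$ is a $p \times 1$ vector of cell means, $\varepsilon$ is an $n \times 1$ vector of random errors and $W$ is an $n \times p$ matrix of zeros and ones corresponding to the appropriate cell means. OLS estimates of cell means $\mu_{ij}$ are the corresponding sample means given in equation (1). The OLS estimate of variance is given in equation (2).

The fully parametrized model (FPM) includes parameters for main effects and interactions, so that a system of 20 equations in 12 parameters is specified. They are expressed in matrix form by

$$
\begin{bmatrix} 18\\22\\57\\63\\76\\78\\82\\84\\78\\80\\82\\58\\60\\62\\16\\17\\19\\21\\23\\24 \end{bmatrix}
=
\left[\begin{array}{cccccccccccc}
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&1&0&0&1&0&0&0&0&0\\
1&1&0&0&1&0&0&1&0&0&0&0\\
1&1&0&0&1&0&0&1&0&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&1&0&0&0&1&0&0&1&0&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&1&0&0&0&0&0&1&0&0\\
1&0&1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&1&0&0&0&0&0&1&0\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0&0&0&1
\end{array}\right]
\begin{bmatrix} \mu\\\alpha_1\\\alpha_2\\\beta_1\\\beta_2\\\beta_3\\\gamma_{11}\\\gamma_{12}\\\gamma_{13}\\\gamma_{21}\\\gamma_{22}\\\gamma_{23} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111}\\\varepsilon_{112}\\\varepsilon_{121}\\\varepsilon_{122}\\\varepsilon_{131}\\\varepsilon_{132}\\\varepsilon_{133}\\\varepsilon_{134}\\\varepsilon_{211}\\\varepsilon_{212}\\\varepsilon_{213}\\\varepsilon_{221}\\\varepsilon_{222}\\\varepsilon_{223}\\\varepsilon_{231}\\\varepsilon_{232}\\\varepsilon_{233}\\\varepsilon_{234}\\\varepsilon_{235}\\\varepsilon_{236} \end{bmatrix}
$$

The general matrix form of the model is

$$y = X\beta + \varepsilon$$

where $\beta$ is a vector of model parameters, $X$ is a 'design matrix' of zeros and ones⁶ and $y$ and $\varepsilon$ are as before. Unique estimates of individual parameters in the model cannot be obtained.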
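The non-uniqueness of FPM parameter estimates can be seen numerically. The sketch below, in Python, builds the 20 x 12 FPM design matrix for the Appendix I data, checks its rank, and exhibits two different parameter vectors that give identical fitted values. The helper `fpm_row`, the column ordering and the particular null-space vector are illustrative assumptions.

```python
import numpy as np

# Cell sizes n_ij and observations, in the cell order (1,1), (1,2), ..., (2,3).
n_ij = {(1, 1): 2, (1, 2): 2, (1, 3): 4, (2, 1): 3, (2, 2): 3, (2, 3): 6}
y = np.array([18, 22, 57, 63, 76, 78, 82, 84, 78, 80, 82,
              58, 60, 62, 16, 17, 19, 21, 23, 24], dtype=float)

def fpm_row(i, j):
    """Design row for (mu, a1, a2, b1, b2, b3, g11, g12, g13, g21, g22, g23)."""
    row = np.zeros(12)
    row[0] = 1.0                      # overall mean
    row[i] = 1.0                      # alpha_i occupies columns 1..2
    row[2 + j] = 1.0                  # beta_j occupies columns 3..5
    row[5 + 3 * (i - 1) + j] = 1.0    # gamma_ij occupies columns 6..11
    return row

X = np.array([fpm_row(i, j) for (i, j), n in n_ij.items() for _ in range(n)])
rank = np.linalg.matrix_rank(X)       # only as many independent columns as cells

# One least squares solution (minimum norm) ...
b1, *_ = np.linalg.lstsq(X, y, rcond=None)

# ... and another: shifting mu up and both alphas down leaves X @ b unchanged.
null_vec = np.zeros(12)
null_vec[0], null_vec[1], null_vec[2] = 1.0, -1.0, -1.0
b2 = b1 + 10.0 * null_vec

same_fit = np.allclose(X @ b1, X @ b2)
print(rank, same_fit, np.allclose(b1, b2))
```

Both vectors reproduce the six sample cell means exactly, which is why only functions of the cell means, and not the individual FPM parameters, are estimable.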
Finally, a parametrization that includes the $\Sigma$-restrictions is given by a system of 20 equations including six linearly independent parameters. This is the restricted parametrization model (RPM), representing the sample data by

$$
\begin{bmatrix} 18\\22\\57\\63\\76\\78\\82\\84\\78\\80\\82\\58\\60\\62\\16\\17\\19\\21\\23\\24 \end{bmatrix}
=
\begin{bmatrix}
1&1&1&0&1&0\\
1&1&1&0&1&0\\
1&1&0&1&0&1\\
1&1&0&1&0&1\\
1&1&-1&-1&-1&-1\\
1&1&-1&-1&-1&-1\\
1&1&-1&-1&-1&-1\\
1&1&-1&-1&-1&-1\\
1&-1&1&0&-1&0\\
1&-1&1&0&-1&0\\
1&-1&1&0&-1&0\\
1&-1&0&1&0&-1\\
1&-1&0&1&0&-1\\
1&-1&0&1&0&-1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1\\
1&-1&-1&-1&1&1
\end{bmatrix}
\begin{bmatrix} \mu^*\\\alpha^*\\\beta_1^*\\\beta_2^*\\\gamma_1^*\\\gamma_2^* \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111}\\\varepsilon_{112}\\\varepsilon_{121}\\\varepsilon_{122}\\\varepsilon_{131}\\\varepsilon_{132}\\\varepsilon_{133}\\\varepsilon_{134}\\\varepsilon_{211}\\\varepsilon_{212}\\\varepsilon_{213}\\\varepsilon_{221}\\\varepsilon_{222}\\\varepsilon_{223}\\\varepsilon_{231}\\\varepsilon_{232}\\\varepsilon_{233}\\\varepsilon_{234}\\\varepsilon_{235}\\\varepsilon_{236} \end{bmatrix}
$$

The matrix form of the RPM is

$$y = X^*\beta^* + \varepsilon \qquad (14)$$

where $\beta^*$ includes linearly independent parameters and $X^*$ is the corresponding design matrix. Since $X^*$ is full column rank, unique OLS estimates of the parameters can be obtained. These are given by the standard computing formulae⁶

$$\hat{\beta}^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y$$

and

$$s^2 = (y'y - \hat{\beta}^{*\prime}X^{*\prime}y)/(n - p)$$

The expression $\hat{\beta}^{*\prime}X^{*\prime}y$, denoted by $R(\beta^*)$, is the reduction in the sum of squares $y'y$ 'due to' the estimator $\hat{\beta}^*$. The resulting variance estimate is identical to that given in equation (2), as demonstrated in Appendix III.

Associations among model parameters follow from the equivalence of cell means model expressions. For the CMM and RPM these equivalences are expressed by

$$\mu = U^*\beta^*$$

where $\beta^*$ contains independent parameters only. Since $U^*$ is invertible, $\beta^*$ may be written as

$$\beta^* = U^{*-1}\mu$$

For designs that include all factor combinations this reduces to equations (3). For the RPM and FPM the equivalences among cell means yield

$$U^*\beta^* = U\beta$$

By inverting $U^*$ we have

$$\beta^* = U^{*-1}U\beta$$

which, on condition that all factor combinations occur in the design, reduces to equations (4). Finally, the parameters in the FPM and CMM are related by

$$\mu = U\beta$$

Because of the overparametrization of the FPM, it is not possible to uniquely express $\beta$ in terms of $\mu$.
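The associations between the CMM and the RPM developed in this appendix can be checked numerically. The Python sketch below assumes the usual sum-to-zero form of the $\Sigma$-restrictions for a 2 x 3 design with all cells filled; the matrix `U_star` written out here, and all variable names, are assumptions for illustration. It verifies that $X^* = WU^*$, that $\hat{\beta}^* = U^{*-1}\hat{\mu}$, and that both models give the same reduction in the sum of squares and hence the same variance estimate.

```python
import numpy as np

counts = [2, 2, 4, 3, 3, 6]   # n_ij for cells (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)
y = np.array([18, 22, 57, 63, 76, 78, 82, 84, 78, 80, 82,
              58, 60, 62, 16, 17, 19, 21, 23, 24], dtype=float)

# CMM design matrix: one indicator column per cell.
W = np.repeat(np.eye(6), counts, axis=0)

# Assumed U*: cell means as functions of (mu*, alpha*, beta1*, beta2*, gamma1*, gamma2*).
U_star = np.array([
    [1,  1,  1,  0,  1,  0],   # mu_11
    [1,  1,  0,  1,  0,  1],   # mu_12
    [1,  1, -1, -1, -1, -1],   # mu_13
    [1, -1,  1,  0, -1,  0],   # mu_21
    [1, -1,  0,  1,  0, -1],   # mu_22
    [1, -1, -1, -1,  1,  1],   # mu_23
], dtype=float)

X_star = W @ U_star                                 # RPM design matrix

mu_hat = np.linalg.solve(W.T @ W, W.T @ y)          # sample cell means
beta_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)

# beta* = U*^{-1} mu carries over to the OLS estimates.
beta_from_mu = np.linalg.solve(U_star, mu_hat)

R_mu = float(mu_hat @ W.T @ y)                      # R(mu)
R_beta = float(beta_hat @ X_star.T @ y)             # R(beta*)
s2 = (y @ y - R_beta) / (len(y) - 6)
print(beta_hat, R_mu, R_beta, s2)
```

With these data the estimate of $\alpha^*$ is zero, in line with the equal unweighted row averages of the cell means.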
APPENDIX II

Constraints on the general linear model typically are expressed in terms of a set of linear equations involving model parameters. For example, constraints for the cell means model are given by

$$G\mu = 0$$

where $G$ is a $g \times ab$ matrix with $g$ the number of constraints. By an appropriate permutation of components, the vector $\mu$ of cell means can always be partitioned so that

$$G\mu = G_1\mu_1 + G_2\mu_2$$

where $G_2$ is a $g \times g$ invertible matrix.¹⁹ This factorization allows one to express $\mu_2$ in terms of $\mu_1$ by

$$\mu_2 = -G_2^{-1}G_1\mu_1 \qquad (15)$$

which in turn can be used to reduce the number of model parameters:

$$y = W\mu + \varepsilon = [W_1 : W_2]\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \varepsilon = (W_1 - W_2G_2^{-1}G_1)\mu_1 + \varepsilon = W^*\mu_1 + \varepsilon$$

OLS estimates for cell means in the constrained model then are given by

$$\hat{\mu}_1 = (W^{*\prime}W^*)^{-1}W^{*\prime}y$$

with the standard variance estimator

$$s^2 = (y'y - \hat{\mu}_1'W^{*\prime}y)/[n - (p - g)]$$

Once $\hat{\mu}_1$ is obtained, the remaining cell means are estimated by equation (15).

APPENDIX III

In this appendix the expressions for sample variance arising from the three models are shown to be equivalent. This equivalence is first established between the variance estimates of the CMM and the RPM. From equations (13) and (14) we have

$$W\mu = X^*\beta^*$$

which can be rewritten as

$$(WU^*)(U^{*-1}\mu) = X^*\beta^*$$

But $U^{*-1}\mu = \beta^*$ from equation (8), so that

$$WU^* = X^*$$

By the invariance property of maximum likelihood estimators¹⁵ we have

$$\hat{\beta}^* = U^{*-1}\hat{\mu}$$

so that

$$R(\beta^*) = \hat{\beta}^{*\prime}X^{*\prime}y = (U^{*-1}\hat{\mu})'(WU^*)'y = \hat{\mu}'(U^{*-1})'U^{*\prime}W'y = \hat{\mu}'W'y = R(\mu)$$

Since variance estimates for the two models come directly from reduction of the sum of squares $y'y$ by $R(\beta^*)$ and $R(\mu)$ respectively, this establishes the equivalence of estimates for the CMM and the RPM.

For the FPM a set of $a + b + 1$ non-estimable side conditions

$$G\beta = 0$$

must be imposed in order to obtain an estimate of variance. Non-estimability obtains whenever

$$XG' = 0$$

and for purposes of variance estimation any full rank set of such conditions is acceptable.
The elements of $\beta$ may be permuted to write

$$G_r\beta_r + G_2\beta_2 = 0$$

where $G_2$ is an invertible matrix of dimension $a + b + 1$. Then we have

$$\beta_2 = -G_2^{-1}G_r\beta_r$$

so that

$$X\beta = X_1\beta_r + X_2\beta_2 = (X_1 - X_2G_2^{-1}G_r)\beta_r = X_r\beta_r$$

where $X_r$ is full column rank and $\beta_r$ has dimension $ab$. Thus

$$W\mu = X_r\beta_r$$

and by the same argument as with the RPM we have

$$R(\mu) = R(\beta_r)$$

It follows immediately that the variance estimates for all three models are identical.

REFERENCES

1. F. M. Speed, R. R. Hocking and O. P. Hackney, 'Methods of analysis of linear models with unbalanced data', J. Amer. Stat. Assoc., 73, 105-112 (1978).
2. R. R. Hocking and F. M. Speed, 'A full rank analysis of some linear models problems', J. Amer. Stat. Assoc., 70, 706-712 (1975).
3. R. R. Hocking, O. P. Hackney and F. M. Speed, 'The analysis of linear models with unbalanced data', in H. A. David (ed.), Contributions to Survey Sampling and Applied Statistics, Academic Press, New York, 1978.
4. S. R. Searle, F. M. Speed and H. V. Henderson, 'Some computational and model equivalences in analysis of variance of unequal-subclass-numbers data', Amer. Stat., 35, 16-33 (1981).
5. N. S. Urquhart and D. L. Weeks, 'Linear models in messy data: some problems and alternatives', Biometrics, 34, 695-705 (1978).
6. F. A. Graybill, Theory and Application of the Linear Model, Duxbury Press, North Scituate, Massachusetts, 1976.
7. S. R. Searle, 'Annotated computer output for analysis of variance of unequal-subclass-numbers data', Amer. Stat., 33, 222-223 (1979).
8. G. W. Snedecor and W. G. Cochran, Statistical Methods, Iowa State University Press, Ames, 1980.
9. H. Scheffe, The Analysis of Variance, Wiley, New York, 1959.
10. M. B. Brown and A. B. Forsythe, 'Robust tests for the equality of variances', J. Amer. Stat. Assoc., 69, 364-367 (1974).
11. G. A. Milliken and D. E. Johnson, Analysis of Messy Data, Vol. I: Designed Experiments, Wadsworth Inc., Belmont, California, 1984.
12. S. R. Searle, Linear Models, Wiley, New York, 1971.
13. I. Francis, 'Comparison of several analysis of variance programs', J. Amer. Stat. Assoc., 68, 860-865 (1973).
14. M. H. Kutner, 'Hypothesis testing in linear models (Eisenhart Model I)', Amer. Stat., 28, 98-100 (1974).
15. A. M. Mood, F. A. Graybill and D. C. Boes, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1974.
16. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 3: Design and Analysis, and Time Series, Hafner Press, New York, 1964.
17. C. R. Rao, Linear Statistical Inference and Its Applications, Wiley, New York, 1965.
18. F. M. Speed and R. R. Hocking, 'The use of the R(.)-notation with unbalanced data', Amer. Stat., 30, 30-33 (1976).
19. D. W. Smith and L. W. Murray, 'A simplified treatment of the estimation of parameters and tests of hypotheses in constrained design models with unbalanced data', Amer. Stat., 37, 156-158 (1983).
20. W. J. Hemmerle, 'Balanced hypotheses with unbalanced data', J. Amer. Stat. Assoc., 74, 794-798 (1979).
21. R. R. Hocking, F. M. Speed and A. T. Coleman, 'Hypotheses to be tested with unbalanced data', Commun. Statist.-Theor. Meth., 9, 117-127 (1980).
22. S. R. Searle, F. M. Speed and G. A. Milliken, 'Population marginal means in the linear model: an alternative to least squares means', Amer. Statist., 34, 216-221 (1980).