Consider the data in Table 30.2 (Stokes, Davis, and Koch, 2000).
Table 30.2: Colds in Children

                         Periods with Colds
Sex      Residence       0      1      2    Total
Female   Rural          45     64     71      180
Female   Urban          80    104    116      300
Male     Rural          84    124     82      290
Male     Urban         106    117     87      310
For male and female children in rural and urban counties, the table records the number of periods (out of two) in which each subject reported cold symptoms. For example, 45 female subjects in rural counties reported no cold symptoms, and 71 female subjects in rural counties reported colds in both periods.
The question of interest is whether the mean number of periods with colds reported is associated with gender or type of county. There is no reason to believe that the mean number of periods with colds is normally distributed, so a weighted least squares analysis of these data is performed with PROC CATMOD instead of an analysis of variance with PROC ANOVA or PROC GLM.
The input data for categorical data are often recorded in frequency form, with the counts for each particular profile being the input values. For the colds data, the input SAS data set colds is created with the following statements. The variable count contains the frequency of observations that have the particular profile described by the values of the other variables in that input line.
data colds;
   input sex $ residence $ periods count @@;
   datalines;
female rural 0  45  female rural 1  64  female rural 2  71
female urban 0  80  female urban 1 104  female urban 2 116
male   rural 0  84  male   rural 1 124  male   rural 2  82
male   urban 0 106  male   urban 1 117  male   urban 2  87
;
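As a quick check that the frequency-form data were read as intended, one option is to rebuild the contingency table from the colds data set. The following is a minimal sketch, assuming the colds data set created above; the table options are included only to reduce clutter in the output.

```sas
/* Rebuild Table 30.2 from the frequency-form data.          */
/* WEIGHT tells PROC FREQ that each record stands for COUNT  */
/* subjects rather than a single subject.                    */
proc freq data=colds;
   weight count;
   /* One residence-by-periods table is printed for each sex */
   tables sex*residence*periods / nocol nopercent;
run;
```

The row totals in the printed tables should match the Total column of Table 30.2 (180, 300, 290, and 310).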
In order to fit a model to the mean number of periods with colds, you have to specify the response function in PROC CATMOD. The default response function is the logit if the response variable has two values, and it is generalized logits if the response variable has more than two values. If you want a different response function, then you specify that function in the RESPONSE statement. To request the mean number of periods with colds, you specify the MEANS option in the RESPONSE statement.
You can request a model consisting of the main effects and interaction of the variables sex and residence just as you would in the GLM procedure. Unlike the GLM procedure, PROC CATMOD does not require you to use a CLASS statement to treat a variable as a classification variable. In the CATMOD procedure, all variables in the MODEL statement are treated as classification variables unless you specify otherwise with a DIRECT statement. To verify that your model is specified correctly, you can specify the DESIGN option in the MODEL statement to display the design matrix.
The PROC CATMOD statements needed to model mean periods of colds with a main-effects and interaction model are as follows:
proc catmod data=colds;
   weight count;
   response means;
   model periods = sex residence sex*residence / design;
run;
The results of this analysis are shown in Figure 30.1 through Figure 30.3.
In Figure 30.1, the CATMOD procedure first displays a summary of the contingency table you are analyzing. The “Population Profiles” table lists the values of the explanatory variables that define each population, or row of the underlying contingency table, and labels each group with a sample number. The number of observations in each population is also displayed. The “Response Profiles” table lists the variable levels that define the response, or columns of the underlying contingency table.
Figure 30.1: Model Information and Profile Tables
Data Summary  

Response  periods  Response Levels  3 
Weight Variable  count  Populations  4 
Data Set  COLDS  Total Frequency  1080 
Frequency Missing  0  Observations  12 
Population Profiles  

Sample  sex  residence  Sample Size 
1  female  rural  180 
2  female  urban  300 
3  male  rural  290 
4  male  urban  310 
Response Profiles  

Response  periods 
1  0 
2  1 
3  2 
The “Response Functions and Design Matrix” table in Figure 30.2 contains the observed response functions (in this case, the mean number of periods with colds for each of the populations) and the design matrix. The first column of the design matrix contains the coefficients for the intercept parameter. The second column contains the coefficients for the sex parameter. (Note that the sum-to-zero constraint of the default full-rank parameterization PARAM=EFFECT implies that the coefficient for males is the negative of that for females; the parameter is called the differential effect for females.) The third column is similarly set up for residence, and the last column is for the interaction.
Figure 30.2: Observed Response Functions and Design Matrix
Response Functions and Design Matrix  

                              Design Matrix
Sample  Response Function    1    2    3    4
1       1.14444              1    1    1    1
2       1.12000              1    1   -1   -1
3       0.99310              1   -1    1   -1
4       0.93871              1   -1   -1    1
The model-fitting results are displayed in the “Analysis of Variance” table (Figure 30.3), which is similar to an ANOVA table. The effects from the right side of the MODEL statement are listed in the Source column.
Figure 30.3: ANOVA Table for the Saturated Model
Analysis of Variance  

Source  DF  Chi-Square  Pr > ChiSq 
Intercept  1  1841.13  <.0001 
sex  1  11.57  0.0007 
residence  1  0.65  0.4202 
sex*residence  1  0.09  0.7594 
Residual  0  .  . 
You can see in Figure 30.3 that the interaction effect is nonsignificant, so the data are reanalyzed using a main-effects model. Since PROC CATMOD is an interactive procedure, you can analyze the main-effects model by simply submitting the new MODEL statement as follows. The resulting tables are displayed in Figure 30.4 and Figure 30.5.
   model periods = sex residence / design;
run;
From the ANOVA table in Figure 30.4, you can see that the goodness-of-fit chi-square statistic is 0.09 with one degree of freedom and a p-value of 0.7594; hence, the model fits the data. Note that the chi-square tests in Figure 30.4 check whether all the parameters for a given effect are zero. In this model, each effect has only one parameter and therefore only one degree of freedom.
Figure 30.4: Main-Effects Model
Data Summary  

Response  periods  Response Levels  3 
Weight Variable  count  Populations  4 
Data Set  COLDS  Total Frequency  1080 
Frequency Missing  0  Observations  12 
Population Profiles  

Sample  sex  residence  Sample Size 
1  female  rural  180 
2  female  urban  300 
3  male  rural  290 
4  male  urban  310 
Response Profiles  

Response  periods 
1  0 
2  1 
3  2 
Response Functions and Design Matrix  

                              Design Matrix
Sample  Response Function    1    2    3
1       1.14444              1    1    1
2       1.12000              1    1   -1
3       0.99310              1   -1    1
4       0.93871              1   -1   -1
Analysis of Variance  

Source  DF  Chi-Square  Pr > ChiSq 
Intercept  1  1882.77  <.0001 
sex  1  12.08  0.0005 
residence  1  0.76  0.3839 
Residual  1  0.09  0.7594 
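For readers who want the algebra behind these statistics, the following sketches the standard weighted least squares formulation for mean response functions (the approach usually attributed to Grizzle, Starmer, and Koch). The symbols n_{ij}, p_{ij}, a_j, v_i, S, b, and Q are notation introduced here for illustration and do not appear in the PROC CATMOD output; F and X correspond to the Response Function column and the design matrix.

```latex
% Notation assumed for this sketch:
%   n_{ij} = count in population i, response level j;  n_i = \sum_j n_{ij}
%   a_j    = score of response level j (here 0, 1, 2)
\begin{align*}
p_{ij} &= \frac{n_{ij}}{n_i}, &
F_i &= \sum_{j} a_j\, p_{ij}, &
v_i &= \frac{\sum_{j} a_j^2\, p_{ij} - F_i^2}{n_i} \\
S &= \operatorname{diag}(v_1, v_2, v_3, v_4), &
b &= (X^{\top} S^{-1} X)^{-1} X^{\top} S^{-1} F \\
Q &= (F - Xb)^{\top} S^{-1} (F - Xb), &
\text{df} &= 4 - 3 = 1
\end{align*}
```

Here Q is the residual (goodness-of-fit) chi-square, and its degrees of freedom are the four response functions minus the three model parameters. With a saturated model, X is square, the fit is exact, and Q is zero with zero degrees of freedom, which is why the Residual row in Figure 30.3 is empty.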
The “Analysis of Weighted Least Squares Estimates” table in Figure 30.5 lists the parameters and their estimates for the model, as well as the standard errors, Wald statistics, and p-values. These chi-square tests are one-degree-of-freedom tests that the individual parameter is equal to zero. They are equal to the tests shown in Figure 30.4 since each effect is composed of exactly one parameter.
Figure 30.5: Parameter Estimates for the Main-Effects Model
Analysis of Weighted Least Squares Estimates  

Parameter          Estimate  Standard Error  Chi-Square  Pr > ChiSq
Intercept            1.0501          0.0242     1882.77      <.0001
sex female           0.0842          0.0242       12.08      0.0005
residence rural      0.0210          0.0241        0.76      0.3839
You can compute the mean number of periods with colds for the first population (Sample 1, females in rural residences) from Table 30.2 as follows:

\[
\frac{0 \times 45 + 1 \times 64 + 2 \times 71}{180} = 1.14444
\]

This is the same value reported in the Response Function column for Sample 1 in the “Response Functions and Design Matrix” table displayed in Figure 30.4.
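The same calculation can be repeated for all four populations without working from the table by hand. The sketch below assumes the colds data set defined earlier and uses PROC MEANS with a FREQ statement, so that each record is counted count times. It is offered only as a cross-check and is not part of the PROC CATMOD analysis itself.

```sas
/* Mean number of periods with colds in each sex-by-residence population */
proc means data=colds mean maxdec=5;
   class sex residence;   /* one output row per population       */
   var periods;           /* response scores 0, 1, 2             */
   freq count;            /* each record represents COUNT subjects */
run;
```

The four means should agree with the Response Function column: 1.14444, 1.12000, 0.99310, and 0.93871.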
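PROC CATMOD is fitting a model to the mean number of colds in each population as follows:

\[
\begin{bmatrix} E(F_1) \\ E(F_2) \\ E(F_3) \\ E(F_4) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
\]

where \(E(F_i)\) is the expected mean number of periods with colds for Sample \(i\) (rural females, urban females, rural males, and urban males, in that order), the design matrix is the same one displayed in Figure 30.4, \(\beta_0\) is the mean number of colds averaged over all the populations, \(\beta_1\) is the differential effect for females, and \(\beta_2\) is the differential effect for rural residences. The parameter estimates are shown in Figure 30.5; the expected number of periods with colds for rural females from this model is computed as

\[
1.0501 + 0.0842 + 0.0210 = 1.1553
\]

and the expected number for rural males from this model is

\[
1.0501 - 0.0842 + 0.0210 = 0.9869
\]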
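The predicted means for all four populations can be worked out the same way. The following DATA step is a minimal sketch that applies the effect coding (+1 for female and rural, -1 for male and urban) to the estimates in Figure 30.5. The data set name predicted and the variables b0, b1, b2, x1, and x2 are hypothetical names chosen for this illustration.

```sas
data predicted;
   /* Estimates copied by hand from Figure 30.5 */
   b0 = 1.0501;   /* intercept                               */
   b1 = 0.0842;   /* differential effect for females         */
   b2 = 0.0210;   /* differential effect for rural residences */
   length sex $ 6 residence $ 5;
   do sex = 'female', 'male';
      do residence = 'rural', 'urban';
         /* Effect coding: +1 for the listed level, -1 for the other */
         x1 = ifn(sex = 'female', 1, -1);
         x2 = ifn(residence = 'rural', 1, -1);
         predicted = b0 + b1*x1 + b2*x2;
         output;
      end;
   end;
   keep sex residence predicted;
run;

proc print data=predicted noobs;
run;
```

The printed values should be 1.1553 and 1.1133 for rural and urban females, and 0.9869 and 0.9449 for rural and urban males.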

Notice also, in Figure 30.5, that the differential effect for residence is nonsignificant (p = 0.3839). If you continue the analysis by fitting a single-effect model (sex), you need to include a POPULATION statement to maintain the same underlying contingency table. Otherwise, PROC CATMOD forms the populations from the variables in the MODEL statement alone, which would collapse the four populations into two.
   population sex residence;
   model periods = sex;
run;
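Because the statements above rely on a PROC CATMOD session that is still active, it may help to see them as a self-contained step. The following sketch assumes the colds data set from earlier and simply repeats the WEIGHT and RESPONSE statements used before; the DESIGN option and the QUIT statement are additions made here for illustration.

```sas
proc catmod data=colds;
   weight count;
   response means;
   /* Keep the four sex-by-residence populations even though */
   /* residence no longer appears in the model               */
   population sex residence;
   model periods = sex / design;
run;
quit;   /* end the interactive procedure */
```

With the POPULATION statement in place, the design matrix should still have four rows, one for each population.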