Repeated

<jats:p>-</jats:p>


INTRODUCTION
In this article we describe generalized linear latent and mixed models (GLLAMMs) and illustrate their potential in epidemiology.
We begin by briefly describing 'generalized linear models' (1) which encompass common epidemiological tools such as linear regression, dichotomous logistic regression and Poisson regression.Subsequently, we outline 'generalized linear mixed models' (2) where random effects are included in generalized linear models.These models are called mixed since both fixed and random effects are incorporated.Although generalized linear mixed models are undoubtedly useful, it turns out that this framework is too limited for some applications, including many in epidemiology.
This was our motivation (3,4) for extending generalized linear mixed models to the class of 'generalized linear latent and mixed models' (GLLAMM).GLLAMMs include many types of latent variables varying at different levels such as random effects, common factors and latent classes.Latent variables can be regressed on covariates and other latent variables.A wide range of response types are accommodated including continuous, dichotomous, ordinal and nominal responses as well as counts and survival times (discrete and continuous).Multivariate responses can furthermore be of mixed type, some responses may for instance be continuous and others dichotomous.
The utility of GLLAMMs is illustrated in three applications: logistic regression for repeated measurement data (course of illness in schizophrenia), logistic regression with covariate measurement error (diet and coronary heart disease), and multilevel modelling of nominal data (abuse of antibiotics).

GENERALIZED LINEAR MODELS
Let y i be the response and x i explanatory variables or covariates for unit i, and define the conditional expectation of the response given the covariates as µ i , i.e.
Generalized linear models are specified as where the linear combination ν i = β 0 + β 1 x i1 + • • • = x i β is called the 'linear predictor' and β are fixed effects.g is a 'link function', linking the expected response µ i to the linear predictor ν i .The specification is completed by choosing a conditional distribution for the responses y i given the covariates x i , f (y i |x i ), from the exponential family of distributions.

Links and distributions
Some of the links and distributions that can be combined for generalized linear models are:

Logistic regression
Dichotomous responses are legion in epidemiology, the archetypical example being disease (1:present, 0:absent).The expectation of a dichotomous response y i is just the probability that y i = 1, i.e. µ i = Pr(y i = 1|x i ).The logistic regression model for dichotomous responses results from combining the bernoulli distribution (binomial with N = 1) with the logit link: An equivalent way of formulating the logistic regression model is often useful.This approach, which appears to be unfamiliar to most epidemiologists, express logistic regression as a latent response model (4).Here, we consider a linear regression model for a continuous latent response or underlying variable y * i (called 'liability' in genetics), where the residual i has a logistic distribution with zero mean and variance π 2 /3.The dichotomous observed response y i is simply related to the continuous latent response y * i via a threshold model We cannot resist pointing out that some epidemiologists tend to dichotomize any outcome variable, a practice that should be avoided.It should also be noted that other approaches than logistic regression, such as linear risk models with an identity link, have been advocated for dichotomous responses in epidemiology (5).

Poisson regression
Epidemiologists using cohort designs are often interested in studying the incidence rates of diseases or other outcomes.The conventional Poisson regression model for counts and rates results from combining the log link with the Poisson distribution:

GENERALIZED LINEAR MIXED MODELS
A crucial assumption of generalized linear models is that the responses of different units i are independent given the covariates x i .Unfortunately, this assumption is often unrealistic since data are frequently of a multilevel nature with units i nested in clusters j.
Examples of such two-level designs include repeated measurements (units) nested in subjects (clusters) or subjects (units) nested in families (clusters).There will often be unobserved heterogeneity at the cluster level representing confounders that are omitted either because they cannot be measured or because their existence is unknown.The unobserved heterogeneity induces dependence among the units, even after controlling for observed heterogeneity (covariates) at the unit and cluster levels.
The combined effect of all unobserved cluster-level covariates is modeled by including random effects eta (2) mj in the linear predictor which take on the same value for all units in the same cluster Here, M −1,j ) are random effects varying at level 2 and z ∼ N(0, Ψ (2) ).

Random intercept model
The simplest generalized linear mixed model is the random intercept model where M = 1 and z ( Here, η 0j is a random intercept, allowing the overall level of the linear predictor to vary between clusters j over and above the variability explained by the covariates x ij .

Random coefficient model
1ij be a unit-specific covariate, we obtain a model with a random coefficient in addition to a random intercept The model could of course include several random coefficients.Usually x 1ij , implying that β 1 represents the mean effect and η (2) 1j cluster-specific random deviations from this mean effect.While random intercepts can be thought of as main effects of omitted covariates, random coefficients represent interactions between omitted and included covariates.Here, η (2) 1j is a random coefficient, allowing the effects of z (2) 1ij to vary between clusters j.In repeated measurement modelling, z (2) 1ij would typically represent time, so that the rate of change of the response over time (the slope) can vary between subjects.The model can be further extended to include quadratic components (z (2) 1ij ) 2 and higher order polynomials.

Truly multilevel models
Data are often 'truly' multilevel with more than two hierarchical levels.For instance, children i (level 1) may be nested in doctors j (level 2) who are nested in hospitals k (level 3), as in an application considered later.In the repeated measurement setting we could have measurement occasions i (level 1) nested in subjects j (level 2) who are nested in families k (level 3).We would expect dependence between different subjects in the same family, which can be modeled by introducing family-level random effects.However, the dependence should be even higher among different responses for the same subject, which can be modeled by using subject-level random effects (in addition to the family-level effects).
A generalized linear mixed model for three-level data can be written as x ijk β Fixed part Level-2 random part

Level-3 random part
Here, ijk , z ijk , η jk , η k ] where η ijk and η The random effects at each level are multivariate normal, and assumed to be independent across levels.If required, the model can easily be extended to four or more hierarchical levels.

GENERALIZED LINEAR LATENT AND MIXED MODELS (GLLAMMs)
Although generalized linear mixed models are very useful, the framework is too limited for many problems in epidemiology.GLLAMMs therefore provide five extensions to generalized linear mixed models (where we refer to η as latent variables, including random effects, factors, etc.): where z (2) mij is a vector of covariates with corresponding vector of coefficients, called factor loadings, λ (2)  m .For identification, the first coefficient λ (2) mij is a scalar for all m.This extension of the generalized linear mixed model allows factor models to be incorporated in multilevel models.Here (some of) the level-one units are the response variables of the factor model and the z (2) mij are dummy variables that assign factor loadings to the appropriate responses.
The basic idea of factor models is that one or more unobserved variables, latent traits or factors 'explain' the dependence between different observed measurements for a subject, in the sense that the measurements are conditionally independent given the factor(s).We now consider a number of subjects j providing yes/no responses to a set of measurements or 'items' i, for instance to ascertain asthma among children with symptoms such as wheezing, heavy breathing and coughing at night.By regarding the symptoms as nested in children, a logit factor model relating the probability of having a symptom to a single factor η (2) 1j , interpretable as the childrens' unobserved severity of asthma, can be defined as where x ij and z (2) 1ij are vectors whose ith element equals 1 and all other elements equal 0. The 'factor' η (2) 1j represents the 'true asthma' of the jth child, β i reflects the relative prevalence of the ith symptom and λ (2) 1i , the factor loading, reflects how well the ith symptom discriminates between children with different severities of asthma.This model is also known as a two-parameter item response model, 'twoparameter' referring to β i and λ (2) 1i for each item i.If there are several children nested in families, further latent variables could be added at the family level (level 3).Another example where factor models are useful is for repeated fallible measurements of nutrients, an application we will return to in a later section.
A more general 'two-level' (here three-level due to variables being considered level 1 units) factor model would allow different factor structures at the two levels.Such a model has been used for attitudes to abortion (3), where questionnaire items were nested in subjects who were nested in polling districts.Note that the practice of substituting different kinds of 'scores' for factors in multilevel modeling should be abandoned since this will in general preclude valid statistical inference (6).

Multilevel structural equations
As an example of the use of structural equations, consider the item response model in equation ( 9).If there are several children nested in families k, 'true asthma' may depend on child-specific covariates (e.g.age, w 1jk ) and family-level covariates (e.g.presence of a pet, w 2k ), and there may be residual heterogeneity between families represented by η 1k , for instance of a genetic nature.Using indices i, j, k for symptoms, children and families, respectively, we can write the 'measurement model' as 1ijk , (10) and add the 'structural equation model' In general, GLLAMMs allow latent variables (random coefficients and/or factors) to be regressed on other latent variables (random coefficients and/or factors) and covariates (3,4).It is important to note that GLLAMMs extend conventional structural equation models (7) by permitting latent variables to be regressed on same or higher level latent variables.

Discrete latent variables
For two-level models, the latent variables can have a discrete distribution with non-zero probability at a finite number of points (of dimensionality equal to the number of random effects).This is useful if the level 2 units are believed to fall into a number of groups or 'latent classes' within which the latent variables do not vary (8).For instance, such a formulation seems natural for medical diagnosis of say myocardial infarction where it is presumed that the patient is either ill or not ill, but the physician must resort to several fallible indicators of disease.For the case of depression, however, it might be more natural to view the disease as having different degrees of severity, consistent with a continuous illness distribution.
If the number of latent classes, or masses, is chosen to maximize the likelihood, the nonparametric maximum likelihood estimator (NPMLE) can be achieved (9,10).Importantly, this approach enables us to relax the assumption of multivariate normal latent variables and thus makes our inferences more robust.We consider the use of NPMLE in our covariate measurement error application.

Additional response types
In addition to the response types typically considered in generalized linear mixed models, GLLAMMs accommodate several additional response types and links.
An ordinal response is one among a set of categories with a common a priori ordering.An example is severity of illness in terms of the categories 'absent', 'mild', 'moderate' and 'severe'.The following links can be used for ordinal responses: family: Ordinal responses: ordinal logit ordinal probit ordinal compl.log-log scaled ord.probit Nominal responses are in terms of categories which do not have a common a priori ordering.A polytomous response is one among a set of unordered categories.For instance, the subject may be asked which contraceptive was used (if any) during the last sexual intercourse.Rankings involve ordering of categories, for instance a patient's preference ordering of different drugs for a specific condition.The following link is available for nominal responses: Polytomous & Rankings: multinomial logit

Responses of mixed types
Different links and families can be combined for multivariate responses of mixed type.One example would be the modelling of asthma, where some symptoms could be ascertained as dichotomous (present/absent), some as ordinal (severe/mild/absent) and others as counts (number of episodes).Handling mixed responses is also necessary for modeling joint processes, for instance joint modeling of survival and repeated measurements (11) or hospital delivery and child mortality (12).The possibility of specifying models with mixed responses also turns out to be required for our covariate measurement error application.

IMPLEMENTATION
All models in the GLLAMM framework can be estimated using the program gllamm (13).This program, written in Stata (14), implements maximum likelihood estimation and empirical Bayes prediction for GLLAMMs.Numerical integration by adaptive Gauss-Hermite quadrature ( 15) is used to integrate out the latent variables and obtain the marginal log-likelihood.This log-likelihood is maximized by Newton-Raphson using numerical first and second derivatives.Empirical Bayes predictions are posterior means of the latent variables given the observed responses with the parameter estimates plugged in.These predictions have many uses, for instance in predicting 'true' exposure in covariate measurement error models, plotting growth trajectories for individual units in longitudinal data and in model diagnostics.Both posterior means and posterior standard deviations are obtained by numerical integration using adaptive quadrature.
Monte Carlo experiments ( 16) have been performed to investigate our maximum likelihood methodology.The performance of adaptive quadrature was good in all cases, larger numbers of quadrature points being required for more difficult situations (17).Comparing gllamm with other software (using e.g.IGLS, PQL, MCMC and quadrature) good agreement was found between parameter estimates, standard errors and log-likelihood values (13,15).The exceptions were cases where simulation or parametric bootstrapping demonstrated that PQL produced biased estimates in contrast to quadrature (13,18,19).

MISSING DATA
The GLLAMM framework treats the variables of a multivariate response as level-1 units, thereby automatically handling unbalanced designs, for instance due to missing data.Maximum likelihood produces valid inferences when data are missing at random (MAR), where the probability of a response being missing (given the covariates) may depend on other observed responses but not on missing responses (20).Importantly, the stronger assumption of data missing completely at random (MCAR), where the probability of missingness (given the covariates) neither depends on observed nor missing responses, need not be invoked.Furthermore, missing data that are not MAR can be handled in GLLAMM by explicitly modeling the missingness mechanism (21).

SOME APPLICATIONS IN EPIDEMIOLOGY
We describe some applications of GLLAMMs in order to illustrate the potential of this methodology for epidemiological research.However, we do not purport to exhaust the types of applications that are likely to be useful in epidemiology.Furthermore, it should be appreciated that our applications are simplified for didactic reasons.

Logistic regression for repeated measurements: Course of illness in schizophrenia
The Madras Longitudinal Schizophrenia Study (22) followed up 86 patients monthly after their first hospitalization for schizophrenia.An important ques-tion is whether the course of illness differs between men and women and between patients with early and late onset.Here we consider a subset of data previously analyzed (23), namely data on whether thought disorder was present or not at 0, 2, 6, 8 and 10 months after hospitalization.
Performing a complete-case or 'listwise' analysis would discard the information from 16 patients contributing a total of 45 responses (out of the scheduled 96).This will obviously lead to reduced power but more importantly produce biased estimates unless the measurements are missing completely at random (MCAR).
The variables are: .This model allows us to investigate the linear trend (on the logit scale) of time as well as differences between genders and between times of onset, not just in the overall odds of thought disorder but also in the trend over time.
Let y ij be the repeated measurement of thought disorder at occasion i for patient j.To model the dependence among the repeated measurements of thought disorder (given the covariates) we can include a patient-specific random intercept η 0j , giving the model ln Pr( where η 0j is normally distributed with zero mean.The inclusion of η 0j allows the overall logit of thought disorder to vary over patients, even after controlling for the covariates x ij (due to omitted patient-specific covariates).
The random intercept logistic regression model can equivalently be expressed as a latent response model, where the random intercept η 0j is now included.The dichotomous observed response is related to the latent response via a threshold model as previously specified for conventional logistic regression.The latent response formulation is useful for investigating the properties of logistic regression models with latent variables (24,4).For instance, the strength of the residual within-patient dependence can be expressed by the intra-class correlation for the repeated latent responses The estimates for the random intercept model are given in Table 1.The odds of thought disorder decrease over time in late-onset women with an estimated odds ratio of exp(−0.37)= 0.69 per month.Early onset patients seem to have a higher odds of thought disorder at first hospitalization (odds ratio exp(1.19)= 3.28).Interestingly, it also appears as if early onset patients have a greater decline in their odds of thought disorder over time.The intra-class correlation is estimated as ρ = 0.44, demonstrating that those having a higher than expected (lower) risk of thought disorder on one occasion tend to have a higher (lower) risk at other occasions, taking into account the covariates.
The random intercept model assumes that the logit of thought disorder declines at the same rate over time for all patients with the same covariate values.Since this may be unrealistic, we now allow these rates to vary randomly between patients by including a random slope η 1j of [Month] in the model, giving the random coefficient model ln Pr( where z 1ij represents [Month].The random intercept η 0j and slope η 1j are bivariate normally distributed with zero means. Estimates for the random coefficient model are reported in Table 1.The change in log-likelihood suggests that the random slope should be included in the logistic regression model.Overall, the fixed effects estimates are quite similar to those for the random intercept model.The random slope variance is estimated as 0.18 and the covariance between intercept and slope as −0.97, corresponding to a correlation of −0.78.Therefore those at higher risk of thought disorder at the time of hospitalization experience a greater reduction in their risk over time than those at lower risk.It is important to note that the random intercept variance and the correlation between the random intercept and coefficient are interpreted at [Month]=0.(Subtracting 5 months from [Month] yields an estimated correlation close to zero).
To gain more insight into the model, we have plotted the conditional or subject-specific probabilities of thought disorder given various values of the random intercept (±3) and slope (±0.4) for women with early onset.These are shown as dashed curves in Figure 1 where the dotted curve is the conditional probability for random intercept and slope both equal to their population means of zero, thus representing a 'typical' individual.Also shown as a solid bold curve  is the population average or marginal probability of thought disorder obtained by integrating the conditional probability over the random effects distribution.Note that the population average curve is considerably flatter than that of a typical patient.Such attenuation of the effects of covariates in marginal models compared with conditional models is a wellknown phenomenon for dichotomous responses (25).
Generalized estimating equations (GEE), see e.g. ( 26), are often used for estimating marginal relationships in longitudinal data.Using GEE with an 'exchangeable' correlation structure, the estimated effect of [Month] becomes −0.26, attenuated com-pared with −0.37 and −0.46 for the random intercept and random coefficient models, respectively.Figure 2 shows the population average probability curves for GEE and the random intercept and random coefficient models.As might be expected, all  Finally, some comments on random effects modeling versus GEE are in order.GEE is an estimation method for clustered data where the dependence within clusters is treated as a nuisance.The merit of GEE is that valid inferences are produced for population average effects as long as the mean structure is correctly specified, even if the dependence structure is misspecified.It should be noted that random effects models can also be made more robust by using nonparametric maximum likelihood estimation (NPMLE), see the following section.GEE has a number of severe limitations as compared to random effects modeling that are often not recognized.No insight is gained regarding individual trajectories of change in contrast to random effects modeling.In fact, longitudinal information is not exploited at all since estimation proceeds as if the data were repeated cross-sections (27).It is also evident that causal processes necessarily operate at the subject level so that subject-specific effects are required for etiological inference.In contrast, population average effects are merely descriptive and largely determined by the degree of heterogeneity in the population.Finally, GEE is not based on a statistical model which precludes likelihood based inference and can lead to logical inconsistencies.
Commands for estimating the random effects models in gllamm are given in the Appendix.More details on how to obtain the estimates, predictions and plots shown in this section are provided in (28).

Logistic regression with covariate measurement error: Diet and heart disease
We consider data from an investigation of the relationship between diet and coronary heart disease (30).At the time of recruitment, 337 middle-aged men weighed their food intake over a 7-day period, allowing food constituents to be derived.A subsample of 76 of the men repeated this 6 months later, and all the men were then followed up for heart disease.We will estimate the effect of dietary fibre intake on heart disease, controlling for occupation.The relevant variables are: • [Chd]: dummy variable for coronary heart disease (1: present, 0: absent) Following Clayton (30), we view covariate measurement error models as composed of three submodels: [1] a disease model, [2] a measurement model and [3] an exposure model.Since we have different response types for unit j; continuous measurements of log dietary fibre intake y 1j and y 2j and dichotomous coronary heart disease y 3j , we need a model handling multivariate responses of mixed types.
As disease model, we consider a logistic regression model for [Chd] (y 3j ) with a latent ('unobserved' or 'true') covariate η j , ln Pr( One of the covariates, [Bus] (x j ), is perfectly measured or 'observed' and has a direct effect β 1 on the logit.The other covariate, 'true log fibre intake' (η j ), is latent and measured with error.The factor loading λ represents the regression coefficient for true log fibre intake.We specify a classical measurement model relating measured log fibre intake y ij to true log fibre intake η j : where the measurement error ij has a normal distribution with mean zero.True log fibre intake was measured by y 1j for all men at the first occasion.At the second occasion measures y 2j were missing for many of the men.The repeated measurements for a man are assumed to have the same mean as the true covariate for that man.The measurement errors ij are specified as independently normally distributed with zero mean and constant measurement error variance and are independent of the true covariate η j .It follows that the measurements are conditionally independent of one another given the true unit j x r r r r r r j γ 1 covariate.Furthermore, the measurements are conditionally independent of the outcome y 3j given the true covariate, a property known as nondifferential measurement error.
To complete specification of the covariate measurement error model, we need to define an exposure model where the covariate [Bus] affects true log fibre intake and the residual or 'disturbance' ζ j has a normal distribution with mean zero.The quality of measurements is often quantified in terms of the reliability of the measures.Here, we define the reliability as the proportion of the total variance that is due to variability between the units' true covariate values, given the observed covariate The structure of the model is perhaps best conveyed in a path diagram as shown in Figure 3.Here circles represent latent variables and rectangles observed variables.Long arrows represent linear relations in the linear predictor and short arrows represent residual variability.For the exposure and measurement models, this residual variability is represented by an additive error term, but for the disease model it represents bernoulli variability.
Importantly, the covariate measurement error model specifies true log fibre intake as an intermediate variable in the causal pathway from occupation [Bus] to disease [Chd].It follows that [Bus] may have an indirect effect γ 1 λ in addition to the direct effect β 1 .The total effect, the sum of the direct and indirect effect, then becomes β 1 + γ 1 λ.Note that true log fibre intake can be viewed as the latent variable in a factor model with mixed responses; continuous responses for the two measurements of log fibre intake and a dichotomous response for disease.The factor loading for [Chd] is λ, whereas the loadings Estimates for the logistic regression model with covariate measurement error are presented in Table 2. Estimates for the model presented above are shown under 'Normality' in the table.Here, λ = −1.96represents the estimated effect of true log-fibre on coronary heart disease with corresponding odds ratio exp(−1.96)= 0.14.This extremely large estimated protective effect of log-fibre is probably due to omitting important confounding variables such as exercise which is protective of heart disease and increases food intake, including fibre.Note that the 'naive' estimate, treating the first log-fibre measurement as perfect and discarding the second measurement, is −1.63, which is attenuated as expected.True logfibre has an estimated mean of γ 0 = 2.86 among bank staff and γ 0 + γ 1 = 2.74 for transport staff.
It is interesting to investigate the direct, indirect and total effects of occupation on CHD.The reduced fibre intake among transport staff results in an increased odds of CHD, the odds ratio for this indirect effect of occupation (not shown in the table) being estimated as 1.27 (95% CI from 1.02 to 1.57).The protective direct effect of being transport staff exp(−0.19)= 0.83 (95% CI from 0.43 to 1.61) counteracts this, the odds ratio for the total effect of occupation (not shown in the table) being estimated as 1.05 (95% CI from 0.55 to 2.01).
The residual variance of true log fibre intake is estimated as var(ζ j ) = 0.07 and the measurement error variance as var( ij ) = 0.02.Using equation (15), we obtain an estimated reliability R = 0.77.
Instead of assuming a normal distribution for the true covariate η j , we can make the analysis more robust by letting the distribution be unspecified.The nonparametric maximum likelihood estimator (NPMLE) of the distribution is discrete (32,33) with probability masses at a finite number of locations.The number of masses is determined to achieve largest possible likelihood.NPMLE for covariate measurement error models has been discussed in several papers, e.g.(34,35).NPMLE for the logistic regression model with covariate measurement error required 8 mass points, the resulting estimates are presented under NPMLE in Table 2.We note that the estimates are very similar to those assuming normality, indicating that the normality assumption is tenable for this application.
Although commonly used, the classical measurement error model in equation ( 13) has a number of limitations.It assumes that the fallible measures have the same mean (no relative bias) and measurement error variance, which is reasonable if the measures are essentially exchangeable replicates.However, if the measurements are separated in time there may be a 'drift' in the mean measurement (36).More importantly, if the fallible measures were obtained by different methods, we should allow the measures to have different means, scales, and measurement error variances.These limitations can be rectified by using a congeneric measurement model (37).
There are often no replicate measures available to identify and estimate the covariate measurement error models considered above.In this case it is useful to perform a sensitivity analysis, investigating how the regression estimate for a fallible covariate depends on the magnitude of its measurement error variance.Suppose that we only had measures of logfibre intake at the first occasion.Figure 4 shows a plot of the estimated odds-ratio exp(β u ) for different assumed values of the measurement error variance var( i1 ).We see that the difference between the  'naive' estimated odds-ratio exp(−1.63)= 0.20 (assuming no measurement error) and the 'corrected' estimated odds-ratio increases as the measurement error variance increases.The diet and heart disease application has been discussed in more detail elsewhere (35,37), where several extensions of conventional covariate measurement error modelling are also described.
It is demonstrated in the Appendix that the cme wrapper for gllamm described in (37) makes estimation of the model with a normal true exposure distribution extremely easy.Commands for producing nonparametric maximum likelihood estimates in gllamm are also given in (37).

Multilevel modelling of nominal data: Abuse of antibiotics
Acute respiratory tract infection (ARI) is a common disease among children, pneumonia being a leading cause of death in young children in developing countries.In China the standard medication for ARI is antibiotics, which has led to concerns about antibiotics misuse and resultant drug resistance.As a response, the WHO introduced a program of case management for ARI in children under 5 years old in China in the 1990's.
We will analyze data on physicians prescribing behavior of antibiotics in two Chinese counties, one of which was in the WHO program whereas the other was not.These data have previously been analyzed by Yang (38).The multilevel structure of the prescribing data is displayed in Figure 5.
Medical records were examined for medicine prescribed and a 'correct' diagnosis determined from symptoms and clinical signs.Based on the WHO case management criteria, the antibiotic prescription for child i by doctor j in hospital k, y ijk , was classified into three categories (denoted 'abuse' if there were no clinical indications): 1: Correct use of antibiotics (reference-category) 2: Abuse of one antibiotic 3: Abuse of several antibiotics Prescription behavior is obviously a complex process, and the present analysis is merely intended to shed some light on basic issues such as the impact of the patient's status at arrival and the physician's experience.
We consider the following covariates (x ijk ) at different levels: • Child-level: • Doctor-level: -[DRed]: doctor's education (ordinal with six categories from self-taught to medical school) • Hospital-level: -[WHO]: dummy for hospital in WHO program (yes=1, no=0) Disregarding the multilevel structure of the data for a moment, the natural model for a multicategory response variable is polytomous logistic regression (39).The probability of the realized category, say a, can be written as where V a ijk = g a 0 + x ijk g a is a linear predictor with category-specific intercepts g a 0 and category-specific effects of covariates that do not vary over categories g a .Since the first category serves as reference, g 1 0 = 0 and g 1 = 0 for identification.
An alternative specification of the polytomous logistic regression model, based on so-called random utility models, is often used in econometrics and psychometrics but is unfamiliar among epidemiologists.Random utilities U a ijk are introduced for each category a, where we emphasize that the term utility should be broadly construed as 'attractiveness' of the category in some sense.The utilities are modelled as where a ijk is a random term, assumed to be independently distributed across children, doctors, hospitals and categories with a Gumbel distribution The realized category is then viewed as arising from utility maximization where the utility of the realized category U a * i is larger than the utilities of the other categories.Remarkably, the familiar logistic regression model ( 16) arises if and only if the random utilities are Gumbel distributed (40).
In an effort to handle the multilevel nature of the prescription problem, we specify a three-level random intercept polytomous regression model Comparing this linear predictor with that for the conventional model, we see that the terms γ 0jk and γ a(3) 0k are also included.γ a (2) 0jk is a category-specific random intercept varying at level 2, letting the overall 'attraction' of category a differ among doctors over and above the variability explained by the included covariates.Retaining the first category as reference, we let the doctor-level intercepts for categories 2 and 3, γ ) and var(γ is a category-specific random intercept varying at level 3, permitting the overall 'attraction' of category a to differ among hospitals.The hospital-level intercepts, γ ).Thus, the category-specific random intercepts at a given level are dependent whereas the intercepts are specified as independent across levels.
Estimates for the multilevel polytomous regression model, including only covariates at the child level, are presented in Table 3. Estimates also including covariates varying at the doctor and hospital levels are given in Table 4.The change in log-likelihood indicates that inclusion of these covariates improves the fit considerably.Note that the variances of the intercepts are considerably reduced at the hospital level whereas the variances of the intercepts at the doctor level are fairly similar.In general, the fixed effects do not appear to change appreciably.
The estimates for the fixed effects at the child level all seem to have reasonable signs and magnitudes.For instance, the higher the child's temperature, the lower the risk of abuse.Self-medication also appears to reduce the risk of antibiotics abuse, particularly of several antibiotics.On the other hand, a wrong diagnosis increases the risk.We note that doctor's education appears to reduce the risk of abusing several antibiotics but not the risk of abusing one antibiotic.Importantly, the WHO program seems to have a beneficial effect on antibiotics abuse, most pronounced for several antibiotics.
It could be argued that it would be more parsimonious to treat the antibiotic response as ordinal  instead of nominal, since the categories seem to have an a priori ordering (correct, abuse one, abuse several).However, models for ordinal response are considerably more restrictive than their nominal counterparts.
The multilevel polytomous regression model used above is a special case of a general modeling framework for multilevel polytomous regression (41), given a more introductory treatment in (42).The framework includes category-specific covariates such as the cost or other 'attributes' of the categories, useful for instance in health utilization studies.Multilevel and possibly multidimensional common factors can be included as well as different kinds of random coefficients.
Commands for producing the reported estimates in gllamm are given in the Appendix.

DISCUSSION
The purpose of this article has been to convey the usefulness of the general and flexible GLLAMM framework in epidemiology.We have considered three applications; logistic regression for repeated measurement data (course of illness in schizophrenia), logistic regression with covariate measurement error (diet and coronary heart disease), and multilevel modelling of nominal data (abuse of antibiotics).
Importantly, the applications considered do not in any way exhaust the potential of GLLAMMs in epidemiology.Types of applications not covered include: • Multilevel and multivariate survival analysis with frailties, for discrete time (43,4) and continuous time (4) • Discrete random growth curve models or latent trajectory models (44) • Disease mapping (4) • Endogenous treatment models and joint models of survival and repeated measurements (4) • Genetic epidemiology, for instance investigating association between a genetic marker and the dichotomous phenotype 'atopy' (asthma, eczema or hayfever) (45) Although the incidence of gllamm use is increasing in epidemiological research, see for instance (46)(47)(48)(49)(50), we hope that this article will further enhance the popularity of this powerful research tool.

( 2 )
ij corresponding covariates.Specifically, η (2) mj is a random effect of covariate z (2) mij for cluster j.It is typically assumed that the random effects are multivariate normal, η (2) j

( 2 )
m1, is set to one.Note that the model reduces to a generalized linear mixed model if z

Figure 1 :
Figure 1: Conditional and marginal predicted probabilities of thought disorder for women with early onset.Dotted curve for conditional probability from random coefficient model when both random intercept and slope are zero.

Figure 2 :
Figure 2: Marginal predicted probabilities of thought disorder for women with early onset from random effects models and GEE.

© 1 y 1 T y 2 TFigure 3 :
Figure 3: Path diagram with direct and indirect effects of [Bus] on [Chd] via true fibre intake.

Figure 5 :
Figure 5: Three-level structure of antibiotics data.

Table 1 :
Repeated measurements of thought disorder -

Table 2 :
Diet and CHD -Estimates for dichotomous logistic regression with covariate measurement error based on normal and nonparametric exposure distributions (31)pirical variance of estimated discrete distribution.arefixedatone for the fibre measures.Since the latent variable is regressed on a covariate, the specified covariate measurement error model represents a generalization of the conventional Multiple-Indicator Multiple-Cause (MIMIC) model(31)to include direct effects and non-continuous responses.
Sensitivity analysis of odds-ratio for different assumed measurement error variances (supposing there are no replicate measurements).

Table 3 :
Abuse of antibiotics -Estimates for multilevel random intercept polytomous regression.No observed covariates at the doctor and hospital levels

Table 4 :
Abuse of antibiotics -Estimates for multilevel random intercept polytomous regression.Including observed covariates at the doctor and hospital levels