Application of different statistical methods to estimate relative risk for self-reported health complaints among shoe factory workers exposed to organic solvents and plastic compounds

Objectives: Prevalence odds ratio (POR) is commonly used as a surrogate for relative risk (RR) in cross-sectional studies. When prevalences are high, POR may be a poor approximation for RR. Prevalence ratios (PRs) are more easily interpretable when evaluating exposure effects. Our objectives were to compare estimates of PRs and corresponding 95% confidence intervals (CIs) using two different statistical methods on a real data set, and to report possible practical problems in applying the methods.
Methods: Two statistical methods were compared: log-binomial regression and Cox regression. We examined selected high prevalence symptoms (headache, tingling of limbs, and breathing difficulty) and their association with solvent-exposed work tasks in 164 Hebron shoe factory workers.
Results: The two methods gave identical crude PR point estimates and quite similar adjusted estimates. CIs were wider in Cox regression than in log-binomial regression, as exemplified by adjusted estimates for the association between participation in cleaning tasks and tingling of limbs in log-binomial regression (PR=1.78; CI=1.25–2.54) and Cox regression (PR=1.76; CI=1.01–3.06). When we used Cox regression with robust variance we obtained narrower CIs (PR=1.76; CI=1.19–2.60). In the log-binomial regression analysis we had to exclude a few subjects with a predicted risk exceeding one.
Conclusions: Log-binomial regression is appropriate from a theoretical viewpoint. However, some individuals had a predicted risk larger than one, which caused the computation to abort. Cox regression could produce heavy ties when adjusted for confounders and yielded rather wide CIs; with robust variance, however, the CIs became narrower. In conclusion, the two suggested methods have certain limitations and difficulties. However, Cox regression encountered less serious problems than the other method, and is also widely available.


INTRODUCTION
The prevalence odds ratio (POR) is commonly used in cross-sectional studies to assess associations between exposures and outcome. PORs can be estimated by logistic regression whenever the health outcome is dichotomous and the data need covariate adjustment.
POR can be used as an approximation of the prevalence ratio (PR) and interpreted as a relative risk (RR) under the rare disease assumption (e.g. prevalence of outcomes below 0.1) (1-3). However, since many health outcomes are common, the interpretation of an odds ratio as a relative risk is often questionable (4). Lee and Chia (5) proposed the use of PR instead of POR in cross-sectional studies of common diseases. According to Lee (6), PR is easier to communicate than POR and its meaning is more transparent. Others point out that the POR is commonly interpreted incorrectly as a relative risk in cross-sectional studies dealing with common diseases, such as musculoskeletal complaints (4) and other high prevalence outcomes (7).
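The gap between POR and PR at high prevalence follows from simple arithmetic on the two prevalences. The sketch below is illustrative only: the function name `por_from_pr` and the prevalences (5% and 40% among the unexposed) are hypothetical and are not taken from the study data.

```python
def por_from_pr(pr: float, p0: float) -> float:
    """Prevalence odds ratio implied by a prevalence ratio `pr`
    when the prevalence among the unexposed is `p0`."""
    p1 = pr * p0  # prevalence among the exposed
    if not 0.0 < p0 < 1.0 or not 0.0 < p1 < 1.0:
        raise ValueError("prevalences must lie strictly between 0 and 1")
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical prevalences; the PR is 1.5 in both cases.
rare = por_from_pr(1.5, 0.05)    # outcome prevalence 5% among the unexposed
common = por_from_pr(1.5, 0.40)  # outcome prevalence 40% among the unexposed
print(f"rare outcome:   PR = 1.50, POR = {rare:.2f}")
print(f"common outcome: PR = 1.50, POR = {common:.2f}")
```

With the rare outcome the POR stays close to the PR, whereas with the common outcome the same PR corresponds to a markedly larger POR.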
Skov et al. (7) applied these methods to simulated data sets and concluded that the point estimates of the models were close to the true parameters, but that Cox regression produced confidence intervals that were too wide. However, Cox regression with robust variance can produce more appropriate confidence intervals (8). For the other methods, confidence intervals were generally considered to be correct (7). Zocchetti et al. (13) concluded that the log-binomial model is preferable.
Our objective was to compare estimates of PRs and corresponding 95% confidence intervals, using the two methods (Cox regression and log-binomial regression) on a real data set with high prevalence outcomes. The data set contained information about health complaints among shoe factory workers exposed to organic solvents and plastic compounds (14). PORs will also be presented to illustrate differences compared with PR estimates using the two methods. Finally, we will report possible practical problems in applying the methods.

Study Population
A sample of 164 male shoe factory workers in Hebron City who had worked for more than one year was interviewed in 1996-97. The study population and methods are described in more detail elsewhere (14).

Questionnaire
Health complaints among shoe factory workers were collected in a structured interview (14). Questions relating to neuropsychiatric symptoms were taken from a Swedish neuropsychiatric symptom questionnaire (Q16) (15).
Other questions covered symptoms representing potential peripheral nervous system effects (tingling of limbs) and mucous membrane irritation (sore eyes and breathing difficulty), as well as work tasks, cumulative exposure, age, and socio-demographic characteristics (smoking, marital status, and education) (14).

Exposure
The workers were exposed to organic solvents and plastic compounds, depending on work tasks and the type of production. Cumulative exposure was estimated for workers by calculating total months of work in four tasks (gluing, cleaning, varnishing and plastic molding). Adhesive work was categorized into four exposure subgroups (0, 1-12, 13-72, >72 months) and cleaning into three subgroups (0, 1-24, >24 months), whereas varnishing and plastic molding were dichotomized (0, ≥1 month).

Statistical analysis
We applied two methods using the S-Plus 2000 software: Cox regression and log-binomial regression. We estimated PRs for associations between exposed work tasks and selected high prevalence outcomes: headache, tingling of limbs and breathing difficulty. We also compared the PRs with the corresponding PORs in a standard logistic regression analysis (SPSS package for Windows, version 8).
In Cox regression the time variable was set to the same value (unity) for all individuals. An individual who reported a symptom was coded as a "death", the others as "censored". According to the recommendation of Skov et al. (7), the Breslow method for ties was used. In this model, when $b_1$ is the estimated coefficient corresponding to exposure, $\exp(b_1)$ is an approximation to the relative risk (RR) associated with that exposure. To obtain more appropriate confidence intervals for PRs estimated by Cox regression, we applied the robust variance option in STATA software (STATA/SE 8.0).
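For a single dichotomous exposure, this set-up can be written out by hand. With all follow-up times equal, every "death" is tied and the risk set is the whole sample, so the Breslow log partial likelihood reduces to l(b) = b·d1 − d·log(n0 + n1·e^b), where n0 and n1 are the numbers of unexposed and exposed subjects and d0, d1 (d = d0 + d1) the numbers reporting the symptom. The sketch below maximises this function by Newton's method; it is a minimal illustration with hypothetical counts and a hypothetical function name, not the S-Plus or STATA routine used in the study.

```python
import math


def cox_breslow_unit_time(n0, d0, n1, d1, tol=1e-12, max_iter=50):
    """Estimate b1 from the Breslow partial likelihood when every subject
    has follow-up time 1 and there is one binary exposure.

    n0, n1: numbers of unexposed / exposed subjects
    d0, d1: numbers of them reporting the symptom ("deaths")
    """
    d = d0 + d1
    b = 0.0
    for _ in range(max_iter):
        weight = n1 * math.exp(b)
        q = weight / (n0 + weight)   # exposed share of the risk-set weight
        score = d1 - d * q           # first derivative of l(b)
        info = d * q * (1.0 - q)     # minus the second derivative of l(b)
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b


# Hypothetical counts: 40 of 100 unexposed and 45 of 60 exposed with symptom.
b1 = cox_breslow_unit_time(n0=100, d0=40, n1=60, d1=45)
crude_pr = (45 / 60) / (40 / 100)
print(f"exp(b1) = {math.exp(b1):.4f}, crude PR = {crude_pr:.4f}")
```

Setting the score to zero gives e^b = (d1/n1)/(d0/n0), so in this unadjusted case the Cox estimate coincides with the crude PR.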
The log-binomial model is similar to logistic regression in assuming a binomial distribution of the outcome. However, instead of using a logit link function, as is customary in standard logistic regression, a log link is applied. Hence, for a particular individual, the relation between the risk $p$ of an adverse outcome and the covariate values $x_1, \dots, x_k$ is $\log(p) = b_0 + b_1 x_1 + \dots + b_k x_k$, where $b_0, b_1, \dots, b_k$ are the parameters to be estimated. If, say, $x_1$ is a dichotomous exposure, then RR = $\exp(b_1)$ for that exposure. The log link binomial model is available in several statistical software packages, for example S-Plus.
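To make the log link concrete, the following sketch fits log(p) = b0 + b1·x for one dichotomous exposure by Fisher scoring with step-halving. It is an illustration under stated assumptions, not the S-Plus procedure used here: the function name and counts are hypothetical, the data are grouped into two exposure levels, and each group is assumed to contain both cases and non-cases. Steps that would push a predicted risk to one or above are shortened, which is one simple way of keeping the iteration inside the model's valid range.

```python
import math


def fit_log_binomial(n0, d0, n1, d1, tol=1e-10, max_iter=100):
    """Fit log(p) = b0 + b1*x for one binary exposure x.

    n0, n1: numbers of unexposed / exposed subjects
    d0, d1: numbers of cases among them (0 < d < n assumed in each group)
    """
    groups = [(0.0, n0, d0), (1.0, n1, d1)]  # (x, subjects, cases)

    def loglik(b0, b1):
        total = 0.0
        for x, n, d in groups:
            eta = b0 + b1 * x
            if eta >= 0.0:        # predicted risk >= 1: outside the model
                return -math.inf
            total += d * eta + (n - d) * math.log1p(-math.exp(eta))
        return total

    b0 = math.log((d0 + d1) / (n0 + n1))  # start at the overall prevalence
    b1 = 0.0
    current = loglik(b0, b1)
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, n, d in groups:
            p = math.exp(b0 + b1 * x)
            r = (d - n * p) / (1.0 - p)  # score contribution
            w = n * p / (1.0 - p)        # Fisher information weight
            s0 += r
            s1 += r * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        step0 = (i11 * s0 - i01 * s1) / det
        step1 = (i00 * s1 - i01 * s0) / det
        scale = 1.0
        # Halve the step while it lowers the likelihood or leaves the valid range.
        while loglik(b0 + scale * step0, b1 + scale * step1) < current and scale > 1e-8:
            scale /= 2.0
        b0 += scale * step0
        b1 += scale * step1
        current = loglik(b0, b1)
        if scale * max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Hypothetical counts: 40 of 100 unexposed and 45 of 60 exposed with symptom.
b0, b1 = fit_log_binomial(n0=100, d0=40, n1=60, d1=45)
print(f"baseline prevalence = {math.exp(b0):.3f}, PR = {math.exp(b1):.3f}")
```

With a single dichotomous exposure the fitted PR equals the crude ratio of the two prevalences; covariate adjustment requires a general-purpose routine.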

RESULTS
In the log-binomial regression analysis we had to exclude a few subjects (between 1 and 8 for different work tasks and symptoms) because of predicted risks exceeding unity, causing the software to abort computation.
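The check behind this exclusion can be sketched as follows. This is not the authors' modified S-Plus code; the coefficients, covariate patterns and function names are hypothetical, chosen only to show how a combination of covariates can push exp(b0 + b1·x1 + ... + bk·xk) above one.

```python
import math


def predicted_risks(coefs, rows):
    """coefs = [b0, b1, ..., bk]; each row holds covariate values [x1, ..., xk]."""
    intercept, slopes = coefs[0], coefs[1:]
    return [math.exp(intercept + sum(b * x for b, x in zip(slopes, row)))
            for row in rows]


def flag_out_of_range(coefs, rows):
    """Indices of subjects whose predicted risk is one or larger."""
    return [i for i, p in enumerate(predicted_risks(coefs, rows)) if p >= 1.0]


# Hypothetical model: baseline risk 0.5, PR 1.6 for exposure, PR 1.4 for older age.
coefs = [math.log(0.5), math.log(1.6), math.log(1.4)]
rows = [[0, 0], [1, 0], [0, 1], [1, 1]]  # [exposed, older] covariate patterns
for row, p in zip(rows, predicted_risks(coefs, rows)):
    print(row, round(p, 2))
print("flagged:", flag_out_of_range(coefs, rows))
```

Only the subject who is both exposed and older is flagged, which mirrors the situation where a rare combination of covariates produces an impossible predicted risk.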
The prevalence of headache was 0.65. In the log-binomial regression, headache was moderately associated with exposure for >24 months in the cleaning task (adjusted PR=1.57; CI=1.17-2.10) (Table 1). Cox regression yielded a similar point estimate but a wider CI (PR=1.58; CI=0.98-2.54). It is possible to improve the situation using the robust variance estimates for the Cox regression (PR=1.58; CI=1.24-2.00). As expected, the crude PRs were identical in the two methods, but the CIs showed the same pattern as for the adjusted estimates.
The prevalence of tingling of limbs among shoe factory workers was 0.46. Association with cleaning activities showed the same pattern with similar adjusted PRs, and a wider CI in Cox regression than in log-binomial regression (Table 2).
The prevalence of breathing difficulty was 0.28. Breathing difficulty was found to be associated with exposure to adhesives and varnishing compounds in the two statistical methods (Table 3). Again, the same pattern was observed concerning crude and adjusted PRs and their corresponding CIs. POR estimates were invariably stronger (more distant from unity) than PRs, as expected (Tables 1-3).

DISCUSSION
In cross-sectional studies PORs are often presented and interpreted as relative risks, on the rare disease assumption. The reason could be that PORs are easily computed in logistic regression. However, the high overall prevalences of certain outcomes make POR a poor replacement for the RR. To overcome this problem, many authors have suggested directly estimating the PR, which is more easily interpreted than POR (16). We used two suggested methods for this, namely Cox regression and log-binomial regression. The log-binomial model yields the "correct" likelihood structure under the assumption of multiplicative effects, and is thus the most appropriate method to estimate PR and corresponding CIs directly (1). However, the log-binomial model might produce prevalences greater than one (17). Although Cox regression produced approximately the same PR point estimates, it suffered from other shortcomings. Cox regression introduced heavy ties that were sometimes difficult to correct for in the model, and the Breslow method is not particularly well suited for this, whereas other methods may produce bias (7).
Even though the log-binomial regression model is preferable from a theoretical point of view, it encountered numerical problems. Although the model itself may generally be appropriate, one may occasionally encounter a few individuals for whom the predicted risk is larger than one, due to a rare combination of covariates. Apart from being illogical, a predicted risk above one will often cause the software to abort computations, giving only slight clues as to the nature of the problem. The higher the prevalence, the more frequent this problem will be. To avoid this problem we rewrote the software to locate and remove those few individuals that caused the computation to crash. Clearly, this strategy is not tenable in situations with more frequent predictions above one. Ultimately, of course, if many of the predicted risks exceed one, this is a sign of a mis-specified model rather than of just a few deviating individuals.
It is worth noting that ordinary logistic regression does not suffer from any of the shortcomings of the other models. But when comparing the PORs with the PRs for high prevalence outcomes, it is clear that they differ substantially, as the POR typically overestimates the PR.
Also, there are different assumptions underlying the logistic model as compared to the log-binomial model. Whereas the logistic model assumes a constant POR over all covariate levels, the log-binomial model assumes a constant PR over all levels of adjustment. If the log-binomial model were correct, the logistic regression model should include interaction terms, and vice versa. Again, at low prevalences the difference may not be substantial, but it becomes considerable at high prevalences.
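The incompatibility of the two homogeneity assumptions can be checked numerically. In the sketch below, which uses hypothetical baseline prevalences and a hypothetical function name, the same POR is applied in two strata that differ only in the prevalence among the unexposed, and the implied PR is computed for each.

```python
def pr_from_por(por: float, p0: float) -> float:
    """Prevalence ratio implied by a prevalence odds ratio `por` in a
    stratum where the prevalence among the unexposed is `p0`."""
    odds1 = por * p0 / (1.0 - p0)  # odds among the exposed
    p1 = odds1 / (1.0 + odds1)     # prevalence among the exposed
    return p1 / p0


# Hypothetical strata sharing one POR of 2.25 but with different baselines.
for p0 in (0.10, 0.40):
    print(f"unexposed prevalence {p0:.2f}: PR = {pr_from_por(2.25, p0):.2f}")
```

A POR that is constant across strata thus implies a PR that varies with the baseline prevalence, so a model that is homogeneous on one scale needs interaction terms on the other.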
To illustrate differences within a real data set, we selected outcomes with different, high prevalences. Preferably, the estimated PRs should be similar for the two methods (7). This was true for unadjusted PRs, but when we adjusted PRs for potential confounding factors, slight differences were obtained.
Confidence intervals of unadjusted and adjusted PRs obtained by Cox regression were too wide compared with those obtained by log-binomial analysis.
However, when we used robust variance estimates for Cox regression we obtained appropriate confidence intervals.
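For an unadjusted PR with a single dichotomous exposure, the reason for the wider Cox intervals can be written in closed form. Inverting the information of the Breslow partial likelihood with equal follow-up times gives a model-based variance for log(PR) of 1/d1 + 1/d0, whereas the log-binomial model gives 1/d1 − 1/n1 + 1/d0 − 1/n0 (d = cases, n = subjects, subscripts 1 and 0 for exposed and unexposed). The sketch below compares the resulting Wald intervals for hypothetical counts; the function name is an assumption, and the robust (sandwich) variance used in the study is not reproduced here.

```python
import math


def crude_pr_with_cis(n0, d0, n1, d1, z=1.959964):
    """Crude PR with Wald 95% CIs from two variance formulas for log(PR)."""
    pr = (d1 / n1) / (d0 / n0)
    se_log_binomial = math.sqrt(1 / d1 - 1 / n1 + 1 / d0 - 1 / n0)
    se_cox_model = math.sqrt(1 / d1 + 1 / d0)  # omits the -1/n terms

    def wald(se):
        return (pr * math.exp(-z * se), pr * math.exp(z * se))

    return {
        "pr": pr,
        "log_binomial": wald(se_log_binomial),
        "cox_model_based": wald(se_cox_model),
    }


# Hypothetical counts: 40 of 100 unexposed and 45 of 60 exposed with symptom.
result = crude_pr_with_cis(n0=100, d0=40, n1=60, d1=45)
for name in ("log_binomial", "cox_model_based"):
    low, high = result[name]
    print(f"{name}: PR = {result['pr']:.2f}, CI = {low:.2f}-{high:.2f}")
```

The omitted 1/n terms matter most when the outcome is common, which is consistent with the model-based Cox intervals being widest for the high prevalence symptoms.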
In conclusion, the two suggested methods have certain limitations and difficulties. The log-binomial model is appropriate from a theoretical viewpoint. However, Cox regression with robust variance may be a suitable method, since we obtained PR point estimates with less serious problems than we experienced with the other method.

Table 1. PORs, PRs and corresponding CIs between different work tasks and headache (prevalence = 65%) among Hebron shoe factory workers (n = 164), estimated by ordinary logistic regression, log-binomial regression, Cox regression, and Cox regression with robust variance.
* PR and POR adjusted for categories of age, smoking, marital status, and education.

Table 2. PORs, PRs and corresponding CIs between work tasks and tingling of limbs (prevalence = 46%) among Hebron shoe factory workers (n = 164), estimated by ordinary logistic regression, log-binomial regression, Cox regression, and Cox regression with robust variance.
* PR and POR adjusted for categories of age, smoking, marital status, and education.

Table 3. PORs, PRs and corresponding CIs between work tasks and breathing difficulty (prevalence = 28%) among Hebron shoe factory workers (n = 163), estimated by ordinary logistic regression, log-binomial regression, Cox regression, and Cox regression with robust variance.
* PR and POR adjusted for categories of age, smoking, marital status, and education.