Record linkage in occupational cancer studies

Record linkage studies have been used for decades to study occupational diseases and mortality. While low response rates are an increasing problem in sample surveys, response rates at censuses in the Nordic countries are close to 100%. Linking census data with data on mortality, emigration and cancer by means of the unique personal identification numbers ensures complete information on relevant events. Results from a large census-based Nordic study on cancer incidence by occupation are presented, and the advantages and problems of this type of study are discussed. In record linkage studies, occupational group, socio-economic status or any other characteristic may be used as the exposure indicator. The results of such studies usually must be interpreted in light of established etiologic associations. Record linkage studies have two main purposes: they may be used as reference material in the general description of cancer incidence and distribution in a population, and they serve as a basis for generating hypotheses to be tested in analytical studies with more detailed information.

The potential for using death certificate data for the surveillance of occupational hazards was envisioned nearly 150 years ago when the first occupational mortality statistics were published for England and Wales in 1855 (Registrar General, 1855).
The earliest study of occupational mortality based on the linkage of individual death certificates and census records was done in the United States, based on a 4-month follow-up of a sample of the 1960 census population (Kitagawa & Hauser, 1973). In Norway a similar study based on deaths in 1960-64 was published in 1974 (Tønnesen, 1974), and later a 10-year follow-up of the Swedish 1960 population was published (Statistiska Centralbyrån, 1982). More recently, a joint Nordic study on work-related cancer was published as a comprehensive analysis based on more than 10 million persons (Andersen et al, 1999).

WHY REGISTRY BASED LINKAGE STUDIES?
During the last decades, falling response rates have hampered survey studies in many countries. A low response rate may lead to serious bias. An indication of selection bias is that the lifestyle of respondents often differs from that of non-respondents. At a census, however, the response rate is close to 100% because it is compulsory for all heads of household to fill out self-administered questionnaires.
Exposure to physical and chemical agents at the workplace is often heavier than elsewhere and often extends over relatively long periods of time. Thus, if a substance is carcinogenic to humans, the best opportunity of observing an association may be in an occupational setting (Decoufle, 1982). In addition to the assumed higher exposures, occupational groups are favourable to study because they can be classified according to exposure periods and levels more easily than general population samples. Besides workplace exposures, the occupational groups also vary in their exposure to other risk factors. Major risk factors of this kind are tobacco smoking, alcohol consumption, and dietary habits, but information on these risk factors is not included in the census data. In future studies it may be possible to adjust for such factors if there is access to biological data from relevant sources.
The existence of a national cancer registry, a population registry, and censuses giving information on occupational activity for the whole population constitutes an important infrastructure for occupational cancer research in Norway as well as in the other Nordic countries. This infrastructure has been utilised in a series of studies of cancer in selected occupations as well as in overviews of occupational groups (Andersen et al, 1989; Tynes et al, 1992; Pukkala et al, 1983; Auvinen et al, 1995; Lynge et al, 1995; Engholm et al, 1996; Carstensen, 1987; Gerhardsson et al, 1986). The linkage between the census data, the mortality and emigration data, and the cancer incidence data has always been based on the unique personal identification numbers. This method ensures complete ascertainment of relevant events. The present text discusses the advantages and disadvantages, weaknesses and problems of linkage studies in Norway and the other Nordic countries.
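The linkage step described above can be illustrated with a minimal Python sketch. It assumes that each register is available as a table keyed by the personal identification number; the identifiers, occupations, dates and the function name `link` are all invented for illustration and do not describe the actual register formats.

```python
from datetime import date

# Hypothetical extracts from four registers, each keyed by a personal
# identification number. All identifiers and records are invented.
census = {"ID001": "waiter", "ID002": "farmer", "ID003": "plumber"}
deaths = {"ID002": date(1985, 6, 30)}
emigrations = {"ID003": date(1979, 1, 15)}
cancers = {"ID001": [("lung", date(1982, 3, 1))]}


def link(pid):
    """Collect everything known about one person across the registers."""
    return {
        "occupation": census[pid],           # exposure indicator from the census
        "death": deaths.get(pid),            # None if no death is recorded
        "emigration": emigrations.get(pid),  # None if no emigration is recorded
        "cancers": cancers.get(pid, []),     # empty list if no cancer is recorded
    }


# One linked record per person enumerated in the census.
linked = {pid: link(pid) for pid in census}
```

Because the key is unique and shared by all registers, the linkage is an exact lookup rather than a probabilistic match on name or birth date, which is what makes complete ascertainment of events plausible.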

WORK RELATED CANCER IN THE NORDIC COUNTRIES
Among the latest and largest of such studies is a Nordic study published in 1999 (Andersen et al, 1999). Here, cancer incidence data were presented by occupational group for a population of 10 million people aged 25-64 years in 1970, followed for approximately 20 years and including a total of 1 million cancer cases. Individual records were linked using the personal identification numbers used in the four Nordic countries. In 1970 censuses were held in Norway, Sweden, Denmark and Finland, and information on occupation for each economically active family member was collected in free text using self-administered questionnaires. This information was coded according to the Nordic Classification of Occupations, NYK (Nordisk yrkesklassifisering, 1965), in Norway, Sweden and Finland, and according to a national coding scheme in Denmark. For the joint Nordic study, all data were grouped into 53 occupational groups plus one group of economically inactive persons. The data were analysed as a cohort study, and person-years were accumulated from 01.01.71 until the date of emigration, death, or 31.12.87 (Denmark), 31.12.89 (Sweden), 31.12.90 (Finland) and 31.12.91 (Norway), whichever came first. National cancer registration was performed in all participating countries during the whole observation period, and all incident cancer cases during the individual risk periods were included. The International Classification of Diseases, 7th revision, formed the basis of the classification of cancer cases into 35 diagnostic groups. For each country and gender, the observed number of new cancer cases in each of the defined occupational groups was compared with the expected number, calculated from the age-, gender-, and period-specific person-years and the incidence rates for each of the participating countries. The results were presented as standardised incidence ratios, SIRs, defined as the ratio of observed to expected cases multiplied by one hundred. The SIRs for the four countries combined were calculated in the same way for each gender and occupational group.
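The person-year accumulation and the SIR calculation described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: the function names, the stratum labels, and all numbers are assumptions, and reference rates are expressed per 100 000 person-years for convenience. The calculation is assumed to be run separately for each gender and country.

```python
from datetime import date


def person_years(start, end_of_follow_up, death=None, emigration=None):
    """Years at risk from start until death, emigration, or end of
    follow-up, whichever comes first."""
    exits = [d for d in (death, emigration, end_of_follow_up) if d is not None]
    exit_date = min(exits)
    return max((exit_date - start).days, 0) / 365.25


def expected_cases(person_years_by_stratum, rates_per_100k):
    """Expected number of cases: stratum-specific person-years multiplied
    by the reference incidence rate of the same stratum, then summed."""
    return sum(
        py * rates_per_100k[stratum] / 100_000
        for stratum, py in person_years_by_stratum.items()
    )


def sir(observed, expected):
    """Standardised incidence ratio: observed / expected, times 100."""
    return 100.0 * observed / expected


# Invented example: one occupational group, strata = (age group, period).
py_by_stratum = {("45-49", "1971-75"): 10_000, ("50-54", "1976-80"): 8_000}
reference_rates = {("45-49", "1971-75"): 100, ("50-54", "1976-80"): 200}

expected = expected_cases(py_by_stratum, reference_rates)  # 10 + 16 cases
example_sir = sir(39, expected)
```

With 39 observed cases against 26 expected, the invented group has an SIR of 150, i.e. 50% more cases than expected from the reference rates.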
For men, the risk of any cancer varied widely across the 53 occupational groups, with SIRs ranging from 79 for farmers to 159 for waiters. High risks for all cancers combined were also found among seamen, beverage and tobacco workers, and cooks. For women, the risk of any cancer varied less across occupational groups. SIRs ranged from 83 for gardeners to 129 for tobacco workers.
Lung cancer was the most frequently occurring cancer among men during the observation period. While tobacco smoking is the most important risk factor for lung cancer, occupational exposures also contribute substantially. Waiters and tobacco workers had the highest risks of lung cancer, and in these groups tobacco smoking should be viewed as a work related risk factor. Elevated risks were also found among miners and quarry workers, possibly related to silica dust and radon exposure. The risk of lung cancer among men according to occupational group is shown in table 1.
Pleural cancer is considered to be exclusively linked to asbestos exposure. Accordingly, plumbers, welders, mechanics, and seamen were the groups with the highest risks of pleural cancer. An indication of asbestos exposure was also found in the occupational group of technical, physical and biological workers, which includes, among others, engineers, chemists, and architects. In wood workers, an expected elevated risk of cancer of the nose was identified. In Norway, an elevated risk of nasal cancer was also found among smelters and foundry workers, caused by the inclusion of nickel-refinery workers in this occupational group.
In the Nordic countries a large proportion of liver cancer cases can be attributed to alcohol consumption, since exposure to other liver carcinogens, such as hepatitis B virus and aflatoxins, is rare. A high risk of liver cancer was found in occupational groups with easy access to alcohol or with cultural traditions of high alcohol consumption, such as waiters, journalists, cooks, beverage workers, and seamen.
Some cancers are more often found in the higher social strata of the population. This is often seen, for instance, for breast cancer and colon cancer, where fertility pattern and dietary habits, respectively, are thought to play important roles. Accordingly, elevated breast cancer risk was found in groups characterised by long education, and presumably late age at first birth, such as physicians, dentists, journalists, and other groups with university education. Dietary factors, including alcohol consumption, are also known to contribute to breast cancer risk. The risk of breast cancer among women according to occupational group is shown in table 2.
The fifty-fourth group investigated consisted of people not economically active in 1970, approximately 422 000 men and 2.4 million women. Among the men, cancer risk was significantly elevated for 22 of the 32 cancer diagnoses included (table 3). For all sites combined, the SIR was 118. The same high SIRs were also observed in 1980 and 1990. For two cancer sites, prostate cancer and malignant melanoma, risk was significantly lower than in the general population. In the much larger group of women, a considerable proportion were probably housewives. Elevated risks were found for cancer of the oesophagus, stomach, liver, gall bladder, kidney, thyroid, and connective tissue and for unknown/undefined cancer sites; the SIRs for these sites varied from 103 to 108.

DISCUSSION
As the above brief review of some of the main results indicates, linkage studies do not usually give direct information on the risk associated with specific occupational exposures. Rather, the results indicate the consequences, in terms of cancer risk, of the combined effect of exposures, lifestyle factors and socio-economic characteristics in each of the constructed occupational groups. The classification of occupational groups is therefore important; in many cases the need for large groups with robust risk estimates must be balanced against the need for groups with high specificity in exposure and/or homogeneity in socio-economic status. As in the Nordic study, a long period of follow-up in combination with large populations makes finer occupational distinctions possible. Such a classification of occupations may focus on any of the three determinants (occupational exposures, lifestyle factors or socio-economic status), depending on the intentions of the specific study, but these will always be more or less intertwined.
Record linkage studies of the type presented briefly above first of all require high quality registers in terms of correctness and completeness of information. Secondly, the use of a secure and unique personal identifier, such as the personal identification number used in the Nordic countries, is essential both for the quality of registration and for the linkage procedure. A major strength of this type of study is the possibility of obtaining a large cohort with close to 100% completeness of participation, i.e. no selection bias. Information bias is also considered to be a minor problem, since the information on and classification of the dependent and independent variables come from different and independent sources, and any misclassification is therefore expected to be non-differential.
There are, however, also major obstacles in the interpretation of record linkage studies. One problem concerns the validity of information on occupational affiliation. A control survey to check the quality of the 1970 census in Norway showed that the distribution by main occupational groups for those recorded as economically active was fairly similar to that given in the census (Statistisk Sentralbyrå, 1970). Similar analyses with similar results have been done in Sweden and Finland. Further, the interpretation of the record linkage study described above requires a certain stability in the occupationally active population. To what extent the occupational affiliation in 1970 reflects lifetime work experience has not been studied directly in the Norwegian data. In Finland, however, a very stable labour market was indicated in a comparison of the 1970, 1980 and 1985 censuses (Kolari, 1989). In some studies, stating the same occupational group in two subsequent censuses has been required for inclusion in the study group of interest, in an attempt to reduce dilution by a high prevalence of short-term workers (Andersen, 1989). Occupational mobility is expected to be of increasing concern in the future. It should be noted, however, that a change of occupation does not necessarily indicate social mobility with lifestyle changes or significant changes in occupational exposures. Still, data on occupational history over time, in combination with data on educational level, would increase the value of record linkage.
The data forming the basis for cancer registration in the Nordic countries are clinical reports, histopathological reports, and, to a lesser degree, death certificates. This system of multiple sources of information ensures the quality and completeness of the database. Cancer reporting has been compulsory in the Nordic countries since 1953 (Finland and Norway), 1958 (Sweden), and 1987 (Denmark). In Denmark, physicians were paid for each case of cancer reported from 1943 until 1987, and validation studies indicate good coverage of registration from the very beginning. Thus, for the period 1943-66 it was found that less than 1% was missing (Storm, 1988), and in a sample born 1881-1920, 97% of cancer cases were confirmed with either histology or surgery (Holm et al, 1982). A validation of the Norwegian cancer registry in 1976 showed an overall completeness of 98% (Lund, 1981), with the deficit coming mainly from leukaemia and multiple myeloma. In Finland the agreement between the cancer register and the national hospital discharge register for the period 1985-1988 was about 99% (Teppo et al, 1994). Deficiencies were mainly found for benign neoplasms of the central nervous system, chronic lymphatic leukaemia and multiple myeloma. In Sweden, death certificates have not been used as a data source. Linking the cancer register with the cause of death register showed a deficit in completeness of at most 4.5%, mainly for leukaemia, cancers of the prostate and stomach, and myeloma (Mattson, 1984). Overall, the quality of cancer registration in the Nordic countries is very good, and for solid tumours especially, completeness is close to 100%.
Multiple significance testing gives a high probability of finding a significant difference just by chance (Altman, 1991). In this study we present SIR values for men and women, 4 countries, 54 occupational groups, and 35 cancer sites. Although some of these possible combinations are not relevant, we present data for a very large number of combinations. Inevitably, many of these combinations will, by chance, emerge with significantly high or low SIR values. In the interpretation of the findings, it is therefore important to pay attention not only to the size of the SIR values or the confidence intervals, but also to the consistency across countries, to the biological plausibility, and to previous epidemiological studies.
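The scale of the multiple testing problem described above can be made concrete with a short calculation. The sketch below assumes a 5% significance level, independent tests, and no true associations; all three are simplifying assumptions rather than statements from the study, and the count is an upper bound because not all combinations are relevant (some sites occur in one gender only).

```python
# Upper bound on the number of SIRs: gender x country x group x site.
n_genders, n_countries, n_groups, n_sites = 2, 4, 54, 35
n_comparisons = n_genders * n_countries * n_groups * n_sites

# Assumed significance level (not stated in the text).
alpha = 0.05

# Expected number of "significant" SIRs if no association were real.
expected_false_positives = alpha * n_comparisons

# Probability of at least one chance finding under independence.
p_at_least_one = 1 - (1 - alpha) ** n_comparisons
```

Under these assumptions there are about 15 000 comparisons, of which several hundred would be expected to reach nominal significance by chance alone, which is why consistency across countries and biological plausibility carry so much weight in the interpretation.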
As mentioned, no classification of occupation can ensure that the calculated SIRs reflect the effects of occupational exposures alone. Instead of occupational cancer risk, the term "work related cancer" was therefore used in the title of the Nordic study (Andersen et al, 1999), to imply the combined effects of factors related to occupation, education and social class, including important cancer determinants such as eating, drinking and smoking habits, physical activity, and reproductive pattern. The interpretation of studies based on classification by occupational group depends on knowledge both of the distribution of the most important determinants across occupational groups and of the (causal) associations between exposures and cancer risk. For some risk factors, adjustment at the aggregate level is possible, as was done in a recent study of lung cancer among Norwegian men, in which survey data on smoking were used to adjust the SIRs for lung cancer in different occupations. It was estimated that about 20% of all lung cancer among men could be related to occupation after adjusting for the effect of smoking (Haldorsen et al, 2004). Though adjustment for risk factors at the individual level is preferable, aggregated data on such factors are useful where individual information is lacking.
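To illustrate what adjustment at the aggregate level can look like, the sketch below applies an indirect adjustment of the kind proposed by Axelson: the crude SIR is divided by the ratio of lung cancer rates that smoking prevalence alone would produce in the occupational group and in the reference population. This is one common approach and is not necessarily the method used by Haldorsen et al; the function names, smoking prevalences, and relative risk are invented for illustration.

```python
def smoking_bias_factor(p_smokers_group, p_smokers_reference, rr_smoking):
    """Ratio of the rates expected from smoking alone in the occupational
    group versus the reference population (simple smoker/non-smoker model)."""
    group = 1.0 + p_smokers_group * (rr_smoking - 1.0)
    reference = 1.0 + p_smokers_reference * (rr_smoking - 1.0)
    return group / reference


def adjust_sir(crude_sir, bias_factor):
    """Remove the part of the excess attributable to smoking prevalence."""
    return crude_sir / bias_factor


# Invented numbers: 60% smokers in the group, 40% in the reference
# population, and a relative risk of 10 for smokers versus non-smokers.
factor = smoking_bias_factor(0.6, 0.4, 10.0)
adjusted = adjust_sir(160.0, factor)
```

In this invented example a crude SIR of 160 falls to about 115 after adjustment, showing how much of an apparent occupational excess can be explained by survey-based smoking prevalence alone.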

FUTURE POSSIBILITIES
As performed to date, the interpretation of record linkage studies is often restricted by rather unspecific exposure indicators based on registration at a single point in time. In the future, it may be possible to use this design in studies with more specific analytical or etiological ambitions. We believe that the development of a national job exposure matrix (JEM), as has been done in Finland, would greatly add to the value of this type of study. A JEM may be regarded as a tool by which information on jobs collected in epidemiologic studies may be automatically converted into information on potential exposures (Kauppinen et al, 1998). With the use of biobanks, e.g. the Janus serum bank, it would be possible to search for biomarkers of, for instance, smoking, alcohol consumption, or different occupational exposures. Another important change during the last 2-3 decades has been the increased labour market participation of women. Studies on occupation and cancer in women have so far included few person-years at risk. In future studies of women, data on fertility will often be of great importance, and information such as age at first birth or number of births should be added to the file.
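The idea of a job exposure matrix mentioned above can be sketched as a lookup table from occupation to potential exposures. The structure below, with a proportion exposed and a mean exposure level per occupation and agent, is an assumption made for illustration; the occupations, agents, values, and function names are invented and are not taken from the Finnish matrix.

```python
# Hypothetical JEM: occupation -> {agent: (proportion exposed, mean level)}.
# Levels are in arbitrary units; all entries are invented.
JEM = {
    "plumber": {"asbestos": (0.5, 0.3)},
    "miner": {"silica dust": (0.7, 0.2), "radon": (0.4, 1.0)},
    "waiter": {"environmental tobacco smoke": (0.9, 1.0)},
}


def exposures_for(occupation):
    """Potential exposures for an occupation; empty if not in the matrix."""
    return JEM.get(occupation, {})


def cumulative_exposure(occupation, agent, years):
    """Group-level estimate: proportion exposed x mean level x years."""
    proportion, level = exposures_for(occupation).get(agent, (0.0, 0.0))
    return proportion * level * years


plumber_asbestos = cumulative_exposure("plumber", "asbestos", 10)
```

Applied to census records, such a table would convert each coded occupation into agent-specific exposure estimates automatically, although with registration at one point in time the number of years in the job would itself have to be assumed.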

CONCLUSION
The results of record linkage studies often need to be interpreted in light of established etiologic associations. Knowledge of changes in the level and distribution of risk factors in the population will aid in this process. Record linkage studies have a two-fold purpose: to be used as reference material in the general description of cancer incidence and distribution in a population, and to serve as a basis for generating hypotheses to be tested in specifically designed studies. In all the Nordic countries, linking data from the censuses and cancer registries has proved useful in both respects.

Table 1.
Observed number (Obs) and standardized incidence ratio (SIR) of lung cancer in Nordic* men by occupational group. CI = confidence interval.
*Denmark, Finland, Norway and Sweden

Table 2.
Observed number (Obs) and standardized incidence ratio (SIR) of breast cancer in Nordic* women by occupational group. CI = confidence interval.