Cohort and case-control studies

Training course in research methodology, research protocol development and scientific writing 2023

O. Meirik
Unit for Epidemiological Research
Special Programme of Research, Development and Research Training in Human Reproduction,
World Health Organization, 1211 Geneva 27, Switzerland

Cohort and case-control methodologies are the main tools for analytical epidemiological research. Other important types of epidemiological study, used mainly to generate hypotheses, include cross-sectional studies and ecological (or correlation) studies. The conclusions that can be drawn from the findings of these types of study are, however, much weaker than those from cohort and case-control studies. This is not to say that findings from cohort and case-control studies always reflect true associations that can be universally generalized. Epidemiological research is, to a large extent, observational rather than experimental. Experimental research provides data from which firmer conclusions can be drawn than from observational epidemiological studies. In experimental research, investigators can manipulate one factor while controlling others, and the main research question can be broken down into subquestions with comparatively simple causal assumptions (12). Through repeated manipulation of one or more factors in a series of experiments addressing these subquestions, the main research question can be resolved. The experimental approach also allows control of extraneous factors, that is, factors that may affect the outcome under study but are not themselves under investigation. If not controlled, such extraneous factors may distort the results of the research and lead to false conclusions about cause and effect. In biomedical research on human beings, the randomized clinical trial is the design closest to experimental research methodology. Observational epidemiological research has the disadvantage that extraneous factors cannot be manipulated by the investigators. Although information on such extraneous factors is collected and quantitatively adjusted for when they are known to be present, findings from observational epidemiological studies are generally less conclusive than those from experimental studies because extraneous factors are less strictly controlled.

The two epidemiological methodologies to study disease causation outlined in this chapter have different approaches. The cohort study starts with the putative cause of disease, and observes the occurrence of disease relative to the hypothesized causal agent, while the case-control study proceeds from documented disease and investigates possible causes of the disease. The methodological principles of cohort and case-control studies are briefly outlined. For a more detailed account of design, conduct and analyses of epidemiological studies, the reader is referred to textbooks and methodological articles given in the list of references.

Cohort study

The starting point of a cohort study is the recording of healthy subjects with and without exposure to the putative agent or the characteristic being studied. Individuals exposed to the agent under study (index subjects) are followed over time and their health status is observed and recorded during the course of the study. In order to compare the occurrence of disease in exposed subjects with its occurrence in non-exposed subjects, the health status of a group of individuals not exposed to the agent under study (control subjects) is followed in the same way as that of the group of index subjects.

Measures of disease and association

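The measure of disease frequency in cohort studies is the incidence rate, which expresses how often new cases of the disease under study occur among subjects during a specified period of observation. The numerator of the rate is the number of subjects who develop the disease, and the denominator is usually the number of person-years of observation, that is, the sum of the time each subject was followed while at risk. (When the denominator is instead the number of subjects followed for a fixed period, the measure is a proportion, often called the cumulative incidence or risk.) The incidence rates for exposed and non-exposed subjects are calculated separately.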
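A minimal Python sketch of the calculation, assuming hypothetical follow-up records in which each subject contributes the years observed until disease onset, loss to follow-up or the end of the study. The `Subject` class, the numbers and the choice of 100 person-years as the reporting unit are illustrative assumptions, not taken from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical follow-up record for one cohort member."""
    years_at_risk: float      # time followed until disease onset, loss to follow-up or study end
    developed_disease: bool   # True if the disease under study was diagnosed during follow-up


def incidence_rate(subjects: list[Subject], per: float = 100.0) -> float:
    """New cases divided by person-years at risk, expressed per `per` person-years."""
    cases = sum(1 for s in subjects if s.developed_disease)
    person_years = sum(s.years_at_risk for s in subjects)
    if person_years <= 0:
        raise ValueError("no person-time observed")
    return cases * per / person_years


# Hypothetical cohorts: each list is one exposure group, and rates are computed separately
exposed = [Subject(4.0, True)] + [Subject(12.0, False) for _ in range(3)]      # 1 case / 40 person-years
non_exposed = [Subject(8.0, True)] + [Subject(12.0, False) for _ in range(6)]  # 1 case / 80 person-years

print(incidence_rate(exposed))      # 2.5 per 100 person-years
print(incidence_rate(non_exposed))  # 1.25 per 100 person-years
```

Note that the subject who develops the disease stops contributing person-time at diagnosis, which is why the first record in each group has a shorter follow-up.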

The measure of association between exposure and disease in cohort studies is the relative risk: the ratio of the incidence rate in index subjects to that in control subjects. A relative risk of 1.0 signifies that the incidence rate is the same among exposed and non-exposed subjects, indicating a lack of association between exposure and disease. A relative risk below 1.0 suggests a protective effect of exposure (the incidence rate of disease is lower among the exposed than among the non-exposed), whereas a relative risk above 1.0 suggests that exposed people are at higher risk of disease than non-exposed people.
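The relative risk itself is a simple ratio of two incidence rates. The Python sketch below uses hypothetical rates and illustrative function names, and encodes the three readings of the relative risk given above. It deliberately ignores sampling variability, which in practice would be assessed with a confidence interval.

```python
def relative_risk(rate_index: float, rate_control: float) -> float:
    """Incidence rate in index (exposed) subjects divided by that in control (non-exposed) subjects."""
    if rate_control <= 0:
        raise ValueError("the control incidence rate must be positive")
    return rate_index / rate_control


def interpret(rr: float) -> str:
    """Point-estimate reading only; random variation is not considered here."""
    if rr > 1.0:
        return "exposed subjects at higher risk"
    if rr < 1.0:
        return "exposed subjects at lower risk (possible protective effect)"
    return "no association"


# Hypothetical incidence rates per 100 person-years
rr = relative_risk(2.5, 1.25)
print(rr, interpret(rr))  # 2.0 exposed subjects at higher risk
```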

Current and historical cohort studies

Depending on when the cohort study is initiated relative to the occurrence of the disease(s) to be studied, one distinguishes between current and historical cohort studies. In a current cohort study, the data on exposure are assembled before the disease occurs; the current cohort design is thus a true prospective study. In a historical cohort study, data on exposure and occurrence of disease are collected after the events have taken place: the cohorts of exposed and non-exposed subjects are assembled from existing records or health care registries. In recent years, some authors have referred to historical cohort studies as retrospective cohort studies, because the data are collected retrospectively. The methodological principle of historical cohort studies is, however, the same as that of prospective studies, and the term retrospective cohort study is a misnomer.

An example of a current cohort study is the Oxford Family Planning Association Study in the United Kingdom, which aimed to provide a balanced view of the beneficial and harmful effects of different methods of contraception (14). In collaboration with the Family Planning Association, the investigators recruited 17,032 women between 1969 and 1974 at 17 of the largest and best clinics run by the Association. At admission, 56.6% of these women were using oral contraceptives (OCs), 24.8% a diaphragm and 18.6% an intrauterine device (IUD). The women met specific eligibility criteria for enrolment into the study. They were scheduled for annual visits. Those who failed to keep an appointment were sent a postal version of the follow-up form; if this was not answered, the women were contacted by phone or visited at home. All episodes of ill health, including hospitalizations, were recorded, as were changes in contraceptive use. In the event of hospitalization, the hospital discharge report was requested. Information on deaths and cancer diagnoses was obtained from national death and cancer registries. A coordinating centre was set up to check and computerize the data. This study, which was originally expected to run for about ten years, is still ongoing and has provided a large amount of information on the efficacy and safety of contraceptive methods, in particular OCs, the diaphragm and IUDs. The methodology of the study is described comprehensively in reference 14.

A study of the outcome of deliveries following induced abortion provides an example of a historical cohort study (8). The study aimed to examine whether an induced abortion increases the risk of pre-term birth or low birthweight in subsequent pregnancies. For the period 1970 to 1975, the investigators assembled information on the date and type of abortion, and the personal identification number, of women who had an induced abortion in one hospital in Sweden. Sources of information were a computerized hospital discharge registry and ledgers kept in the surgical unit of the hospital's Department of Obstetrics and Gynecology. Information was obtained on 95% of the 5,292 induced abortions performed during the study period. The computerized data on women who had had an abortion were linked, by means of the personal identification number, to the national Medical Birth Registry, which contains information on the outcome of all births in Sweden, including gestational duration and infant birthweight. Through this procedure, the investigators could identify women who gave birth after an induced abortion and obtain information on the outcome of these births from the Medical Birth Registry. A control group was selected from the Medical Birth Registry, and the abortion history of women in the control group was checked in their antenatal care records. In this cohort study, data collection was carried out from 1978 through 1981, whereas the abortions (exposure) had taken place from 1970 to 1975 and the deliveries (outcomes) from 1970 to 1978.
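The linkage step can be sketched in Python as a join on the personal identification number that keeps only births occurring after the woman's abortion. The record layouts, field names and identifiers below are hypothetical and do not reflect the actual structure of the Swedish registries. The preterm (under 37 completed weeks) and low-birthweight (under 2,500 g) cut-offs are the conventional WHO definitions, used here only for illustration.

```python
from datetime import date

# Hypothetical extracts; field names are illustrative, not those of the real registries.
abortions = [
    {"pid": "A1", "abortion_date": date(1971, 3, 1)},
    {"pid": "B2", "abortion_date": date(1974, 6, 15)},
    {"pid": "C3", "abortion_date": date(1975, 9, 30)},  # no later birth in this extract
]
births = [
    {"pid": "A1", "birth_date": date(1970, 2, 10), "gest_weeks": 40, "birthweight_g": 3500},  # before abortion
    {"pid": "A1", "birth_date": date(1973, 5, 2), "gest_weeks": 36, "birthweight_g": 2400},
    {"pid": "B2", "birth_date": date(1976, 1, 20), "gest_weeks": 39, "birthweight_g": 3300},
    {"pid": "D4", "birth_date": date(1975, 4, 4), "gest_weeks": 40, "birthweight_g": 3600},   # no abortion on file
]


def link_births_after_abortion(abortions, births):
    """Join births to abortion records by personal ID, keeping births after the earliest abortion."""
    earliest = {}
    for rec in abortions:
        pid, d = rec["pid"], rec["abortion_date"]
        if pid not in earliest or d < earliest[pid]:
            earliest[pid] = d
    return [
        {**b, "abortion_date": earliest[b["pid"]]}
        for b in births
        if b["pid"] in earliest and b["birth_date"] > earliest[b["pid"]]
    ]


linked = link_births_after_abortion(abortions, births)
for rec in linked:
    preterm = rec["gest_weeks"] < 37        # conventional cut-off: < 37 completed weeks
    low_bw = rec["birthweight_g"] < 2500    # conventional cut-off: < 2,500 g
    print(rec["pid"], rec["birth_date"],
          "preterm" if preterm else "term",
          "LBW" if low_bw else "normal weight")
```

Using the earliest abortion per woman is a simplifying assumption of this sketch; the control group, as described above, would be drawn separately from the birth registry.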

Case-control study

The starting point of a case-control study is a group of subjects with the disease or condition under study (cases). The cases’ history of exposure or other characteristics, or both, before the onset of disease is recorded through interviews and sometimes by means of records and other sources. A comparison group of individuals without the disease under study (controls) is assembled, and their past history is recorded in the same way as for the cases. The purpose of the control group is to provide an estimate of the frequency and amount of exposure among people in the population who do not have the disease being studied. Whereas the cohort study is concerned with the frequency of disease in exposed and non-exposed individuals, the case-control study is concerned with the frequency and amount of exposure in subjects with a specific disease (cases) and in people without the disease (controls).

Measure of association

In case-control studies, data are not available to calculate the incidence rate of the disease being studied, and the actual relative risk cannot be determined. The measure of association between exposure and occurrence of disease in case-control studies is the so-called odds ratio: the ratio of odds of exposure in diseased subjects to the odds of exposure in the non-diseased. The following table exemplifies the basic method of calculating the odds ratio in a case-control study.

Exposure	Disease: Yes (cases)	Disease: No (controls)
Yes	a	b
No	c	d
Odds of exposure	a/c	b/d

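The odds ratio (OR), the ratio of the odds of exposure in cases to that in controls, is thus (a/c)/(b/d), which simplifies to ad/bc. When the disease under study is rare, as it is for most diseases investigated with case-control studies, the odds ratio is a good estimate of the relative risk. For this reason the terms odds ratio and relative risk are often used interchangeably in reports of case-control studies, although strictly the two are different measures.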
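The calculation and the role of disease rarity can be illustrated with a short Python sketch. The counts are hypothetical and describe entire cohorts (a = exposed with disease, b = exposed without, c = non-exposed with disease, d = non-exposed without), so that the true risk ratio can be compared directly with the odds ratio ad/bc. The odds ratio is close to the risk ratio when the disease is rare and noticeably larger when it is common.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio from a 2x2 table laid out as in the text: (a/c) / (b/d) = ad / bc."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def risk_ratio(a: float, b: float, c: float, d: float) -> float:
    """Risk ratio, computable only when the table covers whole cohorts: [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical cohorts of 10,000 exposed and 10,000 non-exposed subjects (rare disease)
rare = (20, 9_980, 10, 9_990)
# Hypothetical cohorts of 1,000 exposed and 1,000 non-exposed subjects (common disease)
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
# rare: RR = 2.00, OR = 2.00
# common: RR = 2.00, OR = 2.67
```

In an actual case-control study only the odds ratio can be computed, because the numbers of controls (b and d) reflect how non-diseased subjects were sampled, not the size of the population at risk.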

Population-based and hospital-based case-control studies

Depending mainly on the infrastructure of the health care services and of health and other information systems in the setting where the research is undertaken, investigators can choose to undertake a population-based or a hospital-based case-control study. A population-based study requires full coverage of the cases occurring in the population being studied. Either all cases ascertained during a given time period, or a sample of them, are included in the study. Controls can be selected from population registers, electoral rolls or similar rosters that include all subjects in the population. The prerequisites for population-based studies are often in place in developed countries for most diseases requiring hospitalization. In developing countries, full ascertainment of cases of a specific disease is difficult to achieve. This is due not only to limited access to health care and lack of knowledge of how to reach health care facilities, but also to financial barriers to health care for certain segments of the population. In such circumstances, when investigators can ascertain only an unknown proportion of cases and case ascertainment is governed by behavioural, social and economic factors, the hospital-based case-control study is the best choice (15). Cases in a hospital-based study are identified in the hospitals participating in the study, and controls are selected from the same hospital to which the case was first admitted. When cases and controls are selected from among the subjects of a cohort study, the term "nested case-control study" is used.

An example of a population-based case-control study is a joint Swedish-Norwegian study of the association between OC use and breast cancer in young women (9). In Sweden, newly diagnosed cases of breast cancer in women under 45 years of age were identified during 13 months in 1984 and 1985 from the National Cancer Registry; in Norway, cases were traced in collaboration with all 71 surgical departments in the country. Age-matched controls were selected through the two countries’ National Central Bureaus of Statistics, which maintain continuously updated population registers. The study included 422 cases of breast cancer and 722 controls. Practically all interviews took place in the women’s homes. As an aid to recalling contraceptive history, a calendar was used on which life events such as menarche, cohabitation, marriage, divorce, childbirth and abortions were recorded; the contraceptive history was then recalled relative to these events. To help the women recall the names of the OC brands they had taken, the interviewers carried a binder with photographs of the packages of the different OCs that had been used in the two countries.

The World Health Organization’s Collaborative Study of Neoplasia and Steroid Hormone Contraceptives is an example of a hospital-based case-control study (13). The study was initiated to explore possible associations between the use of steroid hormonal contraceptives and cancers of the breast, cervix, endometrium, liver, gallbladder and ovary. There were 12 participating centres in Australia, Chile, China, Colombia, Germany, Israel, Kenya, Mexico, the Philippines and Thailand. Data collection took place from 1979 to 1986. In each hospital, cases were detected by monitoring all new admissions to wards where women with cancer were treated, and by checking outpatient gynecological and tumor clinics and the records of hospital pathology laboratories. Cases included all women diagnosed locally as having a malignant tumor at one of the six sites mentioned who were born after 1924 or after 1929 (depending on when hormonal contraceptives first became locally available), and who had resided during the preceding year in a defined geographical area served by the hospital.

Controls were selected from women admitted to wards other than obstetric and gynecological wards who met the same age and residence criteria for eligibility as the cases, and who had not been admitted for treatment of conditions considered a priori as possibly altering contraceptive practices, such as circulatory and cardiovascular diseases, diabetes, chronic renal disease, benign breast disease, a previously diagnosed malignancy, chronic liver disease, or any obstetrical or gynecological condition.

About two controls were selected per case. A list of wards from which controls were to be selected was developed for each hospital. Each week, wards were visited in the order listed. At the time of a visit, all women eligible as controls who were admitted to the ward within the past 24 hours were selected as controls. The next ward on the list was then visited, and this procedure was repeated until sufficient controls were selected to give a cumulative ratio of two controls per case from the hospital. A standardized questionnaire was administered to all study subjects by specially trained female interviewers, to obtain information on the known and suspected risk factors for the neoplasms under study, and a complete obstetric and contraceptive history. Nearly all interviews were conducted in hospitals.
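The ward-rotation procedure for selecting controls can be expressed as a small algorithm. The Python sketch below simplifies it under stated assumptions: `visits` is a hypothetical mapping from each ward to an iterator of the eligible women found on successive visits, the weekly schedule is ignored, and a `max_visits` guard (not part of the described protocol) prevents an endless loop when no eligible women are found.

```python
from itertools import cycle


def select_controls(ward_order, visits, n_cases, n_controls_so_far=0, ratio=2, max_visits=1000):
    """Visit wards in the listed order, taking every eligible woman admitted in the past
    24 hours, until the cumulative number of controls reaches `ratio` per case.

    visits[ward] is a hypothetical iterator yielding, for each successive visit to that
    ward, the IDs of eligible women admitted within the previous 24 hours.
    """
    target = ratio * n_cases
    selected = []
    for visit_no, ward in enumerate(cycle(ward_order)):
        if n_controls_so_far + len(selected) >= target or visit_no >= max_visits:
            break
        selected.extend(next(visits[ward], []))  # all eligible women at this visit are taken
    return selected


# Hypothetical hospital with three wards on the control list and two cases so far
wards = ["medicine", "surgery", "orthopaedics"]
visits = {
    "medicine": iter([["m1", "m2"], ["m3"]]),
    "surgery": iter([["s1"]]),
    "orthopaedics": iter([[], ["o1", "o2"]]),
}
print(select_controls(wards, visits, n_cases=2))  # ['m1', 'm2', 's1', 'm3']
```

Because every eligible woman present at a visit is taken, a single visit can overshoot the target; passing the running total through `n_controls_so_far` keeps the cumulative ratio close to two controls per case, in the spirit of the procedure described.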

A calendar and samples of locally available oral contraceptives were used to facilitate recall of times of use and products taken. In addition, the medical records of women who reported oral contraceptive use were reviewed when available, and in such instances the interviewers used information from both the interviews and these records to record details of the women’s use.

Confounding and bias

As mentioned above, cohort and case-control studies are observational studies and are potentially subject to the effect of extraneous factors that may distort their findings. In this context, a confounding factor (confounder) is an extraneous variable that satisfies both of two conditions: it is a risk factor for the disease being studied, and it is associated with the exposure being studied but is not a consequence of exposure (12). For example, a large number of studies have demonstrated that smoking during pregnancy reduces fetal growth, so that infants of mothers who smoke weigh less at birth than infants of mothers who do not smoke during pregnancy. It has also been shown that women with a history of induced abortion are more often cigarette smokers than women who have not had an abortion. In a study of the effect of induced abortion on the outcome of subsequent childbirth, there is thus a real risk of attributing a finding of lower birthweight among infants of women with a previous abortion to the abortion itself, although the lower birthweight may just as well have been caused by smoking during pregnancy. Controlling for confounding factors is therefore important in observational epidemiological studies. It can be done in the study design, by matching or by stratified sampling of study subjects, or in the data analysis, by stratified or multivariate analyses (4,10,12).
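Stratified analysis can be illustrated with the abortion, smoking and birthweight example. The Python sketch below uses invented counts constructed so that, within each smoking stratum, previous abortion is unrelated to low birthweight, while smokers have both more abortions and more low-birthweight infants. It applies the Mantel-Haenszel summary odds ratio, one standard method of stratified analysis; the text does not say which method the cited studies used.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc for one 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts (a, b, c, d):
# a = previous abortion, low-birthweight infant    b = previous abortion, normal birthweight
# c = no abortion, low-birthweight infant           d = no abortion, normal birthweight
smokers = (20, 80, 10, 40)
non_smokers = (5, 95, 20, 380)
crude = tuple(s + n for s, n in zip(smokers, non_smokers))  # strata collapsed: (25, 175, 30, 420)

print(f"crude OR       = {odds_ratio(*crude):.2f}")        # 2.00
print(f"smokers OR     = {odds_ratio(*smokers):.2f}")      # 1.00
print(f"non-smokers OR = {odds_ratio(*non_smokers):.2f}")  # 1.00
print(f"smoking-adjusted (Mantel-Haenszel) OR = {mantel_haenszel_or([smokers, non_smokers]):.2f}")  # 1.00
```

In these invented data the crude odds ratio of 2.00 arises entirely from the uneven distribution of smoking; once the comparison is made within strata of smoking, the apparent association disappears.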

Another potential complication, not only of observational research but of practically all types of research, is bias. Bias has been defined as any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease (12). Sackett has provided an extensive discussion of various types of bias (11). One type of bias frequently referred to in epidemiological research is "recall bias", namely the propensity of diseased subjects (cases), when interviewed, to search their memory more thoroughly and report past exposures and possible causes of their disease more accurately than non-diseased subjects (controls) would. Such recall bias has been documented (3,5,7). In one study of the relationship between a history of induced abortion and the risk of breast cancer, data from objectively validated sources gave a relative risk of 0.6, whereas the relative risk was 0.9, or 50% higher, when based on data from interviews with cases and controls (7).
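How differential recall can distort an odds ratio can be shown with a short Python sketch. It assumes, purely for illustration, that cases report all of their true exposures while controls report only 70% of theirs, and that no unexposed subject falsely reports exposure. The counts are invented and are not those of the study cited.

```python
def reported_counts(exposed: float, unexposed: float, sensitivity: float, specificity: float = 1.0):
    """Expected (reported exposed, reported unexposed) given true counts and reporting accuracy.

    sensitivity: proportion of truly exposed subjects who report exposure
    specificity: proportion of truly unexposed subjects who report no exposure
    """
    reported_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    reported_unexposed = (exposed + unexposed) - reported_exposed
    return reported_exposed, reported_unexposed


def odds_ratio(a, b, c, d):
    """(a/c) / (b/d) = ad / bc, with a, c from cases and b, d from controls."""
    return (a * d) / (b * c)


# Hypothetical true exposure: 30 of 100 cases and 20 of 100 controls were exposed
true_or = odds_ratio(30, 20, 70, 80)

# Differential recall: cases remember every exposure, controls recall only 70%
a, c = reported_counts(30, 70, sensitivity=1.0)
b, d = reported_counts(20, 80, sensitivity=0.7)
observed_or = odds_ratio(a, b, c, d)

print(f"true OR = {true_or:.2f}, interview-based OR = {observed_or:.2f}")
# true OR = 1.71, interview-based OR = 2.63
```

Under these assumptions the interview-based estimate is biased upwards, the same direction of distortion as in the abortion and breast cancer comparison described above.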

Advantages and disadvantages of cohort and case-control studies

When faced with a research question concerning the association between a possible etiologic factor and a disease, the epidemiologist has to choose an appropriate strategy to resolve it. A number of circumstances have to be considered, such as the incidence rate of the disease, the time elapsing between exposure and clinical manifestation of the disease, whether the exposure is associated with one disease or with several, the urgency of the research question, ethical issues and the funding available for the research, to name a few. In taking such factors into account, the investigator may find that research strategies other than cohort or case-control methodology are appropriate. The following lists summarize some of the major advantages and drawbacks of cohort and case-control studies and can serve as a quick guide to the choice of research strategy.

Cohort studies

Advantages.

Allow complete information to be collected on each subject’s exposure, with quality control of the data, and on the subject’s experience thereafter.
Provide a clear temporal sequence of exposure and disease.
Give an opportunity to study multiple outcomes related to a specific exposure.
Permit calculation of incidence rates (absolute risk) as well as relative risk.
Methodology and results are easily understood by non-epidemiologists.
Enable the study of relatively rare exposures.

Disadvantages.

Not suited for the study of rare diseases because a large number of subjects is required.
Not suited when the time between exposure and disease manifestation is very long, although this can be overcome in historical cohort studies.
Exposure patterns, for example the composition of oral contraceptives, may change during the course of the study and make the results irrelevant.
Maintaining high rates of follow-up can be difficult.
Expensive to carry out because a large number of subjects is usually required.
Baseline data may be sparse because the large number of subjects does not allow for long interviews.

Case-control studies

Advantages.

Permit the study of rare diseases.
Permit the study of diseases with long latency between exposure and manifestation.
Can be launched and conducted over relatively short time periods.
Relatively inexpensive as compared to cohort studies.
Can study multiple potential causes of disease.

Disadvantages.

Information on exposure and past history is primarily based on interview and may be subject to recall bias.
Validation of information on exposure is difficult, or incomplete, or even impossible.
By definition, concerned with one disease only.
Cannot usually provide information on incidence rates of disease.
Generally incomplete control of extraneous variables.
Choice of appropriate control group may be difficult.
Methodology may be hard to comprehend for non-epidemiologists and correct interpretation of results may be difficult.

Assessment of causality

One of the more difficult tasks in epidemiological research is to assess whether associations between exposure and disease found in observational epidemiological studies are causal. As emphasized above, observational epidemiological studies are subject to the influence of factors over which the investigators usually do not have full control, and their findings are less reliable than those of studies with an experimental design. It is therefore imperative that findings from analytical epidemiological studies are critically scrutinized before any judgement of causality is made. Furthermore, the findings of a single epidemiological study only exceptionally provide conclusive evidence of a causal relationship between exposure and disease. The criteria to apply when assessing causality have been discussed by several authors (1,2,6).

Bradford Hill listed nine aspects of the association between exposure and disease that need to be considered (2). The first is the strength of the association: a strongly elevated relative risk is more likely to reflect a causal association than a slightly or moderately increased risk. Another aspect is the consistency of findings across studies conducted with different methodologies and in different settings. A third is specificity, meaning that the exposure causes a particular disease, e.g. the observation that cigarette smoking is associated with squamous cell carcinoma of the respiratory tract. An important condition is the temporal sequence of events: the potentially causative factor must precede the effect, which in this context is the disease. The dose-response relationship, or biological gradient, is another aspect; for example, massive exposure to sunlight is more likely to cause melanoma in susceptible individuals than little or moderate exposure. Biological plausibility is important but depends on the biological knowledge of the day. The association should also be coherent with what is generally known about the occurrence of the disease, its natural history and pathophysiology, and should not conflict with this knowledge. A causal interpretation of an association is strengthened if there is experimental evidence in support of it, for example if elimination of the exposure reduces the incidence of the disease. The ninth aspect is analogy: for example, if viruses have been shown to be oncogenic in animal studies, we are more ready to accept that human papillomavirus may cause cervical cancer in humans. In his essay on association and causation, Bradford Hill notes that "none of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non". The challenge of assessing causation is one of many fascinating aspects of epidemiological research.

References

1. Evans, A.S. (1978): Am. J. Epidemiol., 108:249-258.
2. Hill, A.B. (1965): Proc. Royal Soc. Med., 58:295-300.
3. Hogue, C.L. (1975): Am. J. Obstet. Gynecol., 123:675-681.
4. Kleinbaum, D.G., Kupper, L.L., and Morgenstern, H. (1982): Epidemiologic Research. Principles and Quantitative Methods. Wadsworth, Belmont.
5. Klemetti, A., and Saxen, L. (1967): Am. J. Publ. Health, 57:2071-2075.
6. Lilienfeld, A.M., and Lilienfeld, D.E. (1980): Foundations of Epidemiology. Oxford University Press, London.
7. Lindefors-Harris, B-M., Eklund, G., Adami, H-O., and Meirik, O. (1991): Am. J. Epidemiol., 134:1003-1007.
8. Meirik, O., and Bergstrom, R. (1983): Acta Obstet. Gynecol. Scand., 62:499-509.
9. Meirik, O., Lund, E., Adami, H-O., Bergstrom, R., Christoffersen, T., and Bergsjo, P. (1986): Lancet, ii:650-654.
10. Rothman, K.J. (1986): Modern Epidemiology. Little, Brown and Company, Boston.
11. Sackett, D.L. (1979): J. Chron. Dis., 32:51-63.
12. Schlesselman, J.J. (1982): Case-Control Studies. Design, Conduct, Analysis. Oxford University Press, New York.
13. The WHO Collaborative Study of Neoplasia and Steroid Contraceptives. (1990): Br. J. Cancer, 61:110-119.
14. Vessey, M., Doll, R., Peto, R., Johnson, B., and Wiggins, P. (1976): J. Biosoc. Sci., 8:373-427.
15. Wacholder, S., Silverman, D.T., McLaughlin, J.K., and Mandel, J.S. (1992): Am. J. Epidemiol., 135:1029-1041.
