# Introduction

Traditionally, case-control studies have been viewed as an alternative to cohort studies in which individuals were selected on the basis of whether or not they had the disease outcome of interest, with investigators then comparing exposure history between those with the disease (the cases) and those free of the disease (the controls). More recently, the theory behind the case-control study has been re-imagined as a method of selecting a subset of an underlying cohort giving rise to the cases in the study. The case-control study design is an excellent choice of study when the disease is rare, has a long induction period, the exposure data are difficult to obtain, or very little is known about the disease. In these situations a cohort study is generally prohibitively expensive or the time frame of the study required for data collection is impractical. Unlike cohort studies, case-control studies identify cases of the disease of interest in their population and then compare their exposure experience to sampled controls. As you will learn in the following exercise, this method is deceptively simple, and if not planned carefully, can lead to spurious findings.

## Faculty Highlight: Dr. Alfred Neugut

"The case-control study's simplicity sometimes makes us forget just how elegant and revolutionary a creation it was of 20th century chronic disease epidemiology."

Dr. Neugut is a Professor of Medicine and the Myron M. Studner Professor of Cancer Research. His research interests span the epidemiology and screening of colorectal neoplasia, breast cancer etiology and treatment, racial disparities in cancer, and cancer in the elderly. He serves as co-PI of the Long Island Breast Cancer Study, a large population-based case-control study.

# Learning Objectives

A. Learn to apply the main features of case-control design:

1. Formulate research hypotheses
2. Define your source population and the eligibility criteria for cases and controls
3. Define your exposure and outcome of interest
4. Describe methods of accrual for cases and controls

B. Apply the steps of case-control data analysis:

1. Calculate disease odds ratio
2. Calculate exposure odds ratio
3. Explain why exposure odds ratio is identical to the disease odds ratio

C. Explain your findings and discuss problems in data analysis:

1. Discuss the principles of proper selection of cases and controls
2. Discuss the relationship between odds ratio and risk ratio

# Student Role

Susser Syndrome, a rare and debilitating neurological disease, is striking the people of Epiville!

 BREAKING NEWS

"You are watching WEPI Channel 1 news. From our Health and Medicine Desk - doctors at the Epiville General Hospital are reporting a dramatic rise in the number of patients suffering from Susser Syndrome, a rare and debilitating neurological disease with symptoms ranging from dizziness, double-vision, fainting spells, and difficulty in concentration to more severe symptoms such as loss of smell, facial tics, and loss of conscious control of bodily movements. Susser Syndrome came to prominence during an apparent outbreak in London immediately following World War II, although the cause of the disease was never discovered. Until recently, only sporadic cases have been found in the United States. Hospital officials report a steady rise in new cases over the past two years with a rather dramatic upswing since March of this year.

"The Epiville Health Department has deployed a team of public health officials consisting of medical epidemiologists, biostatisticians, and research assistants to determine the magnitude of the outbreak of Susser Syndrome, as well as to identify possible causes and preventive measures.

"Doctors warn that all residents of Epiville may be at risk for developing the disease. However, Channel 1 has exclusively discovered that many of the infected individuals are members of the Superfit Fitness Center, located in the Epiville industrial park. Superfit is owned and operated by Glop Industries. When asked about this information, representatives from Glop declined comment."

Based on your own research and the newscast, you decide to gather more information about both the Superfit Fitness Center and Glop Industries.

You decide to visit the Superfit Fitness Center. Upon entering, you are immediately greeted by Mr. Abe Crunch who is very responsive to your questions.

 Interview - Abe Crunch

"I've been the manager here at Superfit since we opened. As you probably know, Superfit is actually owned by SUPERCLEAN Industries. As you can see, we have nothing but state-of-the-art facilities, state-of-the-art equipment, and all of our trainers are certified. Frankly, this place is awesome. We even have a gift shop! SUPERCLEAN Industries has a licensing agreement with Glop Industries and as a result we only carry products manufactured by Glop. In fact, we carry some things that you can't even get outside of here. For example, all of our members exclusively drink the sports drink Quench-It and eat the energy bar EnduroBrick - those two products cannot be found in retail stores. And rather than have them walking around with money while they're working the free weights, we issue each member a fitness credit card they use to keep track of their food and drink and other purchases. As for a number of members being sick, well, the only thing I can say is, we all get sick from time to time. I mean, who hasn't been sick?"

Your next goal is to meet with Hank Lockjaw, the production floor foreman at Glop Industries.

 Interview - Hank Lockjaw

"My grand-daddy was a Glop man, my daddy was a Glop man, and I've been a Glop man my whole life. I was promoted to foreman a few years back, just before we started making SUPERCLEAN. Right behind you is the production line of Quench-It. Those plastic bottles shoot down the conveyer belt and then are filled up with the drink. The hard part is keeping the bottles nice and clean. Just last week we installed a brand new sterilization system. A couple of years ago we used to rinse the empty bottles with SUPERCLEAN to sterilize them before adding Quench-It. Before that we just used hot water. Now, we only use this gamma radiation trick to sterilize the bottles. It's supposed to be more efficient and cheaper in the long run than using the SUPERCLEAN. Like I said, we only switched last week so time will tell. Let's see - over in the other corner is the EnduroBrick line. The bars ride on the conveyer belt through that big oven which serves to not only bake them but to sterilize them as well. My job is to make sure everything runs nice and smooth and I'll tell you, there's been not a single breakdown on my watch."

# Study Design

Now that you have thoroughly assessed the situation, you have enough information to generate some hypotheses. The two suspected causal agents of the outbreak of Susser Syndrome are Quench-It and EnduroBrick. Use the case-control method to design a study that will allow you to compare the exposures to these products among your cases of Susser Syndrome and healthy controls of your choice. From all of your class work, you know that you want your hypotheses to be as explicit and detailed as possible.

1. Based on the information you gathered, which of the following hypotheses is the most appropriate for your case-control study?

Answer (a) — correct: These hypotheses clearly state the expected causal factors (EnduroBrick and Quench-It) and the expected direction of effect. They are sufficiently explicit to allow the hypotheses to be tested after collecting data. In addition, we know that, although a case-control methodology selects individuals based on disease status and compares the exposure distribution between cases and controls, a properly conducted case-control design can compare the disease experience of the exposed to the disease experience of the unexposed.
Answer (b) — incorrect: This hypothesis is not specific enough. Affiliation with the Superfit Fitness Center is associated with many different factors. A study hypothesis should clearly define exposures and outcomes.
Answer (c) — incorrect: This hypothesis does not state a potentially causal link between exposures and outcome, and is not specific enough to allow for collected data to support or refute it.

Now that you have hypotheses, the next step is to prepare the case definition. This requires us to understand how Susser Syndrome is diagnosed. The more certain you are about your diagnosis the less error you will introduce into your study by incorrectly specifying cases. Based on information from the EDOH website, you decide that your case definition will be based on a clinical diagnosis of Susser Syndrome.

After you establish your case definition, you need to decide on the population from which the cases for your study will be obtained. Since the majority of cases from the recent outbreak were active members of the Superfit Fitness Center, you decide to base your study on this population.

Next you need to decide how you will classify your cases and controls based on exposure status. Remember, we are actually operating under two hypotheses here, each with its own unique exposure variable. Scientists working on the possible causal connection between consumption of EnduroBrick or Quench-It and the development of Susser Syndrome suggest that both exposures may have an induction time of at least 6 months. Under this hypothesis, any cases of Susser Syndrome that occurred within 6 months of initial consumption of either EnduroBrick or Quench-It could not have plausibly been caused by the exposure. Thus, you stipulate that at least 6 months must have elapsed since the initial exposure before an individual will be considered "exposed."
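The 6-month rule can be sketched as a small classification function. This is only an illustration: the function names are hypothetical, "6 months" is counted in complete calendar months, and using a comparable reference date for controls in place of a diagnosis date is an assumption the text does not specify.

```python
from datetime import date
from typing import Optional

INDUCTION_MONTHS = 6  # minimum induction time suggested in the text


def whole_months_between(start: date, end: date) -> int:
    """Number of complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def is_exposed(first_consumption: Optional[date], index_date: date,
               min_months: int = INDUCTION_MONTHS) -> bool:
    """Classify a subject as exposed only if the induction time has elapsed.

    first_consumption: date of first consumption, or None if never consumed.
    index_date: diagnosis date for a case; for a control, a comparable
                reference date (an assumption of this sketch).
    """
    if first_consumption is None:
        return False
    return whole_months_between(first_consumption, index_date) >= min_months
```

Under this sketch, a case who first drank Quench-It four months before diagnosis would be counted as unexposed, in line with the stipulation above.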

Once all of these decisions have been made, it is time to create appropriate eligibility criteria for your cases and controls.

2. Which of the following do you think are the best eligibility criteria for the cases? [Aschengrau & Seage, pp. 239-243]

Answer (a) — incorrect: It may seem obvious but it is important to state that cases need to have been correctly diagnosed with Susser Syndrome; additionally, we are interested in determining the exposure status and thus should not require that all cases had to have been exposed. Finally, it is important to separate two exposures and to consider their effects separately.
Answer (b) — incorrect: We are interested in possible exposures at the Superfit Fitness Center and not at Glop Industries. Therefore, employment at Glop Industries should not be among the eligibility criteria.
Answer (c) — correct: We want our eligibility criteria to be as specific as possible. You have already stated the desire to limit the study to Superfit members (which will make the selection of the control group easier; see below). It is also crucial to incorporate time elements in our study design to ensure that our exposures of interest could be plausibly associated with the development of Susser Syndrome. Without these criteria we cannot proceed to statistical analysis of the suspected associations.

Now you need to decide who is eligible to be a control.

You recall from your wonderful learning experience in P6400 that valid controls in a case-control study are individuals that, had they acquired the disease under investigation, would have ended up as cases in your study. The best way to ensure this is to sample controls from the same population that gave rise to the cases. To ensure that the controls accurately represent a sample of the distribution of exposure in the population giving rise to the cases, they should be sampled independently of exposure status.

3. Which of the following do you think are the best eligibility criteria for the controls?

Answer (a) — incorrect: While it is correct that controls should not have the diagnosis of Susser Syndrome, it is also essential that cases and controls come from the same source population. In this scenario, cases come from the population of people who attend Superfit Fitness Center, while controls come from the general population of Epiville. Since the exposures of interest are only available at the Superfit Fitness Center, selecting controls from the general Epiville population will artificially create an association between these exposures and the outcome, since the controls do not represent the exposure distribution of the source population giving rise to the cases. More simply, the controls under this scheme would not necessarily become cases if they were diagnosed with the disease, since cases are restricted to Superfit members.
Answer (b) — incorrect: Controls should not be diagnosed with Susser Syndrome. Additionally, since we are interested in the effects of exposure variables, we should not select our controls based on their exposure to EnduroBrick or Quench-It. Selection of controls must always be independent of exposure.
Answer (c) — correct: An important point is that controls should be classified as cases if they develop Susser Syndrome; in other words, controls should meet the eligibility requirements for cases except for their disease status.

Now that the eligibility criteria have been set, you must determine the specifics of the case-control study design.

## How many cases and controls should you recruit?

The answer to this question obviously depends on your time and resources. However, an equally important consideration is how much power you want the study to have. Conventionally, we want a study to have at least 80 percent power to detect a significant difference between the groups. Generally, if the study has less than 80 percent power, we conclude that the study is underpowered. This does not mean our results are incorrect; but if we observe a non-significant result in an underpowered study we may not be able to tell whether this is because there truly is no association or whether this is due to the lack of power in the study.

After crunching the numbers, you determine that the study will require the following size to achieve a desired power of 80 percent:

Number of cases: 112
Number of controls: 224
Total number of subjects: 336
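For readers who want to see where figures of this size can come from, here is a minimal sketch of one common sample-size formula for an unmatched case-control study (the normal-approximation formula for comparing two proportions, without continuity correction). The inputs used in the example (25 percent of controls exposed, a detectable odds ratio of 2.0, two controls per case, two-sided alpha of 0.05) are assumptions chosen for illustration; the exercise does not state which inputs or formula were actually used.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_control_sample_size(p0, odds_ratio, controls_per_case=1,
                             alpha=0.05, power=0.80):
    """Cases and controls needed for an unmatched case-control study.

    p0: assumed proportion of controls who are exposed.
    odds_ratio: smallest odds ratio the study should be able to detect.
    Returns (number of cases, number of controls).
    """
    r = controls_per_case
    # Exposure proportion among cases implied by p0 and the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Pooled exposure proportion, weighted by group size.
    p_bar = (p1 + r * p0) / (r + 1)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt((r + 1) * p_bar * (1 - p_bar))
                 + z_beta * sqrt(r * p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    n_cases = ceil(numerator / (r * (p1 - p0) ** 2))
    return n_cases, r * n_cases


# Hypothetical planning inputs, not taken from the exercise.
n_cases, n_controls = case_control_sample_size(
    p0=0.25, odds_ratio=2.0, controls_per_case=2)
```

With these assumed inputs the sketch returns 112 cases and 224 controls. Changing any input, for example the odds ratio to be detected, changes the required sample size substantially.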

Bear in mind that the study is voluntary. Subjects, even when eligible, are in no way required to participate. Furthermore, subjects may drop out of the study before completion, further decreasing your sample size. Study participation depends in large part on the methods of recruitment. In-person recruitment is generally regarded as the most effective, followed by telephone interviews, and then mail invitations. The participation rate that you expect to achieve, given your method of recruitment, will help you to calculate approximately how many individuals you will need to contact in order to meet your sample size.
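The arithmetic behind this step is simple and can be sketched as follows. The participation rates in the example are hypothetical placeholders, not figures from the exercise, and the function name is illustrative.

```python
def contacts_needed(target_n: int, participation_pct: int) -> int:
    """Smallest number of eligible people to approach so that, at the
    expected participation rate, at least target_n take part."""
    if not 0 < participation_pct <= 100:
        raise ValueError("participation_pct must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_n * 100 // participation_pct)


# Hypothetical participation rates: 80% for cases, 60% for controls.
cases_to_contact = contacts_needed(112, 80)
controls_to_contact = contacts_needed(224, 60)
```

Under these assumed rates you would need to approach 140 eligible cases and 374 eligible controls. Expected dropout after enrollment would push both numbers higher.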

Should you recruit cases and controls simultaneously or cases first and then all controls?

# Data Collection

With the design of the case-control study complete, you now begin planning the protocol for data collection.

4. What is the best source for ascertaining cases? [Aschengrau & Seage, pp. 236-239]

Answer (a) — correct: Recall from the Epiville Department of Health website that a diagnosis of Susser Syndrome requires a neurological consult. In addition, according to the EDOH website, the neurological symptoms of Susser Syndrome are so severe that all cases will end up at the local hospital. Therefore, a review of the hospital charts should offer an excellent source of valid data on every case in Epiville.
Answer (b) — incorrect: We might miss individuals with the Syndrome who never saw the staff nurse and might erroneously include those whose symptoms are not due to Susser Syndrome.
Answer (c) — incorrect: While some cases will see a neurologist for their symptoms, available information on Susser Syndrome (Epiville Department of Health website) indicates that some will be very ill and will be admitted directly to the hospital emergency room before having visited a specialist. Thus, we might miss persons with the Syndrome, or we might erroneously include persons whose symptoms are not Susser Syndrome.

5. What is the best method of ascertaining the cases? [Aschengrau & Seage, pp. 236-239]

Answer (a) — incorrect: Since we defined our source population as those who attend the Superfit Fitness Center, we need to limit our cases to those arising from this population.
Answer (b) — incorrect: Although logical, this is not the best method of ascertaining cases since one's gym membership is not normally part of a medical workup. In addition, cases that terminated their club membership as a result of their illness would be missed.
Answer (c) — correct: Modern computer linkage techniques allow for correct identification of approximately 95% of people based on their name and birth date. Because the local hospital treats all potential cases, the only cases you are likely to miss are those who moved out of the area before treatment or those patients discharged after the study end date. Because of the setup, all cases would come from the population attending the Superfit Fitness Center.
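To make the idea of linkage on name and birth date concrete, here is a deliberately simplified sketch. It assumes the linkage is between hospital case records and the Superfit membership roster, and it uses exact matching on a normalized key, whereas real linkage systems are usually probabilistic and tolerate misspellings. All field names (`chart_id`, `member_id`, `name`, `birth_date`) are hypothetical.

```python
from datetime import date


def linkage_key(name: str, birth_date: date) -> tuple:
    """Normalize a name and pair it with the birth date for matching."""
    normalized = " ".join(name.lower().replace(",", " ").split())
    return (normalized, birth_date)


def link_cases_to_members(hospital_cases, members):
    """Return (chart_id, member_id) pairs for cases found on the roster."""
    member_index = {
        linkage_key(m["name"], m["birth_date"]): m["member_id"]
        for m in members
    }
    linked = []
    for case in hospital_cases:
        key = linkage_key(case["name"], case["birth_date"])
        member_id = member_index.get(key)
        if member_id is not None:
            linked.append((case["chart_id"], member_id))
    return linked
```

Cases whose records fail to match (for example, because of a name change or data-entry error) would be missed, which is one reason linkage is never expected to identify everyone.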

6. What is the best way to accrue the controls? [Aschengrau & Seage, pp. 239-243]

Answer (a) — incorrect: This method would be unlikely to produce enough controls since the majority of Superfit members would not be admitted to the hospital, because they are otherwise a young and healthy population. In addition, inclusion of controls with other diseases may introduce bias because other diseases may also be related to the exposure of interest and thus controls would not be sampled independently of exposure.
Answer (b) — incorrect: Given Susser Syndrome's induction time of 6 months, some of the new members will not have the opportunity to be classified as "exposed," meaning that this sampling technique selects controls dependent on exposure status. In addition, this is not a random sample as new members may not be representative of the entire population of those attending the Fitness Center.
Answer (c) — correct: Because they are members of the club, these controls will be representative of the underlying source population from which the cases emerged. Also, you must make sure that they do not have the disease.
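One way to picture exposure-independent sampling is a simple random draw from the membership roster after removing known cases. The sketch below is an illustration under that assumption; the identifiers and the function name are hypothetical, and note that nothing about EnduroBrick or Quench-It purchases enters the selection.

```python
import random


def sample_controls(member_ids, case_ids, n_controls, seed=None):
    """Draw a simple random sample of disease-free members as controls."""
    # Controls must come from the source population but be free of disease.
    eligible = sorted(set(member_ids) - set(case_ids))
    if n_controls > len(eligible):
        raise ValueError("not enough disease-free members to sample from")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(eligible, n_controls)
```

In practice you would also apply the same eligibility criteria used for cases before drawing the sample.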

Now you must decide how you will assess exposure to Quench-It and EnduroBrick in your case-control study.

7. Given the study design, what is the best way to define the exposure variable?

Answer (a) — incorrect: This is a case-control study in which the exposure has already occurred. Study participants are enrolled based on their disease status, after which exposure is compared between those with the disease and those without the disease.
Answer (b) — incorrect: This is commonly employed and it is probably the fastest and cheapest method; however, it is problematic here as individuals may have difficulty recalling the amounts consumed over a 2-year period. In addition, now that the news is reporting a connection between Susser Syndrome and Superfit Fitness Center, the interviewees' responses may be differentially affected by their concerns over what they believe may be the cause of the condition.
Answer (c) — correct: This will provide the most accurate assessment of exposure since it does not depend on the ability of the subject to recall their exposure. Furthermore, we know EnduroBrick and Quench-It are not available in retail stores, so there is no possibility of subjects consuming more than is recorded by the fitness center. (It is not perfect, however, as purchasing EnduroBrick and Quench-It is not a guarantee that the individual actually consumed them!)
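As a rough illustration of how purchase records could be turned into an exposure variable, the sketch below finds each member's earliest recorded purchase of a product. The record layout (`member_id`, `product`, `date`) is hypothetical; the exercise does not describe how the fitness credit card data are stored.

```python
from datetime import date


def first_purchase_dates(transactions, product):
    """Map each member_id to the date of their earliest purchase of product."""
    first = {}
    for t in transactions:
        if t["product"] != product:
            continue
        member = t["member_id"]
        # Keep the earliest date seen so far for this member.
        if member not in first or t["date"] < first[member]:
            first[member] = t["date"]
    return first
```

Members absent from the result never purchased the product and would be classified as unexposed. The earliest purchase date is what an induction-time rule would be applied to.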

Having received the necessary IRB approval and addressed various administrative details, the study may now commence. The data begin to file back to the Department of Health and must now be carefully entered into the computer database. Once all of the data are entered, you can proceed to the analysis stage where the associations proposed in your two hypotheses are characterized and tested.

# Data Analysis

The data collected yield the following counts:

Total number of Cases: 112
Total number of Controls: 224
Number of Cases who ingested EnduroBrick: 28
Number of Controls who ingested EnduroBrick: 56
Number of Cases who consumed Quench-It: 50
Number of Controls who consumed Quench-It: 56

8. How would you set up the classic 2x2 table using the above information to test the hypothesis that cases are more likely to have ingested EnduroBrick than controls?

| | Case | Control | Total |
|---|---|---|---|
| Exposed (EnduroBrick) | 28 | 56 | 84 |
| Unexposed (No EnduroBrick) | 84 | 168 | 252 |
| Total | 112 | 224 | 336 |

Odds of exposure among cases = (# of Cases exposed) / (# of Cases unexposed) = 28 / 84 = 0.333

Odds of exposure among controls = (# of Controls exposed) / (# of Controls unexposed) = 56 / 168 = 0.333

Exposure OR = (Odds of exposure among Cases) / (Odds of exposure among Controls) = (28/84) / (56/168) = 0.333 / 0.333 = 1.0

Odds of disease among exposed = (# of Cases exposed) / (# of Controls exposed) = 28 / 56 = 0.50

Odds of disease among unexposed = (# of Cases unexposed) / (# of Controls unexposed) = 84 / 168 = 0.50

Disease OR = (Odds of disease among Exposed) / (Odds of disease among Unexposed) = (28/56) / (84/168) = 0.50 / 0.50 = 1.0

Individuals with Susser Syndrome (cases) have the same odds of having ingested EnduroBrick as those without Susser Syndrome (controls). Conversely, individuals who ate EnduroBrick have the same odds of developing Susser Syndrome as those who did not eat EnduroBrick. An Odds Ratio = 1.0 suggests that there is no association between Susser Syndrome and EnduroBrick ingestion.
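The two calculations can be reproduced with a short sketch. The function and argument names are illustrative only; the counts come from the EnduroBrick 2x2 table.

```python
def odds_ratios(exposed_cases, exposed_controls,
                unexposed_cases, unexposed_controls):
    """Return (exposure odds ratio, disease odds ratio) for a 2x2 table."""
    # Exposure OR: odds of exposure in cases vs. odds of exposure in controls.
    exposure_or = ((exposed_cases / unexposed_cases)
                   / (exposed_controls / unexposed_controls))
    # Disease OR: odds of disease in exposed vs. odds of disease in unexposed.
    disease_or = ((exposed_cases / exposed_controls)
                  / (unexposed_cases / unexposed_controls))
    return exposure_or, disease_or


# EnduroBrick: 28 exposed cases, 56 exposed controls,
# 84 unexposed cases, 168 unexposed controls.
enduro_exposure_or, enduro_disease_or = odds_ratios(28, 56, 84, 168)
```

Both values come out to 1.0, matching the hand calculation.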

9. How would you set up the classic 2x2 table using the above information to test the hypothesis that cases are more likely to have consumed Quench-It than controls?

| | Case | Control | Total |
|---|---|---|---|
| Exposed (Quench-It) | 50 | 56 | 106 |
| Unexposed (No Quench-It) | 62 | 168 | 230 |
| Total | 112 | 224 | 336 |

Odds of exposure among cases = (# of Cases exposed) / (# of Cases unexposed) = 50 / 62 = 0.806

Odds of exposure among controls = (# of Controls exposed) / (# of Controls unexposed) = 56 / 168 = 0.333

Exposure OR = (Odds of exposure among Cases) / (Odds of exposure among Controls) = (50/62) / (56/168) = 0.806 / 0.333 = 2.4

Odds of disease among exposed = (# of Cases exposed) / (# of Controls exposed) = 50 / 56 = 0.893

Odds of disease among unexposed = (# of Cases unexposed) / (# of Controls unexposed) = 62 / 168 = 0.369

Disease OR = (Odds of disease among Exposed) / (Odds of disease among Unexposed) = (50/56) / (62/168) = 0.893 / 0.369 = 2.4

Individuals with Susser Syndrome (cases) have 2.4 times the odds of having consumed Quench-It compared with those without Susser Syndrome (controls). Conversely, individuals who drank Quench-It have 2.4 times the odds of developing Susser Syndrome compared with those who did not drink Quench-It. The OR = 2.4 supports a positive association between Susser Syndrome and Quench-It consumption.
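The reason the exposure odds ratio and the disease odds ratio always agree can be seen with a little algebra. The cell labels $a$, $b$, $c$, $d$ are introduced here for illustration: $a$ = exposed cases, $b$ = exposed controls, $c$ = unexposed cases, $d$ = unexposed controls.

```latex
\[
\text{OR}_{\text{exposure}} = \frac{a/c}{b/d} = \frac{ad}{bc},
\qquad
\text{OR}_{\text{disease}} = \frac{a/b}{c/d} = \frac{ad}{bc}
\]
\[
\text{Quench-It: } \frac{ad}{bc} = \frac{50 \times 168}{56 \times 62}
= \frac{8400}{3472} \approx 2.4
\]
```

Both ratios reduce to the same cross-product $ad/bc$, so it does not matter which way the odds are set up.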

Answer (a) — incorrect: The OR of 2.4 supports an association between Susser Syndrome and Quench-It consumption. The OR of 1.0 suggests no association between Susser Syndrome and EnduroBrick ingestion.
Answer (b) — incorrect: You must not confuse association with causation. The data suggest that Quench-It is associated with Susser Syndrome development whereas EnduroBrick is not. However, as detailed in Aschengrau & Seage (pp. 383-405), to move from association to causation requires a substantial amount of epidemiological evidence as well as biological plausibility. At this stage in the investigation, we are far from having enough data to conclude that Quench-It is the cause of Susser Syndrome.
Answer (c) — correct: The data do suggest that Quench-It is associated with the development of Susser Syndrome while EnduroBrick is not. However, we need to check the statistical significance of these findings as they may be due to chance. Furthermore, it is important to rule out alternate explanations for the association (such as bias and confounding) before we make a causal claim.
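One common way to gauge whether an odds ratio could plausibly be due to chance is an approximate 95% confidence interval on the log scale (often called Woolf's method). The sketch below is one option among several, not necessarily the method the course intends, and the function name is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio with an approximate confidence interval (log method).

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    Returns (odds ratio, lower limit, upper limit).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) from the four cell counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


quench_or, quench_lo, quench_hi = odds_ratio_ci(50, 56, 62, 168)
enduro_or, enduro_lo, enduro_hi = odds_ratio_ci(28, 56, 84, 168)
```

For Quench-It the interval runs from roughly 1.5 to 3.9 and excludes 1.0, while the EnduroBrick interval includes 1.0. A confidence interval addresses chance only; it says nothing about bias or confounding.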

After reporting your results, you decide to do a little bit more detective work. You head over to the Public Health Laboratory records department and check the log file on Quench-It. Since its production, the Health and Food Safety Inspector has taken random samplings of Quench-It back to the lab to analyze for any possible contamination. This is a routine surveillance procedure. Looking over the file you notice something interesting. Since 2002, a substantial amount of SUPERCLEAN has been found in Quench-It, probably a result of the bottle sterilization process. Following the Department of Health's repeated and stern demands, Glop Industries has recently changed its production techniques, and Quench-It is now completely free of SUPERCLEAN. Those who consumed Quench-It prior to these changes, however, might have been exposed to trace amounts of SUPERCLEAN.

# Discussion Questions

Carefully consider the following questions. Write down your answers (1-2 paragraphs) to question 1 in a Word document and submit them to your seminar leader. Be prepared to discuss all questions during the seminar section.

1. Why is the selection of controls important? What methods of control selection do you know? What principles should we follow when selecting controls?
2. What are the limitations of using questionnaires to assess exposure status in study subjects? What kind of bias could be introduced by the use of questionnaires in determining exposure status? If present, what could be the effects of this bias on the study findings? Does our study design limit or avoid this bias?
3. Epidemiologic case-control studies often report increased risk of an event given exposure, but we know that we can only calculate the odds ratio in a case-control study as opposed to a risk ratio. Is it important to distinguish between a risk ratio and an odds ratio? When does the odds ratio approximate the risk ratio? When does it approximate the rate ratio?