Epiville

Cohort Study

Data Analysis

Now that you have collected the data, a quick look shows that there are several ways to analyze it. The most appropriate analysis uses person-time to account for the fact that subjects may have been followed for varying amounts of time (please see Aschengrau & Seage, pp. 220-221).

In our retrospective cohort study, all individuals will enter the study at the same moment in time (September 1, two years ago). However, not all will exit at the same time. How can they exit the study? Any number of ways, including:

  1. The development of Susser Syndrome (once they have the disease, they are no longer at risk of developing it);
  2. Death from other competing causes;
  3. Loss to follow-up (Please see Aschengrau & Seage pp. 219-220).
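
The exit rules above can be sketched in code. This is a minimal illustration, not part of the Epiville materials: the calendar years, the four subjects, and the function name `person_years` are all hypothetical, and a year is approximated as 365.25 days.

```python
from datetime import date

# Hypothetical study window: the source gives only "September 1, two years ago".
STUDY_START = date(2000, 9, 1)
STUDY_END = date(2002, 9, 1)


def person_years(exit_date=None, entry=STUDY_START, study_end=STUDY_END):
    """Person-years contributed by one subject.

    exit_date is the date of disease onset, death, or loss to follow-up;
    None means the subject was followed until the end of the study.
    """
    end = study_end if exit_date is None else min(exit_date, study_end)
    days = max((end - entry).days, 0)
    return days / 365.25  # approximate conversion from days to years


# Four hypothetical subjects: one for each kind of exit listed above,
# plus one who completes follow-up.
exits = {
    "developed Susser Syndrome": date(2001, 3, 1),
    "died of a competing cause": date(2001, 9, 1),
    "lost to follow-up": date(2002, 3, 1),
    "completed follow-up": None,
}

contributions = {label: person_years(d) for label, d in exits.items()}
total_pyo = sum(contributions.values())

for label, py in contributions.items():
    print(f"{label}: {py:.2f} person-years")
print(f"Total: {total_pyo:.2f} person-years")
```

Each subject stops contributing person-time at the moment of exit, so the total is less than the 8 person-years that four subjects would contribute if all were followed for the full two years.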

Loss to follow-up presents a unique challenge in epidemiological studies. Clearly, without regular contact with study participants, it may not be possible to estimate when, and if, a person developed the disease of interest. In these situations, your calculations may be severely compromised. Epidemiologists employ two different estimates of effect to assess exposure-disease relationships in cohort studies: the risk ratio and the rate ratio (Please see Aschengrau & Seage pp. 67-69). Since this is your first real work as a budding epidemiologist, you decide to analyze the data using both measures of effect and later on compare them.

6. Calculation of the risk ratio from person-time information. [Aschengrau & Seage, Chapter 3]

The data collected by your team yield the following information:

  • Number of cases among exposed - 74
  • Number of cases among unexposed - 120
  • Total number of exposed individuals - 1,900
    • Low exposure group - 1,000
    • Medium exposure group - 650
    • High exposure group - 250
  • Total number of unexposed individuals - 7,400
  1. How would you present the data in the 2x2 table?
  2. Calculate the risk of disease among the exposed. The formula for calculating risk is: (Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)
  3. Calculate the risk of disease among unexposed
  4. Calculate risk ratio
  5. Interpret your findings
Answer (a) —

|           | Disease + | Disease -           | Total |
|-----------|-----------|---------------------|-------|
| Exposed   | 74        | 1,900 - 74 = 1,826  | 1,900 |
| Unexposed | 120       | 7,400 - 120 = 7,280 | 7,400 |
Answer (b) —

The formula for calculating risk is: (Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)

= 74/1,900

= 0.0389 (or 39 cases per 1,000 exposed per 2-yr time period)

The risk of developing Susser Syndrome among those exposed to SUPERCLEAN (for at least 6 months) is 39 cases per 1,000 exposed per 2 years.

Answer (c) —

(Number of unexposed cases during 2-yr time period) / (Total number of unexposed persons during 2-yr time period)

= 120/7,400

= 0.0162 (or 16 cases per 1,000 unexposed per 2-yr time period)

The risk of Susser Syndrome among those unexposed to SUPERCLEAN (for at least 6 months) is 16 cases per 1,000 unexposed per 2 years.

Answer (d) —

(Risk of disease among the exposed) / (Risk of disease among the unexposed)

= 0.0389/0.0162

= 2.40

Answer (e) —
Over the 2-year period, those who were exposed to chemicals involved in SUPERCLEAN production for at least 6 months had 2.40 times the risk of developing Susser Syndrome of those who were not exposed to SUPERCLEAN production.
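
The risk calculations in answers (b) through (d) can be checked with a few lines of Python. This is a sketch using the counts given above; the variable and function names are illustrative only.

```python
def risk(cases, persons):
    """Cumulative incidence (risk) over the follow-up period."""
    return cases / persons


# Counts from the 2x2 table above (2-year follow-up).
exposed_cases, exposed_total = 74, 1900
unexposed_cases, unexposed_total = 120, 7400

risk_exposed = risk(exposed_cases, exposed_total)
risk_unexposed = risk(unexposed_cases, unexposed_total)
risk_ratio = risk_exposed / risk_unexposed

print(f"Risk, exposed:   {risk_exposed:.4f}")
print(f"Risk, unexposed: {risk_unexposed:.4f}")
print(f"Risk ratio:      {risk_ratio:.2f}")
```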

Intellectually curious?

In the preceding example, you estimated the magnitude of risk due to exposure to SUPERCLEAN by comparing those with exposure to those without exposure. However, the exposure data could be characterized more accurately by dividing into three exposure categories, i.e., low, medium and high exposure. If the risk increases with the increase in exposure level, then one can conclude that there is a dose-response relationship in the data, i.e. biological dose gradient. The presence of the dose-response relationship strengthens our conviction that the relationship is causal.

Please calculate the incidence risk in the three exposure groups using the following data:

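| Level of Exposure | Disease + | Disease - | Total |
|-------------------|-----------|-----------|-------|
| Low               | 20        | 980       | 1,000 |
| Medium            | 30        | 620       | 650   |
| High              | 24        | 226       | 250   |
| Unexposed         | 120       | 7,280     | 7,400 |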
Answer —

Risk = (Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)

Low exposure group = 20/1000 = 0.0200 (or 20 cases per 1,000 exposed per 2-yr time period)
Medium exposure group = 30/650 = 0.046
High exposure group = 24/250 = 0.096
Unexposed group = 120/7400 = 0.0162

Risk ratio calculations:
Relative risk in the low exposure group = 0.020/0.0162 = 1.23
Relative risk in the medium exposure group = 0.046/0.0162 = 2.84
Relative risk in the high exposure group = 0.096/0.0162 = 5.92
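
The exposure-level calculations can be reproduced with a short script. This is a sketch that takes its counts from the table above; the dictionary layout and names are illustrative. Because it divides unrounded risks, a ratio may differ slightly in the last digit from one worked by hand with rounded values.

```python
# (cases, total persons) per exposure level, from the table above.
groups = {
    "Low": (20, 1000),
    "Medium": (30, 650),
    "High": (24, 250),
}
reference_cases, reference_total = 120, 7400  # unexposed group

reference_risk = reference_cases / reference_total

risks = {level: cases / total for level, (cases, total) in groups.items()}
risk_ratios = {level: r / reference_risk for level, r in risks.items()}

for level in groups:
    print(f"{level:<6} risk = {risks[level]:.4f}  "
          f"risk ratio vs. unexposed = {risk_ratios[level]:.2f}")
```

Listing the ratios in order of exposure level makes it easy to see whether they rise, fall, or stay flat across the three groups.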

What is your conclusion with regard to dose-response relationship in these data?

7. Calculation of the rate ratio [Aschengrau & Seage, Chapter 3].

The data collected by your team yield the following information:

  • Number of cases among exposed - 74
  • Number of cases among unexposed - 120
  • Person-years of observation (PYO) among the exposed - 3,675
    • Low exposure group- 2,000 PYO's
    • Medium exposure group- 1,225 PYO's
    • High exposure group- 450 PYO's
  • Number of unexposed PYO's- 14,550
  1. How would you present the data in the 2x2 format?
  2. Calculate the incidence rate among the exposed. The formula for calculating incidence rate is: (Number of exposed cases during 2-yr time period) / (PYO's among exposed persons during 2-yr time period)
  3. Calculate the incidence rate among the unexposed.
  4. Calculate the rate ratio.
  5. Interpret your findings.
Answer (a) —

|           | Disease + | Total PYO's over 2-yr time period |
|-----------|-----------|-----------------------------------|
| Exposed   | 74        | 3,675                             |
| Unexposed | 120       | 14,550                            |
Answer (b) —

(Number of exposed cases during 2-yr time period) / (PYO's among exposed persons during 2-yr time period)

= 74/3,675

= 0.0201 (or approximately 20 cases per 1,000 PYO's)

The rate of Susser Syndrome among those exposed to SUPERCLEAN (for at least 6 months) is 20 cases per 1,000 PYO's.

Answer (c) —

(Number of unexposed cases during 2-yr time period) / (PYO's among unexposed persons during 2-yr time period)

= 120/14,550

= 0.0082 (or approximately 8 cases per 1,000 PYO's)

The rate of Susser Syndrome among those unexposed to SUPERCLEAN is 8 cases per 1,000 PYO's.

Answer (d) —

(Rate of disease among the exposed) / (Rate of disease among the unexposed)

= (74/3,675) / (120/14,550)

= 2.44

Answer (e) —
Those who were exposed to chemicals involved in SUPERCLEAN production for at least 6 months developed Susser Syndrome at 2.44 times the rate of those who were not exposed to SUPERCLEAN production.
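
The rate calculations follow the same pattern as the risk calculations, with person-years in the denominator instead of persons. This is a sketch using the case counts and PYO's given above; the names are illustrative.

```python
def incidence_rate(cases, person_years):
    """Incidence rate: new cases per person-year of observation."""
    return cases / person_years


exposed_cases, exposed_pyo = 74, 3675
unexposed_cases, unexposed_pyo = 120, 14550

rate_exposed = incidence_rate(exposed_cases, exposed_pyo)
rate_unexposed = incidence_rate(unexposed_cases, unexposed_pyo)
rate_ratio = rate_exposed / rate_unexposed

print(f"Rate, exposed:   {rate_exposed * 1000:.1f} per 1,000 PYO")
print(f"Rate, unexposed: {rate_unexposed * 1000:.1f} per 1,000 PYO")
print(f"Rate ratio:      {rate_ratio:.2f}")
```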

8. Calculation of rate ratio in different age strata.

The data collected by your team yield the following information:

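| Age Group | Exposed: Number of Cases | Exposed: PYO | Unexposed: Number of Cases | Unexposed: PYO |
|-----------|--------------------------|--------------|----------------------------|----------------|
| < 30      | 43                       | 2,188        | 75                         | 9,249          |
| ≥ 30      | 31                       | 1,487        | 45                         | 5,301          |
| Total     | 74                       | 3,675        | 120                        | 14,550         |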
  1. Calculate the rate of disease among the exposed in each age group
  2. Calculate the rate of disease among the unexposed in each age group
  3. Calculate the rate ratio in each age group
  4. Interpret your findings
  5. Does the association between SUPERCLEAN and Susser Syndrome seem to vary by age group?
Answer (a) —

Age Group 20 - < 30 yrs: 43/2,188 = 0.0197 (or 20 cases per 1,000 PYO's)

Age Group 30 - 40 yrs: 31 / 1,487 = 0.0208 (or 21 cases per 1,000 PYO's)

Answer (b) —

Age Group 20 - < 30 yrs: 75 / 9,249 = 0.0081 (or 8 cases per 1,000 PYO's)

Age Group 30 - 40 yrs: 45 / 5,301 = 0.0085 (or 9 cases per 1,000 PYO's)

Answer (c) —

Age Group 20 - < 30 yrs: (0.0197 / 0.0081) = 2.43

Age Group 30 - 40 yrs: (0.0208 / 0.0085) = 2.45

Answer (d) —

Among persons aged 20 to <30 years, those who were exposed to the chemicals involved in SUPERCLEAN production for at least 6 months developed Susser Syndrome at 2.43 times the rate of those who were not exposed to SUPERCLEAN production.

Among persons aged 30-40 years, those who were exposed to the chemicals involved in SUPERCLEAN production for at least 6 months developed Susser Syndrome at 2.45 times the rate of those who were not exposed to SUPERCLEAN production.

Answer (e) —

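No, the rate ratio is similar in both age categories (2.43 versus 2.45), so the association between SUPERCLEAN and Susser Syndrome does not appear to vary by age group.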
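
The stratum-specific rates and rate ratios can be computed in one pass. This is a sketch using the age-stratified counts from the table above; the data layout and names are illustrative. The script divides unrounded rates, so its ratios may differ slightly in the second decimal from those worked by hand with rounded rates.

```python
# Per age group: (exposed cases, exposed PYO, unexposed cases, unexposed PYO).
strata = {
    "< 30": (43, 2188, 75, 9249),
    ">= 30": (31, 1487, 45, 5301),
}

stratum_ratios = {}
for age_group, (e_cases, e_pyo, u_cases, u_pyo) in strata.items():
    rate_exposed = e_cases / e_pyo
    rate_unexposed = u_cases / u_pyo
    stratum_ratios[age_group] = rate_exposed / rate_unexposed
    print(f"Age {age_group}: exposed {rate_exposed * 1000:.1f}, "
          f"unexposed {rate_unexposed * 1000:.1f} per 1,000 PYO, "
          f"rate ratio {stratum_ratios[age_group]:.2f}")
```

Comparing the stratum-specific ratios with each other, and with the overall rate ratio, is the quickest check on whether the association differs by age.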

If you had chosen instead to compare the rate of Susser Syndrome in the exposed workers at Glop Industries to the rate of Susser Syndrome in the general population (e.g., the city of Epiville), the resulting rate ratio would be called the Standardized Incidence Ratio (SIR).
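
As a rough illustration of the idea, an SIR divides the observed number of cases in the exposed cohort by the number expected if the cohort had experienced the general population's age-specific rates. In the sketch below, the general-population rates are hypothetical placeholders invented for the example (the exercise does not provide them); only the observed cases and the exposed PYO's come from the tables above.

```python
# Observed cases and person-years among exposed workers, by age group
# (from the exercise data).
observed_cases = 74
exposed_pyo = {"< 30": 2188, ">= 30": 1487}

# HYPOTHETICAL general-population rates per person-year. These numbers are
# placeholders for illustration only and do not come from the Epiville data.
population_rate = {"< 30": 0.005, ">= 30": 0.006}

# Expected cases: apply the reference rate in each age group to the
# cohort's person-time in that group, then sum.
expected_cases = sum(
    population_rate[age] * exposed_pyo[age] for age in exposed_pyo
)

sir = observed_cases / expected_cases
print(f"Expected cases: {expected_cases:.1f}")
print(f"SIR: {sir:.2f}")
```

An SIR above 1 would indicate more cases in the cohort than the reference rates predict; the value printed here carries no meaning beyond showing the arithmetic.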


9. After putting exhaustive effort into data analysis, you present your findings to your supervisor. What should you tell her?

  1. The elevated estimates (both risk and rate ratios) do not support your hypothesis that exposure to SUPERCLEAN production is associated with Susser Syndrome.
  2. The exposure to SUPERCLEAN production is the definite cause of Susser Syndrome. Those elevated rates are very convincing.
  3. The data clearly suggest an association between exposure to SUPERCLEAN at Glop Industries and subsequent development of Susser Syndrome. I think we might want to explore other potential exposure sources as well and try to improve exposure measurement.
Answer (a) — incorrect: The elevated estimates (both risk and rate ratios) lend support to your hypothesis that SUPERCLEAN production is associated with Susser Syndrome.
Answer (b) — incorrect: SUPERCLEAN appears to be associated with the development of Susser Syndrome. However, as detailed in Aschengrau (Chapter 15), to move from association to causation requires a substantial amount of epidemiological evidence as well as biological plausibility. At this stage in the investigation, we do not have enough data to arrive at such a conclusion.
Answer (c) — correct: The data do suggest an association; however, we need to check the statistical significance of these findings (i.e., look at the confidence intervals of effect estimates) as they may be due to chance. Furthermore, it is important to rule out other potential exposures as they may confound the findings.
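
To illustrate the confidence-interval check mentioned in answer (c), the sketch below computes approximate 95% intervals for the risk ratio and the rate ratio from the exercise data. It assumes the common large-sample method that builds the interval on the log scale; this is one standard approach and not necessarily the one used in the course text. The function and variable names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def ratio_ci(ratio, se_log, z=Z_95):
    """Confidence limits for a ratio, computed on the log scale."""
    log_ratio = math.log(ratio)
    return math.exp(log_ratio - z * se_log), math.exp(log_ratio + z * se_log)


a, c = 74, 120           # cases among exposed, cases among unexposed
n1, n0 = 1900, 7400      # persons: exposed, unexposed
pyo1, pyo0 = 3675, 14550  # person-years: exposed, unexposed

# Risk ratio (count denominators).
risk_ratio = (a / n1) / (c / n0)
se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
rr_low, rr_high = ratio_ci(risk_ratio, se_log_rr)

# Rate ratio (person-time denominators).
rate_ratio = (a / pyo1) / (c / pyo0)
se_log_irr = math.sqrt(1 / a + 1 / c)
irr_low, irr_high = ratio_ci(rate_ratio, se_log_irr)

print(f"Risk ratio {risk_ratio:.2f}, 95% CI {rr_low:.2f} to {rr_high:.2f}")
print(f"Rate ratio {rate_ratio:.2f}, 95% CI {irr_low:.2f} to {irr_high:.2f}")
```

An interval that excludes 1 suggests the association is unlikely to be due to chance alone, although it says nothing about confounding by other exposures.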