Data Analysis
Now that you have collected the data, a quick glance shows that there are several ways to analyze it. The most appropriate analysis for this study uses person-time, which accounts for the fact that subjects may have been followed for varying amounts of time (please see Aschengrau & Seage, pp. 220-221).
Learn more about person-time calculations. In our retrospective cohort study, all individuals enter the study at the same moment in time (September 1, two years ago). However, not all exit at the same time. How can they exit the study? In any number of ways, including:
 The development of Susser Syndrome (once they have the disease, they are no longer at risk of developing it);
 Death from other competing causes;
 Loss to follow-up (please see Aschengrau & Seage, pp. 219-220).
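As a rough illustration of person-time accounting, the sketch below sums the follow-up time of three hypothetical subjects who enter together and exit for different reasons. The dates, the `person_years` helper, and the 365.25-day year are assumptions made for illustration only; none of them come from the study data.

```python
from datetime import date

# Hypothetical follow-up records: (entry date, exit date, reason for exit).
# These subjects are invented for illustration and are not study data.
subjects = [
    (date(2020, 9, 1), date(2022, 9, 1), "end of study"),
    (date(2020, 9, 1), date(2021, 3, 1), "developed disease"),
    (date(2020, 9, 1), date(2021, 9, 1), "lost to follow-up"),
]

def person_years(entry, exit_):
    """Time at risk in years, assuming a 365.25-day year."""
    return (exit_ - entry).days / 365.25

# Each subject contributes only the time they were actually under observation.
total_pyo = sum(person_years(entry, exit_) for entry, exit_, _ in subjects)
print(f"Total person-years of observation: {total_pyo:.2f}")
```

Each subject contributes only the time spent at risk and under observation, which is why the total is well below the 6 person-years that three complete 2-year follow-ups would give.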
Loss to follow-up presents a particular challenge in epidemiological studies. Without regular contact with study participants, it may not be possible to determine whether, or when, a person developed the disease of interest. In these situations, your calculations may be severely compromised. Epidemiologists use two different estimates of effect to assess exposure-disease relationships in cohort studies: the risk ratio and the rate ratio (please see Aschengrau & Seage, pp. 67-69). Since this is your first real work as a budding epidemiologist, you decide to analyze the data using both measures of effect and compare them later.
6. Calculation of the risk ratio from person-time information. [Aschengrau & Seage, Chapter 3]
The data collected by your team yield the following information:
 Number of cases among exposed: 74
 Number of cases among unexposed: 120
 Total number of exposed individuals: 1,900
   Low exposure group: 1,000
   Medium exposure group: 650
   High exposure group: 250
 Total number of unexposed individuals: 7,400
 How would you present the data in the 2x2 table?
 Calculate the risk of disease among the exposed. The formula for calculating risk is: (Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)
 Calculate the risk of disease among unexposed
 Calculate risk ratio
 Interpret your findings
Answer:
           | Disease + | Disease -   | Total
Exposed    | 74        | 1,900 - 74  | 1,900
Unexposed  | 120       | 7,400 - 120 | 7,400

           | Disease + | Disease -   | Total
Exposed    | 74        | 1,826       | 1,900
Unexposed  | 120       | 7,280       | 7,400
Answer:
The formula for calculating risk is: (Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)
= 74/1,900
= 0.0389 (or 39 cases per 1,000 exposed per 2-yr time period)
The risk of developing Susser Syndrome among those exposed to SUPERCLEAN (for at least 6 months) is 39 cases per 1,000 exposed per 2 years.
Answer:
(Number of unexposed cases during 2-yr time period) / (Total number of unexposed persons during 2-yr time period)
= 120/7,400
= 0.0162 (or 16 cases per 1,000 unexposed per 2-yr time period)
The risk of Susser Syndrome among those unexposed to SUPERCLEAN (for at least 6 months) is 16 cases per 1,000 unexposed per 2 years.
Answer:
(Risk of disease among the exposed) / (Risk of disease among the unexposed)
= 0.0389/0.0162
= 2.40
Answer: Those who were exposed to chemicals involved in SUPERCLEAN production for at least 6 months have 2.40 times the risk of developing Susser Syndrome compared with those who were not exposed to SUPERCLEAN production.
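The risk and risk ratio arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using the counts given in this exercise; the `risk` helper and the variable names are illustrative and not part of the original material.

```python
def risk(cases, total):
    """Cumulative incidence (risk): cases divided by persons at risk."""
    return cases / total

risk_exposed = risk(74, 1900)      # 74 cases among 1,900 exposed
risk_unexposed = risk(120, 7400)   # 120 cases among 7,400 unexposed
risk_ratio = risk_exposed / risk_unexposed

print(f"Risk, exposed:   {risk_exposed:.4f}")
print(f"Risk, unexposed: {risk_unexposed:.4f}")
print(f"Risk ratio:      {risk_ratio:.2f}")
```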
Intellectually curious?
In the preceding example, you estimated the magnitude of risk due to exposure to SUPERCLEAN by comparing those with exposure to those without exposure. However, the exposure data can be characterized more precisely by dividing the exposed into three categories: low, medium, and high exposure. If the risk increases as the exposure level increases, one can conclude that the data show a dose-response relationship, i.e., a biological dose gradient. The presence of a dose-response relationship strengthens our conviction that the relationship is causal.
Please calculate the incidence risk in the three exposure groups using the following data:
Level of Exposure | Disease + | Disease - | Total
Low               | 20        | 980       | 1,000
Medium            | 30        | 620       | 650
High              | 24        | 226       | 250
Unexposed         | 120       | 7,280     | 7,400
Answer:
(Number of exposed cases per 2-yr time period) / (Total number of exposed persons per 2-yr time period)
Low exposure group
= 20/1,000 = 0.0200 (or 20 cases per 1,000 exposed per 2-yr time period)
Medium exposure group
= 30/650 = 0.046
High exposure group
= 24/250 = 0.096
Unexposed group
= 120/7,400 = 0.0162
Risk ratio calculations:
Relative risk in the low exposure group = 0.020/0.0162 = 1.23
Relative risk in the medium exposure group = 0.046/0.0162 = 2.84
Relative risk in the high exposure group = 0.096/0.0162 = 5.92
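The stratum-specific risks and risk ratios can also be computed in a short loop. The sketch below uses the counts from the exposure-level table; the dictionary layout and the monotonicity check are illustrative assumptions, not part of the original exercise. Because the code works with unrounded risks, the second decimal of a ratio may differ slightly from hand calculations that start from rounded values.

```python
# (cases, total persons) per exposure level, taken from the table above
groups = {
    "Low": (20, 1000),
    "Medium": (30, 650),
    "High": (24, 250),
}
unexposed_cases, unexposed_total = 120, 7400
risk_unexposed = unexposed_cases / unexposed_total

risk_ratios = {}
for level, (cases, total) in groups.items():
    group_risk = cases / total
    risk_ratios[level] = group_risk / risk_unexposed
    print(f"{level:<7} risk = {group_risk:.4f}  risk ratio = {risk_ratios[level]:.2f}")

# A steady increase from Low to High is consistent with a dose-response gradient.
values = list(risk_ratios.values())
is_monotonic = all(a < b for a, b in zip(values, values[1:]))
print("Risk ratio rises with exposure level:", is_monotonic)
```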
What is your conclusion with regard to a dose-response relationship in these data?
7. Calculation of the rate ratio [Aschengrau & Seage, Chapter 3].
The data collected by your team yield the following information:
 Number of cases among exposed: 74
 Number of cases among unexposed: 120
 Number of exposed person-time of observation (PYO): 3,675
   Low exposure group: 2,000 PYO's
   Medium exposure group: 1,225 PYO's
   High exposure group: 450 PYO's
 Number of unexposed PYO's: 14,550
 How would you present the data in the 2x2 format?
 Calculate the incidence rate among the exposed. The formula for calculating incidence rate is: (Number of exposed cases during 2-yr time period) / (PYO's among exposed persons during 2-yr time period)
 Calculate the incidence rate among the unexposed.
 Calculate the rate ratio.
 Interpret your findings.
Answer:
           | Disease + | Total PYO's over 2-yr time period
Exposed    | 74        | 3,675
Unexposed  | 120       | 14,550
Answer:
(Number of exposed cases during 2-yr time period) / (PYO's among exposed persons during 2-yr time period)
= 74/3,675
= 0.0200 (or 20 cases per 1,000 PYO's)
The rate of Susser Syndrome among those exposed to SUPERCLEAN (for at least 6 months) is 20 cases per 1,000 PYO's.
Answer:
(Number of unexposed cases during 2-yr time period) / (PYO's among unexposed persons during 2-yr time period)
= 120/14,550
= 0.0082 (or approximately 8 cases per 1,000 PYO's)
The rate of Susser Syndrome among those unexposed to SUPERCLEAN is approximately 8 cases per 1,000 PYO's.
Answer:
(Rate of disease among the exposed) / (Rate of disease among the unexposed)
= 0.0200/0.0082
= 2.44
Answer: Those who were exposed to chemicals involved in SUPERCLEAN production for at least 6 months have 2.44 times the rate of developing Susser Syndrome compared with those who were not exposed to SUPERCLEAN production.
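The rate calculation differs from the risk calculation only in its denominator: person-years of observation rather than persons. The following sketch reproduces the arithmetic with the figures given above; the `incidence_rate` helper and the variable names are illustrative.

```python
def incidence_rate(cases, person_years):
    """Incidence rate: cases divided by person-years of observation."""
    return cases / person_years

rate_exposed = incidence_rate(74, 3675)       # 74 cases over 3,675 PYO's
rate_unexposed = incidence_rate(120, 14550)   # 120 cases over 14,550 PYO's
rate_ratio = rate_exposed / rate_unexposed

print(f"Rate, exposed:   {rate_exposed * 1000:.1f} per 1,000 PYO's")
print(f"Rate, unexposed: {rate_unexposed * 1000:.1f} per 1,000 PYO's")
print(f"Rate ratio:      {rate_ratio:.2f}")
```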
8. Calculation of rate ratio in different age strata.
The data collected by your team yield the information:
Age Group | Exposed: Number of Cases | Exposed: PYO | Unexposed: Number of Cases | Unexposed: PYO
< 30      | 43                       | 2,188        | 75                         | 9,249
≥ 30      | 31                       | 1,487        | 45                         | 5,301
Total     | 74                       | 3,675        | 120                        | 14,550
 Calculate the rate of disease among the exposed in each age group
 Calculate the rate of disease among the unexposed in each age group
 Calculate the rate ratio in each age group
 Interpret your findings
 Does the association between SUPERCLEAN and Susser Syndrome seem to vary by age group?
Rate among the exposed:
Age Group 20 to <30 yrs: 43 / 2,188 = 0.0197 (or 20 cases per 1,000 PYO's)
Age Group 30-40 yrs: 31 / 1,487 = 0.0208 (or 21 cases per 1,000 PYO's)
Rate among the unexposed:
Age Group 20 to <30 yrs: 75 / 9,249 = 0.0081 (or 8 cases per 1,000 PYO's)
Age Group 30-40 yrs: 45 / 5,301 = 0.0085 (or 9 cases per 1,000 PYO's)
Rate ratio:
Age Group 20 to <30 yrs: (0.0197 / 0.0081) = 2.43
Age Group 30-40 yrs: (0.0208 / 0.0085) = 2.45
Interpretation:
Among persons aged 20 to <30 years, those who are exposed to the chemicals involved in SUPERCLEAN production for at least 6 months have 2.43 times the rate of developing Susser Syndrome compared with those who are not exposed to SUPERCLEAN production.
Among persons aged 30-40 years, those who are exposed to the chemicals involved in SUPERCLEAN production for at least 6 months have 2.45 times the rate of developing Susser Syndrome compared with those who are not exposed to SUPERCLEAN production.
No, the rate ratio is similar in both age categories (2.43 and 2.45), so the association does not appear to vary by age group.
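The age-stratified comparison can be sketched as a loop over strata. The counts below are those given in the age-group table; the nested dictionary and its keys are illustrative choices. Since the code divides unrounded rates, the second decimal of each ratio can differ slightly from hand calculations that use rates rounded to four decimals.

```python
# (cases, person-years) by age group and exposure status, from the table above
strata = {
    "<30": {"exposed": (43, 2188), "unexposed": (75, 9249)},
    ">=30": {"exposed": (31, 1487), "unexposed": (45, 5301)},
}

rate_ratios = {}
for age_group, counts in strata.items():
    exposed_cases, exposed_pyo = counts["exposed"]
    unexposed_cases, unexposed_pyo = counts["unexposed"]
    rate_exposed = exposed_cases / exposed_pyo
    rate_unexposed = unexposed_cases / unexposed_pyo
    rate_ratios[age_group] = rate_exposed / rate_unexposed
    print(f"Age {age_group}: rate ratio = {rate_ratios[age_group]:.2f}")

# Similar stratum-specific ratios suggest the association does not vary by age.
difference = abs(rate_ratios["<30"] - rate_ratios[">=30"])
print(f"Absolute difference between strata: {difference:.2f}")
```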
If you had chosen instead to compare the rate of Susser Syndrome in the exposed workers at Glop Industries to the rate of Susser Syndrome in the general population (e.g., the city of Epiville), the resulting rate ratio would be called the Standardized Incidence Ratio (SIR).
Learn more on how to calculate the standardized incidence ratio (SIR) here.
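For orientation, the SIR is commonly written as the ratio of observed to expected cases. The notation below is an assumption introduced here, not taken from the original exercise: O is the number of cases observed in the cohort, r_i is the incidence rate in the reference population for stratum i (for example, an age group), and T_i is the person-time the cohort contributed in that stratum.

```latex
\mathrm{SIR} = \frac{O}{E}, \qquad E = \sum_{i} r_i \, T_i
```

An SIR above 1 indicates more cases in the cohort than would be expected if the cohort experienced the reference population's rates.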
9. After putting exhaustive effort into data analysis, you present your findings to your supervisor. What should you tell her?
 The elevated estimates (both risk and rate ratios) do not support your hypothesis that exposure to SUPERCLEAN production is associated with Susser Syndrome.
 The exposure to SUPERCLEAN production is the definite cause of Susser Syndrome. Those elevated rates are very convincing.
 The data clearly suggest an association between exposure to SUPERCLEAN at Glop Industries and subsequent development of Susser Syndrome. I think we might want to explore other potential exposure sources as well and try to improve exposure measurement.