Epiville: Bias -- Study Design

1. Investigators recruited both cases and controls from a defined catchment area in the general population. This is often difficult to do in the absence of a comprehensive case registry. Suppose that investigators had recruited all cases of breast cancer from a comprehensive national registry, between August 1, 1996 and July 31, 1997. What type of bias might be introduced if investigators kept the same set of controls as were reported in the Teitelbaum et al. (2007) study?

(a) Selection bias
(b) Recall bias
(c) Self-selection bias

Answer (a) — correct: The control group should represent the exposure probability of the source population that gave rise to the cases. (Note: this is not the probability of exposure among the cases and controls themselves; it is the exposure probability in the underlying source population of non-cases that gave rise to the cases.) Because the national registry includes cases from across the U.S., defining the source population becomes tricky. Since controls in this scenario were drawn from the general population of Nassau and Suffolk counties, the cases and controls no longer come from the same geographic areas. This creates the potential for selection bias, since cases and controls did not come from the same source population. Moreover, the controls may differ systematically from the cases in important ways that are related to the exposure. Residents of Nassau and Suffolk counties have, on average, a higher socioeconomic status (SES) than residents of most U.S. counties. If controls were of higher SES than cases in the national registry, and SES is related to both the exposure (lawn/garden pesticide use) and the disease outcome (breast cancer), this could bias the association.

Answer (b) — incorrect: This has nothing to do with recall bias because recall bias is a type of information bias. The problem in this situation is not with obtaining information from cases and controls but with selecting cases and controls.

Answer (c) — incorrect: Case-control studies with low to moderate participation rates are susceptible to self-selection bias arising from (1) refusal or non-response that is related to both the exposure and the disease, or (2) agreement to participate that is related to both the exposure and the disease. Either can bias the results when participation rates differ between cases and controls in a way that is linked to the exposure (Aschengrau & Seage, p. 266). While self-selection bias may be operating in this situation, the scenario given in the question refers to a different type of bias.
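The selection-bias reasoning in answer (a) can be made concrete with a small numerical sketch. All counts below are hypothetical and invented purely for illustration; they are not taken from Teitelbaum et al. (2007). The sketch assumes that 30% of the true source population uses lawn/garden pesticides, while 40% of a higher-SES control population does.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from the four cells of a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical case counts (assumption, for illustration only).
exposed_cases, unexposed_cases = 200, 300

# Controls drawn from the same source population as the cases:
# assumed 30% exposed (150 of 500).
or_same_source = odds_ratio(exposed_cases, unexposed_cases, 150, 350)

# Controls drawn from a different, higher-SES population in which
# pesticide use is assumed to be more common: 40% exposed (200 of 500).
or_different_source = odds_ratio(exposed_cases, unexposed_cases, 200, 300)

print(f"OR, controls from the source population:  {or_same_source:.2f}")
print(f"OR, controls from a different population: {or_different_source:.2f}")
```

Under these made-up numbers, an odds ratio of about 1.56 shrinks to 1.00 solely because the controls over-represent exposure relative to the population that produced the cases. With different assumptions the bias could just as easily run in the opposite direction.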

2. Suppose that you are designing a case-control study on the association between lawn/garden pesticide use and breast cancer using subject selection methods described in the interactive module. Which method of accruing cases and controls do you think is the most practical?

Situation 1: Cases were recruited at the Epiville General Hospital, and since it is a tertiary-care facility, they came from different areas, including outside of Epiville. Controls are selected from those attending the Fancypants™ weight-loss center at the Star Hospital.
Situation 2: Cases were recruited at the Epiville General Hospital and came from different areas, including outside Epiville. Controls are selected from the admissions office of the Epiville General Hospital with diagnoses of any cancer other than breast cancer that are not related to lawn/garden pesticide use or with diagnoses of non-cancer diseases.
Situation 3: Cases were recruited at the Epiville General Hospital and came from different areas, including outside Epiville. Controls are selected from those attending the free clinic at the Epiville General Hospital.

Answer (a) — incorrect: Controls are subjects who were drawn from the high-income areas of Epiville (see areas circled by the orange lines). At the same time, cases came from all over Epiville. It is likely that those attending the weight-loss program do not represent the exposure distribution of the population giving rise to the cases obtained from the tertiary care facility since they are on average from higher income areas than the cases (and income is related to lawn/garden pesticide use). Thus, drawing cases and controls from different source populations per this sampling strategy may produce a biased estimate of effect.

Answer (b) — correct: Controls came from many different areas of Epiville, as did cases. In general, this is a more representative sampling method, and it is the most likely to ensure that cases and controls come from the same source population. Teitelbaum et al. (2007) attempted this type of sampling scheme by recruiting both cases and controls from the whole geographical area of Suffolk and Nassau counties. However, the Epiville sampling scheme is not perfect, as cases also came from outside of Epiville while controls did not. This highlights the difficulty of defining the correct source population.

Answer (c) — incorrect: If this is a relatively rare disease, it is unlikely that the source population around the hospital will give rise to a sufficient number of cases treated at the hospital. As such, the majority of cases are from areas other than those surrounding the hospital. Areas surrounding the Epiville General Hospital are marked on the map as areas of comparatively low income. Thus, controls from this area who attend the free clinic could differ from cases on many important factors related to lawn/garden pesticide use in addition to income, creating selection bias.

3. What potential bias could have been introduced if you found out that those who interviewed cases took 30 minutes longer on average than those who interviewed controls?

(a) Selection bias
(b) Information bias
(c) Volunteer bias
(d) Loss to follow-up

Answer (a) — incorrect: Selection bias refers to the way cases and controls were selected. In this situation the problem is not with selection but with the collection of exposure information.

Answer (b) — correct: Information bias occurs when the means of obtaining information from cases and controls differ. The difference in interview length is potentially problematic because it suggests that exposures may be reported and classified differently for cases and controls, regardless of their actual exposure status. It could be that study investigators spent more time with cases to ensure that all exposures were correctly classified. If they interviewed the controls less thoroughly, misclassification may be greater in the controls than in the cases, i.e., the misclassification would be tied to disease status. However, a longer average interview for cases does not automatically imply information bias. Cases may have taken longer because they truly had more exposures to report (if the exposure has a real effect on case/control status), or because they were in poor health and needed more time to answer the questions.

Answer (c) — incorrect: Volunteer bias is a type of selection bias resulting from the tendency for one's health to influence his or her decision to participate in a study. One could argue either that healthy people are more likely to volunteer for a study than sick people because they are concerned about their health, or that sick people are more likely to volunteer for a study because they worry about the disease and are seeking a diagnosis, care, etc. Regardless, here we deal not with selection of cases and controls but with differential interviewing methods.

Answer (d) — incorrect: Loss to follow-up is a type of selection bias that pertains mostly to cohort studies. Differential loss to follow-up can bias the results of a cohort study when study subjects who can no longer be located or who no longer want to participate are more or less likely to be exposed and develop the disease than those subjects who remain in the study (Aschengrau & Seage, pp. 268-270).
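To illustrate the point in answer (b) that misclassification tied to disease status can distort the estimate, here is a minimal sketch. The counts and the sensitivities (the fraction of truly exposed subjects who are recorded as exposed) are hypothetical assumptions chosen for illustration, and the function name is invented for this example. For simplicity, the sketch assumes that no unexposed subject is ever recorded as exposed.

```python
def observed_odds_ratio(true_table, sens_cases, sens_controls):
    """Odds ratio after some truly exposed subjects are recorded as unexposed.

    true_table = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    Specificity is assumed perfect: no unexposed subject is recorded as exposed.
    """
    a, b, c, d = true_table
    obs_a = sens_cases * a        # exposed cases correctly recorded
    obs_b = b + (a - obs_a)       # missed exposed cases count as unexposed
    obs_c = sens_controls * c     # exposed controls correctly recorded
    obs_d = d + (c - obs_c)       # missed exposed controls count as unexposed
    return (obs_a * obs_d) / (obs_b * obs_c)


# Hypothetical true exposure counts (assumption, for illustration only).
true_table = (200, 300, 150, 350)

true_or = observed_odds_ratio(true_table, 1.0, 1.0)

# Longer, more thorough interviews for cases: 95% of exposed cases
# are captured, but only 70% of exposed controls.
or_thorough_cases = observed_odds_ratio(true_table, 0.95, 0.70)

# The reverse situation: controls are interviewed more thoroughly.
or_thorough_controls = observed_odds_ratio(true_table, 0.70, 0.95)

print(f"True OR:                        {true_or:.2f}")
print(f"OR, cases interviewed better:   {or_thorough_cases:.2f}")
print(f"OR, controls interviewed better: {or_thorough_controls:.2f}")
```

With these assumed values, the same true association appears stronger when cases are interviewed more thoroughly and weaker when controls are, which is why differential interviewing is treated as a threat to validity in either direction.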

4. Teitelbaum et al. (2007) reported a significant association between lifetime use of lawn/garden pesticides and breast cancer compared to no lifetime use of lawn/garden pesticides. The duration of exposure varied among both cases and controls. For example, within the cases, some individuals may have reported being exposed more than 5 or 10 years prior to diagnosis, whereas others may have reported current exposure to these pesticides. What potential problems could arise when trying to measure exposures that happened over different time periods?

(a) Misclassification bias
(b) Volunteer bias
(c) Surveillance bias

Answer (a) — correct: Misclassification may be a problem because it is difficult for subjects to accurately recall which pesticides they may have been exposed to over the course of their entire lives, resulting in incorrect exposure classification. If misclassification of exposure is random, or non-differential between the cases and controls (i.e., not associated with disease status), then its effect will generally be to bias the estimate towards the null value of 1.0. However, if misclassification of exposure is related to the outcome, or differential between the cases and controls (e.g., cases recall their exposure experience more accurately because they are trying to figure out what could possibly have caused their diagnosis), it could bias the estimate of effect in either direction.

Answer (b) — incorrect: Volunteer bias is a type of selection bias resulting from the tendency for one's health to influence his or her decision to participate in a study. One could argue either that healthy people are more likely to volunteer for a study than sick people, or that sick people are more likely to volunteer for a study than healthy people. Regardless, the problem here is not with selection of cases and controls but with difficulty in obtaining accurate information on exposures in the past.

Answer (c) — incorrect: Surveillance bias is a type of selection bias that pertains to disease ascertainment (Aschengrau & Seage, p. 267). The scenario here pertains to exposure ascertainment.
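The claim in answer (a) that non-differential misclassification pulls the estimate towards the null can be checked with a short sketch. The counts, sensitivity, and specificity below are hypothetical assumptions for illustration; the key feature is that the same error rates are applied to cases and controls.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Return (recorded exposed, recorded unexposed) after imperfect recall.

    sensitivity: fraction of truly exposed subjects recorded as exposed
    specificity: fraction of truly unexposed subjects recorded as unexposed
    """
    recorded_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    recorded_unexposed = (exposed + unexposed) - recorded_exposed
    return recorded_exposed, recorded_unexposed


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from the four cells of a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical true exposure counts (assumption, for illustration only).
true_cases = (200, 300)      # exposed, unexposed
true_controls = (150, 350)   # exposed, unexposed

# Non-differential error: identical recall accuracy in both groups (assumed values).
sensitivity, specificity = 0.8, 0.9

obs_cases = misclassify(*true_cases, sensitivity, specificity)
obs_controls = misclassify(*true_controls, sensitivity, specificity)

true_or = odds_ratio(*true_cases, *true_controls)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"True OR:     {true_or:.2f}")
print(f"Observed OR: {observed_or:.2f}")
```

Under these assumptions the observed odds ratio lands between 1.0 and the true value, so the association is understated but not reversed. This holds for a two-level (exposed/unexposed) exposure as modeled here.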

5. What effect (if any) would you expect if the interviewers were aware of the disease status of the study subjects?

(a) It would benefit the validity of the results since the interviewers would understand more precisely how the exposure is related to the disease and collect better data for the cases.
(b) The results would likely not change.
(c) It could damage the validity of the results by introducing interviewer bias.

Answer (a) — incorrect: One of the principles of conducting a case-control study is to keep those collecting the data blinded to study subjects' case status. Likewise, study participants are not told about the specific hypotheses being investigated in the study. If the interviewers were aware of the case-control status of participants and decided to collect more complete exposure information from the cases, but not the controls, this would result in misclassification (information bias). While it is sometimes not possible to blind interviewers to a subject's case status, continued training of interviewers and insistence on adherence to interviewing protocols would help ensure that the information is collected as similarly as possible from cases and controls.

Answer (b) — incorrect: If interviewers were aware of the subjects' disease status, they could introduce misclassification (information bias) by conducting interviews differently with cases and controls.

Answer (c) — correct: If the interviewer is aware of disease status, this might introduce interviewer bias, a form of misclassification (information bias) that arises when data collection is more vigilant (inappropriate probing, leading the subject to "correct" answers, etc.) in the case interviews than in the control interviews.