Confounding: Print Module


Introduction

You have learned that random error and bias must be considered as possible explanations for an observed association between an exposure and disease. This week we will examine the role of confounding. Unlike random error or bias, confounding is a property of the study population, and occurs when the effect of an exposure on an outcome is mixed together with the effect of a third variable. The following exercise will examine the properties of confounders and describe methods to adjust for confounding through both study design and analysis. (Please see Aschengrau & Seage Chapter 11 for more information).

Faculty Highlight: Dr. Sandro Galea

Dr. Galea is the Gelman Professor and Chair of the Department of Epidemiology at Columbia University's Mailman School of Public Health. He is a physician and an epidemiologist. His primary research has been on the causes of mental disorders, particularly common mood-anxiety disorders and substance abuse, and on the role of traumatic events in shaping population health. In particular, his work seeks to uncover how determinants at multiple levels of influence--including policies, features of the social environment, molecular, and genetic factors--jointly produce the health of urban populations.

Read more about Dr. Galea's work

Galea S, Riddle M, Kaplan GA. Causal thinking and complex system approaches in epidemiology. Int J Epidemiol. 2010 Feb;39(1):97-106. Epub 2009 Oct 9.

Uddin M, Aiello AE, Wildman DE, Koenen KC, Pawelec G, de Los Santos R, Goldmann E, Galea S. Epigenetic and immune function profiles associated with posttraumatic stress disorder. Proc Natl Acad Sci U S A. 2010 May 18;107(20):9470-5. Epub 2010 May 3.

Good luck and have fun!


Learning Objectives

A. Basic elements of confounding

  1. Define confounding and distinguish it from bias and chance error
  2. Identify three criteria a variable must fulfill to be a confounder in an epidemiological study
  3. Diagram the relationship of a confounder with exposure and outcome

B. Explain methods to adjust for confounding

  1. Describe ways of handling confounding at the design phase of a study
    1. Randomization
    2. Restriction
    3. Matching
  2. Describe ways of handling confounding at the analysis phase of a study
    1. Stratification
    2. Multivariate adjustment techniques

C. Describe how to evaluate potential confounding in epidemiological data

  1. Explain the difference between a crude and adjusted effect estimate
  2. Discuss what is meant by "residual" confounding

Student Role

For this exercise, we will pay special attention to evaluation of confounding in Teitelbaum et al. 2007. In their study of the association between residential pesticide use and breast cancer, they critically examined the potential confounding role of education. The causal model visually representing the investigators' hypothesis about the effect of education on the pesticide-breast cancer risk is presented below.

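[Figure: causal model of education as a potential confounder of the association between residential pesticide use and breast cancer]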

Work through this interactive exercise, which demonstrates the "mixing of effects" that occurs when confounding is present in the data. It uses the example of coffee drinking as a potential risk factor for low birth weight, whose effect is mixed with the effect of smoking.


Design Questions

1. Education, a marker of socioeconomic status, was one of the potential confounders considered in the study of pesticide use and breast cancer in Teitelbaum et al. (2007). Please explain why education was considered to be a potential confounder:

  1. Education is associated with high exposure to pesticide use. Those with higher education may have larger homes with more lawns and gardens in need of pesticide application to enhance their landscape's appearance. Further, more advanced education is associated with a higher risk of breast cancer (hypothesized to operate via a host of factors, including older age at having a first child), and is not in the causal pathway of interest between pesticide use and breast cancer (pesticide use is not hypothesized to cause education).
  2. Education is in the causal path between pesticide use and breast cancer. All variables in the causal pathway are potential confounders regardless of their association with exposure and outcome.
  3. Although education is not associated with pesticide use, advanced education is associated with a higher risk of breast cancer (hypothesized to operate via a host of factors, including older age at having a first child). Moreover, it is not in the causal pathway of interest between pesticide use and breast cancer (pesticide use is not hypothesized to cause education).
Answer (a) — correct: For a variable to be a confounder, it must fulfill three basic properties: 1) be associated with the exposure, 2) be a risk factor for the disease, and 3) not be in the causal pathway of interest. Education, a marker of socioeconomic status, meets these criteria. First, it is associated with pesticide use, hypothesized to operate via home ownership and the accompanying lawn and/or garden in suburban Nassau and Suffolk counties. Second, it is a risk factor for breast cancer, possibly through delayed childbearing, which is associated with breast cancer. Finally, it is not hypothesized to be in the causal pathway between pesticide use and breast cancer.
Answer (b) — incorrect: A confounder cannot be an intermediate step in the causal pathway of interest. If a third variable is in the causal pathway of interest, it is not a confounder but a mediator.
Answer (c) — incorrect: To be considered a confounder, a variable must fulfill three basic properties: it must be associated with the exposure of interest, be a risk factor for the disease, and not be in the causal pathway of interest. This option states that education is not associated with pesticide use, so education would fail the first criterion.

2. Would confounding due to education still be a problem if the investigators were able to conduct a cohort study instead of a case-control study?

  1. Confounding would not be a problem in a cohort study
  2. Confounding would still pose a problem in a cohort study
  3. Confounding would be minimal in a cohort study compared to a case-control study
Answer (a) — Incorrect: Confounding is a problem in all observational study designs. Remember, confounding is a "mixing of effects" between an exposure, outcome, and a third variable. Confounding results from the fact that risk factors are generally not evenly distributed between comparison populations (i.e., exposed and unexposed groups) in observational studies. In large experimental studies, randomization usually produces comparison populations that have nearly the same distribution of characteristics, thus eliminating (or minimizing) confounding.
Answer (b) — Correct: Confounding is a problem for all observational study designs. Because epidemiologic research concerns human populations, we must always consider that certain characteristics (e.g., age, sex, education, income) may be unevenly distributed in our study populations.
Answer (c) — Incorrect: Confounding can be just as large in a cohort study as it is in a case-control study. Regardless of design, it is important to consider potential confounders in your work, both in the design and analysis stages of your study.
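Answer (a) notes that randomization tends to balance risk factors between comparison groups, while observational designs do not. The sketch below is a hypothetical thought experiment (pesticide use could not ethically be randomized): it simulates a population in which 40% of people have high education and compares the share of highly educated people among the exposed and the unexposed when exposure is self-selected versus assigned at random. The prevalence and exposure probabilities are made-up assumptions, not data from Teitelbaum et al. (2007), and `high_education_share` is an illustrative name.

```python
import random


def high_education_share(n, randomized, seed=1):
    """Share of highly educated people among the exposed and the unexposed.

    Hypothetical population: 40% have high education. Under self-selection,
    highly educated people are assumed to use lawn pesticides more often
    (60% vs 30%). Under randomization, a coin flip assigns exposure.
    """
    rng = random.Random(seed)
    high = {True: 0, False: 0}   # exposed? -> number with high education
    total = {True: 0, False: 0}  # exposed? -> group size
    for _ in range(n):
        high_ed = rng.random() < 0.4
        if randomized:
            exposed = rng.random() < 0.5  # exposure independent of education
        else:
            exposed = rng.random() < (0.6 if high_ed else 0.3)  # self-selected
        total[exposed] += 1
        high[exposed] += int(high_ed)
    return {"exposed": high[True] / total[True],
            "unexposed": high[False] / total[False]}


obs_shares = high_education_share(100_000, randomized=False)
rct_shares = high_education_share(100_000, randomized=True)
print("self-selected:", obs_shares)
print("randomized:   ", rct_shares)
```

Under these assumptions the expected share of highly educated people is 0.24/0.42 ≈ 0.57 among the exposed and 0.16/0.58 ≈ 0.28 among the unexposed when exposure is self-selected, whereas random assignment leaves both groups near 0.40. A cohort study that follows self-selected exposure groups inherits the first kind of imbalance, which is why confounding remains a concern in that design.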

3. How was potential confounding by age handled in the design stage of the study?

  1. Randomization of subjects into cases and controls
  2. Restriction of cases and controls within a narrow age category
  3. Matching controls to cases on age
Answer (a) — incorrect: Subjects were not randomized in this study. Subjects can only be randomized in experimental designs (i.e., RCTs). In observational studies (e.g., cohort or case-control studies), investigators do not assign or manipulate the subjects' exposures.
Answer (b) — incorrect: In this study cases and controls were not restricted to any specific age category.
Answer (c) — correct: Controls were frequency matched to cases by age (in 5-year intervals). See Aschengrau & Seage, pp. 295-296, for details on frequency matching. Matching in a case-control study is intended to create comparability between cases and controls by creating "mini-studies" in which all individuals fall within the same 5-year age range, so that age can have no effect on the exposure-disease relation within that particular mini-study.
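Frequency matching by 5-year age intervals can be sketched as follows. This is a minimal illustration, assuming a hypothetical pool of potential controls and band boundaries that start at multiples of five (50-54, 55-59, and so on); `age_band` and `frequency_match` are illustrative names, not procedures taken from the study.

```python
import random
from collections import Counter


def age_band(age, width=5):
    """Map an age in whole years to its (lower, upper) band, e.g. 52 -> (50, 54)."""
    lower = (age // width) * width
    return (lower, lower + width - 1)


def frequency_match(case_ages, pool_ages, ratio=1, seed=0):
    """Sample controls so that each age band has `ratio` controls per case in that band."""
    rng = random.Random(seed)
    cases_per_band = Counter(age_band(a) for a in case_ages)
    pool_by_band = {}
    for a in pool_ages:
        pool_by_band.setdefault(age_band(a), []).append(a)

    controls = []
    for band, n_cases in cases_per_band.items():
        candidates = pool_by_band.get(band, [])
        needed = n_cases * ratio
        if len(candidates) < needed:
            raise ValueError(f"not enough potential controls aged {band[0]}-{band[1]}")
        controls.extend(rng.sample(candidates, needed))  # sample without replacement
    return controls


cases = [52, 54, 61, 63, 63, 78]   # hypothetical case ages
pool = list(range(40, 90)) * 3     # hypothetical pool of potential controls
controls = frequency_match(cases, pool)
print(sorted(Counter(age_band(a) for a in controls).items()))
# [((50, 54), 2), ((60, 64), 3), ((75, 79), 1)]
```

The controls end up with the same age-band distribution as the cases. Note that in a case-control study this does not remove confounding by itself: the matching variable still has to be accounted for in the analysis (for example, by stratifying on the same age bands).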


Data Analysis


4. Age was a potential confounder in this study. Choose an appropriate diagram representing the relationship of this potential confounder with exposure and outcome.

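  1. [Diagram: age is associated with pesticide use and is a risk factor for breast cancer, but is not a result of pesticide use]
  2. [Diagram: pesticide use -> age -> breast cancer (age as an intermediate)]
  3. [Diagram: age -> breast cancer, with no association between age and pesticide use]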
Answer (a) — correct: In this diagram age meets the requirements to be a confounder because, as depicted in the diagram, age is a risk factor for breast cancer and is associated with pesticide use, but it is not a result of pesticide use.
Answer (b) — incorrect: This diagram illustrates that age is an intermediate in the pathway between pesticide use and breast cancer. If a factor is in the pathway between exposure and outcome, it is called a mediator.
Answer (c) — incorrect: In this diagram, age does not meet the necessary conditions for a confounder because it is not associated with the exposure.

5. Explain how you would assess whether a potential confounder alters an effect estimate after adjusting for it in a multivariate model.

  1. Look at the crude OR
  2. Look at the adjusted ORs
  3. Compare the crude OR to the adjusted OR
Answer (a) — incorrect: You must compare the crude and adjusted ORs to evaluate confounding. Remember, the crude estimate simply reflects the association between the exposure and outcome; it does not take into account the effect of potential confounders.
Answer (b) — incorrect: You must compare the crude and adjusted ORs to evaluate confounding. Remember, the adjusted estimate reflects the association between the exposure and outcome after controlling for a potential confounder. Without the crude OR for comparison, we would not know how the OR changed after taking other risk factors into account.
Answer (c) — correct: It is important to compare the adjusted OR with the crude OR to see the change in the effect estimate. To evaluate the magnitude of confounding, a common rule of thumb is to look at the percent change between the crude and adjusted estimates. If the adjusted estimate differs from the crude estimate by 10% or more, it is customary to consider that variable a confounder. The adjusted odds ratio is then reported to describe the exposure-disease association controlling for the confounder.
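A minimal sketch of the 10% rule of thumb follows. It assumes the crude estimate is the reference (the denominator), matching the wording of answer (c); some textbooks use the adjusted estimate as the denominator instead, so the exact percentage can differ slightly. The odds ratios are hypothetical, and the 10% cutoff is a convention, not a statistical test.

```python
def percent_change(crude, adjusted):
    """Percent change from the crude to the adjusted estimate (crude as reference)."""
    return 100.0 * (adjusted - crude) / crude


def meets_ten_percent_rule(crude, adjusted, threshold=10.0):
    """True if the adjusted estimate differs from the crude by at least `threshold` percent."""
    return abs(percent_change(crude, adjusted)) >= threshold


# Hypothetical odds ratios, not taken from Teitelbaum et al. (2007)
for crude, adjusted in [(2.0, 1.6), (2.0, 1.9)]:
    change = percent_change(crude, adjusted)
    print(f"{change:.1f}% {meets_ten_percent_rule(crude, adjusted)}")
# -20.0% True
# -5.0% False
```

Taking the absolute value means a change in either direction (toward or away from the null) counts as evidence of confounding.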

6. Lawn/garden pesticide use was significantly associated with breast cancer after adjusting for age, level of education, and other combined pest group (OR = 1.34, 95% CI 1.11-1.63). Given that the investigators also determined that a host of other factors (e.g., age at menarche, oral contraceptive use, and family history of breast cancer) did not meet the criteria for confounders, can the authors conclude that they have removed all sources of confounding in the examination of this association?

  1. Yes. Investigators have examined all potential risk factors for the outcome, adjusted for necessary confounders, and can be fully confident that their estimate is free of confounding.
  2. No. The authors did not adjust for family history of breast cancer, a known risk factor, and thus the association reported may be biased.
  3. Yes. Because the authors conducted a case-control study, most of the confounding was removed at the design stage of the study, since cases and controls are comparable on risk factors.
  4. No. Investigators can never be fully confident that confounding is eliminated in an observational study.
Answer (a) — incorrect: We can never be absolutely sure that our estimates are unbiased. First, it is unlikely that all measured confounders were measured without error. For example, in the Teitelbaum et al. (2007) study, participants self-reported their history of oral contraceptive use. Women were probably not 100% accurate in recalling the duration of oral contraceptive use and the exact dose, so the measurement of this variable is imperfect. Second, unmeasured confounders may affect the association. In an observational study such as this case-control study, we can never be sure that we have measured all confounders of the association. By contrast, sufficiently large experimental studies such as RCTs can create, on average, comparability between exposed and unexposed groups on both measured and unmeasured confounders by randomizing the exposure.
Answer (b) — incorrect: The investigators reported that adjusting for family history of breast cancer did not appreciably affect the results of the study. That is, the association between lawn/garden pesticide use and breast cancer among women with a family history of breast cancer did not differ from a) the association among women without a family history of breast cancer or b) the crude estimate unadjusted for family history. Thus, family history of breast cancer (as measured) did not contribute to confounding of the association between lawn/garden pesticide use and breast cancer.
Answer (c) — incorrect: Cases and controls are never comparable on all risk factors for the outcome. Matching of cases to controls on age was used to remove confounding by age, but there may be many more measured and unmeasured risk factors which were not matched that need to be controlled.
Answer (d) — correct: We can never be absolutely sure that our estimates are unbiased. Residual confounding is confounding that remains even after many confounding variables have been controlled. It can occur if there is systematic error in the measurement of the confounders, if there are unmeasured confounders that have not been controlled, or if confounders were classified into categories that are too broad (Aschengrau & Seage, pp. 300-301).
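The last source of residual confounding listed in answer (d), confounder categories that are too broad, can be illustrated with made-up case-control counts. In the hypothetical data below, the exposure has no effect (OR = 1) within each of four narrow age groups, but both the exposure and the disease become more common with age. Pooling over the four narrow groups recovers OR = 1, whereas pooling over two broad groups leaves an OR above 1, i.e., residual confounding by age. The Mantel-Haenszel summary OR is used here as a standard way to pool stratum-specific ORs; it is a choice made for this sketch, not necessarily the method used in the study.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel summary OR. Each stratum is (a, b, c, d) =
    (exposed cases, exposed controls, unexposed cases, unexposed controls)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def collapse(strata):
    """Add stratum tables cell by cell into a single table."""
    return tuple(sum(cells) for cells in zip(*strata))


# Hypothetical counts for four narrow age groups; within each one, the OR is exactly 1.
narrow = [(10, 10, 90, 90), (60, 30, 140, 70), (150, 50, 150, 50), (280, 70, 120, 30)]
# The same data classified into two broad age groups (narrow groups 1+2 and 3+4).
broad = [collapse(narrow[:2]), collapse(narrow[2:])]

crude_or = mh_odds_ratio([collapse(narrow)])  # one stratum = crude OR
broad_or = mh_odds_ratio(broad)
narrow_or = mh_odds_ratio(narrow)
print(f"crude {crude_or:.2f}, broad-age adjusted {broad_or:.2f}, narrow-age adjusted {narrow_or:.2f}")
# crude 1.50, broad-age adjusted 1.11, narrow-age adjusted 1.00
```

Adjusting for the broad age groups removes most of the confounding, but the remaining gap between 1.11 and 1.00 is residual confounding produced by the coarse classification.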

7. What if, during data analysis, investigators found that use of vitamin supplementation was associated with pesticide use and was an independent risk factor for breast cancer? Should they attempt to control for this potential confounder?

  1. Yes, investigators should control for vitamin supplementation as they did for other potential confounders and add this variable to the list of hypothesized confounders in the Methods section of the paper.
  2. Yes, investigators should control for vitamin supplementation and describe the process of confounder selection in their Results section.
  3. No, it is inappropriate to control for variables if they were not hypothesized as confounders a priori.
Answer (a) — incorrect: Many variables may act as confounders in one study. While it is important to hypothesize which factors may confound an association, it is also important to evaluate other potential confounders during the analyses as well. In doing so, you must report the process of how you selected potential confounders (i.e., a priori confounders in the Methods section and a posteriori confounders in the Results section), and discuss your findings in the Discussion section.
Answer (b) — correct: It is important to report the selection process of confounding variables in your work. A priori confounders should be reported in the Methods section, a posteriori confounders in the Results section.
Answer (c) — incorrect: It is not always possible to know all potential confounders at the beginning of a study. This may happen when investigating an exposure-disease association which has not been studied well, or if cost and feasibility make it impossible to address all potential confounders at the design phase of a study. Therefore, it is necessary to consider confounding at the analysis phase of a study as well.

8. Suppose investigators wanted to control for education as a potential confounder in the design stage of the study. Which of the following would be appropriate to control for education as a potential confounder at the design stage?

  1. Create 2x2 tables of pesticide use and breast cancer separately for those with low education, and then for those with high education.
  2. Only enroll people in the study who have less than a high school education.
  3. Match cases to controls on high vs. low education.
  4. Answer choices A and B
  5. Answer choices B and C.
Answer (a) — incorrect: Stratified analysis is used at the analysis stage of the research process, not the design stage. Stratification means that the effect of an exposure is evaluated within strata (levels) of a potential confounder (e.g., looking at the exposure-disease association among those with low education only and then among those with high education only). Once you calculate the ORs for each stratum (and if they are similar to one another), you then compare them with the crude OR. If there is a large difference between the stratum-specific ORs and the crude OR (a commonly used rule of thumb is a difference of more than 10%), you can conclude that the variable may be confounding the exposure-disease association.
Answer (b) — incorrect: This method of confounding control is called restriction. While it is a method to control confounding at the design stage, there is another answer choice that is also a method to control for confounding at the design stage.
Answer (c) — incorrect: This method of confounding control is called matching. While it is a method to control confounding at the design stage, there is another answer choice that is also a method to control for confounding at the design stage.
Answer (d) — incorrect: Stratified analysis is not a method to control confounding at the design stage.
Answer (e) — correct: Restriction and matching are two methods to control for confounding at the design stage. With restriction, entrance into the study is determined by whether the subject falls into a pre-determined category of the potential confounder. With matching, study subjects are selected so that the potential confounder is distributed identically across the comparison groups (Aschengrau & Seage, pp. 294-297).
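The stratified analysis described in answer (a) can be sketched with hypothetical counts for low- and high-education strata (these are not data from Teitelbaum et al. 2007). The code computes the stratum-specific ORs, the crude OR from the collapsed table, and a Mantel-Haenszel summary OR, then applies the 10% rule of thumb to the crude-versus-adjusted comparison. The Mantel-Haenszel estimator is one common way to pool stratum-specific ORs, chosen here for illustration.

```python
def odds_ratio(a, b, c, d):
    """OR for one 2x2 table: a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR across (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical pesticide use x breast cancer tables within education strata
strata = {
    "low education": (30, 60, 70, 240),
    "high education": (120, 90, 80, 110),
}
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))

for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())
change = 100 * (adjusted_or - crude_or) / crude_or
print(f"crude OR = {crude_or:.2f}, MH-adjusted OR = {adjusted_or:.2f}, change = {change:.1f}%")
# low education: OR = 1.71
# high education: OR = 1.83
# crude OR = 2.33, MH-adjusted OR = 1.79, change = -23.3%
```

In these made-up data the two stratum-specific ORs are similar, so pooling them is reasonable, and the summary OR is more than 10% below the crude OR, which suggests that education confounds the association.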

9. Teitelbaum et al. (2007) matched cases to controls on age. A more recent study, Itoh et al. (2008), also examined the association between pesticides and breast cancer and matched cases to controls on both age and geographic location. Which study was better at controlling for confounding in the design phase of the study?

  1. Teitelbaum et al.'s 2007 study was better because it is best to match cases to controls on as few factors as possible.
  2. Itoh et al.'s study was better because it is best to match cases to controls on as many factors as possible.
  3. It is not possible to determine whether one study is better than the other at controlling confounding by the number of factors matched.
Answer (a) — incorrect: Although matching controls to cases on too many factors (overmatching) should be avoided, matching on fewer factors does not in itself guarantee that confounding is adequately controlled. It is best to match on the minimum necessary set of factors; matching on as few factors as possible may miss important sources of confounding that could have been controlled at the design stage.
Answer (b) — incorrect: Matching on as many factors as possible is not a good strategy. It may be difficult and expensive to find controls for each case. It is not possible to determine which study was better at controlling for confounding by looking at the number of matched factors.
Answer (c) — correct: What matters most when controlling confounding is that the variables (1) are plausible confounders based on your theory of the exposure-disease relation, (2) meet the necessary conditions to be a confounder, (3) were measured properly, and (4) have their effects removed at the analysis stage. Itoh et al. (2008) matched cases to controls on geographic location because their source population was very large and there was reason to believe that background rates of breast cancer differed by geographic area. By contrast, Teitelbaum et al.'s study area was limited to Nassau and Suffolk counties.


Discussion Questions

Carefully consider the following questions. Write your answer to question #1 (1-2 paragraphs) in a Word document and submit it to your seminar leader. Be prepared to discuss all questions during the seminar section.

  1. Which study design offers the best opportunity to control for confounding: randomized clinical trial, cohort study, or case-control study? Explain your reasoning and give examples to support your point.
  2. Suppose that in the study of pesticide use and breast cancer you wanted to evaluate the hypothesis that pesticide use varies by geographic area. Would you match on geographic area? Please explain your answer.
  3. A principle of case-control studies is that controls should be selected independently of exposure status. Under what circumstances would matching violate this principle? What can be done about this type of violation?

Questions for the Intellectually Curious:

Why is it important to distinguish between "confounding" and "confounders"? What does a 95% confidence interval assume about the presence of bias and confounding?