Ecological Study: Print Module


Introduction

While many consider the ecological study design methodologically inferior to the cohort and case-control designs, this exercise will highlight some of its unique strengths (in addition to key limitations). Please see Aschengrau & Seage, pp. 160-164 for more information.

Good luck and have fun!

Faculty Highlight: Pam R. Factor-Litvak, PhD

Dr. Pam Factor-Litvak's current research interests concern the biological relationships between environmental exposures and development. She has recently completed two studies initiated due to public concerns: the first is a study of the possible associations between mercury derived from dental restorations (silver-mercury fillings) and neuropsychological, neurological and psychiatric symptoms in adults; and the second is an ecological study of the relationship between residential proximity to hazardous waste sites and autism identified from the rosters of special education programs of school districts.

Read more about Dr. Factor-Litvak's work in the following article:

Graziano J, Slavkovich V, Liu X, Factor-Litvak P, Todd A: A prospective study of prenatal and childhood lead exposure and erythropoietin production. Journal of Occupational and Environmental Medicine 46:924-929, 2004.

"There are times when ecological studies may be more appropriate than other designs; for example, when studying the impact of an exposure (or a prevention program) on a community level. Those who only preach the ecological bias do not fully understand the public health usefulness of this design. There are biases, for sure, but no more or less than other designs. Just be sure to make your inferences to the unit of analysis."


Learning Objectives

A. Learn to apply the main features of ecological studies:

  1. Formulate a research hypothesis
  2. Explain the difference between individual and group-level characteristics
  3. Define the source population for the study
  4. Describe possible sources of information for ecological studies

B. Apply the steps of data analysis used in ecological studies:

  1. Calculate incidence proportions based on simple counts
  2. Plot incidence against exposure and interpret your plots

C. Explain your findings and discuss potential problems in your data analysis:

  1. Define and explain what is meant by the ecological fallacy
  2. Describe ecological studies in terms of validity, cost-effectiveness, time requirements and ability to estimate measures of effect
  3. Describe ecological studies in comparison with cohort and case-control studies

Student Role

The evidence you collected while investigating the two preceding outbreaks of Susser Syndrome in Epiville pointed to Glop Industries as the main culprit (see the Cohort and Case-Control modules). Unfortunately, a new wave of cases has appeared in recent weeks. With potentially half of the city at risk, Glop Industries has made a public statement saying that the new outbreak is not connected with SUPERCLEAN production.

Dr. Zapp's office is busy trying to avert another disaster. Because you are one of her key team members, she commissions you to design and conduct an ecological study. You return to your office and immediately turn to the WEPI1 website to see if there are any news reports about the new outbreak.

BREAKING NEWS

"Good morning, I am Lynn Simmons reporting from the Epiville General Hospital. The rare and debilitating disease doctors call Susser Syndrome appears to be striking Epiville residents again. We spoke with hospital officials today, and they report that the number of Susser Syndrome patients has been slowly rising over the past 2 years. Since March, however, doctors have reported an unprecedented increase. The cause of this increase is unknown, although medical experts believe it to be linked to an environmental exposure.

"Stay tuned as Channel 1 News continues to investigate this Susser Syndrome outbreak. Channel 1 has learned that residents in the Epiville areas serviced by the Rothman Reservoir have reported a soapy flavor to their water. Those who use the Greenland Reservoir water have made no complaints, although officials are concerned about what appears to be increased algae growth."

Based on the newscast, you decide to gather more information about both Glop Industries and the Epiville water reservoirs. The Epiville Chamber of Commerce has a nicely designed website that provides links to the following information:

* Glop Industries, which produces the suspected causal agent SUPERCLEAN
* Porks-A-Lot Pig Farm located close to one of the water reservoirs

Based on your research, you decide to check out the Porks-A-Lot Pig Farm. You enter the pig farm and are greeted by Herman Murtz, the facilities leader.

Interview - D. Herman Murtz

"You never do get used to that smell, do you? With just over 1,000 pigs, waste is always a problem. Back in the old days, we would just hose everything out and let it run off into the Sludge River. Just a few months ago we revamped the system. Now, we first give all the pens a hosing, run it through a filter, and collect the initial runoff in evaporating pools. After the first hosing, we spray every inch of the pens with SUPERCLEAN to sterilize and disinfect everything. SUPERCLEAN doesn't evaporate well so we drain it into the Sludge River where it gets diluted. To the best of my knowledge, this two-step process is pushing the envelope of pig farm sterilization techniques."


Study Design

The first step of your investigation is to generate a solid hypothesis. Once again, you look over the information that you have gathered regarding the Susser Syndrome cases and the water reservoirs in Epiville. Because you are designing an ecological study, you need data at the population level rather than the individual level.

1. Based on the facts presented, which of the following would be the most appropriate hypothesis to investigate using an ecological study?

  a. Residents of Epiville are at a higher risk of developing Susser Syndrome than residents of the neighboring towns.
  b. Employees of the Porks-A-Lot pig farm have a greater risk of developing Susser Syndrome than the general population of Epiville.
  c. The population served by the Rothman Reservoir has a higher incidence proportion of Susser Syndrome than does the population served by the Greenland Reservoir.
  d. Those diagnosed with Susser Syndrome will have greater odds of being a Porks-A-Lot employee than those without Susser Syndrome.
Answer (a) — incorrect: We do not have data for the neighboring towns to compare with that of Epiville. This hypothesis is too broad.
Answer (b) — incorrect: This hypothesis is more characteristic of a cohort study. We do not have the necessary data to test it.
Answer (c) — correct: Ecological studies involve comparison and analysis at the population or group level. In our study, we have chosen to hypothesize that the Rothman Reservoir population will have a higher incidence proportion of Susser Syndrome than the Greenland Reservoir population. (Note: The alternative would also be an appropriate hypothesis to test.)
Answer (d) — incorrect: This hypothesis is more appropriate for a case-control study.

2. Given that you are conducting an ecological study, define the source population from which you intend to draw the two comparison groups.

  a. Individual residents of Epiville who were recently diagnosed with Susser Syndrome
  b. All Epiville residents
  c. One study population should be all Susser Syndrome cases residing in Epiville; the comparison population should be healthy Epiville residents.
  d. The study population should be composed of Epiville residents who are serviced by either the Rothman or the Greenland Reservoir.
Answer (a) — incorrect: Recall that we are undertaking an ecological study and are thus concerned with populations and not individuals.
Answer (b) — incorrect: This is not specific enough and only defines one study population. We are looking for two populations that we could compare.
Answer (c) — incorrect: These definitions are too specific and more appropriate for a case-control design.
Answer (d) — correct: This definition provides us with two study groups coming from the same source population. This will allow us to compare the incidence proportion of Susser Syndrome in the two areas, each of which uses a different source of reservoir water.

With your hypothesis and source population defined, you now need to determine where you will get the necessary data. You don't have the time or resources to collect your own data, so you have to rely on existing sources of information. Luckily, Epiville routinely collects data on almost everything, from the number of packs of cigarettes sold per capita to water and electricity consumption.


Data Collection

Ecological studies involve comparison and analysis of variables at the population level. They may involve direct observations of individuals which are then aggregated or summarized (to give means or proportions) or they may rely on global population measures, such as population density. Oftentimes, ecological studies rely on data previously collected for other purposes (e.g., population censuses and disease registries).

3. Define the minimal information you need to collect in order to test your ecological study hypothesis.

  a. I will need to know the size of the population of Epiville and the number of new cases of Susser Syndrome.
  b. I will need to know the total number of new cases of Susser Syndrome and the size of the populations serviced by the Rothman and Greenland Reservoirs.
  c. I will need to know the geographic areas serviced by the Rothman and Greenland Reservoirs and the population size of those areas during a given time period. I will also need to know the number of new cases of Susser Syndrome as well as their place of residence.
Answer (a) — incorrect: While this information is important, more data are needed to test your specific hypothesis.
Answer (b) — incorrect: This information is necessary, but not sufficient to test your hypothesis.
Answer (c) — correct: This information will allow you to cross-reference water use (via specific reservoir) and the Susser Syndrome cases.
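
As a rough illustration of why both pieces of information are needed, the sketch below computes an incidence proportion (new cases divided by the population at risk over the study period) for each reservoir population. The function name and all counts are hypothetical and chosen only to show the arithmetic; they are not Epiville data.

```python
def incidence_proportion(new_cases, population_at_risk):
    """Return new cases divided by the population at risk for the period."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


# Hypothetical counts for illustration only; these are NOT Epiville data.
reservoirs = {
    "Rothman": {"new_cases": 120, "population": 40000},
    "Greenland": {"new_cases": 30, "population": 30000},
}

proportions = {}
for name, counts in reservoirs.items():
    ip = incidence_proportion(counts["new_cases"], counts["population"])
    proportions[name] = ip
    # Report per 1,000 residents so that small proportions are easier to read.
    print(f"{name}: {ip * 1000:.1f} new cases per 1,000 residents")
```

Note that the result describes each reservoir population as a whole. Without the place of residence of each case, the numerators could not be assigned to a reservoir at all.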

To get the information on the reservoirs, you contact the Department of Water Works. They report that all of Epiville is serviced by either the Rothman or Greenland Reservoir. The geographic areas served by each reservoir are subdivided into 5 sectors consistent with the Epiville residential subdivisions (see the Epiville Town Map). The Water Works Department has a database recording the average daily water use for each sector (but not for each individual living in the sector) for each of the previous 5 years. You decide that the information regarding the most recent year is adequate.

You also have access to the most recent Epiville population census and, because Susser Syndrome is a reportable disease, the Epiville Department of Health has reports of each diagnosed case along with the requisite demographic and residential information.

4. How can all of this information be used to test your study hypothesis?

  a. I can cross-reference the three databases in such a way that I will know from which reservoir sector the Susser Syndrome cases originated, the population size of that sector, and the average daily water use of that population.
  b. I can cross-reference the three databases in such a way that I will know the population size of each sector as well as the average daily water use.
  c. I can use the Water Works Department information to extrapolate the amount of water each individual in Epiville consumed in the recent past and then cross-reference these individuals with the Susser Syndrome registry.
Answer (a) — correct: By cross-referencing the databases by address or geographic location, we can compare the differences in Susser Syndrome incidence by reservoir use.
Answer (b) — incorrect: This does not provide information on the actual incidence of Susser Syndrome per sector. Without these data, you cannot test your hypothesis.
Answer (c) — incorrect: In an ecological study we are interested in population-based statistics and not individual-based statistics. Extrapolating population-level information to the individual level is known as the ecological fallacy.
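
The cross-referencing described in answer (a) can be sketched as a join on the sector identifier. Everything below is an assumption made for illustration: the sector codes, field names, water-use values (in arbitrary units), populations and case reports are hypothetical placeholders, not the actual Epiville databases.

```python
from collections import Counter

# Hypothetical Water Works records: sector -> reservoir and average daily use.
water_works = {
    "R1": {"reservoir": "Rothman", "avg_daily_use": 410.0},
    "R2": {"reservoir": "Rothman", "avg_daily_use": 395.0},
    "G1": {"reservoir": "Greenland", "avg_daily_use": 402.0},
}

# Hypothetical census counts: sector -> resident population.
census = {"R1": 8000, "R2": 10000, "G1": 9000}

# Hypothetical registry: one record per reported case, with sector of residence.
case_reports = [
    {"case_id": 1, "sector": "R1"},
    {"case_id": 2, "sector": "R1"},
    {"case_id": 3, "sector": "R1"},
    {"case_id": 4, "sector": "R1"},
    {"case_id": 5, "sector": "R2"},
    {"case_id": 6, "sector": "R2"},
    {"case_id": 7, "sector": "G1"},
]

# Aggregate individual case reports to sector-level counts.
cases_per_sector = Counter(report["sector"] for report in case_reports)

# Join the three sources on the sector code; the sector is the unit of analysis.
sector_table = []
for sector, info in water_works.items():
    population = census[sector]
    cases = cases_per_sector.get(sector, 0)
    sector_table.append({
        "sector": sector,
        "reservoir": info["reservoir"],
        "avg_daily_use": info["avg_daily_use"],
        "population": population,
        "cases": cases,
        "incidence_per_1000": 1000 * cases / population,
    })

for row in sector_table:
    print(row)
```

Each row describes a sector, not a person. The table never records how much water any individual case consumed, which is why option (c) cannot be carried out with these data.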

5. Which of the following is the primary shortcoming when using a reportable disease registry to collect Susser Syndrome cases in our study?

  a. We may not capture all of the Susser Syndrome cases occurring in Epiville.
  b. We would only capture prevalence data, not incidence data.
  c. We are relying on the diagnostic capabilities of the doctors and hospitals and are assuming that all reported cases were valid.
Answer (a) — incorrect: Although this is a valid shortcoming in most studies, we have defined Susser Syndrome as a reportable disease, so every case must be reported to the Department of Health. In a perfect world, therefore, no cases would go unreported. (In fact, not all SARS cases were reported to the Department of Health through passive surveillance during the most recent outbreak.)
Answer (b) — incorrect: The registry will have a date of initial diagnosis. As long as we have a clearly defined time component to our study, we will be able to capture incident as well as prevalent cases.
Answer (c) — correct: We have neither the time nor the resources to independently verify that all reported cases have been correctly diagnosed.

Dr. Zapp commends you on a job well done and instructs you on all of the administrative work that must be completed before you begin data analysis. Before you can begin you need to:

  1. Get approvals of the proposed study from database owners--this will ensure that the study adheres to the ethical principles of conducting public health research.
  2. Design a data management plan.
  3. Design a data analysis plan and propose how you will publicize your findings.

Data Analysis

After running your database queries, you received the following printout detailing the reservoirs:
[Scatterplot 1] [Scatterplot 2] [Scatterplot 3]

6. What conclusions can you draw from the above scatterplots?

  a. The incidence proportion of Susser Syndrome in areas served by the Rothman and Greenland reservoirs appears to be identical.
  b. There is a negative correlation between distance from the Porks-A-Lot Farm and Susser Syndrome incidence (as distance increases, incidence decreases).
  c. Individuals drinking Rothman Reservoir water have approximately twice the risk of developing Susser Syndrome as individuals drinking Greenland Reservoir water.
Answer (a) — incorrect: There is a clear difference in incidence proportions between the two populations.
Answer (b) — correct: The line fitted to the incidence proportions of Susser Syndrome in areas served by the Rothman Reservoir suggests a strong correlation. Specifically, there is a negative correlation between distance from the Porks-A-Lot Farm and Susser Syndrome incidence (as distance increases, incidence decreases).
Answer (c) — incorrect: This is an example of the 'ecological fallacy', where we apply group-level characteristics to individuals within that group. Using the ecological study design, we can only draw conclusions concerning the groups or populations under analysis. We cannot draw conclusions about the individual members of the population because we do not have exposure and outcome data on each member.
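
To see why answer (c) overreaches, consider the sketch below. It builds two hypothetical versions of the same group of 1,000 residents that differ only in which individuals became cases. The variable `drinks_tap`, the group sizes and the case counts are all invented for illustration. Both versions produce exactly the same group-level summary, yet they imply opposite individual-level relationships.

```python
def build_group(cases_among_tap, cases_among_other, n_tap=600, n_other=400):
    """Create hypothetical individual records for one reservoir population."""
    people = []
    for i in range(n_tap):
        people.append({"drinks_tap": True, "case": i < cases_among_tap})
    for i in range(n_other):
        people.append({"drinks_tap": False, "case": i < cases_among_other})
    return people


def group_summary(individuals):
    """Aggregate to the counts that an ecological study actually observes."""
    n = len(individuals)
    cases = sum(1 for person in individuals if person["case"])
    return {"population": n, "cases": cases, "incidence": cases / n}


def risk_by_exposure(individuals):
    """Individual-level risks, which the ecological data cannot reveal."""
    risks = {}
    for drinks in (True, False):
        subgroup = [p for p in individuals if p["drinks_tap"] is drinks]
        risks[drinks] = sum(1 for p in subgroup if p["case"]) / len(subgroup)
    return risks


# Scenario A: all 30 cases occur among residents who drink tap water.
scenario_a = build_group(cases_among_tap=30, cases_among_other=0)
# Scenario B: all 30 cases occur among residents who do not drink tap water.
scenario_b = build_group(cases_among_tap=0, cases_among_other=30)

print(group_summary(scenario_a), risk_by_exposure(scenario_a))
print(group_summary(scenario_b), risk_by_exposure(scenario_b))
```

Because the aggregated counts are identical in both scenarios, the group-level data alone cannot tell us which individuals within the group were at elevated risk.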

Intellectually curious?

Analysis of the data presented in the tables accompanying scatterplots 1 and 2 shows that in both instances the correlations between distance from the Porks-A-Lot Pig Farm and incidence of Susser Syndrome are very strong (-0.97 for the Rothman Reservoir and -0.92 for the Greenland Reservoir). This means that as distance increases, incidence tends to decrease in a nearly linear fashion. Since the relationships appear to be linear, we can fit a linear model using the least squares method. The regression coefficients, which predict the average magnitude of the expected change in incidence given a change in the distance from the farm, are quite different for the two reservoirs (-0.57 for the Rothman Reservoir and -0.09 for the Greenland Reservoir).
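
The short sketch below shows how a regression coefficient translates into an expected change in incidence. The two slopes are the values quoted above; the size of the distance increase (`delta`) is a hypothetical choice, and the units of distance and incidence are left unspecified because the text does not state them.

```python
# Regression coefficients quoted in the text: average change in incidence
# per one-unit increase in distance from the farm (units not stated).
slopes = {"Rothman": -0.57, "Greenland": -0.09}


def expected_change(slope, delta_distance):
    """Average change in the outcome predicted by a linear model."""
    return slope * delta_distance


delta = 2.0  # hypothetical increase in distance, chosen only for illustration

changes = {name: expected_change(b, delta) for name, b in slopes.items()}
for name, change in changes.items():
    print(f"{name}: expected change in incidence = {change:+.2f}")
```

The correlations for the two reservoirs are similar in strength, but the same increase in distance corresponds to a much larger average drop in incidence for the Rothman sectors than for the Greenland sectors.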

Correlation coefficients measure the degree of linear dependence between two variables. That is, the correlation coefficient scales the covariance of the two variables by the product of their standard deviations, giving a unitless value between -1 and 1 that describes the extent to which the variables vary linearly together. Correlation coefficients are useful for an initial characterization of the relationships among variables, but are sensitive to deviations from normality and extreme observations. Linear regression fits a line to the relationship between two variables that minimizes the sum of the squared distances between each observation and the prediction that the line would provide. The regression coefficient expresses the magnitude of the relationship in the units of the variables: it represents the average change in the outcome variable given a one-unit change in the predictor variable. If the regression coefficient is equal to zero, a one-unit change in the predictor is not associated with any average linear change in the outcome. While inference from linear regression also relies on distributional assumptions (normally distributed errors) and is also affected by extreme observations, in general the regression format is more robust to deviations from assumptions as compared to correlation coefficients.
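
For readers who want to see the calculations, here is a minimal sketch that computes a Pearson correlation coefficient and a least-squares line from sector-level values. The function name and the five data points are hypothetical and are not the values behind the scatterplots.

```python
import math


def correlation_and_line(x, y):
    """Return (Pearson r, least-squares slope, intercept) for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)       # unitless, between -1 and 1
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return r, slope, intercept


# Hypothetical sector-level data for illustration only.
distance = [1.0, 2.0, 3.0, 4.0, 5.0]    # distance from the farm
incidence = [5.0, 4.1, 3.2, 2.4, 1.1]   # incidence in each sector

r, slope, intercept = correlation_and_line(distance, incidence)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The correlation and the slope share the same numerator, so they always have the same sign. The slope differs only in being expressed in the units of the outcome per unit of the predictor, which is why two reservoirs can show similar correlations but very different regression coefficients.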


Discussion Questions

Carefully consider the following questions. Write down your answer to question 1 (1-2 paragraphs) in a Word document and submit it to your seminar leader. Be prepared to discuss all questions during the seminar section.

  1. What is the difference between individual- and population-based studies? How is this reflected in the ecological study hypotheses?
  2. Think of an exposure/disease relationship that you are interested in studying. Describe an individual-level study AND an ecological study that you could conduct to examine the associations between the exposure and the outcome. Specify hypotheses and design elements (individual and ecological).
  3. What is meant by ecological fallacy? Give two examples.