Using Secondary Data
As mentioned above, using secondary data has both advantages and disadvantages. Consider the following questions when selecting a secondary data set; once one is selected, take the steps listed further down before you start using the data.
- How was the data collected: in person, in a group, or online? Were mixed data collection methods used? Is the collection method coded in the dataset?
- What was the population studied? How was it selected? Is the demographic data you need coded in the database (e.g., if age is important for your project, is it included in the database)?
- Is all the data collected included?
- How was missing data dealt with?
- What was the response rate? Is the data representative of the population it is intended to represent?
- Was the data reviewed by an IRB? What are your IRB’s requirements for secondary data (some IRBs won’t allow use of secondary data that didn’t originally go through an IRB process)?
Steps to Take:
- Make sure you have the most up-to-date dataset
- Transfer the data into the dataset you will use and check that the data transferred correctly
- Address missing data issues
- Review the codebook and check the data to make sure it “makes sense” and the codes are correct.
- Determine the response categories for each question asked. Were any responses weighted?
- Do scales differ based on demographic characteristics (e.g., scales for elementary vs. high school students)?
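Several of the steps above (checking the transfer, addressing missing data, and comparing the data against the codebook) can be partly automated. The following is a minimal sketch in Python, not a complete procedure. The column names, the codebook contents, the missing-value markers, and the sample rows are all hypothetical and would need to be replaced with those from your own dataset and its documentation.

```python
import csv
import io

# Hypothetical codebook: the allowed codes for each column (assumption).
CODEBOOK = {
    "grade_level": {"elementary", "high_school"},
    "q1": {"1", "2", "3", "4", "5"},
}

# Hypothetical markers the original study used for missing data (assumption).
MISSING = {"", "NA", "-99"}


def check_dataset(rows, expected_rows, codebook=CODEBOOK, missing=MISSING):
    """Report transfer, missing-data, and codebook problems in `rows`."""
    report = {
        # Transfer check: does the row count match the source documentation?
        "row_count_ok": len(rows) == expected_rows,
        "missing": {},
        "invalid": {},
    }
    for column, allowed in codebook.items():
        # Treat an absent field the same as an empty string.
        values = [row.get(column) or "" for row in rows]
        # Count values that use one of the missing-data markers.
        report["missing"][column] = sum(v in missing for v in values)
        # Collect values that are neither missing nor listed in the codebook.
        report["invalid"][column] = sorted(
            {v for v in values if v not in missing and v not in allowed}
        )
    return report


# Made-up sample rows standing in for a transferred data file.
SAMPLE = """id,grade_level,q1
1,elementary,4
2,high_school,NA
3,middle,7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = check_dataset(rows, expected_rows=3)
print(report)
```

In this sample the check flags one missing response for `q1` and two values ("middle" and "7") that do not appear in the assumed codebook. Values like these should be reviewed against the original documentation before any analysis.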
For example, a researcher wants to determine whether pesticide use on a specific crop has decreased after five years of increased IPM training for growers of that crop. The researcher could use the California Pesticide Use Report Database to identify any changes in pesticide use.
Models used by IPM programs to predict the arrival and development of pest populations rely on secondary weather data (e.g., uspests.org).
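The pesticide-use example above comes down to comparing use before and after the training period. The sketch below shows one simple way to do that, assuming the secondary data has already been reduced to one total per year for the crop. The yearly values, the training start year, and the function name are placeholders for illustration only; they are not taken from the Pesticide Use Report Database.

```python
from statistics import mean

# Placeholder values for illustration only, NOT real pesticide use data.
# Keys are years; values are pounds applied to the crop in that year.
annual_use = {
    2010: 1200.0, 2011: 1100.0, 2012: 1300.0,  # before training began
    2013: 1000.0, 2014: 900.0, 2015: 800.0,    # after training began
}

TRAINING_START = 2013  # hypothetical first year of increased IPM training


def percent_change(use_by_year, start_year):
    """Percent change in mean annual use from before to after `start_year`."""
    before = [v for y, v in use_by_year.items() if y < start_year]
    after = [v for y, v in use_by_year.items() if y >= start_year]
    if not before or not after:
        raise ValueError("need at least one year before and after start_year")
    before_mean = mean(before)
    after_mean = mean(after)
    return (after_mean - before_mean) / before_mean * 100.0


change = percent_change(annual_use, TRAINING_START)
print(f"Change in mean annual use: {change:.1f}%")
```

Averaging over several years on each side, rather than comparing two single years, reduces the influence of one unusual season on the result.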