Extracting Signal from Noise in Survey Data

-- Filtering and Parameter Estimation --

Using

Samplemiser 4.0

Donald Green, Department of Political Science, Columbia University
& Alan Gerber, Department of Political Science, Yale University

Department of Political Science, Yale University

ALL RIGHTS RESERVED

Copyright 1998-2014

Why Filter Survey Data?

Pollsters and survey analysts seek to track public opinion and other sorts of social phenomena over time. The difficulty, however, is that survey results fluctuate due to an unknown combination of real opinion change and random sampling variability.

Linear filtering enables survey analysts to improve the forecasting accuracy of tracking polls. Linear smoothing enables researchers to look back over a series of polls and reassess the true state of opinion at previous time points.
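For readers who like to see the mechanics, here is a minimal sketch of filtering and smoothing, assuming the simplest case: opinion follows a random walk, polls are equally spaced, and the disturbance variance q is known. The function names, the example numbers, and the use of pct(100 - pct)/n as the sampling variance are assumptions made for illustration; this is not Samplemiser's own code.

```python
def sampling_variance(pct, n):
    """Sampling variance (squared percentage points) of a poll percentage."""
    return pct * (100.0 - pct) / n


def filter_and_smooth(pcts, sizes, q, prior_mean=50.0, prior_mse=1000.0):
    """Random-walk Kalman filter and fixed-interval smoother.

    pcts, sizes: poll percentages and sample sizes, one poll per period.
    q: assumed variance of the period-to-period change in true opinion.
    Returns (filtered, smoothed), each a list of (estimate, mse) pairs.
    """
    est, mse = prior_mean, prior_mse
    filtered, predicted = [], []
    for pct, n in zip(pcts, sizes):
        pred_est, pred_mse = est, mse + q          # forecast before seeing the poll
        gain = pred_mse / (pred_mse + sampling_variance(pct, n))
        est = pred_est + gain * (pct - pred_est)   # pull the forecast toward the poll
        mse = (1.0 - gain) * pred_mse
        predicted.append((pred_est, pred_mse))
        filtered.append((est, mse))

    smoothed = list(filtered)                      # last period: smoothed equals filtered
    for t in range(len(filtered) - 2, -1, -1):
        f_est, f_mse = filtered[t]
        p_est, p_mse = predicted[t + 1]
        s_est, s_mse = smoothed[t + 1]
        j = f_mse / p_mse                          # weight given to later information
        smoothed[t] = (f_est + j * (s_est - p_est),
                       f_mse + j * j * (s_mse - p_mse))
    return filtered, smoothed


# Made-up example: three polls of 600 respondents each.
filtered, smoothed = filter_and_smooth([48.0, 52.0, 47.0], [600, 600, 600], q=1.0)
for (f, f_mse), (s, s_mse) in zip(filtered, smoothed):
    print(f"filtered {f:5.2f} (mse {f_mse:4.2f})   smoothed {s:5.2f} (mse {s_mse:4.2f})")
```

The filtered value for a given period uses only the polls up to that period; the smoothed value also uses the polls that came afterward, which is why its mean squared error is never larger.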

Why Filter Survey Data Here?

This website provides an easy-to-use interface with statistical routines that perform filtering and smoothing of survey percentages. Not many statistical packages contain filtering routines as sophisticated and useful as those presented here, and none to our knowledge readily handle complications posed by (1) unequal time intervals between polls and (2) sampling error that varies from one poll to the next. It helps to have some background in statistics, but anyone willing to fiddle around a bit can get the hang of what's going on.
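The two complications mentioned above can be sketched in a few lines, assuming a random walk with known disturbance variance q. A gap of k periods between polls adds k times q to the forecast uncertainty, and each poll gets its own sampling variance based on its sample size. The (time, pct, n) tuple layout, the one-period gap between the prior and the first poll, and the example numbers are hypothetical choices for this sketch, not the form's required input order.

```python
def sampling_variance(pct, n):
    """Sampling variance (squared percentage points) of a poll percentage."""
    return pct * (100.0 - pct) / n


def filter_unequal(polls, q, prior_mean=50.0, prior_mse=1000.0):
    """Random-walk filter for polls that are unevenly spaced in time.

    polls: list of (time, pct, n) tuples sorted by time (hypothetical layout).
    Returns a list of (time, estimate, mse, gain) tuples.
    """
    est, mse = prior_mean, prior_mse
    last_time = None
    out = []
    for time, pct, n in polls:
        gap = 1 if last_time is None else time - last_time
        pred_mse = mse + gap * q                 # longer gaps mean staler information
        r = sampling_variance(pct, n)            # differs from poll to poll
        gain = pred_mse / (pred_mse + r)         # weight placed on the new poll
        est = est + gain * (pct - est)
        mse = (1.0 - gain) * pred_mse
        out.append((time, est, mse, gain))
        last_time = time
    return out


# Made-up example: the third poll arrives after a seven-period gap.
for row in filter_unequal([(1, 48.0, 1000), (2, 51.0, 400), (9, 46.0, 400)], q=1.0):
    print("t=%d  estimate=%.2f  mse=%.2f  weight on poll=%.2f" % row)
```

In this example the second and third polls have the same sample size, yet the third receives more weight because the earlier information has gone stale during the long gap.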

Users enter pertinent information about their tracking polls (e.g., how they are spaced in time, sample sizes, and results). If users input several tracking polls, they may filter these polls while at the same time estimating (1) the variability in true opinion from one period to the next and (2) the autoregressive parameter that links opinion from one point in time to the next. Users with just a handful of polls are advised to stipulate one or both of these parameters.
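One common way to estimate the variability in true opinion is maximum likelihood: try candidate values, run the filter, and keep the value under which the one-step-ahead forecast errors look most plausible. The sketch below does this by grid search for the disturbance variance only, holding the autoregressive parameter at 1. Whether Samplemiser uses exactly this procedure is not stated on this page, so treat the function names, the grid, and the data as illustrative assumptions.

```python
import math


def sampling_variance(pct, n):
    """Sampling variance (squared percentage points) of a poll percentage."""
    return pct * (100.0 - pct) / n


def log_likelihood(pcts, sizes, q, prior_mean=50.0, prior_mse=1000.0):
    """Gaussian log-likelihood of the polls under a random walk with variance q."""
    est, mse = prior_mean, prior_mse
    total = 0.0
    for pct, n in zip(pcts, sizes):
        pred_mse = mse + q
        f = pred_mse + sampling_variance(pct, n)   # variance of the forecast error
        err = pct - est                            # one-step-ahead forecast error
        total += -0.5 * (math.log(2.0 * math.pi * f) + err * err / f)
        gain = pred_mse / f
        est += gain * err
        mse = (1.0 - gain) * pred_mse
    return total


def estimate_q(pcts, sizes, grid):
    """Return the grid value of q with the highest log-likelihood."""
    return max(grid, key=lambda q: log_likelihood(pcts, sizes, q))


# Made-up example: ten polls of 800 respondents each.
polls = [47.0, 49.5, 48.0, 51.0, 52.5, 50.0, 53.0, 54.5, 52.0, 55.0]
grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
print("estimated disturbance variance:", estimate_q(polls, [800] * 10, grid))
```

With only a handful of polls the likelihood is nearly flat across the grid, which is one way to see why stipulating the parameter is the safer course in that situation.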

Why is it called "Samplemiser"?

Current techniques for analyzing polls waste information. Either old polls are discarded or they are lumped together with current polls through averaging. Both approaches are wasteful and lead to needless inaccuracy. By correctly weighting current and past polling information, Samplemiser enables poll-readers to make more efficient use of tracking polls. This is where the "miser" part comes in: Pollsters could interview fewer respondents and have the same degree of accuracy if they used Samplemiser.
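The waste can be put in numbers with a two-poll example. The sketch below assumes a random walk with disturbance variance q, one old poll and one current poll, and compares the mean squared error of three readings of current opinion: discarding the old poll, averaging the two, and weighting them by their precision. All names and numbers are made up for illustration.

```python
def sampling_variance(pct, n):
    """Sampling variance (squared percentage points) of a poll percentage."""
    return pct * (100.0 - pct) / n


def compare_strategies(n_old, n_new, q, pct=50.0):
    """MSE (squared percentage points) of three estimates of current opinion."""
    r_old = sampling_variance(pct, n_old)
    r_new = sampling_variance(pct, n_new)
    stale = r_old + q                           # old poll's error as a reading of today
    weighted = stale * r_new / (stale + r_new)  # precision-weighted combination
    return {
        "discard": r_new,                       # use the current poll alone
        "average": (stale + r_new) / 4.0,       # simple mean of the two polls
        "weighted": weighted,
        # Sample size a single fresh poll would need to match the weighted MSE.
        "equivalent_n": pct * (100.0 - pct) / weighted,
    }


for q in (1.0, 10.0):
    print(q, compare_strategies(600, 600, q))
```

When opinion moves slowly (small q), averaging does nearly as well as weighting; when it moves quickly, averaging can be worse than throwing the old poll away. The precision-weighted estimate is never worse than either, and "equivalent_n" shows how many fresh interviews that gain is worth.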

Click here to view an example dataset that may be cut-and-pasted: Gallup poll tracks Al Gore's vote percentage among likely voters.

Click here to view an example dataset that may be cut-and-pasted: CBS poll tracks Al Gore's vote percentage among likely voters.

Forget this nonsense and just show me a picture of the results.

Other example datasets: Click here to view another example dataset, 50 quarterly CBS/New York Times poll readings of party identification in California, 1981-1995. Entries are the proportion of all respondents who self-identify as a Republican. Since the number of Californians in national surveys is often small, filtering provides a useful means for charting partisan change, net of sampling fluctuations. Click here to view another example dataset, this one tracking Democratic partisanship among African-Americans from 1972-1996 (General Social Survey). Click here to view a dataset tracking Democratic partisanship among Southerners from 1952-1994 (American National Election Studies). Click here to view a dataset tracking Bill Clinton's 1993-1999 approval ratings, as charted by the Gallup Polls.


Input Survey Information

(1) Indicate the Most Suitable Unit of Time for Your Analysis:

(2) Please enter the survey information in the following field. Each line must contain three numbers describing a specific survey. The three columns must contain the following entries, and the columns must be separated only by spaces or tabs (no commas, semicolons, etc.):

You may use "copy-and-paste" to enter the data in this window.
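If you are preparing data in a script before pasting it in, a check along the following lines may catch formatting slips early. It is only a sketch of the stated rule (three numbers per line, separated by spaces or tabs); the function name is hypothetical, it says nothing about what each column means, and skipping blank lines is a choice of this sketch that the form itself may not share.

```python
def parse_survey_lines(text):
    """Parse pasted survey data into a list of 3-tuples of floats.

    Raises ValueError if a line does not hold exactly three numbers
    separated by spaces or tabs.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                          # skip blank lines in this sketch
        fields = line.split()                 # splits on any run of spaces or tabs
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 numbers, got {len(fields)}")
        try:
            rows.append(tuple(float(field) for field in fields))
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric entry in {line!r}") from None
    return rows


print(parse_survey_lines("1 2 3\n4\t5\t6.5\n"))
```

A line such as "1, 2, 3" is rejected because the commas make the entries non-numeric.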

(3) Advanced Users. You may enter a prior percentage here; otherwise, leave the default value (50) unchanged in the window below. Do not include percentage signs. For example, if Candidate Smith's popularity is believed to stand at 55% in the period prior to the first poll, enter 55. If you're unsure what to enter, just leave it as is and keep the value in the next window large.

(4) Advanced Users. You may enter the mean squared error of your prior here; otherwise, leave the default value (1000) unchanged in the window below. For example, if Candidate Smith's popularity is believed to stand at 50% with a variance of 7%, enter 7. A very large value implies that one possesses no prior information whatsoever. If you're unsure, just leave the default as is.
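To see what the prior and its mean squared error do, consider how they combine with the first poll. The sketch below is a single precision-weighted update that, for simplicity, ignores any change in opinion between the prior period and the first poll. The function name and the example poll are hypothetical.

```python
def update_with_prior(prior_pct, prior_mse, poll_pct, n):
    """Combine a prior with one poll; returns (estimate, mse)."""
    r = poll_pct * (100.0 - poll_pct) / n          # sampling variance of the poll
    weight_on_poll = prior_mse / (prior_mse + r)
    est = prior_pct + weight_on_poll * (poll_pct - prior_pct)
    mse = prior_mse * r / (prior_mse + r)
    return est, mse


# Made-up first poll: 60% among 500 respondents.
print("default prior (50, 1000):", update_with_prior(50.0, 1000.0, 60.0, 500))
print("informative prior (55, 7):", update_with_prior(55.0, 7.0, 60.0, 500))
```

With the default value of 1000 the prior carries almost no weight and the estimate lands essentially on the poll itself. With a mean squared error of 7 the prior pulls the estimate noticeably toward 55.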

(5) Advanced Users. You may either estimate or specify the variance of the disturbances (i.e., the perturbations that produce changes in the true level of public opinion over time). If you check the specify button, you may either use the default value (5) or enter another value in the window. For example, if the average squared change in true opinion from one time period to the next is 10%, enter 10. If you're not sure, it's probably best to let the program estimate this quantity.

Estimate variance of the disturbances.

Specify variance of the disturbances:
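If you would rather specify this value but have no figure in mind, a back-of-the-envelope guess can be had from the polls themselves. Under a random walk with equally spaced polls, the expected squared change between consecutive poll readings equals the disturbance variance plus the two polls' sampling variances, so subtracting the sampling variances leaves a rough estimate. This is a method-of-moments sketch with a hypothetical function name, not the estimator Samplemiser uses.

```python
def rough_disturbance_variance(pcts, sizes):
    """Rough disturbance variance from consecutive polls (needs 2+ polls).

    Averages (change in poll)^2 minus both sampling variances, floored at 0.
    """
    excess = []
    for i in range(1, len(pcts)):
        r_now = pcts[i] * (100.0 - pcts[i]) / sizes[i]
        r_prev = pcts[i - 1] * (100.0 - pcts[i - 1]) / sizes[i - 1]
        excess.append((pcts[i] - pcts[i - 1]) ** 2 - r_now - r_prev)
    return max(0.0, sum(excess) / len(excess))


# Made-up example: six polls of 1,000 respondents each.
print(rough_disturbance_variance([47.0, 50.0, 48.0, 53.0, 51.0, 55.0], [1000] * 6))
```

A result of zero means the poll-to-poll movement is no larger than sampling error alone would produce.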

(6) Advanced Users. You may either estimate or specify the autoregressive coefficient (b) by which last period's opinion affects current opinion. A value of 1 (the default) implies that opinion follows a random walk -- it's as likely to go up as go down in any given period. Values less than one imply that opinion gravitates toward some underlying average level. If you're not sure, it's best to leave it at 1.

Estimate autoregressive coefficient.

Specify autoregressive coefficient:
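The difference between b = 1 and b < 1 shows up most clearly in forecasts. The sketch below assumes the model opinion(t) = mean + b * (opinion(t-1) - mean) + disturbance, with 0 < b <= 1 and disturbance variance q, and projects an estimate k periods ahead. The function names and the parameterization are assumptions made for illustration.

```python
import math


def forecast(est, mse, k, q, b, mean):
    """Project an estimate and its MSE k periods ahead."""
    est_k = mean + (b ** k) * (est - mean)        # deviation from the mean decays
    mse_k = (b ** (2 * k)) * mse + q * sum(b ** (2 * j) for j in range(k))
    return est_k, mse_k


def half_life(b):
    """Periods for a deviation from the mean to halve (infinite if b = 1)."""
    return math.inf if b >= 1.0 else math.log(0.5) / math.log(b)


for b in (1.0, 0.9, 0.5):
    print(b, forecast(55.0, 2.0, 3, 1.0, b, 50.0), half_life(b))
```

With b = 1 the forecast stays at the current estimate and its uncertainty grows without limit as k increases. With b < 1 the forecast drifts back toward the mean and the uncertainty levels off.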

(7) Advanced Users. You may specify the equilibrium mean here. Warning: this could produce explosive estimates if poorly specified. If this option is not selected, the sample mean is assumed to be the equilibrium (recommended).

Do not specify mean.

Specify the equilibrium mean:

(8) Advanced Users. You may choose to compute the filtered and smoothed estimates using one or two passes through the data. If you select two passes, the first pass is used to generate more accurate estimates of the random sampling variability associated with each poll. One pass is faster.

Use One Pass Through the Data for Purposes of Estimation.

Use Two Passes Through the Data for Purposes of Estimation.
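One plausible reading of the two-pass option is sketched below: the first pass computes each poll's sampling variance from its own noisy percentage, and the second pass recomputes those variances at the first-pass estimates before filtering again. The details of how Samplemiser carries out its second pass are not documented on this page, so the function names and the choice to reuse filtered (rather than smoothed) estimates are assumptions.

```python
def kalman_filter(pcts, variances, q, prior_mean=50.0, prior_mse=1000.0):
    """Random-walk filter given a sampling variance for each poll."""
    est, mse = prior_mean, prior_mse
    out = []
    for pct, r in zip(pcts, variances):
        pred_mse = mse + q
        gain = pred_mse / (pred_mse + r)
        est = est + gain * (pct - est)
        mse = (1.0 - gain) * pred_mse
        out.append(est)
    return out


def one_pass_filter(pcts, sizes, q):
    """Sampling variances taken from the raw poll percentages."""
    raw_var = [p * (100.0 - p) / n for p, n in zip(pcts, sizes)]
    return kalman_filter(pcts, raw_var, q)


def two_pass_filter(pcts, sizes, q):
    """Sampling variances recomputed at the first-pass estimates."""
    first = one_pass_filter(pcts, sizes, q)
    refined_var = [e * (100.0 - e) / n for e, n in zip(first, sizes)]
    return kalman_filter(pcts, refined_var, q)


# Made-up example with small samples, where the raw percentages are very noisy.
pcts, sizes = [10.0, 30.0, 20.0, 5.0, 25.0], [40] * 5
print("one pass: ", [round(e, 2) for e in one_pass_filter(pcts, sizes, 1.0)])
print("two passes:", [round(e, 2) for e in two_pass_filter(pcts, sizes, 1.0)])
```

The second pass matters most when samples are small, because a poll that happens to land far from the truth also misstates its own sampling variance.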



Large datasets may take a few minutes to analyze...

Troubleshooting

If your web browser returns an error complaining that the 'form contains no data,' look back over all of the input windows to ensure that you've entered your data correctly. Likely suspects: (1) you've entered an inconsistent number of data points (e.g., 3 sets of poll results but only 2 sets of sample sizes); (2) you've entered some kind of non-numeric data; (3) you've inadvertently entered a carriage return in one of the fields (press the reset button and reenter the data); (4) you are using an outdated web browser; (5) your percentage input is in decimal form (e.g., 0.55 rather than 55); or (6) your percentage input is, due to small sample size, at the boundary of 0% or 100% (aggregate or discard these surveys to avoid this problem). If trouble persists, send me an email (see below) and explain the difficulty.
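Two of these suspects, decimal-form percentages and boundary values, are easy to screen for before submitting. The following sketch is a hypothetical pre-check, not part of Samplemiser; the rule that values all lying between 0 and 1 are probably proportions is a heuristic assumed here.

```python
def check_percentages(pcts):
    """Return a list of warnings about a column of poll percentages."""
    problems = []
    if pcts and all(0.0 <= p <= 1.0 for p in pcts):
        problems.append("values look like proportions; enter 55, not 0.55")
    for i, p in enumerate(pcts, start=1):
        if p <= 0.0 or p >= 100.0:
            problems.append(f"poll {i}: {p} is at or beyond the 0/100 boundary")
    return problems


print(check_percentages([48.0, 52.0]))
print(check_percentages([0.48, 0.52]))
print(check_percentages([48.0, 100.0]))
```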

Note that, in contrast to previous versions of Samplemiser, the standard errors for the filtered and smoothed estimates in the current version of this program take into account the uncertainty in the autoregressive and error variance parameters. See James D. Hamilton (1986) "A Standard Error for the Estimated State Vector of a State-Space Model" Journal of Econometrics 33:387-97.

See also Green, Donald P., Alan S. Gerber, and Suzanna L. De Boef. 1999 "Tracking Opinion over Time: A Method for Reducing Sampling Error." Public Opinion Quarterly. 63:178-92.


[ Donald Green's Homepage ]

Many thanks to Jay Emerson for his programming assistance.


Run into problems? Need help interpreting the output? Want to learn how Samplemiser can be used to help design more efficient surveys? Send me an email... Don Green
Lightly revised: October 2014