We've been rearranging.
The problem we're solving: sometimes pollsters ask the same question twice in the same interview. Each time, they offer a different set of responses (one with "Johnson", one without), and they get different results.
Here are the take-aways:
Our regular CSV got a "Question Iteration" column, and it's important. Filter by "Question Iteration == 1" to avoid including the same poll twice. Alternatively, you can filter by other criteria -- for instance, whether there's a "Johnson" value. If you're going to filter this way, be sure you know what choices Pollster editors have made.
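If you work in Python, here's a minimal sketch of the "Question Iteration == 1" filter. The sample data is made up for illustration: only the "Question Iteration" and "Johnson" column names come from this note, and the other columns and the helper name are assumptions. A real file would come from the :slug.csv endpoint.

```python
import csv
import io

# Made-up sample rows. Only "Question Iteration" and "Johnson" are column
# names taken from this announcement; the rest are placeholders.
OLD_CSV_SAMPLE = """Pollster,Trump,Clinton,Johnson,Question Iteration
Example Poll A,40,44,,1
Example Poll A,38,41,9,2
Example Poll B,42,43,,1
"""


def first_iterations(csv_text):
    """Keep only rows whose "Question Iteration" is 1, so no poll appears twice."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Question Iteration"].strip() == "1"]


first_rows = first_iterations(OLD_CSV_SAMPLE)
```

The values are compared as strings because the csv module does not convert types for you.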
We created another CSV format that returns one row per poll, and it's easier to use. The new api/charts/:slug.csv endpoint (see? "api/"?) is documented, too. It includes exactly the rows Pollster displays on its charts by default. It has fewer columns; its static column names are more programming-language friendly; its dynamic column names show up first, so it's easier to parse them out; and it adds the "poll_id" column to help us with tech support.
Which CSV should you use? Here's a handy guide:
- If you want data that has no duplicates and that is exactly what Pollster shows in its default charts, use api/charts/:slug.csv.
- If the "api/" CSV is missing rows or columns that are important for you, use the old :slug.csv endpoint.
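If you'd like to double-check the "no duplicates" property on your side, a rough Python sketch could count how often each "poll_id" shows up. The sample text and the function name are assumptions for illustration; only the "poll_id" column name comes from this note.

```python
import csv
import io
from collections import Counter

# Made-up sample in the spirit of the new format; only "poll_id" is a
# column name taken from this announcement.
API_CSV_SAMPLE = """Trump,Clinton,poll_id
40,44,101
42,43,102
"""


def duplicate_poll_ids(csv_text):
    """Return the poll_id values that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["poll_id"] for row in reader)
    return sorted(poll_id for poll_id, n in counts.items() if n > 1)


duplicates = duplicate_poll_ids(API_CSV_SAMPLE)
```

An empty list means every poll appears once, which is what the new format is meant to give you.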
We at Pollster operate with these same two sets of data: the API CSV is what we show on charts by default; the old CSV is what we let people show when they create custom charts.
api/polls.json and api/polls.xml may now include the same question twice in the same poll. That happens whenever the pollster asked it twice.
All our 2016-general-election-trump-vs-clinton charts are about to gain a "Johnson" column. It may happen as soon as tomorrow.
In the old-format CSVs for all these general-election charts, if the same question appears twice in the same poll, one row will have a value set for "Johnson" and the other row will not.
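As a sketch of what that means when you read the old-format CSV in Python, the rows can be split by whether "Johnson" has a value. The sample rows and the function name are invented for illustration; an empty cell is assumed to mean the question was asked without "Johnson".

```python
import csv
import io

# Made-up sample: the same poll asked twice, once without and once with
# "Johnson". Only "Johnson" and "Question Iteration" are column names
# taken from this announcement.
JOHNSON_SAMPLE = """Pollster,Trump,Clinton,Johnson,Question Iteration
Example Poll A,40,44,,1
Example Poll A,38,41,9,2
"""


def split_by_johnson(csv_text):
    """Return (rows with a "Johnson" value, rows without one)."""
    with_johnson, without_johnson = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: a blank or missing cell means "Johnson" was not offered.
        if (row.get("Johnson") or "").strip():
            with_johnson.append(row)
        else:
            without_johnson.append(row)
    return with_johnson, without_johnson


with_johnson, without_johnson = split_by_johnson(JOHNSON_SAMPLE)
```

Which of the two groups you keep depends on the question you're asking, so check what Pollster editors chose for the chart you're using.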
Enjoy life,
Adam