Election simulation program


Nevin Brackett-Rozinsky

Apr 10, 2015, 6:12:30 PM
to electio...@googlegroups.com
Hello, I’m new to this group, having just been referred here by Clay and Aaron.

I’m a mathematician and programmer, with a longstanding interest in voting systems. Over the last few months I’ve been writing a program to simulate elections under a variety of voting methods. There are still some more features I plan to add, but the basic version is up and running properly now. Here’s a brief overview:

First a population of voters and candidates is created, with each person having a position on several issues. Then voters determine how “good” each candidate is for them, based on their differences in position as well as how important each issue is to that voter.

Those utility values, along with the voter’s subjective opinion of how likely each candidate is to win, are used to generate a ranked ballot and a rated ballot for each voter, in accordance with that voter’s strategic behavior. Election systems go through all voters and look at the relevant type of ballot, tallying the results.

The winner of each election is checked against the aggregate utility of each candidate, and results are accumulated over many elections to measure statistics such as the average utility of winners, the standard deviation thereof, and the probability of electing the “best” candidate, for each voting method.

Voters determine their subjective odds of how likely each candidate is to win by looking at the results of pre-election polls, which are really just miniature elections using a small number of voters chosen at random, and then applying their individual opinion about how much weight to ascribe to poll results. The initial odds before the first poll can be set either to random values or to be all the same.
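[Editor's sketch: the polling step described above, written as a small Python function. All names here are illustrative, not from the program, and plurality-style tallying is used purely for demonstration; as described later in the thread, each voting method runs its own polls.]

```python
import random
from collections import Counter

def run_poll(favorites, poll_size, rng=random):
    # A poll is a miniature election over a random subsample of voters.
    # favorites: each voter's current top choice (one entry per voter).
    # Returns the vote share for each candidate seen in the subsample.
    sample = rng.sample(favorites, poll_size)
    counts = Counter(sample)
    return {cand: n / poll_size for cand, n in counts.items()}
```

Voters would then feed these shares through their individual weighting of poll results to update their perceived win odds.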

I currently have half a dozen voting methods implemented, plus two reference methods to find the honest utility and honest Condorcet winners.

I understand this discussion group is full of voting system experts and other intelligent people, so I’d appreciate any feedback or ideas. I’d be happy to provide further details on my implementation if there is interest.

Best,
Nevin

William Waugh

Apr 11, 2015, 12:16:07 AM
to electio...@googlegroups.com
I'm not one of the experts, but your approach sounds really good to me.

Andy Jennings

Apr 11, 2015, 1:14:39 AM
to electionscience
Nevin,

We're definitely interested in more detail about your methodology, your results, and your conclusions.  It sounds like you've done a thorough job, so it should be interesting.

This kind of thing has been done before, but there are enough parameters and assumptions in any semi-realistic model that what you've done is almost certainly unique and very likely to be valuable to election science.

~ Andy Jennings

--
You received this message because you are subscribed to the Google Groups "The Center for Election Science" group.
To unsubscribe from this group and stop receiving emails from it, send an email to electionscien...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Warren D Smith

Apr 11, 2015, 10:12:44 AM
to electio...@googlegroups.com
I did something similar in 1999-2000.
http://rangevoting.org/WarrenSmithPages/homepage/works.html
paper #56.
If you want to do something new & important, do multiwinner voting systems.
Framework for that:
http://www.rangevoting.org/BRmulti.html

--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)

Nevin Brackett-Rozinsky

Apr 11, 2015, 3:52:24 PM
to electio...@googlegroups.com
Thanks everybody, I appreciate the welcome and words of support. Here’s a bit more info about my model:

There are n issues, v voters, c candidates, and p polls of pv poll voters, with each value specified by the user. There are also settings to pick what fraction of voters use each strategy. I currently have 2 available voter strategies, which I call “Standard” and “Minmax”. Both can be adjusted with a “level” that indicates how strongly a voter believes the poll results. I’d be happy to describe how the strategies work if there’s interest.

Each election system runs its own polls and keeps track of its own ballots. In particular, the very first poll will very likely have different outcomes under different voting systems, which means that for the second poll, the voters have different perceived probabilities for each candidate to win. So the ballot a voter casts in the second poll may well differ from one election method to another. And so on through the rest of the polls and the election itself.

Casting a ballot requires both a voter’s perceived odds for each candidate to win, which may change from poll to poll, and a voter’s actual honest opinion of each candidate, which does not change. Here’s how the honest opinion, or utility, is determined:

Voters and candidates each have a position on every issue, and those positions are drawn from a standard normal distribution. Additionally, voters ascribe a level of importance to each issue, and those importances are drawn from a standard exponential distribution. Using these numbers, the voters calculate their honest opinions about each candidate. This is one area where I’m not sure what the best approach is, or even if an objectively “best” approach exists. Here’s what I have so far:

• A voter considers a candidate and looks at the differences d_i in position between them on each issue.
• The voter also considers the importance or “weight” w_i of each issue in their own mind.
• Then the voter sums the squared weighted differences, s = ∑(w_i × d_i)², across all n issues.
• This yields s, a non-negative value representing the weighted distance squared between the voter and the candidate.

From there, one option would be to take utility as the negative, -s.
Or the reciprocal, 1/s.
What I have right now is that the voter’s opinion is exp(-s/2).
Another possibility would be -ln(s).
Or any of the above with √s in place of s.

There are, of course, endless options, but these seem like the most plausible. Their ranges are different: [0, 1] for some, (-∞,0] for others, all of ℝ for yet more. It could also make sense to work with s/n in any of those formulas, where n is the number of issues. And it’s worth noting that in formulas which invert magnitude, such as 1/s and exp(-s/2), the “weight” w_i is actually more like “tolerance” for the issue.
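[Editor's sketch: the distance calculation and the candidate utility transforms listed above, in Python with hypothetical names.]

```python
import math

def weighted_sq_distance(voter_pos, cand_pos, weights):
    # s = sum over issues of (w_i * d_i)^2, where d_i is the gap in
    # position between voter and candidate on issue i.
    return sum((w * (v - c)) ** 2
               for v, c, w in zip(voter_pos, cand_pos, weights))

# Candidate transforms from s to utility (all decrease as s grows):
def u_negative(s):   return -s                 # range (-inf, 0]
def u_reciprocal(s): return 1.0 / s            # diverges as s -> 0
def u_gaussian(s):   return math.exp(-s / 2)   # range (0, 1]
def u_neglog(s):     return -math.log(s)       # diverges as s -> 0
```

Any of these can equally be applied to √s, or to s/n with n the number of issues, as noted above.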

Anyway, this is starting to run long, but if anyone has recommendations on how best to convert from squared-weighted-distance to honest utility, I’d like to hear the rationale.


On Saturday, April 11, 2015 at 10:12:44 AM UTC-4, Warren D. Smith (CRV cofounder, http://RangeVoting.org) wrote:
I did something similar in 1999-2000.
http://rangevoting.org/WarrenSmithPages/homepage/works.html
paper #56.
If you want to do something new & important, do multiwinner voting systems.
Framework for that:
http://www.rangevoting.org/BRmulti.html

Thanks for the suggestion, I’ll peruse your work when I have a chance. Since you’ve gone down this road before, I’d be grateful for any ideas you learned through experience.

Multi-winner / proportional representation is definitely something I’ll consider adding later, but for now I want to get the single-winner part polished up.

Best,
Nevin

Nevin Brackett-Rozinsky

Apr 12, 2015, 11:47:14 PM
to electio...@googlegroups.com
Having thought about it more, the formulas that diverge at 0 (e.g. 1/s and ln(s)) would preclude a candidate from ever doing anything but bullet-voting for themself. I’m trying to avoid infinities in my code anyhow, so those options are out. The RMS distance between a voter and a candidate (or rather, its negative) is probably the best measure of utility, though I’m going to add the option to use exp(-s/2n) if desired.

So that settles that. I also just added two more voting systems, bringing the total up to 8, and I’m in the process of making it possible to specify min and max values for number of candidates and number of issues. The program will loop through each combination of values in the provided ranges and simulate all the corresponding elections. For example, it will enable things like, “Simulate the chosen voting systems first with 3 candidates, then 4, then 5, then 6, all with 1 issue that voters care about. Then do the whole thing over again with 2 issues. Then 3 issues.”

My intention is to take the CSV output and use it to create graphs of how each voting system performs as the parameters change. The performance metric will go on the y-axis. My question for everyone here is, which way would you rather see the graphs:

The number of candidates on the x-axis, and a separate graph for each different number of issues;
or the number of issues on the x-axis, and a separate graph for each number of candidates?

In other words, are you more interested in a graph that says, “With 3 issues, here’s how the voting systems perform as you increase the number of candidates,” or a graph that says, “With 3 candidates, here’s how the voting systems perform as you increase the number of issues”?

Best,
Nevin

Warren D Smith

Apr 13, 2015, 10:28:15 AM
to electio...@googlegroups.com
Also, you might want to check out my voting sim program IEVS, source code here:

http://rangevoting.org/IEVS/IEVS.c

Dick Burkhart

Apr 17, 2015, 10:39:52 AM
to electio...@googlegroups.com

I would use a weighted correlation function instead of your weighted sum of squared distances for ‘s’ below. That is s = ∑w_i × v_i × c_i summed over the ‘n’ issues, with v_i and c_i representing a voter’s and a candidate’s position (rating or ranking) on issue i and w_i being the voter’s weight for the issue. Then the voter should vote for the candidate who maximizes this correlation.
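[Editor's sketch: the proposed score is the weighted inner product of the voter's and candidate's position vectors. Python, names hypothetical.]

```python
def correlation_score(voter_pos, cand_pos, weights):
    # s = sum over issues of w_i * v_i * c_i: same-sign positions on a
    # heavily weighted issue raise the score; opposite signs lower it.
    return sum(w * v * c for v, c, w in zip(voter_pos, cand_pos, weights))
```

One property worth noting: a voter whose position is 0 on an issue contributes nothing on that issue, whatever the candidate's stance there.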

 

I’ve tested both measures of distance and correlation in my clustering algorithm for proportional representation and found that correlation works better.

 

Dick Burkhart

4802 S Othello St,  Seattle, WA  98118

206-721-5672 (home)  206-851-0027 (cell)

dick...@gmail.com


Nevin Brackett-Rozinsky

Apr 17, 2015, 5:30:20 PM
to electio...@googlegroups.com
I don’t quite follow. Suppose a voter happens to fall at exactly the center of the political spectrum for every issue. In other words, let v_i = 0 for every issue i. Now the sum you describe would equal zero for every candidate, regardless of how close or far their positions are to the voter’s. Or am I missing something?

Nevin

Dick Burkhart

Apr 18, 2015, 2:03:19 AM
to electio...@googlegroups.com

That’s how correlation works. If you’re in the middle, it wouldn’t make sense to say that you were correlated with any higher rating, or with any lower rating. And in my clustering application I’m only looking for voters with at least a somewhat positive correlation with the central vector of the cluster, even for only partial membership in the cluster. That is, for me these voting blocs are “fuzzy sets”. Also, a certain fraction of voters will end up being classified as “independent” voters, since they don’t identify well with any cluster. Your dead-center voter would be one of these.

 

Dick Burkhart


 

From: electio...@googlegroups.com [mailto:electio...@googlegroups.com] On Behalf Of Nevin Brackett-Rozinsky


Sent: April 17, 2015 2:30 PM
To: electio...@googlegroups.com


Nevin Brackett-Rozinsky

Apr 18, 2015, 3:46:19 AM
to electio...@googlegroups.com
It sounds like we may be modeling things quite differently. I am not creating any blocs or clusters—instead, each of my voters acts as an individual, and determines the utility of each candidate to her or him according to how closely their views match on each issue.

In particular, a voter with position 0 on a given issue does not ignore that issue, but rather it means she or he wants to vote for candidates who also have positions near 0, not those with extreme positions in either the positive or negative direction. So it really is the distance between the voter and the candidate that matters in my model.

Nevin

Dick Burkhart

Apr 19, 2015, 1:04:25 AM
to electio...@googlegroups.com

You might want to rethink that interpretation. In my experience a middle, or 0 rating, usually means that the person either doesn’t know much about the issue or doesn’t care much about it. So your distance metric may be misleading.

 

Dick Burkhart


 

From: electio...@googlegroups.com [mailto:electio...@googlegroups.com] On Behalf Of Nevin Brackett-Rozinsky


Sent: April 18, 2015 12:46 AM
To: electio...@googlegroups.com


Nevin Brackett-Rozinsky

Apr 19, 2015, 4:45:10 AM
to electio...@googlegroups.com
Perhaps I should explain what I mean when I say “issue”. I am talking about a continuous political axis along which a person may hold any position in the spectrum. For example, there is the traditional “left-right” axis, where some people are very far to one side or the other, some people are less far, and yet more people are near the center.

We may also posit issues such as foreign policy on the isolation/intervention axis, economic policy on the regulation/laissez-faire axis, and social policy on the individual-freedom/legislated-morality axis. On each of these, there are plenty of people with strongly-held views across the entire spectrum, including near the center. In fact, because I am modeling voter position with a normal distribution (i.e. Gaussian bell curve), there end up being more people whose views are near the center than toward the extremes.

Of course we could go to a much finer granularity and say there are dozens of issues like environmental policy, drug policy, healthcare, foreign aid, military spending, scientific investment, and so forth. That is why “number of issues” is a user-adjustable parameter of my model. The important point is that each issue is continuous and two-sided, and people’s positions can fall anywhere along the line. The strength of a position has to do with how important the issue is to the voter, or how much they care about it, which is independent of what their position happens to be.

Does this help clear things up?

Best,
Nevin

Nevin Brackett-Rozinsky

Apr 21, 2015, 2:02:48 AM
to electio...@googlegroups.com
I’m open to being convinced about the relative merits of correlation versus weighted distance, but it’ll require a strong logical argument.

Regardless, I have added a number of additional features to my simulator, as well as a few more voting systems. The list now runs alphabetically:
Approval, Borda, Condorcet (honest), Instant-runoff, Majority judgment, Plurality, Random ballot, Random winner, Schulze, and Score.

With each of them, the user can select how many options are available to the voter. For instance, “Top-3 Instant-runoff” or “10-point Score voting”. Each voting system can also be configured individually to have voters behave strategically (according to the overall mix of strategies specified by the user) or to have all voters be completely honest instead.

I’ve also started analyzing the results, and I’m already seeing some rather intriguing outcomes. Notably, some systems actually perform *better* when voters are strategic than when they are honest. (Plurality, it should be mentioned, is emphatically not one of these.) Other systems do tremendously worse with strategic voters. (Plurality is also not one of these.)

Another thing I’ve noticed is that some voting systems start out doing okay up to about 5 candidates, then begin a rapid decline that becomes substantial with 8 or more candidates. This is most common among “Top-3” variants and the like.

There seems to be a major qualitative difference between 1-issue and more-than-1-issue for most voting systems, and the relative ordering even changes a bit for small numbers of candidates. But with 2+ issues, even though there are quantitative changes to the results, the exact number of issues doesn’t appear to play a major role in the relative performance of different voting systems. (Well, precisely 2 issues is something of a transition zone, but beyond that little changes.)

Brian Goldman

Apr 21, 2015, 8:01:26 AM
to electio...@googlegroups.com
Voters and candidates each have a position on every issue, and those positions are drawn from a standard normal distribution.

I would like to request an optional variant in which you draw from two normal distributions. That way you can see how the system reacts under polarization, such as the bimodal distribution of modern US politics.

Jameson Quinn

Apr 22, 2015, 8:10:43 AM
to electionsciencefoundation
This is definitely interesting work. I've done something similar last year; see https://github.com/The-Center-for-Election-Science/vse-sim . Nevin: would you be interested in a real-time chat some time (skype or hangout or similar) to discuss the common issues? I think it is worthwhile to have different people re-implementing this independently, because there are plenty of judgment calls involved, and it's good to see if results are robust to how those calls are made. But "independently" doesn't mean we shouldn't talk about how we've done it, so as to better understand where and why our results are similar or different.

2015-04-21 8:01 GMT-04:00 Brian Goldman <gold...@msu.edu>:
Voters and candidates each have a position on every issue, and those positions are drawn from a standard normal distribution.

I would like to request an optional variant in which you draw from two normal distributions. That way you can see how the system reacts under polarization, such as the bimodal distribution of modern US politics.


Warren D Smith

Apr 22, 2015, 9:50:33 AM
to electio...@googlegroups.com
On 4/21/15, Nevin Brackett-Rozinsky <nevin.brack...@gmail.com> wrote:
> I'm open to being convinced about the relative merits of correlation versus
> weighted distance, but it'll require a strong logical argument.

--IEVS I think has both kinds of ideas, user can select.

Another thing you need to watch out for is tiebreaking.
It is very important to break ties RANDOMLY, and
the way IEVS does this is by pre-ordering the candidates by a random
permutation, then picking the first of the tied candidates. There are a lot of
ways to be confused and bias your randomness. That will lead to large
statistical effects which will be totally misleading.
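[Editor's sketch of that tiebreak scheme in Python; the IEVS original is C, and the names below are made up.]

```python
import random

def tiebreak_winner(scores, rng=random):
    # Pre-order candidates by a random permutation, then return the first
    # candidate (in that order) whose score ties for the maximum. Every
    # tied candidate is equally likely to be picked, with no index bias.
    order = list(range(len(scores)))
    rng.shuffle(order)
    best = max(scores)
    for i in order:
        if scores[i] == best:
            return i
```

The failure mode Warren warns about is, e.g., always returning the lowest-indexed tied candidate, which systematically favors however the candidates happen to be numbered.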

Nevin Brackett-Rozinsky

Apr 22, 2015, 11:22:29 AM
to electio...@googlegroups.com
Say, does anyone know if there’s data available regarding “how strategic” voters actually are? One of the central parts of my program is the ability for the user to choose the proportions of voters who use each strategy at each level, so it would be nice to have concrete information on what’s realistic.


On Tuesday, April 21, 2015 at 8:01:26 AM UTC-4, Brian Goldman wrote:
I would like to request an optional variant in which you draw from two normal distributions. That way you can see how the system reacts under polarization, such as the bimodal distribution of modern US politics.

That’s an interesting idea, I’ll have to think about it. When the difference in means between the two Gaussians is around 2 standard deviations or less (e.g. μ = +1 and μ = -1), the combined distribution still “looks like” pretty much another Gaussian: slightly flattened, but basically similar to a normal distribution with a larger standard deviation. I mention this because, in the link you gave, the graphs just show the two parties, and it’s quite possible to imagine that there are plenty of independent voters who would “fill in” the center to make the actual population distribution still essentially normal. And if that’s the case, then it’s probably not worth making the model more complex, when the observed polarization of Congress may be due in large part to the use of plurality voting, and/or the relative importance of “rallying the base” (i.e. voter turnout in the tails of the distribution) versus competing for centrists (who may very well have already made up their minds).
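[Editor's sketch illustrating the point numerically, with made-up parameters: an equal mixture of N(-1, 1) and N(+1, 1) has mean 0 and variance 1 + μ² = 2, and at this separation (exactly 2σ between the means) its density is still unimodal, so samples behave much like a single wider Gaussian.]

```python
import random, statistics

def mixture_sample(n, mu=1.0, rng=random):
    # Equal-weight mixture of N(-mu, 1) and N(+mu, 1).
    return [rng.gauss(mu if rng.random() < 0.5 else -mu, 1.0)
            for _ in range(n)]

# Mixture variance is 1 + mu**2, so at mu = 1 the standard deviation is
# sqrt(2), matching a single, slightly flattened bell curve in practice.
```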



On Wednesday, April 22, 2015 at 8:10:43 AM UTC-4, Jameson Quinn wrote:
This is definitely interesting work. I've done something similar last year; see https://github.com/The-Center-for-Election-Science/vse-sim . Nevin: would you be interested in a real-time chat some time (skype or hangout or similar) to discuss the common issues? I think it is worthwhile to have different people re-implementing this independently, because there are plenty of judgment calls involved, and it's good to see if results are robust to how those calls are made. But "independently" doesn't mean we shouldn't talk about how we've done it, so as to better understand where and why our results are similar or different.

Glad to hear about other people doing similar things! And yes, I’d be happy to chat on a Google hangout.



On Wednesday, April 22, 2015 at 9:50:33 AM UTC-4, Warren D. Smith (CRV cofounder, http://RangeVoting.org) wrote:
Another thing you need to watch out for is tiebreaking.
It is very important to break ties RANDOMLY, and
the way IEVS does this is by pre-ordering the candidates by a random
permutation, then picking the first of the tied candidates. There are a lot of
ways to be confused and bias your randomness. That will lead to large
statistical effects which will be totally misleading.

Warren D. Smith 

By “tiebreaking”, do you mean when an individual voter has to decide how to treat two candidates whom she or he views as being precisely equal, or when a voting system yields an exact tie in the tallies?

If the former, that essentially never happens because I am doing all my calculations in double-precision floating point. So the odds of two independently-calculated values coming out the same are negligible.

If the latter, I am not using any tiebreak at all: when a voting system says the election is a tie, then it gets recorded as a tie. In other words, that voting system does not choose a winner for that election at all. Thus, I am also keeping track of “How many elections had a winner under each voting method” as one of my statistics.

In any event, I am using arc4random and arc4random_uniform to generate all of my random values. I’ll probably switch to arc4random_buf if I ever refactor the code to use vector operations on contiguous blocks of memory.

Jameson Quinn

Apr 22, 2015, 11:29:12 AM
to electionsciencefoundation
I have done an experiment on Mechanical Turk addressing just this question. I could let you have a look at the raw data, if you're interested. Again, when do you want to talk?


Nevin Brackett-Rozinsky

Apr 22, 2015, 12:32:28 PM
to electio...@googlegroups.com
That would be great, how about this Sunday?

Dick Burkhart

Apr 22, 2015, 1:11:59 PM
to electio...@googlegroups.com

There may be applications where distance makes more sense than correlation. But in that case, instead of ratings like -3,-2,-1,0,1,2,3, they should be 0,1,2,3,4,5,6. It’s just that in my experience people more often use ratings like the former in their head, even if they see the latter on paper. That is, if they give a middle rating for an issue, it doesn’t mean that it’s really important to them that their preferred candidate also rates that issue in the middle. It’s much more likely that the voter simply doesn’t know much about the issue, or care much about it, or has ambiguous feelings about it, and is willing to let the candidate take a stronger position either for or against. Standard correlation (with 0 in the middle) handles this perfectly.

 

My biggest issue with distance is that I want to focus just on the issues most highly rated by a voter. Those are the ones that determine the voting blocs, or clusters, I am seeking. Distance weighs similarity over all issues (unless the voter can artificially restrict the issues to his or her most highly rated ones). With correlation I can easily zero out all negative ratings by a voter, to eliminate their effect. This also eliminates strategic voting against partisan opponents, decreasing mudslinging.

 

In addition, note that the rating difference between 1 and 2 is the same as between 2 and 3, whereas in correlation 2 × 3 = 6 carries 3 times the weight of 1 × 2 = 2, putting a much greater emphasis on agreement on the most highly rated issues, as most voters would want.

 

You could easily test both distance and correlation to see how they work in different situations.

 

Dick Burkhart


 


Nevin Brackett-Rozinsky

May 3, 2015, 11:15:35 AM
to electio...@googlegroups.com
Okay, I’ve added a bit more to the model. It is now possible to specify the standard deviation of the population’s opinions. I have not added “multi-peaked” distributions, but the spread can be as wide as you like.

Additionally, it is also now possible to specify “uncertainty” regarding candidate positions. When this is 0 it works as before, but when it is positive the voters do not know exactly what each candidate’s positions are. Instead, there is a Gaussian centered at the candidate’s true position, representing how likely it appears before the election that the candidate might hold each possible position.

The correlation—meaning the product integrated over the whole issue space—of that nebulous-position function for a candidate, with the utility function of each voter (which is another Gaussian centered at the voter’s position), gives the voter’s expected utility for the candidate based on what the voter knows before the election. The voter uses that value as their honest opinion when forming their ballot.

However, the true utility is simply the voter’s utility function evaluated at the candidate’s true position. That is how good the candidate would actually be for the voter if elected. The voter does not know this before the election though, which simulates imperfect information.
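[Editor's sketch: in one dimension, with a unit-variance voter utility kernel exp(-(x - v)²/2), the integral described above has a closed form. The Python below (hypothetical names) checks that form against a Monte Carlo average over the candidate's uncertain position.]

```python
import math, random

def expected_utility_mc(v, c, sigma, n=200_000, rng=random):
    # Average the voter's utility kernel over draws of the candidate's
    # uncertain position x ~ N(c, sigma^2).
    return sum(math.exp(-(rng.gauss(c, sigma) - v) ** 2 / 2)
               for _ in range(n)) / n

def expected_utility_closed(v, c, sigma):
    # Integral of exp(-(x - v)^2 / 2) against the N(c, sigma^2) density:
    # the uncertainty widens the effective variance from 1 to 1 + sigma^2.
    return (math.exp(-(v - c) ** 2 / (2 * (1 + sigma ** 2)))
            / math.sqrt(1 + sigma ** 2))
```

At sigma = 0 the closed form reduces to exp(-(v - c)²/2), the true utility, matching the zero-uncertainty case described in the message.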

I am not certain if this actually improves the model, or if it just impairs the ability to compare voting by adding extraneous noise. So the option of setting uncertainty to 0 remains, in which case voters use their true utility as their honest opinion.

The strategic portion of the simulation remains the same as before.