**Lies are more variable than truths.**

*signal* -- which has a value for every event. There might be a 51% group of disgruntled Romney supporters willing to lie about the election, but what if there were 19 other events they had to simultaneously report on? The chances are much lower that the members of this dishonest group are willing and able to perfectly coordinate their lies across so many different events!

*even more true* if we consider the users' reports across many events. Put another way, by increasing the number of times each voter's signal is sampled, we greatly increase the resolving power of the Schellingcoin idea. This idea is the foundation of Augur's event resolution mechanism (as well as that of Truthcoin, Augur's theoretical inspiration).

In Augur, *truth equals consensus*. As such, the Augur oracle is intended to be used for events which are easily and objectively determinable after the event has occurred. (For example, the winner of an election.) However, there are many cases where consensus might not reflect the truth, such as events where

- ...there is ongoing controversy about what happened (e.g., "Malaysia Airlines Flight 370 was brought down by terrorists")
- ...the outcome is subjective (e.g., "was Carter a good President?")
- ...the outcome is unreasonably difficult to determine (e.g., "what is President Obama's checking account balance?")

These questions are *not* good candidates for Augur! In fact, all questions include a "this was a bad question" answer, in case a user is asked to report on an ill-defined or unanswerable event.

(Of course, Augur's oracle can dutifully report the consensus in these cases -- and the consensus very often will be "this was a bad question" -- but it is up to you, the user, to judge whether this consensus is an accurate reflection of the truth.)

- How do you determine the amount that each user contributed to the overall variability in this table?
- How do you determine what the correct event outcome actually was?
- How do you incentivize people to report honestly?

*Reputation* -- a tradeable token -- is needed to report on events. Each user's report is weighted by the number of Reputation tokens the user has. Reports are made on events (after they expire) at fixed, pre-defined intervals. (In this simple example, each user starts with a single Reputation token.)

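Suppose there are *E* total events being reported on by *N* users, and let **r** be a column vector of Reputations, normalized so that its elements sum to one. We first take the table **B** of reports and construct its *weighted covariance matrix*, where the weights are Reputations. The centered table (the table with each event's weighted mean value subtracted out) **X** is:

$$\mathbf{X} = \mathbf{B} - \mathbf{1}\,\mathbf{r}^T \mathbf{B}$$

where **1** is a column vector of ones and *T* denotes the transpose operation (swapping columns and rows). The ordinary (unweighted) covariance matrix is given by multiplying the centered data matrix and its transpose; similarly, the weighted covariance matrix is:

$$\mathbf{\Sigma} = \frac{\mathbf{X}^T \mathbf{R}\, \mathbf{X}}{1 - \sum_{i=1}^{N} r_i^2}$$

where **R** has the elements of **r** on its diagonal and zeros elsewhere. Each row/column of **Σ** corresponds to the variability *across* users, within a single event. Since there are *E* events, **Σ** is an *E* x *E* matrix. Next, we diagonalize **Σ**,

$$\mathbf{\Sigma} = \mathbf{S} \mathbf{\Lambda} \mathbf{S}^T$$

where the columns of **S** are the eigenvectors of **Σ**, to obtain **Λ**, an *E* x *E* matrix with **Σ**'s eigenvalues on its diagonal, and zeros elsewhere:

$$\mathbf{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_E), \qquad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_E$$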

The eigenvector **s**_1 associated with the largest eigenvalue λ_1 is the *principal component*, which is a unit vector oriented in the direction of as much of the report matrix's variability as can be captured in a single dimension. This direction is a new coordinate axis. By projecting the (centered) data onto this axis, we obtain a *non-conformity score* for each user:

$$\mathbf{c} = \mathbf{X}\, \mathbf{s}_1$$

Element *c_i* of this vector then tells us user *i*'s contribution to the direction of maximum variance. Using these scores for event consensus is called *Sztorc consensus* (named after its ruthlessly strategic and paranoid inventor, Paul Sztorc). These scores are then used to reward users who reported with the consensus (low scores) and punish those who voted against it (high scores).
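The scoring step can be sketched in a few lines of NumPy. This is only an illustration, not the pyconsensus implementation: the function name `sztorc_scores` is hypothetical, Reputation is assumed to be normalized to sum to one, and the covariance is assumed to use the common 1 − Σ r_i² correction for normalized weights. The sign of an eigenvector is arbitrary, so the sketch leaves the scores signed and compares magnitudes.

```python
import numpy as np

def sztorc_scores(reports, reputation):
    """Hypothetical helper: project centered reports onto the first principal component.

    reports    -- N x E table of reports (rows are users, columns are events)
    reputation -- length-N vector of Reputation (needs at least two holders)
    """
    B = np.asarray(reports, dtype=float)
    r = np.asarray(reputation, dtype=float)
    r = r / r.sum()                      # assumption: normalize Reputation to sum to one

    X = B - r @ B                        # subtract each event's weighted mean
    # Weighted covariance across users, one row/column per event (E x E).
    Sigma = (X.T * r) @ X / (1.0 - np.sum(r ** 2))

    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the principal component.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    s1 = eigvecs[:, -1]

    c = X @ s1                           # one (signed) score per user
    return c, s1

# Five users agree; the sixth reports the opposite outcome on every event.
B = [[1, 1, 0, 0]] * 5 + [[0, 0, 1, 1]]
c, s1 = sztorc_scores(B, [1] * 6)
print(np.abs(c))                         # the sixth user has the largest magnitude
```

Because only the leading eigenvector is needed, an iterative method such as power iteration could replace the full decomposition when there are many events.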

*cumulative* percentage explained. This plot also suggests a simple way of incorporating more directions into our results: just keep adding eigenvectors until we reach some pre-defined threshold!

A *variance threshold* (α) allows us to extract the number of components (*n*) needed to explain at least α * 100% of the report matrix's total variance. The *non-conformity vector* **c** is calculated by summing together the projections onto the first *n* components, each of which is weighted by its eigenvalue λ:

$$\mathbf{c} = \sum_{j=1}^{n} \lambda_j\, \mathbf{X}\, \mathbf{s}_j$$

This is the *fixed-variance threshold* method, first described in the Augur whitepaper. Although this method has improved resolving power (see below), it has the downside of being more computationally demanding than Sztorc consensus, as it requires the computation of the first *n* eigenvectors.
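As a rough sketch of the thresholding step, the code below picks the smallest *n* whose cumulative share of the eigenvalues reaches α and then forms the eigenvalue-weighted sum of projections. The function name `fixed_variance_scores` is hypothetical, and two details are assumptions rather than anything stated here: Reputation is normalized to sum to one, and each eigenvector is flipped so that its largest-magnitude entry is positive, which makes the sum independent of the arbitrary sign returned by the solver.

```python
import numpy as np

def fixed_variance_scores(reports, reputation, alpha=0.9):
    """Hypothetical helper: eigenvalue-weighted projections up to a variance threshold."""
    B = np.asarray(reports, dtype=float)
    r = np.asarray(reputation, dtype=float)
    r = r / r.sum()                                   # assumption: normalized Reputation

    X = B - r @ B                                     # center on weighted event means
    Sigma = (X.T * r) @ X / (1.0 - np.sum(r ** 2))    # E x E weighted covariance

    eigvals, S = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)      # drop tiny negative round-off
    S = S[:, order]

    # Assumed sign convention: largest-magnitude entry of each eigenvector is positive.
    pivots = S[np.argmax(np.abs(S), axis=0), np.arange(S.shape[1])]
    S = S * np.where(pivots < 0, -1.0, 1.0)

    # Smallest n whose cumulative explained variance reaches alpha.
    # (Assumes the reports are not all identical, so the total variance is nonzero.)
    explained = np.cumsum(eigvals) / eigvals.sum()
    n = min(int(np.searchsorted(explained, alpha)) + 1, len(eigvals))

    c = X @ (S[:, :n] @ eigvals[:n])                  # sum_j lambda_j * X s_j
    return c, n

B = [[1, 1, 0, 0]] * 5 + [[0, 0, 1, 1]]
c, n = fixed_variance_scores(B, [1] * 6, alpha=0.9)
print(n, np.abs(c))
```

When a single direction already explains more than α of the variance, as in this small example, the result is just the Sztorc scores scaled by λ_1.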

*we only care about the scores*. The scores tell you how much each user contributed to the direction of maximum variance.

**Σ** is an *E* x *E* matrix, where rows/columns account for each event's covariance across all users. But, if all we care about is each user's relative contribution to the overall variability, this information is available more directly by constructing the *per-user* covariance matrix **Σ**'. This is an *N* x *N* matrix, where each row/column represents a user's covariance, taken across all events. Summing across this matrix's rows, each user's contribution to the *total* variance and covariance is found as the ratio of that user's sum to the total:

$$c_i = \frac{\sum_{j=1}^{N} \Sigma'_{ij}}{\sum_{k=1}^{N} \sum_{j=1}^{N} \Sigma'_{kj}}$$

*when in doubt, simulate!*

- Users can be honest, in which case they just copy the correct answers down.
- Or dishonest/lazy/ignorant, in which case they just pick answers at random on their ballot.

(γ = 0.3.) Imagine that the dishonest/lazy/ignorant user calls up their buddy and says, "Hey Bob, can you just send a screenshot of your ballot over? I don't feel like filling this stupid thing out, I could be training for the upcoming hot dog eating contest instead." Dishonest users also have a γ^2 = 0.09 (9%) chance to copy in triples, and a γ^3 = 0.027 (2.7%) chance to copy in quadruples.
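One way to generate such ballots is sketched below. It is not the Simulator.jl code. The function name `simulate_reports`, the binary 0/1 outcomes, the `liar_fraction` parameter and the way copy groups are drawn are all assumptions made for illustration. Each dishonest user draws one uniform number: below γ the ballot is shared with one other dishonest user, below γ^2 with two, and below γ^3 with three.

```python
import numpy as np

def simulate_reports(num_users=20, num_events=10, liar_fraction=0.3, gamma=0.3, seed=0):
    """Hypothetical generator of a report table with honest and colluding users."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, 2, size=num_events)        # assumed binary outcomes
    is_liar = rng.random(num_users) < liar_fraction

    # Honest users just copy the correct answers down.
    reports = np.tile(truth, (num_users, 1))

    # Dishonest/lazy/ignorant users pick answers at random.
    liars = np.flatnonzero(is_liar)
    reports[liars] = rng.integers(0, 2, size=(liars.size, num_events))

    # Collusion: a dishonest user's ballot may be copied by 1, 2 or 3 others.
    for i in liars:
        u = rng.random()
        extra = sum(u < gamma ** k for k in (1, 2, 3))  # 0..3 additional copies
        others = liars[liars != i]
        if extra and others.size:
            copiers = rng.choice(others, size=min(extra, others.size), replace=False)
            reports[copiers] = reports[i]

    return reports, truth, is_liar

reports, truth, is_liar = simulate_reports(seed=1)
print(reports.shape, int(is_liar.sum()), "dishonest users")
```

The resulting table can be passed to any of the scoring methods, and the known `is_liar` labels then show how well each method separates the two groups.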

*Sztorc*, which uses only the users' scores on the largest eigenvector. *Fixed-variance* is a weighted sum of eigenvectors up to a fixed threshold of 90% variance. *Covariance* uses ratios from the per-user covariance matrix, and does not require matrix diagonalization.

- pyconsensus: standalone Python implementation of Augur's oracle
- Simulator.jl: numerical simulations, statistics and plotting tools

*Edited on 3/4/2015 to include a short discussion of what Reputation is, as well as the kinds of events that Augur's oracle is meant to address.*