Note:MathJax

Prediction Markets

Posted: 2017-10-03.

This post will introduce prediction markets based on cost functions and/or proper scoring rules. It will focus on intuition and background rather than technical details.

Follow the Money

Suppose your schoolyard friend Prediction Pat offers you a bet. You can pay Pat 50 cents now, and he will pay you a dollar if the Home Team wins the Big Game this year (against our rivals from Geographically Neighboring Area). If they lose, you get nothing and Pat keeps the payment.

Would you take the bet? Of course you would, just to prove your faith in the team. But imagine you were a member of Homo Economicus, and you had a belief that $q$ is the probability of winning. Then your expected net payoff for taking the bet is $q$ times one dollar minus 50 cents, or $q - 0.5$. So you would only take the bet if $q \gt 0.5$.

Now imagine Pat offers you the same bet -- a dollar if the Home Team wins -- but it'll cost you $p$, let's say for example $p = 0.1$. Then you (as a strictly rational agent) would be willing to take the bet even if you think there's not much chance of a win, say $q = 0.2$, because your expected net payoff is $q-p$.

Notice -- the price you're willing to pay for the bet reveals your belief about the game.

I Don't Wanna Grow Up

Jump forward 10 years, and Pat has become a bit more enterprising. For one thing, the bets are now termed contracts. Also, he's set up a market offering contracts for purchase and sale to all comers. For now, let's consider the simplest case: contracts that pay off $\$1$ if a certain event occurs (say Home Country wins the Olympics in A Particular Sport).

Pat has participants arrive to the market one at a time. (In real stock markets, and some prediction markets, participants trade directly with each other via e.g. continuous double auctions, but these are too messy for theorists of prediction markets. And Pat was always a theorist at heart, so...)

When participant $t$ arrives, Pat offers a price $p^{(t)}$ for buying the contracts. A single share costs $p^{(t)}$ (let's say for now) and pays off $\$1$ if the event occurs, $\$0$ otherwise. The participant can buy multiple shares, or even a fraction of a share: If they buy $0.3$ shares, they get a contract that costs $0.3p^{(t)}$ (let's say for now) and pays off $\$0.30$ if the event occurs.

Now, let's say the current price is $p^{(t)}$, but the participant only thinks there's a chance $q \lt p^{(t)}$ of the event occurring. Then the participant would expect to lose money by purchasing a share. So Pat also offers participants the option to sell shares: Pat pays the participant $p^{(t)}$, and she must pay back $\$1$ if the event occurs. Notice that her expected payoff for one share of this contract is $p^{(t)} - q$. (Pat's in finance these days, so he calls this short-selling.)

Name Your Price

Now the only quesiton left for Pat is how to set prices. This is how participants' beliefs are aggregated into a prediction, as prices adjust to match demand. Naturally, if lots of people are buying contracts at the current price, it's probably too low, and vice versa. In fact, the same reasoning applies to any one person: if she wants to buy a ton of contracts, probably the price should go up.

Here's the (theoretical) prediction market approach (due roughly to Hanson 2003, Chen and Pennock 2007, and Abernethy et al. 2013): Pat keeps track of the total number of contracts (or "shares") sold in the event ocurring. Each time an additional share is purchased, the price goes up a bit. If a single person wants to buy many shares, she has to pay the increased price for each additional share.

More formally, Pat specifies a cost function $C: \mathbb{R} \to \mathbb{R}$. If the total number of shares sold so far is $\theta$, then the price for buying the next share is $C(\theta + 1) - C(\theta)$. The price for short-selling a share, i.e. decreasing the total number of outstanding shares by $1$, is by the same logic $C(\theta - 1) - C(\theta)$.

Now, if $C(\theta) = \alpha \theta + \beta$ for constants $\alpha, \beta$, then buying one share would always cost exactly $\alpha$, and selling a share would always gain exactly $\alpha$. That's no good, since the price never adjusts based on how many shares have been purchased. Earlier, we said that if many people are buying shares, prices should go up (and vice versa); i.e., as $\theta$ increases, the price $C(\theta + 1) - C(\theta)$ should increase. This is essentially a statement that the derivative of $C$ should be increasing, i.e. $C$ should be a convex function.

Notice that this pricing algorithm works for purchasing fractions of a share as well, e.g. purchasing some small amount $d\theta$ has a price $C(\theta + d\theta) - C(\theta)$. As $d\theta \to 0^+$, this price approaches $\frac{dC(\theta)}{d\theta}$ (assuming $C$ is differentiable). So $\frac{dC(\theta)}{d\theta}$ represents the current "market price", or prediction of the probability of the event.

Another way to view this is that, for small purchase amounts, the current derivative of $C$ is approximately the price per share paid. So we can write $p^{(t)} = \frac{dC(\theta)}{d\theta}$ where $\theta$ is the total number of shares sold up to time $t$.

And Now For Something Completely ... Equivalent

Remember proper scoring rules? Of course you do -- and you're not the only one. It turns out that Pat's long-lost twin, Prediction Patti, also wants to run a market to predict the Olympic Game, but is beginning with a more principled approach. She knows that she can use a proper scoring rule $S(p,Y)$ where $Y = 1$ if Home Country wins and $0$ otherwise. A participant reporting a prediction $p$ will maximize expected profit by being truthful.

But it's a bit expensive to offer a proper scoring rule to every single participant $t=1,\ldots,T$ who comes along. So Patti utilizes a key idea from Robin Hanson (2003): the so-called "market scoring rule". She makes an initial prediction herself, call it $p^{(0)}$. The first participant makes a new prediction, $p^{(1)}$, and his eventual payoff is $S(p^{(1)},Y) - S(p^{(0)},Y)$. In other words, he gets paid the difference in score between his prediction and Patti's. Because Patti's score is not under his control, he still maximizes expected score by reporting truthfully.

Similary, for any participant $t$, Patti (who likes to take a recursive approach in life) will ask him for a prediction $p^{(t)}$ and pay him $S(p^{(t)},Y) - S(p^{(t-1)},Y)$. It's worth noting that for reasonable scoring rules, participants will stand to make or lose small amounts of money if their new prediction is a small modification of the previous one, whereas it will require a very large "budget" to make a vastly different prediction.

Exercise. What is Patti's total payment over all $T$ participants?

Now, you might think that Pat and Patti have used completely different, irreconciliable approaches. But you might also think, based on standard narrative conventions, that their paths are destined to eventually meet, at which point they team up to recover the long-lost family fortune, or restore honor to the family name, or so on. Well, I can see there's no fooling you -- that's basically correct. They just need a little reunion help from a mutual friend: Duality Doug.

Duality Doug

Doug observes that Patti's market can still be phrased as a sequence of contracts. Furthermore, using the proper scoring rule characterization, he knows that we can write $S(p,Y) = G(p) + dG(p) \cdot (\delta_Y - p)$ for some convex function $G$ and choice of subgradient $dG(p)$. And finally, he recalls the basics of convex duality. So given convex function $G$, he lets $C$ be its convex conjugate, and observes that \begin{align*} S(p,Y) &= G(p) - dG(p)\cdot p + dG(p) \cdot \delta_Y \\ &= - C(\theta) + \theta \cdot \delta_Y \end{align*} where $p$ and $\theta$ are conjugate pairs (i.e. $\theta = dG(p)$). So in Patti's scoring rule market, participant $t$ gets the following "contract" (where $\theta^{(t)} = dG(p^{(t)})$ and so on): \begin{align*} S(p^{(t)},Y) - S(p^{(t-1)},Y) &= - C(\theta^{(t)}) + \theta^{(t)} \cdot \delta_Y + C(\theta^{(t-1)}) - \theta^{(t-1)}\cdot \delta_Y \\ &= C(\theta^{(t-1)}) - C(\theta^{(t)}) + d\theta \cdot \delta_Y \end{align*} where $d\theta = \theta^{(t)} - \theta^{(t-1)}$. In other words, the participant gets the exact same payoff in Patti's market as he would by purchasing $d\theta$ shares in Pat's market, if Patti uses scoring rule $S$ and Pat uses cost function $C$. Meanwhile, notice that by definition of conjugate pairs, we have at market share state $\theta$, the current prediction is $p = dC(\theta)$ where $dC(\theta)$ is a subgradient of $C$ at $\theta$.

So what do we have? (Roughly) every cost-function-based market is equivalent to some scoring-rule-based market, and vice versa. The equivalence lies in the convex duality between cost functions $C$ and expected-score functions $G$. (Three cheers for Doug, everybody!)

So far, we've only discussed prediction markets over a binary event (e.g. win/lose) and glossed over any technical details. In a future post, we'll get a bit further into the math, such as differentiability of the various convex functions and extend this equivalence much farther.

History

Hanson (2003) really had the breakthrough idea, which was to use the "market scoring rule" structure. Hanson focused on the logarithmic scoring rule, but the principle of course extends to any scoring rule. Chen and Pennock (2008) further developed the connection between scoring rules and cost functions, and this connection was fully fleshed out by Abernethy, Chen, and Wortman-Vaughan (2013). (In his 2013 thesis, Rafael Frongillo also elucidated the point that a scoring rule $S(p,y)$ can be viewed as a cost-function-based contract.)

PROBLEM 1.
Suppose that Pat and Patti wish to predict an event with many possible outcomes, e.g. the winner of the Olympic 1500m final. Let $\mathcal{Y}$ be the set of all competitors. Patti's scoring rule market can simply employ a proper scoring rule $S: \Delta_{\mathcal{Y}} \times \mathcal{Y} \to \mathbb{R}$. Derive the equivalent cost function market. In particular, interpret the "market state" $\theta$ and the "instantaneous prices" $\nabla C(\theta)$.

References

(2003) Robin Hanson. Combinatorial information market design. Information Systems Frontiers.

(2007) Yiling Chen and David M. Pennock. A utility framework for bounded-loss market makers. Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence.

(2013) Jacob Abernethy, Yiling Chen, and Jennifer Wortman Vaughan. Efficient market making via convex optimization, and a connection to online learning. ACM Transactions on Economics and Computation.