The Missing Primitive in Social Prediction Markets Is the Cost of Opinion

For most of the past decade, the standard defence of prediction markets has run on two pillars. First, that they aggregate dispersed knowledge better than any other available mechanism: wisdom of crowds, Iowa Electronic Markets, the Hayekian argument from prices. Second, that they incentivise honest expression because traders have skin in the game.
Both claims contain real truth. Neither names the missing primitive that actually distinguishes a working prediction market from a Twitter poll, a comment section, or a Discord vote.
That primitive is a cost on having an opinion. Without it, none of the wisdom-of-crowds machinery has anything to aggregate. With it, the rest follows. The cost can take more than one form, but it has to exist, and it has to compound.
What Free Opinions Look Like
Twitter polls launched in October 2015. At their peak in the late 2010s they drove millions of votes per day and were the most ambitious attempt to aggregate public opinion in the history of the consumer internet. They are now functionally dead as a serious signal, used mostly for engagement bait or community ritual, almost never cited as evidence of anything.
The reason is not that the format failed. The reason is that the opinions were free in the only sense that matters: they were not attached to a persistent, public, identity-bound record.
A Twitter poll vote costs nothing financially, which is the obvious point. But it also costs nothing reputationally. The voter does not have to defend the choice, does not have to remember it, does not have to face it again. The vote is anonymous in aggregate, ephemeral in display, and never accumulates into a forecasting record the voter has to live with. There is no Brier score on your Twitter profile, no public history of polls you got wrong, no scoreboard that follows you between questions.
Consider the contrast with the Hollywood Stock Exchange. HSX, launched in 1996, has been quietly running a prediction market on movie box office and award outcomes for almost three decades. It uses entirely virtual currency; there is no financial cost to participation. And yet in 2007, HSX players collectively predicted 32 of 39 major-category Oscar nominees and 7 of 8 top-category winners. It has been studied by economists for years as a working example of dispersed-information aggregation. No money changes hands at any point.
This would seem to contradict the "skin in the game" argument. Players are not staking real capital. Why does it work?
Because HSX makes opinions costly the other way. Every player has a public portfolio. A persistent ranking. A trading history that follows the username from one prediction to the next. A leaderboard that displays who has been right and who has been wrong over months and years. Calling a movie's gross is reputationally costly in exactly the same way calling an election on a financial market is. You commit publicly to a position you will be measured against, and the measurement persists.
The principle is not "real money is required." The principle is that opinion has to be costly in some way that compounds into identity. Financial cost is one form. Persistent reputational cost is another. Twitter polls have neither, which is why they generated enormous volume and almost no signal. HSX has the second without the first, which is why it has been quietly accurate for thirty years.
What Costly Opinions Do
When opinion is costly (financially, reputationally, or both), three things happen simultaneously. The three together are the entire mechanism worth caring about.
Filtering. The expression is screened through a willingness-to-pay test. Not every opinion the user holds, only the ones held with enough conviction to absorb the cost. The filter is imperfect. Risk preferences, capital constraints, gambling impulses, and status motivations all introduce noise. But it is the only filter the consumer internet has at any scale.
Recording. The position is logged. It is attached to an identity. The next time anyone wants to know what this user thought about this question, the answer is unambiguous and timestamped. There is no "I always said it would happen" revisionism, no quiet pivot. Persistence does not require permanence; it requires that the cost of changing the record exceeds the cost of living with it.
Compounding. Being right or wrong on a costly opinion is now visible to anyone who looks. In the social variant, that visibility happens in the same feed where the rest of the user's reputation is being built. Reputation as a predictor becomes a durable asset, or a durable liability. The next prediction is more costly to make casually because the audience now knows what to weigh.
These three effects compound in the literal sense. Filtering produces signal. Signal makes the record meaningful. The meaningful record raises the cost of future predictions. Over enough cycles, the system builds something none of the free-opinion systems can: an identity layer for opinion.
Why This Isn't Truth-Finding (Honest Accounting)
The wisdom-of-crowds defence of prediction markets is technically true and substantively misleading.
The empirical record on prediction-market accuracy is real but more modest than the headline claim suggests. The Iowa Electronic Markets (running since 1988 and the most extensively studied prediction market in academic literature) produces an average absolute error of 1.33 percentage points across fourteen US presidential election eves. The IEM has been closer to the outcome than polling 74% of the time over its history. These are real results.
But the same body of literature is more careful than the popularisations. Erikson and Wlezien's 2012 historical reassessment, comparing markets to polls across the 1936–2008 elections, found that "polls have not performed any worse than the early election markets" and that "when both market prices and polls are available, prices add nothing to election prediction beyond polls." Combined-method forecasts (markets + polls + expert judgment + quantitative models) consistently outperformed any individual component, including the markets.
The Polymarket 2024 election cycle is the most recent test case and tells a similarly mixed story. Cumulative volume on the presidential market exceeded $3.5 billion, making it one of the largest concentrated political prediction markets ever assembled. The headline result was that Polymarket called the Trump victory more confidently than mainstream polls did. The detail is more complicated.
A single French trader operating under the pseudonym "Théo" placed over $45 million on Trump across multiple accounts, eventually winning more than $80 million per Chainalysis tracking. Four whale accounts held approximately 25% of the Trump Electoral College contracts and 40% of the Trump popular vote contracts. In October, Polymarket showed Trump with a 32-percentage-point lead at moments when traditional polls had the race within the margin of error. The market got the ultimate result right. It also exhibited concentration that would be a serious flag in any other context.
Markets reflect trader belief distributions. Trader belief distributions are systematically biased: by recency, by narrative dominance, by who has capital, by whether a question is plausibly arbitrageable. The favourite-longshot bias (empirically robust across pari-mutuel, bookmaker, and competitive market structures, persisting for over fifty years, with longshots returning roughly -61% versus -23% for random betting) is one well-documented example. The Manifold Markets community calibration data shows aggregate Brier scores around 0.168: accurate enough to be useful, far from the asymptotic limit of zero.
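For readers unfamiliar with the metric, a Brier score is just the mean squared error between probability forecasts and binary outcomes: 0 is perfect, and a forecaster who always answers 50% on binary questions scores 0.25, which is the benchmark the 0.168 figure should be read against. A minimal sketch (the function name is mine, not any platform's API):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts (0..1)
    and binary outcomes (0 or 1). Lower is better; 0 is perfect."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# The uninformed baseline: always saying 50% scores exactly 0.25.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25

# Confident and mostly right lands much closer to zero.
print(brier_score([0.9, 0.8, 0.2, 0.7], [1, 1, 0, 1]))  # 0.045
```

On this scale, an aggregate of 0.168 is clearly better than coin-flipping but well short of the confident-and-right regime, which is the "useful but far from asymptotic" point made above.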
This is not a critique. It is just an accounting. "Truth-finding" is a description of one occasional output of prediction markets, accurate within bounded conditions. The fundamental thing being installed is the cost mechanism. Truth-tracking is a downstream effect of the mechanism being installed well.
The Tradeoff That Defines the Design Space
If costly opinion is the missing primitive, the design problem becomes: how costly, and in what currency?
Three platforms illustrate the range, and what each tells us about the limits.
Augur (2018): What Happens When the Cost Is Right but the Friction Is Catastrophic
Augur launched on Ethereum mainnet in July 2018 as the first fully decentralised prediction market protocol. It was technically ambitious and ideologically pure: no curation, on-chain settlement, REP token-based oracle resolution. The financial cost was real and the reputational record was on-chain and permanent. By the costly-opinion thesis, this should have worked.
It did not. Daily active users dropped from 265 in early July 2018 to 37 by August 8, an 86% collapse in a month. Peak TVL across the protocol's entire life remained around $3 million, a rounding error in the broader context of online wagering volumes. By 2024, Augur's share of the prediction market category had fallen below 10%.
The post-mortems converge on three causes: gas costs that made small markets uneconomic, oracle disputes that took weeks to resolve, and a UI that required novice users to navigate Ethereum tooling. The cost structure was correct in principle but stacked with so much friction at every stage that ordinary participation was infeasible. Augur is a clean case study of the upper-bound failure mode: cost the user something meaningful and the user stops showing up regardless of the philosophical merits.
Hollywood Stock Exchange (1996): What Happens When the Cost Is Reputational and the Friction Is Almost Zero
HSX is the lower-bound counter-example. No financial cost. Trivial UX (it has been a casual web product for nearly thirty years). Wide audience by prediction market standards.
What it does have: persistent identity, public portfolios, leaderboards, decade-long trading histories. The cost to opinion is entirely reputational, but the reputational cost is real because the record is durable and visible. The result is a system that has produced consistently accurate forecasts on movie box office and award outcomes for nearly three decades, despite no money ever changing hands.
HSX matters because it demonstrates the financial-cost design choice is one implementation of the underlying primitive, not the primitive itself. The primitive is identity-attached, persistent, public commitment. Money is one way to make commitment costly. Reputation is another. Either works if the cost actually compounds.
Polymarket (2020): What Happens When the Cost Works but the Audience Filter Is Sharp
Polymarket is the case study for the high-end equilibrium: financial cost plus on-chain identity, integrated with crypto-native social fabric (Crypto Twitter, podcast culture, Discord). The 2024 election demonstrated it can produce $3.5B in concentrated volume on a single market, can move headline narratives, and can generate genuine signal value when the questions are well-defined.
But it also demonstrated the limits of the audience filter. The four-whale concentration on Trump positions, the $45 million single-trader exposure, the moments of 32-point spreads against tight polling — these are not signs of a broken market. They are signs of what happens when the cost barrier is high enough that the participant pool is small enough that individual capital can move headline numbers. The signal is real. The signal is also distorted by the same property that makes it credible.
Why the Equilibrium Is Hard
The mid-curve equilibrium is a cost low enough for ordinary participation, high enough for genuine signal, embedded in a social fabric strong enough that the signal travels. No platform has hit it. Three obstacles stand in the way, and they intersect.
The bootstrapping problem. A prediction market with no liquidity is not tradeable; a market with no traders has no liquidity. Augur's $3M peak TVL is the canonical illustration of failure here. Bonding curves are a recent attempt at sidestepping the problem (they let a market be tradeable from the first position taken without requiring an order book) but they introduce price impact and slippage that distort signal at thin volumes. The tradeoff has not been eliminated, only relocated.
The resolution problem. Subjective questions are the most interesting and the hardest to resolve cleanly. Augur's 2023 oracle outage caused a 28% volume drop. A single resolution failure was sufficient to durably damage trust. Manual resolution (Polymarket's UMA-based human arbiter model, Kalshi's regulated arbitration) introduces dispute risk, takes hours to days, and carries an obvious conflict-of-interest problem when the arbiter has economic exposure to the outcome. Fully automated resolution against external feeds requires question design that pre-filters out the most interesting subjective questions. The pattern that has emerged in the last 18 months (multi-model consensus protocols, with cryptographic attestation that the correct process ran) is the first credible architecture for resolving subjective questions at the speed and cost that short-window social markets require. Whether it holds up in production at scale is the open empirical question.
The social fabric problem. A market that does not propagate beyond its participants is closed. Polymarket markets are widely cited but the active trading community is small in absolute terms. The headline volumes mask significant whale concentration. Kalshi has the regulatory imprimatur but limited cultural penetration outside dedicated forums. Twitch Predictions are culturally pervasive but uncosted. Integration of costly opinion with the social platforms where conversations already happen (at scale, not just at the margin) has not been done well.
Any given platform typically solves at most two of these three obstacles. They have not yet been solved together.
Why Now
The design space has only recently become navigable. Three forces have converged that were not all present even five years ago.
AI makes subjective resolution viable. Until LLMs could reliably evaluate social-context outcomes against consistent criteria, "Did Musk tweet X by Y?" required either human moderators (slow, contested, costly) or question design that pre-filtered out anything subjective (boring, narrow). The architecture that has made this credible is not a single LLM applying judgment. That introduces hallucination risk and single-point failure the field rightly distrusted. It is multi-model consensus protocols where independent frontier models evaluate in isolation and resolution requires agreement. This is the principle Karpathy has articulated around probabilistic systems: you do not trust a single forward pass; you architect for verification. The obstacle was load-bearing for two decades. The architecture to dissolve it is now operational.
Crypto enables programmable cost. Bonding curves, on-chain settlement, permissionless market creation, and low-friction custody are crypto-native primitives. Pre-crypto, every prediction market needed a centralised custodian, manual settlement, and a regulatory licence. The set of platforms that could exist was small and the cost structures rigid. Programmable cost (a curve that scales with conviction, settlement that does not require trust, market creation that does not require approval) opens design choices that were previously closed.
Social platforms have normalised public identity layers. Compounding identity-attached commitment requires existing public-identity infrastructure to compound into. The 2010s built it. Persistent usernames, follower counts, durable reputation, public histories. These are now culturally taken for granted in a way they were not when HSX launched in 1996 or Augur launched in 2018. Reputation as a financial-grade asset class is normalised. The host platforms are finally ready for prediction layers to ride on top of them rather than build their own from scratch.
None of these three is sufficient on its own. Together, they make the equilibrium navigable for the first time. Whether anyone successfully navigates it is a separate question. But the question can now be asked.
What a Successful Design Would Have to Hold
Without claiming any specific platform will achieve this, the design principles for a social prediction market that breaks out look like this:
Cost calibrated to ordinary participation. Whether financial or reputational, the cost has to be high enough to filter and low enough that the audience does not pre-select to crypto-natives or licensed bettors.
Costs that scale with conviction. A flat fee is regressive on signal. A bonding curve, or any mechanism where price impact rises with position size, naturally surfaces conviction in the price.
Resolution that handles subjectivity at acceptable speed. Most interesting opinions are subjective. Automated resolution that cannot handle subjectivity selects against the most valuable markets. Resolution latency that exceeds the relevant social half-life kills the market.
Native integration with existing social fabric. A standalone prediction-market (PM) platform asks users to migrate from where they already build identity. A PM embedded in the platform where opinions already live asks them only to make costly the takes they were already going to express.
If a system holds all four, it has a real chance at the equilibrium. None has cleanly held all four yet.
Where Kash Sits: The Specific Bets
Kash is one attempt to navigate the friction curve and the three obstacles simultaneously. The specific bets are worth naming.
Bet 1: cost should sit between Twitter polls and Polymarket, not at either end. Polymarket optimised for institutional-grade liquidity and won the high-end equilibrium, at the price of an audience filter that confines its signal to crypto-natives and political junkies. Twitter polls optimised for zero friction and got mass participation with zero signal. The unaddressed middle is where ordinary users will absorb $1–$5 of conviction cost on takes they actually care about. The bonding curve is the mechanism for getting cost to scale with conviction rather than gating entry behind a flat threshold.
Bet 2: cost should be financial AND reputational, compounded together. HSX shows reputational cost alone can work, but it required nearly thirty years of leaderboard accumulation to build the reputational durability that makes the cost matter. Polymarket shows financial cost alone produces signal but does not propagate. Combining real capital with persistent on-chain identity (Proof of Intelligence cards, public position history visible in the same feed where reputation already lives) is a bet that the two compound faster than either alone.
Bet 3: the market lives inside the platform where the conversation already happens. Polymarket's structural disadvantage is that taking a position requires leaving the feed. Quote-tweeting @kash_bot keeps the prediction inside the same surface where the opinion was formed. Whether this dissolves the social-fabric problem or merely binds the platform to one host's algorithm is genuinely uncertain, but the alternative (asking users to migrate) has a long failure record. Augur is the reference case.
Bet 4: subjective resolution at scale requires multi-model consensus plus cryptographic verification, and this architecture is now operational. The implementation matters here because the field's existing oracle designs do not support the product category. Chainlink and UMA are slow and expensive at the per-market level, which makes thin short-window markets uneconomic. Polymarket's human arbiter model carries conflict-of-interest at the resolution layer: a committee resolving a political market has skin in the outcome. The architecture Kash bets on is different: an LLM Council where independent frontier models evaluate the outcome against the original market conditions in isolation, consensus required to resolve, automatic escalation on disagreement. Resolution is wrapped in a zero-knowledge proof published on-chain. Users can verify the correct process ran without trusting the specific models or re-running inference themselves. The downstream consequence is structural, not incremental: markets that open and resolve in minutes at sub-$1 stakes become economically viable for the first time. That cost-and-speed envelope is the precondition for the drama-driven micro-markets the costly-opinion mechanism actually wants to host. If the architecture fails in production (model collusion, ZKP brittleness, edge cases the council cannot reach consensus on) the entire short-window category collapses with it. The early evidence is encouraging but not yet decisive.
What this implies others are getting wrong:
Polymarket and Kalshi are over-indexed on accuracy. Both treat "the market should produce a price that tracks reality" as the core promise. The data above suggests accuracy is a downstream property of a mechanism whose actual job is making opinion costly. Optimising for accuracy at the cost of ordinary participation has produced strong markets with limited broader reach, which is exactly what the price-discovery framing predicts and exactly what the costly-opinion framing identifies as the unsolved problem.
Twitch and Discord are over-indexed on engagement. Channel Points and free polls treat participation volume as the goal. Engagement without compounding identity is just engagement. It does not aggregate, does not create reputation, does not compound into anything that travels beyond the moment. They produce cultural product but not a signal layer.
The standalone-platform model is structurally limited. Every PM platform that asks users to leave their existing social home is fighting the same migration cost. The next phase will be PMs that integrate with where users already live, not PMs that try to be where users live. The host platform's algorithm becomes the distribution layer; the PM becomes the cost layer riding on top.
The oracle layer is being treated as plumbing when it is actually a product constraint. Every existing PM platform inherits its question categories from its oracle architecture. Chainlink, UMA, and Polymarket's human arbiters can resolve binary outcomes against external feeds reliably, but they cannot resolve subjective questions at the speed and cost short-window markets require, which means the entire category of fast, drama-driven, social-narrative markets is structurally unavailable to those platforms regardless of intent. The next phase of the field will be defined by which oracle architectures unlock which question types. Treating resolution as solved infrastructure rather than as the binding constraint on product surface is the most consequential mistake the incumbents are making.
The honest version: these are bets, not conclusions. The Polymarket whale data, the Augur DAU collapse, the IEM's modest accuracy advantage, the HSX precedent: these data points pull in different directions. The bet is that the missing primitive is cost, the cost is most usefully implemented as financial-plus-reputational, the host should be the social platform where the conversation already happens, and the oracle architecture is the binding technical constraint that makes the whole product category possible. If those bets are wrong, the next decade of PM design will look very different from the one Kash is betting on.