How Prediction Markets Work: The Real-Time Data Architecture Behind the Odds
Prediction markets let users trade contracts on the outcome of future events — elections, sports, economic indicators, AI milestones — where contract prices reflect the crowd's implied probability of each outcome. A $0.70 YES contract means the market estimates a 70% probability. Behind this simple interface lies a complex real-time data architecture: order matching engines processing thousands of trades per second, pricing algorithms continuously recalculating odds, oracle systems for event resolution, and settlement engines that instantly distribute payouts across hundreds of thousands of positions.
This article breaks down the technical architecture of modern prediction markets and shows how stream processing makes it all work in real time.
The Prediction Market Data Flow
Every prediction market follows this core data flow:
User places order (REST API) → Order matching (CLOB or AMM) → Price/odds update (real-time materialized view) → Position tracking (streaming join) → Event resolution (oracle feed) → Settlement (fan-out payout)
Each step requires real-time data processing. Let's examine each component.
Order Matching: CLOB vs AMM
Modern prediction markets use one of two models to match buyers and sellers:
Central Limit Order Book (CLOB)
Polymarket and Kalshi use a CLOB model — the same order book structure used by stock exchanges.
How it works:
- Buyers submit limit orders: "I'll buy YES at $0.65"
- Sellers submit limit orders: "I'll sell YES at $0.67"
- The matching engine fills orders whenever a bid meets or exceeds an ask, prioritizing the best price and then the earliest arrival
- The gap between the best bid and the best ask is the "spread"; tighter spreads indicate more liquid markets
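The matching rules above can be sketched in a few lines of Python. This is a minimal illustration of price-time priority, not the engine either exchange actually runs; the `Order` and `OrderBook` names, the integer-cent prices, and the rule that a trade executes at the resting order's price are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_arrival = count()  # global arrival counter: the time-priority tie-breaker


@dataclass(order=True)
class Order:
    sort_key: tuple = field(init=False, repr=False)
    side: str = field(compare=False)         # "buy" or "sell"
    price_cents: int = field(compare=False)  # limit price in cents, 1..99
    qty: int = field(compare=False)          # contracts still unfilled

    def __post_init__(self):
        # Bids sort by highest price, asks by lowest; ties go to the earlier order.
        signed = -self.price_cents if self.side == "buy" else self.price_cents
        self.sort_key = (signed, next(_arrival))


class OrderBook:
    def __init__(self):
        self.bids = []  # min-heap on sort_key, so bids[0] is the best bid
        self.asks = []  # asks[0] is the best ask

    def submit(self, order):
        """Match an incoming limit order, rest any remainder, return the fills."""
        if order.side == "buy":
            own, opposite = self.bids, self.asks
        else:
            own, opposite = self.asks, self.bids
        fills = []
        while order.qty > 0 and opposite:
            best = opposite[0]
            if order.side == "buy":
                crosses = order.price_cents >= best.price_cents
            else:
                crosses = order.price_cents <= best.price_cents
            if not crosses:
                break
            traded = min(order.qty, best.qty)
            fills.append((best.price_cents, traded))  # executes at the resting price
            order.qty -= traded
            best.qty -= traded
            if best.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:
            heapq.heappush(own, order)
        return fills

    def spread_cents(self):
        """Best ask minus best bid, or None if either side is empty."""
        if not self.bids or not self.asks:
            return None
        return self.asks[0].price_cents - self.bids[0].price_cents


book = OrderBook()
book.submit(Order("buy", 65, 100))         # rests: there are no asks yet
book.submit(Order("sell", 67, 50))         # rests: 67 is above the best bid of 65
print(book.spread_cents())                 # 2
print(book.submit(Order("sell", 65, 30)))  # [(65, 30)]
```

The two resting orders mirror the $0.65 bid and $0.67 ask from the list: nothing trades until a sell arrives at or below $0.65.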
Polymarket's architecture:
- Hybrid-decentralized CLOB running on Polygon
- Orders are signed EIP-712 messages submitted off-chain to the matching operator
- Settlement occurs on-chain via the Conditional Token Framework (CTF), using ERC-1155 tokens
- Each market has two token types: YES and NO
- The matching operator batches up to 15 orders per on-chain call for efficiency
Kalshi's architecture:
- Fully centralized, CFTC-regulated Designated Contract Market
- Traditional financial exchange matching engine
- Regulated like a commodity futures exchange
Automated Market Maker (AMM)
Some prediction markets use algorithmic pricing instead of an order book:
Logarithmic Market Scoring Rule (LMSR):
- Originated by economist Robin Hanson specifically for prediction markets
- Guarantees continuous liquidity — you can always buy or sell without waiting for a counterparty
- A liquidity parameter controls price sensitivity: higher values = slower price movement, lower = faster
- Prices are bounded between 0 and 1, naturally representing probabilities
- LMSR is an early automated market maker design and a conceptual precursor to the constant function market makers (CFMMs) used by Uniswap and other DeFi protocols
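For reference, the standard LMSR formulas are short enough to write out. The sketch below uses the usual cost function C(q) = b · ln(Σ exp(q_i / b)), where q_i is the number of shares outstanding for outcome i and b is the liquidity parameter. The function names and the sample values are chosen for illustration only.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b))."""
    m = max(q / b for q in quantities)  # subtract the max for numerical stability
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous prices: a softmax of q / b, so they always sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def cost_to_buy(quantities, outcome, shares, b):
    """What a trader pays to buy `shares` of `outcome`: C(q_after) - C(q_before)."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


# A fresh two-outcome market (index 0 = YES, 1 = NO) starts at even odds.
print(lmsr_prices([0, 0], b=100))  # [0.5, 0.5]

# Buying 10 YES shares moves the price less when b is larger.
print(lmsr_prices([10, 0], b=100)[0])
print(lmsr_prices([10, 0], b=1000)[0])
print(cost_to_buy([0, 0], outcome=0, shares=10, b=100))
```

Because the price is a function of outstanding shares alone, a quote is always available, which is the continuous-liquidity property described above.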
Trade-offs:
| Aspect | CLOB | AMM (LMSR) |
| --- | --- | --- |
| Liquidity | Depends on market makers | Always available (algorithmic) |
| Price discovery | Supply/demand driven | Formula-driven |
| Spread | Tight when liquid, wide when not | Consistent but wider |
| Capital efficiency | Higher (no locked liquidity) | Lower (subsidized by operator) |
| Best for | High-volume markets | Thin, long-tail markets |
Real-Time Odds Recalculation
Every trade changes the odds. A prediction market must recalculate and publish updated prices after every order fill — in milliseconds.
What Needs to Be Computed in Real Time
Current market state (must update with every trade):
- Last trade price → implied probability
- Best bid / best ask → current spread
- 24-hour volume → market activity signal
- Open interest → total capital at risk
- Market depth at each price level → liquidity profile
- Volume-weighted average price → VWAP for analytics
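To make a few of these metrics concrete, here is a small batch-style sketch that derives them from a list of trades. It assumes trades arrive as `(trade_time, price, quantity)` tuples sorted oldest first and that the best bid and ask are supplied by the order book; `market_snapshot` is a hypothetical helper. Open interest and depth are left out because they need position and book data rather than trades.

```python
from datetime import datetime, timedelta


def market_snapshot(trades, best_bid, best_ask, now):
    """Summarize one market. trades: list of (trade_time, price, quantity), oldest first."""
    cutoff = now - timedelta(hours=24)
    recent = [(price, qty) for ts, price, qty in trades if ts > cutoff]
    volume = sum(qty for _, qty in recent)
    notional = sum(price * qty for price, qty in recent)
    return {
        # The last trade price doubles as the implied probability.
        "implied_probability": trades[-1][1] if trades else None,
        "spread": best_ask - best_bid,
        "volume_24h": volume,
        # VWAP = traded dollars / traded contracts over the window.
        "vwap_24h": notional / volume if volume else None,
    }


now = datetime(2025, 1, 2, 12, 0)
trades = [
    (now - timedelta(hours=30), 0.50, 10),   # outside the 24-hour window
    (now - timedelta(hours=2), 0.60, 100),
    (now - timedelta(hours=1), 0.70, 300),
]
print(market_snapshot(trades, best_bid=0.69, best_ask=0.71, now=now))
```

A streaming system maintains the same quantities incrementally instead of rescanning the trade list on every request.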
Implementing with a Streaming Database
In RisingWave, these computations are materialized views that update automatically:
-- Latest trade price per market, computed over all trades
CREATE MATERIALIZED VIEW market_last_price AS
SELECT
    market_id,
    last_value(price ORDER BY trade_time) as current_price,
    MAX(trade_time) as last_trade_time
FROM trades
GROUP BY market_id;

-- Rolling 24-hour statistics. RisingWave accepts NOW() in streaming queries only
-- as a temporal filter (for example in WHERE), not inside an aggregate FILTER
-- clause, so the window lives in its own view.
CREATE MATERIALIZED VIEW market_stats_24h AS
SELECT
    market_id,
    SUM(quantity) as volume_24h,
    SUM(quantity * price) as notional_24h,
    COUNT(*) as trades_24h,
    MIN(price) as low_24h,
    MAX(price) as high_24h
FROM trades
WHERE trade_time > NOW() - INTERVAL '24 hours'
GROUP BY market_id;

-- Real-time market pricing from the trade stream
CREATE MATERIALIZED VIEW market_pricing AS
SELECT
    l.market_id,
    l.current_price,                         -- current price = last trade price
    l.current_price as implied_probability,  -- prices lie in [0, 1]
    s.volume_24h,
    s.notional_24h,
    s.trades_24h,
    s.low_24h,
    s.high_24h,
    l.last_trade_time
FROM market_last_price l
-- LEFT JOIN keeps markets that have not traded in the last 24 hours
LEFT JOIN market_stats_24h s ON l.market_id = s.market_id;
This view updates within milliseconds of every trade — no polling, no batch jobs.
Position Tracking and Real-Time P&L
Every user has a portfolio of positions across multiple markets. Computing real-time P&L requires joining the user's trade history with current market prices:
-- User position tracking
-- Assumes each row in trades is a purchase by user_id; sells that reduce a
-- position would need signed quantities
CREATE MATERIALIZED VIEW user_positions AS
SELECT
    user_id,
    market_id,
    side,
    SUM(quantity) as position_size,
    SUM(quantity * price) / SUM(quantity) as avg_entry_price,
    SUM(quantity * price) as total_cost
FROM trades
GROUP BY user_id, market_id, side;
-- Real-time P&L (join positions with live prices)
-- Assumes current_price is the YES price and avg_entry_price is the price paid
-- for the contract actually held, so a NO contract is worth 1 - current_price
CREATE MATERIALIZED VIEW user_pnl AS
SELECT
    p.user_id,
    p.market_id,
    p.side,
    p.position_size,
    p.avg_entry_price,
    m.current_price,
    CASE
        WHEN p.side = 'YES'
            THEN p.position_size * (m.current_price - p.avg_entry_price)
        ELSE p.position_size * ((1.0 - m.current_price) - p.avg_entry_price)
    END as unrealized_pnl
FROM user_positions p
JOIN market_pricing m ON p.market_id = m.market_id;
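A quick worked example helps check the aggregation. The sketch below reproduces the position-size, average-entry-price, and unrealized P&L arithmetic for a single YES position built from two fills. The fill values are made up, and `Decimal` is used only to keep the cents exact.

```python
from decimal import Decimal


def position_summary(fills):
    """fills: list of (quantity, price) purchases for one user, market, and side."""
    size = sum(qty for qty, _ in fills)
    cost = sum(qty * price for qty, price in fills)
    return size, cost / size, cost  # position_size, avg_entry_price, total_cost


def unrealized_pnl_yes(size, avg_entry_price, current_price):
    """Mark a YES position to the current YES price."""
    return size * (current_price - avg_entry_price)


# Two purchases of YES: 100 contracts at $0.60, then 50 more at $0.72.
fills = [(100, Decimal("0.60")), (50, Decimal("0.72"))]
size, avg_entry, cost = position_summary(fills)
print(size, avg_entry, cost)

# With YES now trading at $0.70, each contract is up $0.06 on average.
print(unrealized_pnl_yes(size, avg_entry, Decimal("0.70")))
```

Here the 150 contracts cost $96.00 in total, an average of $0.64 each, so a $0.70 market price leaves the position $9.00 ahead.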
Event Resolution and Settlement
The most critical real-time operation is settlement: when an event resolves (the election is called, the game ends), the platform must instantly compute and distribute payouts to every winning position.
The Fan-Out Problem
A popular market like "Will the 2024 US election go to the Republican candidate?" can have hundreds of thousands of open positions. When the event resolves:
- Oracle submits resolution (YES or NO)
- Settlement engine must compute payout for every open position
- User balances must update atomically
- All of this should happen in seconds, not hours
Streaming Settlement with SQL
-- Settlement engine: join trades with oracle resolution
-- settlement_amount is the net profit or loss per trade, not the gross payout
CREATE MATERIALIZED VIEW settlements AS
SELECT
    t.user_id,
    t.market_id,
    t.side,
    t.quantity,
    t.price as entry_price,
    o.outcome,
    CASE
        WHEN t.side = o.outcome THEN t.quantity * (1.0 - t.price)  -- Winner: $1.00 payout minus entry price
        ELSE -t.quantity * t.price                                 -- Loser: forfeits the entry price
    END as settlement_amount
FROM trades t
JOIN oracle_resolutions o ON t.market_id = o.market_id;
-- Aggregate into user balances
CREATE MATERIALIZED VIEW user_balances AS
SELECT
    user_id,
    SUM(settlement_amount) as total_balance
FROM (
    -- Cash deposits
    SELECT user_id, amount as settlement_amount FROM deposits
    UNION ALL
    -- Settled P&L
    SELECT user_id, settlement_amount FROM settlements
) combined
GROUP BY user_id;
When the oracle publishes a resolution, the settlements view instantly computes payouts for all positions via the streaming join. User balances update in real time.
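The per-position arithmetic is easy to verify by hand. The following sketch applies the same rule as the settlements view (winners gain $1.00 minus their entry price, losers forfeit their entry price) and then adds deposits to get balances. The user names and amounts are invented for illustration, and the plain loop stands in for what the streaming join does continuously.

```python
from collections import defaultdict
from decimal import Decimal


def settlement_amount(side, quantity, price, outcome):
    """Net profit or loss for one trade once the market resolves."""
    if side == outcome:
        return quantity * (Decimal("1.00") - price)  # winner: payout minus cost
    return -quantity * price                         # loser: forfeits the cost


def settle_market(positions, deposits, outcome):
    """positions: (user_id, side, quantity, price); deposits: (user_id, amount)."""
    balances = defaultdict(Decimal)
    for user_id, amount in deposits:
        balances[user_id] += amount
    for user_id, side, quantity, price in positions:
        balances[user_id] += settlement_amount(side, quantity, price, outcome)
    return dict(balances)


deposits = [("alice", Decimal("100")), ("bob", Decimal("50"))]
positions = [
    ("alice", "YES", 10, Decimal("0.70")),  # paid $7.00 for 10 YES contracts
    ("bob", "NO", 10, Decimal("0.30")),     # paid $3.00 for 10 NO contracts
]
print(settle_market(positions, deposits, outcome="YES"))
```

With a YES resolution, alice gains $3.00 and bob loses $3.00, so the two sides of the trade net to zero.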
Cross-Market Arbitrage Detection
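Arbitrage opportunities arise when platforms price the same event differently. For example, if Polymarket prices "Event X" at $0.70 YES and Kalshi prices it at $0.65 YES, a trader can buy YES on Kalshi at $0.65 and take the opposite side on Polymarket (sell YES at $0.70, or buy NO at about $0.30), locking in roughly $0.05 per contract before fees. The trade is only close to risk-free: fees, execution delays, and differences in how each platform resolves the event can erase the margin.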
-- Detect cross-platform arbitrage opportunities
CREATE MATERIALIZED VIEW arbitrage_opportunities AS
SELECT
    p.event_name,
    p.platform as platform_a,
    p.current_price as price_a,
    k.platform as platform_b,
    k.current_price as price_b,
    ABS(p.current_price - k.current_price) as spread,
    CASE
        WHEN p.current_price > k.current_price THEN 'Buy on B, Sell on A'
        ELSE 'Buy on A, Sell on B'
    END as action
FROM polymarket_prices p
JOIN kalshi_prices k ON p.event_id = k.event_id
WHERE ABS(p.current_price - k.current_price) > 0.03;  -- minimum spread of $0.03 (3 percentage points)
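The size of the opportunity can be worked out per contract. This sketch assumes the trade is executed by buying YES on the cheaper venue and NO on the other, that the NO price equals 1 minus the YES price (no bid-ask spread), and that both venues resolve the event identically. `fee_per_contract` is a hypothetical flat fee per leg, not either platform's actual fee schedule.

```python
from decimal import Decimal

ONE = Decimal("1")


def locked_in_profit(yes_price_a, yes_price_b, fee_per_contract=Decimal("0")):
    """Profit per contract pair from buying YES where it is cheap and NO elsewhere."""
    cheap_yes = min(yes_price_a, yes_price_b)
    no_on_other = ONE - max(yes_price_a, yes_price_b)  # assumes NO = 1 - YES
    cost = cheap_yes + no_on_other + 2 * fee_per_contract
    return ONE - cost  # exactly one of the two contracts pays $1.00


# The example from the text: YES at $0.70 on one venue, $0.65 on the other.
print(locked_in_profit(Decimal("0.70"), Decimal("0.65")))
# A $0.02 fee on each leg absorbs most of the margin.
print(locked_in_profit(Decimal("0.70"), Decimal("0.65"), Decimal("0.02")))
```

Under these assumptions the profit is the price gap minus total fees, which is why a minimum-spread filter like the one in the view is needed before an alert is worth acting on.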
Risk Management
Real-time risk management helps limit catastrophic losses by flagging oversized or concentrated exposure as it builds:
-- Position exposure monitoring
CREATE MATERIALIZED VIEW risk_dashboard AS
SELECT
    user_id,
    COUNT(DISTINCT market_id) as active_markets,
    SUM(position_size * current_price) as total_exposure,
    MAX(position_size * current_price) as largest_position,
    SUM(ABS(unrealized_pnl)) as total_unrealized_risk
FROM user_pnl
GROUP BY user_id;
-- Alert on excessive concentration
CREATE MATERIALIZED VIEW risk_alerts AS
SELECT user_id, total_exposure, largest_position
FROM risk_dashboard
WHERE total_exposure > 100000                    -- $100K exposure limit
   OR largest_position > 0.5 * total_exposure;   -- >50% in a single market; multiplying avoids division by zero
Why Streaming Databases Fit Prediction Markets
Traditional prediction market architectures use a fragmented stack:
Trade Events → Kafka → Flink (processing) → Redis (cache) → PostgreSQL (storage)
A streaming database like RisingWave collapses this into:
Trade Events → RisingWave (processing + serving + storage)
Advantages:
- Single system for ingestion, processing, and serving
- SQL-only — no Java, no complex stream processing code
- Sub-second updates — materialized views refresh within milliseconds
- PostgreSQL protocol — dashboards, APIs, and admin tools connect directly
- State on S3: elastic scaling, fast recovery, and no dependence on local disks for durability
- Exactly-once — no double-counting of trades or settlements
The Market Opportunity
Prediction markets are among the fastest-growing segments in financial technology:
- 2024 US Election: Polymarket alone processed $3.7 billion in trading volume
- 2025 Annual volume: $44 billion globally across all platforms
- 2026 Weekly volume: ~$6 billion on Kalshi + Polymarket combined
- Growth trajectory: Projections suggest $1.3 trillion in annual volume by end of 2026
- Sports dominance: 80%+ of 2025-2026 trading volume is sports-related
- Regulatory tailwind: Both Polymarket and Kalshi now have CFTC approval as Designated Contract Markets
This growth creates massive demand for real-time data infrastructure that can handle increasing trade volumes, market counts, and settlement complexity.
Frequently Asked Questions
How do prediction market odds work?
Prediction market prices represent implied probabilities. A YES contract trading at $0.70 means the market collectively estimates a 70% probability of the event occurring. If you buy YES at $0.70 and the event happens, you receive $1.00 — a profit of $0.30. If it doesn't happen, you lose your $0.70.
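The same arithmetic can be written as a short calculation. The sketch below computes the two possible payoffs for a YES contract and the expected profit given a trader's own probability estimate; the function names and the 80% belief are assumptions for the example.

```python
from decimal import Decimal

ONE = Decimal("1.00")


def yes_payoffs(price):
    """Profit per YES contract in each outcome."""
    return {"if_yes": ONE - price, "if_no": -price}


def expected_profit(price, believed_probability):
    """Expected profit per contract under the trader's own probability estimate."""
    win = ONE - price
    lose = price
    return believed_probability * win - (ONE - believed_probability) * lose


price = Decimal("0.70")
print(yes_payoffs(price))                       # gain $0.30 if YES, lose $0.70 if NO
print(expected_profit(price, Decimal("0.80")))  # positive: the trader thinks YES is underpriced
print(expected_profit(price, Decimal("0.70")))  # zero: belief matches the market
```

The expected profit simplifies to the believed probability minus the price, which is why the price can be read as the probability at which a trader is indifferent.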
What is the difference between CLOB and AMM in prediction markets?
A Central Limit Order Book (CLOB) matches buyers and sellers based on price-time priority, like a stock exchange. An Automated Market Maker (AMM) uses a mathematical formula (like LMSR) to set prices algorithmically. CLOBs offer tighter spreads in liquid markets; AMMs guarantee continuous liquidity even in thin markets.
Why do prediction markets need stream processing?
Prediction markets require real-time updates for odds, user positions, P&L, risk management, and settlement. Batch processing with hourly or daily refreshes is fundamentally incompatible with live trading. A streaming database like RisingWave maintains all these computations as continuously updated materialized views, ensuring every query returns the latest state.
How does settlement work in prediction markets?
When an event resolves, an oracle (the UMA Optimistic Oracle for Polymarket, an internal process for Kalshi) publishes the outcome. The settlement engine computes payouts for every open position: each winning contract pays out $1.00, for a net profit of $1.00 minus the purchase price, while losers forfeit their purchase price. In a streaming database, settlement is a continuous join between the trades stream and the oracle feed, so payouts are computed as soon as the resolution arrives.
How big is the prediction market industry?
Prediction markets are growing explosively. Polymarket processed $3.7 billion in the 2024 US election alone. Global annual volume reached $44 billion in 2025. Weekly volume in early 2026 is approximately $6 billion. Industry projections suggest annual volume could reach $1.3 trillion by end of 2026, driven by sports betting, regulatory approval (CFTC), and institutional adoption.

