Detecting Wash Trading in Crypto Markets with Streaming SQL

Detecting Wash Trading in Crypto Markets with Streaming SQL

Wash trading — where the same entity buys and sells to itself to inflate volume — is rampant in crypto markets. With RisingWave, a PostgreSQL-compatible streaming database, you can identify round-trip trades, counterparty recycling, and self-dealing patterns in real time using SQL materialized views over your trade event stream, flagging suspicious activity within seconds of it occurring.

Why Wash Trading Matters in Crypto

Wash trading distorts nearly every metric that market participants rely on. Inflated volume figures make a token appear liquid when it isn't. Artificial price support misleads retail traders. Exchanges publishing fake volume attract users under false pretenses.

The problem is widespread:

  • Token launches use wash trading to appear in "top movers" and "high volume" lists
  • NFT projects inflate floor prices and collection volume to attract buyers
  • Market makers may engage in self-dealing to collect liquidity mining rewards
  • Exchange ranking sites give higher placement to exchanges with inflated volume

Detecting wash trading after the fact — in a daily batch job — is too late to prevent damage. By the time manipulated volume reaches your reports, trading decisions have already been made based on false signals.

Real-time detection enables:

  • Immediate flagging before manipulated data propagates to downstream analytics
  • Pattern identification that manual review cannot scale to match
  • Automated exclusion of suspicious volume from official metrics
  • Evidence collection for regulatory reporting

How Streaming SQL Solves This

RisingWave maintains continuously updated views over the trade stream. By tracking counterparty pairs, trade timing, and volume patterns within sliding windows, materialized views can identify the statistical signatures of wash trading — round trips completing within minutes, abnormally high self-dealing ratios, and bursts of offsetting trades — without requiring batch reprocessing.

Building It Step by Step

Step 1: Connect the Data Source

CREATE SOURCE trade_events (
    trade_id        VARCHAR,
    exchange        VARCHAR,
    trading_pair    VARCHAR,
    buyer_address   VARCHAR,
    seller_address  VARCHAR,
    price           NUMERIC,
    quantity        NUMERIC,
    side            VARCHAR,          -- 'buy' or 'sell'
    order_id_buy    VARCHAR,
    order_id_sell   VARCHAR,
    event_time      TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'crypto.trades.raw',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

Step 2: Build the Core Materialized View

-- Detect round-trip trades: same pair of addresses trading back and forth
-- within a 10-minute window (a classic wash trading signature)
CREATE MATERIALIZED VIEW round_trip_candidates AS
SELECT
    t1.exchange,
    t1.trading_pair,
    t1.buyer_address   AS address_a,
    t1.seller_address  AS address_b,
    t1.trade_id        AS trade_id_1,
    t2.trade_id        AS trade_id_2,
    t1.price           AS price_1,
    t2.price           AS price_2,
    t1.quantity        AS quantity_1,
    t2.quantity        AS quantity_2,
    ABS(t1.price - t2.price) / NULLIF(t1.price, 0) AS price_deviation,
    t2.event_time - t1.event_time AS round_trip_duration,
    t1.event_time      AS first_trade_time
FROM trade_events t1
JOIN trade_events t2
    ON  t1.buyer_address  = t2.seller_address
    AND t1.seller_address = t2.buyer_address
    AND t1.trading_pair   = t2.trading_pair
    AND t1.exchange       = t2.exchange
    AND t2.event_time     > t1.event_time
    AND t2.event_time     <= t1.event_time + INTERVAL '10 MINUTES'
    AND ABS(t1.quantity - t2.quantity) / NULLIF(t1.quantity, 0) < 0.05;
-- Volume anomaly detection: addresses generating disproportionate volume
CREATE MATERIALIZED VIEW volume_concentration AS
SELECT
    exchange,
    trading_pair,
    window_start,
    window_end,
    buyer_address                                           AS address,
    SUM(quantity)                                          AS address_volume,
    SUM(SUM(quantity)) OVER (
        PARTITION BY exchange, trading_pair, window_start
    )                                                      AS total_window_volume,
    SUM(quantity) / NULLIF(
        SUM(SUM(quantity)) OVER (
            PARTITION BY exchange, trading_pair, window_start
        ), 0
    ) * 100                                                AS volume_share_pct
FROM TUMBLE(trade_events, event_time, INTERVAL '1 HOUR')
GROUP BY exchange, trading_pair, window_start, window_end, buyer_address;

Step 3: Add Alerts and Detection Logic

-- Flag addresses responsible for >40% of volume in any 1-hour window
-- (high concentration is a wash trading indicator)
CREATE MATERIALIZED VIEW wash_trading_flags AS
SELECT
    exchange,
    trading_pair,
    window_start,
    address,
    address_volume,
    total_window_volume,
    ROUND(volume_share_pct, 2)  AS volume_share_pct,
    'HIGH_VOLUME_CONCENTRATION' AS flag_type
FROM volume_concentration
WHERE volume_share_pct > 40

UNION ALL

SELECT
    rt.exchange,
    rt.trading_pair,
    DATE_TRUNC('hour', rt.first_trade_time) AS window_start,
    rt.address_a                            AS address,
    rt.quantity_1                           AS address_volume,
    NULL                                    AS total_window_volume,
    rt.price_deviation * 100               AS volume_share_pct,
    'ROUND_TRIP_DETECTED'                   AS flag_type
FROM round_trip_candidates rt
WHERE rt.price_deviation < 0.002;  -- price within 0.2% = likely same actor
-- Sink wash trading flags to Kafka for compliance and analytics teams
CREATE SINK wash_trading_sink AS
SELECT * FROM wash_trading_flags
WITH (
    connector = 'kafka',
    topic = 'crypto.surveillance.wash-trading',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

Step 4: Query for Real-Time Insights

-- Most active round-trip pairs in the last hour
SELECT
    exchange,
    trading_pair,
    address_a,
    address_b,
    COUNT(*)                                    AS round_trip_count,
    SUM(quantity_1 + quantity_2)               AS total_wash_volume,
    AVG(price_deviation) * 100                 AS avg_price_deviation_pct,
    MIN(round_trip_duration)                   AS fastest_round_trip
FROM round_trip_candidates
WHERE first_trade_time >= NOW() - INTERVAL '1 HOUR'
GROUP BY exchange, trading_pair, address_a, address_b
ORDER BY round_trip_count DESC
LIMIT 10;
-- Pairs with the highest suspicious volume share
SELECT
    exchange,
    trading_pair,
    flag_type,
    COUNT(DISTINCT address) AS flagged_addresses,
    SUM(address_volume)     AS suspicious_volume
FROM wash_trading_flags
GROUP BY exchange, trading_pair, flag_type
ORDER BY suspicious_volume DESC
LIMIT 10;

Comparison: Batch vs Streaming

AspectBatch ETLStreaming SQL (RisingWave)
LatencyMinutes to hoursSub-second
Round-Trip DetectionPost-hoc analysisReal-time pattern matching
Alert SpeedNext batchMilliseconds
Volume CorrectionDelayed reportingLive adjusted metrics
Compliance EvidenceHistorical onlyTimestamped real-time
Coverage ScaleSampled or delayedFull trade stream

FAQ

Q: How do you handle addresses that use multiple wallets to disguise wash trading?

Multi-wallet wash trading is harder to detect with simple counterparty matching alone. Extend the approach by tracking behavioral patterns: wallets that always deposit and withdraw at similar times, wallets funded from the same source address, or clusters of addresses with correlated trading schedules. These signals can be modeled as additional materialized views joining on funding transaction data from an on-chain source.

Q: Can this handle decentralized exchanges where there's no central order book?

Yes. For DEXs, the trade event stream comes from on-chain swap events rather than a centralized matching engine. The source schema adapts to include contract addresses and transaction hashes. Round-trip detection works identically — you're still looking for the same wallet pair exchanging assets within a short time window. AMM mechanics actually make certain wash trading patterns easier to detect because slippage creates a systematic cost that shows up in the price deviation metric.

Q: What threshold separates legitimate market making from wash trading?

There is no single threshold — context matters. A market maker legitimately represents high volume but has offsetting positions with many different counterparties. Wash trading concentrates volume between a small, recurring set of counterparties. The volume_share_pct metric combined with the round-trip detection gives a multi-dimensional view. Tune thresholds based on observed baseline behavior per exchange and trading pair, then flag statistical outliers rather than using fixed cutoffs.

Key Takeaways

  • A self-join on the trade event stream within a time window efficiently identifies round-trip trades between the same address pair
  • Volume concentration analysis catches cases where wash traders don't round-trip but inflate gross volume
  • Combining pattern types in a union view gives compliance teams a unified flagging stream
  • Kafka sinks deliver flags to downstream reporting and alert systems without polling
  • All detection logic is plain SQL — auditable, version-controllable, and adjustable without redeploying a streaming application

Ready to try this? Get started with RisingWave. Join our Slack community.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.