Real-Time Features for Account Takeover Detection

Account takeover (ATO) is the fraud vector that breaks most traditional detection systems -- not because it is technically sophisticated, but because it uses entirely valid credentials. The attacker is not forging a card number or synthesizing a fake identity. They have a real username and a real password. From the perspective of your authentication service, the login looks legitimate.

This is why the feature engineering for ATO detection is fundamentally different from payment fraud detection, and why batch pipelines fail completely. By the time your nightly job runs, the attacker has already changed the recovery email, drained the wallet, and moved on.

This article walks through how to build a real-time ATO feature layer using streaming SQL -- materialized views that compute behavioral signals continuously from auth events, session data, device fingerprints, and profile change logs.

Why ATO Requires a Different Feature Strategy Than Payment Fraud

Payment fraud detection leans heavily on transaction-level signals: card BIN lookup, merchant category, transaction amount, velocity across card numbers, shipping address mismatch. These signals are powerful because each transaction is self-contained and introduces an artifact the attacker must forge.

ATO introduces none of those artifacts. The attacker presents correct credentials. There is no suspicious merchant, no unusual amount, no mismatched address -- at least not in the first few minutes. What distinguishes an ATO login from a legitimate one is behavioral context:

The device fingerprint has never been seen for this account
The login originates from a country where this user has never been
There have been 47 login attempts across 12 IPs in the past hour
Within three minutes of login, the recovery email and phone number changed

These are not transaction signals. They are identity-continuity signals. They require memory of past sessions, awareness of the user's normal device set, and the ability to detect anomalous sequences of events as they unfold. Batch systems cannot provide any of these in time to matter.

A 2023 report by Sift found that ATO attacks increase in velocity after credential breach events -- attackers typically attempt to exploit stolen credentials within hours of acquisition, and the most damaging exploitation (account mutation, fund transfer) happens within the first 10 minutes of access. Your detection layer must operate in the same time horizon.

The ATO Attack Lifecycle

Understanding the attack lifecycle shapes which features matter most and when.

Stage 1: Credential stuffing. The attacker feeds a list of leaked username/password pairs (sourced from dark web breach dumps or purchased from credential marketplaces) into an automated tool. Requests come from rotating IPs and device fingerprints to evade IP-based blocks. The success rate is typically 0.1-2%, but scale makes up the difference -- at those rates, a list of 10 million credentials yields roughly 10,000 to 200,000 valid logins.

Stage 2: Validation and prioritization. Once a valid login is confirmed, automated tooling checks account value: balance, stored payment methods, loyalty points, access to linked accounts. High-value accounts are queued for human operators or more careful automated exploitation.

Stage 3: Account mutation. The attacker secures control by changing the recovery email, phone number (bypassing SMS-based 2FA), and sometimes the password. This is the point of no return -- a successful mutation makes account recovery by the legitimate user difficult and slow.

Stage 4: Cashout. Funds are transferred, gift cards are purchased, stored credentials are harvested, or the account is sold. In some ATO campaigns targeting fintech, the entire lifecycle, stage to stage, runs in under five minutes.

The detection window is narrow. Signals at Stage 1 (login velocity, IP diversity) let you block before access is granted. Signals at Stage 3 (rapid profile mutation post-login) let you interrupt before the account is lost. Both require real-time feature freshness.

Key Behavioral Signals for ATO Detection

Device Fingerprint Change

A user logging in from a device they have never used before is not inherently suspicious. People buy new phones. But a first-seen device combined with an unusual location or time, followed immediately by profile changes, is a high-confidence ATO indicator.

The signal requires a persistent record of each user's known device set. A materialized view that tracks device history over a rolling 90-day window serves this purpose -- long enough to include infrequent devices (secondary laptops, travel devices), short enough to expire genuinely abandoned fingerprints.

A device fingerprint typically includes browser user agent, screen resolution, installed fonts or plugins, and sometimes hardware identifiers. The fingerprint alone is not authentication, but its presence or absence in the user's history is a strong contextual signal.

Impossible Travel

Legitimate users do not log in from São Paulo and then from Seoul 25 minutes later. When the time delta between two successful logins from the same account is shorter than the minimum possible travel time between the two geolocated IPs, you have either a VPN/proxy (common among attackers attempting to mask origin) or a credential shared across multiple actors.

The detection requires comparing the current login location against the most recent login location and computing whether the implied travel speed exceeds a threshold -- typically 800-1000 km/h is used as the "physical impossibility" cutoff, with VPN probability scoring applied below that threshold.
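The implied-speed check can be sketched in a few lines of Python. This is a minimal illustration, not the article's production logic: it assumes each login already carries geolocated latitude and longitude, the function names are hypothetical, and the 1000 km/h cutoff is simply the upper end of the range mentioned above.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 1000.0  # assumed cutoff; tune for your own traffic


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implied_speed_kmh(prev_login, curr_login):
    """Speed needed to get from one login to the next.

    Each login is a (event_time, latitude, longitude) tuple.
    """
    prev_time, prev_lat, prev_lon = prev_login
    curr_time, curr_lat, curr_lon = curr_login
    distance = haversine_km(prev_lat, prev_lon, curr_lat, curr_lon)
    hours = (curr_time - prev_time).total_seconds() / 3600.0
    if hours <= 0:
        # Simultaneous logins from different places need infinite speed.
        return math.inf if distance > 0 else 0.0
    return distance / hours


def is_impossible_travel(prev_login, curr_login, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
    """True when the implied speed exceeds the plausibility cutoff."""
    return implied_speed_kmh(prev_login, curr_login) > max_speed_kmh


# Example: São Paulo, then Seoul 25 minutes later (approximate coordinates).
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sao_paulo_login = (t0, -23.55, -46.63)
seoul_login = (t0 + timedelta(minutes=25), 37.57, 126.98)
flagged = is_impossible_travel(sao_paulo_login, seoul_login)
```

IP geolocation is often only accurate to the city or region, so short-distance results from a check like this deserve less weight than intercontinental ones.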

Login Velocity Across IPs and Devices

Credential stuffing attacks are volume operations. Even with rotating IPs, the per-account signal is detectable: multiple login attempts across different IPs and device fingerprints within a short window. This is especially true at Stage 1, where the same user_id may be attempted across many attacker-controlled sessions simultaneously.

Features here include: login attempt count, unique IP count, unique device count, and failure-to-success ratio -- all within rolling windows of 1 hour or less.
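As a reference for what these features mean, here is a small in-memory Python sketch that computes them for one user from a list of auth events. The function name and dictionary keys are illustrative, and the handling of the ratio when there are no successes (dividing by 1) is an assumption made to avoid a division by zero.

```python
from datetime import datetime, timedelta

LOGIN_EVENT_TYPES = {"login_success", "login_failure"}


def login_velocity_features(events, user_id, now, window=timedelta(hours=1)):
    """Compute rolling-window login features for one user.

    `events` is an iterable of dicts with the keys user_id, event_type,
    event_time, ip_address, and device_fingerprint.
    """
    recent = [
        e for e in events
        if e["user_id"] == user_id
        and e["event_type"] in LOGIN_EVENT_TYPES
        and now - window <= e["event_time"] <= now
    ]
    failures = sum(1 for e in recent if e["event_type"] == "login_failure")
    successes = len(recent) - failures
    return {
        "login_attempts": len(recent),
        "unique_ips": len({e["ip_address"] for e in recent}),
        "unique_devices": len({e["device_fingerprint"] for e in recent}),
        "failures": failures,
        # Assumption: with zero successes, report the raw failure count.
        "failure_to_success_ratio": failures / max(1, successes),
    }
```

A streaming engine maintains the same aggregates incrementally instead of rescanning the event list on every call.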

Rapid Post-Login Profile Changes

This is one of the most reliable ATO-specific signals and the most time-critical. Legitimate users do not change their email, phone number, and password within three minutes of logging in. Automated ATO tooling does exactly this, moving as fast as the API allows to secure control before the legitimate user notices.

A materialized view over session events that tracks sensitive action counts within a 5-minute window per session catches this pattern with very high precision. Withdrawals, payment method additions, and external link actions within seconds of login also fit this pattern.
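The per-session count can be illustrated with a short Python sketch. It assumes the caller knows the session's login time and has the session's actions as (action_type, event_time) pairs; the function name is hypothetical.

```python
from datetime import datetime, timedelta

SENSITIVE_ACTIONS = {"password_change", "email_change", "phone_change"}


def sensitive_changes_after_login(login_time, actions, window=timedelta(minutes=5)):
    """Count sensitive actions that fall within `window` after login.

    `actions` is an iterable of (action_type, event_time) tuples for a
    single session.
    """
    deadline = login_time + window
    return sum(
        1
        for action_type, event_time in actions
        if action_type in SENSITIVE_ACTIONS and login_time <= event_time <= deadline
    )
```

Anchoring the window to the login time, rather than to the current wall-clock time, keeps the count focused on the first minutes of the session, which is where the automated pattern shows up.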

Data Sources and Event Schema

The feature layer consumes four primary event streams:

auth_events -- Login attempts, successes, failures, and MFA events. Required fields: user_id, event_type (login_success, login_failure, mfa_bypass, etc.), event_time, ip_address, device_fingerprint, ip_country, ip_city.

session_events -- Actions taken within an authenticated session. Required fields: session_id, user_id, action_type, event_time. Key action types: password_change, email_change, phone_change, payment_method_add, withdrawal, external_link_click.

profile_change_events -- Durable record of identity mutations for audit and feature purposes. Required fields: user_id, field_changed, changed_at, ip_address, session_id.

device_fingerprint_events -- Detailed fingerprint breakdown for richer device signals. Required fields: device_fingerprint, user_agent, screen_resolution, timezone, first_seen, last_seen, ip_address.
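To make the auth_events contract concrete, the sketch below models it as a Python dataclass and validates an incoming JSON payload against the required fields listed above. The field types (strings, ISO 8601 timestamps) and the function name are assumptions; the source specifies only the field names.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class AuthEvent:
    user_id: str
    event_type: str        # e.g. login_success, login_failure
    event_time: datetime   # assumed to arrive as an ISO 8601 string
    ip_address: str
    device_fingerprint: str
    ip_country: str
    ip_city: str


AUTH_FIELDS = tuple(f.name for f in fields(AuthEvent))


def parse_auth_event(raw: str) -> AuthEvent:
    """Parse one JSON-encoded auth event, rejecting payloads with missing fields."""
    payload = json.loads(raw)
    missing = [name for name in AUTH_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"auth event is missing required fields: {missing}")
    values = {name: payload[name] for name in AUTH_FIELDS}
    values["event_time"] = datetime.fromisoformat(values["event_time"])
    return AuthEvent(**values)
```

Rejecting malformed events at the producer keeps nulls out of the velocity and device-history aggregates downstream.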

These streams should be published to Kafka (or equivalent) and consumed by the streaming SQL layer. In RisingWave, you define Kafka sources once and then build arbitrarily complex materialized views on top without managing consumer offsets or state backends manually.

SQL Implementation

Device History Materialized View

This view maintains a per-user, per-device fingerprint record over a 90-day rolling window. Querying this view for a given user_id and device_fingerprint at login time tells you whether the device is known, how many times it has been used, and from which countries.

CREATE MATERIALIZED VIEW user_device_history AS
SELECT
    user_id,
    device_fingerprint,
    COUNT(*) AS login_count,
    MIN(event_time) AS first_seen,
    MAX(event_time) AS last_seen,
    array_agg(DISTINCT ip_country ORDER BY ip_country) AS countries_used
FROM auth_events
WHERE event_type = 'login_success'
  AND event_time >= NOW() - INTERVAL '90 days'
GROUP BY user_id, device_fingerprint;

At inference time, a LEFT JOIN from the incoming login to this view on both user_id and device_fingerprint, with a WHERE login_count IS NULL predicate, identifies first-seen devices. Combined with the current login's IP country and time of day, you can construct a device novelty risk sub-score.
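The lookup semantics are easy to get wrong, so here is a small Python sketch of the same check. It assumes the rows of user_device_history have been loaded into a dictionary keyed by (user_id, device_fingerprint); the function and key names are illustrative.

```python
def device_novelty(history, user_id, device_fingerprint, ip_country):
    """Mimic the LEFT JOIN: a missing row means the device is first-seen.

    `history` maps (user_id, device_fingerprint) to a row dict with the
    login_count and countries_used columns from user_device_history.
    """
    row = history.get((user_id, device_fingerprint))
    if row is None:
        return {"first_seen_device": True, "login_count": 0, "new_country_for_device": True}
    return {
        "first_seen_device": False,
        "login_count": row["login_count"],
        "new_country_for_device": ip_country not in row["countries_used"],
    }
```

Keying on the pair matters: matching on user_id alone would report a device as known whenever the user has any device history at all.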

Login Velocity View

This view computes rolling 1-hour velocity metrics per user. The combination of unique IPs and unique devices in a short window is a strong credential stuffing indicator even when all attempts are against the same user_id.

CREATE MATERIALIZED VIEW login_velocity AS
SELECT
    user_id,
    COUNT(*) AS login_attempts_1h,
    COUNT(DISTINCT ip_address) AS unique_ips_1h,
    COUNT(DISTINCT device_fingerprint) AS unique_devices_1h,
    COUNT(*) FILTER (WHERE event_type = 'login_failure') AS failures_1h
FROM auth_events
WHERE event_time >= NOW() - INTERVAL '1 hour'
GROUP BY user_id;

A user with unique_ips_1h > 5 and failures_1h > 10 before any success is a near-certain credential stuffing target. This signal is actionable before the attacker succeeds, enabling pre-emptive step-up authentication (FIDO2 passkey challenge, NIST SP 800-63B compliant re-verification) or temporary rate limiting.

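Post-Login Action View

This view carries the highest-precision ATO signal. It aggregates each session's actions over the trailing 5 minutes. The sensitive_changes_5min column, which counts password, email, and phone changes, is the key field. A value greater than 0 within 2-3 minutes of session creation warrants immediate challenge or suspension. The view itself filters on wall-clock time, so the caller must compare against the session start time to apply that rule.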

CREATE MATERIALIZED VIEW post_login_actions AS
SELECT
    session_id,
    user_id,
    COUNT(*) AS actions_5min,
    COUNT(*) FILTER (WHERE action_type IN ('password_change', 'email_change', 'phone_change')) AS sensitive_changes_5min,
    COUNT(*) FILTER (WHERE action_type = 'withdrawal') AS withdrawals_5min
FROM session_events
WHERE event_time >= NOW() - INTERVAL '5 minutes'
GROUP BY session_id, user_id;

Impossible Travel Detection

This view compares each successful login with the same user's previous successful login, computes the time elapsed between them, and flags transitions that are physically implausible. The LAG window function over the auth event stream makes this straightforward in streaming SQL.

CREATE MATERIALIZED VIEW travel_anomaly AS
WITH ordered_logins AS (
    SELECT
        user_id,
        event_time,
        ip_country,
        ip_city,
        LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event_time,
        LAG(ip_country) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_country
    FROM auth_events
    WHERE event_type = 'login_success'
)
SELECT
    user_id,
    event_time,
    ip_country,
    prev_country,
    EXTRACT(EPOCH FROM (event_time - prev_event_time)) / 3600.0 AS hours_since_last_login,
    CASE
        WHEN ip_country != prev_country
         AND EXTRACT(EPOCH FROM (event_time - prev_event_time)) < 3600
        THEN true
        ELSE false
    END AS impossible_travel_flag
FROM ordered_logins
WHERE prev_country IS NOT NULL;

This example uses a 1-hour threshold for cross-country logins. In production, you would incorporate great-circle distance calculations using stored geolocation coordinates rather than country-level comparison alone. The country-level version is presented here for clarity.

Composite Risk Score Assembly

Individual signals have limited precision in isolation. A first-seen device is not suspicious on its own. Impossible travel may indicate a VPN. Post-login sensitive changes within 5 minutes are a strong signal but have edge cases (users who immediately update security settings after a required password reset). The composite risk score integrates signals with weights calibrated against historical labeled ATO events.

CREATE MATERIALIZED VIEW ato_risk_score AS
SELECT
    v.user_id,
    -- Velocity sub-score
    LEAST(1.0, v.unique_ips_1h / 5.0) * 30 AS velocity_score,
    -- Device novelty sub-score (30 if no device history, 15 if fewer than 3 logins, else 0)
    CASE WHEN dh.login_count IS NULL THEN 30
         WHEN dh.login_count < 3 THEN 15
         ELSE 0 END AS device_novelty_score,
    -- Post-login mutation sub-score
    LEAST(1.0, COALESCE(pa.sensitive_changes_5min, 0) / 2.0) * 30 AS mutation_score,
    -- Impossible travel sub-score
    CASE WHEN ta.impossible_travel_flag THEN 10 ELSE 0 END AS travel_score,
    -- Total composite score (0-100)
    LEAST(100,
        LEAST(1.0, v.unique_ips_1h / 5.0) * 30
        + CASE WHEN dh.login_count IS NULL THEN 30
               WHEN dh.login_count < 3 THEN 15
               ELSE 0 END
        + LEAST(1.0, COALESCE(pa.sensitive_changes_5min, 0) / 2.0) * 30
        + CASE WHEN ta.impossible_travel_flag THEN 10 ELSE 0 END
    ) AS total_risk_score
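FROM login_velocity v
-- NOTE: the joins below are keyed on user_id only, so a user with several
-- devices, sessions, or recent logins yields one output row per combination.
-- In production, also match dh on the current login's device_fingerprint and
-- pa on the current session_id so that each score describes a single login.
LEFT JOIN user_device_history dh
    ON v.user_id = dh.user_id
LEFT JOIN post_login_actions pa
    ON v.user_id = pa.user_id
LEFT JOIN travel_anomaly ta
    ON v.user_id = ta.user_id
    AND ta.event_time >= NOW() - INTERVAL '1 hour';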

Sessions scoring above 70 warrant real-time interruption -- a step-up authentication challenge (FIDO2 passkey, TOTP, or out-of-band SMS to the original registered phone). Sessions scoring 40-70 warrant silent flagging and session monitoring escalation. Below 40 is baseline noise.

Thresholds should be tuned using historical labeled data. The weights above are illustrative starting points, not production-calibrated values.
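For unit-testing threshold changes outside the database, the same arithmetic can be mirrored in Python. This sketch copies the illustrative weights from the view above; the function names and the action labels are hypothetical.

```python
def ato_risk_score(unique_ips_1h, device_login_count, sensitive_changes_5min, impossible_travel):
    """Mirror the sub-scores of the ato_risk_score view.

    `device_login_count` is None when the device has no history row.
    """
    velocity = min(1.0, unique_ips_1h / 5.0) * 30
    if device_login_count is None:
        device_novelty = 30
    elif device_login_count < 3:
        device_novelty = 15
    else:
        device_novelty = 0
    mutation = min(1.0, (sensitive_changes_5min or 0) / 2.0) * 30
    travel = 10 if impossible_travel else 0
    total = min(100, velocity + device_novelty + mutation + travel)
    return {
        "velocity_score": velocity,
        "device_novelty_score": device_novelty,
        "mutation_score": mutation,
        "travel_score": travel,
        "total_risk_score": total,
    }


def action_for(total_risk_score):
    """Map a composite score to the response tiers described in the text."""
    if total_risk_score > 70:
        return "step_up_challenge"
    if total_risk_score >= 40:
        return "flag_and_monitor"
    return "allow"
```

With these weights the sub-scores sum to at most 100 (30 + 30 + 30 + 10), so the outer cap only starts to matter if a weight is raised later.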

False Positive Management

ATO detection systems that trigger too aggressively destroy user experience and erode trust in the signal. Three categories of legitimate behavior generate the most false positives.

Business travel and location jumps. A sales engineer who flies from New York to Tokyo can trigger impossible travel flags, particularly when a corporate VPN or imprecise IP geolocation makes consecutive logins appear to come from different countries. The mitigation is to use confirmed travel signals where available (enterprise or VIP accounts may have calendar integrations) and to weight the impossible travel signal lower for users with a history of multi-country logins (high countries_used cardinality in user_device_history). Users who have logged in from 10+ countries over 90 days should have the travel anomaly threshold relaxed.

New device purchases. A first-seen device is not suspicious when the user just bought a new phone. The challenge is that you cannot know this at detection time. A reliable mitigation is to combine device novelty with other signals rather than treating it as standalone: a new device in the same country, at a normal time, with no post-login mutations, is low risk. The composite score handles this naturally when weights are set correctly.

Required security actions. Some platforms force a password change on first login after a breach notification or policy update. A user who changes their password immediately after login in this context is following instructions, not attacking their own account. Profile change events should carry a session context flag (is_forced_reset) that the feature layer can use to suppress the sensitive_changes_5min signal in this case.
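One way to apply that suppression is to drop forced changes before counting. The Python sketch below assumes each profile change event is a dict carrying field_changed and an optional is_forced_reset flag; the field_changed values shown are hypothetical, since the source does not enumerate them.

```python
SENSITIVE_FIELDS = {"password", "email", "phone"}  # assumed field_changed values


def count_unforced_sensitive_changes(changes):
    """Count sensitive profile changes, ignoring platform-forced resets.

    `changes` is an iterable of dicts with `field_changed` and, optionally,
    `is_forced_reset`. A missing flag is treated as not forced.
    """
    return sum(
        1
        for change in changes
        if change["field_changed"] in SENSITIVE_FIELDS
        and not change.get("is_forced_reset", False)
    )
```

Treating a missing flag as "not forced" is the conservative default: an event that lacks the context still counts toward the mutation signal.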

Feedback loops matter. When a security analyst marks a flagged session as a false positive, that label should feed back into your model calibration pipeline. Over time, account-level features (tenure, transaction history, engagement pattern) can further suppress noise for established users while maintaining sensitivity for new and dormant accounts -- which are disproportionately targeted by ATO campaigns.

Frequently Asked Questions

How does this approach handle SIM swap attacks?

SIM swap targets the phone number used for SMS-based 2FA. The attacker convinces a carrier to port the victim's number to an attacker-controlled SIM, then uses the number to bypass 2FA and reset credentials. From a feature perspective, SIM swap preceding an ATO will often appear as a successful login from a new device shortly after a phone number change on the account -- which the profile_change_events stream captures. Cross-referencing carrier events (if available via telco partnership APIs) with device novelty and post-login mutation signals strengthens detection. Phishing-resistant authenticators such as FIDO2 passkeys, which can satisfy NIST SP 800-63B Authenticator Assurance Level 2 (AAL2), take SMS out of the authentication path: the credential is a private key held by the user's authenticator, so a carrier port cannot redirect it. SIM swap stops being a viable vector only if SMS is also removed from account recovery flows.

What latency should I expect from these materialized views?

With a streaming database like RisingWave, materialized views update incrementally as events arrive. In practice, the login velocity and post-login action features reflect events within 1-3 seconds of ingestion. Device history updates lag slightly more due to aggregation over a 90-day window, but the read path (does this user have any history?) is a simple lookup that responds in milliseconds. End-to-end feature freshness from event to inference is typically under 5 seconds.

How do I serve these features at login time?

The materialized views are queryable via standard PostgreSQL-compatible SQL. Your authentication service queries user_device_history, login_velocity, and travel_anomaly as part of the login flow -- either inline (for hard blocks) or asynchronously via a shadow call (for logging and model input). Post-login features (post_login_actions) are evaluated on a short polling loop or via webhook trigger from the session service. The feature store pattern here is pull-based: the serving layer queries the materialized views directly rather than materializing to Redis, which eliminates a synchronization hop and reduces operational complexity.
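A pull-based lookup from the authentication service might look like the following Python sketch. It is written against the generic DB-API 2.0 interface, so any PostgreSQL driver connection can be passed in; the function name, the returned keys, and the zero defaults for users with no recent activity are assumptions.

```python
def fetch_login_features(conn, user_id, device_fingerprint, placeholder="%s"):
    """Read login-time features from the materialized views.

    `conn` is any DB-API 2.0 connection. `placeholder` is the driver's
    parameter marker ("%s" for common PostgreSQL drivers).
    """
    p = placeholder  # only the marker is interpolated; values are bound as parameters
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT login_attempts_1h, unique_ips_1h, unique_devices_1h, failures_1h "
            f"FROM login_velocity WHERE user_id = {p}",
            (user_id,),
        )
        velocity = cur.fetchone() or (0, 0, 0, 0)  # no row: no activity in the window
        cur.execute(
            "SELECT login_count FROM user_device_history "
            f"WHERE user_id = {p} AND device_fingerprint = {p}",
            (user_id, device_fingerprint),
        )
        device = cur.fetchone()
    finally:
        cur.close()
    return {
        "login_attempts_1h": velocity[0],
        "unique_ips_1h": velocity[1],
        "unique_devices_1h": velocity[2],
        "failures_1h": velocity[3],
        # None means the device has never been seen for this user.
        "device_login_count": device[0] if device else None,
    }
```

For inline hard blocks, set a strict statement timeout on the connection and decide up front whether the login fails open or closed when the lookup does not return in time.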

Can this detect ATO via API credential theft, not just web login?

Yes, with adjustments to the event schema. API-based ATO (stolen OAuth tokens, API keys) generates session events without a traditional login flow. The post-login action velocity pattern still applies -- unusual API call sequences (bulk data export, permission escalation, webhook registration to attacker-controlled URLs) within a short window after token issuance are strong signals. Extend session_events to include API sessions and add action types specific to your API surface.

What about account sharing (e.g., families sharing a streaming account)?

Shared accounts are a known source of false positives for device novelty and multi-location signals. If your platform permits account sharing, add a known_shared_account flag to the user profile and relax device novelty thresholds for flagged accounts. Alternatively, use sub-profile or per-user session tracking to distinguish between authorized users on a shared account.

Summary

ATO detection fails when teams apply payment fraud feature logic to a fundamentally different problem. The attacker has valid credentials. The only signals available are behavioral -- and they expire within minutes.

The streaming SQL approach outlined here gives security engineering teams a pragmatic path to real-time ATO features without the operational complexity of a traditional Flink cluster or a custom Kafka Streams application. Materialized views over auth, session, and device fingerprint streams continuously maintain the device history, velocity, and post-login mutation signals that ATO patterns generate. Composite scoring integrates them with configurable thresholds. False positive management keeps the friction acceptable for legitimate users.

The attack window is narrow. The detection window must be narrower.