Streaming SIEM: Process Security Events in Real Time

A traditional Security Information and Event Management (SIEM) platform works like a search engine for logs. Events arrive, get indexed, and sit waiting for an analyst to query them. Detection rules run on a schedule, typically every minute or every five minutes. That gap is where attackers move laterally, escalate privileges, and exfiltrate data before any alert fires.

A streaming SIEM inverts this model. Instead of querying historical data on demand, it maintains continuously updated views of security patterns. When a detection condition is satisfied, an alert fires immediately -- no waiting for the next scheduled rule evaluation.

RisingWave, an open-source PostgreSQL-compatible streaming database, provides the primitives to build a streaming SIEM: Kafka ingestion, SQL-defined detection logic, materialized views for real-time aggregation, and a PostgreSQL-compatible serving layer for analyst queries.

What events does a streaming SIEM process?

A production streaming SIEM correlates events from multiple sources simultaneously:

Authentication events: login attempts, MFA challenges, session creation and termination
Firewall and network logs: connection attempts, port scans, blocked traffic
Endpoint telemetry: process creation, file modifications, registry changes
Application logs: API calls, error rates, unusual request patterns
Identity provider events: role changes, group membership modifications, privilege grants

Each source becomes a Kafka topic. RisingWave maintains a source connector for each topic and correlates them through materialized view joins.

How to build a streaming SIEM with RisingWave

Step 1: Define event sources

CREATE SOURCE auth_events (
    event_id     VARCHAR,
    user_id      VARCHAR,
    source_ip    VARCHAR,
    event_type   VARCHAR,  -- login_success, login_failure, mfa_bypass
    event_time   TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'auth_events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

CREATE SOURCE firewall_logs (
    log_id       VARCHAR,
    src_ip       VARCHAR,
    dst_ip       VARCHAR,
    dst_port     INT,
    action       VARCHAR,  -- allow, deny, drop
    bytes        BIGINT,
    log_time     TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'firewall_logs',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

CREATE SOURCE process_events (
    event_id     VARCHAR,
    host_id      VARCHAR,
    parent_pid   INT,
    child_pid    INT,
    process_name VARCHAR,
    command_line VARCHAR,
    event_time   TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'process_events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Step 2: Detect brute-force attacks

A burst of authentication failures from one IP within a short window is the signature of a brute-force or password spray attack.

CREATE MATERIALIZED VIEW brute_force_detections AS
SELECT
    source_ip,
    window_start,
    window_end,
    COUNT(*) FILTER (WHERE event_type = 'login_failure') AS failures,
    COUNT(DISTINCT user_id)                               AS targeted_accounts
FROM TUMBLE(auth_events, event_time, INTERVAL '5 minutes')
GROUP BY source_ip, window_start, window_end
HAVING COUNT(*) FILTER (WHERE event_type = 'login_failure') > 20;

Step 3: Detect lateral movement via process chains

Lateral movement often involves spawning unusual child processes from common parent processes like PowerShell or cmd.exe.

CREATE MATERIALIZED VIEW suspicious_process_chains AS
SELECT
    e.host_id,
    e.parent_pid,
    e.child_pid,
    e.process_name,
    e.command_line,
    e.event_time
FROM process_events e
WHERE
    e.process_name IN ('cmd.exe', 'powershell.exe', 'bash', 'sh')
    AND (
        e.command_line LIKE '%net user%'
        OR e.command_line LIKE '%whoami%'
        OR e.command_line LIKE '%net localgroup%'
        OR e.command_line LIKE '%mimikatz%'
        OR e.command_line LIKE '%procdump%'
    );

Step 4: Correlate network and auth events for multi-stage detection

A login success from an IP that was previously blocked by the firewall is a high-confidence indicator of compromise.

CREATE MATERIALIZED VIEW recently_blocked_ips AS
SELECT DISTINCT src_ip, MAX(log_time) AS last_blocked
FROM firewall_logs
WHERE action IN ('deny', 'drop')
GROUP BY src_ip
HAVING MAX(log_time) > NOW() - INTERVAL '1 hour';

CREATE MATERIALIZED VIEW auth_from_blocked_ip AS
SELECT
    a.user_id,
    a.source_ip,
    a.event_time   AS auth_time,
    b.last_blocked AS block_time,
    a.event_type
FROM auth_events a
JOIN recently_blocked_ips b ON a.source_ip = b.src_ip
WHERE a.event_type = 'login_success';

Step 5: Aggregate alerts into a unified alert stream

CREATE MATERIALIZED VIEW siem_alerts AS
SELECT
    'brute_force'                AS alert_type,
    source_ip                   AS entity,
    window_end                  AS detected_at,
    failures || ' failures in 5 min from ' || source_ip AS description
FROM brute_force_detections

UNION ALL

SELECT
    'suspicious_process'        AS alert_type,
    host_id                     AS entity,
    event_time                  AS detected_at,
    'Suspicious command: ' || command_line AS description
FROM suspicious_process_chains

UNION ALL

SELECT
    'auth_from_blocked_ip'      AS alert_type,
    user_id                     AS entity,
    auth_time                   AS detected_at,
    'Login from previously blocked IP ' || source_ip AS description
FROM auth_from_blocked_ip;

Step 6: Route alerts to a response platform

CREATE SINK siem_alerts_sink
FROM siem_alerts
WITH (
    connector = 'kafka',
    topic = 'siem_alerts',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

Security orchestration tools like PagerDuty, Splunk SOAR, or a custom webhook consumer can subscribe to the siem_alerts Kafka topic to trigger automated response playbooks.

How does this compare to Splunk or QRadar?

Splunk and QRadar are indexing platforms. They ingest logs, build inverted indexes for text search, and let analysts write correlation rules that run as scheduled searches. The minimum detection latency is the rule scheduling interval, typically 1 to 5 minutes.

RisingWave is a streaming processor. Detection views evaluate continuously. When the 21st login failure arrives from an IP within a 5-minute window, the brute_force_detections view updates immediately. There is no scheduling interval.

The trade-off: Splunk is better at ad-hoc log search across months of history. RisingWave is better at real-time pattern detection and multi-stream correlation. Most security programs benefit from running both: RisingWave for real-time detection, and a log archive (S3 plus Athena or Iceberg plus Trino) for forensic investigation.

Why not use Apache Flink for the streaming SIEM?

Apache Flink can build the same streaming SIEM, but it imposes a Java or Scala tax on every detection rule. Your security team writes YAML or SQL detection rules today; rewriting them as Flink DataStream jobs requires a JVM engineer. Flink also requires a separate serving database for analysts to query results. RisingWave serves results directly over its PostgreSQL interface.

When a detection threshold changes, Flink requires a full job redeployment with state drain. RisingWave accepts an ALTER MATERIALIZED VIEW statement that takes effect on the next checkpoint. Security detection rules change frequently; the operational cost of Flink redeployments compounds quickly.

RisingWave is an open-source PostgreSQL-compatible streaming database. Explore the cybersecurity threat detection landing page for deployment guides and architecture diagrams.