How to Translate Sigma Rules to SQL for Kafka Streams

How to Translate Sigma Rules to SQL for Kafka Streams

Sigma rules are YAML-based threat detection signatures widely used in SIEM systems. Security engineers can translate these rules into SQL manually and run them as materialized views over Kafka event streams using RisingWave. This gives you continuous, per-event detection without waiting for batch index cycles.

What is a Sigma rule and how is it structured?

A Sigma rule is a vendor-neutral YAML file that describes a threat pattern. It can be ported across different SIEM platforms using converters. A typical Sigma rule has these key fields:

  • title: A human-readable name for the detection
  • detection: The core logic, with named sub-selectors (e.g., selection, filter) that list field-value conditions
  • condition: A boolean expression combining the selectors (e.g., selection and not filter)
  • timeframe: An optional window specifying how long activity should be observed (e.g., 5m)
  • logsource: The log category and product this rule targets (e.g., category: authentication, product: linux)

Here is a minimal example:

title: Failed SSH Login Attempt
status: experimental
logsource:
  category: authentication
  product: linux
detection:
  selection:
    EventID: 4625
    LogonType: 3
  condition: selection
  timeframe: 5m
level: medium

The detection block is the translation target. Each selector becomes a set of filter conditions, and the condition field tells you how to combine them with AND, OR, and NOT logic.

Why run Sigma rules as streaming SQL instead of in a SIEM?

Traditional SIEMs operate on a store-then-query model: raw events are ingested, indexed, and only then made available for search and correlation. A detection rule only fires after the indexing pipeline completes, which typically introduces latency of 30 seconds to several minutes depending on ingestion volume and index configuration.

RisingWave evaluates rules differently. It processes events as they arrive from Kafka, maintaining incremental state in materialized views. The moment a matching event hits the stream, the view updates. There is no separate indexing step and no batch scan across historical data. For rules that depend on time windows (such as counting five failed logins in 60 seconds), RisingWave maintains the window state continuously rather than recalculating it on demand.

The practical difference: a brute-force detection that takes 2 to 5 minutes to surface in a SIEM can surface in under a second in RisingWave, measured from the moment the triggering event lands in Kafka.

How do you translate a Sigma rule detection block to SQL?

The translation is a manual process. There is no built-in Sigma-to-SQL converter in RisingWave. A security engineer reads the Sigma YAML and writes the equivalent SQL constructs by hand. The mapping follows a consistent pattern:

Sigma constructSQL equivalent
detection.selection field-value pairsWHERE clause conditions joined by AND
condition: selection1 or selection2WHERE (condition_1) OR (condition_2)
condition: selection and not filterWHERE (selection conditions) AND NOT (filter conditions)
timeframe: 5mTUMBLE(event_time, INTERVAL '5 minutes') or HOP window
Counting thresholdHAVING COUNT(*) >= N
Field contains list of valuesfield IN ('val1', 'val2') or field ILIKE '%pattern%'

The logsource section tells you which Kafka topic or RisingWave source to query. The detection block becomes the body of a CREATE MATERIALIZED VIEW statement with a SELECT over that source.

For rules without a timeframe, you write a simple filtered view:

CREATE MATERIALIZED VIEW ssh_failed_login AS
SELECT
  event_time,
  source_ip,
  user_name,
  host
FROM auth_events
WHERE event_id = 4625
  AND logon_type = 3;

For rules with a timeframe, you introduce a window function and an aggregation:

CREATE MATERIALIZED VIEW ssh_brute_force_alert AS
SELECT
  window_start,
  window_end,
  source_ip,
  COUNT(*) AS failed_attempts
FROM TUMBLE(auth_events, event_time, INTERVAL '5 minutes')
WHERE event_id = 4625
  AND logon_type = 3
GROUP BY window_start, window_end, source_ip
HAVING COUNT(*) >= 5;

Step-by-step: translating a brute force Sigma rule to SQL

Take this realistic Sigma rule for detecting an SSH brute force attack:

title: SSH Brute Force Attack
id: 5cc90652-4cbd-4f72-8c34-3c7f870c5a73
status: stable
description: Detects multiple failed SSH authentication attempts from a single source
logsource:
  category: authentication
  product: linux
  service: sshd
detection:
  selection:
    syslog.program: sshd
    message|contains: 'Failed password'
  condition: selection | count(source.ip) > 10
  timeframe: 2m
level: high
tags:
  - attack.credential_access
  - attack.t1110.001

Reading this rule:

  1. The logsource maps to a Kafka source ingesting sshd authentication logs
  2. The selection filters to sshd events containing "Failed password"
  3. The condition counts distinct source IPs (here we interpret it as: more than 10 failed attempts from the same IP in 2 minutes)
  4. The timeframe: 2m maps to a tumbling window of 2 minutes

First, create the Kafka source in RisingWave:

CREATE SOURCE sshd_logs (
  event_time TIMESTAMPTZ,
  program    VARCHAR,
  message    VARCHAR,
  source_ip  VARCHAR,
  user_name  VARCHAR,
  host       VARCHAR
)
WITH (
  connector = 'kafka',
  topic = 'sshd-auth-logs',
  properties.bootstrap.server = 'kafka:9092',
  scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Then translate the Sigma rule into a materialized view:

CREATE MATERIALIZED VIEW sigma_ssh_brute_force AS
SELECT
  window_start,
  window_end,
  source_ip,
  COUNT(*)          AS failed_attempts,
  MAX(host)         AS target_host,
  MAX(event_time)   AS last_seen
FROM TUMBLE(sshd_logs, event_time, INTERVAL '2 minutes')
WHERE program = 'sshd'
  AND message ILIKE '%Failed password%'
GROUP BY window_start, window_end, source_ip
HAVING COUNT(*) > 10;

Each row in this view represents a 2-minute window where a single source IP had more than 10 failed password attempts. The view stays current automatically. Any downstream system (a Postgres client, a BI tool, or an alerting script) can query this view and see fresh results without issuing time-range scans.

To send alerts downstream, you can create a Kafka sink from the materialized view:

CREATE SINK sigma_ssh_brute_force_alerts
FROM sigma_ssh_brute_force
WITH (
  connector = 'kafka',
  topic = 'security-alerts',
  properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

What Sigma rule categories translate most cleanly to SQL?

Not all Sigma rule types are equally straightforward to translate. Here is a practical ranking:

Easy to translate

  • Authentication failures (EventID 4625, sshd failed password, Kerberos pre-auth failures): These have well-structured fields and map directly to WHERE clauses. Threshold-based rules add a window and HAVING.
  • Process execution (Windows Security EventID 4688, Sysmon EventID 1): Parent-child process relationships, command line patterns, and image paths are standard columns. Field-level matching translates directly.
  • Network connections (Sysmon EventID 3, firewall deny logs): Source IP, destination IP, port, and protocol are structured fields. Geolocation joins can be added as stream-table joins against a reference table.

Moderate complexity

  • Lateral movement sequences (pass-the-hash, remote service creation, WMI exec): These rules correlate multiple event types across a time window. You can model them in RisingWave using session windows or by joining two materialized views, but the translation requires understanding the intended sequence, not just the individual field conditions.
  • Registry modification patterns (Sysmon EventID 13): Straightforward if the registry path is an exact match; more complex when the rule uses substring matching on deep key hierarchies.

Harder to translate

  • Multi-hop graph correlations: Rules that track an attacker moving from host A to host B to host C require maintaining path state. This is possible in RisingWave using multiple joined views, but each hop adds complexity.
  • Aggregations across heterogeneous log types: If a single Sigma rule combines fields from auth logs, process logs, and network logs in one condition, you need to join multiple Kafka sources first, then apply the filter.

What are the limitations of SQL-based Sigma translation?

Being honest about limitations helps you decide where this approach fits.

Unstructured message fields require preprocessing. Many Sigma rules match against message or CommandLine fields using contains or re (regex) operators. These work in SQL via ILIKE or the regexp_match function, but they require the raw string to be available as a column. If your log pipeline stores events as opaque blobs or heavily nested JSON without field extraction, you need a preprocessing step (a Flink job or a log shipper with field extraction) before RisingWave can evaluate the condition cleanly.

Complex regex patterns have performance implications. A Sigma rule with a re: condition using a long alternation regex (matching 50 known malicious command-line patterns, for example) will evaluate that regex against every single event. At high Kafka throughput (millions of events per second), this adds CPU cost. Materializing the regex match results in a staging view or pre-filtering by a faster indexed condition can mitigate this.

There is no automated translation. Every rule requires a human to read the YAML, understand the intent, and write the SQL. A rule that seems simple in Sigma syntax may have subtle semantics (for example, a null check or a list expansion) that you need to handle explicitly in SQL. Maintaining a library of translated rules requires ongoing engineering work as the original Sigma rules evolve.

Time synchronization matters. Sigma rules with timeframe fields assume events arrive close to real time. If your Kafka topic has significant lag (events arrive minutes after they occur), a 2-minute tumbling window may miss cross-event correlations. You need to ensure your Kafka pipeline delivers events with low latency, or switch to processing time windows with the understanding that the window boundaries are approximate.

Despite these constraints, the approach covers a large fraction of high-value detections: authentication anomalies, process execution monitoring, and network connection analysis are all well-served by structured Kafka events and SQL materialized views.

FAQ

Does RisingWave have a built-in tool to automatically convert Sigma rules to SQL?

No. RisingWave does not include a Sigma converter. The translation is a manual process: a security engineer reads each Sigma rule, understands the detection logic, and writes the equivalent SQL. There are open-source community tools (like sigma-cli with backend plugins) that attempt partial automation for some SIEM targets, but there is no official RisingWave backend for those tools as of this writing.

Can RisingWave query historical Kafka data, or only new events?

RisingWave can be configured to scan from the earliest available offset when creating a Kafka source (scan.startup.mode = 'earliest'). This lets a new materialized view backfill against historical data already in Kafka. After the initial backfill, the view updates continuously as new events arrive. The backfill is bounded by your Kafka retention window.

How do I handle Sigma rules that reference Windows Event IDs alongside Linux syslog fields?

These are separate log sources. You create separate Kafka sources (or separate topics) for Windows event logs and Linux syslog, create materialized views over each, and join them if the rule requires cross-source correlation. In practice, most Sigma rules target a single logsource and do not require cross-platform joins.

What happens if a Kafka partition lags behind? Will RisingWave miss detections?

RisingWave processes events in event-time order per partition. If a partition lags, its events will be processed when they arrive, and time-windowed aggregations will be updated accordingly. For rules that depend on tight time windows, a lagging partition may delay when an alert surfaces, but events are not silently dropped. You should monitor Kafka consumer lag as part of your operational setup to ensure detection latency stays within acceptable bounds.

Get started

Translating Sigma rules into RisingWave SQL materialized views gives security teams continuous, per-event detection over Kafka streams without the index-wait overhead of traditional SIEMs. The translation is manual, but the mapping is consistent and learnable. Start with your highest-value authentication and process execution rules, which have structured fields and translate directly.

To try it yourself, get started with RisingWave or explore the RisingWave documentation to connect your first Kafka source.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.