Command-and-Control Beaconing Detection with SQL

C2 beaconing is one of the most consistent behavioral signatures in modern malware. An infected host makes periodic, automated connections to attacker infrastructure at nearly fixed intervals, creating a statistical pattern that stands out from organic human-driven traffic. RisingWave detects this regularity signature using streaming SQL over live NetFlow data, computing inter-connection timing statistics across millions of flows without custom application code.

What Is C2 Beaconing and Why Is It Detectable?

Command-and-control beaconing is the communication pattern used by malware implants to maintain persistent contact with attacker infrastructure. After initial compromise, an implant calls home at a configured interval to check for new commands, report status, or exfiltrate data in small chunks. The critical insight for defenders is that this behavior is fundamentally different from human-generated network activity.

Human network usage is irregular and bursty: a user opens a browser, reads for a few minutes, clicks a link, and so on. Malware implants, by contrast, operate on a timer. The timer fires every 60 seconds, every 300 seconds, or every 3600 seconds, and the implant sends a small heartbeat packet. The next interval fires, another packet goes out. This mechanical regularity is the fingerprint.

The Key Statistical Characteristics

Beacon interval regularity. The most important property is the consistency of inter-connection timing. If a host connects to the same destination at timestamps T, T+300, T+600, T+900, the standard deviation of the intervals is near zero. Real user traffic to the same destination might have inter-request gaps of 2 seconds, 45 seconds, 180 seconds, and 3 seconds depending on browsing behavior.

Low jitter. Mature C2 frameworks like Cobalt Strike, Metasploit, and Empire include configurable jitter as an evasion mechanism. Jitter adds a random percentage to the beacon interval, so a 300-second beacon with 20% jitter fires anywhere between 240 and 360 seconds. Even with jitter, the standard deviation of intervals remains low relative to the mean interval. A jitter setting of 20% on a 300-second beacon produces a coefficient of variation around 0.12, far below what organic traffic produces.
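These jitter numbers can be sanity-checked with a few lines of Python. The sketch below assumes the implant draws its sleep time uniformly from the jitter range, so the standard deviation of a uniform distribution, (range width) / sqrt(12), applies:

```python
import math

def uniform_jitter_stats(interval_s, jitter_frac):
    """Stddev and coefficient of variation (CV) for a beacon whose sleep
    is drawn uniformly from [interval*(1-j), interval*(1+j)]."""
    width = 2 * interval_s * jitter_frac   # width of the uniform range
    stddev = width / math.sqrt(12)         # stddev of a uniform distribution
    return stddev, stddev / interval_s

for jitter in (0.20, 0.50):
    sd, cv = uniform_jitter_stats(300, jitter)
    print(f"{int(jitter * 100)}% jitter: stddev={sd:.1f}s cv={cv:.2f}")
# 20% jitter on a 300 s beacon: stddev ~34.6 s, CV ~0.12
# 50% jitter: stddev ~86.6 s, CV ~0.29
```

Even the 50% case stays well below the CV of roughly 1.0 and up that bursty human traffic produces.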

Small consistent payload sizes. Heartbeat traffic is designed to be minimal, typically under 1 kilobyte. The implant sends just enough data to prove it is alive and receive a small command block. This contrasts with legitimate application traffic, which varies substantially in payload size depending on content being transferred.

Common beacon intervals. The most frequently observed intervals in incident response investigations are 60 seconds, 300 seconds (5 minutes), 600 seconds (10 minutes), and 3600 seconds (1 hour). Longer intervals reduce network noise and evade anomaly detection systems tuned to high connection rates, but they increase the attacker's response latency for hands-on-keyboard operations.

Fixed destination. Classic beaconing goes to a single IP or domain that resolves to a stable set of addresses. Domain generation algorithms (DGAs) introduce destination variation to evade blocklists, but even DGA-based beacons maintain the timing regularity and small payload characteristics. The SQL approach here targets the timing signature, which survives DGA rotation.

How Do You Ingest NetFlow Data into RisingWave?

Network flow telemetry arrives as NetFlow v9, IPFIX, or sFlow records exported by routers, switches, or dedicated flow probes. Before computing timing statistics, the raw flow data needs to be ingested into a streaming source that RisingWave can consume.

The standard pattern is to use a NetFlow collector (such as nfdump, Logstash with the netflow codec, or a dedicated collector like GoFlow2) to parse and normalize raw UDP datagrams into structured records, then publish them to a Kafka topic. RisingWave connects to Kafka as a streaming source and maintains incrementally updated materialized views over the incoming data.

An important note on timing: NetFlow v9 and IPFIX use active and inactive flow timeouts. With an active timeout of 30 to 60 seconds, a long-lived connection is exported as multiple records, one per active-timeout period, rather than as a single record when the connection closes. Depending on the collector configuration, the flow_time field represents either the flow start time or the export time. For beaconing detection against short-lived repeated connections, either is accurate enough. NetFlow data typically arrives 30 to 60 seconds after the underlying connection because of the export interval and collector processing, so detection latency is in that range, not milliseconds. sFlow, which samples packets at wire speed, can provide sub-second latency for sampled flows.

Create the Kafka source in RisingWave:

CREATE SOURCE net_flows (
    flow_id     VARCHAR,
    src_ip      VARCHAR,
    dst_ip      VARCHAR,
    dst_port    INT,
    bytes       BIGINT,
    flow_time   TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'net_flows',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

This source definition gives RisingWave a continuous stream of flow records to process.
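For FORMAT PLAIN ENCODE JSON, each Kafka message should be a flat JSON object whose keys match the source columns. A minimal sketch of such a record, with all values hypothetical:

```python
import json

# Hypothetical flow record matching the net_flows source schema;
# FORMAT PLAIN ENCODE JSON expects one JSON object per Kafka message.
record = {
    "flow_id": "f-000123",                     # example identifiers only
    "src_ip": "10.0.4.17",
    "dst_ip": "203.0.113.50",
    "dst_port": 443,
    "bytes": 412,
    "flow_time": "2024-05-01T12:00:00+00:00",  # TIMESTAMPTZ-parseable string
}
payload = json.dumps(record).encode("utf-8")   # message value for the topic
print(payload.decode())
```

The collector's transform stage produces records of this shape before publishing to the net_flows topic.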

How Does RisingWave Compute Inter-Flow Intervals?

Beaconing detection requires comparing consecutive connection timestamps for each source and destination pair. The LAG() window function retrieves the previous flow timestamp within a partition, making interval computation straightforward in SQL.

The first materialized view computes the interval in seconds between each flow and the preceding flow from the same source to the same destination:

CREATE MATERIALIZED VIEW flow_intervals AS
SELECT
    flow_id,
    src_ip,
    dst_ip,
    dst_port,
    bytes,
    flow_time,
    EXTRACT(EPOCH FROM (
        flow_time - LAG(flow_time) OVER (
            PARTITION BY src_ip, dst_ip
            ORDER BY flow_time
        )
    )) AS interval_seconds
FROM net_flows;

The LAG(flow_time) OVER (PARTITION BY src_ip, dst_ip ORDER BY flow_time) expression looks back at the immediately preceding row for the same src/dst pair, ordered by timestamp. EXTRACT(EPOCH FROM ...) converts the interval to a numeric value in seconds for statistical aggregation downstream.

The first row for any new src/dst pair produces a NULL interval (no preceding row exists). Downstream aggregation handles this by filtering NULLs.

RisingWave maintains this materialized view incrementally. As new rows arrive from the Kafka source, RisingWave computes the LAG value for the new row against the prior stored state, updates the view, and propagates changes to downstream views. No batch job or periodic refresh is needed.
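The LAG semantics can be illustrated with a toy Python mirror of the view, using epoch-second integers in place of timestamps:

```python
def flow_intervals(flows):
    """Mimics LAG(flow_time) OVER (PARTITION BY src_ip, dst_ip ORDER BY flow_time):
    emits the gap in seconds since the previous flow for the same pair,
    or None for the pair's first flow."""
    last_seen = {}  # (src_ip, dst_ip) -> previous flow_time (epoch seconds)
    out = []
    for src, dst, t in sorted(flows, key=lambda f: f[2]):  # ORDER BY flow_time
        prev = last_seen.get((src, dst))
        out.append((src, dst, t, None if prev is None else t - prev))
        last_seen[(src, dst)] = t
    return out

gaps = [row[3] for row in flow_intervals(
    [("h1", "c2", 0), ("h1", "c2", 300), ("h1", "c2", 600), ("h2", "c2", 10)])]
print(gaps)  # [None, None, 300, 300] in timestamp order
```

Note that h2's first flow also yields None: each src/dst partition keeps its own lookback state, exactly as the PARTITION BY clause specifies.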

How Do You Aggregate Interval Statistics to Flag Beaconing?

The second materialized view aggregates interval statistics per src/dst pair over a rolling time window. The key metric is the standard deviation of inter-flow intervals: a low standard deviation means the connections are arriving at a highly regular cadence, which is the beaconing signature.

CREATE MATERIALIZED VIEW beacon_candidates AS
SELECT
    src_ip,
    dst_ip,
    dst_port,
    COUNT(*)                            AS flow_count,
    AVG(interval_seconds)               AS avg_interval_sec,
    STDDEV(interval_seconds)            AS stddev_interval_sec,
    AVG(bytes)                          AS avg_bytes,
    MIN(flow_time)                      AS window_start,
    MAX(flow_time)                      AS window_end
FROM flow_intervals
WHERE
    interval_seconds IS NOT NULL
    AND flow_time >= NOW() - INTERVAL '1 hour'
GROUP BY src_ip, dst_ip, dst_port;

This view computes, for each src/dst/port combination observed in the past hour:

  • flow_count: total number of flows. At least 10 to 12 flows are needed to produce a statistically meaningful standard deviation and rule out coincidental timing matches.
  • avg_interval_sec: the mean inter-flow interval, which identifies the beacon period (60, 300, 600, or 3600 seconds in typical cases).
  • stddev_interval_sec: the standard deviation of intervals. Values below 30 seconds for a 300-second beacon indicate strong periodicity.
  • avg_bytes: mean bytes per flow. Heartbeat beacons are typically under 1 kilobyte.
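To make the stddev metric concrete, here is the same aggregation over a hand-picked set of intervals (illustrative values; statistics.stdev is the sample standard deviation, matching PostgreSQL-style STDDEV):

```python
import statistics

# Intervals (seconds) between flows from one src/dst pair — a 300 s
# beacon with mild jitter (illustrative values, not captured data).
intervals = [298, 304, 301, 296, 305, 299, 302, 297, 300, 303]

avg = statistics.mean(intervals)   # AVG(interval_seconds)
sd = statistics.stdev(intervals)   # STDDEV(interval_seconds), sample stddev
print(round(avg, 1), round(sd, 2))  # 300.5 3.03
```

An average interval of ~300 seconds with a stddev near 3 seconds is exactly the profile the alert view targets.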

What Are the Alert Thresholds?

The alert view applies threshold conditions to the beacon_candidates aggregates. The thresholds balance sensitivity against false positive rate. The most effective combination is a strict standard deviation cap, a byte size limit to exclude legitimate application traffic, and a minimum flow count to ensure statistical confidence.

CREATE MATERIALIZED VIEW c2_beacon_alerts AS
SELECT
    src_ip,
    dst_ip,
    dst_port,
    flow_count,
    ROUND(avg_interval_sec::NUMERIC, 1)     AS avg_interval_sec,
    ROUND(stddev_interval_sec::NUMERIC, 2)  AS stddev_interval_sec,
    ROUND(avg_bytes::NUMERIC, 0)            AS avg_bytes,
    window_start,
    window_end,
    NOW()                                   AS detected_at
FROM beacon_candidates
WHERE
    stddev_interval_sec < 30
    AND avg_bytes < 1024
    AND flow_count >= 10;

The stddev_interval_sec < 30 threshold catches beacons with jitter up to roughly 10% of a 300-second interval. For 60-second beacons, a tighter threshold of 10 seconds may be appropriate. For 3600-second beacons, the threshold can be relaxed to 120 seconds. Tuning the threshold per expected beacon interval produces fewer false positives from legitimate scheduled tasks.

The avg_bytes < 1024 constraint limits alerts to traffic consistent with heartbeat payloads. This excludes backup jobs, software update checks that download large manifests, and monitoring agents that upload metrics in bulk. Most Cobalt Strike and Empire beacon heartbeats fall well below 500 bytes.

The flow_count >= 10 requirement ensures at least 10 flows with measured intervals are observed within the window before alerting, preventing a handful of coincidental timing matches from generating noise.
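The three conditions can be mirrored in a small Python predicate (function name and defaults are ours), which is handy for unit-testing threshold choices before editing the view:

```python
def is_beacon_alert(flow_count, stddev_interval_sec, avg_bytes,
                    stddev_cap=30.0, byte_cap=1024, min_flows=10):
    """Mirrors the WHERE clause of c2_beacon_alerts; caps default to the
    values in the view and can be tuned per expected beacon period."""
    return (stddev_interval_sec < stddev_cap
            and avg_bytes < byte_cap
            and flow_count >= min_flows)

print(is_beacon_alert(12, 3.03, 410))                  # True: tight, tiny, well-sampled
print(is_beacon_alert(12, 86.0, 410))                  # False: too irregular at default cap
print(is_beacon_alert(12, 95.0, 410, stddev_cap=120))  # True under a relaxed 3600 s cap
```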

Enriching Alerts with Threat Intelligence

The raw alert view can be joined against a threat intelligence feed or known-bad IP table to prioritize alerts. RisingWave supports joining materialized views against other sources:

CREATE MATERIALIZED VIEW c2_beacon_alerts_enriched AS
SELECT
    a.*,
    t.threat_category,
    t.confidence_score
FROM c2_beacon_alerts a
LEFT JOIN threat_intel t
    ON a.dst_ip = t.indicator
WHERE
    t.indicator IS NOT NULL
    OR a.stddev_interval_sec < 5;

This surfaces two classes of high-priority alerts: known-bad destinations from threat intelligence, and extremely regular beacons (stddev below 5 seconds) that are suspicious regardless of destination reputation.

How Does RisingWave Compare to Apache Flink for This Workload?

Both RisingWave and Apache Flink can process streaming NetFlow data, but the operational and developer experience differs substantially for a security engineering team.

Language and tooling. Flink's native API requires Java or Scala. Flink SQL exists, but it runs inside the Flink runtime and requires cluster management (JobManager, TaskManagers, checkpointing configuration). Writing and testing detection logic means packaging JARs or managing Flink SQL scripts through the Flink REST API or SQL Client. RisingWave exposes a standard PostgreSQL wire protocol, so analysts can iterate on detection logic using psql, any PostgreSQL-compatible client, or tools like DBeaver and DataGrip without writing application code.

Checkpoint and state management. Flink requires configuring checkpointing intervals and state backends (RocksDB for large state, heap-based for small state) to ensure fault tolerance. Misconfigured checkpoints are a common cause of Flink job failures under backpressure. RisingWave handles state persistence and recovery internally, without requiring the operator to tune checkpoint configuration.

PostgreSQL compatibility. RisingWave's PostgreSQL wire protocol compatibility means that downstream dashboards (Grafana, Metabase, Superset) can connect directly using standard PostgreSQL drivers. Results from c2_beacon_alerts are queryable with SELECT * FROM c2_beacon_alerts ORDER BY stddev_interval_sec LIMIT 50 from any analyst workstation. Connecting Flink results to a dashboard typically requires sinking to an intermediate store (Kafka, Elasticsearch, or a database) and querying that store separately.

Window function support. Flink SQL supports LAG() in streaming contexts, but behavior and performance characteristics differ between batch and streaming modes. RisingWave supports standard SQL window functions in materialized views with consistent incremental semantics, making the LAG()-based interval computation straightforward to reason about.

For security teams that want to ship detection logic quickly and iterate based on alert feedback, RisingWave's SQL-native approach removes the operational overhead that makes Flink setups slow to deploy and modify.

How Do You Tune Detection for Low-Jitter Evasion?

Sophisticated operators configure high jitter values (30 to 50%) in their C2 frameworks specifically to defeat standard deviation-based detection. A 300-second beacon with 50% jitter produces intervals uniformly distributed between 150 and 450 seconds, raising the standard deviation to approximately 86 seconds, well above the 30-second threshold.

Several complementary approaches handle high-jitter beacons.

Coefficient of variation. Normalizing standard deviation by the mean interval (stddev / avg) produces the coefficient of variation (CV). For a high-jitter beacon with 50% uniform jitter and a 300-second mean, the CV is approximately 0.29. Most organic traffic to fixed destinations has a CV above 1.0 due to the bursty nature of human browsing. A CV threshold of 0.5 catches high-jitter beacons that survive the raw standard deviation filter:

CREATE MATERIALIZED VIEW high_jitter_candidates AS
SELECT *,
    stddev_interval_sec / NULLIF(avg_interval_sec, 0) AS cv
FROM beacon_candidates
WHERE
    flow_count >= 10
    AND avg_bytes < 1024
    AND stddev_interval_sec / NULLIF(avg_interval_sec, 0) < 0.5;

Autocorrelation proxy. True periodicity, even with jitter, produces autocorrelation at the fundamental frequency. A rough proxy is computing the difference between observed flow counts in each sub-interval bucket and the expected count under a uniform distribution. RisingWave's tumbling window functions can bucket flows into 60-second bins to approximate this distribution analysis.
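One way to sketch that proxy outside the database (our own toy formulation, not RisingWave syntax): fold timestamps modulo a candidate period, bin the phases, and measure how far the bin counts deviate from a uniform spread.

```python
def periodicity_score(timestamps, period_s, bin_s=60):
    """Crude autocorrelation proxy: fold timestamps modulo a candidate
    period, bucket the phases into bins, and sum squared deviations of
    bin counts from a uniform spread. Periodic traffic concentrates in
    few phase bins; organic traffic spreads across them."""
    n_bins = max(1, period_s // bin_s)
    counts = [0] * n_bins
    for t in timestamps:
        counts[int(t % period_s) // bin_s] += 1
    expected = len(timestamps) / n_bins
    return sum((c - expected) ** 2 for c in counts) / len(timestamps)

beacon = [i * 300 for i in range(20)]   # perfect 300 s beacon
spread = list(range(0, 300, 15))        # phases spread evenly across the period
print(periodicity_score(beacon, 300))   # 16.0 — strongly periodic
print(periodicity_score(spread, 300))   # 0.0 — no phase concentration
```

A high score even under heavy jitter flags periodicity that a raw stddev filter misses.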

Long observation windows. Extending the aggregation window from 1 hour to 6 or 24 hours reduces the impact of jitter on standard deviation estimates and increases flow_count, improving statistical power. The tradeoff is increased detection latency for new beacons.

What Does a Complete Detection Pipeline Look Like?

The full pipeline from flow ingestion to alert has four components in RisingWave:

  1. Source: the net_flows Kafka source receives raw flow records.
  2. Interval view: flow_intervals computes LAG()-based inter-flow intervals per src/dst pair.
  3. Aggregate view: beacon_candidates rolls up statistics over the past hour.
  4. Alert view: c2_beacon_alerts applies thresholds and surfaces actionable rows.

Each downstream view is a materialized view that RisingWave maintains incrementally. Changes propagate from the source through the pipeline automatically as new flow records arrive. Analysts query c2_beacon_alerts directly and can subscribe to changes using SUBSCRIBE or sink the alerts to a Kafka topic for integration with a SIEM.

CREATE SINK c2_alerts_to_kafka
FROM c2_beacon_alerts
WITH (
    connector = 'kafka',
    topic = 'c2-beacon-alerts',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Downstream SIEM systems consume the c2-beacon-alerts Kafka topic to correlate beacon detections with other indicators such as DNS resolutions, file system events from EDR telemetry, and authentication logs.

Frequently Asked Questions

Does this approach work against DGA-based C2 traffic?

DGA-based C2 changes destination domains periodically to evade static blocklists, but the timing regularity and small payload size of the beacon remain unchanged. The SQL detection approach described here targets the statistical properties of the connection pattern, not the destination. A DGA beacon that fires every 300 seconds with low jitter will still produce a low standard deviation in inter-flow intervals and trigger the alert view, regardless of which DGA-generated domain it resolves to.

How many flows per second can RisingWave handle?

RisingWave is designed for high-throughput streaming workloads; a single cluster handles millions of events per second with horizontal scaling. Enterprise networks typically generate millions of flow records per hour rather than per second, so a modest RisingWave deployment is sufficient. Service provider environments producing tens of millions of flows per minute can scale RisingWave horizontally across multiple compute nodes.

Will legitimate monitoring agents trigger false positives?

Some application monitoring agents, NTP synchronization, and keepalive mechanisms do produce regular periodic connections. The avg_bytes < 1024 filter eliminates many of these because monitoring agents often transmit larger metric payloads. An allowlist of known-good src/dst pairs can be maintained in a separate table and applied as a NOT EXISTS filter on the alert view to suppress recurring false positives from understood traffic sources.
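The allowlist idea can be prototyped outside the database first. A sketch with a hypothetical known-good pair, equivalent in effect to the NOT EXISTS filter:

```python
# Hypothetical allowlist of known-good (src_ip, dst_ip) pairs, e.g. a
# monitoring agent reporting to an internal collector.
ALLOWLIST = {("10.0.4.17", "198.51.100.9")}

def suppress_allowlisted(alerts):
    """Drops alerts whose src/dst pair is understood traffic — the same
    effect as a NOT EXISTS filter against an allowlist table."""
    return [a for a in alerts if (a["src_ip"], a["dst_ip"]) not in ALLOWLIST]

alerts = [
    {"src_ip": "10.0.4.17", "dst_ip": "198.51.100.9"},  # allowlisted agent
    {"src_ip": "10.0.4.99", "dst_ip": "203.0.113.50"},  # still surfaces
]
print(len(suppress_allowlisted(alerts)))  # 1
```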

How do I validate the detection logic before deploying to production?

RisingWave supports querying historical data from Kafka by setting scan.startup.mode = 'earliest' on the source. This allows the detection pipeline to be run against historical NetFlow data from a previous incident to verify that known beaconing activity would have been detected. Synthetic test data can also be injected into the Kafka topic to validate threshold behavior without requiring a live attacker.
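For the synthetic route, a small generator can produce jittered beacon timestamps to wrap into flow records (a sketch; the function name, parameters, and seed are ours):

```python
import random

def synthetic_beacon_times(start_s, interval_s, jitter_frac, count, seed=7):
    """Generates flow timestamps for a synthetic beacon with uniform
    jitter, for serializing into test records on the net_flows topic."""
    rng = random.Random(seed)  # seeded for reproducible test data
    t, out = float(start_s), []
    for _ in range(count):
        out.append(t)
        t += rng.uniform(interval_s * (1 - jitter_frac),
                         interval_s * (1 + jitter_frac))
    return out

times = synthetic_beacon_times(0, 300, 0.2, 12)
gaps = [b - a for a, b in zip(times, times[1:])]
print(min(gaps) >= 240 and max(gaps) <= 360)  # True: gaps stay in the jitter range
```

Injecting records at these timestamps should light up c2_beacon_alerts; widening jitter_frac lets you probe exactly where the stddev threshold stops firing.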


Ready to deploy streaming threat detection in your environment? Explore the RisingWave network anomaly detection guide for architecture patterns, connector configurations, and performance benchmarks for security workloads.
