Real-Time Log Analytics Without Elasticsearch

Streaming SQL is a better fit for log analytics when your goal is continuous aggregation, alerting, and dashboards over a live Kafka feed. RisingWave processes each log event as it arrives, maintains pre-computed results in SQL materialized views, and gives you sub-second query latency without the indexing delay that Elasticsearch introduces before logs become searchable.

What is the indexing latency problem with Elasticsearch?

When logs arrive at Elasticsearch, they are not immediately searchable. Each document must pass through an indexing pipeline: the data is parsed, analyzed, tokenized into an inverted index, and written to Lucene segments. Only after this pipeline completes does the data surface in search results.

In practice, this means typical indexing latency ranges from 30 seconds to 2 minutes under production load. During high-volume ingestion spikes, latency grows further because Elasticsearch throttles indexing to relieve JVM heap pressure. For security event detection or real-time error-rate alerting, a 60-second blind spot is a meaningful gap.

Beyond latency, the inverted index itself consumes substantial storage. Every indexed field is tokenized and stored redundantly alongside the raw document. For high-cardinality fields such as IP addresses, user agents, or trace IDs, the index overhead can reach 2 to 3 times the raw log size. Organizations that retain months of indexed logs at scale frequently find that Elasticsearch storage costs dominate their observability budget.

How does streaming SQL analyze logs differently?

RisingWave processes log events in a fundamentally different model. Instead of storing raw events and building an index for later search, it maintains incrementally updated aggregations as events flow through the system.

When a log event arrives from Kafka, RisingWave evaluates it against all active materialized views at ingestion time. If you have a materialized view computing error counts per service per minute, that count is updated the moment the event arrives, not after an indexing delay. A downstream query against the materialized view reads pre-computed results, not raw documents, so it returns in milliseconds regardless of how many events have been ingested.

This means there is no index storage overhead for aggregated results. A materialized view tracking error rates for 1,000 services holds 1,000 rows, not billions of raw log lines. The storage footprint for operational metrics is orders of magnitude smaller than a full-text log index.

The trade-off is intentional: RisingWave excels at continuous, incremental computation over streams. It is not a document store for ad-hoc retrieval of raw log content.

Elasticsearch vs RisingWave for log analytics: a comparison

| Dimension | Elasticsearch | RisingWave |
| --- | --- | --- |
| Query latency after ingestion | 30s to 2 minutes (indexing delay) | Sub-second (pre-computed materialized views) |
| Query language | KQL / Lucene query syntax | Standard SQL (PostgreSQL-compatible) |
| Infrastructure model | Index shards, replica management, JVM heap tuning | Streaming database, PostgreSQL wire protocol |
| Storage model | Inverted index per document field (2 to 3x raw size) | Incremental state for aggregations (compact) |
| Alerting approach | Watcher rules polling the index | Streaming SQL rules trigger on live data |
| Full-text raw log search | Native strength | Not the primary use case |
| Open source license | Server Side Public License (SSPL) | Apache 2.0 |

The table shows that the two systems optimize for different access patterns. Elasticsearch is built for full-text retrieval of raw events. RisingWave is built for continuous aggregation and real-time query over streaming data.

What log analytics use cases work best with streaming SQL?

Error rate monitoring per service. When you need to know within seconds that a service error rate has crossed a threshold, a streaming materialized view computes that count continuously. There is no polling interval, no indexing wait, and no full-document scan at query time.

Security event detection. Detecting patterns such as repeated failed authentication from a single IP, or an unusual sequence of privilege escalation events, benefits from stream-table joins and window functions that RisingWave supports natively. You define the detection logic in SQL and the engine evaluates it continuously against the incoming event stream.
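As a sketch of the failed-authentication pattern, assuming a hypothetical auth_events source with client_ip, outcome, and event_time fields (none of which appear elsewhere in this article), a materialized view with a HAVING clause can surface offending IPs per window:

```sql
-- Sketch: flag any IP with 5 or more failed logins inside a 1-minute window.
-- auth_events and its columns are illustrative assumptions.
CREATE MATERIALIZED VIEW suspicious_auth AS
SELECT
    client_ip,
    window_start,
    COUNT(*) AS failed_attempts
FROM TUMBLE(auth_events, event_time, INTERVAL '1' MINUTE)
WHERE outcome = 'FAILURE'
GROUP BY client_ip, window_start
HAVING COUNT(*) >= 5;
```

An alerting job can then watch this view for new rows instead of polling an index.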

Infrastructure health dashboards. Aggregating CPU saturation events, memory pressure signals, or network error counts by host over a rolling 5-minute window is a natural fit for tumbling and sliding window aggregations in SQL. Grafana can query the materialized views directly over the PostgreSQL wire protocol.
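Using the log_events source defined in Step 1 below, a rolling per-host view can be expressed with a HOP (sliding) window; the one-minute slide and five-minute width here are illustrative choices:

```sql
-- Sketch: per-host error counts over a rolling 5-minute window,
-- re-evaluated every minute via a sliding (HOP) window.
CREATE MATERIALIZED VIEW host_errors_5m AS
SELECT
    host,
    window_start,
    window_end,
    COUNT(*) FILTER (WHERE log_level = 'ERROR') AS error_count
FROM HOP(log_events, event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)
GROUP BY host, window_start, window_end;
```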

Compliance audit counts. Regulatory requirements often call for counts of access events, data export events, or authentication attempts over defined time periods. A materialized view that accumulates these counts by category can serve audit queries instantly, without scanning raw log files.
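A minimal sketch, assuming a hypothetical audit_events source with event_category and event_time fields:

```sql
-- Sketch: daily counts per audited event category.
-- audit_events and its columns are illustrative assumptions.
CREATE MATERIALIZED VIEW daily_audit_counts AS
SELECT
    event_category,
    window_start::DATE AS audit_date,
    COUNT(*)           AS event_count
FROM TUMBLE(audit_events, event_time, INTERVAL '1' DAY)
GROUP BY event_category, window_start;
```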

How to build a streaming log analytics pipeline with RisingWave

The pipeline has three components: a Kafka topic receiving structured log events, a RisingWave source connected to that topic, and materialized views that compute the metrics you care about.

Step 1: Create a Kafka source for log events

CREATE SOURCE log_events (
    event_time   TIMESTAMPTZ,
    service_name VARCHAR,
    log_level    VARCHAR,
    message      VARCHAR,
    host         VARCHAR,
    trace_id     VARCHAR
)
WITH (
    connector     = 'kafka',
    topic         = 'application-logs',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

This source connects RisingWave to your existing Kafka topic. Log events are read as they are produced, with no separate connector process to manage.

Step 2: Create a materialized view for error rates per service

CREATE MATERIALIZED VIEW error_rate_per_service AS
SELECT
    service_name,
    window_start,
    window_end,
    COUNT(*)                                              AS total_events,
    COUNT(*) FILTER (WHERE log_level = 'ERROR')           AS error_count,
    ROUND(
        COUNT(*) FILTER (WHERE log_level = 'ERROR') * 100.0
        / NULLIF(COUNT(*), 0),
        2
    )                                                     AS error_rate_pct
FROM TUMBLE(log_events, event_time, INTERVAL '1' MINUTE)
GROUP BY service_name, window_start, window_end;

RisingWave maintains this view incrementally. As each new log event arrives from Kafka, the relevant window bucket is updated in place. A Grafana dashboard querying SELECT * FROM error_rate_per_service WHERE window_end >= NOW() - INTERVAL '15 minutes' returns the latest per-service error rates with millisecond latency.

Step 3: Connect Grafana

Because RisingWave speaks the PostgreSQL wire protocol, you add it to Grafana as a standard PostgreSQL data source. Point it at the RisingWave host and port, select the materialized view as your table, and build time-series panels using the window_start or window_end columns as the time field. No plugin, no custom adapter.
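For example, a time-series panel for the error-rate view from Step 2 could use a query along these lines (the time-column alias follows Grafana's convention for SQL data sources):

```sql
-- Example Grafana panel query against the Step 2 materialized view.
SELECT
    window_start AS "time",
    service_name,
    error_rate_pct
FROM error_rate_per_service
WHERE window_start >= NOW() - INTERVAL '1 hour'
ORDER BY window_start;
```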

When does Elasticsearch still make sense?

Full-text search on raw log content is Elasticsearch's native strength. If your workflow requires searching for an arbitrary string across billions of historical log messages, the inverted index is the right data structure for that access pattern. Grep-style queries over raw text, fuzzy matching on message fields, and highlighting matched terms in context are capabilities that the Elasticsearch query engine handles well.

Ad-hoc historical investigation also favors Elasticsearch when it involves unstructured queries you could not have anticipated in advance. Materialized views are defined ahead of time; a question they were not designed to answer requires access to the raw event store. Many teams use both systems together: RisingWave handles continuous metrics and alerting, while a tiered cold store (S3 plus Elasticsearch or OpenSearch) handles historical forensics.

If your log analytics requirements center on operational dashboards, threshold-based alerting, and compliance counters over structured log fields, streaming SQL with RisingWave removes the indexing delay and reduces storage cost without requiring you to learn a custom query language.

Frequently asked questions

Does RisingWave replace the entire ELK stack?

It replaces the Elasticsearch component for aggregation and alerting workloads. Logstash or Fluent Bit can continue to write structured logs to Kafka, and RisingWave reads from Kafka directly. If you need raw log storage for forensic search, you can write the same Kafka topic to an object store in parallel and use Elasticsearch or OpenSearch only for that cold-path use case.

How does RisingWave handle late-arriving log events?

RisingWave supports watermark-based handling of late events. You declare a watermark on the event_time column to define how late an event can arrive and still be included in the correct window. Events that arrive after the watermark has already passed their window are excluded from that window's result.
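As a sketch, the Step 1 source could declare a watermark like this; the five-second lateness allowance is an illustrative choice, not a recommendation:

```sql
-- Same source as Step 1, with a watermark permitting events
-- up to 5 seconds behind the newest observed event_time.
CREATE SOURCE log_events (
    event_time   TIMESTAMPTZ,
    service_name VARCHAR,
    log_level    VARCHAR,
    message      VARCHAR,
    host         VARCHAR,
    trace_id     VARCHAR,
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
)
WITH (
    connector     = 'kafka',
    topic         = 'application-logs',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```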

What happens to the materialized views if RisingWave restarts?

RisingWave uses an exactly-once processing guarantee backed by checkpointed state. When the system restarts, it replays from the last checkpoint and resumes processing from the correct Kafka offset. Materialized view state is not lost on restart.

Can I query raw log events in RisingWave, not just aggregations?

Yes. You can define a table with Kafka as the backing source and retain raw events for a configurable retention period. You can query those raw events with SQL. However, this is row-oriented SQL over a table, not a full-text inverted index search, so it suits point lookups and filtered scans rather than fuzzy text search across billions of records.
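A minimal sketch: declaring CREATE TABLE instead of CREATE SOURCE persists the events inside RisingWave so they can be queried directly. The table name and the example trace ID below are hypothetical:

```sql
-- Sketch: a table (unlike a source) stores the raw events in RisingWave.
CREATE TABLE raw_log_events (
    event_time   TIMESTAMPTZ,
    service_name VARCHAR,
    log_level    VARCHAR,
    message      VARCHAR,
    trace_id     VARCHAR
)
WITH (
    connector     = 'kafka',
    topic         = 'application-logs',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Point lookup on a structured field, e.g. all events for one trace:
SELECT event_time, service_name, log_level, message
FROM raw_log_events
WHERE trace_id = 'abc-123';  -- hypothetical trace ID
```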


If your log analytics pipeline is hitting indexing latency limits or Elasticsearch storage costs are growing faster than the value you get from full-text search, streaming SQL is worth evaluating. RisingWave is open source under Apache 2.0. You can connect it to an existing Kafka topic, define a materialized view, and have a live error-rate dashboard running in under an hour.

Get started at risingwave.com or explore the documentation to see how streaming SQL fits your log analytics architecture.
