RisingWave vs Amazon Kinesis Data Analytics for SQL Workloads

If your team runs streaming workloads on AWS, you have probably evaluated Amazon Kinesis Data Analytics. The managed service removes cluster administration, integrates natively with the AWS data stack, and now runs Apache Flink under the hood through Managed Service for Apache Flink. On paper it is an attractive option.

But AWS integration comes with trade-offs that matter at scale: vendor lock-in to Kinesis as your primary source, Flink's Java-first complexity surfacing through thin SQL wrappers, pricing that scales steeply with Kinesis Processing Units, and the inability to run the same workload outside of AWS without a full rewrite.

RisingWave is a streaming database built in Rust with a PostgreSQL-compatible SQL interface. It runs on AWS but is not exclusive to AWS. It reads from Kafka, Kinesis, Pulsar, CDC sources, and S3. It stores query results in materialized views you can query directly with psql or any PostgreSQL client. This article compares the two systems across the dimensions that matter most for data engineering teams building production streaming pipelines.

What Is Amazon Kinesis Data Analytics?

Amazon Kinesis Data Analytics has gone through two major iterations. The original version offered a built-in SQL engine for simple streaming queries over Kinesis streams. In 2019, AWS added support for Apache Flink applications. In 2023, AWS rebranded the service as Managed Service for Apache Flink to reflect its primary runtime.

The service manages Flink JobManagers and TaskManagers for you, handles Flink application packaging and deployment, provides autoscaling through Kinesis Processing Units (KPUs), and integrates with AWS services including Kinesis Data Streams, S3, DynamoDB, Lambda, and MSK.

What it does not manage is the complexity of Apache Flink's programming model. Anything beyond the simplest aggregations means writing a Flink application in Java, Scala, or Python using the Table API, the DataStream API, or Flink SQL submitted through an external client. Each of these requires familiarity with Flink's connector configuration, watermark strategies, state backends, and deployment lifecycle.

What Is RisingWave?

RisingWave is a streaming database that combines stream processing and serving in a single system. You write SQL to define streaming pipelines. Results are continuously maintained in materialized views. Queries against those views return fresh, low-latency results over a standard PostgreSQL connection.

The key architectural properties:

  • Compute-storage separation: Compute nodes process streams. State is persisted to S3-compatible object storage (Hummock, RisingWave's LSM-tree storage engine). The two scale independently.
  • PostgreSQL wire protocol: Connect with psql, JDBC, Python psycopg2, Go pgx, or any tool that speaks PostgreSQL. No SDK, no custom client, no JVM required.
  • Multi-source ingestion: Kafka, Kinesis, Pulsar, PostgreSQL CDC, MySQL CDC, MongoDB CDC, S3, Iceberg, and more. No vendor dependency on any single broker.
  • Apache 2.0 license: Self-host on any cloud, on-premises, or on Kubernetes without restriction.

RisingWave Cloud is a fully managed deployment available on AWS, GCP, and Azure. The same codebase runs self-hosted on EKS or EC2, giving AWS teams full portability.

SQL Interface Compared

SQL is where the day-to-day experience diverges most sharply.

Flink SQL in Managed Flink

Managed Service for Apache Flink supports Flink SQL, but the workflow is not as seamless as in a native SQL database. Flink SQL must be embedded inside a JAR application, written in a Studio notebook environment (built on Apache Zeppelin), or submitted through the Flink REST API. There is no built-in SQL prompt connected directly to your streaming pipeline.

Flink SQL supports:

  • Windowed aggregations (TUMBLE, HOP, and SESSION via table-valued function syntax)
  • Stream-table joins and stream-stream joins with watermarks
  • Pattern matching with MATCH_RECOGNIZE
  • Limited subquery support

What it lacks:

  • A native PostgreSQL wire-protocol interface
  • Direct SELECT queries against incrementally maintained views from any client
  • Immediate familiarity for engineers who already know standard PostgreSQL

Connector configuration in Flink SQL uses WITH clause properties, but getting the right Kinesis connector version, the right AWS credentials, and the right serialization format (JSON vs Avro vs Protobuf) requires consulting Flink connector documentation carefully. Schema evolution adds another layer of complexity.
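For illustration, here is a representative Flink SQL table definition over a Kinesis stream. The option keys follow the Apache Flink Kinesis connector documentation, but exact names vary across connector versions, so treat this as a sketch rather than a copy-paste config:

```sql
-- Flink SQL: a Kinesis-backed table with an explicit watermark strategy.
-- Option keys follow the Flink Kinesis connector; verify them against the
-- connector version bundled with your Managed Flink runtime.
CREATE TABLE orders (
    order_id  BIGINT,
    amount    DOUBLE,
    region    STRING,
    order_ts  TIMESTAMP(3),
    WATERMARK FOR order_ts AS order_ts - INTERVAL '5' SECOND
) WITH (
    'connector'           = 'kinesis',
    'stream'              = 'orders-stream',
    'aws.region'          = 'us-east-1',
    'scan.stream.initpos' = 'LATEST',
    'format'              = 'json'
);
```

Note that the watermark strategy, which RisingWave infers for most workloads, must be declared explicitly here; getting it wrong silently drops or delays late events.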

RisingWave SQL

RisingWave implements PostgreSQL-compatible SQL end-to-end. The same SQL patterns you use in PostgreSQL work in RisingWave: CTEs, subqueries, window functions, aggregate functions, joins, and more. Engineers familiar with any SQL database can be productive immediately.

Here is how a complete streaming pipeline looks in RisingWave SQL, ingesting from Kinesis and computing revenue by region in a tumbling window:

-- Step 1: Define a source backed by Amazon Kinesis
CREATE SOURCE kda_orders_source (
    order_id    BIGINT,
    user_id     INT,
    product_id  INT,
    amount      DOUBLE PRECISION,
    region      VARCHAR,
    order_ts    TIMESTAMPTZ
)
WITH (
    connector            = 'kinesis',
    stream.name          = 'orders-stream',
    stream.region        = 'us-east-1',
    scan.startup.mode    = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Step 2: Create a continuously maintained materialized view
CREATE MATERIALIZED VIEW kda_revenue_by_region AS
SELECT
    region,
    window_start,
    window_end,
    COUNT(*)       AS order_count,
    SUM(amount)    AS total_revenue,
    AVG(amount)    AS avg_order_value
FROM TUMBLE(kda_orders_source, order_ts, INTERVAL '1 HOUR')
GROUP BY region, window_start, window_end;

-- Step 3: Query results with any PostgreSQL client
SELECT * FROM kda_revenue_by_region
WHERE region = 'us-east-1'
ORDER BY window_start DESC
LIMIT 5;

That is the complete pipeline. No JAR file. No Flink cluster configuration. No JVM version management. The CREATE MATERIALIZED VIEW statement defines the streaming computation, and RisingWave maintains the result incrementally as new events arrive from Kinesis.

Kafka vs Kinesis as a Source

This is one of the most important practical differences for AWS data engineers.

Managed Service for Apache Flink is engineered to work best with Kinesis Data Streams. The service's native autoscaling is based on KPUs tied to Kinesis shard count. While it can technically connect to Kafka through Amazon MSK using the Flink Kafka connector, the operational experience is noticeably smoother with Kinesis.

This creates a gravitational pull toward Kinesis as your primary message broker. If your team uses Apache Kafka, Redpanda, or Confluent Cloud, you either migrate to Kinesis or accept a more complex Flink setup that requires custom connector configuration, IAM role management for cross-account MSK access, and potentially a VPC peering arrangement.

RisingWave: Broker-Agnostic by Design

RisingWave has no preferred source. It connects to Kafka, Kinesis, Pulsar, Redpanda, and other brokers using the same SQL syntax pattern:

-- Kafka source
CREATE SOURCE kda_events_kafka (
    event_id  BIGINT,
    user_id   INT,
    event_ts  TIMESTAMPTZ,
    payload   VARCHAR
)
WITH (
    connector                    = 'kafka',
    topic                        = 'app-events',
    properties.bootstrap.server  = 'broker1:9092,broker2:9092',
    scan.startup.mode            = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Kinesis source (same structure, different connector)
CREATE SOURCE kda_events_kinesis (
    event_id  BIGINT,
    user_id   INT,
    event_ts  TIMESTAMPTZ,
    payload   VARCHAR
)
WITH (
    connector         = 'kinesis',
    stream.name       = 'app-events',
    stream.region     = 'us-east-1',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

The materialized view definition is identical regardless of which connector you use. If your team migrates from Kinesis to Kafka (or the reverse), you update the CREATE SOURCE statement. The rest of your SQL pipeline stays the same. This portability is not possible when your streaming compute is tightly coupled to your message broker.

Stream-Table Joins and Multi-Source Enrichment

One of the most common streaming SQL patterns is enriching events with dimension data. For example, joining an orders stream against a product reference table to add product names and categories to each order record.

In Managed Flink, this requires careful coordination: the reference table must be available as a Flink LookupJoin source (typically backed by DynamoDB, JDBC, or HBase), and watermarks must be configured to handle late arrivals. The operational overhead of keeping reference data fresh and coordinated across the Flink job adds significant complexity.
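In Flink SQL, this pattern takes the lookup-join form, which requires a processing-time attribute on the stream and a lookup-capable connector behind the dimension table. A sketch, assuming a JDBC-backed `products` table and a `proc_time` attribute declared on `orders`:

```sql
-- Flink SQL lookup join sketch: 'products' must be backed by a
-- lookup-capable connector (e.g. JDBC), and 'proc_time' is a
-- processing-time attribute declared on the orders table.
SELECT
    o.order_id,
    o.amount,
    p.name,
    p.category
FROM orders AS o
JOIN products FOR SYSTEM_TIME AS OF o.proc_time AS p
    ON o.product_id = p.product_id;
```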

RisingWave handles stream-table joins natively with standard SQL:

-- Reference table loaded from an external source or direct inserts
CREATE TABLE kda_products (
    product_id  INT PRIMARY KEY,
    name        VARCHAR,
    category    VARCHAR,
    price       DOUBLE PRECISION
);

-- Continuously maintained join: orders enriched with product details
CREATE MATERIALIZED VIEW kda_enriched_orders AS
SELECT
    o.order_id,
    o.user_id,
    o.region,
    o.amount,
    o.order_ts,
    p.name        AS product_name,
    p.category    AS product_category
FROM kda_orders_source o
JOIN kda_products p ON o.product_id = p.product_id;

RisingWave maintains this join incrementally. When a new order arrives, the join executes immediately against the current product table state. When the product table is updated (price change, new category), downstream materialized views that depend on it are also updated incrementally.
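You can see this in practice by mutating the reference table over an ordinary PostgreSQL connection; the product and the new category below are hypothetical values for illustration:

```sql
-- Update dimension data with ordinary DML; materialized views that join
-- against kda_products are revised incrementally.
UPDATE kda_products
SET category = 'electronics-clearance'  -- hypothetical new category
WHERE product_id = 42;                  -- hypothetical product

-- The next read reflects the change, for new and existing joined rows.
SELECT product_name, product_category
FROM kda_enriched_orders
LIMIT 5;
```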

This is the pattern covered in more depth in stream-table joins explained, and it is one of the areas where a native streaming database delivers the most tangible engineering advantage over a processing framework layered on top of a message broker.

Windowing: SQL Syntax Compared

Window aggregations are the core of most streaming analytics. Here is how the syntax compares for a 5-minute tumbling window:

Flink SQL (Managed Flink):

SELECT
    region,
    window_start,
    window_end,
    SUM(amount) AS revenue
FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR(order_ts), INTERVAL '5' MINUTES)
)
GROUP BY region, window_start, window_end;

RisingWave SQL:

CREATE MATERIALIZED VIEW kda_orders_5min AS
SELECT
    region,
    window_start,
    window_end,
    SUM(amount) AS revenue
FROM TUMBLE(kda_orders_source, order_ts, INTERVAL '5 MINUTES')
GROUP BY region, window_start, window_end;

Both use similar windowing syntax, but RisingWave wraps the result in a CREATE MATERIALIZED VIEW that persists the computation continuously. In Flink SQL through Managed Flink, the result must be written to a sink (S3, DynamoDB, Kinesis) before it can be consumed. In RisingWave, you query the materialized view directly.
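For contrast, getting the Flink SQL result out of Managed Flink means declaring a sink table and a continuous INSERT. The connector options below are illustrative, not a verified configuration:

```sql
-- Flink SQL: results leave the job through a sink table. Option keys are
-- illustrative; consult the Flink connector docs for your target system.
CREATE TABLE revenue_sink (
    region        STRING,
    window_start  TIMESTAMP(3),
    window_end    TIMESTAMP(3),
    revenue       DOUBLE
) WITH (
    'connector'  = 'kinesis',
    'stream'     = 'revenue-out',
    'aws.region' = 'us-east-1',
    'format'     = 'json'
);

INSERT INTO revenue_sink
SELECT region, window_start, window_end, SUM(amount) AS revenue
FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR(order_ts), INTERVAL '5' MINUTES)
)
GROUP BY region, window_start, window_end;
```

A downstream consumer then still has to read `revenue-out` and land the results somewhere queryable.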

For top-N patterns, which are common in leaderboards and ranking workloads, RisingWave SQL is straightforward:

CREATE MATERIALIZED VIEW kda_top_products_per_region AS
WITH product_revenue AS (
    SELECT
        region,
        product_id,
        SUM(amount)  AS total_revenue,
        COUNT(*)     AS order_count
    FROM kda_orders_source
    GROUP BY region, product_id
),
ranked AS (
    SELECT
        region,
        product_id,
        total_revenue,
        order_count,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS rank
    FROM product_revenue
)
SELECT region, product_id, total_revenue, order_count, rank
FROM ranked
WHERE rank <= 3;

This view is maintained incrementally. When a new order arrives, RisingWave updates only the affected rows in the ranking. The SELECT against this view always returns the current top-3 products per region without scanning the full event history.
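Serving the ranking is then an ordinary query from any PostgreSQL client:

```sql
-- Current top-3 products for one region, served straight from the view.
SELECT product_id, total_revenue, rank
FROM kda_top_products_per_region
WHERE region = 'us-east-1'
ORDER BY rank;
```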

Open Source vs AWS Lock-In

This is often the deciding dimension for mature engineering teams.

Managed Service for Apache Flink is an AWS-only service. Your application lifecycle (create, start, stop, update, scale) is managed through the AWS Console, CloudFormation, CDK, or the AWS CLI. Application state checkpoints are stored in S3 but in an AWS-proprietary format that cannot be restored to a self-hosted Flink cluster without additional work.

If your organization needs to:

  • Run the same workload in GCP or Azure for disaster recovery
  • Test locally without AWS credentials
  • Avoid licensing or support dependencies on a cloud provider
  • Operate in a regulated environment where cloud vendor lock-in is explicitly prohibited

...then Managed Flink creates friction that compounds over time.

RisingWave: Apache 2.0, Cloud-Agnostic

RisingWave is fully open source under Apache 2.0. Every feature available in RisingWave Cloud is also available in the open-source self-hosted version with no feature gates, no usage caps, and no license key.

You can run RisingWave:

  • On AWS EKS or EC2 (self-managed)
  • On RisingWave Cloud (AWS, GCP, or Azure)
  • On-premises on bare metal or VMware
  • Locally via docker run for development and testing
  • On any Kubernetes distribution via the RisingWave Helm chart

The same SQL, the same connector configurations, and the same operational tooling work in every deployment mode. This flexibility is particularly valuable for teams that build on AWS today but want to preserve multi-cloud optionality, or for teams in industries where vendor dependency is a compliance concern.

PostgreSQL Compatibility: A Practical Advantage

RisingWave's PostgreSQL compatibility is not just a marketing claim. It has concrete implications for how you integrate streaming results into your stack.

BI and dashboarding tools: Grafana, Metabase, Tableau, Superset, and Redash all support PostgreSQL connections natively. You can point them directly at RisingWave's materialized views and get live-updating dashboards without any intermediate serving layer.

Application backends: Any application that uses a PostgreSQL ORM (SQLAlchemy, ActiveRecord, Prisma, GORM) can query RisingWave directly. Real-time leaderboards, live inventory counts, and fraud alerts can be served directly from materialized views rather than from a separate Redis or DynamoDB layer.

Standard tooling: psql, pg_dump for schema export, pgAdmin, DBeaver, and DataGrip all connect to RisingWave without additional configuration.

With Managed Flink, results must be written to a sink before they can be consumed: S3, DynamoDB, Kinesis, RDS, or OpenSearch. This adds latency (the time to write to the sink plus the time for the consuming service to poll), operational complexity (managing and monitoring the sink), and cost (additional read/write operations against the sink service).

For a deeper look at how RisingWave compares to other streaming SQL tools in the PostgreSQL compatibility dimension, see the RisingWave vs ksqlDB comparison.

Pricing Comparison

Pricing models differ significantly, and the cost profile changes depending on your workload size.

Managed Flink charges per Kinesis Processing Unit (KPU). One KPU is 1 vCPU and 4 GB of memory. Pricing (us-east-1 as of April 2026):

  • KPU on-demand: $0.11 per KPU-hour
  • Durable application state (RocksDB backed): $0.10 per GB-month
  • Data transfer: standard AWS rates

A typical production streaming job with 4 KPUs (4 vCPUs, 16 GB RAM) costs roughly $0.44/hour, or $316/month, plus state storage and data transfer. Autoscaling can spike this cost unpredictably during traffic surges.

Additionally, if your data flows through Kinesis Data Streams as the source, you pay for:

  • Shard hours: $0.015 per shard-hour
  • PUT payload units: $0.014 per million units
  • Extended data retention (beyond 24 hours): $0.023 per shard-hour

For a 10-shard Kinesis stream with 30-day retention, that is roughly $180/month before any compute costs.

RisingWave Pricing

Self-hosted (open source): You pay only for your infrastructure. On AWS, a 3-node RisingWave cluster using r6i.xlarge instances (4 vCPU, 32 GB RAM each) costs approximately $540/month. State storage is in S3 at $0.023/GB/month, which is negligible compared to compute. There is no per-record or per-shard charge.

RisingWave Cloud (managed): Priced per RisingWave Unit (RWU). Starts at approximately $0.227 per RWU/hour, with a free tier for evaluation. Annual commitments provide volume discounts.

Source flexibility savings: Because RisingWave can read directly from Kafka (including Amazon MSK), you can avoid Kinesis Data Streams costs entirely if you are not already committed to Kinesis. MSK pricing starts at $0.21/hour for a single-broker cluster with no per-record fees.

Cost Scenario: 100K Events/Second, 24/7

| Cost Factor | Managed Flink + Kinesis | RisingWave Self-Hosted on AWS |
|---|---|---|
| Compute | ~$316/month (4 KPUs) | ~$540/month (3x r6i.xlarge) |
| State storage | ~$50/month (500 GB RocksDB) | ~$12/month (500 GB S3) |
| Message broker | ~$180/month (Kinesis 10 shards) | $0 (using existing Kafka/MSK) |
| Serving layer | ~$100/month (DynamoDB or ElastiCache) | $0 (query MVs directly) |
| Monthly total | ~$646/month | ~$552/month |
| Annual total | ~$7,750/year | ~$6,624/year |

At larger scale, the gap widens. Managed Flink KPU costs scale linearly with throughput. RisingWave's compute-storage separation means you can scale compute and storage independently, and object storage costs stay low regardless of state size. Additionally, as workloads grow beyond what self-hosting comfortably handles, RisingWave Cloud's pricing remains predictable rather than scaling with per-record or per-shard charges.

For a broader pricing comparison across streaming platforms, see streaming database pricing comparison.

Feature Comparison Table

| Dimension | RisingWave | Managed Service for Apache Flink |
|---|---|---|
| SQL interface | PostgreSQL-compatible, native prompt | Flink SQL via notebook or REST API |
| SQL client compatibility | Any PostgreSQL client (psql, JDBC, BI tools) | Flink REST API, Kinesis Studio notebook |
| Serving layer required | No (query MVs directly) | Yes (sink to S3, DynamoDB, etc.) |
| Kafka source | Native connector | Flink Kafka connector (via MSK or self-managed) |
| Kinesis source | Native connector | Native (primary optimized source) |
| Pulsar source | Native connector | Via community Flink connector |
| PostgreSQL CDC source | Native connector | Via Debezium Flink connector |
| State backend | Hummock (LSM on S3) | RocksDB on EBS or local SSD |
| State scalability | Unlimited (S3) | Bounded by EBS volume / local disk |
| License | Apache 2.0 | AWS proprietary managed service |
| Self-hosted option | Yes (any cloud, on-prem, Kubernetes) | No (AWS only) |
| Multi-cloud | Yes (AWS, GCP, Azure, on-prem) | No |
| Programming language required | SQL only | Java/Scala/Python for non-trivial jobs |
| PostgreSQL ecosystem tools | Full compatibility | Not applicable |
| Managed offering | RisingWave Cloud (AWS, GCP, Azure) | Managed Service for Apache Flink (AWS only) |
| Pricing model | Per RWU/hour or infrastructure cost | Per KPU/hour + shard/hour (Kinesis) |
| Autoscaling | Elastic (RisingWave Cloud) | KPU-based (Managed Flink) |
| Open source | Yes (github.com/risingwavelabs/risingwave) | No |

Running RisingWave on AWS Without Lock-In

For AWS data engineering teams, RisingWave provides a path to full streaming SQL capability without surrendering portability.

Deployment options on AWS:

  1. RisingWave Cloud on AWS: Fully managed, single-click deployment. RisingWave handles cluster management, scaling, and upgrades. Your data stays in your AWS account's S3 bucket.

  2. Self-hosted on EKS: Deploy via the RisingWave Helm chart. RisingWave nodes run in your VPC. State goes to your S3 bucket. You control the network, IAM, and security policies.

  3. Self-hosted on EC2: Use the RisingWave binary or Docker image directly on EC2 instances. Suitable for teams that prefer to manage their own VM infrastructure.

In all three modes, RisingWave connects to Amazon MSK (Kafka-compatible), Amazon Kinesis, Amazon S3, Aurora PostgreSQL via CDC, and Amazon RDS for MySQL via CDC. IAM-based authentication is supported for Kinesis and S3 access.
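As a sketch, IAM role-based access to a Kinesis source can be expressed directly in the source definition. The `aws.credentials.role.arn` property name is taken from RisingWave's Kinesis connector options; verify it against the documentation for your version, and note that the role ARN below is a placeholder:

```sql
-- Sketch: Kinesis source authenticated by assuming an IAM role.
-- Property names follow RisingWave's Kinesis connector options; the
-- account ID and role name are placeholders.
CREATE SOURCE kda_orders_iam (
    order_id  BIGINT,
    amount    DOUBLE PRECISION,
    order_ts  TIMESTAMPTZ
)
WITH (
    connector                = 'kinesis',
    stream.name              = 'orders-stream',
    stream.region            = 'us-east-1',
    aws.credentials.role.arn = 'arn:aws:iam::123456789012:role/risingwave-kinesis-read'
)
FORMAT PLAIN ENCODE JSON;
```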

This architecture is explored in more depth in the streaming database Kubernetes deployment guide.

When to Choose Each System

Choose Managed Service for Apache Flink if:

  • Your team has existing Flink expertise (Java/Scala) and Flink-specific requirements (MATCH_RECOGNIZE, Flink-native connectors)
  • Your workload is tightly integrated with Kinesis Data Streams and you are committed to the AWS ecosystem long-term
  • You need features specific to Flink's DataStream API that have no SQL equivalent
  • You already use other AWS managed services (Glue, EMR) and want a unified operational model on AWS

Choose RisingWave if:

  • Your team wants to write and maintain streaming pipelines in SQL without learning Flink's Java API
  • You need PostgreSQL compatibility to integrate with BI tools, application backends, or analytics dashboards directly
  • You consume from Kafka, Kinesis, or multiple broker types and want broker-agnostic streaming SQL
  • You need to run the same workload across multiple clouds or on-premises without architectural changes
  • You want to avoid a separate serving layer (DynamoDB, Redis, ElastiCache) by querying materialized views directly
  • Cost efficiency at scale matters and you want to take advantage of S3-based state storage

Frequently Asked Questions

Can RisingWave read from Amazon Kinesis Data Streams?

Yes. RisingWave has a native Kinesis connector. You write a CREATE SOURCE statement specifying connector = 'kinesis', the stream name, and the AWS region. RisingWave uses standard AWS credential chains (IAM roles, environment variables, or credential files) for authentication. This means you can use the same IAM role-based access control that governs other AWS services in your account. The connection requires no additional proxy or bridge service.

Does RisingWave require Kafka to work on AWS?

No. RisingWave is broker-agnostic. It connects to Kafka (including Amazon MSK), Amazon Kinesis, Apache Pulsar, Redpanda, and other message brokers independently. You can use RisingWave entirely without Kafka if your data flows through Kinesis, or without Kinesis if your data flows through Kafka. The SQL pipeline definition is identical regardless of which broker backs the source.

What happens to streaming state if a RisingWave node fails on AWS?

RisingWave uses a checkpoint-based recovery mechanism inspired by the Chandy-Lamport algorithm. The Meta Node periodically injects barrier markers into the data stream. When barriers propagate through all operators, each compute node snapshots its state to Hummock, which writes to your S3 bucket. If a compute node fails, RisingWave reschedules that node's tasks to healthy compute nodes and restores state from the last checkpoint in S3. Recovery typically completes in seconds. With the default one-second checkpoint interval, the recovery point objective (RPO) is approximately one second of event data. Unlike Managed Flink's RocksDB-backed state on EBS, RisingWave's state on S3 benefits from S3's 11-nines durability without requiring EBS snapshot management.

Is RisingWave production-ready on AWS in 2026?

Yes. RisingWave is deployed in production at companies across financial services, e-commerce, IoT, and real-time analytics. RisingWave Cloud (the managed offering) runs on AWS and is covered by an enterprise SLA. The self-hosted deployment on EKS is documented with production-grade Helm charts covering high availability, autoscaling, monitoring with Prometheus/Grafana, and backup configuration. Version 2.8 (current as of April 2026) is stable and actively maintained. See companies using streaming databases in production for real-world deployment examples.

Conclusion

Amazon Kinesis Data Analytics (Managed Service for Apache Flink) is a capable streaming compute service for teams invested in the AWS ecosystem and willing to work with Flink's Java-centric programming model. Its tight integration with Kinesis and other AWS services reduces operational overhead within that ecosystem.

RisingWave offers a different trade-off: a streaming SQL database that works on AWS without requiring AWS exclusivity. SQL is the primary interface. PostgreSQL compatibility means your existing tools connect without modification. The Kinesis connector gives you direct access to your existing Kinesis streams. State is stored in S3, so durability is automatic and costs are low. And because RisingWave is Apache 2.0 open source, you are never locked into a single cloud provider or a single managed service.

For AWS data engineering teams that want streaming SQL without the complexity of Flink applications, and without surrendering portability, RisingWave is worth a direct evaluation against your current workload.

Try RisingWave Cloud free at cloud.risingwave.com or pull the Docker image and run RisingWave locally in minutes.

Join the RisingWave Slack community to connect with engineers using RisingWave on AWS and get answers from the core team.
