How to Stream Data from S3 to a Streaming Database

Amazon S3 buckets often hold large volumes of data in Parquet, CSV, and JSON files. RisingWave can use S3 as a source, ingesting files as they arrive so that you can run near-real-time analytics over data in object storage.

S3 Source in RisingWave

CREATE SOURCE s3_events (user_id INT, event VARCHAR, ts TIMESTAMP)
WITH (
  connector = 's3',
  s3.region_name = 'us-east-1',
  s3.bucket_name = 'my-data-bucket',
  -- Glob pattern selecting which object keys to ingest
  match_pattern = 'events/*.json',
  -- Placeholder credentials; replace with your own
  s3.credentials.access = 'your-key',
  s3.credentials.secret = 'your-secret'
) FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW s3_analytics AS
SELECT event, COUNT(*) AS cnt, MAX(ts) AS latest
FROM s3_events
GROUP BY event;

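RisingWave periodically checks the bucket for new files that match the pattern and ingests them as they appear. The materialized view is updated incrementally as the new rows arrive.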
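
To check the results, the materialized view can be queried like an ordinary table from any PostgreSQL-compatible client. The following is a minimal sketch against the s3_analytics view defined above; the ORDER BY and LIMIT clauses are illustrative choices, not part of the original example.

```sql
-- Read the current per-event aggregates from the materialized view.
-- ORDER BY and LIMIT are illustrative; adjust or drop them as needed.
SELECT event, cnt, latest
FROM s3_analytics
ORDER BY cnt DESC
LIMIT 10;
```

Re-running the query after new files land under events/ should show updated counts once RisingWave has picked them up.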

Frequently Asked Questions

Can RisingWave process Parquet files from S3?

Yes. RisingWave supports JSON, CSV, and Parquet formats from S3 sources.
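
As a rough sketch, a Parquet-backed source could look like the following. The source name s3_events_parquet and the events/*.parquet pattern are hypothetical, the column names and types are assumed to match the schema of the Parquet files, and the exact parameter names and ENCODE PARQUET support should be checked against the documentation for the RisingWave version in use.

```sql
-- Hypothetical Parquet variant of the JSON source shown earlier.
-- Assumption: the declared columns match the Parquet file schema.
CREATE SOURCE s3_events_parquet (user_id INT, event VARCHAR, ts TIMESTAMP)
WITH (
  connector = 's3',
  s3.region_name = 'us-east-1',
  s3.bucket_name = 'my-data-bucket',
  match_pattern = 'events/*.parquet',
  -- Placeholder credentials; replace with your own
  s3.credentials.access = 'your-key',
  s3.credentials.secret = 'your-secret'
) FORMAT PLAIN ENCODE PARQUET;
```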

Is S3 ingestion real-time?

Near-real-time. RisingWave polls for new files at a configurable interval, so a newly uploaded file is ingested only after the next poll discovers it. For true real-time ingestion, use a Kafka or CDC source instead.
