
Sink

In stream processing, a Sink is a component that represents the destination for the results of a streaming computation. After data has been ingested (from a Source), processed, transformed, and analyzed by the stream processing engine (like RisingWave), the final output or the continuous stream of changes needs to be sent to an external system for storage, further analysis, or to trigger downstream actions. This external system is the Sink.

In RisingWave, Sinks are primarily used to export data from Materialized Views or Tables to other systems.

Core Purpose

  • Data Egress: The primary function of a Sink is to enable data to flow out of the stream processing system.
  • Integration with External Systems: Sinks allow the stream processor to connect and write data to a wide variety of other technologies, such as:
    • Message Queues / Event Streaming Platforms (e.g., Apache Kafka, Apache Pulsar, Redpanda)
    • Databases (Relational or NoSQL)
    • Data Warehouses or Data Lakes (e.g., Apache Iceberg tables, Snowflake, BigQuery)
    • Monitoring Systems
    • Custom applications or APIs
  • Operationalizing Insights: By sending processed data to operational systems, Sinks help in making real-time insights actionable (e.g., updating a dashboard, sending alerts, feeding a recommendation engine).

Key Characteristics and Operations

  • Connectors: Sinks are implemented using connectors specific to the target technology. The connector handles the protocol, authentication, data formatting, and write semantics required by the external system.
  • Data Formatting: The Sink is responsible for serializing the data from the stream processor's internal format into the format expected by the external system (e.g., JSON, Avro, Parquet).
  • Delivery Guarantees: The Sink, in conjunction with the stream processor, plays a role in providing end-to-end processing guarantees (e.g., at-least-once, exactly-once semantics). For exactly-once sinks, this often involves transactional writes or idempotent operations.
  • Change Data Capture (CDC) Output: For sinks that consume changes from a materialized view (which is common in RisingWave), the sink often outputs a stream of change events (inserts, updates, deletes) rather than just the current state. This is useful for replicating changes or feeding systems that can process CDC data.
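
As a sketch of this, RisingWave can emit changes in a Debezium-compatible envelope, where each outgoing message carries before/after row images and an operation code that downstream consumers use to replay inserts, updates, and deletes. The materialized view, topic, and broker names below are placeholders:

```sql
-- Export the change stream of a materialized view as Debezium-style
-- JSON change events (names here are hypothetical).
CREATE SINK orders_changes_sink
FROM orders_mv
WITH (
    connector = 'kafka',
    topic = 'orders_changes',
    properties.bootstrap.server = 'kafka_broker1:9092'
)
FORMAT DEBEZIUM ENCODE JSON;
```

Systems that already understand Debezium's CDC format (e.g., Kafka Connect pipelines) can then consume this topic without custom decoding logic.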

Sinks in RisingWave

In RisingWave, you define a Sink using the CREATE SINK SQL statement. This statement specifies:

  • The source table or materialized view within RisingWave whose data (or changes) will be exported.
  • The connector for the target external system (e.g., kafka, iceberg, jdbc).
  • Connection details for the external system (e.g., broker addresses, database URL, credentials).
  • Data format and encoding for the output (e.g., FORMAT PLAIN ENCODE JSON, FORMAT UPSERT ENCODE AVRO).
  • Options related to delivery guarantees or specific connector behavior.

Example (Conceptual CREATE SINK):

CREATE SINK my_kafka_sink
FROM my_materialized_view
WITH (
    connector = 'kafka',
    topic = 'processed_orders',
    properties.bootstrap.server = 'kafka_broker1:9092'
)
FORMAT UPSERT ENCODE JSON; -- send changes as upserts keyed by the primary key

Once a Sink is created, RisingWave continuously sends the changes from the specified materialized view or table to the configured external system, effectively making the real-time results generated within RisingWave available to the broader data ecosystem.
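
Because a sink runs continuously as a streaming job, it can be inspected and removed with ordinary SQL statements. A minimal sketch, assuming the sink name from the example above:

```sql
-- List the sinks defined in the current database
SHOW SINKS;

-- Stop exporting and remove the sink definition
DROP SINK my_kafka_sink;
```

Dropping a sink stops data delivery to the external system but does not affect the underlying materialized view or table.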
