A serialization format (often called a "wire format") defines the rules and conventions for converting structured data (such as objects, records, or messages) into a sequence of bytes for storage or network transmission (serialization), and for converting those bytes back into usable structured data (deserialization).
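A minimal sketch of that round trip, using Python's standard `json` module as the wire format. The `Order` record and its fields are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Order:
    # Hypothetical record used only for illustration.
    order_id: int
    customer: str
    amount: float


def serialize(order: Order) -> bytes:
    # Structured object -> JSON text -> UTF-8 bytes ready to store or send.
    return json.dumps(asdict(order)).encode("utf-8")


def deserialize(payload: bytes) -> Order:
    # UTF-8 bytes -> JSON text -> dict -> structured object.
    return Order(**json.loads(payload.decode("utf-8")))


original = Order(order_id=42, customer="alice", amount=19.99)
wire = serialize(original)
print(wire)  # b'{"order_id": 42, "customer": "alice", "amount": 19.99}'
assert deserialize(wire) == original
```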
In data streaming and distributed systems, the choice of serialization format matters because it affects encoding and decoding speed, message size, how safely schemas can evolve over time, and interoperability between services written in different programming languages.
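To make the data-size point concrete, the sketch below encodes the same hypothetical sensor reading as JSON and as a fixed binary layout built with Python's `struct` module. The binary layout is a hand-rolled stand-in chosen for illustration, not Avro or Protobuf; it is smaller because field names and types live in a layout both sides agree on rather than in every message.

```python
import json
import struct

# Hypothetical reading: sensor id, timestamp in milliseconds, temperature.
reading = {"sensor_id": 7, "timestamp_ms": 1700000000000, "temperature": 21.5}

# Text encoding: field names are repeated in every message.
json_bytes = json.dumps(reading).encode("utf-8")

# Binary encoding: the "schema" is this format string, known to both sides.
# "<" = little-endian with no padding; I = uint32, q = int64, d = float64.
LAYOUT = struct.Struct("<Iqd")
binary_bytes = LAYOUT.pack(
    reading["sensor_id"], reading["timestamp_ms"], reading["temperature"]
)

print(len(json_bytes), len(binary_bytes))

# Decoding the binary form only works if the reader uses the same layout.
sensor_id, ts, temp = LAYOUT.unpack(binary_bytes)
assert (sensor_id, ts, temp) == (7, 1700000000000, 21.5)
```

The trade-off is that the binary payload is unreadable without the layout, which is why compact binary formats are usually paired with some form of schema management.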
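JSON (JavaScript Object Notation): A human-readable, text-based format built from key-value objects and arrays. It is supported by virtually every language and needs no predefined schema, but it is verbose and slower to parse than binary formats.
Apache Avro: A compact, row-oriented binary format whose records are written and read according to a schema defined in JSON. It has well-defined rules for schema evolution, and in Kafka deployments its schemas are commonly stored in a schema registry.
Protobuf (Protocol Buffers): A compact binary format developed by Google. Message structures are defined in .proto files and compiled into language-specific code; numbered fields let schemas evolve while staying compatible with older readers and writers, provided field numbers are not reused.
CSV (Comma-Separated Values): A simple text format in which each line is a record and fields are separated by a delimiter, usually a comma. It carries no type information or nesting, so consumers must know the column order and types in advance.
Plain Text / Bytes: The payload is treated as an opaque string or byte array with no imposed structure, leaving interpretation entirely to the consumer.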
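Avro's schema evolution works by resolving the writer's schema against the reader's: a field the reader expects but the writer did not write is filled from the reader's default, and a field the writer wrote but the reader does not know is ignored. The sketch below imitates only those two rules on plain Python dicts; it is a simplified illustration with hypothetical field names, not the Avro library or its full resolution algorithm.

```python
from typing import Any

# Sentinel meaning "this reader field has no default".
NO_DEFAULT = object()

# Hypothetical reader schema (v2): field name -> default value.
READER_FIELDS_V2: dict[str, Any] = {
    "user_id": NO_DEFAULT,
    "email": NO_DEFAULT,
    "country": "unknown",  # added in v2, so it needs a default
}


def resolve(written: dict[str, Any], reader_fields: dict[str, Any]) -> dict[str, Any]:
    """Project a record written with another schema version onto the reader's fields."""
    result = {}
    for name, default in reader_fields.items():
        if name in written:
            result[name] = written[name]
        elif default is not NO_DEFAULT:
            result[name] = default  # missing from writer: use the reader's default
        else:
            raise ValueError(f"field {name!r} missing and has no default")
    # Writer-only fields (absent from reader_fields) are dropped.
    return result


# A v1 record lacks "country" but carries a field the reader no longer uses.
v1_record = {"user_id": 1, "email": "a@example.com", "legacy_flag": True}
print(resolve(v1_record, READER_FIELDS_V2))
# {'user_id': 1, 'email': 'a@example.com', 'country': 'unknown'}
```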
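The lack of type information in CSV is easy to see with Python's standard `csv` module: every field comes back as a string, so the consumer must apply column types itself, whereas parsed JSON keeps numbers as numbers. The column names and the `COLUMN_TYPES` list below are hypothetical.

```python
import csv
import io
import json

payload = "7,1700000000000,21.5\n8,1700000000500,19.0\n"
COLUMNS = ["sensor_id", "timestamp_ms", "temperature"]
COLUMN_TYPES = [int, int, float]  # must be agreed on outside the data itself

rows = list(csv.reader(io.StringIO(payload)))
print(rows[0])  # ['7', '1700000000000', '21.5']

# Apply the out-of-band column names and types to each row.
typed = [
    {name: cast(value) for name, cast, value in zip(COLUMNS, COLUMN_TYPES, row)}
    for row in rows
]

# JSON carries numeric types in the payload, so no casting is needed.
as_json = json.loads('{"sensor_id": 7, "timestamp_ms": 1700000000000, "temperature": 21.5}')
assert typed[0] == as_json
```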
When defining SOURCEs (for data ingestion) or SINKs (for data egress) in RisingWave, you specify the serialization format of the data. RisingWave's connectors use this information to correctly interpret incoming byte streams or to format outgoing data. Common formats supported by RisingWave for sources such as Kafka include JSON, Avro, Protobuf, CSV, and raw bytes.
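Conceptually, declaring the format tells the connector which decoder to apply to each incoming message. The sketch below models that idea as a small registry keyed by format name; the registry, the format names, and `decode_message` are hypothetical and do not reflect RisingWave's internal implementation. Avro and Protobuf are omitted because decoding them requires a schema and third-party libraries.

```python
import csv
import io
import json
from typing import Any, Callable

Decoder = Callable[[bytes], Any]

# Hypothetical registry mapping a declared format to a decoder.
DECODERS: dict[str, Decoder] = {
    "json": lambda raw: json.loads(raw),
    "csv": lambda raw: next(csv.reader(io.StringIO(raw.decode("utf-8")))),
    "bytes": lambda raw: raw,  # pass the payload through untouched
}


def decode_message(declared_format: str, raw: bytes) -> Any:
    """Decode one message according to the format declared for the source."""
    try:
        decoder = DECODERS[declared_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {declared_format}") from None
    return decoder(raw)


print(decode_message("JSON", b'{"id": 1, "name": "a"}'))  # {'id': 1, 'name': 'a'}
print(decode_message("CSV", b"1,a"))                      # ['1', 'a']
print(decode_message("BYTES", b"\x00\x01"))               # b'\x00\x01'
```

If the declared format does not match what producers actually write (for example, CSV bytes sent to the JSON decoder), decoding fails, which is why the declaration must reflect the real upstream payload.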
The choice of format for your data streams feeding into RisingWave will depend on your specific upstream systems, performance requirements, and schema management strategy.