Discover the power of data and AI at the upcoming Data + AI Summit and make sure to visit our booth for an extraordinary experience.
Data + AI Summit 2023, organized by Databricks, is just around the corner, and we at RisingWave Labs are thrilled to announce our sponsorship of this highly anticipated event.
Key offerings include:
Latest trends and insights on open source technologies, real-world case studies, and best practices
More than 250 sessions for technical deep dives, hands-on training, lightning talks, and more
Networking opportunities with like-minded professionals, industry experts, and thought leaders
As a leading provider of cutting-edge data and AI solutions, we understand the immense value and potential this summit holds for professionals in the industry. In this blog, we shed light on why we are sponsoring this event and why visiting our booth is a must for all attendees.
Why We Are Sponsoring Data + AI Summit 2023
1. Showcasing our innovative solutions
At RisingWave Labs, we are dedicated to pushing the boundaries of what's possible in the world of data and AI. Sponsoring the Data + AI Summit gives us a unique platform to showcase our stream processing platform to a highly targeted audience of industry professionals. We believe data and AI can benefit immensely from adopting real-time stream processing wherever possible. At this event, we will demonstrate how RisingWave can be used alongside Spark and Delta tables. For data professionals, this is a great opportunity to learn firsthand how stream processing empowers organizations to unlock the full potential of their data and accelerate their AI initiatives.
2. Engaging with industry leaders and influencers
Data + AI Summit brings together industry leaders, influencers, and decision-makers from around the globe. As a sponsor, we have the opportunity to engage in meaningful conversations, build valuable connections, and establish ourselves as thought leaders in the data field. By actively participating in discussions, sharing insights through presentations, and connecting with attendees at our booth, we aim to foster collaborations and drive the future of data and AI innovation.
Why Visit Our Booth
While attending the Data + AI Summit 2023, we invite all participants to visit our booth — Booth 46 — and explore stream processing with RisingWave. Here are the key reasons why you shouldn't miss the chance to stop by:
1. Live demos of our cutting-edge products
Witness the power of our advanced data and AI solutions through live demonstrations of our exciting products — RisingWave Cloud and RisingWave Database — at our booth. Our team of experts will showcase how our platform can help your business get continuous actionable insights and enable intelligent decision-making. Get a firsthand experience of our solutions' user-friendly interfaces, seamless integrations, and exceptional performance.
2. In-depth discussions with our experts
Our highly knowledgeable and experienced team will be available at the booth to provide a deep dive into our platform. Visitors can bring up their specific challenges and discuss with the RisingWave team how stream processing can augment their data platforms. Whether you have questions about data management, predictive analytics, machine learning, or any other aspect of data and AI, our experts are eager to provide insights, share best practices, and guide you toward the most effective solutions.
3. Swag and more
All attendees who visit our booth will receive special invitation codes to explore our cloud product. Last but not least, you can also score a stylish t-shirt, travel bag, or mug.
Data + AI Summit 2023 is an event that brings together the brightest minds and the most innovative solutions in the data and AI industry. As proud sponsors of this prestigious event, RisingWave Labs invites all attendees to visit our booth for a truly immersive experience. Engage with our experts, explore our cutting-edge products through live demos, and discover how our solutions can empower your organization to thrive in the era of data-driven decision-making.
Mark your calendar, make a beeline for our booth, and let’s embark on a journey of data-driven success together. We look forward to meeting you at Data + AI Summit 2023!
Unlocking the Power of Kafka Summit: Top 4 Reasons for Attending It
Attending Kafka Summit can be a game-changer for professionals in the data industry. It is a unique opportunity to learn from leading experts and stay updated with the latest trends in Kafka technology.
RisingWave Labs, founded in January 2021, is a leading company in the augmentation of enterprise data platforms with SQL-based stream processing capabilities. Our mission is to empower businesses to achieve new levels of efficiency and insights through reliable, timely, and cost-effective real-time processing of event data.
As the developers of an open-source product and supporters of the tech community, we proudly sponsor Kafka Summit London 2023, the premier event for the Apache Kafka community in Europe. In this blog post, we will outline the top four reasons to attend Kafka Summit London 2023.
Why Visit RisingWave at Kafka Summit London 2023?
Whether you’re a seasoned Kafka user or just starting out, Kafka Summit London 2023 is the ultimate opportunity to learn from industry experts, network with peers, and discover the latest trends and best practices in the world of streaming data. We confidently sponsor this event for the following four compelling reasons:
Discover RisingWave products: The Kafka Summit is a fantastic opportunity to learn about the latest products and services in the data streaming industry. We are proud to have launched the beta version of RisingWave Cloud in April 2023. Visit our booth to watch demos and discover our exciting products: RisingWave Cloud and RisingWave Database.
Connect with industry experts and potential customers: Kafka Summit London 2023 brings together industry experts, developers, and business leaders from all around the world. Our leadership, including our founder and CEO, Yingjun Wu, will be there to answer questions and share insights, best practices, and new ideas. Be sure to book a meeting with us at Kafka Summit London 2023.
Keep up to date with the latest developments in the Kafka ecosystem: Beyond RisingWave, Kafka Summit London 2023 features keynotes, sessions, and workshops covering the latest developments in the Kafka ecosystem. Attendees can look forward to this unique opportunity to stay up-to-date with the latest trends and developments in the industry.
Last but not least, you can score some swag: We have a variety of cool swag that you can take home as a memento of your experience. Get your hands on some stylish t-shirts and mugs to take with you on the go! In addition to the swag, we will also be holding a raffle with a grand prize for one lucky winner. So make sure to stick around until the end of the event for your chance to win big!
When and Where is Kafka Summit London 2023?
Kafka Summit London 2023 is scheduled for May 15-16, 2023, at the ExCeL London in the heart of London’s Docklands. The conference will offer two full days of keynotes, workshops, and networking opportunities.
How to Find Us?
Pre-Kafka Summit Social
We are partnering with Quix to host a pre-Kafka Summit London Social event that you won’t want to miss. Join us for a data streaming-themed quiz night, burgers, and drinks, all just a stone’s throw away from the conference venue.
Our quiz night is the highlight of the evening. Test your knowledge of data streaming, message brokers, stream processing, and other related topics in a fun and friendly environment. You can form teams with other attendees or compete as an individual. The winning team or individual will receive a prize.
This is the perfect opportunity to meet other data enthusiasts, learn about the latest trends in data streaming, and have fun. The event will take place the evening before the conference, giving you a chance to relax and unwind before the busy schedule.
RisingWave Labs at booth U2
If you want to stay ahead in stream processing technology, visit booth U2 at the Kafka Summit in London. Although we wish the band U2 would be present, we will settle for our expert staff to confidently answer all your questions about RisingWave.
Discover how this powerful tool, with its state-of-the-art real-time data processing, scalability, flexibility, and user-friendly interface, can help your business achieve unparalleled success. Take advantage of the opportunity to learn more and collect some swag while you're there!
We are thrilled to be sponsoring Kafka Summit London 2023 and look forward to meeting all the attendees. We believe that by participating in this conference, we can continue to grow and develop as a company, connect with other professionals in the field, and contribute to the ongoing growth and innovation of the data streaming and analytics industry.
RisingWave Labs Launching a Webinar Series: Streaming Stories
We are excited to announce the launch of Streaming Stories. This platform brings together stories from users around the world, all centered around their experiences with stream processing.
From the margins to the mainstream, stream processing has evolved rapidly over the last ten years. This exciting journey, which many argue began long before Apache Storm came into being, has not been without challenges and detours. The current state of the field is the result of a continual evolution, from subsystems of commercial database systems to standalone open-source big data systems.
The community has played no small role in this evolution, nurturing and growing stream processing beyond what was ever thought possible. As the community has grown exponentially around different technologies, it is clear that we are witnessing the dawn of what we call the "Modern Streaming Data Stack." This is not a monolithic, vendor-led ecosystem but a shared, purpose-driven endeavor focused on driving innovation and interoperability between various systems, regardless of whether they compete in the marketplace. Stream processing can reach a wider audience by creating an environment where developers, designers, and users can share the best ideas and technologies.
To that end, we are excited to announce the launch of Streaming Stories. This platform brings together stories from users around the world, all centered around their experiences with stream processing. Through these stories, we hope to continue strengthening the community and fostering collaboration and innovation as we strive to reach the full potential of stream processing. We look forward to hearing your stories and hope you will join us on this journey!
As a startup developing a next-generation stream processing system, we see this platform as an excellent opportunity to contribute to the community by engaging data practitioners and creating a space for them to discuss future trends and present solutions.
With a biweekly cadence, Streaming Stories will include:
Product-focused meetups: These meetups will showcase stream processing tools and demonstrate their integration with RisingWave. They will be technical, hands-on events highlighting key product features, typical use cases, and prime business benefits.
Panel discussions and interviews: With prominent domain experts to spur the discussion around trends, challenges, and opportunities within the stream processing space, these sessions will offer a higher-level view. They will evaluate the past, present, and future of stream processing without shying away from discussing more controversial issues.
To better illustrate the topics covered, some of the questions we have been asking ourselves and which keep inspiring us are:
Why do we need stream processing?
Can stream processing replace batch processing?
Is it one or the other, or are we talking about a peaceful coexistence here?
What tools are vital for the popularity of stream processing?
What tools—existing or in the making—are going to shape the future of stream processing?
Finally, where does RisingWave fit into this picture?
In the future, the Streaming Stories series will also feature technical topics targeting new and experienced developers with a focus on solving specific data processing problems. The presenters will include industry thought leaders in the stream processing space. Attendees can gain an inside look into how industry giants like Google, LinkedIn, Facebook, Uber, and Alibaba leverage stream processing to derive maximum business value.
If you, like us, seek to learn more, join the next Streaming Stories event. You can join our meetup group or participate in the discussions in our Slack community space. We look forward to hearing your suggestions for topics and speakers, as well as how we can improve Streaming Stories.
Upcoming Streaming Stories sessions include:
Vol 1: Cloud-Native Streaming SQL over Pulsar
Time: Thursday, February 23, 2023, 9 AM PT/6 PM CET
Speakers: Rayees Pasha (Head of Product at RisingWave Labs) and Tim Spann (Developer Advocate at StreamNative)
Future of Real-time Data Systems: A Recap of Current ’22
Get a comprehensive recap of Current ’22 and dive into the future of real-time data systems in this blog. Discover the event's key insights and trends, providing valuable foresight for leveraging real-time data in your business strategy.
Austin, the Live Music Capital of the World! This Texas city is becoming an emerging tech hub thanks to an influx of tech giants in recent years. With its convenient transportation and central location in the United States, Austin is also attracting more and more technology conferences. I was fortunate to visit Austin for the second time this year to attend the top technology conference in the field of data systems: Current 2022. The name Current may be unfamiliar to many of us, but its former name should be familiar to all engineers: Kafka Summit. For reasons including branding, the event organizer, Confluent, a giant in the field of data systems, decided to rename Kafka Summit to Current in 2022.
Texas State Capitol
The focus of the Current conference is real-time data systems. Unlike widely adopted big data platforms (such as Apache Hadoop, Apache Spark, and Apache Hive) or data warehouses (such as Snowflake, Redshift, and BigQuery), real-time data systems emphasize real-time storage and computation of data as it is generated. In recent years, the field has gained gradual recognition in the market with the rise of applications such as real-time reporting, monitoring, and tracking. Confluent, an industry leader in this field, also successfully IPO'd in mid-2021, driving a new wave of real-time data system development and application.
Current Conference Sponsor Booth
A great way to understand where a field is heading is to see what startups are doing. During my two days at Current, I chatted with all of Current's sponsors. Apart from certain industry giants, most conference sponsors are startups. In this article, based on my understanding, I will introduce you to these startups to help you understand the future of real-time data systems. I will begin by classifying these companies based on their core products. Then I will share my observations and point out the potential risks I see. I will end by introducing each startup's core product and business model.
Disclaimer: This article only introduces the startups that sponsored the Current conference, and this article attempts to ensure that the comments are objective and fair; however, remember these are my personal observations. This article does not constitute any investment advice on any specific company, but I do recommend you invest in this field for the long term 🙂
Classification of Startups
Based on the analysis of the various startup companies sponsoring the conference, I have divided the entrepreneurial direction into the following nine categories:
1. Commercializing mature open-source projects on the cloud
This category includes Databricks, Aiven, Conduktor, Imply, StreamNative, StarTree, Immerok, Factorhouse, and more. Such companies can be further divided into three development models:
Core teams starting a business: such companies include Databricks, Imply, StreamNative, StarTree, Immerok, etc. One of their core selling points is "orthodoxy": these companies guide the direction of open-source project development through strong control from the core contributors, then try to convert open-source users into paying users through the community.
Non-founding teams starting businesses: such companies include Conduktor, Factorhouse, etc. These companies are founded by active members of certain open-source projects, but not necessarily founding members. They usually emphasize multi-directional development of cloud platforms, seeking differentiation through unique angles such as security, visualization, and system integration.
Companies commercializing multiple open-source projects directly: Aiven is one example. By providing hosting services for different open-source projects on the cloud, such companies reduce the sense of separation between projects and provide users with a complete set of solutions.
2. Real-time ETL/ELT/data integration
This category contains Decodable, Striim, Airbyte, etc. Airbyte focuses on ELT, while Decodable and Striim lean more toward ETL. One core selling point of these companies is ease of use. For example, Decodable emphasizes that users only need to click a few buttons on the platform and then write SQL to realize real-time data transformation and import from one database to another.
3. Message queues
This category contains StreamNative, Redpanda, etc. Message queues are fundamental building blocks of a modern data stack. I believe that everybody knows Kafka, which is backed by the company Confluent, the organizer of this Current conference. Message queues proved their value over the past decade. But this does not mean that the development of message queues has come to an end. In the new cloud era, new challengers and products have emerged based on new technologies, such as the separation of storage and computing to reduce costs and increase efficiency.
4. Real-time API
This category contains Aklivity, Macrometa, etc. Such companies fill the gap from data source to data access, allowing users to access data that is ingested in real time easily.
5. Edge computing
This category includes Ably et al., companies focusing on real-time data processing on edge devices, which are very close to the data source. A critical source of real-time data is the data generated by edge devices, such as sensors and mobile phones. To reduce latency, real-time computing needs to be carried out on near-source devices rather than being concentrated by brute force in cloud data centers. This also brings new demands and challenges to the underlying software.
6. Real-time vertical industry SaaS
This category contains Clear Street, Bicycle, and more. Such companies focus on the application of real-time data analysis in vertical markets. The development of such companies proves that real-time computing can be adopted on a large scale in some specific industries.
7. Real-time high-level language framework
This category contains Quix, Meroxa, etc. These companies target developers who work in a specific high-level language, such as Python. People unfamiliar with lower-level languages (e.g., Java, Scala) can easily program their stream processing applications in high-level languages.
8. Real-time analytical database
This category contains Imply, StarTree, InfluxData, Rockset, FeatureBase, and many more. This type of product mainly focuses on improving the ability of real-time data ingestion in traditional analytical databases. Data ingestion latency is reduced, and the newly ingested data is visible to analytics in real time. These databases provide SQL interfaces that allow users to perform analytical queries on data like traditional databases.
9. Streaming database
This category contains RisingWave, Materialize, DeltaStream, TimePlus, and more. Like real-time analytical databases, streaming databases provide SQL interfaces and are optimized for real-time data ingestion. But they take a step further to support real-time computation over the data: streaming databases integrate stream processing technology into the database and keep query results continuously up to date through incremental computation.
The following table summarizes all companies introduced in this blog according to the year of founding.
Without a doubt, 2019, 2020, and 2021 were the golden years for launching real-time data systems. Before 2019, the field of real-time computing was far less hot than it is today. But in recent years, as traditional batch data processing has entered a bottleneck period, real-time data processing has begun to attract people's attention. The landmark event of Confluent's IPO in 2021 has brought new confidence into the real-time data processing industry. It is no exaggeration — real-time computing is one of the hottest markets today.
The field of real-time data systems has gradually transformed from being used by technology giants on a small scale to being used by the general public. If Confluent's IPO proves that enterprises need to store real-time data, then many real-time data system startups are now trying to prove that users also need to do computations on real-time data. The startups in this wave have different entry points, from real-time APIs to real-time analytical databases to streaming databases. Although they seem similar from a macro perspective, everyone is actually looking for segments to differentiate. For example, the real-time API field is more for application developers, while real-time high-level language frameworks are more for data analysts and scientists. Regardless of the segment, we can see that real-time computing has ushered in the next wave.
Frankly speaking, I don't think the real-time data system field, or the data system field in general, is technically insurmountable. After all, everyone is playing a game of balancing performance and resources. I am very bullish on the real-time data system field. Of course, if I were not optimistic about this field, I would not have started a business in it. So, are there actually any challenges in launching a startup? Yes, there still are.
The biggest challenge comes from the immature market. Whether technology can usher in explosive development depends not on how advanced the technology is but on whether the technology matches the market demand. The concept of stream processing was first proposed in academia 20 years ago and landed in the industry almost a decade ago, but it has been in a tepid state for the past two decades. Although some technology giants have adopted such technology, it does not mean that various companies can widely use this technology. This is the typical technology-market mismatch. Today, in the field of real-time data systems, we clearly feel the market is heating up, but it will take time to be widely recognized and accepted, like Oracle in the enterprise market or Snowflake in the cloud computing field. I predict this time to be about 2-5 years. During this time, startups in this industry must spend a lot of time and energy educating the market. This investment is massive and full of unknowns. Of course, as we all know, challenges and opportunities coexist, and whoever can grasp the opportunity in the challenge will eventually become the market leader.
Next, let's list the startups that sponsored the Current conference. Here, I only include private companies established within the last ten years (i.e., after 2012). As for the giants (such as AWS, Google, and Microsoft), acquired companies, and companies more than ten years old, I will not introduce them one by one here.
Let me first introduce our company, RisingWave Labs. We launched it in early 2021, and over the past two years we have grown into a fully distributed team spanning seven time zones. RisingWave has focused on developing a cloud-native streaming database from day one. The core idea is to democratize stream processing, making it simple, affordable, and accessible: we hope users can develop their stream processing applications simply by operating a regular database. To do stream processing, the only thing a user needs to do is create a materialized view. Like some other popular systems today, RisingWave does not depend on the JVM ecosystem, which makes deployment, operation, and maintenance very simple. The entire system is written in Rust from scratch, mainly for the efficiency and safety of the Rust language. As an open-source project under the Apache license, RisingWave's commercial model is still to provide cloud services. The private preview version has been released already, and the GA version is expected next year. As a streaming database, RisingWave lets users process streaming data while also storing it, which means users can query the database directly. Many people will think of the popular concept of unifying batch and streaming. RisingWave currently focuses more on stream processing, and its batch processing capabilities will depend on project implementation priorities.
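To make the idea of incremental view maintenance concrete, here is a minimal conceptual sketch in Python. It is not RisingWave code (in RisingWave you would write a `CREATE MATERIALIZED VIEW` statement in SQL), and the class and method names are purely illustrative; the point is that each arriving event updates the stored result in place, so queries always see fresh aggregates without re-scanning history.

```python
# Conceptual sketch of incremental materialized-view maintenance.
# Models a view like: SELECT user, COUNT(*), SUM(amount) FROM events GROUP BY user
from collections import defaultdict


class MaterializedView:
    """Keeps per-user count and sum continuously up to date."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def on_event(self, user, amount):
        # Each incoming event updates the view in O(1) instead of
        # triggering a full recomputation over all past events.
        self.counts[user] += 1
        self.sums[user] += amount

    def query(self, user):
        # Reads return the already-maintained result immediately.
        return self.counts[user], self.sums[user]


view = MaterializedView()
for user, amount in [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]:
    view.on_event(user, amount)

print(view.query("alice"))  # (2, 12.5)
```

A batch system would recompute the aggregate from scratch on every query; a streaming database amortizes that work across event arrivals, which is what makes continuously fresh results affordable.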
Databricks needs no introduction. Valued at up to 38 billion US dollars, the company, which started with Apache Spark, is moving toward a unified lakehouse (integrating data lakes and warehouses). This year, Databricks also surpassed $1 billion in revenue. Barring surprises, the company will presumably have a successful IPO in the next two years. Databricks is steadily moving toward real-time computing, and Spark Streaming is one of its core development directions. This year Databricks also announced the upcoming launch of Lightspeed, the next-generation Spark Streaming engine. Since Lightspeed is not open-sourced yet, I won't elaborate on it.
Slower is a mysterious company. Although it was founded in 2014, I could not find any concrete information about it on the internet, and its official website is just a simple company logo. After chatting with their employees, I learned they mainly provide various cloud and on-premise deployment solutions for enterprises. The business covers databases, data platforms, data management tools, machine learning platforms, security, and more. In general, it is a well-rounded solution team. Because Slower is so mysterious, I will stop here.
Aiven is a Helsinki-based cloud service provider. Unlike many other startups, its story is not about a core member of an open-source project starting a company to commercialize that project. It is about providing cloud hosting services for various existing open-source projects! From Kafka to Flink, from ClickHouse to InfluxDB, from MySQL to PostgreSQL, they can provide hosting services for any mainstream open-source project. At first glance, this does not seem particularly competitive, but in fact Aiven addresses a big pain point for users: the problem of software selection. Today there is too much data software, and it is too complex. To build complex applications, companies often have to combine a variety of software. Providing a complete set of cloud services largely solves the selection problem. In addition, a unified management interface makes the connections between the software smoother, without a strong sense of separation; I believe this is also a real advantage.
Apache Kafka is a distributed streaming message storage system. As long as there is streaming data, Apache Kafka can be used to store it. However, it is not enough to have this storage system. We also need to deploy, maintain, operate, and monitor the system, and analyze and manage the data on it. Conduktor is a company that does all of these things. Their tagline is "Streamline Apache Kafka". At first, I thought it was highly similar to what Confluent does. After chatting with their CTO, I realized that they not only offer Kafka hosting services but also data analysis and management. For example, you can use their platform to check whether the data stored in Kafka is reliable, or to query and monitor it. Conduktor essentially treats Kafka as a data platform: users no longer need to import data into downstream data warehouses or data lakes, but can process data within Kafka. I believe this is a promising direction.
Keywords: data pipelines, data engineering, cloud services
Funding round: Series A
Last funding round year: 2022
Decodable was born in 2021 and is hence a rookie in the field of data engineering. The focus of Decodable is a very familiar problem: ETL. Countless platforms provide ETL capabilities, so what is Decodable's entry point? The answer is straightforward: Decodable provides engineers with a simple, easy-to-use platform. With a few clicks and some SQL code, you can import data from one platform (such as Apache Kafka or Apache Pulsar) to another (such as Snowflake or Redshift). They also provide a cloud service that lets users connect to databases on the cloud without installing any software locally. I believe this is an excellent product.
Engineers in the database field should not be unfamiliar with Imply. Imply commercializes Apache Druid, a well-known real-time analysis engine in the industry. The core objective of Apache Druid is to respond to random complex queries on large-scale data with low latency. Although many new startups have emerged in the field of real-time analysis in recent years, Imply still maintains a relatively leading position in terms of customer volume by virtue of its stable performance.
Materialize is our company's most direct competitor. Like us, its core product is a streaming database, and at Current they finally released their long-awaited product: a cloud-native streaming database. Although Materialize has been building a streaming database on top of the open-source Timely Dataflow project since 2019, for a long time it was a single-machine, purely in-memory database, so its availability could face considerable challenges in real production environments. The new version is worth looking forward to and will hopefully impress.
StreamNative was founded in 2019. Although the company is young, StreamNative has an outstanding reputation in the open-source and infrastructure community. Their core product is a commercial version of Apache Pulsar. Since being open-sourced in 2016, Apache Pulsar has been adopted by many companies worldwide. As message queuing systems, Apache Pulsar and Apache Kafka differ in two significant ways. Compared to Kafka, which focuses solely on storing event data, Pulsar also handles the message data generated within applications. Pulsar is also more cloud-native: its separated storage and compute architecture makes the entire system more scalable. As for commercialization, StreamNative currently focuses on providing services to users on the cloud.
Ably is a London-based company that provides message queuing services on the cloud. Speaking of message queue services, you may think Ably's product is similar to Kafka or Pulsar. Yes, Ably's products are in the same category as Kafka, but the most significant difference is edge computing. Kafka is usually deployed in a company's data center, where it collects message data in a centralized way. Ably focuses on the edge: to achieve millisecond-level latency, you can deploy their product on the edge cloud and process data directly on the device side, such as mobile phones, sensors, and tablets. Ably has been around for six years, and its cumulative financing has exceeded 80 million US dollars.
Acceldata is a comprehensive data observability platform in the cloud. Observability has been a hot field in recent years: aside from market leaders Splunk and Datadog, various other startups are working in this space. From my conversations with the Acceldata folks, Acceldata sounded like an observability platform that does everything, and at first I struggled to see how they differ from Datadog and the like. After digging deeper, I found that they focus on the observability of the various systems in the so-called "modern data stack," rather than on the machines themselves or on traditional concerns such as applications and CI/CD. What they observe is simply different from what other companies focus on.
Aklivity is a startup with only three employees (including the founder himself)! I chatted with all three of them at a cocktail party. They told me they are building an open-source API tool called Zilla, and the company has raised $4 million in its seed round. More specifically, what they are building is a real-time API gateway. Simply put, when users use Kafka, they may have to choose different interfaces to connect to it depending on the device or application, which is relatively troublesome. Zilla is essentially a unified encapsulation on top of Kafka, so that different applications can access Kafka in the same way.
At Current 2022, we were pleasantly surprised to discover some SaaS products that provide real-time analytics capabilities. Bicycle is one of them. Bicycle is not selling a real-time analysis engine or storage engine; it provides real-time data monitoring, alerting, and analysis functionalities for customers. For example, for an e-commerce platform company, Bicycle can analyze past sales data to predict possible future sales, and the company can use such data to manage its revenue. Employees from Bicycle revealed that they built their core engine in-house and use machine learning methods to analyze sales data. As the underlying systems for real-time analytics gradually mature, I expect more and more SaaS startups like this.
Clear Street is a New York-based fintech company that provides cloud-based securities trading services. Traditional securities brokers, such as banks, suffer from out-of-date infrastructure, opaque information, and low efficiency. Clear Street saw this opportunity and set out to revolutionize the field with cloud computing. If Robinhood is a trading platform for retail investors, then Clear Street, in my opinion, is Robinhood for professional institutions. Through Clear Street, users can not only trade securities but also perform real-time data analysis through a simple interface. In 2021, the average transaction volume processed on its cloud platform reached 3 billion US dollars in a single trading day.
When I first saw Cube's booth, I thought they were a BI visualization company, but it turned out they were not. Cube builds a layer between data storage systems (databases, data warehouses, etc.) and visual BI tools. It solves several problems:
Unified semantics: When users query data from different data sources, the types, units, and representations of the data may not be consistent; Cube provides a data model layer to solve this problem;
Access control: Administrators can use Cube to set different permissions for different users, so that each user sees only the reports intended for them;
Caching: Every query from a BI tool down to the underlying data storage system carries significant access overhead; Cube provides a caching layer to absorb it;
Unified APIs: You no longer have to worry about the different APIs of the underlying data storage systems.
Overall, I think Cube is a very thin, easy-to-use tool that many companies should like.
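To make the semantic-layer idea concrete, here is a toy Python sketch of defining a metric once and compiling it into SQL for any consumer. This is purely illustrative and not Cube's actual API (Cube models are defined in their own JavaScript/YAML format); the metric names and table are made up:

```python
from typing import Optional

# Metrics defined once, in one place, for every BI tool to share.
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}

def compile_query(metric: str, table: str, group_by: Optional[str] = None) -> str:
    """Compile a named metric definition into a SQL string (toy example)."""
    expr = METRICS[metric]
    cols = f"{expr} AS {metric}"
    if group_by:
        cols = f"{cols}, {group_by}"
    sql = f"SELECT {cols} FROM {table}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql

print(compile_query("total_revenue", "orders", "country"))
# -> SELECT SUM(amount) AS total_revenue, country FROM orders GROUP BY country
```

Because every downstream tool goes through the same definitions, "revenue" can never mean two different things in two different dashboards, which is the heart of the unified-semantics problem Cube addresses.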
Immerok is probably the youngest company sponsoring the Current conference, but what they do will be familiar to people in real-time analytics: commercializing Flink in the cloud. When it comes to Flink-related companies, you may think of Ververica, which Alibaba acquired in 2019. Immerok's relationship with Ververica is remarkable: almost the entire founding team of Immerok came from Ververica. Since its establishment in the first half of this year, Immerok has raised a seed round of 17 million euros. Unlike Ververica, which offers on-premise deployments, Immerok is entirely focused on cloud services. This appears to be the future trend.
InfluxData is famous enough to need little introduction. Its main product, InfluxDB, is a mainstream time-series database. As a commercial open-source software company, InfluxData releases its core code under the permissive MIT license. Interestingly, though, only the single-node version is open-sourced; the distributed version is completely closed-source and paid. InfluxData has also recently been rewriting its system kernel in Rust. Rewriting systems in Rust seems quite common in the industry these days.
Macrometa provides real-time data API services. What exactly is a real-time data API service? Essentially, Macrometa can be regarded as a global, real-time, multi-model database. Users write to the database through an API or connect real-time event sources, and the underlying system achieves global real-time synchronization through CRDTs. Users get read, write, and cache services just as with a multi-region distributed database, and can directly query data as it is written. Compared to other real-time data services, Macrometa has two specialties. First, it encapsulates the underlying technology as services and exposes a rich set of data models and APIs on top, such as a key-value store, document database, graph database, and pub/sub. Second, it is very focused on edge computing: users with large edge computing needs can access real-time data simply through its API.
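For readers unfamiliar with CRDTs, here is a minimal Python sketch of a grow-only counter (G-counter), one of the simplest conflict-free replicated data types. It illustrates the general convergence property that CRDT-based systems rely on, not Macrometa's actual implementation; the node names are made up:

```python
# Each replica tracks increments per node; merging takes element-wise max,
# so replicas converge regardless of the order updates arrive in.
class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node_id -> increments made at that node

    def increment(self, n: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        # Element-wise max is commutative, associative, and idempotent,
        # which is what makes the merge conflict-free.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

# Two regions update independently, then sync with each other.
us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
print(us.value(), eu.value())  # both converge to 5
```

Real systems layer far richer data types (documents, graphs, sets with removal) on the same principle, but the essence is the same: every replica can accept writes locally and still converge globally.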
If you look at Oxylabs' official website, you will find that their core business is network proxying, such as IP address proxies and data center gateways, which seems to have nothing to do with real-time systems. But take a closer look, and you will find that Oxylabs' other business pillar is real-time SERP scraping. I was relatively unfamiliar with this term, so I did some research; let me briefly introduce it. SERP stands for Search Engine Results Page, and a SERP scraper automatically tracks the query results for selected keywords in a search engine, including advertisements, related queries, page rankings, and so on, returning them to users in a structured format. SERP scraping is mainly used by marketing departments to analyze the SEO of their own products and their competitors'. The core market value is providing a SaaS service that lowers the technical barrier, and Oxylabs goes a step further by automating real-time SERP scraping. This requires a complete real-time data stack, from crawling to real-time analysis to delivery. Oxylabs has been around for seven years with no public funding records, yet the company has expanded from Lithuania across the globe. This shows how valuable real-time data applications really are.
Earlier stream processing platforms were designed for programmers familiar with low-level APIs and only provided language interfaces such as Java. For data scientists, however, the dominant programming language is Python. This creates an opportunity: simplifying stream processing for users more familiar with Python, such as data scientists and engineers. Quix is a stream processing platform aimed mainly at higher-level languages such as Python. Quix was founded in the UK by several data scientists from McLaren (you read that right: the company that sells luxury sports cars), and it has been a stream processing platform for data scientists and engineers since its inception. It relies on message queues such as Kafka for data input and output, but provides streaming data processing services hosted in the cloud. In addition to Python, I noticed on their official website that they have added support for C#.
Redpanda competes directly with Apache Kafka. If Red Hat is a commercial distribution of Linux, then Redpanda is a commercial distribution of Kafka, with an interface fully compatible with Kafka's. Redpanda's main edge is cost-effectiveness: it claims to be ten times more performant than Kafka, with more than six times better hardware efficiency. As a C++ project, Redpanda has another big selling point: completely abandoning the JVM dependency. When installing Redpanda, users no longer need JVM-ecosystem components such as ZooKeeper. I highly agree with this idea. In the big data era, the complexity of installing and maintaining the Hadoop ecosystem was far too high; today it is clear that minimalist deployment and operations will be a solid competitive advantage over existing technologies.
Rockset is a real-time analytics database. There are already many open-source products in this field, yet Rockset is one of the few closed-source ones. In the early days, Rockset was not actually positioned for real-time analytics (OLAP). Its founding team is the same group that built RocksDB and HDFS at Facebook, and indeed its product is built on RocksDB. I have been following their products since they were only around 10 people. I still remember that what they did in the earliest days was actually SQL on raw data, that is, querying raw data such as semi-structured data and JSON. The product then gradually became a so-called indexing database, and over the past two years it has been fully positioned as a real-time analytics database. What attracts me most is that back in 2016 they could already predict that future data would live in the cloud and that more and more raw data would be stored. Looking back from that point at the products they built, they were without doubt ahead of their time.
StarTree is a rookie in the real-time analytics database field. Although founded only recently, it has already attracted good attention in Silicon Valley. Besides its products, its VP of DevRel, Tim Berglund, has also drawn much attention. StarTree's core business is commercializing Apache Pinot, an open-source real-time analytics database. Compared to other real-time analytics databases, Pinot focuses more on high-concurrency queries, which many user-facing scenarios require (such as LinkedIn's "Who's viewed your profile" feature). In addition to selling real-time analytics databases in the cloud, StarTree offers a SaaS service called ThirdEye that uses Pinot for data anomaly detection. This reflects a trend: infra companies are building out their SaaS layers.
Striim is a company specializing in data integration, founded by the original Oracle GoldenGate team. GoldenGate was acquired by Oracle in 2009, after which the team focused on data import and export for the Oracle database. Striim's core business today is the same as GoldenGate's: database-to-database data integration solutions. Founded relatively early, Striim initially focused mainly on on-premise deployments, but in recent years, with the rise of the cloud, it has expanded into cloud services. What impresses me about the company is how friendly its products are to the data ecosystem: Striim Cloud supports all mainstream databases, data services, and cloud platforms. The company has even developed dozens of connectors and released them in cloud providers' application marketplaces, which greatly reduces the complexity of user onboarding.
Swim's product mainly helps developers build and manage real-time applications on top of streaming data. The company's last funding round was a Series B in 2019. Their open-source product Swim OS provides a framework for building real-time applications, while the commercial product Swim Continuum adds stream data source management, real-time analysis and visualization over streaming data, and monitoring of application health. It combines application monitoring and business analytics into one platform. This company provides not a single service (middleware) but a complete real-time data processing and analysis solution for business users.
Tinybird is a real-time data analytics company. Data sources vary widely, but to build an application, the application side still needs to access data through an API. Tinybird builds the bridge from data source to API. Its product supports multiple types of data sources; developers use SQL to transform and process the data, then expose those queries as API endpoints. Downstream applications only need to call these endpoints to access the latest data instantly, with no need to build complex data pipelines. From a technical point of view, Tinybird uses the currently popular ClickHouse for data processing.
Airbyte is a fast-growing, Silicon Valley-based data integration company. Its product can be seen as an open-source alternative to Fivetran. Specifically, Airbyte is a data connector platform that supports moving data from many sources (applications, APIs, message streams, databases, etc.) to target systems (databases, data warehouses, data lakes, etc.). Unlike many stream computing systems, which require cleaned, structured data or complex cleaning logic written by engineers, Airbyte directly supports end-to-end data integration: through simple SaaS-style configuration, data exchange among more than 100 different systems can be set up easily. Thanks to active open-source community contributions, the number of Airbyte connectors has exceeded 150. An official development kit is also provided, so developers can build custom connectors without spending much time (the official claim is within 30 minutes). Airbyte is very popular for its ease of use and its rich documentation and support resources.
CelerData is a new US-based company founded by StarRocks, a real-time analytical database startup. StarRocks is a commercial product stemming from the open-source project Apache Doris. Similar to Rockset, StarTree, and Imply, StarRocks can efficiently handle complex analytical requests. Its interface is MySQL-compatible, and in terms of performance it claims to significantly outperform similar products.
DeltaStream is a streaming database company founded at the end of 2020. Its founder, Hojjat Jafarpour, also created Confluent's KSQL project. DeltaStream provides a serverless streaming database to manage and process data streams in real time. DeltaStream itself does not contain a storage module; instead, it treats streaming storage platforms such as Kafka and AWS Kinesis, or static data sources such as AWS S3, as its storage layer. It lets users read data from one or more sources, perform computations, and simultaneously write results across different storage components. Internally, DeltaStream uses Apache Flink SQL as its engine.
Factor House (known as Operatr.IO before its rebranding in September 2022) is a three-member (yes, 3!) team based in Australia, whose CEO and COO are Derek Troy-West and Kylie Troy-West respectively. Its main product, Kpow, is a web visualization tool designed specifically for Apache Kafka that helps enterprise users better manage and monitor Kafka resources. Kpow lets users visualize, search, and export real-time data, greatly improving Kafka's observability and ease of maintenance, and it can manage all Kafka clusters and topics without resorting to complex command lines. After chatting with the team, I learned that they started building the platform at home during the pandemic. So far they have taken no funding, yet they already have customers.
LakeFS, developed by Treeverse, lets users manage data in a data lake the way they manage code: branch, commit, merge, and revert become a piece of cake. Data lakes, especially very large ones, are hard to manage. The object storage they rely on lacks critical features such as atomicity, rollback, and reproducibility, which hurts data quality and recoverability. In the past, when using a data lake, we would often create a copy of the production environment, test data changes in the copy, and only then apply them to production. The problem is that this approach is time-consuming and expensive, and it makes collaboration among many people difficult. LakeFS turns object storage into a Git-like repository without duplicating any data: it supports multi-person collaboration, helps ensure only validated data is ingested, and reduces errors. Even when errors do occur, the corrupted data can be rolled back atomically, directly in the production environment.
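To see why Git-like branching need not duplicate any data, here is a toy Python model of the idea (purely illustrative; this is not lakeFS's real API or storage format). Branches are just maps of paths to object keys, so creating a branch copies pointers, never data:

```python
# Toy model of zero-copy branching over an object store.
class DataRepo:
    def __init__(self):
        self.objects = {}                 # shared, immutable object storage
        self.branches = {"main": {}}      # branch -> {path: object_key}

    def write(self, branch: str, path: str, data: str):
        key = f"obj{len(self.objects)}"
        self.objects[key] = data
        self.branches[branch][path] = key

    def branch(self, new: str, src: str = "main"):
        # Branching copies only the pointer map, not the objects.
        self.branches[new] = dict(self.branches[src])

    def merge(self, src: str, dst: str = "main"):
        # Naive merge: the source branch's pointers win.
        self.branches[dst].update(self.branches[src])

    def read(self, branch: str, path: str) -> str:
        return self.objects[self.branches[branch][path]]

repo = DataRepo()
repo.write("main", "sales.parquet", "v1")
repo.branch("experiment")
repo.write("experiment", "sales.parquet", "v2")   # main is untouched
assert repo.read("main", "sales.parquet") == "v1"  # production unaffected
repo.merge("experiment")
assert repo.read("main", "sales.parquet") == "v2"  # promoted atomically
```

The real system adds commits, conflict detection, and atomic metadata operations, but the economics come from exactly this trick: experiments on a branch cost pointer updates, not a second copy of the lake.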
Memgraph is a low-latency, high-performance in-memory graph database that handles both transactional and analytical graph workloads well. It can analyze data from multiple sources and discover potential connections between them, allowing users to apply graph algorithms and then build their own real-time applications. CEO Dominik Tomicevic mentioned that Memgraph's most typical users come from the chemical, manufacturing, and financial industries. They all have one thing in common: they need real-time analysis over scattered data.
With Meroxa, users can build, test, and deploy real-time data applications in days. Meroxa is developer-centric, code-first tooling that lets software engineers maximize their time spent building data products as opposed to maintaining fragile data systems that weren't designed for developers. Meroxa's goal is to help developers focus on building applications with real-time data rather than automating repetitive operational functions. Their vision is to make Meroxa the industry-leading Data Application Platform as a Service (DAPaaS).
FeatureBase, based in Austin, Texas, is the new name resulting from the recent merger of Molecula Corporation and the Pilosa project. Its product is an OLAP database that uses bitmaps for data indexing. Specifically, FeatureBase converts data stored in traditional OLAP columns into bitmap-based features, thereby achieving better read and write performance and resource efficiency. FeatureBase also treats streaming data as a key focus, emphasizing the data freshness that bitmap-based streaming updates enable. FeatureBase indexes structured data well, but it is not suited to unstructured data. The product comes in two service modes, open source and cloud service, and supports two query interfaces, SQL and its custom PQL.
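The bitmap indexing technique itself is easy to demonstrate. The sketch below is a deliberately simplified Python illustration (plain ints serve as bitmaps; FeatureBase's actual engine uses compressed bitmap formats and is far more sophisticated) showing how each distinct column value gets a bitmap of matching rows, so an AND filter becomes one bitwise operation:

```python
# Each (column, value) pair maps to a bitmap where bit i is set
# if row i contains that value.
rows = [
    {"country": "US", "plan": "pro"},
    {"country": "DE", "plan": "free"},
    {"country": "US", "plan": "free"},
    {"country": "US", "plan": "pro"},
]

index = {}  # (column, value) -> bitmap (an int)
for i, row in enumerate(rows):
    for col, val in row.items():
        index[(col, val)] = index.get((col, val), 0) | (1 << i)

# WHERE country = 'US' AND plan = 'pro' is a single bitwise AND.
hits = index[("country", "US")] & index[("plan", "pro")]
matching = [i for i in range(len(rows)) if hits >> i & 1]
print(matching)  # -> [0, 3]
```

Because the filter touches only two small bitmaps rather than scanning every row, this style of index pays off dramatically on wide, high-row-count analytical tables.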
Nussknacker is a visual real-time analysis tool. Its target users are managers, analysts, and others accustomed to interactive tools such as Excel. Users build analysis and processing logic over data streams through visual operations on a web page, without writing code. Nussknacker uses Kafka as its main input and output interface and provides its own lightweight engine for simple stream processing operations, while advanced, complex aggregations are processed on Flink. Nussknacker lowers the threshold for building real-time data processing and analysis: business teams can deploy and test processing logic without writing code or relying on professional developers.
Timeplus was founded by a group of senior experts from Splunk's engineering team. Its core product is a streaming database in the cloud; users can register for beta access on the official website, and at first glance the interactive interface offers a good experience. The product is so far entirely closed-source, a business model that lets the team focus more on commercialization.
Thanks for reading. This blog is a comprehensive overview of startup directions in the field of real-time data systems, which I believe is also the most exciting direction in data infrastructure in recent years. If you are interested in this direction, or in RisingWave's open-source and cloud products, do not hesitate to contact us. I believe real-time data systems will advance in leaps and bounds in the near future.
My talk at Current conference
In fact, at the Current conference this time, besides chatting with colleagues and promoting RisingWave, I also gave a technical talk titled "Rethinking State Management in Cloud-Native Streaming Systems." It is a deeply technical talk that introduces some of the internal implementation of the RisingWave system. If you are interested, you can check out my presentation slides here; the full video is also available on the Current conference website. The RisingWave source code can be accessed here, and the cloud private preview is here. Please check it out and leave your comments!
Kubecon + CloudNativeCon: Top 3 Reasons for Attending
Uncover the top 3 reasons why attending KubeCon + CloudNativeCon is a must. Explore this blog to understand the immense benefits of this premier event for cloud-native technologies and stay ahead in the ever-evolving world of Kubernetes and cloud-native ecosystems.
What is it all about?
The Cloud Native Computing Foundation hosts KubeCon, its flagship conference, to promote the education and advancement of cloud-native computing. Part of the Linux Foundation, the CNCF supports, oversees, and directs a rapidly expanding set of cloud-native projects, of which Kubernetes, Envoy, and Prometheus are just a few. KubeCon's main goal is to bring together early adopters and tech experts from the top open-source and cloud-native communities. This time it's happening from October 24 – 28, 2022, in Detroit, Michigan.
If you're a fan of cloud-native computing, join renowned technologists from the most influential open-source and cloud-native groups in a unique hybrid environment. Interact, connect, and collaborate with peers and like-minded individuals in the cloud-native community. You can register here and browse the schedule to find sessions of interest.
As a company, RisingWave Labs has set the bold strategic goal of democratizing stream processing, with a cloud-native architecture that maximizes the efficiency of cloud resources. Our mission is to make stream processing accessible to everyone as data streaming becomes a widely used technology. With this in mind, we open-sourced our flagship project, RisingWave, earlier this year. This conference is an excellent opportunity to discuss RisingWave with the CNCF community. As active participants in the Kubernetes community, our engineering team has created a Kubernetes Operator to manage RisingWave on Kubernetes. You can learn more about the RisingWave Operator on our blog.
At RisingWave Labs, we believe the CNCF community represents principles that align very well with our core project, RisingWave. As a sponsor of this event, we have three clear objectives:
Interact with technical and business leaders from the CNCF community to understand the latest trends that will help mold the fast-developing cloud-native ecosystem.
Engage with a wide cross-section of attendees, including the industry's top developers, customers, established vendors, and cutting-edge startups, to advance the cause of the Modern Streaming Data Stack.
Share information about our open-source project 'RisingWave' with the CNCF community and invite collaboration and contributions.
How to find us
If you're attending the conference in person, find us at booth SU26. Don't miss the opportunity to watch some of our latest demos or talk with our senior engineers. And of course, don't forget to pick up some fun swag.
We will be hosting two office hours and a cloud live demo — an amazing opportunity to learn firsthand what RisingWave is all about and how RisingWave Cloud will help you build your streaming applications at a low cost.
Catch up on the key takeaways from RisingWave Labs' attendance at Current 2022, where data streaming enthusiasts from around the globe gathered. Explore this blog to uncover the four standout themes, offering valuable insights into the evolving landscape of streaming technologies.
Recently, RisingWave Labs’ team attended Current 2022, where over 2,000 people from around the globe gathered to see what is new and exciting in the world of data streaming technologies. There was a wide range of sessions and events available — attendees could get a taste of what is happening in the world of streaming. Broadly, four themes stood out for us:
Data Streaming is Ubiquitous
Data streaming is indeed everywhere. From anomaly detection in IoT devices to feature engineering for online ML, data streaming appears in most modern applications across every conceivable business vertical. As the premier event for streaming, the conference showcased that streaming workloads are no longer a niche but a must-have part of any data platform. New, dynamic data is generated continually, and it requires real-time action with near-instantaneous latency. Users are moving past the initial phase of streaming adoption, and the conference reiterated this fact: streaming technologies are no longer solely focused on stream ingestion. Today, every business wants to create a new layer of business insight and control. With a data-driven, data-centric, and data-derived approach, businesses can analyze their streaming pipelines in real time for a granular and accurate understanding of what's going on. It is no surprise, then, that our team had several stimulating discussions with advanced users about the next stage of growth, which leads to the next theme.
Stream Processing is Growing
Stream processing is a big deal, and it goes beyond ingestion. The earlier mindset of using data streams merely as an ingestion mechanism doesn't do justice to the potential of the field. The question on everyone's mind: what do we do after we have set up the infrastructure to ingest the data? The obvious answer is to process data closer to the source. Processing data in flight is becoming a must for deriving the most value from it. Stream processing is designed for immediate data processing and real-time analytics; the purpose is to let you respond to critical events with millisecond-level insight into what is happening inside a system. Different technologies take different approaches, with many of the distinctions driven by use cases. Clearly, stream processing is growing, and everyone wants to reap the benefits: today's consumers and businesses expect real-time action as a given.
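One concrete way to picture "processing data in flight" is windowed aggregation: results are emitted the moment each window closes, instead of waiting for a batch job. The Python sketch below is illustrative only and not tied to any vendor's API; the timestamps and window size are made up:

```python
# Tumbling-window count: group ordered events into fixed windows and
# emit each window's result as soon as a newer window begins.
def tumbling_counts(events, window_sec=60):
    """events: iterable of (timestamp_sec, payload), assumed time-ordered."""
    results, current_window, count = [], None, 0
    for ts, _payload in events:
        window = ts // window_sec * window_sec  # window start time
        if current_window is None:
            current_window = window
        if window != current_window:
            results.append((current_window, count))  # emit immediately
            current_window, count = window, 0
        count += 1
    if current_window is not None:
        results.append((current_window, count))      # flush the last window
    return results

events = [(5, "a"), (42, "b"), (61, "c"), (130, "d")]
print(tumbling_counts(events))  # -> [(0, 2), (60, 1), (120, 1)]
```

Production engines add out-of-order handling via watermarks, fault-tolerant state, and parallelism, but the emit-as-you-go shape of the computation is the essential difference from batch processing.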
Streaming Databases are the Next Frontier
The conference featured a number of sessions on streaming databases, streaming analytics, and real-time analytics. One approach gaining traction is using a full-featured database not only to process and store streaming data but also to serve the results directly to user applications. Several vendors are building solutions in this area, and the volume of audience questions on the subject revealed keen, serious interest. Some of the questions we received asked us to clarify the similarities and differences between our architecture and those of other vendors in the space. Our prediction is that this is the new frontier for data streaming. Streaming databases shorten the data pipeline cycle significantly and provide the best opportunity to harness insights from event data with a short shelf life. They also offer a unified, traditional-database-like solution for running an application.
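The core idea behind a streaming database, a materialized view maintained incrementally and served like a table, can be sketched in a few lines of Python. This toy view is not RisingWave's actual engine or API, just an illustration of incremental maintenance:

```python
# A toy materialized view for: SELECT key, COUNT(*) ... GROUP BY key.
# Each arriving event updates the view in O(1) instead of triggering
# a full recomputation, and reads are served straight from the view.
class CountView:
    def __init__(self):
        self.counts = {}

    def on_insert(self, key):
        # Incremental maintenance: apply the delta, don't rescan.
        self.counts[key] = self.counts.get(key, 0) + 1

    def query(self, key):
        # Serving layer: reads hit precomputed state, like a table.
        return self.counts.get(key, 0)

view = CountView()
for event in ["click", "view", "click", "click"]:
    view.on_insert(event)
print(view.query("click"))  # -> 3
```

This is why streaming databases shorten the pipeline: the "process" and "serve" steps collapse into one system, and query results are always as fresh as the last event applied.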
Shameless plug: RisingWave is a streaming database. With RisingWave, stream processing is no longer the preserve of the few: you can easily ingest data from multiple sources, run analytics, and gain insight into how things are happening in real time. RisingWave is the next-gen cloud-native streaming database, and the RisingWave Cloud service is now available in private preview.
It’s about Interconnection
This conference is about making connections. The connection with partners and potential clients is fundamental to the growth of this movement, just as establishing a data pipeline necessitates connecting one service to another. Having a large number of streaming enthusiasts all in one place is an invigorating experience. This has inspired us to double down on our vision to democratize stream processing. We hope all attendees enjoyed the conference just as much as we did. Let's connect via Slack and start a dialog. Follow us on LinkedIn & Twitter for the latest updates.
We encourage you to try RisingWave. One of the best ways to do that is with RisingWave Cloud. It takes only 10 minutes to get started, and it’s free.