Company Background


Tencent Cloud is the cloud computing arm of Tencent, one of the world's leading technology conglomerates. Since its inception in 2013, Tencent Cloud has offered a wide and integrated range of cloud services that cater to various business needs, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). With a vast array of solutions such as computing, data storage, content delivery, and artificial intelligence, Tencent Cloud is committed to providing robust, secure, and scalable platforms to both individual developers and enterprises across different sectors. Its global network and presence allow clients to harness the benefits of cutting-edge cloud technology, while also ensuring performance, reliability, and cost efficiency.

Tencent Cloud's infrastructure engineering team saw a pressing need for a more advanced Quality of Service (QoS) framework. For large-scale cloud deployments, the challenge was to devise a system that combined real-time monitoring, instant alerting, and efficient scheduling strategies.

Technically, this was not a straightforward feat. The scope involved managing real-time metrics from an infrastructure spanning tens of thousands of machines. This demanded a resilient data ingestion pipeline for high-throughput metrics, a swift alerting mechanism for anomalies, and a dynamic scheduling algorithm for optimal resource allocation, ensuring consistent performance across Tencent Cloud's vast network.


Situation Before Using RisingWave


Initially, the team had architected a system that, on paper, appeared robust and future-proof. They used Kafka, a widely adopted streaming broker, for consistent and fault-tolerant data flow. Flink, with its stream processing capabilities, was the backbone of the system, offering both low-latency and stateful computations.

MySQL served a dual role: it acted as an external sink and was also the primary operational database. This choice reflected the balance the team aimed to strike between reliability and real-time data operations.

[Figure: Data stack before using RisingWave.]

Central to this architecture was an event-driven state machine, formulated in SQL. Each incoming event triggered the same workflow: look up the current state of the stream job, compute the next state from the event, and write the updated result back.

To accomplish this, the team relied on Flink's advanced features, notably its lookup join capability. A lookup join enriches each streaming event with the matching row from an external table at processing time, which let the continuously arriving events interact with the state held in MySQL. The intention was clear: real-time updates were to be an integral part of the design, with data integrity and real-time responsiveness coexisting.
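The lookup, compute, and write-back loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the team's implementation: the state names, event kinds, and the `TRANSITIONS` table are hypothetical, and a plain dictionary stands in for the MySQL table that the Flink lookup join queried.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical transition table: (current state, event kind) -> next state.
# These names are invented for the sketch; the source does not list them.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("HEALTHY", "latency_high"): "DEGRADED",
    ("DEGRADED", "latency_high"): "ALERTING",
    ("DEGRADED", "latency_ok"): "HEALTHY",
    ("ALERTING", "latency_ok"): "HEALTHY",
}


@dataclass(frozen=True)
class Event:
    job_id: str
    kind: str


def apply_event(state_table: Dict[str, str], event: Event) -> str:
    """Process one event against the state table (a stand-in for MySQL)."""
    # 1. Look up the current state of the job (the "lookup join" step).
    current = state_table.get(event.job_id, "HEALTHY")
    # 2. Compute the next state; unknown combinations leave the state unchanged.
    next_state = TRANSITIONS.get((current, event.kind), current)
    # 3. Write the result back (the "sink" step).
    state_table[event.job_id] = next_state
    return next_state


def run(events: List[Event]) -> Dict[str, str]:
    """Feed a batch of events through the state machine in arrival order."""
    state_table: Dict[str, str] = {}
    for event in events:
        apply_event(state_table, event)
    return state_table


if __name__ == "__main__":
    stream = [
        Event("job-1", "latency_high"),
        Event("job-1", "latency_high"),
        Event("job-2", "latency_high"),
        Event("job-1", "latency_ok"),
    ]
    print(run(stream))
```

Every event costs one read and one write against the external table, which is why the throughput of that table bounds the throughput of the whole pipeline.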


Challenges Encountered


As the team delved deeper into the integration and optimization of their system, they encountered a series of unforeseen challenges.

  • Performance Issues: Initially, Flink combined with lookup join showed great potential. However, as load grew, throughput in transactions per second (TPS) declined markedly, particularly at the stage where results were written to the sink.
  • Debugging Complexities: The team grappled with intricate SQL formulations. Those involving multiway joins and deeply nested subqueries were especially problematic. Gaining a comprehensive insight into the state machine components was crucial, but Flink's structure offered no straightforward pathway for diagnosing these sophisticated queries.
  • Financial Constraints: Economic concerns began to loom as the system expanded. Flink's dependency on block storage, specifically through RocksDB for persisting streaming states, led to mounting costs. The escalating volume of streaming states correlated directly with rising expenses, signaling potential unsustainability.


Why RisingWave?


Recognizing these challenges, the team turned to RisingWave, known for combining stream processing with database serving. After a thorough evaluation, they initiated a proof of concept with RisingWave. The results were transformative: the pipeline became more streamlined and efficient. RisingWave's compatibility with Kubernetes and S3-compatible object storage was a game-changer. By deploying on TKE (Tencent Kubernetes Engine) and COS (Tencent Cloud Object Storage), the team could efficiently manage the RisingWave cluster and handle the increasing traffic.

[Figure: Using RisingWave to power the Quality of Service (QoS) framework.]

  • Unified Streaming Database: RisingWave's streaming database technology was a revelation. It merged static and dynamic objects into a single relational concept. This not only made operations smoother but also removed the need to sink data to an external system, leading to increased efficiency. Defining sources, views, and tables is no different from defining a regular Postgres table, and it was straightforward to develop complicated streaming joins, aggregations, and filters in SQL on top of them.
  • Transparent SQL Development: RisingWave's PostgreSQL-compatible syntax gave data engineers an easier way to define streaming jobs and query streaming results. Debugging, even for nested subqueries and intermediate streaming states, became more straightforward.
  • Ecosystem Integration: RisingWave's adaptability was evident in its integration capabilities. Thanks to its compatibility with the PostgreSQL wire protocol, it could be integrated with a wide range of data management tools, including Azure Data Studio.


RisingWave in Production


Transitioning RisingWave into the production environment wasn't merely about addressing initial pain points. It also meant gauging its performance, scalability, and adaptability in a high-throughput, real-world setting. As the data began flowing and systems interfaced with RisingWave, the team began to unpack its intricate engineering capabilities and the profound implications they had for Tencent's infrastructure.

  • Performance Enhancement: With RisingWave in place, Tencent's systems saw a substantial, not merely incremental, gain in TPS. This reflected RisingWave's data handling algorithms and its optimization of I/O operations within the data streams.
  • Simplified Maintenance: The embrace of RisingWave’s Postgres-inspired syntax was akin to handing engineers a Swiss army knife. Complex maintenance tasks, which previously required multiple toolsets and scripts, were consolidated, making pipeline upkeep and alert rule modifications more deterministic and less error-prone.
  • Efficient Data Tracing: RisingWave's support for chained materialized views was a paradigm shift in data management. Beyond just facilitating shared logic, this feature ensured data lineage was preserved and traceable. It meant that engineers could, at any point, trace back computations, transformations, and aggregations to their original data sources, ensuring a holistic understanding of the data lifecycle.
  • Scalability and Robustness: One of the unsung benefits of RisingWave was its inherent design for scalability. Whether it was handling sudden data influxes or ensuring fault tolerance in distributed settings, RisingWave demonstrated resilience. Its distributed architecture, combined with load balancing and failover mechanisms, ensured Tencent’s data streams remained uninterrupted and agile.
  • Optimized Resource Utilization: RisingWave’s intelligent resource allocation and task scheduling meant that Tencent could optimize its hardware and cloud resources. By minimizing unnecessary computations and prioritizing critical tasks, RisingWave ensured that Tencent got the best bang for its buck in terms of computational power and storage costs.
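The lineage property of chained materialized views can be illustrated with a small Python model. This is a sketch under stated assumptions, not RisingWave's implementation: the `Relation` class, the view names, and the sample metrics are hypothetical, and the views here are recomputed on each read rather than maintained incrementally as a streaming database would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Relation:
    """A base source (rows set, no compute) or a view over upstream relations."""
    name: str
    upstream: List["Relation"] = field(default_factory=list)
    compute: Optional[Callable[..., List[dict]]] = None
    rows: List[dict] = field(default_factory=list)  # used only by base sources

    def read(self) -> List[dict]:
        # Base sources return their rows; views derive theirs from upstream.
        if self.compute is None:
            return list(self.rows)
        return self.compute(*(u.read() for u in self.upstream))


def lineage(relation: Relation) -> List[str]:
    """Walk upstream links depth-first, from the view back to its sources."""
    seen: List[str] = []

    def visit(rel: Relation) -> None:
        if rel.name in seen:
            return
        seen.append(rel.name)
        for parent in rel.upstream:
            visit(parent)

    visit(relation)
    return seen


def avg_cpu_per_host(metrics: List[dict]) -> List[dict]:
    """Group raw samples by host and average the CPU percentage."""
    totals: Dict[str, List[int]] = {}
    for row in metrics:
        totals.setdefault(row["host"], []).append(row["cpu_pct"])
    return [
        {"host": host, "avg_cpu_pct": sum(vals) / len(vals)}
        for host, vals in sorted(totals.items())
    ]


# Hypothetical chain: raw metrics -> per-host average -> hosts above a threshold.
metrics = Relation("metrics", rows=[
    {"host": "a", "cpu_pct": 90},
    {"host": "a", "cpu_pct": 80},
    {"host": "b", "cpu_pct": 20},
])
avg_view = Relation("avg_cpu_per_host", [metrics], avg_cpu_per_host)
hot_hosts = Relation(
    "hot_hosts", [avg_view],
    lambda rows: [r for r in rows if r["avg_cpu_pct"] > 80],
)

if __name__ == "__main__":
    print(hot_hosts.read())
    print(lineage(hot_hosts))
```

Because each view records what it was built on, an alert rule such as `hot_hosts` can be traced back through the shared aggregation to the raw metrics it depends on.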

Conclusion

In a digital era where data speed and accuracy are paramount, RisingWave proved its mettle, transforming Tencent Cloud's QoS system into a powerhouse. Through seamless integration, cutting-edge technology, and financial prudence, RisingWave has set a gold standard in cloud streaming solutions.
