Imagine this: a data engineering team, after years of dedicated effort, builds a sophisticated real-time streaming platform, a robust data warehousing system, and countless dashboards. But as the business expands, they're not satisfied with traditional metrics alone. They aim to dive deeper into data analysis using AI/ML.

MindsDB addresses this growing need. In essence, it acts as the missing link between databases and model training: models are created and queried with SQL against the data where it already lives. This lets AI and data teams streamline their workflow.

Figure: The connection of databases and AI/ML model training for streamlined data analysis and workflow optimization.

One of MindsDB's standout features is its support for a wide range of data sources. By standardizing the data access layer, it eliminates the necessity of juggling multiple tools for data extraction.

In line with this vision, MindsDB's integration with RisingWave becomes an organic fit. For developers, the collaboration between RisingWave and MindsDB brings about distinctive advantages:

  • Training on your streaming data: RisingWave’s core concept, the materialized view, gives users access to stream processing results. Before model training, users often clean, transform, and join data from multiple sources in real time to enrich it. RisingWave consolidates this preprocessing, and MindsDB reads the results from RisingWave through the Postgres interface. With only two systems in the setup, maintenance stays simple.
  • Decoupling model training workloads from other operations: In a modern data stack, each system is typically tailored to one type of workload, such as OLAP analysis, stream processing, or OLTP processing. Running model training workloads directly against the core online database can affect its stability. RisingWave’s decoupled storage-compute architecture stores data in remote object storage (e.g., S3) and allows dedicated compute nodes for query serving. This physical isolation keeps background stream processing from being affected by the queries that feed model training.
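
As a rough sketch of the preprocessing described in the first point, a single materialized view can clean and enrich data before MindsDB reads it. The tables events and users, their columns, and the view name enriched_events are hypothetical and used only for illustration:

```sql
-- Hypothetical source tables and columns; adapt them to your own schema.
CREATE MATERIALIZED VIEW enriched_events AS
SELECT
    e.event_id,
    e.user_id,
    u.region,                                 -- enrichment from a second source
    e.amount
FROM events AS e
JOIN users AS u ON e.user_id = u.user_id      -- join across data sources
WHERE e.amount IS NOT NULL;                   -- basic cleaning: drop incomplete rows
```

RisingWave keeps the view up to date as new rows arrive, so a training query such as SELECT * FROM enriched_events issued over the Postgres interface reads already-prepared data.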

Next, we'll illustrate how the combined force of RisingWave and MindsDB tackles real-world challenges.

Figure: The practical application of RisingWave and MindsDB.


Demo 1: Helping Renters Understand the Expected Rent for Their Ideal House


The aim of this demo is to analyze all housing rental records and summarize the relationship between prices and property types. Ultimately, when you provide conditions for the ideal house type, our model will be able to infer the expected rent.

  • Prepare the data: First, we recommend launching RisingWave and MindsDB with Docker Compose for this demo. If you wish to try it hands-on, you can use the environment we've set up for you here. Next, assuming the source data is ready, let’s import it into RisingWave using the following table schema 😉:
CREATE TABLE home_rentals (
    number_of_rooms integer,
    number_of_bathrooms integer,
    sqft integer,
    location varchar,
    days_on_market integer,
    rental_price integer
);
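
If you have no dataset at hand, a few rows are enough to check that the pipeline is wired up. The values below are made up purely for illustration, and a useful model needs far more data. FLUSH is RisingWave's command for making pending writes visible to later reads; depending on your version and session settings it may not be needed:

```sql
-- Made-up rows for illustration only; replace them with real rental records.
INSERT INTO home_rentals
    (number_of_rooms, number_of_bathrooms, sqft, location, days_on_market, rental_price)
VALUES
    (2, 1,  900, 'downtown', 10, 3900),
    (1, 1,  500, 'suburb',   25, 1800),
    (3, 2, 1200, 'downtown',  5, 4800);

-- Ensure the inserted rows are visible before MindsDB reads the table.
FLUSH;
```
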
  • Next, create the MindsDB model. We'll use MindsDB's Postgres connector to access RisingWave:
CREATE DATABASE example_data
WITH ENGINE = "postgres",
PARAMETERS = {
  "user": "root",
  "host": "risingwave-standalone",
  "port": "4566",
  "database": "dev"
};

CREATE MODEL mindsdb.home_rentals_model
FROM example_data
    (SELECT * FROM home_rentals)
PREDICT rental_price;


At this step, we've started training a regression model that predicts the rental_price column (note the PREDICT keyword). Training doesn't finish when the command returns; it may take some time to complete, depending on the data volume and machine performance. You can check the model's status with the DESCRIBE mindsdb.home_rentals_model command.
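
A minimal sketch of that status check is shown below. The second statement assumes your MindsDB version exposes the models table inside the mindsdb project, which may differ in older releases:

```sql
-- Show the model's metadata, including its training status.
DESCRIBE mindsdb.home_rentals_model;

-- Alternative, assuming the models table is available in your version:
-- the status column typically moves from 'generating' or 'training' to 'complete' (or 'error').
SELECT name, status, error
FROM mindsdb.models
WHERE name = 'home_rentals_model';
```

Wait until the status reads complete before querying the model for predictions.
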

  • Finally, let's deduce the rent 💰 based on property requirements.
SELECT rental_price FROM mindsdb.home_rentals_model
WHERE number_of_bathrooms = 2 AND sqft = 1000;

--- rental_price
--- 3968
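
Beyond single-row lookups like the one above, MindsDB can also score many rows at once by joining the model with a table from the connected data source. The sketch below reuses example_data and home_rentals_model from this demo; the column aliases are illustrative:

```sql
-- Batch prediction: compare recorded rents with the model's estimates.
SELECT
    t.sqft,
    t.location,
    t.rental_price AS actual_price,
    m.rental_price AS predicted_price
FROM example_data.home_rentals AS t
JOIN mindsdb.home_rentals_model AS m
LIMIT 10;
```

Comparing the two price columns side by side gives a quick, informal sense of how well the model fits the data it was trained on.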


Demo 2: Forecasting the Future


Most scenarios that use RisingWave involve time in some way. As in the previous demo, we'll prepare data, train a model, and make predictions. In this case, however, we'll use MindsDB's time-series model.

CREATE MATERIALIZED VIEW house_sales AS
SELECT
  saledate,
  ma,
  type,
  bedrooms
...;

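Let's imagine that we've already processed both the house-type table and the sales records in RisingWave, resulting in a materialized view named house_sales. This view contains the Moving Average (MA) of sales over the past quarter.

In the subsequent steps, we'll have the model forecast future trends, specifically, predicting the MA for the next 4 quarters based on the saledate. In the model definition that follows, ORDER BY saledate sets the time column, GROUP BY bedrooms, type treats each combination of bedrooms and type as a separate series, WINDOW 8 tells the model to look back at the previous 8 rows, and HORIZON 4 asks for 4 future values.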

CREATE MODEL mindsdb.house_sales_predictor
FROM example_data
  (SELECT * FROM house_sales)
PREDICT ma
ORDER BY saledate
GROUP BY bedrooms, type
WINDOW 8
HORIZON 4;

Once the model is successfully created, we can query it on demand.

SELECT m.saledate AS date, m.ma AS forecast
FROM mindsdb.house_sales_predictor AS m
JOIN example_data.house_sales AS t
WHERE t.saledate > LATEST
AND t.type = 'house'
AND t.bedrooms = 2
LIMIT 4;

Conclusion

Stream data processed by RisingWave unlocks high analytical potential for many businesses. RisingWave’s architecture decouples storage from compute, which makes it feasible and stable for MindsDB to train models on top of RisingWave. Additionally, MindsDB integrates smoothly with RisingWave through its existing Postgres connector, thanks to RisingWave’s PostgreSQL compatibility.

As real-time stream processing and AI/ML continue to accelerate, the synergy between RisingWave and MindsDB promises to empower data engineering teams with even greater value.

Tao Wu

Product Manager
