As cities continue to grow and urbanization rates rise, efficient parking management has become a crucial challenge for municipalities and businesses alike. With limited space and increasing demands, maximizing parking lot utilization is key to reducing congestion, minimizing emissions, and enhancing the overall experience for drivers.
Enter RisingWave, a streaming database that can revolutionize the way we approach parking lot utilization. By ingesting real-time data from various sensors, cameras, and other IoT devices, RisingWave can process and analyze this information on-the-fly, enabling intelligent decision-making and dynamic resource allocation. Through data transformation capabilities, RisingWave can identify patterns, predict demand, and optimize parking space allocation, ensuring that every available spot is utilized effectively.
In this blog post, we’ll look at two scenarios where RisingWave can be used to monitor parking space utilization in real time. We’ll also look at techniques for optimizing RisingWave’s performance in this setting.
Schema
Assume that the signal data arrives through a Kafka topic, while the parking lot metadata is stored locally in RisingWave.
Let’s look at the schema of the sample data.
CREATE TABLE signals (
start_at timestamp, -- when does the parking space get occupied
end_at timestamp, -- when does the parking space get freed (can be null)
last_updated_at timestamp, -- when the signal is generated and sent
space_id int, -- the id of the parking space
level int -- the level where the parking space is at
) with (
connector = 'kafka',
topic = 'signals',
-- Placeholder broker address; point this at your Kafka cluster.
properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
CREATE TABLE parking_lot (
space_id int, -- the id of the parking space
level int, -- the level where the parking space is at
primary key (space_id)
);
Let's take a look at how a data record is generated in this scenario. Imagine that at time T0, a sensor located above the 45th parking space on the 2nd level detects the presence of a vehicle underneath it. This detection results in the creation of a record (T0, NULL, T0, 45, 2).
Due to potential network glitches or temporary service interruptions, the sensor may retransmit the record to the central unit to ensure data integrity. For example, it might send a duplicate record (T0, NULL, T1, 45, 2) at some later time T1.
When the vehicle leaves the parking space at time T2, the sensor generates a new record, (T0, T2, T2, 45, 2), indicating that the space is now available. Similarly, to prevent any loss of data, the sensor may retransmit this record after a short delay.
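If you want to experiment without wiring up Kafka, this walk-through can be replayed with plain INSERT statements. This is a hypothetical sketch: the concrete timestamps stand in for T0, T1, and T2, and it assumes your RisingWave version supports DML on tables with connectors (otherwise, test against a plain table with the same schema).
insert into signals values
('2024-05-13 07:50:00', NULL, '2024-05-13 07:50:00', 45, 2), -- detection at T0
('2024-05-13 07:50:00', NULL, '2024-05-13 07:50:05', 45, 2), -- retransmission at T1
('2024-05-13 07:50:00', '2024-05-13 08:30:00', '2024-05-13 08:30:00', 45, 2); -- departure at T2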
Scenario 1: Real-time parking space availability
To power a "Parking Spaces Available" dashboard, RisingWave needs to determine the current number of vacant parking spaces. This involves checking whether the most recent signal for each parking space has a NULL value in the end_at field.
The first step is therefore to extract the latest signal for each parking space, i.e., to keep the signal with the highest last_updated_at timestamp.
create materialized view latest_signal_for_each_parking_space as
with ranked as (
select *, row_number() over (partition by space_id order by last_updated_at desc) as rn
from signals
)
select * except (rn)
from ranked
where rn = 1;
This row_number() with rank = 1 pattern is described in the Top-N by group documentation. In essence, the data is partitioned by the space_id field, and within each partition it is ordered by last_updated_at so that only the latest signal is retained for further processing.
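As a quick sanity check, you can query the view for a single space and confirm that only the row with the greatest last_updated_at survives (an ad-hoc example, using space 45 from the walk-through above):
select space_id, level, start_at, end_at, last_updated_at
from latest_signal_for_each_parking_space
where space_id = 45;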
Afterwards, you can determine whether a parking space is occupied based on its latest signal, and then aggregate the occupancy for each level of the parking lot.
create materialized view occupied_on_each_level as
select
level,
-- count() skips NULLs, so only rows whose end_at is NULL are counted.
count(case when end_at is NULL then 1 end) as occupied
from
latest_signal_for_each_parking_space
group by
level;
If the latest signal for a parking space has a NULL value in the end_at field, that parking space is still occupied. If end_at is not NULL, the space is vacant.
Having computed the number of occupied parking spaces on each level (occupied_on_each_level), we subtract it from the total number of parking spaces on each level to determine how many are available:
create materialized view available_on_each_level as
with
total_on_each_level as (
select
level,
count(*) as total
from parking_lot
group by level
)
select
t.level as level,
total - occupied as available
from
total_on_each_level t
inner join
occupied_on_each_level o
on t.level = o.level;
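A dashboard can then read the result with an ordinary query against the materialized view, for example:
select level, available
from available_on_each_level
order by level;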
Scenario 2: Unavailability of each parking space per hour
In order to calculate the percentage of time that each parking space is occupied within each hour, we need to take into account overlapping time intervals for each signal.
For example, if a parking space is occupied from 7:00 to 7:10 and then from 7:40 to 8:00, there are 30 minutes out of the total 60 minutes from 7:00 to 8:00 when the space is occupied. This results in an unavailability of 50%.
Similarly, if a parking space has been occupied since 7:50 and remains occupied until the current time at 8:30, within the hour from 7:00 to 8:00, only the duration from 7:50 to 8:00, which is 10 minutes out of 60 minutes, is considered. This results in an unavailability of 16.7%.
For subsequent hours, such as from 8:00 to 9:00, we only consider the portion from the beginning of the hour (8:00) to the current time (8:30); since the space is occupied for that entire portion, the unavailability is 100%.
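The arithmetic behind these numbers is just an interval overlap, which can be sketched with greatest() and least(); the literals below are simply the 7:50~8:30 occupancy and the 7:00~8:00 window from the example:
select extract(epoch from (
least(timestamp '2024-05-13 08:00:00', timestamp '2024-05-13 08:30:00')
- greatest(timestamp '2024-05-13 07:00:00', timestamp '2024-05-13 07:50:00')
)) / 3600.0 as unavailability; -- 600 seconds / 3600 seconds ≈ 16.7%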
To express this logic in SQL, we first calculate the percentage of time that each parking space is occupied within each hour while treating "now" as a fixed constant; we will make it advance with the data afterwards.
-- First, we still deduplicate the signals
create materialized view deduplicated_signals as
with ranked as (
-- The data are partitioned by space_id, level and start_at,
-- and we only keep the latest signal in each group.
select *, row_number() over (partition by space_id, level, start_at order by last_updated_at desc) as rn
from signals
)
select * except (rn)
from ranked
where rn = 1;
-- Then we put the deduplicated signals into two groups:
-- 1. the parking space is still occupied
-- 2. the parking space is not occupied anymore
create materialized view occupying as
-- For now, use a hard-coded timestamp as the end of each ongoing occupancy;
-- we will replace it with a value derived from the data later.
select space_id, level, start_at, timestamp '2024-05-13 14:54:02.91714' as end_at from deduplicated_signals where end_at is NULL;
create materialized view occupied as
select space_id, level, start_at, end_at from deduplicated_signals where end_at is not NULL;
-- Next, we merge these two types of events together
create materialized view merged as
select * from occupying
union all
select * from occupied;
-- We acquire all the 1-hour windows that the signal overlaps
create materialized view split_window
as
select
*,
generate_series(
date_trunc('HOUR', start_at - interval '0 HOUR', 'Europe/Stockholm')::timestamp,
date_trunc('HOUR', end_at - interval '0 HOUR', 'Europe/Stockholm')::timestamp,
interval '1 HOUR'
)::timestamptz as window_start
from merged;
-- The SQL above misses the end of each window. We add it.
create materialized view occupy_window
as
select *, window_start + interval '1 HOUR' as window_end
from split_window;
-- A signal may start after its window begins or end before its window ends.
-- For example, an occupancy from 7:40 to 8:00 falls in the 7:00~8:00 window,
-- but only the 7:40~8:00 portion should count towards the unavailability.
create materialized view occupy_start_end
as
select
*, greatest(window_start, start_at) as occupy_start, least(window_end, end_at) as occupy_end
from occupy_window;
-- Finally, we can calculate the unavailability
create materialized view unavailability
as
select
space_id,
level,
window_start,
window_end,
'HOURLY' as window_type,
extract(epoch from (window_end - window_start)) as total_seconds,
sum(extract(epoch from (occupy_end - occupy_start))) as occupy_seconds
from
occupy_start_end
group by 1, 2, 3, 4;
-- If you are interested in the percentage representation
create materialized view percentage
as
select
space_id,
level,
window_start,
window_end,
occupy_seconds::double precision / total_seconds as percentage
from
unavailability;
To replace the fixed, hard-coded "now" with a value that advances with the data, we can use the maximum last_updated_at among all signals as the end of every ongoing occupancy interval.
In SQL, we first compute this maximum in a small materialized view and then join it into the definition of the ongoing occupancy intervals. Here's how:
-- This is a materialized view with only one row and two columns.
-- 1 as a constant is a dummy variable used for joining records later.
create materialized view current_max_time
as
select max(last_updated_at) as now, 1 as constant from signals;
Then we modify the definition of “occupying” as follows:
create materialized view occupying as
with tmp as (
select space_id, level, start_at, 1 as constant
from
deduplicated_signals where end_at is NULL
)
select space_id, level, start_at, now as end_at from
-- Here we abuse the constant column to do the cross-product.
-- This is because RW by default disallows nested loop join due to its inefficiency.
tmp inner join current_max_time on tmp.constant = current_max_time.constant;
The rest of the materialized views stay the same.
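With these views in place, the hourly figures for a particular space can be inspected with an ordinary query (an ad-hoc example, again using space 45):
select space_id, level, window_start, window_end,
occupy_seconds::double precision / total_seconds as unavailability
from unavailability
where space_id = 45
order by window_start;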
Optimization: Reduce re-computation frequency
When the last_updated_at column is updated frequently, relying on max(last_updated_at) can cause problems: every change to max(last_updated_at) forces all downstream materialized views to be refreshed, resulting in a significant computational burden.
To address this issue and minimize unnecessary refreshes, we can use a window function in the following manner:
create materialized view current_max_time
as
select max(window_end) as now, 1 as constant
from
TUMBLE(signals, last_updated_at, INTERVAL '15 MINUTES');
For more usage of time windows, please refer to the tumble() time window function documentation. With this change, we can expect re-computation to be triggered about every 15 minutes.
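To see how much churn the TUMBLE absorbs, you can compare the raw maximum timestamp with the tumbled one in an ad-hoc query (a sketch that reuses the constant-join trick from above):
select s.raw_max, c.now as tumbled_max
from (select max(last_updated_at) as raw_max, 1 as constant from signals) s
join current_max_time c on s.constant = c.constant;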
Optimization: Suppress re-computation during backfilling
The previous optimization is effective for new incoming data, but it may not be sufficient for historical data. The last_updated_at value of new data progresses in real time, while that of historical data advances much faster, which can trigger frequent recomputations, especially when backfilling large volumes of historical data.
To address this issue, we can make additional modifications to the "current_max_time" materialized view as follows:
-- Suppose the wall clock timestamp is 2024-05-20 12:05:35.607001+00
create materialized view current_max_time
as
select greatest('2024-05-20 12:05:35.607001+00'::timestamp, max(window_end)) as now, 1 as constant
from
TUMBLE(signals, last_updated_at, INTERVAL '15 MINUTES');
Note that now will not change until last_updated_at catches up.
In this blog post, we demonstrated a simple application using RisingWave to monitor the availability of parking spaces in real time. What you can achieve with RisingWave can be much more advanced, as long as the necessary data is available. You are welcome to share with us your solutions leveraging RisingWave for real-time monitoring and analytics scenarios.
If you would like to learn more about the features that RisingWave offers, see the official documentation site. You can also sign up for RisingWave Cloud for free to test it out. If you need additional help with setting up this integration, join our active Slack community.