This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.
In this step, we calculate the time each user spent in a livestream session.
- Combine the
livestream_enter
andlivestream_exit
tables using aLEFT JOIN
. - For users who are still in the livestream, use the current timestamp as their
exit_time
. - Ensure that
exit_time
is always greater than or equal toenter_time
. - Group by the
user_id
,livestream_id
, andenter_time
to compute session-specific watch times.
WITH LivestreamWatchTime AS (
SELECT
le.user_id,
le.livestream_id,
le.event_time AS enter_time,
COALESCE(MIN(le_exit.event_time), CURRENT_TIMESTAMP()) AS exit_time
FROM xxx.livestream_enter le
LEFT JOIN xxx.livestream_exit le_exit ON le.user_id = le_exit.user_id
AND le.livestream_id = le_exit.livestream_id
AND le_exit.event_time >= le.event_time
GROUP BY le.user_id, le.livestream_id, le.event_time
)
Here, we map users to their respective regions based on their app entry times.
- Use the
user_entry
table to determine the region for each user session. - Ensure that the
livestream_enter
time is greater than or equal to theuser_entry
time. - Use the
LEAD
function to calculate the next entry time, which helps identify the active region during overlapping sessions.
UserRegion AS (
SELECT
ue.user_id,
ue.event_time AS entry_time,
ue.region,
LEAD(ue.event_time) OVER (PARTITION BY ue.user_id ORDER BY ue.event_time) AS next_entry_time
FROM xxx.user_entry ue
)
In this step, we combine the watch time data with region information.
- Join the
LivestreamWatchTime
andUserRegion
tables to associate each livestream session with the correct region. - Ensure that the
livestream_enter
time falls between theentry_time
and thenext_entry_time
for the region.
WatchTimeWithRegion AS (
SELECT
lwt.user_id,
ur.region,
TIMESTAMP_DIFF(lwt.exit_time, lwt.enter_time, SECOND) / 60 AS watch_time_minutes
FROM LivestreamWatchTime lwt
LEFT JOIN UserRegion ur ON lwt.user_id = ur.user_id
AND lwt.enter_time >= ur.entry_time
AND (lwt.enter_time < ur.next_entry_time OR ur.next_entry_time IS NULL)
)
Next, we calculate the total watch time for each user in each region.
- Group the data by
region
anduser_id
. - Sum up the watch time in minutes for all sessions.
TotalWatchTime AS (
SELECT
region,
user_id,
SUM(watch_time_minutes) AS total_watch_time_minutes
FROM WatchTimeWithRegion
GROUP BY region, user_id
)
Finally, we extract the top 5 users with the highest watch times in each region.
- Filter the ranked data to include only users with ranks 1 through 5.
- Sort the results by
region
and descendingtotal_watch_time_minutes
.
SELECT
region AS Region,
user_id AS UserID,
total_watch_time_minutes AS TotalWatchTimeMinutes
FROM RankedWatchTime
WHERE rank <= 5
ORDER BY region ASC, total_watch_time_minutes DESC;
The full query integrates all the steps outlined above, with the aim to retrieve the top 5 users by total watch time in each region. You can find the full SQL code in the main final-query
.
The logic involves the following key steps:
- LivestreamWatchTime: This table captures each user's entry and exit time for livestreams.
- UserRegion: Maps users to their regions based on entry times, and calculates the next region entry to handle overlapping sessions.
- WatchTimeWithRegion: Combines the watch time data with regions to calculate total minutes watched by each user in each region.
- TotalWatchTime: Aggregates the total watch time for each user per region.
- RankedWatchTime: Ranks users within each region based on their total watch time in descending order.
- Final Output: Filters the top 5 ranked users per region and outputs their details.
The query produces a table with the following columns:
- Region: The region where the user accessed the livestream.
- UserID: The unique identifier for the user.
- TotalWatchTimeMinutes: The total watch time (in minutes) for the user in the region.
You can then use the results to identify the top users per region, which is useful for targeting specific regions or rewarding high-engagement users.
This query is designed for analysis in live streaming platforms or applications that track user interactions across different regions. It helps:
- Identify the most engaged users in each region.
- Tailor content, advertisements, or rewards to specific regions with high user engagement.
- Optimize livestream strategies by focusing on regions or users that contribute the most watch time.
This project provides valuable insights into user behavior during livestream events, specifically by identifying the top users in each region based on their total watch time. The query can be customized further to meet your specific needs, such as adjusting the number of top users or including additional user metadata.