Skip to content

Latest commit

 

History

History
127 lines (92 loc) · 4.13 KB

google_1_hard.md

File metadata and controls

127 lines (92 loc) · 4.13 KB

Solving Google Interview Question: Finding Median Searches

You can find the problem by following the link.

Problem Description

Google's marketing team is making a Superbowl commercial and needs a simple statistic to put on their TV ad: the median number of searches a person made last year.

However, at Google's scale, querying the 2 trillion searches is too costly. Luckily, you have access to the summary table which tells you the number of searches made last year and how many Google users fall into that bucket.

Write a query to report the median of searches made by a user. Round the median to one decimal point.

Given Data

search_frequency Table:

searches num_users
1 2
2 2
3 3
4 1

Problem Constraints

  • Find the median of searches made by a user.
  • Round the median to one decimal point.

Example

By expanding the search_frequency table, we get the list [1, 1, 2, 2, 3, 3, 3, 4], which has a median of 2.5 searches per user.

searches
2.5

Approach

To find the median, we need to

  • expand the search_frequency table
  • calculate the median from the resulting list. Very simple!

SQL Query

Here's the SQL query to solve the problem:

WITH RECURSIVE cte_numbers(searches, num_users) 
AS (
    SELECT 
        searches, num_users
    FROM 
        search_frequency
    UNION ALL
    SELECT    
        searches, num_users - 1
    FROM    
        cte_numbers
    WHERE 
        num_users > 1
)
SELECT 
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY searches)::NUMERIC, 1) AS "Percentile_Cont"
FROM 
    cte_numbers;

Explanation

The query uses a recursive common table expression (CTE) to generate the expanded list of searches. Every time we 'go over' the search_frequency table, su subtract 1 from num_users, and as a recursive query we give it an ending point, the > 1 to overcome printing the same row twice. At the end, we get the expanded table of searches. The PERCENTILE_CONT function is then used to calculate the median, and the result is rounded to one decimal point.

P.S.

This was my first attempt at constructing a RECURSIVE query. The way I imagine it is that I can write a UNION ALL but give it a condition that will make it dynamic. After solving it I looked for other solutions.

This is the solution the site provides, which in its turn is very slick

WITH searches_expanded AS (
  SELECT searches
  FROM search_frequency
  GROUP BY 
    searches, 
    GENERATE_SERIES(1, num_users))

SELECT 
  ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY searches)::DECIMAL, 1) AS  median
FROM searches_expanded;

As you can see the only difference is the use of GENERATE_SERIES(1, num_users) which is a rarely used technique, and is usually used in DDL, when generating an index for a table.

Another version was the one I got from AI. This is not the only output it gave, but the other one was even longer, so I didn't include it.

WITH expanded_searches AS (
    SELECT searches
    FROM search_frequency
    CROSS JOIN generate_series(1, num_users) AS users
    ORDER BY searches
),

sorted_searches AS (
    SELECT searches,
           ROW_NUMBER() OVER (ORDER BY searches) AS row_num,
           COUNT(*) OVER () AS total_rows
    FROM expanded_searches
)

SELECT ROUND(AVG(searches)::numeric, 1) AS median
FROM sorted_searches
WHERE row_num BETWEEN (total_rows + 1) / 2 AND (total_rows + 2) / 2;

The majority of the solutions I saw were very overcomplicated like this. Although this is an interesting approach to consider.


Connect with Me:

Thank you for joining me on FAANG around series. I hope these solutions and insights prove helpful in your SQL journey. Feel free to explore the provided SQL queries, try them out on your own, and don't hesitate to reach out for discussions, feedback, or further clarification.