较难 通过率:51.47% 时间限制:1秒 空间限制:256M
现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):
author_id | author_level | sex |
---|---|---|
101 | 6 | m |
102 | 1 | f |
103 | 1 | m |
104 | 3 | m |
105 | 4 | f |
106 | 2 | f |
107 | 2 | m |
108 | 5 | f |
109 | 6 | f |
110 | 5 | m |
创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):
answer_date | author_id | issue_id | char_len |
---|---|---|---|
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下:
author_id | author_level | days_cnt |
---|---|---|
101 | 6 | 3 |
select F.author_id, author_level, days_cnt
from
### table F
(select author_id, sum(ddiff) + 1 as days_cnt
from
(select author_id, datediff(date_lead, answer_date) as ddiff
from
(select author_id, answer_date, lead(answer_date) over (partition by author_id order by answer_date) as date_lead
from
(select distinct author_id, answer_date from answer_tb order by author_id) as x) as y having ddiff=1) as z
group by author_id
having days_cnt >= 3
) as F
### table F
join author_tb as auth
on F.author_id = auth.author_id
本题的难点在于计算每个用户的「最大连续回答天数」。
可以看到,最后的答案像套娃一样😂 套了5个select。那么我们就从最里面开始,一层一层的分解。
首先要明确的是,answer_tb
里面有用的column是answer_date
和author_id
,其他的都没用。author_tb
只是用来最后贴标签的,除此之外也没有用。
以下步骤用用户101进行示意。
1️⃣ 把要计算的两个column answer_date
和author_id
从answer_tb
里面提取出来。
(select distinct author_id, answer_date from answer_tb
order by author_id) as x
用了distinct
,就可以去重一个用户在同一天回答多个问题的情况。
当然可以在最后输出答案的时候再order by,这里为了看得更清楚直观,先按author_id排序。
author_id | answer_date |
---|---|
101 | 2021-11-01 |
101 | 2021-11-02 |
101 | 2021-11-03 |
102 | 2011-11-01 |
103 | ... |
... | ... |
2️⃣ 我们用lead
这个window function,得到一个新的column,值是把每个用户的answer_date
这个column上移一格。
(select author_id, answer_date,
lead(answer_date) over (partition by author_id order by answer_date) as date_lead
from x) as y
author_id | answer_date | date_lead |
---|---|---|
101 | 2021-11-01 | 2021-11-02 |
101 | 2021-11-02 | 2021-11-03 |
101 | 2021-11-03 | null |
102 | 2021-11-01 | null |
... | ... | ... |
110 | 2021-11-02 | 2021-11-05 |
... | ... | ... |
3️⃣ 用datediff
这个date function,把两个日期column相减,得到ddiff
这个column,值为1的才起码是连续的。
(select author_id, datediff(date_lead, answer_date) as ddiff
from y having ddiff=1) as z
author_id | ddiff |
---|---|
101 | 1 |
101 | 1 |
4️⃣把每个用户的ddiff
加起来(再加1)即是最大连续回答天数。
(select author_id, sum(ddiff) + 1 as days_cnt
from z
group by author_id
having days_cnt >= 3
) as F
author_id | days_cnt |
---|---|
101 | 3 |
5️⃣最后再按题目要求贴上author_level
的标签即可。
select F.author_id, author_level, days_cnt
from F
join author_tb as auth
on F.author_id = auth.author_id
author_id | author_level | days_cnt |
---|---|---|
101 | 6 | 3 |