✨ feat: Add author, tag, and category (genre) fields to the .nfo for UGC videos #132

Merged
merged 5 commits into from
May 10, 2023
Changes from 4 commits
7 changes: 4 additions & 3 deletions README.md
@@ -307,7 +307,7 @@ yutto <url> -c "d8bc7493%2C2843925707%2C08c3e*81"
#### Storage subpath template

- Option `-tp` or `--subpath-template`
- Available variables `title | id | name | username | series_title | pubdate` (more may be added in the future)
- Available variables `title | id | name | username | series_title | pubdate | download_date | owner_uid` (more may be added in the future)
- Default value `"{auto}"`

Configuring the subpath template gives you flexible control over where videos are stored.
@@ -316,18 +316,19 @@ yutto <url> -c "d8bc7493%2C2843925707%2C08c3e*81"

In addition, the template syntax is Python's format-string syntax, so advanced usage is also supported, e.g. `{id:0>3}{name}`.

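Note that not every variable is available in every situation. For example, the `username` variable is currently only provided when downloading all of an uploader's videos or a favorites list, and should not be used in other cases. The scope of each variable is described in the table below:
Note that not every variable is available in every situation. For example, the `username` and `owner_uid` variables are currently only provided when downloading all of an uploader's videos or a favorites list, and should not be used in other cases. The scope of each variable is described in the table below: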

<!-- prettier-ignore -->
|Variable|Description|Scope|
|-|-|-|
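|title|Overall series title (anime title / uploaded video title)|All|
|id|Sequence number of a single part (p) within the series|All|
|name|Title of a single part (p) within the series|All|
|username|Uploader (UP 主) username|User space, favorites, collections, video list downloads|
|username|Uploader (UP主) username|User space, favorites, collections, video list downloads|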
SigureMo marked this conversation as resolved.
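|series_title|Collection title|Favorites, video collections, video list downloads|
|pubdate|Upload date|Uploaded videos only|
|download_date|Download date|All|
|owner_uid|Uploader (UP主) UID|User space, favorites, collections, video list downloads|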

> **Note**
>
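Since the template is rendered with Python's `str.format`, `{id:0>3}` left-pads the id with zeros to three characters. A minimal sketch of that behavior, assuming the variables arrive as a plain dict; `render_subpath` is a hypothetical helper, not yutto's `resolve_path_template`, whose full logic is not part of this diff:

```python
# Hypothetical illustration of the format-string semantics only; yutto's real
# resolve_path_template may also sanitize paths, warn, etc.
UNKNOWN = "unknown_variable"  # same sentinel value as in path_resolver.py


def render_subpath(template: str, variables: dict[str, object]) -> str:
    """Fill a subpath template using Python format syntax."""
    return template.format(**variables)


variables = {
    "title": "my-series",  # hypothetical sample values
    "id": 1,
    "name": "intro",
    "owner_uid": "12345",
    "pubdate": UNKNOWN,  # e.g. a variable that is not provided in this context
}

print(render_subpath("{id:0>3}{name}", variables))  # 001intro
print(render_subpath("{owner_uid}/{title}/{id:0>3}", variables))  # 12345/my-series/001
```

Under this sketch, a variable left as `UNKNOWN` is rendered literally as `unknown_variable`, which illustrates why variables should only be used within the scopes listed in the table; how `resolve_path_template` itself reacts to it is not shown here.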
17 changes: 10 additions & 7 deletions poetry.lock


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -26,9 +26,9 @@ python = "^3.9.0"
aiohttp = "^3.8.3"
aiofiles = "^23.0.0"
biliass = "1.3.7"
dicttoxml = "^1.7.15"
colorama = { version = "^0.4.6", markers = "sys_platform == 'win32'" }
typing-extensions = "^4.4.0"
dict2xml = "1.7.3"

[tool.poetry.group.dev.dependencies]
pytest = "^7.2.2"
2 changes: 2 additions & 0 deletions tests/test_api/test_ugc_video.py
@@ -68,12 +68,14 @@ async def test_get_ugc_video_list():
assert ugc_video_list[0]["cid"] == CId("222190584")
assert ugc_video_list[0]["metadata"] is not None
assert ugc_video_list[0]["metadata"]["title"] == "bilili 特性以及使用方法简单介绍"
assert ugc_video_list[0]["metadata"]["website"] == "https://www.bilibili.com/video/BV1vZ4y1M7mQ"

assert ugc_video_list[1]["id"] == 2
assert ugc_video_list[1]["name"] == "bilili 环境配置方法"
assert ugc_video_list[1]["cid"] == CId("222200470")
assert ugc_video_list[1]["metadata"] is not None
assert ugc_video_list[1]["metadata"]["title"] == "bilili 环境配置方法"
assert ugc_video_list[1]["metadata"]["website"] == "https://www.bilibili.com/video/BV1vZ4y1M7mQ"


@pytest.mark.api
6 changes: 5 additions & 1 deletion yutto/api/bangumi.py
@@ -165,8 +165,12 @@ def _parse_bangumi_metadata(item: dict[str, Any]) -> MetaData:
show_title=item["share_copy"],
plot=item["share_copy"],
thumb=item["cover"],
premiered=get_time_str_by_stamp(item["pub_time"]),
premiered=get_time_str_by_stamp(item["pub_time"], "%Y-%m-%d"),
dateadded=get_time_str_by_now(),
source="", # TODO
actor=[], # TODO
genre=[], # TODO
tag=[], # TODO
website="", # TODO
original_filename="", # TODO
)
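`premiered` now passes an explicit `"%Y-%m-%d"` format, so the NFO should receive a plain date. The body of `get_time_str_by_stamp` is not in this diff; the stand-in below assumes it wraps `datetime.fromtimestamp(...).strftime(...)`. Both its default format and the `tz` parameter are assumptions, the latter added only to keep the example deterministic:

```python
from datetime import datetime, timezone
from typing import Optional


def time_str_by_stamp(
    stamp: int,
    fmt: str = "%Y-%m-%d %H:%M:%S",  # assumed default, not taken from yutto
    tz: Optional[timezone] = None,  # hypothetical parameter; None = local time
) -> str:
    """Hypothetical stand-in for yutto.utils.time.get_time_str_by_stamp."""
    return datetime.fromtimestamp(stamp, tz=tz).strftime(fmt)


# 1683676800 is 2023-05-10 00:00:00 UTC
print(time_str_by_stamp(1683676800, "%Y-%m-%d", tz=timezone.utc))  # 2023-05-10
```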
6 changes: 5 additions & 1 deletion yutto/api/cheese.py
@@ -138,8 +138,12 @@ def _parse_cheese_metadata(item: dict[str, Any]) -> MetaData:
show_title=item["title"], # no such field; use title instead
plot=item["title"], # no such field; use title instead
thumb=item["cover"],
premiered=get_time_str_by_stamp(item["release_date"]),
premiered=get_time_str_by_stamp(item["release_date"], "%Y-%m-%d"),
dateadded=get_time_str_by_now(),
source="", # TODO
actor=[], # TODO
genre=[], # TODO
tag=[], # TODO
website="", # TODO
original_filename="", # TODO
)
75 changes: 71 additions & 4 deletions yutto/api/ugc_video.py
@@ -2,7 +2,7 @@

import json
import re
from typing import TypedDict
from typing import Any, TypedDict

from aiohttp import ClientSession

@@ -24,7 +24,7 @@
)
from yutto.utils.console.logger import Logger
from yutto.utils.fetcher import Fetcher
from yutto.utils.metadata import MetaData
from yutto.utils.metadata import Actor, MetaData
from yutto.utils.time import get_time_str_by_now, get_time_str_by_stamp


@@ -45,6 +45,9 @@ class _UgcVideoInfo(TypedDict):
pubdate: int
description: str
pages: list[_UgcVideoPageInfo]
genre: list[str]
Member:

What is genre short for?

Member:

I saw the description in the commit message: so genre means the category (分区)? What an odd abbreviation 😂

Contributor Author:

Honestly, I find it odd too. When Emby scrapes metadata, genre is treated as "genre" (流派)... and since I handle UGC content as movies, genre is indeed recognized, so I used genre for the category name.
If there is a better option, I think it could be changed. I haven't tested this .nfo in Emby or Infuse yet, so I don't know whether genre works.
actor: list[Actor]
tag: list[str]


class UgcVideoListItem(TypedDict):
@@ -62,6 +65,17 @@ class UgcVideoList(TypedDict):
pages: list[UgcVideoListItem]


async def get_ugc_video_tag(session: ClientSession, avid: AvId) -> list[str]:
tags: list[str] = []
tag_api = "http://api.bilibili.com/x/tag/archive/tags?aid={aid}&bvid={bvid}"
res_json = await Fetcher.fetch_json(session, tag_api.format(**avid.to_dict()))
if res_json is None or res_json["code"] != 0:
raise NotFoundError(f"无法获取视频 {avid} 标签")
for tag in res_json["data"]:
tags.append(tag["tag_name"])
return tags


async def get_ugc_video_info(session: ClientSession, avid: AvId) -> _UgcVideoInfo:
regex_ep = re.compile(r"https?://www\.bilibili\.com/bangumi/play/ep(?P<episode_id>\d+)")
info_api = "http://api.bilibili.com/x/web-interface/view?aid={aid}&bvid={bvid}"
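`get_ugc_video_tag` only reads `code` and each entry's `tag_name` from the tag endpoint. The sketch below pulls that parsing step into a pure function so it can be checked without network access. `parse_tag_names` and the sample payload are illustrative (the payload is hand-written, not a captured API response), and `NotFoundError` is a local stand-in for yutto's exception:

```python
from typing import Any, Optional


class NotFoundError(Exception):
    """Local stand-in for yutto's NotFoundError (assumption for this sketch)."""


def parse_tag_names(res_json: Optional[dict[str, Any]], avid: str) -> list[str]:
    # Same checks as get_ugc_video_tag: a missing body or a non-zero
    # code is treated as "tags not found".
    if res_json is None or res_json["code"] != 0:
        raise NotFoundError(f"cannot fetch tags for video {avid}")
    # A comprehension equivalent to the append loop in the PR.
    return [tag["tag_name"] for tag in res_json["data"]]


sample = {"code": 0, "data": [{"tag_name": "tutorial"}, {"tag_name": "bilili"}]}
print(parse_tag_names(sample, "BV1vZ4y1M7mQ"))  # ['tutorial', 'bilili']
```

Because `get_ugc_video_info` awaits the tag request directly, a failing tag request makes the whole info call raise; a caller that prefers degraded metadata could catch `NotFoundError` and fall back to `[]`.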
@@ -81,6 +95,10 @@ async def get_ugc_video_info(session: ClientSession, avid: AvId) -> _UgcVideoInf
episode_id = EpisodeId("")
if res_json_data.get("redirect_url") and (ep_match := regex_ep.match(res_json_data["redirect_url"])):
episode_id = EpisodeId(ep_match.group("episode_id"))

actors = _parse_actor_info(res_json_data)
genres = _parse_genre_info(res_json_data)
tags: list[str] = await get_ugc_video_tag(session, avid)
return {
"avid": BvId(res_json_data["bvid"]),
"aid": AId(str(res_json_data["aid"])),
Expand All @@ -99,6 +117,9 @@ async def get_ugc_video_info(session: ClientSession, avid: AvId) -> _UgcVideoInf
}
for page in res_json_data["pages"]
],
"actor": actors,
"tag": tags,
"genre": genres,
}


@@ -241,19 +262,65 @@ async def get_ugc_video_subtitles(session: ClientSession, avid: AvId, cid: CId)
return []


def _parse_ugc_video_metadata(video_info: _UgcVideoInfo, page_info: _UgcVideoPageInfo) -> MetaData:
def _parse_ugc_video_metadata(
video_info: _UgcVideoInfo,
page_info: _UgcVideoPageInfo,
) -> MetaData:
return MetaData(
title=page_info["part"],
show_title=page_info["part"],
plot=video_info["description"],
thumb=page_info["first_frame"] if page_info["first_frame"] is not None else video_info["picture"],
premiered=get_time_str_by_stamp(video_info["pubdate"]),
premiered=get_time_str_by_stamp(video_info["pubdate"], "%Y-%m-%d"),
dateadded=get_time_str_by_now(),
actor=video_info["actor"],
genre=video_info["genre"],
tag=video_info["tag"],
source="", # TODO
original_filename="", # TODO
website=video_info["bvid"].to_url(),
)


def _parse_actor_info(video_info: dict[str, Any]):
actors: list[Actor] = []
if video_info.get("staff") and isinstance(video_info["staff"], list):
_index: int = 0
staff_list: list[dict[str, Any]] = video_info["staff"]
for staff in staff_list:
actors.append(
Actor(
name=staff["name"],
role=staff["title"],
thumb=staff["face"],
profile=f"https://space.bilibili.com/{staff['mid']}",
order=_index,
)
)
_index += 1
elif video_info.get("owner") and isinstance(video_info["owner"], dict):
staff_info: dict[str, Any] = video_info["owner"]
actors.append(
Actor(
name=staff_info["name"],
role="UP主",
thumb=staff_info["face"],
profile=f"https://space.bilibili.com/{staff_info['mid']}",
order=0,
)
)
else:
Logger.warning("未找到演职人员信息")
return actors


def _parse_genre_info(video_info: dict[str, Any]) -> list[str]:
genres: list[str] = []
if video_info.get("tname") and isinstance(video_info["tname"], str):
genres.append(video_info["tname"])
return genres


def _is_meaningless_name(name: str) -> bool:
"""Check whether the name is meaningless"""
# name is empty
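The actor and genre rules in `_parse_actor_info` and `_parse_genre_info` can be condensed as follows: use `staff` when present (co-created videos), otherwise fall back to `owner` with the fixed role `UP主`, and map `tname` to a single genre. This sketch replaces the manual `_index` counter with `enumerate` and returns an empty list instead of logging a warning. `Actor` is copied from `yutto/utils/metadata.py`; `parse_actors`, `parse_genres` and the sample dicts are hypothetical:

```python
from typing import Any, TypedDict


class Actor(TypedDict):
    name: str
    role: str
    thumb: str
    profile: str
    order: int


def parse_actors(video_info: dict[str, Any]) -> list[Actor]:
    staff = video_info.get("staff")
    if staff and isinstance(staff, list):
        # Co-created video: one Actor per staff member, in API order.
        return [
            Actor(
                name=s["name"],
                role=s["title"],
                thumb=s["face"],
                profile=f"https://space.bilibili.com/{s['mid']}",
                order=i,
            )
            for i, s in enumerate(staff)
        ]
    owner = video_info.get("owner")
    if owner and isinstance(owner, dict):
        # Single uploader: role is fixed to "UP主", as in the PR.
        return [
            Actor(
                name=owner["name"],
                role="UP主",
                thumb=owner["face"],
                profile=f"https://space.bilibili.com/{owner['mid']}",
                order=0,
            )
        ]
    return []  # the PR logs a warning here


def parse_genres(video_info: dict[str, Any]) -> list[str]:
    # `tname` is the category (分区) name; it becomes the NFO genre.
    tname = video_info.get("tname")
    return [tname] if tname and isinstance(tname, str) else []


info = {
    "owner": {"name": "alice", "face": "https://example.com/a.jpg", "mid": 123},
    "tname": "科技",
}
print(parse_actors(info)[0]["profile"])  # https://space.bilibili.com/123
print(parse_genres(info))
```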
8 changes: 8 additions & 0 deletions yutto/extractor/common.py
@@ -58,6 +58,7 @@ async def extract_bangumi_data(
"series_title": UNKNOWN,
"pubdate": UNKNOWN,
"download_date": bangumi_info["metadata"]["dateadded"],
"owner_uid": UNKNOWN,
}
subpath_variables_base.update(subpath_variables)
subpath = resolve_path_template(args.subpath_template, auto_subpath_template, subpath_variables_base)
@@ -103,6 +104,7 @@ async def extract_cheese_data(
"series_title": UNKNOWN,
"pubdate": UNKNOWN,
"download_date": UNKNOWN,
"owner_uid": UNKNOWN,
}
subpath_variables_base.update(subpath_variables)
subpath = resolve_path_template(args.subpath_template, auto_subpath_template, subpath_variables_base)
@@ -139,6 +141,11 @@ async def extract_ugc_video_data(
subtitles = await get_ugc_video_subtitles(session, avid, cid) if args.require_subtitle else []
danmaku = await get_danmaku(session, cid, args.danmaku_format) if args.require_danmaku else EmptyDanmakuData
metadata = ugc_video_info["metadata"] if args.require_metadata else None
owner_uid: str = (
ugc_video_info["metadata"]["actor"][0]["profile"].split("/")[-1]
if ugc_video_info["metadata"]["actor"]
else UNKNOWN
)
subpath_variables_base: PathTemplateVariableDict = {
"id": id,
"name": name,
@@ -147,6 +154,7 @@ async def extract_ugc_video_data(
"series_title": UNKNOWN,
"pubdate": UNKNOWN,
"download_date": ugc_video_info["metadata"]["dateadded"],
"owner_uid": owner_uid,
Member:

Please add this to cheese and bangumi as well; just `UNKNOWN` is fine. Also, the 「存放子路径模板」 (subpath template) section of the docs (README.md) needs this field added, with a note in the table on when it is available~

Contributor Author:

That will take a bit longer, maybe a few days. I'll test the .nfo in Emby first and then adjust everything together.
}
subpath_variables_base.update(subpath_variables)
subpath = resolve_path_template(args.subpath_template, auto_subpath_template, subpath_variables_base)
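`owner_uid` is taken from the last path segment of the first actor's `profile` URL, with `UNKNOWN` as the fallback. A sketch reproducing that expression (`owner_uid_from_actors` is a hypothetical name). Note that for co-created videos the first actor is the first `staff` entry, and this diff does not guarantee that entry is the uploader:

```python
from typing import Any

UNKNOWN = "unknown_variable"  # same sentinel as yutto/processor/path_resolver.py


def owner_uid_from_actors(actors: list[dict[str, Any]]) -> str:
    # Same expression as in extract_ugc_video_data: the UID is the last
    # path segment of "https://space.bilibili.com/<mid>".
    return actors[0]["profile"].split("/")[-1] if actors else UNKNOWN


print(owner_uid_from_actors([{"profile": "https://space.bilibili.com/123"}]))  # 123
print(owner_uid_from_actors([]))  # unknown_variable
```

Recovering the UID from a URL string is somewhat indirect; carrying `mid` alongside the actor list would avoid the round-trip.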
4 changes: 3 additions & 1 deletion yutto/processor/path_resolver.py
@@ -6,7 +6,9 @@

from yutto.utils.console.logger import Logger

PathTemplateVariable = Literal["title", "id", "name", "username", "series_title", "pubdate", "download_date"]
PathTemplateVariable = Literal[
"title", "id", "name", "username", "series_title", "pubdate", "download_date", "owner_uid"
]
PathTemplateVariableDict = dict[PathTemplateVariable, Union[int, str]]
UNKNOWN: str = "unknown_variable"

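With `owner_uid` added to the `PathTemplateVariable` literal, the allowed names can be read back with `typing.get_args`, which makes a quick template check possible. A hypothetical validator built on `string.Formatter().parse`; it also accepts `auto`, on the assumption that the default `{auto}` template is expanded separately (the diff shows `resolve_path_template` receiving `auto_subpath_template`):

```python
import string
from typing import Literal, get_args

PathTemplateVariable = Literal[
    "title", "id", "name", "username", "series_title", "pubdate", "download_date", "owner_uid"
]


def unknown_template_fields(template: str) -> list[str]:
    """Return field names in `template` that are not known template variables."""
    # "auto" is assumed to be expanded separately, so it is not in the Literal.
    allowed = set(get_args(PathTemplateVariable)) | {"auto"}
    return [
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name is not None and field_name not in allowed
    ]


print(unknown_template_fields("{owner_uid}/{id:0>3}{name}"))  # []
print(unknown_template_fields("{uid}/{name}"))  # ['uid']
```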
23 changes: 16 additions & 7 deletions yutto/utils/metadata.py
@@ -4,7 +4,15 @@
from typing import TypedDict
from xml.dom.minidom import parseString # type: ignore

import dicttoxml # type: ignore
from dict2xml import dict2xml # type: ignore


class Actor(TypedDict):
name: str
role: str
thumb: str
profile: str
order: int


class MetaData(TypedDict):
@@ -14,16 +22,17 @@ class MetaData(TypedDict):
thumb: str
premiered: str
dateadded: str
actor: list[Actor]
genre: list[str]
tag: list[str]
Member:

metadata gained new fields; could Bangumi and Cheese be aligned with them? They can be left empty for now with a TODO.

Contributor Author:

I'm worried that the MetaData format for Bangumi and Cheese differs from UGC's, so I'd rather not add them yet. Would leaving them out cause any problems? If leaving them out makes bangumi throw errors, then adding them is fine... Actually, shouldn't MetaData be split into UGCMetaData and BangumiMetaData?

Member:

> Would leaving them out cause any problems?

It would affect the type hints; the linter (pyright) probably wouldn't pass.

> Actually, shouldn't MetaData be split into UGCMetaData and BangumiMetaData?

I haven't used the metadata feature in depth, so I'm not clear on the details, but that could indeed be done.

Contributor Author:

Lint didn't catch it locally... OK, I just took a look; I'll add these fields: actor, genre, tag, website.
source: str
original_filename: str
website: str


def write_metadata(metadata: MetaData, video_path: Path):
metadata_path = video_path.with_suffix(".nfo")
custom_root = "episodedetails"

xml_content = dicttoxml.dicttoxml(metadata, custom_root=custom_root, attr_type=False) # type: ignore
dom = parseString(xml_content) # type: ignore
pretty_content = dom.toprettyxml() # type: ignore
custom_root = "episodedetails" # TODO: use a different root name for each video type
xml_content = dict2xml(metadata, wrap=custom_root, indent=" ") # type: ignore
with metadata_path.open("w", encoding="utf-8") as f: # type: ignore
f.write(pretty_content) # type: ignore
f.write(xml_content) # type: ignore
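By default `dicttoxml` wraps list items in `<item>` children, whereas `dict2xml` repeats the element name for each list item, which matches the repeated `<actor>`, `<genre>` and `<tag>` elements of Kodi/Emby-style NFO files. The sketch below reproduces that shape with the standard library's `xml.etree.ElementTree` instead of `dict2xml`, so it is a substitute for illustration rather than what the PR runs; `build_nfo` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Any


def build_nfo(metadata: dict[str, Any], root_name: str = "episodedetails") -> str:
    """Stdlib stand-in for dict2xml(metadata, wrap=root_name, indent="  ")."""
    root = ET.Element(root_name)

    def add(parent: ET.Element, key: str, value: Any) -> None:
        if isinstance(value, list):
            # One sibling element per list item, e.g. <tag>a</tag><tag>b</tag>
            for item in value:
                add(parent, key, item)
        elif isinstance(value, dict):
            # Nested dicts (such as an Actor) become child elements.
            child = ET.SubElement(parent, key)
            for k, v in value.items():
                add(child, k, v)
        else:
            ET.SubElement(parent, key).text = str(value)

    for key, value in metadata.items():
        add(root, key, value)
    ET.indent(root, space="  ")  # available since Python 3.9
    return ET.tostring(root, encoding="unicode")


nfo = build_nfo({
    "title": "bilili 环境配置方法",
    "genre": ["科技"],
    "tag": ["bilili", "tutorial"],
    "actor": [{"name": "alice", "role": "UP主", "order": 0}],
})
print(nfo)
```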