Fixing crash due to None author #650

Merged on Oct 29, 2024 (1 commit)
7 changes: 5 additions & 2 deletions paperqa/types.py
@@ -7,7 +7,7 @@
 from collections.abc import Collection
 from contextlib import contextmanager
 from datetime import datetime
-from typing import Any, ClassVar
+from typing import Any, ClassVar, cast
 from uuid import UUID, uuid4

 import litellm  # for cost
@@ -456,8 +456,11 @@ def inject_clean_doi_url_into_data(data: dict[str, Any]) -> dict[str, Any]:
     def remove_invalid_authors(cls, data: dict[str, Any]) -> dict[str, Any]:
         """Capture and cull strange author names."""
         if authors := data.get("authors"):
+            # On 10/29/2024 while indexing 19k PDFs, a provider (unclear which one)
+            # returned an author of None. The vast majority of the time authors are str
+            authors = cast(list[str | None], authors)
             data["authors"] = [
-                a for a in authors if a.lower() not in cls.AUTHOR_NAMES_TO_REMOVE
+                a for a in authors if a and a.lower() not in cls.AUTHOR_NAMES_TO_REMOVE
             ]

         return data