Commit

Parse Markdown and Generate JSON from Source Repo
DmitryRyumin authored and github-actions[bot] committed Jan 21, 2024
1 parent 1206bcd commit eb73061
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions code/markdown_to_json_parser.py
@@ -359,6 +359,17 @@ def extract_paper_data(paper_section, columns):
 title = re.sub(r"<(?:br\s*/?>|img[^>]*>)", "", title)
 title = title.strip()

+html_entities = {
+    "&amp;": "&",
+    "&lt;": "<",
+    "&gt;": ">",
+    "&quot;": '"',
+    "&apos;": "'",
+}
+title = re.sub(
+    r"(&\w+;)", lambda x: html_entities.get(x.group(0), x.group(0)), title
+)
+
 title_link = title_column.find("a")
 title_page = title_link["href"] if title_link else None

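The added lines can be tried in isolation with a sketch like the one below. The function name `decode_named_entities` and the sample string are hypothetical and not part of the commit; the lookup table and the regular expression are copied from the diff. The comparison with `html.unescape` from the standard library is for illustration only and is not what the commit uses. It shows what the five-entry table leaves untouched: other named entities such as `&nbsp;`, and numeric references such as `&#39;`, which `&\w+;` does not match because `#` is not a word character.

```python
import html
import re

# Lookup table copied from the commit: the five predefined XML entities.
HTML_ENTITIES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&apos;": "'",
}


def decode_named_entities(text: str) -> str:
    """Hypothetical wrapper around the commit's re.sub call.

    Entities found in the table are replaced; any other match of
    the pattern is returned unchanged.
    """
    return re.sub(
        r"(&\w+;)",
        lambda m: HTML_ENTITIES.get(m.group(0), m.group(0)),
        text,
    )


sample = "Q&amp;A: x &lt; y &nbsp; it&#39;s"

# Table-based decoding: "&nbsp;" and "&#39;" stay as they are.
print(decode_named_entities(sample))

# Standard-library alternative (assumption: broader decoding is wanted).
print(html.unescape(sample))
```

Because `re.sub` makes a single pass, a doubly escaped input such as `&amp;lt;` becomes `&lt;` rather than `<`, which matches how `html.unescape` treats the same input.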