Look at DOIs for frontmatter #2724

Open
mjpost opened this issue Aug 7, 2023 · 8 comments · May be fixed by #3707

@mjpost
Member

mjpost commented Aug 7, 2023

We didn't get DOIs in frontmatter for EMNLP 22 or ACL 23.

@mjpost mjpost mentioned this issue Aug 7, 2023
@anthology-assist
Collaborator

I thought we do not ingest frontmatter anymore?

@mjpost
Member Author

mjpost commented Aug 8, 2023

We ingest it if it's there. Most of ACL had frontmatter. We need to check why we didn't assign DOIs. I think we did in the past.

@akoehn
Member

akoehn commented Aug 8, 2023

You changed the logic for them in the DOI generation script recently -- maybe that has something to do with it? Did we change how frontmatters are represented in the XML, which left the DOI generation script in an outdated state until you changed it?

@mjpost
Member Author

mjpost commented Aug 8, 2023

DOI ingestion is two steps:

  1. bin/generate_crossref_doi_metadata.py produces a big nasty XML file that we upload and use to generate DOIs
  2. bin/add_dois.py goes through each paper in a volume, checks if its DOI works, and if so, adds it to our XML

What I changed is (2), which was broken because it assumed there was always a <frontmatter> block. There wasn't one for EMNLP 2022, because they never delivered it. I didn't change (1). Looking at past frontmatters, we don't in fact generate a DOI for the volume itself. We probably should.
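
To make the failure mode in step (2) concrete, here is a minimal sketch of a DOI pass that tolerates a missing <frontmatter> block. It is not the code in bin/add_dois.py: the function names, the simplified volume layout, the example DOIs, and the `doi_resolves` callable (standing in for the real "does this DOI work" check) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def doi_candidates(volume):
    """Return the elements of a volume that could still receive a <doi> child."""
    candidates = []
    frontmatter = volume.find("frontmatter")  # None when the block is absent
    if frontmatter is not None:
        candidates.append(frontmatter)
    candidates.extend(volume.findall("paper"))
    # Skip anything that already carries a DOI.
    return [el for el in candidates if el.find("doi") is None]


def add_doi(element, doi, doi_resolves):
    """Attach a <doi> child if the DOI resolves; return whether it was added."""
    if not doi_resolves(doi):
        return False
    ET.SubElement(element, "doi").text = doi
    return True


# Hypothetical volume with no <frontmatter> block (the EMNLP 2022 situation).
volume = ET.fromstring(
    '<volume id="1">'
    '<paper id="1"><title>A</title></paper>'
    '<paper id="2"><title>B</title><doi>10.0000/example.2</doi></paper>'
    "</volume>"
)
todo = doi_candidates(volume)
```

Treating the frontmatter as optional means the same loop works whether or not a volume delivered it, and the papers still get their DOIs either way.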

@mjpost
Member Author

mjpost commented Aug 8, 2023

Actually, though, this reminds me that I also changed the ingestion script (post-EMNLP) to always generate the <frontmatter>. If there's no frontmatter PDF, we still need the block, we just don't generate the <url> tag inside it. We need to add this to EMNLP.
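
Roughly, the behaviour described above looks like the sketch below. This is an illustration rather than the actual ingestion script: `make_frontmatter`, its `pdf_name` parameter, and the example file name are hypothetical, and the real block carries more fields than shown here.

```python
import xml.etree.ElementTree as ET


def make_frontmatter(volume, pdf_name=None):
    """Always add a <frontmatter> block; add <url> only when a PDF exists."""
    frontmatter = ET.SubElement(volume, "frontmatter")
    if pdf_name is not None:
        ET.SubElement(frontmatter, "url").text = pdf_name
    return frontmatter


# No frontmatter PDF delivered: the block is still created, without <url>.
bare_volume = ET.Element("volume")
make_frontmatter(bare_volume)

# Frontmatter PDF present: the block also gets a <url> child.
full_volume = ET.Element("volume")
make_frontmatter(full_volume, pdf_name="example-volume.0")
```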

@mbollmann
Member

Whatever the reasoning for always generating the <frontmatter> block was, I still suspect it's the wrong solution to a problem I don't yet understand.

@mjpost
Member Author

mjpost commented Aug 8, 2023

<frontmatter> is just the special stub for paper 0. If we don't generate it, then no bibtex is generated for the volume itself. We want to generate this volume bibtex even if there is no PDF. (If there is a PDF, we add a <url> tag within frontmatter, as we do for papers.) This is all a separate issue.

We don't generate DOIs for the complete volume or frontmatter, and haven't for some time. If we want to, we just need to figure it out. I haven't had time to do this. See also #726.

@mjpost
Member Author

mjpost commented Mar 12, 2024

Re-upping this for this quarter—we should generate DOIs for front matter. This involves:

  • Figuring out the XML that's required
  • Updating the script to generate it

@mjpost mjpost modified the milestones: 2024Q1, 2024Q2 Mar 12, 2024
@yufei118liu yufei118liu self-assigned this Jul 31, 2024
@yufei118liu yufei118liu linked a pull request Jul 31, 2024 that will close this issue
@mjpost mjpost modified the milestones: 2024Q2, 2024Q4 Nov 7, 2024
5 participants