Switch to aclanthology.org #1278

Closed
mjpost opened this issue Apr 19, 2021 · 5 comments

@mjpost
Member

mjpost commented Apr 19, 2021

Sometime this year, I am hoping we can switch to aclanthology.org as our primary hosting site. There are technical reasons: ACL IT has been pushing for this, since we often come close to our hosting and bandwidth limits, and the Anthology is a key piece of that. But I also like it better aesthetically; aclanthology.org (versus aclweb.org/anthology) is more parsimonious (16 versus 20 characters), and using a dedicated domain reflects its status.

The main question is whether we change the canonical URL as well. My thinking is that yes, we do, with permanent 301 redirects for papers existing at the time of the switch.

I welcome discussion.

@knmnyn
Collaborator

knmnyn commented Apr 19, 2021

I strongly endorse this. I think the team has made a few changes to canonical URLs without much (any?) negative repercussion, so this makes sense. I'll be very happy if the team can make this the canonical version, because this was the original intent of registering this domain under ACL auspices over a decade ago.

@mjpost mjpost pinned this issue May 22, 2021
@mjpost
Member Author

mjpost commented May 22, 2021

Here are some TODOs:

  1. Update the canonical URL in bin/anthology/data.py
  2. Update the <link rel=canonical> header for all pages (this follows from 1)
  3. Update all scripts that refer to aclweb.org (ideally have them use variables from bin/anthology/utils.py)
  4. Generate thumbnails on aclanthology.org
  5. Come up with a better backup solution
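
TODOs 1 to 3 amount to keeping the site prefix in one place and deriving everything else from it. A minimal sketch of that idea follows. The names `CANONICAL_URL_PREFIX`, `canonical_url`, and `canonical_link_tag` are hypothetical and are not taken from bin/anthology/data.py; the trailing slash on paper pages and the ID `P19-1001` are assumptions for illustration only.

```python
from html import escape

# Hypothetical constant: the real name and location in the Anthology
# code base may differ. Changing this one value is the whole "switch".
CANONICAL_URL_PREFIX = "https://aclanthology.org"


def canonical_url(anthology_id: str) -> str:
    """Build the canonical page URL for a paper ID (assumed URL layout)."""
    return f"{CANONICAL_URL_PREFIX}/{anthology_id}/"


def canonical_link_tag(anthology_id: str) -> str:
    """Render the <link rel="canonical"> element for a paper page."""
    href = escape(canonical_url(anthology_id), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # "P19-1001" is an illustrative ID only.
    print(canonical_link_tag("P19-1001"))
```

Scripts that import the prefix instead of hard-coding aclweb.org would then pick up a future change automatically.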

After merging:

  1. Update the Google Search console to crawl aclanthology.org instead of aclweb.org
  2. Create 301 redirects for all pages and PDFs on aclweb.org
  3. Consider updating all DOIs to point to the new canonical
  4. Update the URL on the GitHub page
  5. Advertise on Twitter
  6. Announce on the Google Group

Here is the rule we can use at https://aclweb.org/anthology for general redirects:

RewriteEngine On
RewriteRule ^(.*) https://aclanthology.org/$1 [R=301,L]
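
To sanity-check the redirects before and after the switch, the rule's mapping can be mirrored in a small pure function. This is a sketch under assumptions: it assumes the rule lives in a per-directory context under /anthology (so the matched text is the path after that prefix), that both aclweb.org and www.aclweb.org serve the old site, and that the query string is carried over unchanged, which is mod_rewrite's default when the substitution has no query string of its own. The function name `redirect_target` and the example paths are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed old hosts and path prefix; adjust to the real server setup.
OLD_HOSTS = {"aclweb.org", "www.aclweb.org"}
OLD_PREFIX = "/anthology/"
NEW_BASE = "https://aclanthology.org/"


def redirect_target(old_url: str) -> Optional[str]:
    """Return the URL the 301 should point to, or None if the rule does not apply."""
    parts = urlsplit(old_url)
    if parts.hostname not in OLD_HOSTS:
        return None
    if not parts.path.startswith(OLD_PREFIX):
        return None
    # Everything after /anthology/ is what ^(.*) captures as $1.
    target = NEW_BASE + parts.path[len(OLD_PREFIX):]
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    # Illustrative path only.
    print(redirect_target("https://aclweb.org/anthology/P19-1001.pdf"))
```

A list of known old URLs could be run through this function and compared with the `Location` header the server actually returns.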

@Genius1237
Contributor

Genius1237 commented Jul 19, 2021

I wasn't sure whether to bring this up here or open a separate issue. Google has started showing results from aclanthology.org in its search results. For papers on aclweb.org, it shows the last modified/updated date correctly. For papers from the new site, it shows something like "4 days ago" (when it last crawled the page, I guess). I've attached screenshots below.

Old
image

New
image

Being able to see the year is useful for knowing when a paper is from. Can anything be done to address this in the future?
Edit: I should add that it shows "4 days ago" for a lot of papers. This makes it hard to distinguish between old and new papers within a search result (you could do this with the old site).

@akoehn
Member

akoehn commented Jul 19, 2021

We deliver all the metadata we can (though there is some more to come in #1407, which might fix the extracted text). Maybe Google stores the date it first found the site?

@mjpost
Member Author

mjpost commented Jul 19, 2021

Yes, I'm guessing this is what is happening. Though it might be that for new files, Google uses metadata from the file's timestamp itself. In that case we could manually set those dates to, say, the ingestion date. However, I'm not sure we'll have bandwidth to look into this.
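
If the file timestamp does turn out to matter, setting it is straightforward. The sketch below is only an illustration of the "set those dates to the ingestion date" idea: whether Google actually reads this signal is unconfirmed in this thread, and the helper name `set_mtime` and the example date are hypothetical. For static files, Apache typically derives the `Last-Modified` response header from the file's modification time, which is why the mtime is the value touched here.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def set_mtime(path: Path, when: datetime) -> None:
    """Set a file's access and modification times to the given datetime."""
    ts = when.timestamp()
    os.utime(path, (ts, ts))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "example.pdf"  # stand-in for a served PDF
        pdf.write_bytes(b"%PDF-")
        # Illustrative ingestion date, not taken from real Anthology data.
        set_mtime(pdf, datetime(2019, 7, 28, tzinfo=timezone.utc))
        print(datetime.fromtimestamp(pdf.stat().st_mtime, tz=timezone.utc))
```

Deployment steps that copy files without preserving timestamps would undo this, so it would need to run after each sync.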

@mjpost mjpost closed this as completed Mar 29, 2022
@mjpost mjpost unpinned this issue May 17, 2022