Name Variants
Names in the Anthology are handled using the following conventions:
- Names are split into "first" and "last" fields, which correspond more generally to "given" and "family" names
- If an author has only one name, the first name field is omitted
- Names on the PDF files are considered authoritative; the metadata in the XML database should match them (if the names don't match, please file a metadata fix issue)
- The full name is produced by concatenating the first and last names with an intervening space
- A separate author page is created for every unique name that appears in paper metadata
Each author page is identified by a slugified version of the name, e.g., aravind-joshi for Aravind Joshi.
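As a rough illustration of these conventions, the following Python sketch builds a full name and a slug from it. The helpers `full_name` and `slugify` are hypothetical and written only for this example; the Anthology's own slugification may differ in details, such as how non-ASCII characters are handled.

```python
import re
import unicodedata


def full_name(first, last):
    """Join first and last name with a space; the first name may be missing."""
    return f"{first} {last}" if first else last


def slugify(name):
    """Lowercase, drop accents, and turn runs of other characters into hyphens.

    This is an assumed, simplified rule, not the Anthology's actual one.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


print(slugify(full_name("Aravind", "Joshi")))  # aravind-joshi
```

Under this simplified rule, the variant "Aravind K. Joshi" yields a different slug, which is why variants need explicit handling.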
For authors whose name has no variants and never changes, the pages are generated without any issues, simply by aggregating all instances of the author's papers across the Anthology XML database. But there are two common problems that occur with this approach:
- A person might have one or more name variants, like Aravind Joshi and Aravind K. Joshi. These should be combined under a single page.
- Two or more different people might have the same name. These should be split across multiple pages.
We manage these two issues using the file data/yaml/name_variants.yaml. Each entry in the file describes a different person; an example entry would be:
- canonical: {first: Aravind, last: Joshi}
  id: aravind-joshi
  comment: Penn
  similar: Aravind Q. Joshi
  variants:
  - {first: Aravind K., last: Joshi}
  - {first: A. K., last: Joshi}
Only the canonical field is required. The variants lines specify different names that should be grouped together under this page. So long as these variants are globally unique to this person, papers with these variants will all be grouped together under a slugified version of the canonical name.
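The grouping of variants can be sketched as a lookup table from every known name to its canonical name. This is only an illustration: the `entries` list is an in-memory stand-in for the example entry (in practice it would be loaded from the YAML file), `build_index` is a hypothetical helper, and the sketch assumes every name is globally unique to one person.

```python
def full_name(name):
    """Join the first and last fields with a space; first may be absent."""
    first = name.get("first")
    return f"{first} {name['last']}" if first else name["last"]


# Stand-in for the parsed example entry; only the fields used here are kept.
entries = [
    {
        "canonical": {"first": "Aravind", "last": "Joshi"},
        "variants": [
            {"first": "Aravind K.", "last": "Joshi"},
            {"first": "A. K.", "last": "Joshi"},
        ],
    }
]


def build_index(entries):
    """Map each canonical or variant full name to the canonical full name."""
    index = {}
    for entry in entries:
        canonical = full_name(entry["canonical"])
        for name in [entry["canonical"], *entry.get("variants", [])]:
            index[full_name(name)] = canonical
    return index


index = build_index(entries)
print(index["A. K. Joshi"])  # Aravind Joshi
```

A plain dictionary is enough here because of the uniqueness assumption; names shared by several people need the id mechanism instead.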
When a name (whether canonical or variant) is shared between different persons, the id field is required to separate them into different author pages. This is done by adding the id attribute to each <author> instance on papers in the Anthology database, as in the following:
<author id="aravind-joshi"><first>A.</first><last>Joshi</last></author>
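For illustration, an id attribute could be added programmatically with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not the Anthology's own tooling: the `<paper>` wrapper is a simplified stand-in for a real record, and `tag_author` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for one paper record.
paper = ET.fromstring(
    "<paper><author><first>A.</first><last>Joshi</last></author></paper>"
)


def tag_author(paper, first, last, author_id):
    """Set id on every <author> whose first and last names match exactly.

    Returns the number of <author> elements that were tagged.
    """
    count = 0
    for author in paper.iter("author"):
        if author.findtext("first") == first and author.findtext("last") == last:
            author.set("id", author_id)
            count += 1
    return count


tagged = tag_author(paper, "A.", "Joshi", "aravind-joshi")
print(tagged, ET.tostring(paper, encoding="unicode"))
```

Matching on the exact name is only safe when you already know which person the paper belongs to; the script cannot decide that for a shared name.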
This id is then used to create that author's page, e.g., for Yang Liu. Because the id must distinguish people who share a name, we attempt to use the institution of the author's highest degree, which is normally the Ph.D.-granting institution. (We haven't yet dealt with a name conflict within the same institution.)
Once an id attribute is added in this manner, it must be added to every paper belonging to that author. This is to reduce the chance of a newly ingested paper having an author name that is not resolved correctly.
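A consistency check along these lines could flag papers where a shared name still lacks an id. The sketch below is hypothetical: the `<collection>` and `<paper>` layout is simplified, and `authors_missing_id` and the `ambiguous_names` set are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a collection: one tagged and one untagged author.
xml_text = """
<collection>
  <paper><author id="aravind-joshi"><first>A.</first><last>Joshi</last></author></paper>
  <paper><author><first>A.</first><last>Joshi</last></author></paper>
</collection>
""".strip()


def authors_missing_id(root, ambiguous_names):
    """Return (first, last) pairs for ambiguous authors that carry no id."""
    missing = []
    for author in root.iter("author"):
        name = (author.findtext("first"), author.findtext("last"))
        if name in ambiguous_names and "id" not in author.attrib:
            missing.append(name)
    return missing


root = ET.fromstring(xml_text)
missing = authors_missing_id(root, {("A.", "Joshi")})
print(missing)
```

Each reported pair still needs a human decision about which person, and therefore which id, the paper belongs to.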
There are two other fields used to help identify people:
- The comment field is displayed under a person's name on their Anthology page. It can be any text. Usually it lists past and current affiliations.
- If two people have the same canonical name, the Anthology automatically adds them to each other's "People with similar names" list. If there are other people who have almost the same name, you can add their IDs to the similar field.
If you need to merge or split the author pages for your name, you have two options. First, you can submit an issue, and Anthology volunteers will help you with it. Second, you can expedite the process by creating the pull request yourself, in which case we can simply approve it. Changes go live within about half an hour of approval.