Name Variants

Names in the Anthology are handled using the following conventions:

Names are split into "first" and "last" fields, which correspond more generally to "given" and "family" names
If an author has only one name, the first name field is omitted
Names on the PDF files are considered authoritative; the metadata in the XML database should match them (if the names don't match, please file a metadata fix issue)
The full name is produced by concatenating the first and last names with an intervening space
A separate author page is created for every unique name that appears in paper metadata
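
The concatenation rule above can be sketched in a few lines of Python. This is only an illustration; the function name `full_name` and its signature are assumptions, not part of the Anthology code base.

```python
from typing import Optional


def full_name(last: str, first: Optional[str] = None) -> str:
    """Join given and family names with a single intervening space.

    An author with only one name has no first-name field, so the
    last name alone is the full name.
    """
    if not first:
        return last
    return f"{first} {last}"


print(full_name("Joshi", "Aravind"))     # Aravind Joshi
print(full_name("Joshi", "Aravind K."))  # Aravind K. Joshi
```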

The author pages are created by first slugifying the full names, e.g., aravind-joshi for Aravind Joshi.
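
As a rough illustration of slugifying, the sketch below lowercases a name, strips accents, and joins the remaining alphanumeric runs with hyphens. The exact rules the Anthology applies may differ; the function `slugify` here is a hypothetical stand-in written for this example.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a full name into a lowercase, hyphen-separated slug (assumed rules)."""
    # Decompose accented characters and drop the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


print(slugify("Aravind Joshi"))     # aravind-joshi
print(slugify("Aravind K. Joshi"))  # aravind-k-joshi
```

Note that under these assumed rules the two variants of the same person produce different slugs, which is why variants need explicit handling.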

For authors whose name has no variants and never changes, the pages are generated without any issues, simply by aggregating all instances of the author's papers across the Anthology XML database. But there are two common problems that occur with this approach:

A person might have one or more name variants, like Aravind Joshi and Aravind K. Joshi. These should be combined under a single page.
Two or more different people might have the same name. These should be split across multiple pages.

We manage these two issues using the file data/yaml/name_variants.yaml. Each entry in the file describes a different person; an example entry would be:

- canonical: {first: Aravind, last: Joshi}
  id: aravind-joshi
  comment: Penn
  similar: Aravind Q. Joshi
  variants:
  - {first: Aravind K., last: Joshi}
  - {first: A. K., last: Joshi}

Only the canonical field is required. The variants lines specify different names that should be grouped together under this page. So long as these variants are globally unique to this person, papers with these variants will all be grouped together under a slugified version of the canonical name.
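
The grouping described above can be pictured as a lookup from every known name to one page. The following is a minimal sketch, assuming the YAML entries have already been loaded into plain Python dictionaries; `build_page_index`, `full_name`, and `slugify` are hypothetical helpers, and names shared between different people are deliberately not handled here.

```python
import re


def slugify(name: str) -> str:
    # Simplified slug rule, assumed for this example only.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def full_name(name: dict) -> str:
    # A name with no "first" field is just the last name.
    return " ".join(part for part in (name.get("first"), name["last"]) if part)


def build_page_index(entries: list) -> dict:
    """Map each canonical or variant full name to the page it belongs on."""
    index = {}
    for entry in entries:
        # Use the explicit id if given, else the slugified canonical name.
        page = entry.get("id") or slugify(full_name(entry["canonical"]))
        for name in [entry["canonical"], *entry.get("variants", [])]:
            index[full_name(name)] = page
    return index


ENTRIES = [
    {
        "canonical": {"first": "Aravind", "last": "Joshi"},
        "id": "aravind-joshi",
        "variants": [
            {"first": "Aravind K.", "last": "Joshi"},
            {"first": "A. K.", "last": "Joshi"},
        ],
    }
]

print(build_page_index(ENTRIES)["A. K. Joshi"])  # aravind-joshi
```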

When a name (whether canonical or variant) is shared between different persons, the id field is required to separate them into different author pages. This is done by adding the id attribute to each <author> instance on papers in the Anthology database, as in the following:

<author id="aravind-joshi"><first>A.</first><last>Joshi</last></author>
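
To make the role of the id attribute concrete, here is a small sketch that reads one such <author> element with Python's standard xml.etree.ElementTree module and returns the explicit id (if any) together with the full name. The helper name `author_page_key` is an assumption for illustration and is not taken from the Anthology tooling.

```python
import xml.etree.ElementTree as ET


def author_page_key(author_xml: str):
    """Return (explicit id or None, full name) for one <author> element."""
    elem = ET.fromstring(author_xml)
    first = elem.findtext("first")  # None when the element is absent
    last = elem.findtext("last")
    name = f"{first} {last}" if first else last
    return elem.get("id"), name


tagged = '<author id="aravind-joshi"><first>A.</first><last>Joshi</last></author>'
print(author_page_key(tagged))  # ('aravind-joshi', 'A. Joshi')
```

When the first value is None, the page would have to be found from the name alone, which only works if that name is globally unique.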

This id is then used to create that author's page, e.g., Yang Liu. Because the id has to tell apart people who share a name, we attempt to use the institution of the author's highest degree, which is normally the Ph.D.-granting institution. (We haven't yet dealt with a name conflict within the same institution.)

Once an id attribute is added in this manner, it must be added to every paper belonging to that author. This is to reduce the chance of a newly ingested paper having an author name that is not resolved correctly.
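
One way to check this rule is to look for names that appear both with and without an id attribute. The sketch below does that over a small XML fragment; the function `names_with_mixed_ids` and the <papers>/<paper> wrapper elements are hypothetical and chosen only to keep the example self-contained, so the real database layout may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def names_with_mixed_ids(xml_text: str) -> list:
    """List full names that occur both with and without an id attribute."""
    root = ET.fromstring(xml_text.strip())
    ids_by_name = defaultdict(set)
    for author in root.iter("author"):
        first = author.findtext("first")
        last = author.findtext("last")
        name = f"{first} {last}" if first else last
        ids_by_name[name].add(author.get("id"))  # None means "no id"
    # A name is suspicious if some instances carry an id and others do not.
    return sorted(
        name for name, ids in ids_by_name.items() if None in ids and len(ids) > 1
    )
```

A name reported by this check is not necessarily an error, but it marks papers that should be reviewed for a missing id.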

There are two other fields used to help identify people:

The comment field is displayed under a person's name on their Anthology page. It can be any text. Usually it lists past and current affiliations.
If two people have the same canonical name, the Anthology automatically adds them to each other's "People with similar names" list. If there are other people who have almost the same name, you can add their IDs to the similar field.

If you need to merge or split your name, you have two options. First, you can submit an issue, and Anthology volunteers will help you with it. Second, you can expedite the process by creating the pull request yourself, in which case we can simply approve it. Changes go live within about a half hour of approval.
