Name Variants
The Anthology needs to know how to map names of authors/editors to individuals. It does this mainly via the file `data/yaml/name_variants.yaml`. Each entry in this file describes a different person; an example entry would be:
```yaml
- canonical: Aravind Joshi
  id: aravind-joshi
  comment: Penn
  similar: Aravind Q. Joshi
  variants:
  - Aravind K. Joshi
  - A. K. Joshi
```
Only the `canonical` field is required.
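As a sketch of what "only `canonical` is required" means in practice, the Python below checks one entry after it has been loaded from YAML into a dict. The function name and the set of known fields are assumptions taken from the example entry above; the real file may allow other fields, so unknown keys are reported rather than treated as fatal.

```python
from typing import List

# Assumption: the fields shown in the example entry; the real schema may have more.
KNOWN_FIELDS = {"canonical", "id", "comment", "similar", "variants"}


def check_entry(entry: dict) -> List[str]:
    """Return the problems found in one name_variants.yaml entry (empty list if none)."""
    problems = []
    if not entry.get("canonical"):
        problems.append("missing required field 'canonical'")
    # Report, in sorted order, any keys this sketch does not recognize.
    for field in sorted(set(entry) - KNOWN_FIELDS):
        problems.append(f"unrecognized field {field!r}")
    return problems
```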
There are two ways to indicate a mapping from a name to a person.
Variant method. If a person goes by multiple names, like `Aravind Joshi` and `Aravind K. Joshi`, all of the names should be entered into the file. The `canonical` name is the one that the Anthology displays by default. The `variants` must be globally unique: a variant name may belong to only one person.
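The variant method amounts to building a lookup table from every listed name to a person. Below is a minimal Python sketch over entries already loaded from YAML into dicts (the loading step is omitted). The function names, the hypothetical `slugify` fallback for entries without an `id`, and the choice to mark a canonical name shared by several people as ambiguous are assumptions for illustration, not the Anthology's actual code.

```python
import re
from typing import Dict, List, Optional


def slugify(name: str) -> str:
    """Hypothetical fallback ID: lowercase ASCII letter/digit runs joined by hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def build_name_index(entries: List[dict]) -> Dict[str, Optional[str]]:
    """Map each canonical and variant name to a person ID.

    A canonical name shared by several people maps to None (ambiguous, so the
    ID method is needed). A variant claimed by two people raises ValueError.
    """
    index: Dict[str, Optional[str]] = {}
    variants: Dict[str, str] = {}
    for entry in entries:
        person_id = entry.get("id") or slugify(entry["canonical"])
        name = entry["canonical"]
        # Shared canonical names become ambiguous (None) instead of raising.
        index[name] = person_id if index.get(name, person_id) == person_id else None
        for variant in entry.get("variants", []):
            # setdefault returns the existing owner if the variant is already taken.
            if variants.setdefault(variant, person_id) != person_id:
                raise ValueError(
                    f"variant {variant!r} claimed by {variants[variant]} and {person_id}"
                )
    for variant, person_id in variants.items():
        if variant in index and index[variant] != person_id:
            raise ValueError(f"variant {variant!r} clashes with a canonical name")
        index[variant] = person_id
    return index
```

With the example entry above, both `Aravind K. Joshi` and `A. K. Joshi` resolve to `aravind-joshi` without any annotation in the XML.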
ID method. Alternatively, the referent of the name can be indicated in the XML file itself using the `id` attribute:

```xml
<author id="aravind-joshi"><first>A.</first> <last>Joshi</last></author>
```
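A sketch of how a tool might read names under the ID method: it parses `<author>` elements with Python's standard `xml.etree.ElementTree` and lets an explicit `id` attribute take precedence over a name lookup. The function names, the `<paper>` wrapper used when trying it out, and the `index` argument (a name-to-ID mapping, for instance one built from `name_variants.yaml`) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def author_refs(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name as written, explicit id or None) for every <author> element."""
    root = ET.fromstring(xml_text)
    refs = []
    for author in root.iter("author"):
        first = (author.findtext("first") or "").strip()
        last = (author.findtext("last") or "").strip()
        refs.append((f"{first} {last}".strip(), author.get("id")))
    return refs


def resolve(name: str, explicit_id: Optional[str], index: Dict[str, Optional[str]]) -> str:
    """Prefer the id attribute (ID method); otherwise look the name up (variant method)."""
    if explicit_id is not None:
        return explicit_id
    person_id = index.get(name)
    if person_id is None:
        raise KeyError(f"{name!r} is unknown or ambiguous; add an id attribute")
    return person_id
```

For example, `author_refs` applied to the snippet above wrapped in `<paper>…</paper>` yields `[("A. Joshi", "aravind-joshi")]`, so `resolve` returns `aravind-joshi` without consulting the index.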
When should each of the two methods be used?
- The variant method is better for names that are likely to be unique (because variant names must be unique) and likely to be reused (because variant names don't need special annotation to be resolved correctly).
- The ID method is better for names that are either likely to be non-unique (for example, a name abbreviated to use just a first initial) or unlikely to be reused (for example, a misspelling).
The Anthology enforces the constraint that each name must always use the variant method or always use the ID method. This is to reduce the chance of a newly ingested paper having an author name that is not resolved correctly.
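The constraint can be checked mechanically. This simplified Python sketch takes `(name, id-or-None)` pairs for every author occurrence across the XML files and reports the names that appear both with and without an `id`. The function name is hypothetical, and the sketch does not cover every way the rule could be violated (for example, a name listed as a variant that is also annotated with an ID).

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def mixed_method_names(refs: Iterable[Tuple[str, Optional[str]]]) -> Set[str]:
    """Return names that occur both with and without an explicit id attribute."""
    uses_id: Dict[str, Set[bool]] = defaultdict(set)
    for name, explicit_id in refs:
        uses_id[name].add(explicit_id is not None)
    # A name violates the rule when both True (ID method) and False (variant method) were seen.
    return {name for name, flags in uses_id.items() if len(flags) == 2}
```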
An ID can be any unique string that uses only characters allowed in URLs. Usually it is based on the author's canonical name, but in the case of two authors with a name in common, the ID could add a middle initial to distinguish them, or failing that, the current convention is to append the author's PhD institution to their ID (e.g., `aravind-joshi-upenn`).
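The ID convention can be sketched as a small helper. The function name, the lowercase-and-hyphenate derivation, the `taken` set of existing IDs, and the `institution` parameter are assumptions for illustration. It keeps only ASCII letters and digits, so names with diacritics would need extra handling.

```python
import re
from typing import Optional, Set


def make_person_id(canonical: str, taken: Set[str], institution: Optional[str] = None) -> str:
    """Propose a URL-safe ID from a canonical name, e.g. 'Aravind Joshi' -> 'aravind-joshi'.

    A middle initial in the canonical name already yields a distinct ID
    ('Aravind K. Joshi' -> 'aravind-k-joshi'). Otherwise, fall back to
    appending the PhD institution, following the convention described above.
    """
    base = "-".join(re.findall(r"[a-z0-9]+", canonical.lower()))
    if base not in taken:
        return base
    if institution:
        candidate = f"{base}-{institution.lower()}"
        if candidate not in taken:
            return candidate
    raise ValueError(f"{base!r} is already taken; choose a distinguishing ID by hand")
```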
There are two other fields used to help identify people:
- The `comment` field is displayed under a person's name on their Anthology page. It can be any text; usually it lists past and current affiliations.
- If two people have the same canonical name, the Anthology automatically adds them to each other's "People with similar names" list. If there are other people who have almost the same name, you can add their IDs to the `similar` field.
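The "People with similar names" list can be sketched as follows. This Python version assumes every entry has an explicit `id`, accepts `similar` as either a single ID or a list of IDs, and treats a manual `similar` link as applying to both people's pages; the function name and that symmetry are assumptions, not documented Anthology behavior.

```python
from collections import defaultdict
from typing import Dict, List, Set


def similar_people(entries: List[dict]) -> Dict[str, Set[str]]:
    """Compute each person's "People with similar names" list.

    Entries sharing a canonical name are linked automatically; IDs listed in
    an entry's `similar` field are added on top.
    """
    by_name: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries:
        by_name[entry["canonical"]].add(entry["id"])
    result: Dict[str, Set[str]] = {entry["id"]: set() for entry in entries}
    for ids in by_name.values():
        for person_id in ids:
            result[person_id] |= ids - {person_id}
    for entry in entries:
        others = entry.get("similar", [])
        if isinstance(others, str):  # allow a single ID written as a scalar
            others = [others]
        for other in others:
            # Assumption: a manual `similar` link is shown on both people's pages.
            result[entry["id"]].add(other)
            result.setdefault(other, set()).add(entry["id"])
    return result
```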