Need to support multiple author affiliations #7135

Open
kmccurley opened this issue Jun 16, 2021 · 19 comments
Labels
Enhancement:1:Minor A new feature or improvement that can be implemented in less than 3 days.

@kmccurley

kmccurley commented Jun 16, 2021

Apologies if this is already present in another issue or an internal development plan. I searched in issues but could not find anything related to multiple affiliations for an author.

A recent study of 22 million articles published in 2019 showed that "almost one in three publications was (co-)authored by authors with multiple affiliations..." and "the share of authors with multiple affiliations increased from around 10% to 16% since 1996." The fact that OJS does not support multiple affiliations for authors means it is increasingly out of step with the realities of academic publishing, and my organization is reluctant to continue using OJS for this reason (among others).

I believe that an author should be able to specify multiple affiliations for a submission. This meshes nicely with the need to uniquely identify affiliations through the use of ROR identifiers. The use of a free text field alone for affiliations makes it difficult for machines to determine that "UC Berkeley" and "University of California - Berkeley" and "University of California, Berkeley" are in fact the same institution.

Having just installed the latest version of OJS, I noticed that affiliation information is stored in the underlying database as a row in author_settings using setting_name of affiliation, but the underlying table has a unique key of (author_id,locale,setting_name), which makes it impossible to store multiple affiliations unless the information is encoded in some way within the setting_value field. Our authors are currently listing multiple affiliations with ; to separate them, but this is a bad practice for the future (much like journals entering bogus email addresses when the field was required).

As I mentioned before, the use of multiple affiliations is already an extremely common practice. The listing of affiliations has multiple purposes, including citation analysis to rank institutions, which strongly affects their funding. The attachment of an affiliation to a paper also strongly influences the reputation of the paper itself, and the inability to list multiple affiliations contributes to a "winner-take-all" attribution of credit, which is damaging to second-tier institutions and their authors.

Accuracy of affiliations is also important for identifying potential conflict of interest among reviewers.

Multiple affiliations are already supported by the following:

  1. the schema for the native plugin XSD allows multiple affiliation tags per author, but when you import to OJS it appears to discard this information. Export likewise cannot report multiple affiliations, since the database only holds one.
  2. the pubmed/medline XML format
  3. the doaj xml format
  4. the datacite xml format
  5. the dublin core supports multiple affiliations, but Recommendation 6 strangely says they should be associated to papers rather than authors. Dublin core is lagging in other things like ORCID IDs.
  6. the crossref api supports them (see their schema)
  7. the DOAJ schema supports it, but the DOAJ export plugin is limited to a single affiliation. They are now compatible with crossref.
  8. medra/onix but the medra/ONIX export plugin cannot supply them.

Obviously other publishers have already embraced multiple affiliations. ACM has started capturing structured representations in their LaTeX class:

\affiliation{%
\institution{University of New South Wales}
\department{School of Biomedical Engineering}
\streetaddress{Samuels Building (F25), Kensington Campus}
\city{Sydney}
\state{NSW}
\postcode{2052}
\country{Australia}}

This can be useful in case the affiliation does not have a ROR ID, or the author wishes to define it as within a department or institute of a ROR entity (ROR does not catalogue these).

Unfortunately the schema of having a single affiliation is buried deeply in the codebase for OJS. Obviously the core developers of OJS are best able to understand a path forward for addressing this. One possible interim solution is to define a new field in author_settings with setting_name of affiliationList, then populate this with a JSON encoding that can have version information inside it. The code that uses $author->getAffiliation can over time be migrated to $author->getAffiliationList() to return a list of affiliations (perhaps with different locales!). An alternative is to allow author_settings to have multiple values for a given setting_name.
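
To make the interim idea concrete, here is a minimal Python sketch (an illustration only, not OJS code) of a versioned JSON payload for the proposed affiliationList setting. The field names `version`, `affiliations`, `name`, and `ror` are assumptions made for this example, as is the fallback that splits a legacy `;`-separated string.

```python
import json

SCHEMA_VERSION = 1  # hypothetical version tag stored inside the payload


def encode_affiliation_list(affiliations):
    """Serialize a list of affiliation dicts into one setting_value string."""
    return json.dumps({"version": SCHEMA_VERSION, "affiliations": affiliations})


def decode_affiliation_list(setting_value):
    """Return a list of affiliation dicts from a stored value.

    Falls back to splitting a legacy free-text string on ';' when the value
    is not a JSON payload in the expected shape.
    """
    try:
        payload = json.loads(setting_value)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict) and "affiliations" in payload:
        return payload["affiliations"]
    # Legacy free text: one entry per ';'-separated part, no identifier.
    return [{"name": part.strip(), "ror": None}
            for part in setting_value.split(";") if part.strip()]
```

Because the version travels inside the value, a reader of the setting could change the shape later without touching the unique key on (author_id, locale, setting_name).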


PRs:

@asmecher
Member

(Somewhat cross-posted: #5912 (comment))

@kmccurley, we often support a "dual-track" toolset for features in OJS:

  • a bone-simple approach that's usable by pretty much anyone (sort of like the balance struck with DC, which is universally applicable but rarely specific enough), and
  • a more thoroughbred approach for those who need it and can live with the additional constraints that are often involved.

For example, the built-in OJS search engine works out of the box but is not very feature-rich and has limited scalability; the Lucene/SOLR plugin is available for those who need the additional tools and have the capacity to run the necessary service.

I'm hesitant to make the current "affiliation" field any smarter than it currently is because of some inherent constraints:

  • There is no disambiguation in author records, i.e. if Jane Smith authors two articles in the same OJS installation, there will be two Jane Smith records. (This is done so that an individual's profile can "evolve" over time, but those changes won't affect past authorships that may have been done at a different institution or under a different name, which should preserve the value at time of publication.)
  • The affiliation field is not machine-readable, so even if it were extended to support multiple institutions, its functionality would be limited anyway. (An author could already enter "Simon Fraser University; Stanford University" in the existing field, which is only slightly less helpful than having them stored in an array.)

I think machine-readability is a precursor to the work you propose, which means working with RORs directly (as you mentioned before) or possibly via the ORCID API and the author's ORCID record.

For your own use case, are RORs or author ORCIDs a workable approach?

@kmccurley
Author

Let's stay focused on the issue at hand: multiple affiliations per author. That's a critical deficiency of the OJS schema for what is stored about an author.

The reason I mentioned the other issues is partly because OJS is falling behind, and when you make a schema change, you should anticipate all of the requirements for publishing. Machine-readable metadata is absolutely necessary for any serious publishing platform, because all reputation scores are based upon it, and research funding agencies are increasingly demanding it. As I mentioned earlier, almost every metadata format for publishing now supports multiple affiliations.

@NateWr NateWr added the Enhancement:1:Minor A new feature or improvement that can be implemented in less than 3 days. label Jun 16, 2021
@NateWr
Contributor

NateWr commented Jun 16, 2021

As @asmecher says, we are likely to pursue support for multiple affiliations through extensions to the ROR plugin and/or the ORCID plugin. That's because these approaches offer the possibility to support affiliation disambiguation and machine-readability.

Our schema is extensible using plugins, which means that we can store a single affiliation record by default:

University of Bern, University of Pisa

And plugins can enrich that by storing additional data alongside author records:

[
  {
    "ror": "03rjyp183",
    "name": {
      "en_US": "University of Bern",
      ".._..": "..."
    }
  },
  {
    "ror": "03y4dt428",
    "name": {
      "en_US": "University of Pisa",
      ".._..": "..."
    }
  }
]

That can then be used to enrich records sent to downstream consumers, like Crossref, Datacite, etc. Keeping the plain text field as a base provides flexibility that's important to satisfy all of the different use cases of our community, as you can see in this example.
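
As a rough illustration of that enrichment step, the Python sketch below (not OJS or plugin code) turns the structured data into per-affiliation records for an export, and falls back to the plain-text field when no structured data exists. The function name, the output keys, and the locale fallback are assumptions for this example; only the `ror` and `name` keys come from the sample above.

```python
def affiliation_records(plain_text, enriched, locale="en_US"):
    """Build per-affiliation records for a downstream export.

    Uses plugin-provided structured data when present; otherwise falls back
    to the plain-text affiliation as a single record without an identifier.
    """
    if not enriched:
        return [{"name": plain_text, "ror_url": None}]
    records = []
    for entry in enriched:
        names = entry["name"]
        # Prefer the requested locale, else any available name,
        # else the plain-text field.
        name = names.get(locale) or next(iter(names.values()), plain_text)
        records.append({"name": name,
                        "ror_url": "https://ror.org/" + entry["ror"]})
    return records
```

With the two entries above, this would yield one record per institution, each with its own ROR URL, while an author with no plugin data still exports the original string unchanged.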

@kmccurley
Author

I've spent some time reading the plugin documentation, looking at other plugins, and reading the core code. I've concluded that it's probably not trivial to write a plugin, and may not even be possible without modifying the core code. The problem is that there are many parts of the code that depend upon $author->getAffiliation() returning a string. The whole point would be to capture more sensible metadata, but that means every other plugin that exports metadata would need to be modified. That includes doaj, native, googleScholar, users, ROR, and perhaps others. This would introduce far too many dependencies between existing plugins.

I'm surprised that nobody has flagged this before, given the reality of publishing practice. It sounds like it won't happen soon and I should look toward developing our own alternative.

@asmecher
Member

Heads-up that we're likely going to be implementing multiple affiliation support in the core (OJS, OMP, OPS) as part of the work to integrate ROR support into the applications. The thinking goes like this:

  • It'll probably never be a good idea to force affiliation data to conform to ROR. (Legacy data; incomplete ROR organization list; missing translations on ROR data; disagreements over appropriate level of affiliation specificity; etc.)
  • So, we'll have to support a mix of ROR and plain-text affiliation data.
  • Users will expect to be able to select multiple affiliations when using RORs, however...
  • It would be confusing and incomplete to support multiple affiliations for RORs (necessary) but not for plain-text affiliations.

@kmccurley
Author

This is good news. While you're making a change, it's worth thinking about why you collect affiliations at all. Possible reasons are:

  1. for reporting to institutions and funding agencies who want to track their publications. crossref provides search on this. This is where ROR becomes important, because free text search can have too many false positives and negatives (e.g., what is USC?).
  2. for readers to see the affiliation information on the website. This is where free-text becomes important, because there are a large number of relationships to institutions that can only be expressed in text (e.g., "work started while visiting" or joint appointments in two departments, etc).
  3. for alignment to other schemas that might be required in a publisher's workflow. Notable examples include JATS and crossref.

I'd particularly recommend looking over the JATS aff tag and the crossref institution tag to see how they structure affiliations and their identifier. They have thought carefully about how to structure this information.

This is also related to how authors express their funding relationships, which is different than an affiliation. crossref has announced that they will be transitioning from using their Open funder registry to ROR, so ROR identifiers will be useful there.

@Devika008

Hello,

Here's my proposal for adding multiple affiliations for authors and users in OJS. This includes support for both ROR and non-ROR affiliations.

You can view the workflow here: https://youtu.be/FHwF4yBwzEA

Some considerations:

  1. As the user types the name of the institute, suggestions from ROR-affiliated institutes will appear. ROR-affiliated institutes will be marked with the ROR logo. A URL symbol in the dropdown will link to the institute's page for verification.
  2. When a ROR-affiliated institute is selected, the multilingual/translations section will indicate that all translations are complete, as they will be pulled from the ROR database. These fields will be non-editable.
  3. After clicking "Add," the institute will be added, and a new row will appear, allowing the user to input another affiliation.
  4. If no suggestions appear while typing, the institute is non-ROR affiliated. The user can still input the institute, but they will need to provide translations for the multilingual section.
  5. Users can add as many affiliations as needed.

@asmecher @GaziYucel @bozana please add more considerations if I have missed any

@kmccurley
Author

I'm not a member of the OJS development team, but I think your UI looks quite nice.

The goal of any UI is to help the user complete the task at hand with minimal fuss. There are several side constraints to consider here:

  1. how much is preloaded in the browser? The ROR database is quite large (47MB for our JSON version). We started preloading a snapshot of minimal information, but even that proved to be too much so we switched to a server-side search index that responds very quickly (50ms or so) so we can hit it with every keystroke (and kill previous attempts that are still in flight). It has to respond fast (<100ms) or else it will annoy the user.
  2. We use a variant where our dropdown also shows acronyms and alt names for an organization. This can help the user to disambiguate the alternatives. If someone types UCSD, the response has the full name and acronym. A query like USC is really ambiguous so it helps to show more information to the user. Ideally you would select the language of the user, but in our case we only deal with English.
  3. The ROR database has relationships in it (children, parents, and related) and a variety of "alt names". It's not easy to show all of these in a dropdown, so people may choose the lowest common denominator. If you take the example of UCSD, it has six children organizations so they could be considered a match for the query UCSD.
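
The acronym and alt-name matching described in this list can be sketched in a few lines of Python. This is a toy in-memory version under stated assumptions, not the server-side index mentioned above: the `Org` type, the ranking order, and the `placeholder-*` identifiers are invented for illustration and are not real ROR IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Org:
    ror_id: str  # placeholder values below, not real ROR identifiers
    name: str
    acronyms: list = field(default_factory=list)
    aliases: list = field(default_factory=list)


def search(orgs, query, limit=10):
    """Case-insensitive match against names, acronyms, and aliases.

    Exact acronym hits rank first, then prefix hits, then substring hits.
    """
    q = query.strip().lower()
    if not q:
        return []
    ranked = []
    for org in orgs:
        labels = [org.name] + org.aliases
        if any(a.lower() == q for a in org.acronyms):
            rank = 0
        elif any(text.lower().startswith(q) for text in labels):
            rank = 1
        elif any(q in text.lower() for text in labels):
            rank = 2
        else:
            continue
        ranked.append((rank, org.name, org))
    ranked.sort(key=lambda item: item[:2])  # by rank, then by name
    return [org for _, _, org in ranked[:limit]]


def label(org):
    """Dropdown label that shows acronyms to help disambiguate."""
    if org.acronyms:
        return f"{org.name} ({', '.join(org.acronyms)})"
    return org.name


ORGS = [
    Org("placeholder-1", "University of Southern California", ["USC"]),
    Org("placeholder-2", "University of South Carolina", ["USC"]),
    Org("placeholder-3", "University of California, San Diego", ["UCSD"],
        ["UC San Diego"]),
]
```

A query such as `search(ORGS, "usc")` returns both USC candidates, which is the ambiguity the dropdown has to surface rather than hide.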

One of our sites uses a dropdown but encourages users to click on something to refine their query. We opted for this because we wanted to tell users how to encode the most accurate information possible into their LaTeX document. I think you could figure out how to merge this into the OJS workflow, but perhaps it's too complicated to drill down on the relationships of an organization. I think it just illustrates how complex the choice is for an author to select an affiliation from the ROR database.

@mpbraendle
Contributor

It's now three years since this issue was opened, and there is still no progress. As @kmccurley pointed out in his OP, it is a clear requirement due to publishing practice, especially for journals in the medical and science fields (we have many example articles where we need a specific separator such as "; " to distinguish multiple affiliations, and we split on that separator, e.g. for PubMed export).

This issue needs to be relabelled as a major enhancement and a milestone should be set.

As long as this is not solved, I can't recommend to my teams to install the ROR plugin (although ROR is a fine solution for organization disambiguation).

@asmecher
Member

@mpbraendle, development on this is currently underway as part of the RoR integration into OJS 3.5. @GaziYucel, maybe you can share a couple quick details?

@GaziYucel
Contributor

Hi @mpbraendle, thank you for your interest.

As @asmecher pointed out above, I am currently working on this. The plan is to ship this with the OJS 3.5 release.

You can view the workflow here: https://youtu.be/FHwF4yBwzEA

The PR where I referenced this issue is part of the ROR / multiple affiliations integration into the core. This PR is solely to get the ROR data dump into the OJS database. This will be used for lookups, because using the ROR API for lookups seems too slow. This will add approximately 40MB to the database, refreshed bi-weekly.

If you are interested in the development flow, this is the branch I am working on https://github.com/GaziYucel/pkp-lib/tree/multiple-author-affiliations

We decided to implement the new UI as you can see in the video, which we think is much better than before. This will make the interface more future-proof and more accessible.

@bozana
Collaborator

bozana commented Nov 7, 2024

Hi all,
I will document here the 'difficult/unclear' cases of multiple affiliation and ROR export, and how we currently decided to solve it:

<subfield code="u">affiliation 1</subfield> 
<subfield code="u">affiliation 2</subfield> 
<subfield code="u">https://ror.org/rorId</subfield>
  • for oai_rfc1807 we will remove the affiliation fully -- earlier we provided the affiliation together/concatenated with the author full name (in the element author), which is not correct according to this document https://www.rfc-editor.org/rfc/rfc1807, and it seems there is no other element for author affiliations.
  • in the article report plugin we will use one column for all affiliations (separated by ';')
  • native import/export schema will be extended to support multiple and ROR affiliations: We will introduce the additional element rorAffiliation for the author, which could occur several times and which would contain the element ror and the localized element name. The existing localized element affiliation could be used for the author's affiliations without ROR.
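
For readers who want to picture the extended native schema, here is a small Python sketch that builds an author element along the lines described. Only the element names rorAffiliation, ror, name, and affiliation come from the description above; the `locale` attribute, the use of a full ROR URL as the ror value, and the function itself are assumptions for illustration, and the real schema may differ.

```python
import xml.etree.ElementTree as ET


def author_affiliations_xml(ror_affiliations, plain_affiliation):
    """Build an <author> element carrying ROR and non-ROR affiliations.

    ror_affiliations: list of (ror_value, {locale: name}) tuples.
    plain_affiliation: {locale: text} for an affiliation without a ROR.
    """
    author = ET.Element("author")
    for ror_value, names in ror_affiliations:
        # One repeatable rorAffiliation element per ROR-backed affiliation.
        ror_aff = ET.SubElement(author, "rorAffiliation")
        ET.SubElement(ror_aff, "ror").text = ror_value
        for locale, name in names.items():
            # The 'locale' attribute name is an assumption for this sketch.
            ET.SubElement(ror_aff, "name", {"locale": locale}).text = name
    for locale, text in plain_affiliation.items():
        ET.SubElement(author, "affiliation", {"locale": locale}).text = text
    return author
```

Serializing the result with `ET.tostring(author, encoding="unicode")` gives a fragment that can be compared against the eventual XSD.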

@GaziYucel
Contributor

GaziYucel commented Nov 7, 2024

Today I analysed the ROR data dump again. Out of 111325 entries, 32641 entries have "no_lang_code". This means that ROR does not know the language for almost a third. This is much worse than I previously thought. This introduces a problem localizing the data.

I evaluated several possibilities:

  1. replace "no_lang_code" with a default locale, for example English (en);
  2. use the country code to find the language (all country codes are filled in);
  3. use ror_display_name if language is unknown, ignore language.

The first option will add more noise to the data dump. (thanks Bozana)

In my opinion, the second option is something ROR should do for:

  • country codes with a Latin language;
  • countries with a single language.

For those entries it is reasonable to assume that the name is in the country's primary language.

For our end, I think the last one is the best solution.

If anyone is interested, a recent dump can be found here: https://zenodo.org/api/records/14020449/files/v1.55-2024-10-31-ror-data.zip/content
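
The third option (fall back to ror_display_name when no name is tagged with the wanted language) could look like the Python sketch below. The function name and the shape of the `names` mapping are assumptions for illustration, and the sample names in the usage are made-up test data rather than values taken from the dump.

```python
def pick_display_name(names, display_name, locale):
    """Choose the organization name to show for a UI or form locale.

    names: {lang_code: name}; entries whose language ROR does not know
           are keyed 'no_lang_code', mirroring the data dump.
    display_name: the record's ror_display_name.
    """
    # Use a localized name only when ROR explicitly tags it with the locale;
    # never guess a language for 'no_lang_code' entries.
    if locale in names:
        return names[locale]
    return display_name
```

For example, `pick_display_name({"no_lang_code": "Name A"}, "Name B", "de")` returns "Name B": the untagged name is kept in the data but never shown as if it were German.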

@Devika008

I too think the last option detailed by @GaziYucel seems like our best bet.

@bozana
Collaborator

bozana commented Nov 8, 2024

Hi @GaziYucel, I understand the ror_display_name = name in ror_display_lang, correct?
We were saving/inserting the column 'names.types.label' and the 'ror_display_lang'. -- I believe this is what you are importing now (when I went through the code). Do you suggest that we save/insert ror_display_name, instead of ror_display_lang? -- I do not think we would need both... 🤔
When we need to match the ROR name with our UI and form locales: Let's say we have UI or form locale DE but there is no ROR name in DE; we would then take the ROR name in the ror_display_lang. I believe this is the same as what you suggest, to then take/display the ror_display_name, no?
So I think it is the same if we are saving ror_display_lang or ror_display_name. When something changes in ROR DB both columns should be accordingly changed, I suppose.

If we inserted ror_display_name, then we would maybe not need to insert the names in no_lang_code. Let's think about whether this would be better (e.g. for performance) or whether we could then miss something... 🤔

And yes, we should not try to change or interpret/guess anything, and just take the ROR DB as it is.

@bozana
Collaborator

bozana commented Nov 8, 2024

And when we are speaking about columns that we would like to have locally: what about the column 'active'? Are there any inactive RORs? And what does this exactly mean?
We do not consider the column anywhere. How should we consider it? E.g., should we allow new affiliations with inactive RORs?

@GaziYucel
Contributor

@bozana I thought about inserting the ror_display_name into rors table, but this would be a clone of what is in ror_settings table. And we are already querying ror_settings table, I don't think we will gain much in performance if any. I think this is correct as it is.

According to ROR, the name for ror_display_lang should also be in the names (names.types.label) field.
I found out that this is not always the case, so I check for it and otherwise add a new row in ror_settings with ror_display_lang => ror_display_name.

@GaziYucel
Contributor

GaziYucel commented Nov 8, 2024

Regarding isActive.

I forgot to mention this in the questions/notes I wrote in the PR.

This field can have three values: active, inactive or withdrawn. See https://ror.readme.io/docs/data-structure#status.
Currently active (value 1) and inactive/withdrawn (value 0) are copied to the rors table and, as you already noticed, we don't do anything with the value.

I didn't fully implement this, because I couldn't figure out how to use these values for rors tables and the already saved affiliations for authors.

Statistics:

  • active: 109340 records
  • inactive: 742 records
  • withdrawn: 1243 records

Total: 111325 records (2024-10-31 data dump)
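
As a quick cross-check of these numbers, the Python sketch below collapses the three ROR status values into the 1/0 flag described above and tallies the reported counts. The function name is an assumption for illustration; the counts are the ones listed for the 2024-10-31 data dump.

```python
def is_active_flag(status):
    """Collapse ROR's three status values into the 1/0 flag stored locally."""
    if status not in {"active", "inactive", "withdrawn"}:
        raise ValueError(f"unexpected ROR status: {status!r}")
    return 1 if status == "active" else 0


# Counts reported above for the 2024-10-31 data dump.
counts = {"active": 109340, "inactive": 742, "withdrawn": 1243}
flag_one = sum(n for s, n in counts.items() if is_active_flag(s) == 1)
flag_zero = sum(n for s, n in counts.items() if is_active_flag(s) == 0)
total = flag_one + flag_zero
```

The two buckets add up to the stated total, so collapsing the statuses loses only the distinction between inactive and withdrawn.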

@bozana
Edit:
I have been thinking about this and want to propose the following.

I will update the rors table as it is now: all active entries get a "1", all others a "0". On the contributor screen, I will filter out everything that is not active. The result is that only active entries can be added; "inactive" and "withdrawn" ones are ignored.

All affiliations, which are already added to authors will be left alone. This way, we will follow ROR again, because this will mean that older publications will be historically accurate.

During an upgrade from an earlier version, the ROR cache is filled (tables rors and ror_settings). After this, a migration is run for all affiliations currently saved for the authors. First, each affiliation is compared against the ROR names, looking for an exact match. Everything that remains is then migrated as a non-ROR affiliation.
I want to propose to ignore the isActive field for these, because this will also reflect an accurate history. This is how it is implemented now.
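
A minimal Python sketch of the two rules proposed here, under stated assumptions: the function names, the dict shapes, and the `is_active` key are invented for illustration and do not reflect the actual pkp-lib migration code.

```python
def migrate_affiliation(text, ror_names):
    """Migrate one legacy free-text affiliation.

    ror_names: {organization name: ROR id}, built from the local ROR cache.
    Returns a ROR-backed affiliation on an exact name match, otherwise a
    plain-text (non-ROR) affiliation. The status flag is deliberately not
    consulted, so historical affiliations stay as they were.
    """
    ror_id = ror_names.get(text)
    if ror_id is not None:
        return {"ror": ror_id, "name": None}
    return {"ror": None, "name": text}


def selectable_for_new_affiliation(rors):
    """Only active entries are offered when adding a new affiliation."""
    return [r for r in rors if r["is_active"] == 1]
```

Keeping the two functions separate mirrors the proposal: the status only gates what can be newly selected, while migration of existing data ignores it.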

@bozana
Collaborator

bozana commented Nov 11, 2024

Hi @GaziYucel, let me check something, then I will come back to you again regarding ROR statuses.
Otherwise, yes, ideally the inactive and withdrawn ones could not be selected for new affiliations. However, for existing affiliations I believe we will need to display them -- else the user will see an empty field.

@mreiko mreiko moved this from Backlog to Under Development in Metadata and Distribution Nov 12, 2024
Projects
Status: Under Development
Status: In Progress
7 participants