This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Funder identifier issue fix #3

Open
wants to merge 112 commits into base: master

Conversation

@MicheleMorelli MicheleMorelli commented Jan 14, 2019

Hello,

This is a small change: DataCite schema versions 4.0 and 4.1 specify that the funder's ID should map to funderIdentifier, not funderId.

https://schema.datacite.org/meta/kernel-4.0/doc/DataCite-MetadataKernel_v4.0.pdf
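As a rough illustration of the element naming the schema expects, the sketch below builds a fundingReference fragment with Python's standard library rather than the plugin's Perl. The function name `funding_reference` and the sample funder values are hypothetical; only the element and attribute names come from the DataCite 4.x schema.

```python
import xml.etree.ElementTree as ET


def funding_reference(funder_name, funder_id=None, id_type="Other"):
    """Build a DataCite 4.x <fundingReference> element (illustrative only)."""
    ref = ET.Element("fundingReference")
    ET.SubElement(ref, "funderName").text = funder_name
    if funder_id:
        # The schema names this element funderIdentifier, not funderId.
        ident = ET.SubElement(ref, "funderIdentifier")
        ident.set("funderIdentifierType", id_type)
        ident.text = funder_id
    return ref


# Hypothetical sample values, not taken from any real record.
xml_text = ET.tostring(
    funding_reference("Example Funder", "example-funder-id"),
    encoding="unicode",
)
print(xml_text)
```

The identifier is optional here, so a record with only a funder name still produces a valid-looking fragment without an empty funderIdentifier element.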

Thanks!
Michele

drtjmb and others added 30 commits December 19, 2014 15:58
thus taking advantage of config options
and datacite_core.pl code too
plus use of new config options
better name handling, use of new config etc
Added config param to check for DataCiteXML schema version #2 #3 (Ideally the event will ask which schema version DataCite will use, but for now we can set it in the config)
Updating the schema version to 4.0
goetzk and others added 30 commits September 27, 2018 09:57
If a document has no 'content' value $content will be empty and the
condition will generate a warning.
I attempted to include this in the original version of the changes but couldn't
get it working until I stumbled upon
https://wiki.eprints.org/w/index.php?title=My_First_Bazaar_Package and its
example Hello.pm.

Message now includes the matched regex and doi prefix in the error message,
bolded for readability.
"Text15" appears to have been copied verbatim from
DataCite-MetadataKernel_v4.0.pdf, where "Text" appears with superscript "15"
to indicate a footnote.

For anyone playing along at home, in v4.1 the relevant footnote is 18.
Offer WWW::Curl as an option for API connections

This was prompted by the LWP version in RHEL 6.10 being too old to handle SNI servers, which DataCite uses for its API.

For more on the LWP age issue and the curl updates which ensure it works, see these issues:
libwww-perl/LWP-Protocol-https#17 LWP SNI fix from 6.07
curl/curl#700 - Curl updates in RHEL 6.7 and 6.8

This branch/change makes curl opt-in, with LWP remaining the default, and should not change any existing behaviours (those breaking changes were in #23 and #33).

There are also some other changes which have come through as I've tried to keep various files (like configuration and README) in sync.
Somehow this went missing during my merge, adding back in.
In #16 (specifically cbf2a2c) the codebase was
changed to forcibly add the "Default publisher" value to all submissions as it
wasn't being (reliably) added. In the rework for 2.1.0 (I'm looking at
fd41e34 but its other commits are relevant
too) the behaviour changed and now publisher is added if available.

This has resulted in the unfortunate situation that:
* If you have no publisher, none is added
* If you do have a publisher, a second is added

Not the situation we want:
* If you have no publisher, one is added
* If you have a publisher, nothing changes
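The intended behaviour in the two lists above can be sketched as a single fallback rule. This is a Python illustration only, not the plugin's Perl code; `resolve_publishers` and its arguments are hypothetical names.

```python
def resolve_publishers(eprint_publisher, default_publisher):
    """Return the publisher values that should appear in the export.

    Exactly one publisher is emitted: the eprint's own if it has one,
    otherwise the configured default.
    """
    if eprint_publisher:
        return [eprint_publisher]
    return [default_publisher]


# No publisher on the eprint: the default is added.
print(resolve_publishers(None, "Example Repository"))
# Publisher present: nothing changes, and no second publisher is added.
print(resolve_publishers("Example Press", "Example Repository"))
```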
Turns out the correct way to conditionally import - use - a module is to eval()
it. I've now set up all modules required by DataCiteEvent to be loaded via
eval() and to print a log message if something is missing.
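For readers less familiar with the pattern, here is a Python analogue of loading optional modules and logging the ones that are missing. It is not the Perl eval() code from this commit; `load_optional`, the logger name, and the module names are assumptions chosen for illustration.

```python
import importlib
import logging

logger = logging.getLogger("datacite_event_sketch")  # hypothetical logger name


def load_optional(module_names):
    """Try to import each module; log and skip any that are unavailable."""
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as err:
            # A missing module is reported instead of aborting the caller.
            logger.warning("Optional module %s is unavailable: %s", name, err)
    return loaded


# "json" ships with Python; the second name is deliberately bogus.
modules = load_optional(["json", "module_that_does_not_exist_xyz"])
print(sorted(modules))
```

Callers can then check for a module's presence in the returned mapping before choosing a code path that depends on it.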
This change allows diagnosing the XML supplied to DataCite by logging the
entire lot rather than just mentioning the DOI.

Authentication wise this should be safe to do, as the username and password are
not included directly in the XML.
This file needs to be regenerated in its entirety but I'm updating its version
field as I'm about to tag the release candidate.
Unlike other mapping functions, funders assumed that its values were available
and required. This change makes checking eprints for funder and project
information optional to better reflect the standard and to facilitate a switch
from eprint value based attributes to mapping based attributes.
In essence, this change removes the dependence on specific eprints fields
having values when performing the XML export.

This expands on the change in 336a27c
("reworked Export plugin so that *any* EPrint field can be mapped if
corresponding sub is found in zzz_datacite_mapping") and removes the change
required in f8a3259 because now we run all
maps.

It moves from an 'eprint first' configuration to a 'mapping function first'
which permits:
- Setting default values (like Publisher)
- Choosing the eprint fields data comes from (You have a custom date field? no worries).
- Adding new values to output XML by adding a new mapping function
- Basically, running arbitrary code. But don't do that.

Of course, anyone adding new mappings or overriding existing mappings will need
to handle the resulting validation problems themselves...
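The 'mapping function first' idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the plugin's Perl: the function names, the `MAPPINGS` list, the dict-based eprint, and `DEFAULT_PUBLISHER` are all hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_PUBLISHER = "Example Repository"  # hypothetical config value


def map_publisher(eprint):
    # Falls back to a configured default when the eprint has no publisher.
    el = ET.Element("publisher")
    el.text = eprint.get("publisher") or DEFAULT_PUBLISHER
    return el


def map_publication_year(eprint):
    # Reads from a site-chosen field; returns None when there is no value.
    date = eprint.get("date")
    if not date:
        return None
    el = ET.Element("publicationYear")
    el.text = date[:4]
    return el


# Every registered mapping runs, whether or not the eprint field is set.
MAPPINGS = [map_publisher, map_publication_year]


def export(eprint):
    root = ET.Element("resource")
    for mapping in MAPPINGS:
        el = mapping(eprint)
        if el is not None:  # a mapping may decline to emit anything
            root.append(el)
    return root


print(ET.tostring(export({"date": "2018-09-27"}), encoding="unicode"))
```

Because each mapping decides for itself what to do with a missing value, adding a new output element only requires registering another function.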

This is the export components work required to complete #35; optional changes
to validation would facilitate the second part, but validation can be done via
per-site overrides.

Closes: #35
Firstly (in terms of execution order) is a change to the date validation to
make explicit the intent - that date and date type must be set and date type
must equal published.
I believe this also resolves a matching bug.

Secondly is a relaxing of the date requirements for datacite_mapping_date to
stop checking that date_type is published. If date type was unset this would pass
validation then cause an error to be printed due to an empty value being used
in string equality testing.
Further tightening the check in datacite_mapping_date was a possibility but
since validation is run first it seems reasonable that datacite_mapping_date be
more trusting (and may even make things more flexible for those who customise
their validation).
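The validation rule described in this commit (date and date type must both be set, and date type must equal published) might look like this in outline. It is a Python sketch rather than the actual Perl validation routine; `validate_date`, the dict-based eprint, and the message wording are assumptions.

```python
def validate_date(eprint):
    """Return a list of problems; an empty list means the date is acceptable."""
    problems = []
    if not eprint.get("date"):
        problems.append("date is not set")
    date_type = eprint.get("date_type")
    if not date_type:
        # Checked before the comparison so an unset value is never compared.
        problems.append("date_type is not set")
    elif date_type != "published":
        problems.append("date_type must be 'published'")
    return problems


print(validate_date({"date": "2018-09-27", "date_type": "published"}))
print(validate_date({"date": "2018-09-27"}))
```

Testing for an unset date type before the string comparison keeps the two failure cases distinct and avoids comparing against an empty value.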
corp_creators is a multi-value field but was being treated as a string; it's now
looped over and all corp_creators are added to <creators>.
I've just had a situation where the type of my eprint wasn't known in typemap
so things fell over at the API submission stage. I've added some validation in
this commit but haven't included the item type - I haven't checked if it's site
specific yet.
Minor wording change
Somehow this slipped through my transferring from dev to repository - this
particular validation routine uses 'eprint' not 'dataobj' as its eprint
identifier.
Now mentions the problem EPrint so looking through the submitted XML is no
longer required.
The original warning system meant a warning was generated on every import, even
if the relevant libraries weren't intended for use. By importing once we load
sub datacite_doi we can check the repository configuration and only warn
conditionally based on configuration.
Change SSL library imports to reduce verbosity
6 participants