Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Camel Component
cbadenes/camel-oaipmh
For more details about OAI-PMH see the documentation: http://www.openarchives.org/pmh/

OAI-PMH Component

The oaipmh component is used for polling OAI-PMH data providers. By default, Camel polls the provider every 60 seconds.

Maven users will need to add the following dependency to their pom.xml for this component:

<dependency>
    <groupId>es.upm.oeg.camel</groupId>
    <artifactId>camel-oaipmh</artifactId>
    <version>x.x.x</version>
</dependency>

Note: The component currently supports only polling (consuming) from data providers.

Note: You must include this repository in your pom.xml:

<repositories>
    <!-- GitHub Repository -->
    <repository>
        <id>camel-oaipmh-mvn-repo</id>
        <url>https://raw.github.com/cbadenes/camel-oaipmh/mvn-repo/</url>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
        </snapshots>
    </repository>
</repositories>

URI format

oaipmh:oaipmhURI

Where oaipmhURI is the URI to the OAI-PMH data provider to poll.

You can append query options to the URI in the following format, ?option=value&option=value&...
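As an illustration of the format above, the following is a minimal sketch of assembling an endpoint URI from a provider address and a map of options. The class OaipmhEndpointUri and its build method are hypothetical helpers, not part of the component; the "oaipmh://" prefix follows the route examples in this README, and the sketch assumes option values need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of camel-oaipmh): builds an endpoint URI string.
final class OaipmhEndpointUri {

    private OaipmhEndpointUri() {
    }

    // providerUri is the data provider address without a scheme,
    // e.g. "example.org/oai". Options are appended in insertion order.
    // Assumption: values contain no '&' or '=' characters, so no escaping is done.
    static String build(String providerUri, Map<String, String> options) {
        String base = "oaipmh://" + providerUri;
        if (options.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return base + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("delay", "60000");
        options.put("metadataPrefix", "oai_dc");
        System.out.println(build("example.org/oai", options));
    }
}
```

The resulting string can be passed to from(...) in a route definition.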

Options

Property | Default | Description
-------- | ------- | -----------
delay | 60000 | Delay in milliseconds between each poll.
initialDelay | 1000 | Milliseconds before polling starts.
userFixedDelay | false | Set to true to use a fixed delay between polls; otherwise a fixed rate is used. See ScheduledExecutorService in the JDK for details.
verb | ListRecords | OAI-PMH verb to request. Future versions will handle ListIdentifiers, Identify, GetRecord, ListSets and ListMetadataFormats.
metadataPrefix | oai_dc | Specifies the metadataPrefix of the format that should be included in the metadata part of the returned records.
from | | Specifies a lower bound for datestamp-based selective harvesting. UTC DateTime value. After the first request, this value is updated to the current time if no upper bound is defined.
until | | Specifies an upper bound for datestamp-based selective harvesting. UTC DateTime value.
set | | Specifies set membership as a criterion for set-based selective harvesting.
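The from and until options take UTC DateTime values. A minimal sketch of producing such a value in Java follows, assuming the provider accepts the seconds-granularity form defined by the OAI-PMH specification (YYYY-MM-DDThh:mm:ssZ). The class OaiDatestamp is a hypothetical helper, not part of the component.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of camel-oaipmh): formats an instant as a
// UTC datestamp with seconds granularity, e.g. 2014-03-01T12:30:45Z.
final class OaiDatestamp {

    private static final DateTimeFormatter UTC_SECONDS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                    .withZone(ZoneOffset.UTC);

    private OaiDatestamp() {
    }

    // Fractions of a second are dropped by the pattern.
    static String format(Instant instant) {
        return UTC_SECONDS.format(instant);
    }

    public static void main(String[] args) {
        // Candidate value for the "from" option: the current time in UTC.
        System.out.println("from=" + format(Instant.now()));
    }
}
```

Some providers only support day granularity (YYYY-MM-DD); the Identify response of the provider states which one applies.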

Exchange data types

Camel initializes the IN body on the Exchange with a response message in XML format. For ListXX requests, Camel will return a message for each element of the list received.
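To make the "one message per list element" idea concrete, here is a sketch that walks a ListRecords response with the JDK DOM parser and collects the header identifier of each record. It is an illustration only, not the component's implementation; the class ListRecordsSketch and the sample identifiers used with it are hypothetical. The namespace is the standard OAI-PMH 2.0 namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch (not the component's code): lists the identifier of
// every <record> in an OAI-PMH ListRecords response.
final class ListRecordsSketch {

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    private ListRecordsSketch() {
    }

    static List<String> recordIdentifiers(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList records = doc.getElementsByTagNameNS(OAI_NS, "record");
            List<String> identifiers = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                // Only the header identifier lives in the OAI namespace;
                // dc:identifier in the metadata part is not matched here.
                NodeList ids = record.getElementsByTagNameNS(OAI_NS, "identifier");
                identifiers.add(ids.getLength() > 0
                        ? ids.item(0).getTextContent().trim()
                        : "");
            }
            return identifiers;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable OAI-PMH response", e);
        }
    }
}
```

Each entry in the returned list corresponds to one record, which is the unit that would arrive as a separate message body in a route.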

OAI-PMH Data Format

The oaipmh component ships with an OAIPMH data format that can be used to convert between String (XML) and the OAIPMHType model object (JAXB).

  • marshal = from OAIPMHType to XML String
  • unmarshal = from XML String to OAIPMHType

More details about these XSDs here.

A route using this would look something like this:

from("oaipmh://aprendeenlinea.udea.edu.co/revistas/index.php/ingenieria/oai?delay=60000")
    .unmarshal().jaxb("es.upm.oeg.camel.oaipmh.model")
    .to("mock:result");

The purpose of this feature is to make it possible to use Camel's lovely built-in expressions for manipulating OAI-PMH messages. As shown below, an XPath expression can be used to filter the OAI-PMH message:

from("oaipmh://aprendeenlinea.udea.edu.co/revistas/index.php/ingenieria/oai?delay=60000")
    .unmarshal().jaxb("es.upm.oeg.camel.oaipmh.model")
    .filter().xpath("//item/request/set[contains(.,'physics')]")
    .to("mock:result");
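To check what an XPath filter like the one above would accept before wiring it into a route, the expression can be evaluated against a sample document with the JDK XPath API. The sketch below is standalone and does not use Camel; the class XPathFilterSketch and the sample XML are hypothetical, and the sample is shaped only to match the element names in the expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: evaluates an XPath expression as a boolean, which is
// true when the expression selects at least one node.
final class XPathFilterSketch {

    private XPathFilterSketch() {
    }

    static boolean matches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or XML input", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<item><request><set>physics:hep</set></request></item>";
        String expr = "//item/request/set[contains(.,'physics')]";
        System.out.println(matches(xml, expr));
    }
}
```

Note that unprefixed names in an XPath expression match only elements in no namespace, so real OAI-PMH responses, which declare a default namespace, would need namespace-aware expressions.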

This work is funded by the EC project DrInventor (www.drinventor.eu).
