GitHub - marktriggs/nla-browse-handler: VuFind browse Solr extension


=======================================================================
 Moved!
=======================================================================

 This repository has moved to a new home under the VuFind project.
 Find the new repository here:

   https://github.com/demiankatz/vufind-browse-handler

=======================================================================




       Care and feeding of the NLA Solr browse request handler

			     Mark Triggs

	       The National Library of Australia, 2009


0.  Background

 This Solr plugin was developed to support the browse functionality of
 the National Library of Australia's Catalogue
 (http://catalogue.nla.gov.au).  Please read the LICENSE file that
 accompanies this file for details regarding the distribution of this
 software.



1.  Compiling it

 You'll need Ant to get everything compiled:

  ant jars -Dsolr.war=/path/to/my/webapps/solr.war

 should give you the two required jar files:

   browse-handler.jar
   browse-indexing.jar

  

2.  Creating your browse indexes

 2.1.  Index your authority data

  This step creates a Lucene index of the "see also" and "use
  instead" linkages from the authority data.  Note that if you're using
  VuFind you can skip this step, because VuFind has its own authority
  index that we use instead.

    java -cp browse-indexing.jar IndexAuth /path/to/a/dump/of/your/authority-data.mrc authority.index


 2.2.  Create lists of headings for browsing

  Now we produce a list of the headings we want to browse over.  We want to browse on:

    * Any term that appears in a particular index of our bib data (e.g. subject-browse)

    * Any non-preferred term from our authority index whose preferred
      form is linked to from our bib data (i.e. appears in the above index).

  The PrintBrowseHeadings class does this: grabs headings from these
  sources, produces a sort key for each heading and prints out a big
  file with lines of the form:

    <Sort key>^A<Heading>
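
  To make the line format concrete, here is a small sketch that builds
  one such line by hand and splits it back apart on the ^A byte (0x01).
  The sort key and heading are invented for illustration; the real sort
  keys come from PrintBrowseHeadings and may look quite different.

```shell
# ^A is the single byte 0x01; printf with an octal escape is a portable way to get it.
sep=$(printf '\001')

# Hypothetical sample line: <Sort key>^A<Heading>
line="boats${sep}Boats."

# Split the line on the ^A separator.
sort_key=$(printf '%s\n' "$line" | cut -d "$sep" -f 1)
heading=$(printf '%s\n' "$line" | cut -d "$sep" -f 2-)

printf 'sort key: %s\nheading:  %s\n' "$sort_key" "$heading"
```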

  Running it:

    java -cp browse-indexing.jar PrintBrowseHeadings /path/to/your/bib/data/index subject-browse authority.index subjects.tmp
    java -cp browse-indexing.jar PrintBrowseHeadings /path/to/your/bib/data/index author-browse authority.index names.tmp

  By default this assumes you're using my default field names in your authority index, which are:

    * preferred (1xx)
    * insteadOf (4xx)

  If you're not, you can provide the field names using Java system properties
  on the above command lines.  For example, VuFind uses:

    -Dfield.preferred=heading -Dfield.insteadof=use_for


  Next we just need to remove any duplicates.  I do this using the GNU
  sort program from the command-line because it's amazingly fast even on
  big files:

    sort -T /var/tmp -u --field-separator=$'\1' -k1 subjects.tmp -o sorted-subjects.tmp
    sort -T /var/tmp -u --field-separator=$'\1' -k1 names.tmp -o sorted-names.tmp
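
  If you want to see what the de-duplication does before running it on
  a big file, the following sketch runs the same kind of sort over three
  invented lines, two of which are identical.  LC_ALL=C is set here only
  to make the order of the sample output predictable; it is an
  assumption of this example and not part of the commands above.

```shell
tmpdir=$(mktemp -d)
sep=$(printf '\001')   # the ^A separator byte

# Invented sample data: three lines, two of them identical.
printf 'boats%sBoats\nanchors%sAnchors\nboats%sBoats\n' "$sep" "$sep" "$sep" \
    > "$tmpdir/sample.tmp"

# Same options as the real commands, pointed at the sample file.
LC_ALL=C sort -T "$tmpdir" -u --field-separator="$sep" -k1 \
    "$tmpdir/sample.tmp" -o "$tmpdir/sorted-sample.tmp"

# Count the lines that survived.
wc -l < "$tmpdir/sorted-sample.tmp"
```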



 2.3.  Creating the SQLite DB

  The last step is to load all the headings into an SQLite database
  (which acts as the browse index, effectively).  CreateBrowseSQLite
  does this:

    java -cp browse-indexing.jar CreateBrowseSQLite sorted-names.tmp namesbrowse.db
    java -cp browse-indexing.jar CreateBrowseSQLite sorted-subjects.tmp subjectsbrowse.db


  And that's the indexing process.  At the end of this you should have
  one SQLite database per browse type, and an index of your authority
  data.  Everything else is disposable!
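
  Before throwing the temporary files away it can be worth a quick
  check that the outputs really are there.  The sketch below is one way
  of doing that and is not part of the distribution: check_outputs and
  its helpers are made-up names, it assumes the authority index is a
  directory (as Lucene indexes normally are), and it recognises an
  SQLite file only by the standard "SQLite format 3" header at the
  start of the file.

```shell
# Succeed if the file starts with the standard SQLite 3 header string.
is_sqlite_db() {
    [ -f "$1" ] && [ "$(head -c 15 "$1")" = "SQLite format 3" ]
}

# Succeed if the path is a directory containing at least one entry.
is_nonempty_dir() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Usage: check_outputs AUTHORITY_INDEX_DIR BROWSE_DB...
check_outputs() {
    auth_index=$1
    shift
    is_nonempty_dir "$auth_index" || {
        echo "missing or empty: $auth_index" >&2
        return 1
    }
    for db in "$@"; do
        is_sqlite_db "$db" || {
            echo "not an SQLite database: $db" >&2
            return 1
        }
    done
}
```

  For the files built above this would be called as:
  check_outputs authority.index namesbrowse.db subjectsbrowse.db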




3.  Configuring Solr

 3.1.  Jar files

  Now that we've got our indexes built, we just need to configure the
  browse request handler to use them.  Start by copying
  browse-handler.jar to Solr's lib directory:

    cp browse-handler.jar solr/WEB-INF/lib



 3.2.  Solr configuration

  Then configure your browse types in solrconfig.xml:

    <requestHandler name="/browse" class="au.gov.nla.solr.handler.BrowseRequestHandler">
       <str name="authIndexPath">/path/to/your/authority.index</str>
       <str name="bibIndexPath">/path/to/your/bib/data/index</str>

       <str name="sources">names,subjects</str>

       <!-- These definitions should match the field names used in the authority index. -->
       <str name="preferredHeadingField">preferred</str>
       <str name="useInsteadHeadingField">insteadOf</str>
       <str name="seeAlsoHeadingField">seeAlso</str>
       <str name="scopeNoteField">scopeNote</str>

       <lst name="names">
         <str name="DBpath">/path/to/your/namesbrowse.db</str>
         <str name="field">author-browse</str>
       </lst>

       <lst name="subjects">
         <str name="DBpath">/path/to/your/subjectsbrowse.db</str>
         <str name="field">subject-browse</str>
         <str name="dropChars">[]()',</str>
       </lst>
    </requestHandler>



 3.3.  Testing

  Finally, start up Solr and test that things are working:

    http://yourhost.example.com:8080/solr/browse?source=subjects&from=boats&rows=20
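
  If you find yourself testing several sources or starting points, a
  tiny helper for building the URL saves some typing.  This is only a
  sketch: browse_url is a made-up name, and it uses nothing beyond the
  source, from and rows parameters shown above.  The from value is
  inserted as-is, so it needs to be URL-safe already.

```shell
# Build a browse URL from a Solr base URL, source name, starting heading and row count.
browse_url() {
    printf '%s/browse?source=%s&from=%s&rows=%s\n' "$1" "$2" "$3" "$4"
}

url=$(browse_url "http://yourhost.example.com:8080/solr" subjects boats 20)
echo "$url"

# With Solr running, the URL can then be fetched with, for example:
#   curl "$url"
```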



4.  Running updates

 At the time of writing we are updating our authority and browse
 indexes once per night, at the same time as we update our bib indexes.
 The browse request handler has been designed to automatically detect
 updates to these indexes and reload them as required.  Once the new
 versions have been built, moving them into place is simple:

   mv mybrowse.db mybrowse.db.old; mv mybrowse.db.new mybrowse.db
   mv authority.index authority.index.old; mv authority.index.new authority.index
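
 For a nightly job it may be handy to wrap the two moves in a function.
 The following is a sketch rather than what we actually run: swap_in is
 a made-up name, and removing the previous .old copy first is an
 addition of this example (without it, moving a directory such as
 authority.index onto an existing authority.index.old directory would
 put it inside the old one instead of replacing it).

```shell
# Usage: swap_in TARGET
# Moves TARGET to TARGET.old and TARGET.new to TARGET.
swap_in() {
    target=$1

    # Refuse to do anything unless a freshly built copy is waiting.
    [ -e "$target.new" ] || {
        echo "nothing to swap in: $target.new" >&2
        return 1
    }

    # Drop the previous backup so the move below cannot land inside it.
    rm -rf "$target.old"

    if [ -e "$target" ]; then
        mv "$target" "$target.old"
    fi
    mv "$target.new" "$target"
}
```

 It would be called once per index, e.g. "swap_in mybrowse.db" followed
 by "swap_in authority.index".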