Commit
Refactor LCSH to be a separate detector
Squash refactor
Squash refactor
Update metrics tests and fixtures
More tests
1 parent c533e30, commit 8693617
Showing 15 changed files with 199 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# frozen_string_literal: true | ||
| ||
class Detector | ||
  # Detector::Lcsh is a very rudimentary detector for the separator between levels of a Library of Congress Subject | ||
  # Heading (LCSH). These subject headings follow this pattern: "Social security beneficiaries -- United States" | ||
  class Lcsh | ||
    attr_reader :identifiers | ||
| ||
    def initialize(term) | ||
      @identifiers = {} | ||
      term_pattern_checker(term) | ||
    end | ||
| ||
    def self.record(term) | ||
      results = Detector::Lcsh.new(term.phrase) | ||
| ||
      results.identifiers.each_key do |_type| | ||
        Detection.find_or_create_by( | ||
          term:, | ||
          detector: Detector.where(name: 'LCSH').first | ||
        ) | ||
      end | ||
    end | ||
| ||
    private | ||
| ||
    def term_pattern_checker(term) | ||
      subject_patterns.each_pair do |type, pattern| | ||
        @identifiers[type.to_sym] = match(pattern, term) if match(pattern, term).present? | ||
      end | ||
    end | ||
| ||
    def match(pattern, term) | ||
      pattern.match(term).to_s.strip | ||
    end | ||
| ||
    def subject_patterns | ||
      { | ||
        separator: /(.*)\s--\s(.*)/ | ||
      } | ||
    end | ||
  end | ||
end | ||
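
The separator check in the detector above can be tried outside Rails. The following is a minimal sketch, not the commit's code: `LCSH_SEPARATOR` and `lcsh_identifiers` are hypothetical names introduced here, and `String#empty?` stands in for ActiveSupport's `present?` so the snippet runs in plain Ruby.

```ruby
# Standalone sketch of the separator check (plain Ruby, no Rails).
# LCSH_SEPARATOR and lcsh_identifiers are illustrative names, not part of the commit.
LCSH_SEPARATOR = /(.*)\s--\s(.*)/

# Returns { separator: "<matched text>" } when the phrase contains a
# whitespace-delimited "--", otherwise an empty hash. This mirrors the shape
# of the identifiers hash built by Detector::Lcsh.
def lcsh_identifiers(phrase)
  matched = LCSH_SEPARATOR.match(phrase).to_s.strip
  matched.empty? ? {} : { separator: matched }
end

p lcsh_identifiers('Space vehicles -- Materials -- Congresses')
p lcsh_identifiers('This one should--also not work')
```

Because the pattern requires whitespace on both sides of `--`, a heading with several levels still yields a single `:separator` entry, which is why `record` creates at most one Detection per term.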
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -16,6 +16,9 @@ isbn: | ||
issn: | ||
  name: 'ISSN' | ||
| ||
lcsh: | ||
  name: 'LCSH' | ||
| ||
pmid: | ||
  name: 'PMID' | ||
| ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# frozen_string_literal: true | ||
| ||
require 'test_helper' | ||
| ||
class Detector | ||
  class LcshTest < ActiveSupport::TestCase | ||
    test 'lcsh detector activates when a separator is found' do | ||
      true_samples = [ | ||
        'Geology -- Massachusetts', | ||
        'Space vehicles -- Materials -- Congresses' | ||
      ] | ||
| ||
      true_samples.each do |term| | ||
        actual = Detector::Lcsh.new(term).identifiers | ||
| ||
        assert_includes(actual, :separator) | ||
      end | ||
    end | ||
| ||
    test 'lcsh detector does nothing in most cases' do | ||
      false_samples = [ | ||
        'orange cats like popcorn', | ||
        'hyphenated names like Lin-Manuel Miranda do nothing', | ||
        'dashes used as an aside - like this one - do nothing', | ||
        'This one should--also not work' | ||
      ] | ||
| ||
      false_samples.each do |term| | ||
        actual = Detector::Lcsh.new(term).identifiers | ||
| ||
        refute_includes(actual, :separator) | ||
      end | ||
    end | ||
| ||
    test 'record method does relevant work' do | ||
      detection_count = Detection.count | ||
      t = terms('lcsh') | ||
| ||
      Detector::Lcsh.record(t) | ||
| ||
      assert_equal(detection_count + 1, Detection.count) | ||
    end | ||
| ||
    test 'record does nothing when not needed' do | ||
      detection_count = Detection.count | ||
      t = terms('isbn_9781319145446') | ||
| ||
      Detector::Lcsh.record(t) | ||
| ||
      assert_equal(detection_count, Detection.count) | ||
    end | ||
  end | ||
end | ||
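
The samples in these tests cover the common shapes. A few boundary inputs that the commit does not test can be probed with the same pattern in plain Ruby. This is a sketch only: `SEPARATOR_PATTERN` and `separator?` are hypothetical names, and whether each outcome is desirable for the detector is left open.

```ruby
# Boundary probes for the separator pattern (plain Ruby, no Rails).
# SEPARATOR_PATTERN and separator? are illustrative names, not part of the commit.
SEPARATOR_PATTERN = /(.*)\s--\s(.*)/

# True when the phrase contains "--" with a whitespace character on each side.
def separator?(phrase)
  SEPARATOR_PATTERN.match?(phrase)
end

probes = {
  "Geology\t--\tMassachusetts" => 'tabs count as whitespace for \s',
  "Geology \u2014 Massachusetts" => 'an em dash (U+2014) is not two hyphens',
  'Geology -- ' => 'trailing separator followed by a space',
  '-- Massachusetts' => 'leading separator with no whitespace before it'
}

probes.each do |phrase, note|
  puts "#{phrase.inspect}: #{separator?(phrase)} (#{note})"
end
```

The first and third probes match and the other two do not, so a phrase that ends in a dangling ` -- ` would still be recorded as a detection under this pattern.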