Commit
Refactor LCSH to be a separate detector
Squash refactor. Update metrics tests and fixtures.
1 parent
c533e30
commit 6592996
Showing 14 changed files with 152 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# frozen_string_literal: true | ||
|
||
class Detector | ||
# Detector::Lcsh is a very rudimentary detector for the separator between levels of a Library of Congress Subject | ||
# Heading (LCSH). These subject headings follow this pattern: "Social security beneficiaries -- United States" | ||
class Lcsh | ||
attr_reader :identifiers | ||
|
||
def initialize(term) | ||
@identifiers = {} | ||
term_pattern_checker(term) | ||
end | ||
|
||
def self.record(term) | ||
lcsh = Detector::Lcsh.new(term.phrase) | ||
|
||
# One detection is recorded per matched pattern; the key itself is not needed. | ||
lcsh.identifiers.each_key do | ||
Detection.find_or_create_by( | ||
term:, | ||
detector: Detector.where(name: 'LCSH').first | ||
) | ||
end | ||
end | ||
|
||
private | ||
|
||
def term_pattern_checker(term) | ||
subject_patterns.each_pair do |type, pattern| | ||
matched = match(pattern, term) | ||
@identifiers[type.to_sym] = matched if matched.present? | ||
end | ||
end | ||
|
||
def match(pattern, term) | ||
pattern.match(term).to_s.strip | ||
end | ||
|
||
def subject_patterns | ||
{ | ||
separator: /(.*)\s--\s(.*)/ | ||
} | ||
end | ||
end | ||
end |
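To make the detector's matching rule easier to review, here is a minimal standalone sketch of the same separator pattern, runnable without Rails. The constant `LCSH_SEPARATOR` and the helper `lcsh_separator_match` are hypothetical names used only for illustration and are not part of this commit. The sketch swaps ActiveSupport's `present?` for a plain `empty?` check.

```ruby
# frozen_string_literal: true

# Hypothetical standalone sketch; these names do not appear in the commit.
# Same pattern as Detector::Lcsh#subject_patterns: a double hyphen with
# whitespace on both sides.
LCSH_SEPARATOR = /(.*)\s--\s(.*)/

# Returns the stripped matched text, or nil when no separator is found.
def lcsh_separator_match(phrase)
  # MatchData#to_s gives the whole match; nil.to_s gives '' when nothing matched.
  matched = LCSH_SEPARATOR.match(phrase).to_s.strip
  matched.empty? ? nil : matched
end

puts lcsh_separator_match('Geology -- Massachusetts').inspect
puts lcsh_separator_match('This one should--also not work').inspect
```

Because the pattern requires whitespace on both sides of the double hyphen, a bare `--` or a single spaced hyphen does not count as a separator.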
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,9 @@ isbn: | |
issn: | ||
name: 'ISSN' | ||
|
||
lcsh: | ||
name: 'LCSH' | ||
|
||
pmid: | ||
name: 'PMID' | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'test_helper' | ||
|
||
class Detector | ||
class LcshTest < ActiveSupport::TestCase | ||
test 'lcsh detector activates when a separator is found' do | ||
true_samples = [ | ||
'Geology -- Massachusetts', | ||
'Space vehicles -- Materials -- Congresses' | ||
] | ||
|
||
true_samples.each do |term| | ||
actual = Detector::Lcsh.new(term).identifiers | ||
|
||
assert_includes(actual, :separator) | ||
end | ||
end | ||
|
||
test 'lcsh detector does nothing in most cases' do | ||
false_samples = [ | ||
'orange cats like popcorn', | ||
'hyphenated names like Lin-Manuel Miranda do nothing', | ||
'dashes used as an aside - like this one - do nothing', | ||
'This one should--also not work' | ||
] | ||
|
||
false_samples.each do |term| | ||
actual = Detector::Lcsh.new(term).identifiers | ||
|
||
refute_includes(actual, :separator) | ||
end | ||
end | ||
end | ||
end |
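The three-level sample in the tests above is worth a closer look, because the pattern's first group is greedy. The following sketch, assuming plain Ruby with no Rails, shows what the two capture groups would hold for the test samples. `LCSH_PATTERN` and `lcsh_captures` are hypothetical names for illustration only; the commit itself stores just the whole match.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of the commit: exposes the capture groups of
# the same pattern used by Detector::Lcsh.
LCSH_PATTERN = /(.*)\s--\s(.*)/

# Returns [before, after] around the last spaced double hyphen, or [] if none.
def lcsh_captures(phrase)
  match_data = LCSH_PATTERN.match(phrase)
  match_data ? match_data.captures : []
end

puts lcsh_captures('Geology -- Massachusetts').inspect
puts lcsh_captures('Space vehicles -- Materials -- Congresses').inspect
```

With a greedy first group, a multi-level heading is split at its last separator, so the pattern signals that a separator exists but does not enumerate every level.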