Commit
Why are these changes being introduced:

* Implement data models for counting algorithm matches for all Terms

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TCO-17

See also:

* https://github.com/MITLibraries/tacos/blob/main/docs/architecture-decisions/0005-use-multiple-minimal-historical-analytics-models.md

How does this address that need:

* Creates a new model `AggregateMatch`
* Adds methods to run each (current) StandardIdentifier algorithm on each Term (via the SearchEvents)
* Adjusts the `MonthlyMatch` counting algorithm to be useful for both cases and extracts it to a module which is included in both classes

Document any side effects to this change:

* A schedulable job to run this automatically is out of scope and will be added under a separate ticket
* The tests are identical between this and `MonthlyMatch`. There may be a way to avoid the duplication, and thus ensure both get relevant updates, but it was not clear to me how to do that in an obvious way at the time of this work.
Showing 8 changed files with 198 additions and 42 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# frozen_string_literal: true | ||
|
||
# == Schema Information | ||
# | ||
# Table name: aggregate_matches | ||
# | ||
# id :integer not null, primary key | ||
# doi :integer | ||
# issn :integer | ||
# isbn :integer | ||
# pmid :integer | ||
# unmatched :integer | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
|
||
# AggregateMatch aggregates statistics for matches for all SearchEvents | ||
# | ||
# @see MonthlyMatch | ||
class AggregateMatch < ApplicationRecord | ||
include MatchCounter | ||
|
||
# generate data for all SearchEvents | ||
# | ||
# @note This is expected to only be run once per month, ideally at the beginning of the following month, to ensure | ||
# statistics are as accurate as possible. Running further from the month in question will work, but matches will use the | ||
# current versions of all algorithms which may not allow for tracking algorithm performance | ||
# over time as accurately as intended. | ||
# @todo Prevent running more than once by checking if we have data and then erroring? | ||
# @return [AggregateMatch] The created AggregateMatch object. | ||
def generate | ||
matches = count_matches(SearchEvent.all) | ||
AggregateMatch.create(doi: matches[:doi], issn: matches[:issn], isbn: matches[:isbn], | ||
pmid: matches[:pmid], unmatched: matches[:unmatched]) | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# frozen_string_literal: true | ||
|
||
# Counts matches in the supplied events | ||
module MatchCounter | ||
# Counts matches in the supplied events | ||
# | ||
# @note We currently only have StandardIdentifiers to match. As we add new algorithms, this method will need to | ||
# expand to handle additional match types. | ||
# @param events [Array<SearchEvent>] A collection of SearchEvents to check for matches. | ||
# @return [Hash] A Hash with keys for each known standard identifier and the count of matched search events. | ||
def count_matches(events) | ||
matches = Hash.new(0) | ||
known_ids = %i[unmatched pmid isbn issn doi] | ||
|
||
events.each do |event| | ||
ids = StandardIdentifiers.new(event.term.phrase) | ||
|
||
matches[:unmatched] += 1 if ids.identifiers.blank? | ||
|
||
known_ids.each do |id| | ||
matches[id] += 1 if ids.identifiers[id].present? | ||
end | ||
end | ||
|
||
matches | ||
end | ||
end |
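To make the tallying behaviour concrete, here is a minimal plain-Ruby sketch of the same loop that runs without Rails. `FakeIdentifiers`, its regular expressions, `count_phrase_matches`, and the sample phrases are hypothetical stand-ins for `StandardIdentifiers` and real search terms, not the project's detection logic; `empty?` and `key?` stand in for ActiveSupport's `blank?` and `present?`.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for StandardIdentifiers: exposes an `identifiers` hash
# keyed by identifier type. The patterns are deliberately simplistic.
class FakeIdentifiers
  PATTERNS = {
    doi: %r{\b10\.\d{4,9}/\S+},
    issn: /\b\d{4}-\d{3}[\dXx]\b/,
    pmid: /\bPMID:\s*\d+/i
  }.freeze

  attr_reader :identifiers

  def initialize(phrase)
    @identifiers = PATTERNS.each_with_object({}) do |(type, pattern), found|
      match = phrase[pattern]
      found[type] = match if match
    end
  end
end

# Same shape as MatchCounter#count_matches, but takes plain phrases
# instead of SearchEvent records.
def count_phrase_matches(phrases)
  matches = Hash.new(0) # keys that were never incremented read as 0
  known_ids = %i[pmid issn doi]

  phrases.each do |phrase|
    ids = FakeIdentifiers.new(phrase)

    matches[:unmatched] += 1 if ids.identifiers.empty?

    known_ids.each do |id|
      matches[id] += 1 if ids.identifiers.key?(id)
    end
  end

  matches
end

counts = count_phrase_matches(
  [
    'hi',
    '10.1016/j.example.2024.01',
    '1075-8623',
    'PMID: 38908367 and 10.1000/xyz'
  ]
)
```

Because each identifier type is checked independently, a phrase containing two identifier types increments both counters, so the per-type counts can sum to more than the number of events. The `Hash.new(0)` default also means a type with no matches still reads as `0` rather than `nil` when the counts are written to a record.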
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
class CreateAggregateMatches < ActiveRecord::Migration[7.1] | ||
def change | ||
create_table :aggregate_matches do |t| | ||
t.integer :doi | ||
t.integer :issn | ||
t.integer :isbn | ||
t.integer :pmid | ||
t.integer :unmatched | ||
t.timestamps | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# == Schema Information | ||
# | ||
# Table name: aggregate_matches | ||
# | ||
# id :integer not null, primary key | ||
# doi :integer | ||
# issn :integer | ||
# isbn :integer | ||
# pmid :integer | ||
# unmatched :integer | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
|
||
# This model initially had no columns defined. If you add columns to the | ||
# model remove the "{}" from the fixture names and add the columns immediately | ||
# below each fixture, per the syntax in the comments below | ||
# | ||
one: {} | ||
# column: value | ||
# | ||
two: {} | ||
# column: value |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# == Schema Information | ||
# | ||
# Table name: aggregate_matches | ||
# | ||
# id :integer not null, primary key | ||
# doi :integer | ||
# issn :integer | ||
# isbn :integer | ||
# pmid :integer | ||
# unmatched :integer | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
require 'test_helper' | ||
|
||
class AggregateMatchTest < ActiveSupport::TestCase | ||
test 'dois counts are included in aggregation' do | ||
aggregate = AggregateMatch.new.generate | ||
assert aggregate.doi == 1 | ||
end | ||
|
||
test 'issns counts are included in aggregation' do | ||
aggregate = AggregateMatch.new.generate | ||
assert aggregate.issn == 1 | ||
end | ||
|
||
test 'isbns counts are included in aggregation' do | ||
aggregate = AggregateMatch.new.generate | ||
assert aggregate.isbn == 1 | ||
end | ||
|
||
test 'pmids counts are included in aggregation' do | ||
aggregate = AggregateMatch.new.generate | ||
assert aggregate.pmid == 1 | ||
end | ||
|
||
test 'unmatched counts are included in aggregation' do | ||
aggregate = AggregateMatch.new.generate | ||
assert aggregate.unmatched == 2 | ||
end | ||
|
||
test 'creating lots of searchevents leads to correct data' do | ||
# drop all searchevents to make math easier and minimize fragility over time as more fixtures are created | ||
SearchEvent.delete_all | ||
|
||
doi_expected_count = rand(1...100) | ||
doi_expected_count.times do | ||
SearchEvent.create(term: terms(:doi), source: 'test') | ||
end | ||
|
||
issn_expected_count = rand(1...100) | ||
issn_expected_count.times do | ||
SearchEvent.create(term: terms(:issn_1075_8623), source: 'test') | ||
end | ||
|
||
isbn_expected_count = rand(1...100) | ||
isbn_expected_count.times do | ||
SearchEvent.create(term: terms(:isbn_9781319145446), source: 'test') | ||
end | ||
|
||
pmid_expected_count = rand(1...100) | ||
pmid_expected_count.times do | ||
SearchEvent.create(term: terms(:pmid_38908367), source: 'test') | ||
end | ||
|
||
unmatched_expected_count = rand(1...100) | ||
unmatched_expected_count.times do | ||
SearchEvent.create(term: terms(:hi), source: 'test') | ||
end | ||
|
||
aggregate = AggregateMatch.new.generate | ||
|
||
assert doi_expected_count == aggregate.doi | ||
assert issn_expected_count == aggregate.issn | ||
assert isbn_expected_count == aggregate.isbn | ||
assert pmid_expected_count == aggregate.pmid | ||
assert unmatched_expected_count == aggregate.unmatched | ||
end | ||
end |
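The commit message notes that these tests duplicate the `MonthlyMatch` tests. One possible way to share them, sketched here under assumptions, is a module of test methods that each test class includes, with each class supplying only the call that generates its counts. `SharedMatchCountTests`, `generated_counts`, `FakeCounts`, and the two `Fake...Test` classes are hypothetical names for illustration; the sketch uses plain Minitest with stand-in data rather than the Rails models and fixtures.

```ruby
# frozen_string_literal: true

require 'minitest'

# Hypothetical shared tests. Any test class that includes this module must
# define `generated_counts`, returning an object that responds to the
# count readers used below.
module SharedMatchCountTests
  def test_doi_counts_are_included
    assert_equal 1, generated_counts.doi
  end

  def test_unmatched_counts_are_included
    assert_equal 2, generated_counts.unmatched
  end
end

# Stand-in for a generated record; real tests would return the model instance.
FakeCounts = Struct.new(:doi, :unmatched)

class FakeMonthlyMatchTest < Minitest::Test
  include SharedMatchCountTests

  # In the real suite this might be MonthlyMatch.new.generate(DateTime.now).
  def generated_counts
    FakeCounts.new(1, 2)
  end
end

class FakeAggregateMatchTest < Minitest::Test
  include SharedMatchCountTests

  # In the real suite this might be AggregateMatch.new.generate.
  def generated_counts
    FakeCounts.new(1, 2)
  end
end
```

Requiring `minitest/autorun` instead of `minitest` would run both classes when the script exits. With this layout, a new assertion added to the module is picked up by both test classes, which addresses the concern that the two copies could drift apart.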