Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds CSV loader for SearchEvents #54

Merged
merged 3 commits into from
Jul 11, 2024
Merged

Adds CSV loader for SearchEvents #54

merged 3 commits into from
Jul 11, 2024

Conversation

JPrevost
Copy link
Member

@JPrevost JPrevost commented Jul 1, 2024

https://mitlibraries.atlassian.net/browse/TCO-35

This loader is not meant to be used in production, but instead be a convenience for loading data into development environments to allow for easier development of detection algorithms or reporting.

There is a Dataclip that exports data in the expected format, but it is also easy to generate your own file as it is just term and timestamp that is expected.

Currently, source is provided via CLI but if we find it would be better to include the source in the CSV that would be an easy change to make.

@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-54 July 1, 2024 18:21 Inactive
lib/tasks/loader.rake Outdated Show resolved Hide resolved
@JPrevost JPrevost marked this pull request as draft July 10, 2024 14:16
@JPrevost JPrevost marked this pull request as ready for review July 10, 2024 14:33
@jazairi jazairi self-assigned this Jul 10, 2024

CSV.foreach(args.path) do |row|
term = Term.create_or_find_by!(phrase: row.first)
term.search_events.create!(source: args.source, created_at: row.last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter that this will create duplicate search events if, for example, the same CSV is loaded twice, right? I assume not given that this is just for dev environments, but I want to make sure I'm not missing anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPrevost Tagging on ☝️, as it occurred to me that GitHub may not send notifications on comments until a review is complete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that this would load the same csv multiple times if run multiple times and doesn't try to detect that it has already been loaded. I suspect this is fine, but maybe at minimum we should note?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a note to that effect. Will approve now so you can merge this as soon as you like, but feel free to modify the note as needed. (I'm not sure what multiple notes for the same method look like in YARD, and for some reason my local docserver isn't showing me the CSV loader task.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the tasks don't show up in Yard sadly. I suspect we can nudge them in there with some config. Multiple notes are fine though from my poking around on other methods.


CSV.foreach(args.path) do |row|
term = Term.create_or_find_by!(phrase: row.first)
term.search_events.create!(source: args.source, created_at: row.last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a note to that effect. Will approve now so you can merge this as soon as you like, but feel free to modify the note as needed. (I'm not sure what multiple notes for the same method look like in YARD, and for some reason my local docserver isn't showing me the CSV loader task.)

JPrevost and others added 3 commits July 11, 2024 16:54
https://mitlibraries.atlassian.net/browse/TCO-35

This loader is not meant to be used in production, but instead be a
convenience for loading data into development environments to allow for
easier development of detection algorithms or reporting.

There is a Dataclip that exports data in the expected format, but it
is also easy to generate your own file as it is just `term` and
`timestamp` that is expected.

Currently, `source` is provided via CLI but if we find it would be
better to include the source in the CSV that would be an easy change
to make.
@JPrevost JPrevost merged commit df7a3ea into main Jul 11, 2024
1 check passed
@JPrevost JPrevost deleted the tco35-csv-data-loader branch July 11, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants