-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds CSV loader for SearchEvents #54
Conversation
1cf9848
to
bfd0b7c
Compare
|
||
CSV.foreach(args.path) do |row| | ||
term = Term.create_or_find_by!(phrase: row.first) | ||
term.search_events.create!(source: args.source, created_at: row.last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter that this will create duplicate search events if, for example, the same CSV is loaded twice, right? I assume not given that this is just for dev environments, but I want to make sure I'm not missing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JPrevost Tagging on ☝️, as it occurred to me that GitHub may not send notifications on comments until a review is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct that this would load the same csv multiple times if run multiple times and doesn't try to detect that it has already been loaded. I suspect this is fine, but maybe at minimum we should note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a note to that effect. Will approve now so you can merge this as soon as you like, but feel free to modify the note as needed. (I'm not sure what multiple notes for the same method look like in YARD, and for some reason my local docserver isn't showing me the CSV loader task.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the tasks don't show up in Yard sadly. I suspect we can nudge them in there with some config. Multiple notes are fine though from my poking around on other methods.
|
||
CSV.foreach(args.path) do |row| | ||
term = Term.create_or_find_by!(phrase: row.first) | ||
term.search_events.create!(source: args.source, created_at: row.last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a note to that effect. Will approve now so you can merge this as soon as you like, but feel free to modify the note as needed. (I'm not sure what multiple notes for the same method look like in YARD, and for some reason my local docserver isn't showing me the CSV loader task.)
https://mitlibraries.atlassian.net/browse/TCO-35 This loader is not meant to be used in production, but instead be a convenience for loading data into development environments to allow for easier development of detection algorithms or reporting. There is a Dataclip that exports data in the expected format, but it is also easy to generate your own file as it is just `term` and `timestamp` that is expected. Currently, `source` is provided via CLI but if we find it would be better to include the source in the CSV that would be an easy change to make.
ab8298c
to
a24e2ef
Compare
https://mitlibraries.atlassian.net/browse/TCO-35
This loader is not meant to be used in production, but instead be a convenience for loading data into development environments to allow for easier development of detection algorithms or reporting.
There is a Dataclip that exports data in the expected format, but it is also easy to generate your own file as it is just
term
andtimestamp
that is expected.Currently,
source
is provided via CLI but if we find it would be better to include the source in the CSV that would be an easy change to make.