Add RedditMap Data and Scripts #17
Thanks @freeformflow! Unfortunately, GitHub won't let me see the full diff because there are too many changed files, so I'm going to have to awkwardly put notes in comments. To start, I have a couple of questions:
|
Line 17 of
|
In |
Could you please put an "Add" note to the changelog briefly describing the new scripts and data?
My understanding is that Jasmine would like to move RedditMap data curation to the IHOP repository entirely, and we will deprecate the iDPI repositories that have housed previous versions of the pipeline. However, as I understand it, we'd like to use infrastructure under the iDPI AWS account to host the RedditMap application and host data specific to RedditMap. We should confirm with @19mangatj and @chandrn7
I agree that dealing with the individual files is difficult. We'd like to ultimately serve them as individual files to benefit from fine-grained HTTP caching, but we can handle all sorts of preprocessing scenarios. That includes managing them in a different form and isolating them into smaller files just prior to publishing to iDPI infrastructure. I'd need to know more about the IHOP output to alter the publishing script to accomplish that goal.
I pushed an edit that corrects that sentence.
I pushed an edit that removed that file entirely. That was a temporary script. We copied over screenshots to the iDPI bucket so iDPI might host the images. But we determined it would be too costly to serve those images in the RedditMap application for now. Over time, we may want to work toward restoring updates to images as part of data pipeline work, but it's not a priority. The RedditMap application no longer needs public access to the IHOP bucket. You might want to speak with Jasmine to confirm, but as far as the application is concerned, you can restore the bucket access control. If you have any specific S3 configuration questions, I'd be happy to help there, too.
I've drafted some notes and asked Jasmine to confirm. I'll add these notes to the changelog once Jasmine approves.
Thanks, @freeformflow! Everything looks good to me! I'll approve and @19mangatj can merge it in when she's ready.
@19mangatj informed me that Virginia authorized us to add RedditMap materials to the IHOP repository. This pull request adds RedditMap data as well as supporting scripts to ready that data for use in its application. I've taken care to not disturb the existing conventions while maintaining RedditMap's publish pipeline. In addition to this pull request, I will need to provide the AWS access key out-of-band so the new GitHub action will be properly authorized.
- I added RedditMap data to the directory `/data/redditmap`. This contains the RedditMap data and only the data.
- I added publish code to the directory `/scripts/redditmap`. This contains a README overview and the needed Node.js code to complete the publish task.
- I added a new GitHub action `publish-redditmap`. When you push to the branch `publish-redditmap`, that action will prepare a Node.js environment and sync the RedditMap data API with the local directory.
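For readers unfamiliar with this kind of sync step, here is a minimal Node.js sketch of how a one-way sync from a local directory might be planned. It is not the code in `/scripts/redditmap`; the function names `hashLocalFiles` and `planSync`, the use of MD5 digests, and the shape of the remote listing (a `Map` of key to digest) are all assumptions made for illustration.

```javascript
// Hypothetical sketch: plan a one-way sync from a local directory to a remote store.
// Nothing here is taken from the actual RedditMap publish script.
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Walk `root` recursively and return a Map of relative key -> MD5 hex digest.
// Keys use forward slashes so they look the same on every platform.
function hashLocalFiles(root) {
  const result = new Map();
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const key = path.relative(root, full).split(path.sep).join("/");
        const digest = crypto
          .createHash("md5")
          .update(fs.readFileSync(full))
          .digest("hex");
        result.set(key, digest);
      }
    }
  };
  walk(root);
  return result;
}

// Compare local and remote listings (both Map of key -> digest).
// Upload anything new or changed; remove anything that no longer exists locally.
function planSync(local, remote) {
  const upload = [];
  const remove = [];
  for (const [key, digest] of local) {
    if (remote.get(key) !== digest) upload.push(key);
  }
  for (const key of remote.keys()) {
    if (!local.has(key)) remove.push(key);
  }
  return { upload: upload.sort(), remove: remove.sort() };
}
```

A caller would build the remote `Map` from whatever listing the hosting service provides, then perform the uploads and deletions named in the plan. Keeping the planning step separate from the network calls makes it easy to dry-run a publish before touching the hosted data.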