Merge pull request #18 from mozilla/issue-17
 Support loading sites with quotes into redis
englehardt authored Aug 1, 2019
2 parents 5bb4242 + 7a3bfa4 commit c996054
Showing 2 changed files with 6 additions and 3 deletions.
2 changes: 1 addition & 1 deletion deployment/gcp/README.md
@@ -89,7 +89,7 @@ Create a comma-separated site list as per:
echo "1,http://www.example.com
2,http://www.example.org
3,http://www.princeton.edu
-4,http://citp.princeton.edu/" > site_list.csv
+4,http://citp.princeton.edu/?foo='bar" > site_list.csv
../load_site_list_into_redis.sh crawl-queue site_list.csv
```
7 changes: 5 additions & 2 deletions deployment/load_site_list_into_redis.sh
@@ -14,8 +14,11 @@ echo -e "\nEnqueuing site list in redis"
echo "DEL $REDIS_QUEUE_NAME" > joblist.txt
echo "DEL $REDIS_QUEUE_NAME:processing" >> joblist.txt

-# Add site list in reverse order since the queue gets worked upon from the bottom up
-cat "$SITE_LIST_CSV" | sed '1!G;h;$!d' | sed "s/^/RPUSH $REDIS_QUEUE_NAME /" >> joblist.txt
+# sed #1 = Add site list in reverse order since the queue gets worked upon from the bottom up
+# sed #2 = Quote single quotes
+# awk #1 = Add the RPUSH command with the site value within single quotes
+cat "$SITE_LIST_CSV" | sed '1!G;h;$!d' | sed "s/'/\\\'/g" | awk -F ',' 'FNR > 0 {print "RPUSH '$REDIS_QUEUE_NAME' '\''"$1","$2"'\''"}' >> joblist.txt

kubectl cp joblist.txt redis-master:/tmp/joblist.txt
kubectl exec redis-master -- sh -c "cat /tmp/joblist.txt | redis-cli --pipe"
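
Review note: the new pipeline can be checked locally, without `kubectl` or a redis instance, by printing the generated commands instead of piping them to `redis-cli`. The sketch below is a dry run under stated assumptions: `build_rpush_commands` is a hypothetical helper name that does not exist in the repository, and the queue name is passed to awk with `-v` rather than spliced into the awk program as the commit does. The two `sed` stages are the same as in the script.

```shell
#!/bin/sh
# Dry run of the enqueue pipeline: prints the RPUSH commands, sends nothing to redis.
# build_rpush_commands is a hypothetical helper, not part of the repository.
build_rpush_commands() {
    queue_name="$1"
    site_list_csv="$2"
    # sed #1 reverses the line order, sed #2 rewrites each ' as \',
    # awk wraps "rank,url" in single quotes after the RPUSH command.
    sed '1!G;h;$!d' "$site_list_csv" \
        | sed "s/'/\\\'/g" \
        | awk -F ',' -v queue="$queue_name" \
            '{ print "RPUSH " queue " '\''" $1 "," $2 "'\''" }'
}

# Two-line sample list, including the quoted URL from the README change.
demo_csv="$(mktemp)"
printf '%s\n' "1,http://www.example.com" "2,http://citp.princeton.edu/?foo='bar" > "$demo_csv"
build_rpush_commands crawl-queue "$demo_csv"
rm -f "$demo_csv"
```

For the sample list this prints the entries in reverse order, with the single quote escaped:

```
RPUSH crawl-queue '2,http://citp.princeton.edu/?foo=\'bar'
RPUSH crawl-queue '1,http://www.example.com'
```

Because only `$1` and `$2` are printed, a URL that itself contains a comma would be cut off at that comma, both in this sketch and in the committed awk command.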
