Release Notes - v0.6
v0.6 includes the following two major additions:
- publishing records, mapped fields, or tabular data to S3 buckets
- support for Docker deployment (read more about that process here: https://github.com/WSULib/combine-docker)
The route of building a server dedicated to Combine via Ansible will continue to be supported for the foreseeable future, but increased attention will likely go to the Docker deployment that begins with v0.6.
Upgrading to v0.6
The addition of S3 publishing, and some additional configurations needed to support Dockerization, requires a couple of specific changes to files.
- Update /opt/spark/conf/spark-defaults.conf. Add the following package to the setting spark.jars.packages, which allows Spark to communicate with S3:
org.apache.hadoop:hadoop-aws:2.7.3
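For example, if spark.jars.packages already lists other packages in your installation (the exact set will vary), the updated line might look something like the following, with the new coordinate appended to the comma-separated list:
# /opt/spark/conf/spark-defaults.conf (illustrative only; <existing packages> stands in for whatever is already configured)
spark.jars.packages <existing packages>,org.apache.hadoop:hadoop-aws:2.7.3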
- Add the following variables to the /opt/combine/localsettings.py file if your installation is a server built via Ansible (if you are deploying via Docker, these settings should be included automatically via the localsettings.py.docker file):
# Deployment type (suggested as first variable, for clarity's sake)
COMBINE_DEPLOYMENT = 'server'
# (suggested as part of "Spark Tuning" section)
TARGET_RECORDS_PER_PARTITION = 5000
# Mongo server
MONGO_HOST = '127.0.0.1'
As always, you can see examples of these settings in /opt/combine/localsettings.py.template.
Once these changes are made, it is recommended to run the update management command to install any required dependencies, pull in GUI changes, and restart everything:
# from /opt/combine
./manage.py update
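If you want to confirm that Spark can now communicate with S3, a quick PySpark session can attempt a read from a bucket. The sketch below is not part of Combine and is only illustrative: the bucket path and credential values are placeholders, and the fs.s3a.* keys assume the standard hadoop-aws configuration (credentials may also come from instance profiles or environment variables).
# Minimal, optional sketch for verifying S3 connectivity after the upgrade.
# Not part of Combine; bucket, prefix, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-connectivity-check").getOrCreate()

# Standard hadoop-aws configuration keys; set however your deployment manages credentials.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")   # placeholder
hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")   # placeholder

# Attempt a simple read; any S3 path your credentials can list will do.
df = spark.read.text("s3a://your-bucket/some/prefix/")    # hypothetical bucket and prefix
df.show(5)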