The DBMI Data Portal, codenamed Hypatio, is a data sharing and data challenge platform built and maintained by the DBMI Tech Core. It was built to make it easier to manage access to sensitive data sets by providing downloaders with a UI to request access and data owners with a UI to review and approve those requests.
Throughout the documentation, any mention of the word "project" refers to the `DataProject` model, which describes any data set, data challenge, or software tool listed on the DBMI Data Portal.
Infrastructure
App overview
Local development
DataProject management
Hypatio is hosted on DBMI's 68 AWS account. CodePipeline pipelines watch for pushes to the `development` and `master` branches, build the code with CodeBuild, and deploy the Hypatio Docker container to a new task in ECS using a custom Lambda function.
Hypatio uses a few custom DBMI microservices:
- DBMI-AuthN: Users are sent here to log in. We use Auth0 to manage logins (it accepts HMS SAML logins) and to create JWTs for our authentication backend.
- DBMI-AuthZ: Stores the permissions users have for each project, as well as admin permissions. We keep permissions in this separate app so that they can be centralized and leveraged by other DBMI tools outside of Hypatio.
- DBMI-Reg: Stores user profile information (job title, address, phone number). It is centralized in case other DBMI tools collect profile data too, and it lets us understand our user base across all our apps.
- DBMI-Fileservice: Takes user submissions for challenges and puts them into S3 while tracking metadata about the files.
All these microservices run on the same EC2 and ECS clusters as Hypatio.
The files that authorized users of Hypatio can download are hosted in S3 on DBMI's 68 AWS account. Files for the DEV system are stored in the `dbmi-hypatio-dev` bucket, while PROD uses `dbmi-hypatio-prod`. Inside each bucket, folders separate the files for each DataProject. These files must be described by `HostedFile` objects in the app's database (more on that below).
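As a rough illustration of how a file's S3 location fits together, the sketch below maps an environment name to the buckets listed above and joins a per-project folder and a filename into an object key. The `DEV`/`PROD` keys, the `s3_uri` function, and the folder/file names in the usage line are hypothetical; the app itself may assemble these paths differently.

```python
# Bucket names per environment, as listed above. Using "DEV"/"PROD" as the
# lookup keys is an assumption about how the environment is named.
BUCKETS = {
    "DEV": "dbmi-hypatio-dev",
    "PROD": "dbmi-hypatio-prod",
}


def s3_uri(environment: str, folder: str, filename: str) -> str:
    """Build an s3:// URI for a file inside a project's folder (hypothetical helper)."""
    bucket = BUCKETS[environment.upper()]
    # Trim stray slashes so "my-project/" and "/my-project" yield the same key.
    parts = [p.strip("/") for p in (folder, filename)]
    key = "/".join(p for p in parts if p)
    return f"s3://{bucket}/{key}"


print(s3_uri("dev", "my-project/", "notes.zip"))  # s3://dbmi-hypatio-dev/my-project/notes.zip
```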
Hypatio is organized into the following Django apps:
- `contact`: Powers the Contact Us form that appears at the top of the Hypatio UI.
- `hypatio`: The base app, containing mostly URL routing and some helper methods to integrate with the various DBMI microservices.
- `manage`: Contains views and API endpoints that project admins use to manage their projects.
- `profile`: Profile management for users.
- `projects`: Views, models, API endpoints, and helper methods that let users interact with projects hosted on Hypatio.
Within these apps, files are used as follows:
- `admin.py`: Where models are configured for access in the Django admin.
- `api.py`: Where API endpoints go.
- `forms.py`: Where Django forms are defined.
- `models.py`: Where database models are defined.
- `panels.py`: Used only by `projects`; defines a few classes that describe how UI elements are configured on project pages.
- `tests.py`: Where unit tests go.
- `urls.py`: Where URL routing is defined.
- `utils.py`: Helper methods that views or API methods might need.
- `views.py`: Where views are defined.
- `(xyz)_extras.py`: Where custom Django template filters are defined.
The `HostedFile` model describes each file that users can download. Key fields include:
- `file_location`: The folder within the S3 bucket.
- `file_name`: The filename within that folder.
- `long_name` and `description`: How the file is displayed to users.
- `enabled`: If set to false, users cannot download the file at any time.
- `opened_time` and `closed_time`: If `enabled` is true, these define the window of time during which the file is listed to users and available for download.
- `hostedfileset`: Files can be grouped together for a clearer display to users. The `HostedFileSet` object must exist and be connected to the same `DataProject`.
- `order`: Manually sets the order (lowest number first) of the files within the `hostedfileset` and project.
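The interaction between `enabled`, the time window, and `order` can be sketched as follows. `HostedFileSketch` is a hypothetical plain-Python stand-in for the Django model, not the real class. The sketch assumes that an unset `opened_time` or `closed_time` leaves that side of the window open and that both ends are inclusive; check the actual view code before relying on either assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class HostedFileSketch:
    """Hypothetical stand-in for the HostedFile fields described above."""
    file_name: str
    enabled: bool
    opened_time: Optional[datetime] = None
    closed_time: Optional[datetime] = None
    order: int = 0


def is_available(f: HostedFileSketch, now: datetime) -> bool:
    """A disabled file is never available; otherwise honor the optional window."""
    if not f.enabled:
        return False
    # Assumption: a missing bound means the window is open on that side.
    if f.opened_time is not None and now < f.opened_time:
        return False
    if f.closed_time is not None and now > f.closed_time:
        return False
    return True


def visible_files(files: Iterable[HostedFileSketch], now: datetime) -> List[HostedFileSketch]:
    """Return the currently available files, lowest `order` first."""
    return sorted((f for f in files if is_available(f, now)), key=lambda f: f.order)
```

Keeping the availability check in one small function makes it easy to unit test the window logic separately from the S3 download code.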
Permissions to each DataProject are stored in the DBMI-AuthZ app database, with item = `Hypatio.{DataProject_ProjectKey}` and permission = `VIEW` or `MANAGE`.
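A minimal sketch of how such a record could be matched, assuming permission rows have already been fetched from DBMI-AuthZ into memory. `PermissionRecord`, `project_item`, and `has_permission` are hypothetical names, and the check deliberately does exact matching only: it does not assume that `MANAGE` implies `VIEW`, since this document doesn't say so.

```python
from typing import Iterable, NamedTuple


class PermissionRecord(NamedTuple):
    """Hypothetical in-memory shape of an AuthZ permission row."""
    user_email: str
    item: str
    permission: str


def project_item(project_key: str) -> str:
    """Build the AuthZ item string for a DataProject."""
    return f"Hypatio.{project_key}"


def has_permission(records: Iterable[PermissionRecord], email: str,
                   project_key: str, permission: str) -> bool:
    """True if some record grants exactly `permission` (VIEW or MANAGE) on the project."""
    item = project_item(project_key)
    return any(
        r.user_email == email and r.item == item and r.permission == permission
        for r in records
    )
```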
Because Hypatio uses several micro-services, we need docker-compose to start all of them locally. Ask a DBMI Tech Core developer for the `hypatio-stack` (not in GitHub right now). `hypatio-stack` is a Hypatio-specific version of Bryan's generic stack (there should be a DBMI git repo for this). The stack includes some custom shell commands built by Bryan to simplify various Docker tasks.
Setting up your `hypatio-stack`:
- You'll need a `stack.env` file in your `hypatio-stack` folder, which holds some secrets shared by all the micro-services.
- Clone all of the micro-service repos and switch to their develop branches.
- Review the `docker-compose.yml` file and update all the local paths (the volume paths in particular) to point to where you cloned the micro-services' repos.
- Create a Python virtualenv for `hypatio-stack`, activate it, and `cd` into the `hypatio-stack` directory. Then install requirements with `pip install -r requirements.txt`.
- Finally, run `docker-compose build` and then `docker-compose up`.
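Since the volume paths in `docker-compose.yml` are edited by hand, a quick sanity check before `docker-compose build` can save a confusing startup failure. The sketch below assumes you paste the short-syntax volume entries (`HOST:CONTAINER[:MODE]`) into a list rather than parsing the YAML, which would need an extra library; `missing_host_paths` is a hypothetical helper, and it skips entries that look like named volumes rather than bind mounts.

```python
import os
from typing import List


def missing_host_paths(volume_specs: List[str], base_dir: str = ".") -> List[str]:
    """Return the host side of each bind-mount spec whose path does not exist.

    Specs look like "HOST:CONTAINER[:MODE]". Entries whose host part does not
    start like a path (e.g. a named volume) are skipped. Relative paths are
    resolved against `base_dir`, the folder holding docker-compose.yml.
    """
    missing = []
    for spec in volume_specs:
        host = spec.split(":", 1)[0]
        if not host.startswith((".", "/", "~")):
            continue  # named volume, not a bind mount
        path = os.path.expanduser(host)
        if not os.path.isabs(path):
            path = os.path.join(base_dir, path)
        if not os.path.exists(path):
            missing.append(host)
    return missing
```

For example, `missing_host_paths(["../hypatio:/app"], base_dir="/path/to/hypatio-stack")` returns `["../hypatio"]` if that clone isn't where the compose file expects it.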
`hypatio-stack` creates a `stackdb` MySQL database server container with a database for each of the apps, and it runs all of the Django migrations for those apps.
The `stackdb` data is mounted as a volume, which on a Mac is stored in the Docker host VM (see https://timonweb.com/posts/getting-path-and-accessing-persistent-volumes-in-docker-for-mac/ for more on what that means).
In practice, this means the database persists on your laptop, so you don't lose your data if your `stackdb` container is deleted or recreated.
Most of our micro-services look to DBMI-AuthZ to grant you access to their Django admins. Use a SQL client of your choice (Sequel Pro is a good free option for Mac) to open the AuthZ database. Create an `authorization_userpermission` record with permission = `ADMIN`, item = `DBMI`, and user_email = your HMS email. You will need this record to access the AuthZ admin.
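If you'd rather script that record than click through a SQL client, the statement can be built as a parameterized query and run with any DB-API cursor. This is a sketch: `admin_permission_insert` is a hypothetical helper, and it sets only the three columns named above. The real table may have other required columns, so check its schema first. Use the `%s` placeholder with MySQL drivers such as mysqlclient or PyMySQL.

```python
from typing import Tuple


def admin_permission_insert(email: str, placeholder: str = "%s") -> Tuple[str, tuple]:
    """Build a parameterized INSERT for the AuthZ admin record.

    Only permission, item, and user_email are set; any other required
    columns in the real table are not handled here (an assumption to verify).
    """
    sql = (
        "INSERT INTO authorization_userpermission (permission, item, user_email) "
        f"VALUES ({placeholder}, {placeholder}, {placeholder})"
    )
    return sql, ("ADMIN", "DBMI", email)
```

Usage with an open MySQL connection would look like `cursor.execute(*admin_permission_insert("you@example.org"))` followed by `connection.commit()`. Passing the email as a parameter rather than formatting it into the string avoids quoting problems.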
If needed, you can also make yourself a superuser in each of the apps by opening a shell in its container and running Django's `createsuperuser` command. If you choose to do this, repeat these steps for `hypatio`, `dbmi-fileservice`, `dbmi-reg`, and `dbmi-authz`:
- With your `hypatio-stack` virtualenv activated, open a shell in the container using the stack's convenience command: `stack shell {CONTAINER NAME}`.
- Run `cd app`, then `python manage.py createsuperuser`, and follow the prompts. Use your HMS email as your username.
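To avoid repeating those steps by hand, a small script could loop over the containers. This sketch uses plain `docker-compose exec` instead of `stack shell`, because the options of the stack's custom command aren't documented here. It assumes that the docker-compose service names match the container names above, that `manage.py` lives in `app` relative to each container's working directory, and that you run it from the `hypatio-stack` folder. `createsuperuser_command` and `create_superusers` are hypothetical names.

```python
import shlex
import subprocess
from typing import Iterable, List

# Assumption: compose service names match the container names listed above.
SERVICES = ["hypatio", "dbmi-fileservice", "dbmi-reg", "dbmi-authz"]


def createsuperuser_command(service: str) -> List[str]:
    """docker-compose command that runs createsuperuser inside `service`."""
    return [
        "docker-compose", "exec", service,
        "sh", "-c", "cd app && python manage.py createsuperuser",
    ]


def create_superusers(services: Iterable[str] = SERVICES, dry_run: bool = True) -> None:
    """Print (dry run) or run createsuperuser in each service, one at a time."""
    for service in services:
        cmd = createsuperuser_command(service)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Interactive: Django prompts for username, email, and password.
            subprocess.run(cmd, check=True)
```

Starting with the default `dry_run=True` lets you check the commands before anything touches the containers.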
To create and run Django migrations, open a shell in the Hypatio container with `stack shell hypatio`, then run `cd app`, `python manage.py makemigrations`, and `python manage.py migrate`. Because the stack mounts the source files as a volume, the migrations you create inside the container also appear in your local directory and can be committed to git.
`hypatio-stack` uses a mail client that intercepts emails before they are sent. You can reach it at `localhost:8018` (or whatever port is configured in the docker-compose file).
To let someone access a project's management pages, they need a DBMI-AuthZ record with item = `Hypatio.{project_key}` and permission = `MANAGE` for the email address they use to access Hypatio.
To list a new project on Hypatio, create a new `DataProject` object in the Django admin. The `project_key` field becomes part of the URL for that page. The `description` field accepts HTML, which allows project descriptions to be more engaging. The `project_supervisors` field holds a comma-separated list (no spaces) of email addresses of people who should receive email notifications related to the project (e.g. when a user requests access or when someone submits a question via Contact Us).
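Because `project_supervisors` is a single text field, anything that sends notifications has to split it into addresses. A minimal sketch of that parsing, with a hypothetical `parse_supervisors` helper; trimming whitespace and dropping empty entries goes beyond the documented "no spaces" format and is only a defensive extra.

```python
from typing import List


def parse_supervisors(value: str) -> List[str]:
    """Split a project_supervisors value into individual email addresses.

    The documented format is comma separated with no spaces; stripping
    whitespace and skipping empty entries is extra tolerance added here.
    """
    return [email.strip() for email in value.split(",") if email.strip()]


print(parse_supervisors("pi@example.org,manager@example.org"))
# ['pi@example.org', 'manager@example.org']
```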
To allow custom behavior in agreement forms (e.g. displaying extra form fields depending on prior answers), we create an .html file for each agreement form and store it in the static directory. Once the HTML file is ready, create a new `AgreementForm` object in the Django admin and use the `form_file_path` field to indicate where in the static directory the HTML file lives. Then go back to your `DataProject` object and select the new agreement form in the many-to-many field. An agreement form can be used by many data projects.
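A mistyped `form_file_path` only shows up when a user opens the project, so it can help to check the path when creating the `AgreementForm`. The sketch below is hypothetical: it assumes `form_file_path` is relative to the static directory (the app may store it with a different prefix), and `resolve_form_file` is an illustrative name. It rejects paths that escape the static directory and paths that don't point to an existing file.

```python
from pathlib import Path


def resolve_form_file(static_root: str, form_file_path: str) -> Path:
    """Resolve an AgreementForm.form_file_path against the static directory.

    Raises ValueError if the path escapes the static directory and
    FileNotFoundError if the HTML file does not exist.
    """
    root = Path(static_root).resolve()
    # Treat a leading "/" as relative to the static root rather than the filesystem root.
    candidate = (root / form_file_path.lstrip("/")).resolve()
    if root not in candidate.parents:
        raise ValueError(f"{form_file_path!r} is outside {root}")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```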
The `AgreementForm.external_link` field should only be used when the agreement form lives on another website and you simply want to force users to see that link and click on it before they can access the project.
Some projects are actually data challenges (e.g. most of the N2C2 projects). These require users to form teams (toggle `has_teams` on the `DataProject`) and later submit solutions via an upload form. As with agreement forms, create an HTML file in the static directory for every new task, and create a `ChallengeTask` object that points to that path. Ensure that the `enabled` flag is set to true even if you have an opened and closed time; that time window specifies when users can submit solutions.
Please visit the following link to read and edit documentation tailored for project managers of Hypatio: https://docs.google.com/document/d/17h99OVvY1VyzJb-CqWrf4uqQMYUsB-iakV0NKOZv8OQ/edit?usp=sharing.