An AWS Lambda-based, GraphQL interface for performing CRUD operations on camera trap data stored in MongoDB.
Animl is an open, extensible, cloud-based platform for managing camera trap data. We are developing this platform because there currently are no software tools that allow organizations using camera traps to:
- ingest data from a variety of camera trap types (wireless, SD card based, IP, etc.)
- systematically store and manage images in a single centralized, cloud-based repository
- upload custom object detection and species classification ML models and configure automated assisted-labeling pipelines
- use a frontend web application to view images, review ML-assisted labels, and perform manual labeling
- perform advanced querying and analysis of camera trap data through an API
- export ML model training data
This repository contains an AWS Lambda-based, GraphQL API for storing and fetching camera trap data from a MongoDB database. The stack and its associated deployment resources are managed with the Serverless Framework. The stack includes:
The instructions below assume you have the following tools globally installed:
- Node & npm
- Serverless
- aws-cli
- Docker
The name of the profile must be "animl" (because it's referenced in the serverless.yml file). Good instructions here.
mkdir animl-api
cd animl-api
git clone https://github.com/tnc-ca-geo/animl-api.git
cd animl-api
npm install
Important
Sharp, one of the dependencies that's essential for opening image files and performing inference, may have issues once deployed to Lambda. See this GitHub Issue and their documentation for more info on deploying Sharp to Lambda, but in short, unless your node_modules/@img directory has the following binaries, the inference Lambda will throw errors indicating it can't open Sharp:
sharp-darwin-arm64
sharp-libvips-darwin-arm64
sharp-libvips-linux-x64
sharp-libvips-linuxmusl-x64
sharp-linux-x64
sharp-linuxmusl-x64
Running the following may help install the additional necessary binaries if they are not present, though the results have been inconsistent:
npm install --os=linux --cpu=x64 sharp
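Before deploying, it may be worth confirming which of the listed packages are actually present. The following is a minimal sketch, not part of this repo: missingSharpPackages is a hypothetical helper that takes the entry names found in node_modules/@img and returns the expected packages that are absent. The expected list is copied from above and may need adjusting for your platform.

```typescript
// Package names copied from the list above; adjust if your setup differs.
const EXPECTED_SHARP_PACKAGES: readonly string[] = [
  'sharp-darwin-arm64',
  'sharp-libvips-darwin-arm64',
  'sharp-libvips-linux-x64',
  'sharp-libvips-linuxmusl-x64',
  'sharp-linux-x64',
  'sharp-linuxmusl-x64',
];

// Hypothetical helper: given the entry names found in node_modules/@img,
// return the expected packages that are not among them.
function missingSharpPackages(presentEntries: readonly string[]): string[] {
  return EXPECTED_SHARP_PACKAGES.filter(
    (name) => presentEntries.indexOf(name) === -1
  );
}

// Example: an install made on an Apple Silicon Mac without the Linux binaries.
const missingPackages = missingSharpPackages([
  'sharp-darwin-arm64',
  'sharp-libvips-darwin-arm64',
]);
```

In practice the input could come from reading the directory with Node's fs.readdirSync('node_modules/@img'); a non-empty result suggests re-running the install command above before deploying.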
The API depends on remote secrets and parameters that are stored in AWS Secrets Manager and AWS Systems Manager Parameter Store, respectively. Most of the params are generated by this project's serverless config file and the config files of other services upon which this app depends, but some must be created manually via the AWS console. To make sure you have the correct secrets and parameters available, do the following:
- Make sure you've deployed animl-ingest, animl-frontend, and mira-api in the same staging env (dev/prod) as the environment to which you intend to deploy animl-api.
- We currently depend on a CloudFormation template ProductOps created called UserPool that creates and manages all of the resources related to Auth/Auth. This is not tracked in version control (but it probably should be), as it's critical and is responsible for generating SSM Params upon which this app depends. Make sure that stack has been created.
- Two important SSM Params, /ml/megadetector-api-key-[env] and /db/mongo-db-url-[env], contain secret keys, so they need to be created manually in the AWS console. Be sure to create versions for all envs you plan on deploying.
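As a rough illustration of the naming convention for the manually created params, the sketch below expands the [env] placeholder and reports which names are absent from a list you supply. The helper names are hypothetical, and the list of existing names is assumed to come from your own Parameter Store lookup, which is not shown here.

```typescript
// Stage names as used in this README (dev/prod).
type Env = 'dev' | 'prod';

// Name templates taken from the text above; [env] is replaced by the stage name.
const MANUAL_PARAM_TEMPLATES: readonly string[] = [
  '/ml/megadetector-api-key-[env]',
  '/db/mongo-db-url-[env]',
];

// Hypothetical helper: expand the templates for one environment.
function manualParamNames(env: Env): string[] {
  return MANUAL_PARAM_TEMPLATES.map((template) => template.replace('[env]', env));
}

// Hypothetical helper: return the manually created params that are not in
// the list of names already present in Parameter Store.
function missingManualParams(env: Env, existingNames: readonly string[]): string[] {
  return manualParamNames(env).filter(
    (name) => existingNames.indexOf(name) === -1
  );
}

// Example: only the DB URL param has been created for prod so far.
const missingInProd = missingManualParams('prod', ['/db/mongo-db-url-prod']);
```

Running the check once per environment you plan to deploy gives a quick list of what still has to be created in the console.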
You'll need to create the DB in MongoDB Atlas, but once you have, a script for seeding the DB with default records can be found at animl-api/src/scripts/seedDB.js. If the DB hasn't been seeded yet, you can do so by running the following command from the root directory:
npm run seed-db-dev
# or, to seed the production db:
npm run seed-db-prod
- To test the Lambda locally with serverless-offline, run:
npm run start
Note: The first time you run serverless, you will need to log in to the serverless console and be granted a seat from the TNC organization.
- To deploy the CloudFormation development stack, run:
npm run deploy-dev
As of version 3.0.0, this repo is written in TypeScript. There are a few things to know about the typings and how they're generated:
- Types for DB records are generated using Mongoose's InferSchemaType utility and exported from their respective schema definition files, e.g.:
// From src/api/db/schemas/Image.ts
export type ImageSchema = mongoose.InferSchemaType<typeof ImageSchema>;
- We generate types for inputs and outputs to/from the GraphQL layer using graphql-codegen, which generates types from our GraphQL type-defs. If you change any of the GraphQL type-defs, you'll need to re-run the codegen manually by running:
npm run codegen
The generated types can be found in src/@types/graphql.ts
There are a handful of scripts in the src/scripts/ directory to assist with managing data in both the production and dev databases.
To create a complete JSON export of all collections in a DB, run:
npm run export-db-dev # export dev db
npm run export-db-prod # export prod db
TODO: write and test importDb.js
updateDocuments.js is a working template for writing targeted data updates. It can be adapted to perform specific deletions/updates. You can run it with the following:
npm run update-docs-dev # update dev db
npm run update-docs-prod # update prod db
Use caution when deploying to production, as the application involves multiple stacks (animl-ingest, animl-api, animl-frontend), and often the deployments need to be synchronized. For major deployments to prod in which there are breaking changes that affect the other components of the stack, follow these steps:
- Set the frontend IN_MAINTENANCE_MODE to true (in animl-frontend/src/config.js), deploy to prod, then invalidate its CloudFront cache. This will temporarily prevent users from interacting with the frontend (editing labels, bulk uploading images, etc.) while the rest of the updates are being deployed.
- Manually check batch logs and the DB to make sure there aren't any fresh uploads that are in progress but haven't yet been fully unzipped. In the DB, those batches would have a created: <date_time> property but wouldn't yet have uploadComplete, processingStart, or ingestionComplete fields. See this issue for more info: #186
- Set ingest-image's IN_MAINTENANCE_MODE to true (in animl-ingest/ingest-image/task.js) and deploy to prod. While in maintenance mode, any images from wireless cameras that happen to get sent to the ingestion bucket will be routed instead to the animl-images-parkinglot-prod bucket so that Animl isn't trying to process new images while the updates are being deployed.
- Wait for messages in ALL SQS queues to wind down to zero (i.e., if there's currently a bulk upload job being processed, wait for it to finish).
- Back up the prod DB by running npm run export-db-prod from the animl-api project root.
- Deploy animl-api to prod.
- Turn off IN_MAINTENANCE_MODE in animl-frontend and animl-ingest, deploy both to prod, and clear the CloudFront cache.
- Copy any images that happened to land in animl-images-parkinglot-prod while the stacks were being deployed to animl-images-ingestion-prod, and then delete them from the parking lot bucket.
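The manual check for in-progress uploads described in the steps above can be expressed as a simple predicate. This is a sketch under stated assumptions: BatchRecord is a hypothetical, simplified shape that models only the fields named in that step, and the records are assumed to have already been fetched from the DB by whatever means you prefer.

```typescript
// Hypothetical, simplified batch shape; only the fields mentioned in the
// deployment steps are modeled, and the real schema may differ.
interface BatchRecord {
  _id: string;
  created?: Date | null;
  uploadComplete?: Date | null;
  processingStart?: Date | null;
  ingestionComplete?: Date | null;
}

// A batch counts as "in progress" if it has a created timestamp but none
// of the three later timestamps. Loose != / == null also covers undefined.
function isUploadInProgress(batch: BatchRecord): boolean {
  return (
    batch.created != null &&
    batch.uploadComplete == null &&
    batch.processingStart == null &&
    batch.ingestionComplete == null
  );
}

// Return the batches that should finish before the deployment continues.
function findInProgressBatches(batches: readonly BatchRecord[]): BatchRecord[] {
  return batches.filter(isUploadInProgress);
}

// Example data (made up for illustration only).
const sampleBatches: BatchRecord[] = [
  { _id: 'batch-a', created: new Date('2024-01-01T00:00:00Z') },
  {
    _id: 'batch-b',
    created: new Date('2024-01-01T00:00:00Z'),
    uploadComplete: new Date('2024-01-01T00:05:00Z'),
    processingStart: new Date('2024-01-01T00:06:00Z'),
    ingestionComplete: new Date('2024-01-01T00:30:00Z'),
  },
];
const blockingBatchIds = findInProgressBatches(sampleBatches).map((b) => b._id);
```

If the result is non-empty, it is probably safest to wait and re-check before putting ingest-image into maintenance mode.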
Animl is composed of a number of microservices, most of which are managed in their own repositories.
Services necessary to run Animl:
Services related to ingesting and processing wireless camera trap data: