diff --git a/vault-quota/Design.md b/vault-quota/Design.md
new file mode 100644
index 00000000..435360fc
--- /dev/null
+++ b/vault-quota/Design.md
@@ -0,0 +1,66 @@
+# vault quota design/algorithms
+
+The definitive source of content-length (file size) of a DataNode comes from the
+`inventory.Artifact` table and is not known until a PUT to storage is completed.
+In the case of a `vault` service co-located with a single storage site (`minoc`),
+the new Artifact is visible in the database as soon as the PUT to `minoc` is
+completed. In the case of a `vault` service co-located with a global SI, the new
+Artifact is visible in the database once it is synced from the site of the PUT to
+`minoc` to the global database by `fenwick` (or worst case: `ratik`).
+
+## TODO
+The design below only takes into account incremental propagation of space used
+by stored files. It is not complete/verified until we also come up with a validation
+algorithm that can detect and fix discrepancies in a live `vault`.
+
+## Event watcher algorithm:
+```
+track progress using HarvestState (name: `Artifact`, source: `db:{bucket range}`)
+incremental query for new artifacts in lastModified order
+for each new Artifact:
+    query for DataNode (storageID = artifact.uri)
+    if Artifact.contentLength != Node.size:
+        start txn
+        lock datanode
+        compute delta
+        lock parent
+        apply delta to parent.delta
+        set dataNode.size
+        update HarvestState
+        commit txn
+```
+The above sequence does the first step of propagation from DataNode to parent ContainerNode.
+This can be done in parallel by using bucket ranges (smaller than 0-f).
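The event watcher step can be illustrated with a small in-memory sketch. It is not the `vault-quota` implementation: the class names, the `apply_artifact` helper, and the example storage ID are hypothetical, and the transaction and row locks from the pseudocode are represented only by comments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerNode:
    name: str
    size: int = 0   # bytes already accounted for in this container
    delta: int = 0  # bytes waiting to be propagated further up the tree


@dataclass
class DataNode:
    storage_id: str
    parent: ContainerNode
    size: Optional[int] = None  # unknown until the Artifact has been seen


def apply_artifact(node: DataNode, content_length: int) -> int:
    """Apply one Artifact event to its DataNode and return the delta applied."""
    if node.size == content_length:
        return 0  # sizes already agree, so no transaction is needed
    # Assumption: in a real database these three updates run in one
    # transaction, locking the DataNode first and its parent second.
    delta = content_length - (node.size or 0)
    node.parent.delta += delta
    node.size = content_length
    return delta


# Hypothetical usage: a first PUT, a replayed event, then an overwrite.
home = ContainerNode("home")
data = DataNode("cadc:vault/example-1", home)
for length in (100, 100, 250):
    apply_artifact(data, length)
print(home.delta, data.size)
```

Comparing against the stored size makes a replayed event a no-op, so re-reading artifacts after a restart should not double-count them.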
+
+## Container size propagation algorithm:
+```
+query for ContainerNode with non-zero delta
+for each ContainerNode:
+    start txn
+    lock containernode
+    re-check delta
+    lock parent
+    apply delta to parent.delta
+    apply delta to containernode.size, set containernode.delta=0
+    commit txn
+```
+The above sequence finds candidate propagations, locks (order: child-then-parent as above),
+and applies the propagation. This moves the outstanding delta up the tree one level. If the
+sequence acts on multiple child containers before the parent, the delta(s) naturally
+_merge_ and there are fewer, larger delta propagations in the upper part of the tree. It would
+be optimal to do propagations depth-first but it doesn't seem practical to forcibly accomplish
+that ordering.
+
+Container size propagation will be implemented as a single sequence (thread). We could add
+something to the vospace.Node table to support subdividing work and enable multiple threads,
+but there is nothing there right now.
+
+## database changes required
+note: all field and column names TBD
+* add `size` and `delta` fields to ContainerNode (transient)
+* add `size` field to DataNode (transient)
+* add `size` to the `vospace.Node` table
+* add `delta` to the `vospace.Node` table
+* incremental sync query/iterator (ArtifactDAO?)
+* lookup DataNode by storageID (ArtifactDAO?)
+
diff --git a/vault-quota/README.md b/vault-quota/README.md
new file mode 100644
index 00000000..c0f2db0e
--- /dev/null
+++ b/vault-quota/README.md
@@ -0,0 +1,59 @@
+# Storage Inventory VOSpace quota support process (vault-quota)
+
+Process to maintain container node sizes so that quota limits can be enforced by the
+main `vault` service. This process runs in incremental mode (single process running
+continuously) to update a local vospace database.
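How container sizes are maintained incrementally can be shown with a toy, single-threaded version of the container size propagation algorithm from Design.md. The `Container` class and both function names are hypothetical, locking and transactions are omitted, and the handling of a node without a parent (it simply absorbs its own delta) is an assumption the design does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Container:
    name: str
    parent: Optional["Container"] = None
    size: int = 0
    delta: int = 0


def propagate_once(nodes: List[Container]) -> int:
    """Move every outstanding delta up one level; return how many nodes moved."""
    moved = 0
    candidates = [n for n in nodes if n.delta != 0]  # the candidate query
    for node in candidates:
        d = node.delta  # re-check: the value may have grown since the query
        if d == 0:
            continue
        if node.parent is not None:
            node.parent.delta += d  # deltas from siblings merge here
        node.size += d
        node.delta = 0
        moved += 1
    return moved


def propagate_all(nodes: List[Container]) -> int:
    """Repeat passes until no delta is outstanding; return the pass count."""
    passes = 0
    while propagate_once(nodes) > 0:
        passes += 1
    return passes


# Hypothetical tree: root <- a <- {b, c}, with deltas pending on b and c.
root = Container("root")
a = Container("a", root)
b = Container("b", a, delta=10)
c = Container("c", a, delta=5)
print(propagate_all([b, c, a, root]), root.size)
```

In this run the deltas from `b` and `c` merge in `a` during the first pass, so only one propagation reaches `root`, matching the merging behaviour the design describes.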
+
+`vault-quota` is an optional process that is only needed if `vault` is configured to
+enforce quotas, although it could be used to maintain container node sizes without
+quota enforcement.
+
+## configuration
+See the [cadc-java](https://github.com/opencadc/docker-base/tree/master/cadc-java) image
+docs for general config requirements.
+
+Runtime configuration must be made available via the `/config` directory.
+
+### vault-quota.properties
+```
+org.opencadc.vault.quota.logging = {info|debug}
+
+# inventory database settings
+org.opencadc.inventory.db.SQLGenerator=org.opencadc.inventory.db.SQLGenerator
+org.opencadc.vault.quota.nodes.schema={schema for inventory database objects}
+org.opencadc.vault.quota.nodes.username={username for inventory admin}
+org.opencadc.vault.quota.nodes.password={password for inventory admin}
+org.opencadc.vault.quota.nodes.url=jdbc:postgresql://{server}/{database}
+
+org.opencadc.vault.quota.threads={number of threads to watch for artifact events}
+
+# storage namespace
+org.opencadc.vault.storage.namespace = {a storage inventory namespace to use}
+```
+The _nodes_ account owns and manages (create, alter, drop) vospace database objects and updates
+content in the vospace schema. The database is specified in the JDBC URL. Failure to connect or
+initialize the database will show up in logs.
+
+The _threads_ key configures the number of threads that watch for new Artifact events and initiate
+the propagation of sizes to parent containers. These threads each monitor a subset of artifacts using
+`Artifact.uriBucket` filtering; for simplicity, the following values are allowed: 1, 2, 4, 8, 16.
+
+In addition to the above threads, there is one additional thread that propagates size changes up
+the tree of container nodes to the container node(s) where quotas are specified.
+
+## building it
+```
+gradle clean build
+docker build -t vault-quota -f Dockerfile .
+```
+
+## checking it
+```
+docker run -it vault-quota:latest /bin/bash
+```
+
+## running it
+```
+docker run --user opencadc:opencadc -v /path/to/external/config:/config:ro --name vault-quota vault-quota:latest
+```
+
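The restriction of the _threads_ key to 1, 2, 4, 8, or 16 is consistent with splitting the hexadecimal `Artifact.uriBucket` space into equal ranges. The sketch below shows one way such a split could be computed. It assumes each thread is assigned a contiguous range of leading hex digits; the `bucket_ranges` helper is hypothetical and the actual assignment inside `vault-quota` may differ.

```python
from typing import List, Tuple

HEX_DIGITS = "0123456789abcdef"
ALLOWED_THREADS = (1, 2, 4, 8, 16)


def bucket_ranges(threads: int) -> List[Tuple[str, str]]:
    """Split the bucket prefixes 0-f into `threads` equal, contiguous ranges."""
    if threads not in ALLOWED_THREADS:
        raise ValueError(f"threads must be one of {ALLOWED_THREADS}, got {threads}")
    step = len(HEX_DIGITS) // threads  # digits per thread; always divides evenly
    return [
        (HEX_DIGITS[start], HEX_DIGITS[start + step - 1])
        for start in range(0, len(HEX_DIGITS), step)
    ]


# Hypothetical usage: show the first and last prefix each of 4 threads would watch.
for first, last in bucket_ranges(4):
    print(f"{first}-{last}")
```

Only divisors of 16 give every thread the same number of prefixes, which may be why the allowed values are limited to powers of two.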