Merge pull request #51 from lsst-dm/tickets/DM-46556-deployment

DM-46556 Add consdb documentation
lsst-dm · Jan 6, 2025 · 8893545
2 parents 6889478 + 059952b
commit 8893545

Showing 14 changed files with 392 additions and 38 deletions.
diff --git a/alembic-autogenerate.py b/alembic-autogenerate.py
@@ -2,14 +2,19 @@
 
 #
 # How to use this script:
-# 1. Load the LSST environment and setup sdm_schemas and felis.
-#        source loadLSST.bash
-#        setup felis
-#        setup -r /path/to/sdm_schemas
+# 1. Install required packages and sdm_schemas, set environment variables:
+#        pip install lsst-felis testing.postgresql alembic sqlalchemy pyyaml \
+#           black psycopg2-binary
+#        git clone https://github.com/lsst/sdm_schemas
+#        cd sdm_schemas
+#        export SDM_SCHEMAS_DIR=`pwd`
 # 2. From the root of the consdb git repo, invoke the script. Supply a
 #    revision message as the command line argument:
-#        python alembic-autogenerate.py DM-12345
+#        python alembic-autogenerate.py this is my revision message "\n" \
+#            the message can span multiple lines "\n" \
+#            if desired
 # 3. Revise your auto-generated code as needed.
+# 4. Remove the autogenerated creation of SQL views (visit1, ccdvisit1).
 #
 
 import os
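
Note: the header above says the revision message is passed as command-line arguments and may span several lines by supplying a quoted "\n" argument. The script's own argument handling is not shown in this hunk, so the following is only a sketch of one way such arguments could be joined. `build_revision_message` is a hypothetical helper, not a function taken from the script.

```python
import sys


def build_revision_message(args):
    """Join command-line words into one message.

    Hypothetical helper for illustration only: a bare "\\n" word
    (backslash followed by n, as the shell passes a quoted "\\n")
    starts a new line in the message.
    """
    lines = [[]]
    for word in args:
        if word == "\\n":
            lines.append([])  # start collecting the next line
        else:
            lines[-1].append(word)
    return "\n".join(" ".join(words) for words in lines)


if __name__ == "__main__":
    print(build_revision_message(sys.argv[1:]))
```

Invoked with the example arguments from the header, this would yield a three-line message, one line per group of words between the "\n" separators.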

diff --git a/doc/.gitignore b/doc/.gitignore
@@ -8,3 +8,5 @@ doxygen.conf
 # Sphinx products
 _build
 py-api
+
+*.DS_Store
diff --git a/doc/contributor-guide/adding-columns.rst b/doc/contributor-guide/adding-columns.rst
@@ -7,7 +7,7 @@ Structure
 
 - ConsDB content must relate to exposures or visits or observations structured like exposures.  General time series should go in the Engineering and Facilities Database (EFD).
 - ConsDB content should generally be scalar values.  Large amounts of data, especially arrays or images or cubes, should generally go into the Large File Annex (LFA).
-- Avoid arrays expressed as individual columns (e.g. ``something0``, ``something1``, ``something2``) where possible, as this increases the number of columns drastically (and there is `a limit <https://www.postgresql.org/docs/current/limits.html>`_), makes it hard to query (``SELECT`` clauses need to list all of these individually, and ``WHERE`` clauses may need to include large ``OR`` or ``AND`` conditions), and potentially requires a lot of database storage space.
+- Avoid arrays expressed as individual columns (e.g. ``something0``, ``something1``, ``something2``) where possible, as this increases the number of columns drastically (and there is `a limit <https://www.postgresql.org/docs/current/limits.html>`__), makes it hard to query (``SELECT`` clauses need to list all of these individually, and ``WHERE`` clauses may need to include large ``OR`` or ``AND`` conditions), and potentially requires a lot of database storage space.
 - Columns should be named in all lowercase with underscore (``_``) separators, also known as "snake_case".
 
 Data sources

diff --git a/doc/developer-guide/consdbclient-summit-utils.rst b/doc/developer-guide/consdbclient-summit-utils.rst
@@ -2,4 +2,4 @@
 ConsDbClient in summit_utils
 ############################
 
-How to write and test code in summit_utils for ConsDbClient
+How to write and test code in summit_utils for ConsDbClient
diff --git a/doc/index.rst b/doc/index.rst
@@ -3,8 +3,8 @@
 ConsDB
 ======
 
-``lsst.consdb`` is developed at https://github.com/lsst-dm/consdb.
-You can find Jira issues for this module under the `consdb <https://jira.lsstcorp.org/issues/?jql=project%20%3D%20DM%20AND%20component%20%3D%20consdb>`_ component.
+``lsst.consdb`` is developed at `https://github.com/lsst-dm/consdb <https://github.com/lsst-dm/consdb>`__.
+You can find Jira issues for this module under the `ConsDB <https://jira.lsstcorp.org/issues/?jql=project%20%3D%20DM%20AND%20component%20%3D%20consdb>`__ component.
 
 #############
 Documentation

diff --git a/doc/operator-guide/deployment.rst b/doc/operator-guide/deployment.rst
@@ -2,5 +2,87 @@
 Deployment
 ###########
 
-* Database
-* REST API Server
+Database
+========
+
+Deployments of the Consolidated Database are currently located at
+
+-  Summit
+-  USDF (USDF and USDF dev use the same underlying database, a replica of the Summit database)
+-  Base Test Stand (BTS)
+-  Tucson Test Stand (TTS)
+
+Updates to these deployments may be needed when the schema changes for any of the ``cdb_*`` tables defined in `sdm_schemas <https://github.com/lsst/sdm_schemas>`__.
+
+Tools:
+------
+
+- Argo-CD
+- LOVE
+- Felis
+
+Repositories:
+-------------
+
+- `phalanx <https://github.com/lsst-sqre/phalanx>`__
+- `sdm_schemas <https://github.com/lsst/sdm_schemas>`__
+- `consdb <https://github.com/lsst-dm/consdb>`__
+
+Access needed:
+--------------
+
+- NOIRLab VPN
+- Summit VPN
+- USDF
+
+Process:
+--------
+
+
+Deploy code to populate db at Summit and/or USDF
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Test the Alembic migration and code at TTS/BTS, following the test stand steps in the Schema Migration Process page, before you consider deploying at the summit.
+
+The steps to deploy at the summit mirror the steps to test on a test stand, but require coordination with and permission from the observers and site teams.
+Access to the Argo-CD deployments is available via the Summit OpenVPN.
+To coordinate your deployment update on the summit, you must attend the Coordination Activities Planning (CAP) meeting on Tuesday mornings and announce your request.
+
+Add your migration intentions to the CAP SITCOM Confluence agenda `here <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53765933/Agenda+Items+for+Future+CAP+Meetings>`__.
+
+The CAP members may tell you a time frame that is acceptable for you to perform these changes.
+
+They may also tell you specific people to coordinate with to help you take images to test LATISS and LSSTCOMCAMSIM tables. There will be more tables to test eventually.
+
+Channels to note: #rubinobs-test-planning, #summit-announce, and #summit-auxtel. See also the `channel usage guide <https://obs-ops.lsst.io/Communications/slack-channel-usage.html>`__.
+
+When you get your final approval and a designated time to perform the changes to ConsDB, announce on #summit-announce, then follow steps similar to the test stand procedure, as listed under Summit Deployment Steps.
+
+USDF Deployment Steps
+^^^^^^^^^^^^^^^^^^^^^
+
+These steps must happen in synchrony with a Summit migration.
+
+1. Disable (pause) the subscription at USDF.
+2. Perform the migration at the summit with the steps below.
+3. Connect to the USDF database via psql and perform the Alembic migration.
+4. Check or test as agreed upon with the ConsDB team.
+5. Enable and refresh the subscription at USDF.
+
+If the change has no impact on the Summit and needs no coordination with it, run the Alembic migration at USDF and test as appropriate.
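
Note: if the USDF replica uses PostgreSQL logical replication (an assumption; the diff does not say), steps 1 and 5 above correspond to PostgreSQL's `ALTER SUBSCRIPTION` statement. The sketch below only builds the statements in the order they would be run. The subscription name `consdb_sub` is a placeholder, not the real one.

```python
def subscription_statements(name):
    """Return (pause, resume) SQL statement lists for one subscription.

    Illustrative only: `name` is whatever the subscription is actually
    called at USDF, which this sketch does not know.
    """
    # Quote the name as an SQL identifier, doubling any embedded quotes.
    ident = '"' + name.replace('"', '""') + '"'
    pause = [f"ALTER SUBSCRIPTION {ident} DISABLE;"]
    # Enable first, then refresh, matching the order of step 5.
    resume = [
        f"ALTER SUBSCRIPTION {ident} ENABLE;",
        f"ALTER SUBSCRIPTION {ident} REFRESH PUBLICATION;",
    ]
    return pause, resume


pause, resume = subscription_statements("consdb_sub")
for statement in pause + resume:
    print(statement)
```

The pause statement would be run before the Summit migration and the resume statements after the USDF migration has been checked.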
+
+Summit Deployment Steps
+^^^^^^^^^^^^^^^^^^^^^^^
+
+1. Use a branch in ``phalanx`` to point to the ConsDB tag for deployment.
+2. Set the target revision of the Argo-CD ``consdb`` application to your ``phalanx`` branch.
+3. Refresh the ConsDB application and review pod logs.
+4. Connect to the summit database via psql and perform the Alembic migration.
+5. Have the observing team take an image.
+6. Verify your new entries in the database using a Jupyter notebook or SQL query in RSP, confirming that your new image has been inserted into the database as expected.
+
+Once deployment succeeds, set the ``Target Revision`` in Argo-CD back to ``main`` and complete the ``phalanx`` PR for the tested ConsDB tag.
+
+
+REST API Server
+===============
diff --git a/doc/operator-guide/monitoring.rst b/doc/operator-guide/monitoring.rst
@@ -2,5 +2,33 @@
 Monitoring
 ###########
 
-* Database
-* REST API Server
+Reporting channels
+==================
+
+Users of ConsDB or ConsDBClient (``pqserver``) should report issues via #consolidated-database in rubin-obs.slack.com.
+
+ConsDB operators monitor this channel, #ops-usdf, and #ops-usdf-alerts for reported issues and outages, and escalate verified database issues.
+
+Database
+========
+
+The ConsDB team is responsible for verifying whether or not the database is up when issues are reported.
+
+They can check using the method reported by the users, check using ``psql`` or ``pgcli``, and check the #ops-usdf Slack channel for currently reported issues.
+
+Once the ConsDB team has confirmed there is an issue with the database, they should notify the #ops-usdf Slack channel; the USDF DBAs are responsible for fixing or restarting the database.
+
+REST API Server
+===============
+
+If the API server appears to have died, the ConsDB team is responsible for checking and restarting it.
+
+Use the appropriate Argo-CD deployment graph to check deployment logs and, if needed, restart the service.
+
+
+Other issues
+------------
+
+If the K8s infrastructure has died, the ConsDB team can verify the problem, but wider issues are likely to be seen.
+
+USDF or Summit K8s/IT support is responsible for fixing it.
diff --git a/doc/operator-guide/runbook.rst b/doc/operator-guide/runbook.rst
@@ -2,4 +2,115 @@
 RunBook
 ########
 
-Maybe from ConsDb Usage Confluence page?
+An initial, incomplete version of the runbook is on Confluence `here <https://rubinobs.atlassian.net/wiki/spaces/LSSTOps/pages/45665320/Consolidated+Database+ConsDB+Runbook+draft+incomplete>`__.
+
+Overview
+========
+
+This application does ...
+
+Its design and architecture are documented at ...
+
+Usage
+=====
+
+Most users
+----------
+
+Administration
+--------------
+
+Architecture
+============
+
+Kubernetes vclusters used
+
+Relevant policies
+
+S3DF Dependencies
+-----------------
+
+Kubernetes
+Weka storage for Kubernetes
+...
+
+Systems
+-------
+
+Components, Kubernetes namespaces, deployments
+
+Backups
+-------
+
+Associated Systems
+------------------
+
+IAM
+===
+
+Requesting Access
+-----------------
+
+Key Roles
+---------
+
+Service Accounts
+----------------
+
+Network
+=======
+
+External endpoints, IP and port, encryption, authentication, clients, API
+
+SLAC-internal endpoints, IP and port, encryption, authentication, clients, API
+
+Configuration
+=============
+
+GitHub repos with deployments
+
+Monitoring
+==========
+
+Grafana or other links
+
+Maintenance
+===========
+
+Periodic tasks
+
+Documentation and Training
+==========================
+
+Links to documentation and training resources
+
+LSST io page at `consdb.lsst.io <https://consdb.lsst.io>`__
+
+Support
+=======
+
+#consolidated-database
+
+Overall complaints:
+-------------------
+
+Kian-Tat Lim
+
+ConsDB services (hinfo, pqserver):
+--------------------------------------
+
+Brian Brondel, Valerie Becker
+
+Transformed EFD component:
+--------------------------
+
+Rodrigo Boufleur, Glauber Costa Vila Verde
+
+``consdb`` component in Jira.
+
+
+Known Issues
+============
+
+Standard Procedures
+===================
diff --git a/doc/operator-guide/schema-migration-process.rst b/doc/operator-guide/schema-migration-process.rst
@@ -2,8 +2,80 @@
 Schema Migration Process
 ########################
 
-* Add columns to sdm_schemas
-* Create alembic migration
-* Test migration and code to populate the new columns/tables at TTS/BTS if Summit schema is changing
-* Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
-* Deploy code to populate at Summit and/or USDF
+Add columns to sdm_schemas
+==========================
+
+First, add the requested database additions, justifications, and where they are generated to our `confluence entry table <https://rubinobs.atlassian.net/wiki/spaces/DM/pages/246644760/Consolidated+Database+Non-EFD+Entries>`__.
+Then, create a ticket and edit the `repository <https://github.com/lsst/sdm_schemas>`__ to apply your schema changes to any of the ``cdb_*.yml`` schemas.
+
+If your sdm_schemas PR has issues, check that the schema conforms to Felis's data model and that valid SQL tables can be created, using `felis validate/create <https://felis.lsst.io/user-guide/cli.html#felis-validate>`__.
+Alembic migrations should be automatically created by a git workflow after your sdm_schemas pull request completes.
+
+
+Create an Alembic Migration (manually)
+======================================
+
+`Alembic <https://alembic.sqlalchemy.org/en/latest/front.html>`__ tracks schema versions through autogenerated migrations, which keep the test stand and summit databases in sync.
+Versioning our database schema changes allows us to apply edits and move the database’s state forward or backward as needed.
+
+1. Create an Alembic migration on your ConsDB ticket branch.
+2. Use the script ``consdb/alembic-autogenerate.py`` to generate Alembic migrations.
+3. Follow the directions in the header of the script, then run ``python alembic-autogenerate.py`` to create version files in respective database-named directories in ``consdb/alembic/``.
+4. Manually edit the generated files in ``consdb/alembic/<table-name>/`` to:
+
+  - Remove the ``visit1`` and ``ccdvisit1`` views.
+  - Ensure constraints and renamed columns are correct.
+
+Test alembic migration
+======================
+Before merging your ConsDB migration PR or applying this migration to the Summit, you must test applying the migration in a test environment.
+Test both applying the migration and any code that populates the new columns/tables at TTS/BTS if Summit schema is changing.
+
+
+1. Update the deployment on the test stand:
+-------------------------------------------
+
+1. Choose the appropriate test stand (TTS, BTS)
+2. Create a branch in ``phalanx`` and edit the corresponding test stand environment file ``phalanx/applications/consdb/values-<test stand>.yaml`` to point to your branch's built docker image (tickets-DM-###).
+3. Coordinate and announce in the appropriate slack channel that you will begin testing your migrations.
+4. Update the ConsDB deployment in ``<url.to.teststand>/argo-cd`` to use your ``phalanx`` branch in the ``Target Revision``. Refresh and check pod logs.
+5. Verify the tables that you will be upgrading exist using ``psql``
+6. From the ``consdb/`` directory (where the ``alembic.ini`` file is), use the Alembic command to upgrade the existing database tables: ``alembic -n <database name> upgrade head``
+7. Deploy new ConsDB software (``hinfo``, ``pqserver``) and check the initial logs.
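
Note: the upgrade in step 6 could be wrapped in a small helper so the same command is run for each database. This is a sketch, not part of ConsDB. It relies on Alembic's `--name` being a global option that precedes the subcommand, and the section name `latiss` in the usage line is a placeholder for whichever section exists in `alembic.ini`.

```python
import subprocess


def alembic_upgrade_command(section, revision="head"):
    """Build the argv that upgrades one named alembic.ini section."""
    # --name is a global Alembic option, so it goes before "upgrade".
    return ["alembic", "--name", section, "upgrade", revision]


def run_upgrade(section, cwd="."):
    """Run the upgrade from the directory that holds alembic.ini.

    check=True raises CalledProcessError if Alembic exits non-zero.
    """
    subprocess.run(alembic_upgrade_command(section), cwd=cwd, check=True)


# Show the command without running it.
print(" ".join(alembic_upgrade_command("latiss")))
```

Printing the command first, as above, gives a chance to confirm the target section before calling `run_upgrade` against a live database.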
+
+2. Test with LATISS imaging in ATQueue:
+---------------------------------------
+
+See `TTS Start Guide <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53739987/Tucson+Test+Stand+Start+Guide>`__ for guidelines on using the test stands.
+
+Access LOVE via ``<url.to.teststand>/love`` and use the 1Password admin information to sign in, or your SLAC username and password.
+Navigate to the ATQueue or Auxiliary Telescope (AuxTel) Script Queue.
+
+- Before editing these scripts, note their starting configurations, as we will return the scripts to this configuration when we are done.
+
+Take a test/simulated picture with LATISS through the ATQueue using these three scripts:
+
+1. ``set_summary_state.py`` Change the configuration to set ATHeaderService and ATCamera to ENABLED.
+2. ``enable_latiss.py`` Remove any existing configuration.
+3. ``take_image_latiss.py`` Update the configuration to remove anything other than 'nimages' (1) and 'image_type' (BIAS or DARK or FLAT).
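
Note: the reduced ``take_image_latiss.py`` configuration described in step 3 can be sketched as follows. Only the two keys and the three image types come from the text above. The helper names and the plain "key: value" rendering are assumptions made for illustration.

```python
ALLOWED_IMAGE_TYPES = ("BIAS", "DARK", "FLAT")


def minimal_take_image_config(image_type, nimages=1):
    """Return a configuration holding only nimages and image_type."""
    if image_type not in ALLOWED_IMAGE_TYPES:
        raise ValueError(f"image_type must be one of {ALLOWED_IMAGE_TYPES}")
    return {"nimages": nimages, "image_type": image_type}


def to_config_text(config):
    """Render the configuration as simple 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in config.items())


print(to_config_text(minimal_take_image_config("BIAS")))
```

Rejecting any other image type up front avoids queueing a script that the test procedure does not cover.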
+
+Once you have put these three scripts in the queue, click ``run``.
+Watch for errors in both the Script Queue and the Argo-CD ConsDB pod logs and ``hinfo-latiss`` deployment.
+Address any errors and retest.
+
+Check the database by using ``psql`` commands such as ``\dt`` to display the table names and ``SELECT * FROM cdb_latiss.exposure WHERE day_obs = <YYYYMMDD>;`` to view the most recent data.
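
Note: the same check can be scripted. The sketch below runs the query against an in-memory SQLite stand-in so it is self-contained. The real database is PostgreSQL, and every column except ``day_obs`` is made up for the demonstration. SQL equality uses a single ``=``, and the day is passed as a bound parameter rather than pasted into the string.

```python
import sqlite3


def exposures_for_day(conn, day_obs):
    """Return rows of cdb_latiss.exposure for one day_obs (YYYYMMDD)."""
    query = (
        "SELECT * FROM cdb_latiss.exposure "
        "WHERE day_obs = ? ORDER BY exposure_id"
    )
    return conn.execute(query, (day_obs,)).fetchall()


def make_demo_connection():
    """Build a stand-in database; the table layout is illustrative only."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the schema-qualified name works.
    conn.execute("ATTACH DATABASE ':memory:' AS cdb_latiss")
    conn.execute(
        "CREATE TABLE cdb_latiss.exposure (exposure_id INTEGER, day_obs INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cdb_latiss.exposure VALUES (?, ?)",
        [(1, 20250105), (2, 20250106), (3, 20250106)],
    )
    return conn


conn = make_demo_connection()
print(exposures_for_day(conn, 20250106))
```

Against the real database the placeholder style depends on the driver (for example ``%s`` with psycopg2), but the query shape is the same.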
+
+Run ``set_summary_state.py`` to set ATHeaderService and ATCamera back to STANDBY, and return LATISS to STANDBY.
+Then return these three scripts to their original configurations.
+
+If you encountered errors in this process, do not proceed to the summit. Address the errors and retest with your ``phalanx`` branch pointing to the ConsDB branch that contains the fixes.
+
+If tests are successful, create a pull request for the Alembic migration in ConsDB. Tag the release according to ``standards-practices`` guidelines.
+Update your existing ``phalanx`` branch to point the environment based deployments to this ConsDB tag.
+
+You can retest on the test stand at this point; if your ConsDB pull request did not change, this step should be trivial.
+
+Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
+-----------------------------------------------------------------------------------------------
+
+See the deployment page for environment-specific deployment steps.
diff --git a/doc/user-guide/consdb-client-library-in-summit-utils.rst b/doc/user-guide/consdb-client-library-in-summit-utils.rst
@@ -2,4 +2,4 @@
 ConsDB Client Library in summit_utils
 ######################################
 
-Querying using ConsDbClient
+Querying using ConsDbClient