Merge pull request #51 from lsst-dm/tickets/DM-46556-deployment
DM-46556 Add consdb documentation
Vebop authored Jan 6, 2025
2 parents 6889478 + 059952b commit 8893545
Showing 14 changed files with 392 additions and 38 deletions.
15 changes: 10 additions & 5 deletions alembic-autogenerate.py
@@ -2,14 +2,19 @@

#
# How to use this script:
# 1. Load the LSST environment and setup sdm_schemas and felis.
# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas
# 1. Install required packages and sdm_schemas, set environment variables:
# pip install lsst-felis testing.postgresql alembic sqlalchemy pyyaml \
# black psycopg2-binary
# git clone https://github.com/lsst/sdm_schemas
# cd sdm_schemas
# export SDM_SCHEMAS_DIR=`pwd`
# 2. From the root of the consdb git repo, invoke the script. Supply a
# revision message as the command line argument:
# python alembic-autogenerate.py DM-12345
# python alembic-autogenerate.py this is my revision message "\n" \
# the message can span multiple lines "\n" \
# if desired
# 3. Revise your auto-generated code as needed.
# 4. Remove the autogenerated creation of sql views (visit1, ccdvisit1).
#

import os
2 changes: 2 additions & 0 deletions doc/.gitignore
@@ -8,3 +8,5 @@ doxygen.conf
# Sphinx products
_build
py-api

*.DS_Store
2 changes: 1 addition & 1 deletion doc/contributor-guide/adding-columns.rst
@@ -7,7 +7,7 @@ Structure

- ConsDB content must relate to exposures or visits or observations structured like exposures. General time series should go in the Engineering and Facilities Database (EFD).
- ConsDB content should generally be scalar values. Large amounts of data, especially arrays or images or cubes, should generally go into the Large File Annex (LFA).
- Avoid arrays expressed as individual columns (e.g. ``something0``, ``something1``, ``something2``) where possible, as this increases the number of columns drastically (and there is `a limit <https://www.postgresql.org/docs/current/limits.html>`_), makes it hard to query (``SELECT`` clauses need to list all of these individually, and ``WHERE`` clauses may need to include large ``OR`` or ``AND`` conditions), and potentially requires a lot of database storage space.
- Avoid arrays expressed as individual columns (e.g. ``something0``, ``something1``, ``something2``) where possible, as this increases the number of columns drastically (and there is `a limit <https://www.postgresql.org/docs/current/limits.html>`__), makes it hard to query (``SELECT`` clauses need to list all of these individually, and ``WHERE`` clauses may need to include large ``OR`` or ``AND`` conditions), and potentially requires a lot of database storage space.
- Columns should be named in all lowercase with underscore (``_``) separators, also known as "snake_case".
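
The naming rule above can be checked mechanically before a column is proposed. The following is a minimal sketch, not part of ConsDB or Felis: the helper name ``is_snake_case`` is hypothetical, and it assumes that digits are acceptable after the first letter.

```python
import re

# Assumed pattern: a lowercase letter first, then lowercase letters or digits,
# with single underscores separating the words.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(column_name: str) -> bool:
    """Return True if column_name is all lowercase with underscore separators."""
    return _SNAKE_CASE.fullmatch(column_name) is not None
```

For example, ``is_snake_case("exp_time")`` is true, while ``expTime`` and ``exp__time`` are rejected.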

Data sources
2 changes: 1 addition & 1 deletion doc/developer-guide/consdbclient-summit-utils.rst
@@ -2,4 +2,4 @@
ConsDbClient in summit_utils
############################

How to write and test code in summit_utils for ConsDbClient
How to write and test code in summit_utils for ConsDbClient
4 changes: 2 additions & 2 deletions doc/index.rst
@@ -3,8 +3,8 @@
ConsDB
======

``lsst.consdb`` is developed at https://github.com/lsst-dm/consdb.
You can find Jira issues for this module under the `consdb <https://jira.lsstcorp.org/issues/?jql=project%20%3D%20DM%20AND%20component%20%3D%20consdb>`_ component.
``lsst.consdb`` is developed at `https://github.com/lsst-dm/consdb <https://github.com/lsst-dm/consdb>`__.
You can find Jira issues for this module under the `ConsDB <https://jira.lsstcorp.org/issues/?jql=project%20%3D%20DM%20AND%20component%20%3D%20consdb>`__ component.

#############
Documentation
86 changes: 84 additions & 2 deletions doc/operator-guide/deployment.rst
@@ -2,5 +2,87 @@
Deployment
###########

* Database
* REST API Server
Database
========

Deployments of the Consolidated Database are currently located at:

- Summit
- USDF (and USDF dev; both use the same underlying database, which is a replica of the Summit database)
- Base Test Stand (BTS)
- Tucson Test Stand (TTS)

Updates to these deployments may be needed when the schema changes for any of the ``cdb_*`` tables defined in `sdm_schemas <https://github.com/lsst/sdm_schemas>`__.

Tools:
------

- Argo-CD
- LOVE
- Felis

Repositories:
-------------

- `phalanx <https://github.com/lsst-sqre/phalanx>`__
- `sdm_schemas <https://github.com/lsst/sdm_schemas>`__
- `consdb <https://github.com/lsst-dm/consdb>`__

Access needed:
--------------

- NOIRLab VPN
- Summit VPN
- USDF

Process:
--------


Deploy code to populate db at Summit and/or USDF
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the testing steps above to test the Alembic migration and code at TTS/BTS before you consider deploying at the summit.

The steps to deploy at the summit mirror the steps to test on a test stand, with the addition of coordination with, and permission from, the observers and site teams.
Access to the Argo-CD deployments is available via the Summit OpenVPN.
To coordinate your deployment update on the summit, you must attend the Coordination Activities Planning (CAP) meeting on Tuesday mornings and announce your request.

Add your migration intentions to the CAP SITCOM Confluence agenda `here <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53765933/Agenda+Items+for+Future+CAP+Meetings>`__.

The CAP members may tell you a time frame that is acceptable for you to perform these changes.

They may also tell you specific people to coordinate with to help you take images to test LATISS and LSSTCOMCAMSIM tables. There will be more tables to test eventually.

Channels to note: #rubinobs-test-planning, #summit-announce, and #summit-auxtel. See also the `channel usage guide <https://obs-ops.lsst.io/Communications/slack-channel-usage.html>`__.

When you get your final approval and a designated time to perform the changes to ConsDB, announce it on #summit-announce and follow steps similar to the test stand procedure above.

USDF Deployment Steps
^^^^^^^^^^^^^^^^^^^^^

These steps must happen in synchrony with a Summit migration.

1. Disable (pause) SUBSCRIPTION at USDF.
2. Perform the migration at the summit with the steps below.
3. Connect to the USDF database via psql and perform the alembic migration.
4. Check or test as agreed upon with the ConsDB team.
5. Enable and Refresh Subscription at USDF.

If the change has no impact on the Summit and needs no coordination with it, run the Alembic migration at USDF and test as appropriate.
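
The pause and resume steps around the migration can be sketched as the SQL statements they would issue. This is an illustration only: it assumes the USDF replica is fed by a PostgreSQL logical-replication subscription, the helper ``subscription_statements`` is hypothetical, and the subscription name must be supplied by the operator (``consdb_sub`` in the usage note is a made-up name).

```python
import re


def subscription_statements(subscription: str) -> dict[str, list[str]]:
    """Return the SQL to run before and after the migration, keyed by phase."""
    # Accept only plain identifiers so the name can be interpolated safely.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", subscription):
        raise ValueError(f"unexpected subscription name: {subscription!r}")
    return {
        # Step 1: pause replication before any schema change.
        "before_migration": [f"ALTER SUBSCRIPTION {subscription} DISABLE;"],
        # Step 5: re-enable first, then refresh the publication.
        "after_migration": [
            f"ALTER SUBSCRIPTION {subscription} ENABLE;",
            f"ALTER SUBSCRIPTION {subscription} REFRESH PUBLICATION;",
        ],
    }
```

Calling ``subscription_statements("consdb_sub")`` returns the statements to paste into ``psql`` at each phase; nothing is executed by the helper itself.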

Summit Deployment Steps
^^^^^^^^^^^^^^^^^^^^^^^

1. Use a branch in ``phalanx`` to point to the ConsDB tag for deployment.
2. Set the target revision of the Argo-CD ``consdb`` application to your ``phalanx`` branch.
3. Refresh the ConsDB application and review pod logs.
4. Connect to the summit database via psql and perform the alembic migration.
5. Have the observing team take an image.
6. Verify, using a SQL query or a Jupyter notebook in the RSP, that the new image has been inserted into the database as expected.

Once deployment succeeds, set the ``Target Revision`` in Argo-CD back to ``main`` and complete the ``phalanx`` PR for the tested ConsDB tag.


REST API Server
===============
32 changes: 30 additions & 2 deletions doc/operator-guide/monitoring.rst
@@ -2,5 +2,33 @@
Monitoring
###########

* Database
* REST API Server
Reporting channels
==================

Users of ConsDB or ConsDBClient (``pqserver``) should report issues in #consolidated-database on rubin-obs.slack.com.

ConsDB operators monitor this channel, #ops-usdf, and #ops-usdf-alerts for reported issues and outages, and escalate verified database issues.

Database
========

The ConsDB team is responsible for verifying whether the database is up when issues are reported.

They can reproduce the problem using the method the users reported, check with ``psql`` or ``pgcli``, and look in the #ops-usdf Slack channel for currently reported issues.

Once the ConsDB team has confirmed that there is an issue with the database, they should notify the #ops-usdf Slack channel. The USDF DBAs are responsible for fixing or restarting it.
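
Before opening ``psql``, a quick reachability probe can help separate network problems from database problems. This is a rough sketch under stated assumptions: the helper ``port_is_open`` is hypothetical, 5432 is only PostgreSQL's default port (the actual deployment may differ), and a successful TCP connection does not prove the database is healthy.

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A ``False`` result points toward network, VPN, or infrastructure issues; a ``True`` result means the next check should be an actual login with ``psql`` or ``pgcli``.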

REST API Server
===============

If the API server is suspected to have died, the ConsDB team is responsible for checking and restarting it.

Use the appropriate Argo-CD deployment graph to check the deployment logs and, if needed, restart the service.


Other issues
------------

If the K8s infrastructure has died, the ConsDB team can verify the problem, but wider issues are likely to be visible as well.

USDF or Summit K8s/IT support is responsible for fixing it.
113 changes: 112 additions & 1 deletion doc/operator-guide/runbook.rst
@@ -2,4 +2,115 @@
RunBook
########

Maybe from ConsDb Usage Confluence page?
An initial, incomplete version of the runbook is on Confluence `here <https://rubinobs.atlassian.net/wiki/spaces/LSSTOps/pages/45665320/Consolidated+Database+ConsDB+Runbook+draft+incomplete>`__.

Overview
========

This application does ...

Its design and architecture are documented at ...

Usage
=====

Most users
----------

Administration
--------------

Architecture
============

Kubernetes vclusters used

Relevant policies

S3DF Dependencies
-----------------

Kubernetes
Weka storage for Kubernetes
...

Systems
-------

Components, Kubernetes namespaces, deployments

Backups
-------

Associated Systems
------------------

IAM
===

Requesting Access
-----------------

Key Roles
---------

Service Accounts
----------------

Network
=======

External endpoints, IP and port, encryption, authentication, clients, API

SLAC-internal endpoints, IP and port, encryption, authentication, clients, API

Configuration
=============

GitHub repos with deployments

Monitoring
==========

Grafana or other links

Maintenance
===========

Periodic tasks

Documentation and Training
==========================

Links to documentation and training resources

LSST io page at `consdb.lsst.io <https://consdb.lsst.io>`__

Support
=======

#consolidated-database

Overall complaints:
-------------------

Kian-Tat Lim

ConsDB services (hinfo, pqserver):
--------------------------------------

Brian Brondel, Valerie Becker

Transformed EFD component:
--------------------------

Rodrigo Boufleur, Glauber Costa Vila Verde

``consdb`` component in Jira.


Known Issues
============

Standard Procedures
===================
82 changes: 77 additions & 5 deletions doc/operator-guide/schema-migration-process.rst
@@ -2,8 +2,80 @@
Schema Migration Process
########################

* Add columns to sdm_schemas
* Create alembic migration
* Test migration and code to populate the new columns/tables at TTS/BTS if Summit schema is changing
* Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
* Deploy code to populate at Summit and/or USDF
Add columns to sdm_schemas
==========================

First, record the requested database additions, their justifications, and where the values are generated in our `Confluence entry table <https://rubinobs.atlassian.net/wiki/spaces/DM/pages/246644760/Consolidated+Database+Non-EFD+Entries>`__.
Then, create a ticket and edit the `repository <https://github.com/lsst/sdm_schemas>`__ to apply your schema changes to any of the ``cdb_*.yml`` schemas.

If your sdm_schemas PR has issues, check that the schema conforms to Felis's data model and that valid SQL tables can be created with `felis validate/create <https://felis.lsst.io/user-guide/cli.html#felis-validate>`__.
Alembic migrations should be automatically created by a git workflow after your sdm_schemas pull request completes.


Create an Alembic Migration (manually)
======================================

`Alembic <https://alembic.sqlalchemy.org/en/latest/front.html>`__ tracks schema versions through autogenerated migrations, which keep the test stand and summit databases in sync.
Versioning our database schema changes allows us to apply edits and move the database’s state forward or backward as needed.

1. Create an Alembic migration on your ConsDB ticket branch.
2. Use the script ``consdb/alembic-autogenerate.py`` to generate Alembic migrations.
3. Follow the directions in the header of the script, then run ``python alembic-autogenerate.py`` to create version files in respective database-named directories in ``consdb/alembic/``.
4. Manually edit the generated files in ``consdb/alembic/<table-name>/`` to:

- Remove the ``visit1`` and ``ccdvisit1`` views.
- Ensure constraints and renamed columns are correct.
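
The manual review in step 4 can be assisted by a small scan of the generated file. A minimal sketch, assuming the views appear by name in the generated source; the helper ``lines_mentioning_views`` is hypothetical, and it only flags candidate lines for a human to inspect rather than editing anything.

```python
import re

VIEW_NAMES = ("visit1", "ccdvisit1")


def lines_mentioning_views(migration_source: str) -> list[tuple[int, str]]:
    """Return (line number, text) for each line that names one of the views."""
    # Word boundaries keep names such as visit1_quicklook from matching.
    pattern = re.compile(r"\b(?:" + "|".join(VIEW_NAMES) + r")\b")
    return [
        (number, line.strip())
        for number, line in enumerate(migration_source.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Reading a version file with ``pathlib.Path(...).read_text()`` and passing the text to this function gives a list of places to check before committing the migration.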

Test alembic migration
======================
Before merging your ConsDB migration PR or applying this migration to the Summit, you must test applying the migration in a test environment.
Test both applying the migration and any code that populates the new columns/tables at TTS/BTS if the Summit schema is changing.


1. Update the deployment on the test stand:
-------------------------------------------

1. Choose the appropriate test stand (TTS, BTS)
2. Create a branch in ``phalanx`` and edit the corresponding test stand environment file ``phalanx/applications/consdb/values-<test stand>.yaml`` to point to your branch's built docker image (tickets-DM-###).
3. Coordinate and announce in the appropriate slack channel that you will begin testing your migrations.
4. Update the ConsDB deployment in ``<url.to.teststand>/argo-cd`` to use your ``phalanx`` branch in the ``Target Revision``. Refresh and check pod logs.
5. Verify with ``psql`` that the tables you will be upgrading exist.
6. From the ``consdb/`` directory (where the ``alembic.ini`` file is), use the Alembic command to upgrade the existing database tables: ``alembic upgrade head -n <database name>``
7. Deploy new ConsDB software (``hinfo``, ``pqserver``) and check the initial logs.

2. Test with LATISS imaging in ATQueue:
---------------------------------------

See `TTS Start Guide <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53739987/Tucson+Test+Stand+Start+Guide>`__ for guidelines on using the test stands.

Access LOVE via ``<url.to.teststand>/love`` and use the 1Password admin information to sign in, or your SLAC username and password.
Navigate to the ATQueue or Auxiliary Telescope (AuxTel) Script Queue.

- Before editing these scripts, note their starting configurations, as we will return the scripts to this configuration when we are done.

Take a test/simulated picture with LATISS through the ATQueue using these three scripts:

1. ``set_summary_state.py``: change the configuration to set ATHeaderService and ATCamera to ENABLED.
2. ``enable_latiss.py``: remove any existing configuration.
3. ``take_image_latiss.py``: update the configuration to remove everything except ``nimages`` (1) and ``image_type`` (BIAS, DARK, or FLAT).

Once you have put these three scripts in the queue, click ``run``.
Watch for errors in the Script Queue, the Argo-CD ConsDB pod logs, and the ``hinfo-latiss`` deployment.
Address any errors and retest.

Check the database with ``psql``: use ``\dt`` to list the table names, and a query such as ``SELECT * FROM cdb_latiss.exposure WHERE day_obs = <YYYYMMDD>;`` to view the most recent data.
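
The same check can be scripted. A minimal sketch, assuming ``day_obs`` is stored as an integer of the form YYYYMMDD; the helper ``exposure_query`` is hypothetical, and the ``%s`` placeholder follows the style used by DB-API drivers such as ``psycopg2``.

```python
from datetime import date


def exposure_query(day_obs: date, schema: str = "cdb_latiss") -> tuple[str, tuple[int]]:
    """Return a parameterized query and its parameters for one observing day."""
    # Only plain identifiers are interpolated into the SQL text.
    if not schema.isidentifier():
        raise ValueError(f"unexpected schema name: {schema!r}")
    # Assumption: day_obs is an integer such as 20250106.
    day_obs_int = int(day_obs.strftime("%Y%m%d"))
    query = f"SELECT * FROM {schema}.exposure WHERE day_obs = %s"
    return query, (day_obs_int,)
```

The returned pair can be passed to a cursor as ``cursor.execute(query, params)``, which leaves value quoting to the driver.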

Run ``set_summary_state.py`` to set ATHeaderService and ATCamera back to STANDBY, and return LATISS to STANDBY.
Then return these three scripts to their original configurations.

If you encounter errors in this process, do not proceed to the summit. Fix the errors on your ConsDB branch, then retest with your ``phalanx`` branch pointing to that branch.

If tests are successful, create a pull request for the Alembic migration in ConsDB. Tag the release according to ``standards-practices`` guidelines.
Update your existing ``phalanx`` branch to point the environment-based deployments to this ConsDB tag.

You can retest on the test stand at this point. If your ConsDB pull request did not change, this step is trivial.

Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
===============================================================================================

See the Deployment page for environment-specific deployment steps.
2 changes: 1 addition & 1 deletion doc/user-guide/consdb-client-library-in-summit-utils.rst
@@ -2,4 +2,4 @@
ConsDB Client Library in summit_utils
######################################

Querying using ConsDbClient
Querying using ConsDbClient