Small improvements to docs.md #270

Merged 1 commit on Apr 22, 2024.

Changes to `docs/docs.md` (47 changes: 22 additions & 25 deletions):

## Summary and Table of Contents

DTBase consists of several subparts, such as the frontend and the backend, found in folders at the root level of the repository and under the `dtbase` folder. We list them here briefly and further below provide extensive documentation for each in turn.

### DTBase as a Python Package

The root directory contains the Python package called `dtbase` that is installed if you do `pip install .`. After doing that you should be able to open a Python session from anywhere and do `import dtbase`. This also installs all the Python dependencies for all the subparts of the repo. The import structure is such that all of the subparts are free to import from `dtbase.core`, but none of them should otherwise import from each other, e.g. the frontend doesn't import anything from the backend, and vice versa.

### [backend](#dtbase-backend)

This is a FastAPI application, providing API endpoints for interacting with the database.

### [frontend](#dtbase-frontend)

### [models](#dtbase-models)

This is where the code for specific models is located.

### [ingress](#dtbase-ingress)

This is where code for specific data ingress is located. Data ingress is the act of pulling in data from another source, such as an external API or database, and inserting it into your digital twin database via the backend.

### [functions](#dtbase-functions)

## DTBase Backend

The backend is the heart of DTBase. The frontend is just a pretty wrapper for the backend.
The backend is a web app implemented using FastAPI. It takes in HTTP requests and returns responses.

### Code structure
* `run.sh`. This is the script you call to run the FastAPI app.
* `create_app.py`. A tiny script that calls `main.create_app()`.
* `main.py`. The module that defines how the FastAPI app is set up, its settings, endpoints, etc.
* `routers`. The API divides into subsections, such as `/user` for user management and `/sensor` for sensor data. Each of these is implemented in a separate file in `routers`.

### API documentation

Documentation listing all the API endpoints and their payloads, return values, etc., is automatically generated by FastAPI. If you are developing/running locally, and your backend is running at `http://localhost:5000`, you can find these docs at `http://localhost:5000/docs`. Correspondingly for an Azure deployment it will be something like `https://<your-azure-app-name>-backend.azurewebsites.net/docs`.

#### Authentication

Once you have obtained an access token (`abc` in this example) and a refresh token (`xyz`), you would call the other end points with the following in the header of the request:

```
Authorization: Bearer abc
```

If your token expires, you can use the refresh token to get a new one, by calling the `/auth/refresh` end point. This requires setting your header as above, but using the refresh token (`xyz`) rather than the access token (`abc`).
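For illustration, the token flow might look roughly like this using Python's `requests` library. Only the `/auth/refresh` end point and the `Authorization: Bearer` header come from the text above; the login end point name and the response field names (`access_token`, `refresh_token`) are assumptions and should be checked against the auto-generated API docs.

```python
import requests

BASE_URL = "http://localhost:5000"

# Log in to obtain a token pair (end point and field names are assumptions).
login = requests.post(
    f"{BASE_URL}/auth/login",
    json={"email": "me@example.com", "password": "my-password"},
)
access_token = login.json()["access_token"]    # "abc" in the text above
refresh_token = login.json()["refresh_token"]  # "xyz" in the text above

# Call other end points with the access token in the Authorization header
# ("/sensor/list-sensor-types" is a hypothetical example end point).
headers = {"Authorization": f"Bearer {access_token}"}
response = requests.get(f"{BASE_URL}/sensor/list-sensor-types", headers=headers)

# When the access token expires, use the refresh token to get a new one.
refreshed = requests.post(
    f"{BASE_URL}/auth/refresh",
    headers={"Authorization": f"Bearer {refresh_token}"},
)
access_token = refreshed.json()["access_token"]
```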

#### Locations

Locations can be defined using any combination of floating point, integer, or string variables. These variables, known as `LocationIdentifiers`, must be inserted into the database before an actual `Location` can be entered. The set of `LocationIdentifiers` that is sufficient to define a `Location` is called a `LocationSchema`. A `Location` will therefore have a `LocationSchema`, and one `LocationXYZValue` for each `LocationIdentifier` within that schema (where `XYZ` can be `Float`, `Integer` or `String`).

An example clarifies: Say you're making a digital twin of a warehouse. All locations in the warehouse are identified by which room they are in, and which rack and shelf in that room we are talking about. Room number, rack code, and shelf number would then be `LocationIdentifiers`, and the `LocationSchema` would simply say that to specify a location, these three variables need to be given. Room and shelf number might be integers, and rack code could be a string. Other examples of location schemas could be xyz coordinates, or longitude-latitude-altitude coordinates.
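To make the warehouse example concrete, the corresponding schema and location could be sketched roughly as below. The field names are illustrative assumptions; only the concepts (identifiers with datatypes, a schema grouping them, a location giving one value per identifier) come from the text above.

```python
# Hypothetical payloads for the warehouse example; field names are assumed.
location_identifiers = [
    {"name": "room", "units": None, "datatype": "integer"},
    {"name": "rack", "units": None, "datatype": "string"},
    {"name": "shelf", "units": None, "datatype": "integer"},
]

location_schema = {
    "name": "room-rack-shelf",
    "description": "Room number, rack code, and shelf number",
    "identifiers": location_identifiers,
}

# A concrete Location under that schema: one value per identifier.
location = {
    "schema_name": "room-rack-shelf",
    "room": 12,    # stored as a LocationIntegerValue
    "rack": "B",   # stored as a LocationStringValue
    "shelf": 3,    # stored as a LocationIntegerValue
}
```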

#### Sensors

The sensor data model is as follows. Every `Sensor` has a `SensorType` which in turn specifies the variable(s) it can measure - these are known as `SensorMeasures`. Each `SensorMeasure` specifies its datatype (float, int, string, or bool), and these are used to define the type of the corresponding `SensorXYZReadings`. A `Sensor` may also have a `SensorLocation`, which specifies a `Location` as defined above, and a time window (possibly open-ended) when the sensor was at that location.

For instance, a weather station could be a `SensorType`, and it might record readings for three different `SensorMeasures`: temperature, humidity, and is-it-raining-right-now. The first two would be numbers, and the last one would be a boolean. These would go in the tables `SensorFloatReading` and `SensorBooleanReading`. You could then have two instances of this sensor type, i.e. two weather stations, associated with different locations.
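Spelled out as data, the weather station example might look something like the following sketch. The field names are assumptions; the measures, datatypes, and reading tables are the ones from the paragraph above.

```python
# Illustrative sketch of the weather station example; field names are assumed.
weather_station_type = {
    "name": "weather-station",
    "measures": [
        {"name": "temperature", "units": "degrees Celsius", "datatype": "float"},
        {"name": "humidity", "units": "percent", "datatype": "float"},
        {"name": "is-raining", "units": None, "datatype": "boolean"},
    ],
}

# Two instances (sensors) of this type, possibly at different locations.
weather_stations = [
    {"unique_identifier": "weather-station-1", "type_name": "weather-station"},
    {"unique_identifier": "weather-station-2", "type_name": "weather-station"},
]

# Readings are stored in the table matching each measure's datatype:
# temperature and humidity -> SensorFloatReading, is-raining -> SensorBooleanReading.
```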

#### Models

`Model` objects come associated with `ModelMeasures`, which are exactly analogous to `SensorMeasures`, i.e. they specify different quantities a model may output. The model outputs, which can again be floats, ints, strings, or booleans, are always associated with a `ModelRun`, which comes with a timestamp for when this run of the model happened. Each run is also associated with a `ModelScenario`, which is DTBase's way of keeping track of model parameters or other variations in how models can be run.
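Putting the pieces together, a single run of a model could be sketched like this. The field names are illustrative assumptions; only the relationships between `Model`, `ModelMeasure`, `ModelScenario`, and `ModelRun` come from the text.

```python
# Illustrative sketch only; field names are assumed.
model = {"name": "arima"}

model_measures = [
    {"name": "predicted-temperature", "units": "degrees Celsius", "datatype": "float"},
]

model_scenario = {
    "model_name": "arima",
    "description": "default parameters, 48-hour horizon",
}

# A single run: timestamped, tied to a scenario, with one value series per measure.
model_run = {
    "model_name": "arima",
    "scenario": "default parameters, 48-hour horizon",
    "time_created": "2024-04-22T12:00:00Z",
    "values": {"predicted-temperature": [21.3, 21.1, 20.8]},
}
```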
## DTBase Frontend

### Our Approach to Typescript and Javascript

The vast majority of client-side code is written in Typescript, and it should be in the `/app/base/static/typescript` folder as `.ts` files. Webpack, which gets run by `run.sh` when starting the frontend webserver, sorts out dependencies and transpiles the Typescript into `.js` files in the `/app/base/static/javascript` folder. There will be one `.js` file for every `.ts` file. The Jinja HTML templates can then include these transpiled Javascript files using `<script>` tags.

The only pure, non-typed Javascript one should ever write should be minimal amounts in `<script>` tags in the Jinja templates. The reason we do this at all is that Flask passes some data to the Jinja templates which needs to be further passed on to functions we've written in Typescript. The typical usage pattern looks something like this. In the HTML template we have

```jinja-html
{% block javascripts %}
{# ... #}
{% endblock javascripts %}
```

## DTBase Models

Folder: `dtbase/models`

This folder hosts two general purpose timeseries forecasting models, ARIMA and HODMD. They work both as useful additions to many digital twins and as examples for how to implement a model that interfaces with DTBase.

The way to implement your own model is to use the `BaseModel` class as described below. We recommend also reading the [services](#dtbase-services) section, since `BaseModel` is just an instance of `BaseService`, described there.

### BaseModel
The user then needs to write a `get_service_data` method in `CustomModel`. This method does the actual work of the model and ends by returning its predictions:

```python
...
return predictions
```

The structure of `predictions`, i.e. the return value of `get_service_data`, should be in the following format:

```
[(endpoint name, payload), (endpoint name, payload), etc.]
```
Here `endpoint name` is a string that is the name of a DTBase API endpoint, and `payload` is a dictionary or a list that is the payload that that endpoint expects. For models, the endpoints that likely need to be returned are listed below, and a sketch of the full return value follows the list:

- `/model/insert-model`
- `/model/insert-model-scenario`
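For illustration, a model's `get_service_data` might return something along these lines. The endpoint names are the two listed above; the payload contents are assumptions and should be checked against the API docs. The `CustomModel` class and `self` argument are omitted here for brevity.

```python
def get_service_data():
    """Sketch of a model's get_service_data return value; payload fields assumed."""
    return [
        ("/model/insert-model", {"name": "my-model"}),
        (
            "/model/insert-model-scenario",
            {"model_name": "my-model", "description": "default settings"},
        ),
        # ... followed by tuples inserting the model run and its predicted values.
    ]
```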
## DTBase Ingress

The user then needs to write a `get_service_data` method in the `CustomDataIngress` class. The structure of the return value should be as follows:

```
[(endpoint name, payload), (endpoint name, payload), etc.]
```

Here each `endpoint name` is a string for the name of a DTBase backend endpoint. Each `payload` should be in the specific format required by that endpoint. For more details about the backend endpoints see the [backend](#dtbase-backend) section.

For example, if we would like to insert two different types of sensor readings, then the output of `get_service_data` should look something like this:
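A minimal sketch of such a return value is given below. The endpoint path and the payload field names are assumptions, not taken from this document; check the auto-generated API docs for the real ones.

```python
def get_service_data():
    """Sketch only: endpoint path and payload field names are assumed."""
    temperature_payload = {
        "measure_name": "temperature",
        "unique_identifier": "weather-station-1",
        "readings": [21.3, 21.1],
        "timestamps": ["2024-04-22T12:00:00Z", "2024-04-22T12:10:00Z"],
    }
    humidity_payload = {
        "measure_name": "humidity",
        "unique_identifier": "weather-station-1",
        "readings": [61.0, 63.0],
        "timestamps": ["2024-04-22T12:00:00Z", "2024-04-22T12:10:00Z"],
    }
    return [
        ("/sensor/insert-sensor-readings", temperature_payload),
        ("/sensor/insert-sensor-readings", humidity_payload),
    ]
```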

You can point an ingress script at a particular backend by setting the `DT_BACKEND_URL` environment variable when running it:

```
DT_BACKEND_URL="http://myownserver.runningdtbase.com" python my_very_own_ingress.py
```

Behind the scenes, calling `ingresser` (see the sketch after this list):
1. Logs into the backend
2. Runs the `get_service_data` method to extract data from a source
3. Loops through the return value of `get_service_data` and posts it to the backend.
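Schematically, these three steps amount to something like the sketch below. This is not the actual `ingresser` implementation; the login end point and response field names are assumptions, and only the overall flow follows the list above.

```python
import requests


def ingresser_sketch(ingress, backend_url, email, password):
    """Schematic version of the flow above; not the real ingresser code."""
    # 1. Log into the backend (end point and field names are assumptions).
    login = requests.post(
        f"{backend_url}/auth/login", json={"email": email, "password": password}
    )
    headers = {"Authorization": f"Bearer {login.json()['access_token']}"}

    # 2. Extract data from the source.
    pairs = ingress.get_service_data()

    # 3. Post each (endpoint name, payload) pair to the backend.
    for endpoint, payload in pairs:
        requests.post(f"{backend_url}{endpoint}", json=payload, headers=headers)
```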

### OpenWeatherMap Example
In the example, `get_service_data` builds payload lists for the sensor type, the sensor itself, and its readings, and returns their concatenation:

```python
...
sensor_readings_output = [
    ...
]
return sensor_type_output + sensor_output + sensor_readings_output
```

Note that the `get_service_data` method must return a list of tuples structured as `(endpoint name, payload)` for the ingress method to integrate into the rest of DTBase.

#### 3. Uploading data to database

## DTBase Infrastructure

You are now ready to create a new stack.

8. Create a new Pulumi stack with `pulumi stack init --secrets-provider="azurekeyvault://<NAME OF KEY VAULT>.vault.azure.net/keys/<NAME OF KEY>"`
9. Make sure you're in a Python virtual environment with Pulumi SDK installed (`pip install .[infrastructure]` should cover your needs).
10. Set all the necessary configurations with `pulumi config set` and `pulumi config set --secret`. You'll find these in `__main__.py`, or you can keep adding them until `pulumi up` stops complaining. Do make sure to use `--secret` for any configuration variables the values of which you are not willing to make public, such as passwords. You can make all of them `--secret` if you want to play it safe, there's no harm in that. These values are written to `Pulumi.name-of-our-stack.yaml`, but if `--secret` is used they are encrypted with the key from your vault, and are unreadable gibberish to outsiders.
11. Run `pulumi up` to stand up your new Pulumi stack.
12. Optionally, you can set up continuous deployment for the webservers and the Azure Function App. To do this for the frontend, select your frontend WebApp in the Azure Portal, navigate to Deployment Center, and copy the generated Webhook URL; then, head to Docker Hub, select the container used by the WebApp, and create a new webhook using the copied URL. You need to do this for each of the three WebApps: the frontend, the backend, and the function app. This makes it such that every time a new version of the container is pushed to Docker Hub (e.g. by the GitHub Action), the web servers automatically pull and run the new version.