
Backend Architecture Documentation


The backend is designed to provide a seamless prediction and fine-tuning experience to the front-end user. To do so, it combines several distinct services, which are wired together logically within the available endpoints.

Flask Application

At the core of the backend system lies a Flask server application. This lightweight Python web server contains the logical endpoint mechanisms outlined in Endpoint Documentation, as well as all the connectors to the various caching and storage systems in use. Additionally, the Flask application exposes a hierarchy of autocomplete prediction models that can easily be substituted for one another thanks to a generic, polymorphic implementation.
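The sketch below illustrates what such a polymorphic model hierarchy could look like; the class and method names here are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of a polymorphic model hierarchy (names are illustrative).
from abc import ABC, abstractmethod
from typing import List


class AutocompleteModel(ABC):
    """Generic interface that every prediction model implements, so models
    can be swapped behind the same endpoints."""

    @abstractmethod
    def predict(self, context: str, max_suggestions: int = 5) -> List[str]:
        """Return autocomplete suggestions for the code around the cursor."""

    @abstractmethod
    def fine_tune(self, files: List[str]) -> None:
        """Fine-tune the model on a user's repository files."""


class TransformerModel(AutocompleteModel):
    """Transformer-based model whose weights live in S3 (see below)."""

    def predict(self, context: str, max_suggestions: int = 5) -> List[str]:
        return []  # placeholder for actual inference logic

    def fine_tune(self, files: List[str]) -> None:
        pass  # placeholder for actual fine-tuning logic
```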

Authentication & Cache Sessions

Since the model prediction functionality uses potentially sensitive user data to better fine-tune the autocomplete predictions, all routes must be protected with API key authentication. To authenticate a user upon receiving an HTTP request from the frontend with the appropriate authentication headers, we must look the user up in a network-connected database to validate them and retrieve their data. This chain of events is latency-constrained and therefore not suitable for our main prediction endpoint, which must very quickly receive contextual data about the user's cursor position and directly infer autocomplete predictions. We therefore need a way to validate a user's API key once and then validate subsequent prediction requests directly in memory.

Our solution to these constraints is a session authentication architecture. Specifically, we expose a session endpoint to which a user sends their API key. This endpoint authenticates the user with a lookup in our cloud database (DynamoDB NoSQL) and caches the user's data in memory under a new session identifier that is returned to the user. This in-memory session cache has a time-to-live, before which users can submit requests to the prediction endpoint with the new session token and be authenticated entirely in memory.
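Below is a minimal sketch of how the /session endpoint could implement this flow, assuming a DynamoDB table keyed by API key, an in-process dictionary as the session cache, and illustrative header and table names.

```python
# Sketch of the /session flow (table, header, and variable names are assumptions).
import time
import uuid

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
dynamodb = boto3.resource("dynamodb")
users_table = dynamodb.Table("users")       # assumed table name, keyed by api_key
SESSION_TTL_SECONDS = 3600                  # sessions expire after one hour
session_cache = {}                          # session_id -> (user_data, expiry_time)


@app.route("/session", methods=["POST"])
def create_session():
    api_key = request.headers.get("X-API-Key")  # assumed header name
    if api_key is None:
        return jsonify({"error": "missing API key"}), 401

    # The network-bound database lookup happens once per session, not per prediction.
    result = users_table.get_item(Key={"api_key": api_key})
    user = result.get("Item")
    if user is None:
        return jsonify({"error": "invalid API key"}), 403

    # Cache the user's data in memory under a fresh session identifier.
    session_id = uuid.uuid4().hex
    session_cache[session_id] = (user, time.time() + SESSION_TTL_SECONDS)
    return jsonify({"session_id": session_id})
```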

To reiterate the mechanics of this process: the user submits an API key as a header to the /session endpoint. If the API key is validated in DynamoDB and maps to an existing user, the server creates a new user session and caches that user's data in memory for fast access on subsequent requests. Future calls to the /predict endpoint can then authenticate using only the session identifier, which removes the network overhead of communicating with the database on latency-sensitive calls to the /predict endpoint. Note that the session ID returned by this endpoint expires after an hour; frontend sessions that last longer than this should start a new session.
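For reference, an illustrative client-side flow might look like the following; the endpoint paths come from this page, while the host, header, and field names are assumptions.

```python
# Illustrative client-side usage of the session architecture.
import requests

BASE_URL = "https://api.example.com"  # placeholder host

# 1. Exchange the API key for a session identifier (network-bound, done rarely).
resp = requests.post(f"{BASE_URL}/session", headers={"X-API-Key": "my-api-key"})
session_id = resp.json()["session_id"]

# 2. Use the session identifier on latency-sensitive prediction calls.
prediction = requests.post(
    f"{BASE_URL}/predict",
    json={"session_id": session_id, "context": "def fibonacci(n):\n    "},
).json()

# 3. The session expires after an hour; on an expired-session response, the
#    frontend should call /session again to start a new session.
```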

DynamoDB and S3 Storage

As mentioned briefly above, DynamoDB is used to store non-bulk, generic user metadata. This includes user identification details, API keys, and various other pieces of metadata related to fine-tuning and user-uploaded data. This flexible NoSQL system allows us to quickly authenticate users and provide the in-memory session with an up-to-date copy of the user's data.
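An illustrative (and purely assumed) shape for a user item in this table might be:

```python
# Hypothetical user item in DynamoDB; attribute names are assumptions. Bulky
# artifacts such as model weights are deliberately kept out of this table.
example_user_item = {
    "api_key": "a1b2c3d4",                        # key used for authentication
    "user_id": "user-123",
    "email": "dev@example.com",
    "fine_tune_status": "complete",               # metadata about fine-tuning state
    "uploaded_files_prefix": "user-123/repo/",    # pointer into the S3 bucket
    "weights_key": "user-123/model/weights.pt",   # pointer to model weights in S3
}
```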

S3 is used to store user-specific model weights/parameters and user-specific files for fine-tuning. Behind the scenes, the frontend uploads a user's current working repository of Python files to the /mass-upload endpoint, and those files are stored in an S3 bucket dedicated to that user's fine-tuning data. When a user starts a new authentication session, the backend determines an appropriate time to fine-tune the currently supported model on the user's personal repository data. Since the parameters of the transformer model are too large to fit in DynamoDB, they are also stored in S3 and downloaded into the in-memory cache when a new authentication session is created, once again removing network latency from the prediction path.
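A minimal sketch of the S3 side, assuming a single bucket with per-user prefixes for repository files and model weights (bucket and key names are purely illustrative):

```python
# Sketch of the S3 interactions; bucket name and key layout are assumptions.
import io

import boto3

s3 = boto3.client("s3")
BUCKET = "autocomplete-user-data"  # assumed bucket name


def store_repository_file(user_id: str, filename: str, contents: bytes) -> None:
    """Persist a file received by /mass-upload under the user's fine-tuning prefix."""
    s3.put_object(Bucket=BUCKET, Key=f"{user_id}/repo/{filename}", Body=contents)


def load_model_weights(user_id: str) -> bytes:
    """Pull the user's fine-tuned weights into memory when a session is created,
    so the prediction path never has to touch the network."""
    buffer = io.BytesIO()
    s3.download_fileobj(BUCKET, f"{user_id}/model/weights.pt", buffer)
    return buffer.getvalue()
```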