[Feature] LLM Stack #191

Merged
merged 10 commits into from
Aug 15, 2023
5 changes: 4 additions & 1 deletion .github/workflows/ci.yml
@@ -6,7 +6,10 @@ on:
     branches:
       - main
       - develop
-      - '**feature**'
+      - 'feature/*'
+      - 'hotfix/*'
+      - 'release/*'
+      - 'fixes/*'
   push:
     branches:
       - main
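This hunk replaces the broad `'**feature**'` branch pattern with four prefix patterns. To show roughly what changes, the sketch below models only the `*` and `**` wildcards of GitHub Actions' filter-pattern syntax: per GitHub's documentation, `*` matches any characters except `/`, while `**` matches any characters. The helpers `github_glob_to_regex` and `matches` are hypothetical and simplified (no `?`, `+`, `[]` or `!` support); this is not how GitHub itself evaluates filters.

```python
import re

OLD_PATTERNS = ["**feature**"]
NEW_PATTERNS = ["feature/*", "hotfix/*", "release/*", "fixes/*"]


def github_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a subset of GitHub Actions filter syntax into a regex.

    Only `**` (any characters, including '/') and `*` (any characters
    except '/') are handled; every other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, branch: str) -> bool:
    """True if the whole branch name matches the pattern."""
    return github_glob_to_regex(pattern).fullmatch(branch) is not None


# Illustrative branch names, compared against the old and new filter lists.
for branch in ["feature/llm-stack", "my-feature-branch", "feature/a/b", "fixes/typo"]:
    old = any(matches(p, branch) for p in OLD_PATTERNS)
    new = any(matches(p, branch) for p in NEW_PATTERNS)
    print(f"{branch}: old={old} new={new}")
```

Under this model, `feature/llm-stack` matches both lists, `my-feature-branch` and `feature/a/b` only matched the old pattern, and `fixes/typo` is newly matched.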
52 changes: 52 additions & 0 deletions docs/resource-stacks.md
@@ -0,0 +1,52 @@
# Resource Stacks 📚

Machine learning projects vary in size, from small-scale experimentation to large deployments, so their infrastructure requirements change and scale with them. For example, the infrastructure stack needed to deploy an LLM may require a GPU or a vector database, neither of which is usually needed for more general machine learning use cases.

Matcha accommodates both kinds of requirement and currently offers two infrastructure stacks. This page describes each stack in more detail and shows how to get started with either.

> Note: The stack must be set before provisioning any resources and cannot be changed whilst a Matcha deployment exists.

## Available stacks

### DEFAULT

The `DEFAULT` stack is ideal for generic machine learning training and deployment, and is a good starting point. It includes:
* [Azure Kubernetes Service](https://azure.microsoft.com/en-gb/products/kubernetes-service)
* [ZenML](https://www.zenml.io/home)
* [Seldon Core](https://www.seldon.io/solutions/open-source-projects/core) (deployment)
* [MLflow](https://mlflow.org/) (experiment tracking)
* Data version control storage bucket

This is the stack used in the [getting started page](getting-started.md). Follow the link for more information.

### LLM

The `LLM` stack includes everything in the `DEFAULT` stack with the addition of a vector database, Chroma DB. It is tailored to the training and deployment of Large Language Models (LLMs) and includes:

* [Azure Kubernetes Service](https://azure.microsoft.com/en-gb/products/kubernetes-service)
* [ZenML](https://www.zenml.io/home)
* [Seldon Core](https://www.seldon.io/solutions/open-source-projects/core) (deployment)
* [MLflow](https://mlflow.org/) (experiment tracking)
* Data version control storage bucket
* [Chroma DB](https://www.trychroma.com/) (vector database for document retrieval)


We use this stack for [MindGPT](https://github.com/fuzzylabs/MindGPT), our large language model for mental health question answering.

## How to switch your stack

To switch to the `DEFAULT` stack, run the following command:

```bash
$ matcha stack set default
```

or, for the `LLM` stack:

```bash
$ matcha stack set llm
```

If no stack is set, Matcha uses the `DEFAULT` stack.
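The stack name appears to be persisted in `matcha.config.json` in the project root: `MatchaConfigService.get_stack()` looks up a `stack` component and its `name` property. The following minimal sketch reproduces that lookup with the default fallback using only the standard library. It assumes the file contains `{"stack": {"name": "llm"}}` after `matcha stack set llm`; the helpers `read_stack_name` and `resolve_stack` are hypothetical, and the real logic behind `matcha stack set` lives in `core.stack_set`, which is not shown here.

```python
import json
import os
import tempfile
from typing import Optional

CONFIG_NAME = "matcha.config.json"
DEFAULT_STACK = "default"


def read_stack_name(project_dir: str) -> Optional[str]:
    """Return the stack name stored in matcha.config.json, or None if absent."""
    path = os.path.join(project_dir, CONFIG_NAME)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        config = json.load(f)
    # Assumed layout: {"stack": {"name": "<stack>"}}
    return config.get("stack", {}).get("name")


def resolve_stack(project_dir: str) -> str:
    """Fall back to the default stack when none has been set."""
    return read_stack_name(project_dir) or DEFAULT_STACK


with tempfile.TemporaryDirectory() as project_dir:
    before = resolve_stack(project_dir)  # no config file yet
    with open(os.path.join(project_dir, CONFIG_NAME), "w") as f:
        json.dump({"stack": {"name": "llm"}}, f)
    after = resolve_stack(project_dir)

print(before)  # default
print(after)   # llm
```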

See the [API documentation](references.md) for more information.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -14,6 +14,7 @@ nav:
- Azure Costs: 'costings.md'
- Inside Matcha:
- How does Matcha work: 'inside-matcha.md'
- Resource stacks: 'resource-stacks.md'
- Why we collect usage data: 'privacy.md'
- Tools:
- Data Version Control: 'data-version-control.md'
924 changes: 476 additions & 448 deletions poetry.lock

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions src/matcha_ml/cli/cli.py
@@ -27,11 +27,17 @@

app = typer.Typer(no_args_is_help=True, pretty_exceptions_show_locals=False)
analytics_app = typer.Typer(no_args_is_help=True, pretty_exceptions_show_locals=False)
stack_app = typer.Typer(no_args_is_help=True, pretty_exceptions_show_locals=False)
app.add_typer(
    analytics_app,
    name="analytics",
    help="Enable or disable the collection of anonymous usage data (enabled by default).",
)
app.add_typer(
    stack_app,
    name="stack",
    help="Configure the stack for Matcha to provision.",
)


def fill_provision_variables(
@@ -242,5 +248,23 @@ def opt_in() -> None:
    core.analytics_opt_in()


@stack_app.command(help="Define the stack for Matcha to provision.")
def set(stack: str = typer.Argument("default")) -> None:
    """Define the stack for Matcha to provision.

    Args:
        stack (str): the name of the stack to provision.
    """
    try:
        core.stack_set(stack)
        print_status(build_status(f"Matcha '{stack}' stack has been set."))
    except MatchaInputError as e:
        print_error(str(e))
        raise typer.Exit()
    except MatchaError as e:
        print_error(str(e))
        raise typer.Exit()


if __name__ == "__main__":
    app()
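The `set` command delegates to `core.stack_set`, whose body is not part of this diff, and reports any `MatchaInputError` to the user. One plausible shape for that input check is sketched below. `validate_stack_name`, its case-insensitive matching and its error wording are all hypothetical, and `MatchaInputError` is stubbed locally (assumed to be a plain exception subclass) so the snippet runs on its own.

```python
class MatchaInputError(Exception):
    """Local stand-in for matcha_ml.errors.MatchaInputError (assumption)."""


# The two stacks documented in docs/resource-stacks.md.
VALID_STACKS = ("default", "llm")


def validate_stack_name(stack: str) -> str:
    """Normalise a user-supplied stack name, rejecting unknown stacks."""
    normalised = stack.strip().lower()
    if normalised not in VALID_STACKS:
        raise MatchaInputError(
            f"Stack '{stack}' does not exist. Valid stacks are: {', '.join(VALID_STACKS)}."
        )
    return normalised


print(validate_stack_name("LLM"))  # llm

try:
    validate_stack_name("gpu")
except MatchaInputError as e:
    error_message = str(e)

print(error_message)  # Stack 'gpu' does not exist. Valid stacks are: default, llm.
```

Validating before anything is written keeps a typo such as `matcha stack set lmm` from silently recording an unusable stack name.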
16 changes: 16 additions & 0 deletions src/matcha_ml/config/__init__.py
@@ -0,0 +1,16 @@
"""Matcha state sub-module."""
from .matcha_config import (
    DEFAULT_CONFIG_NAME,
    MatchaConfig,
    MatchaConfigComponent,
    MatchaConfigComponentProperty,
    MatchaConfigService,
)

__all__ = [
    "MatchaConfigService",
    "MatchaConfig",
    "MatchaConfigComponentProperty",
    "MatchaConfigComponent",
    "DEFAULT_CONFIG_NAME",
]
226 changes: 226 additions & 0 deletions src/matcha_ml/config/matcha_config.py
@@ -0,0 +1,226 @@
"""The matcha.config.json file interface."""
import json
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

from matcha_ml.errors import MatchaError

DEFAULT_CONFIG_NAME = "matcha.config.json"


@dataclass
class MatchaConfigComponentProperty:
    """A class to represent Matcha config properties."""

    name: str
    value: str


@dataclass
class MatchaConfigComponent:
    """A class to represent Matcha config components."""

    name: str
    properties: List[MatchaConfigComponentProperty]

    def find_property(self, property_name: str) -> MatchaConfigComponentProperty:
        """Given a property name, find the property that matches it.

        Note: this only works under the assumption of non-duplicated properties.

        Args:
            property_name (str): the name of the property.

        Raises:
            MatchaError: if the property could not be found.

        Returns:
            MatchaConfigComponentProperty: the property that matches the property_name parameter.
        """
        property = next(
            filter(lambda property: property.name == property_name, self.properties),
            None,
        )

        if property is None:
            raise MatchaError(
                f"The property with the name '{property_name}' could not be found."
            )

        return property
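The docstring's caveat about duplicated properties follows from the `next(filter(...), None)` idiom: `filter` is lazy and `next` stops at the first match, so a second property with the same name is silently ignored, while the `None` default is what lets the method raise `MatchaError` instead of `StopIteration`. The standalone sketch below shows both behaviours using `Property`, a simplified, hypothetical stand-in for `MatchaConfigComponentProperty`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Property:
    """Simplified, hypothetical stand-in for MatchaConfigComponentProperty."""

    name: str
    value: str


def first_named(properties: List[Property], name: str) -> Optional[Property]:
    """Same lookup idiom as find_property: the first match, or None if there is none."""
    return next(filter(lambda p: p.name == name, properties), None)


props = [
    Property(name="name", value="default"),
    Property(name="name", value="llm"),  # duplicate name: never returned by the lookup
]

print(first_named(props, "name"))     # Property(name='name', value='default')
print(first_named(props, "missing"))  # None
```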


@dataclass
class MatchaConfig:
    """A class to represent the Matcha config file."""

    components: List[MatchaConfigComponent]

    def find_component(self, component_name: str) -> MatchaConfigComponent:
        """Given a component name, find the component that matches it.

        Note: this only works under the assumption of non-duplicated components.

        Args:
            component_name (str): the name of the component.

        Raises:
            MatchaError: if the component could not be found.

        Returns:
            MatchaConfigComponent: the component that matches the component_name parameter.
        """
        component = next(
            filter(lambda component: component.name == component_name, self.components),
            None,
        )

        if component is None:
            raise MatchaError(
                f"The component with the name '{component_name}' could not be found."
            )

        return component

    def to_dict(self) -> Dict[str, Dict[str, str]]:
        """A function to convert the MatchaConfig class into a dictionary.

        Returns:
            Dict[str, Dict[str, str]]: the MatchaConfig as a dictionary.
        """
        state_dictionary = {}
        for config_component in self.components:
            state_dictionary[config_component.name] = {
                property.name: property.value
                for property in config_component.properties
            }

        return state_dictionary

    @staticmethod
    def from_dict(state_dict: Dict[str, Dict[str, str]]) -> "MatchaConfig":
        """A function to convert a dictionary representation of the Matcha config file into a MatchaConfig instance.

        Args:
            state_dict (Dict[str, Dict[str, str]]): the dictionary representation of the Matcha config file.

        Returns:
            MatchaConfig: the MatchaConfig instance built from the dictionary.
        """
        components: List[MatchaConfigComponent] = []
        for resource, properties in state_dict.items():
            components.append(
                MatchaConfigComponent(
                    name=resource,
                    properties=[
                        MatchaConfigComponentProperty(name=key, value=value)
                        for key, value in properties.items()
                    ],
                )
            )

        return MatchaConfig(components=components)
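`to_dict` and `from_dict` map between the component objects and the nested JSON object stored in `matcha.config.json`, keyed first by component name and then by property name. The sketch below mirrors that mapping with plain tuples instead of the dataclasses (a simplification, not the module's code) to show the on-disk shape and one consequence of keying by name: components that share a name collapse into a single entry, and the later one wins.

```python
import json
from typing import Dict, List, Tuple

# A component modelled as (name, [(property_name, value), ...]); a simplified
# stand-in for MatchaConfigComponent / MatchaConfigComponentProperty.
Component = Tuple[str, List[Tuple[str, str]]]


def to_dict(components: List[Component]) -> Dict[str, Dict[str, str]]:
    """Mirror MatchaConfig.to_dict: one nested object per component name."""
    result: Dict[str, Dict[str, str]] = {}
    for name, properties in components:
        result[name] = {key: value for key, value in properties}
    return result


def from_dict(data: Dict[str, Dict[str, str]]) -> List[Component]:
    """Mirror MatchaConfig.from_dict."""
    return [(name, list(props.items())) for name, props in data.items()]


components = [("stack", [("name", "default")])]
serialised = json.dumps(to_dict(components))
print(serialised)  # {"stack": {"name": "default"}}
assert from_dict(json.loads(serialised)) == components  # lossless round trip

# Two components with the same name collapse to one key; the later one wins.
duplicated = components + [("stack", [("name", "llm")])]
print(to_dict(duplicated))  # {'stack': {'name': 'llm'}}
```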


class MatchaConfigService:
    """A service for handling the Matcha config file."""

    @staticmethod
    def get_stack() -> Optional[MatchaConfigComponentProperty]:
        """Gets the current stack name from the Matcha Config if it exists.

        Returns:
            Optional[MatchaConfigComponentProperty]: The name of the current stack being used as a config component object.
        """
        try:
            stack = (
                MatchaConfigService.read_matcha_config()
                .find_component("stack")
                .find_property("name")
            )
        except MatchaError:
            stack = None

        return stack

    @staticmethod
    def write_matcha_config(matcha_config: MatchaConfig) -> None:
        """A function for writing the local Matcha config file.

        Args:
            matcha_config (MatchaConfig): the MatchaConfig instance to write.
        """
        local_config_file = os.path.join(os.getcwd(), DEFAULT_CONFIG_NAME)

        with open(local_config_file, "w") as file:
            json.dump(matcha_config.to_dict(), file)

    @staticmethod
    def read_matcha_config() -> MatchaConfig:
        """A function for reading the Matcha config file into a MatchaConfig object.

        Returns:
            MatchaConfig: the MatchaConfig representation of the local config file.

        Raises:
            MatchaError: raises a MatchaError if the local config file could not be read.
        """
        local_config_file = os.path.join(os.getcwd(), DEFAULT_CONFIG_NAME)

        if os.path.exists(local_config_file):
            with open(local_config_file) as config:
                local_config = json.load(config)

            return MatchaConfig.from_dict(local_config)
        else:
            raise MatchaError(
                f"No '{DEFAULT_CONFIG_NAME}' file found, please generate one by running 'matcha provision', or add an existing '{DEFAULT_CONFIG_NAME}' file to the root project directory."
            )

    @staticmethod
    def config_file_exists() -> bool:
        """A convenience function which checks for the existence of the matcha.config.json file.

        Returns:
            True if the matcha.config.json file exists, False otherwise.
        """
        return os.path.exists(os.path.join(os.getcwd(), DEFAULT_CONFIG_NAME))

    @staticmethod
    def update(
        components: Union[MatchaConfigComponent, List[MatchaConfigComponent]]
    ) -> None:
        """A function which updates the matcha config file.

        If no config file exists, this function will create one.

        Args:
            components (Union[MatchaConfigComponent, List[MatchaConfigComponent]]): A list of, or single MatchaConfigComponent object(s).
        """
        if isinstance(components, MatchaConfigComponent):
            components = [components]

        if MatchaConfigService.config_file_exists():
            config = MatchaConfigService.read_matcha_config()
            config.components += components
        else:
            config = MatchaConfig(components)

        MatchaConfigService.write_matcha_config(config)

    @staticmethod
    def delete_matcha_config() -> None:
        """A function for deleting the local Matcha config file.

        Raises:
            MatchaError: raises a MatchaError if the local config file could not be removed.
        """
        local_config_file = os.path.join(os.getcwd(), DEFAULT_CONFIG_NAME)

        try:
            os.remove(local_config_file)
        except Exception:
            raise MatchaError(
                f"Local config file at path:{local_config_file} could not be removed."
            )
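`MatchaConfigService.update` never removes anything from the component list: it appends the new components and rewrites the whole file. Because `to_dict` keys components by name, re-adding a component that already exists (for example, setting the stack a second time) ends up replacing that component's properties on disk rather than duplicating it. The stdlib sketch below reproduces that net effect on plain dictionaries in a temporary directory; `update_config` is a hypothetical helper modelling the observable behaviour, not part of the `MatchaConfigService` API.

```python
import json
import os
import tempfile
from typing import Dict

CONFIG_NAME = "matcha.config.json"


def update_config(project_dir: str, components: Dict[str, Dict[str, str]]) -> None:
    """Merge components into the config file, creating the file if needed.

    Components are keyed by name, so a component that is already present is
    replaced wholesale, matching the result of appending a same-named
    component and serialising the list with to_dict.
    """
    path = os.path.join(project_dir, CONFIG_NAME)
    config: Dict[str, Dict[str, str]] = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config.update(components)
    with open(path, "w") as f:
        json.dump(config, f)


with tempfile.TemporaryDirectory() as project_dir:
    update_config(project_dir, {"stack": {"name": "default"}})  # creates the file
    update_config(project_dir, {"stack": {"name": "llm"}})      # replaces "stack"
    with open(os.path.join(project_dir, CONFIG_NAME)) as f:
        final_config = json.load(f)

print(final_config)  # {'stack': {'name': 'llm'}}
```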
2 changes: 2 additions & 0 deletions src/matcha_ml/core/__init__.py
@@ -6,6 +6,7 @@
    get,
    provision,
    remove_state_lock,
    stack_set,
)

__all__ = [
@@ -15,4 +16,5 @@
    "remove_state_lock",
    "destroy",
    "provision",
    "stack_set",
]