API to distribute hyperparameters optimization through HTTP requests

OptunAPI is a simple API for Machine Learning applications that distributes automatic hyperparameter optimization over different machines through HTTP requests. Each set of hyperparameters can be studied independently, since the search for minima doesn't require any gradient computation and is instead performed through Bayesian optimization based on Optuna. The machine running Optuna, the so-called "Optuna-server", centrally manages the optimization studies: it provides sets of hyperparameters and assesses them using the scores evaluated and sent back by each computing instance, named "Trainer-client". The HTTP requests underlying this client-server system are powered by FastAPI.

Key Features

OptunAPI inherits most of the modern functionalities of Optuna and FastAPI:

  • Lightweight and versatile
    • OptunAPI is entirely written in Python and has few dependencies.
  • Easy to configure
  • Easy to integrate
  • Easy parallelization
    • Different machines can run the hyperparameters study in parallel, centrally coordinated by the server.
  • Efficient optimization algorithms
    • The optimization task is headed by Optuna and its state-of-the-art algorithms.
  • Quick visualization for study analysis
    • TODO - OptunAPI provides a set of reports to monitor the status of the hyperparameters study.

Key Components

To understand how OptunAPI works, it helps to briefly describe its components:

  • Study and Trial objects from Optuna
  • Optuna's Ask-and-Tell interface
  • HTTP requests to map the hyperparameters space

Study and Trial

A study corresponds to an optimization task, i.e., a set of trials. This object provides interfaces to run a new Trial and to access the trials' history. OptunAPI is designed so that, when the first machine asks for a set of hyperparameters, it starts a new study (create_study()) identified according to the HTTP request submitted. Any other machine referring to the same optimization session doesn't initialize a new study, but recovers the previous one (load_study()) and contributes to mapping the hyperparameters space.

A trial prepares a particular set of hyperparameters and evaluates how well it optimizes an objective function, which is not necessarily available in an explicit form, as in the case of very complex Machine Learning algorithms. This object provides the following interfaces to get parameter suggestions:

  • suggest_categorical() for categorical parameters
  • suggest_int() for integer parameters
  • suggest_float() for floating point parameters

With optional arguments of step and log, we can discretize or take the logarithm of integer and floating point parameters. The following code block is taken from the Optuna tutorial and shows a standard use of these features:

import optuna

def objective (trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical ('optimizer', ['RMSprop', 'Adam'])

    # Integer parameter
    num_layers = trial.suggest_int ('num_layers', 1, 3)

    # Integer parameter (log)
    num_channels = trial.suggest_int ('num_channels', 32, 512, log = True)

    # Integer parameter (discretized)
    num_units = trial.suggest_int ('num_units', 10, 100, step = 5)

    # Floating point parameter
    dropout_rate = trial.suggest_float ('dropout_rate', 0.0, 1.0)

    # Floating point parameter (log)
    learning_rate = trial.suggest_float ('learning_rate', 1e-5, 1e-2, log = True)

    # Floating point parameter (discretized)
    drop_path_rate = trial.suggest_float ('drop_path_rate', 0.0, 1.0, step = 0.1)

OptunAPI uses these methods internally and requires only a configuration file correctly filled to run the studies.

Ask-and-Tell Interface

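Optuna's Ask-and-Tell interface provides a more flexible interface for hyperparameter optimization based on the following two methods:

  • ask() creates a trial that can sample hyperparameters
  • tell() finishes the trial by passing the trial and an objective value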

OptunAPI uses these methods at two different moments. When a machine asks for a set of hyperparameters, that set belongs to a trial resulting from an ask call. Then, once the objective function has been evaluated with that particular set of hyperparameters, the machine sends a new request encoding the objective value, which closes the corresponding trial with a tell call.

HTTP Requests

OptunAPI provides a simple Python module to run a server able to centrally manage the optimization studies: optunapi/optunapi/server.py. It is equipped with a set of path operation functions relying on the FastAPI ecosystem:

  • ping_server
    • the path is /optunapi/ping
    • the operation is GET
    • the function checks whether the server is running
  • read_hparams
    • the path is /optunapi/hparams/{model_name} (model_name is a path parameter)
    • the operation is GET
    • the function starts (or loads) an Optuna study and sends sets of hyperparameters
  • send_score
    • the path is /optunapi/score/{model_name}?trial_id=TRIAL_ID&score=SCORE (with query parameters)
    • the operation is GET
    • the function finishes the trial identified by trial_id with the score value
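
As a small illustration of how a client might assemble these three URLs, the sketch below uses only the Python standard library. It assumes the /optunapi prefix and the trial_id and score query parameters used by the client examples in this README; the helper names ping_url, hparams_url and score_url are hypothetical.

```python
from urllib.parse import quote, urlencode

HOST = "http://127.0.0.1:8000"  # Uvicorn default host and port


def ping_url(host: str = HOST) -> str:
    """URL used to check that the server is running."""
    return f"{host}/optunapi/ping"


def hparams_url(model_name: str, host: str = HOST) -> str:
    """URL used to request a new set of hyperparameters."""
    return f"{host}/optunapi/hparams/{quote(model_name, safe='')}"


def score_url(model_name: str, trial_id: int, score: float, host: str = HOST) -> str:
    """URL used to report the score of a given trial."""
    query = urlencode({"trial_id": trial_id, "score": score})
    return f"{host}/optunapi/score/{quote(model_name, safe='')}?{query}"


example = score_url("optunapi-test", trial_id=3, score=0.25)
```

Building the query string with urlencode avoids formatting mistakes in the parameter names, which would otherwise only surface as an error response from the server.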

Requirements

Python 3.6+

OptunAPI is based on two modern and highly performant frameworks:

  • Optuna for the optimization parts.
  • FastAPI for the HTTP requests parts.

Installation

OptunAPI is a public repository on GitHub.

$ git clone https://github.com/mbarbetti/optunapi.git

To run and use OptunAPI it's preferable to create a virtual environment with Python 3.6+ and install Optuna and FastAPI within it.

$ pip install optuna fastapi

Standing on the shoulders of FastAPI, OptunAPI needs an ASGI server, such as Uvicorn or Hypercorn, to run the so-called Optuna-server.

$ pip install uvicorn[standard]

Example

Configuration file

The high-level functions provided by Optuna to suggest values for the hyperparameters are replaced with an appropriate configuration file in OptunAPI. Referring to the example reported in the Optuna tutorial, what follows is the corresponding YAML configuration file:

# Categorical parameter
optimizer:
  name    : optimizer
  type    : categorical
  choices : 
            - RMSprop
            - Adam

# Integer parameter
num_layers:
  name : num_layers
  type : int
  low  : 1
  high : 3

# Integer parameter (log)
num_channels:
  name : num_channels
  type : int
  low  : 32
  high : 512
  log  : True

# Integer parameter (discretized)
num_units:
  name : num_units
  type : int
  low  : 10
  high : 100
  step : 5

# Floating point parameter
dropout_rate:
  name : dropout_rate
  type : float
  low  : 0.0
  high : 1.0

# Floating point parameter (log)
learning_rate:
  name : learning_rate
  type : float
  low  : 1e-5
  high : 1e-2
  log  : True

# Floating point parameter (discretized)
drop_path_rate:
  name : drop_path_rate
  type : float
  low  : 0.0
  high : 1.0
  step : 0.1
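
To make the correspondence with the suggest methods concrete, here is a hedged sketch of how entries like the ones above could be turned into suggest calls. It is not OptunAPI's actual implementation: the function suggest_from_config is hypothetical, and the configuration is written as the Python dictionary a YAML loader would produce. The float() conversions are a precaution, since some YAML loaders read values such as 1e-5 as strings.

```python
from typing import Any, Dict

# Excerpt of the configuration above, as a YAML loader might return it
config: Dict[str, Dict[str, Any]] = {
    "optimizer": {"name": "optimizer", "type": "categorical",
                  "choices": ["RMSprop", "Adam"]},
    "num_units": {"name": "num_units", "type": "int",
                  "low": 10, "high": 100, "step": 5},
    "learning_rate": {"name": "learning_rate", "type": "float",
                      "low": "1e-5", "high": "1e-2", "log": True},
}


def suggest_from_config(trial: Any, config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Sample one value per configured hyperparameter from an Optuna-like trial."""
    params: Dict[str, Any] = {}
    for entry in config.values():
        name, kind = entry["name"], entry["type"]
        if kind == "categorical":
            params[name] = trial.suggest_categorical(name, entry["choices"])
        elif kind == "int":
            params[name] = trial.suggest_int(
                name, int(entry["low"]), int(entry["high"]),
                step=int(entry.get("step", 1)), log=bool(entry.get("log", False)),
            )
        elif kind == "float":
            step = float(entry["step"]) if "step" in entry else None
            params[name] = trial.suggest_float(
                name, float(entry["low"]), float(entry["high"]),
                step=step, log=bool(entry.get("log", False)),
            )
        else:
            raise ValueError(f"unsupported parameter type: {kind!r}")
    return params
```

With a real study, the trial argument would be the object returned by study.ask(), and the returned dictionary is what a server could send back to the client.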

Optuna-server

Once the configuration file for the optimization session has been prepared and saved into optunapi/optunapi/config, we are ready to run the Optuna-server.

$ uvicorn server:optunapi

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [28720]
INFO:     Started server process [28722]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
What does the command uvicorn server:optunapi mean?

The command uvicorn server:optunapi refers to:

  • server: the file server.py (the Python "module") in optunapi/optunapi.
  • optunapi: the object created inside of server.py with the line optunapi = FastAPI().

Note that Uvicorn sets 127.0.0.1 and 8000 as default values for the server IP and port. To change the defaults, launch the previous command with the arguments --host and --port followed by the chosen values.

Trainer-client

The optimization session is managed by an Optuna study, initialized with the first client HTTP request, or loaded and expanded by any other connecting machines. To refer to a particular optimization session a client has to encode the name of the corresponding configuration file within its HTTP request.

Consider the simple use-case provided by OptunAPI, where we want to find the minimum of a 2D-paraboloid: optunapi/tests/simple_client.py. Since the provided configuration file is named optunapi-test.yaml, the GET request submitted by the client to receive the set of hyperparameters has to contain the string 'optunapi-test':

import requests

HOST = 'http://127.0.0.1:8000'

read_hparams = requests.get (HOST + '/optunapi/hparams/optunapi-test')
hp_req = read_hparams.json()

TRIAL_ID = hp_req ['trial_id']
PARAMS   = hp_req [ 'params' ]

Behind the scenes, the above HTTP request triggers an ask call on the Optuna study, which is stored in optunapi/optunapi/db once created and is named optunapi-test.db. As noted earlier, an ask call returns a trial equipped with a set of hyperparameters, and the client can recover those values by decoding the corresponding HTTP response. In the example above, hp_req is a dictionary containing, among other things, the identifier of the current trial (TRIAL_ID) and a dictionary of hyperparameter values (PARAMS).

With the hyperparameter values in hand, we can run whatever learning algorithm we prefer and evaluate the associated training score, which will be used as the objective value to finish the trial. This is done with a new GET request referring to the same optimization session (again, 'optunapi-test' in the path) and passing TRIAL_ID and SCORE as query parameters:

import requests

HOST = 'http://127.0.0.1:8000'

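# TRIAL_ID comes from the hparams response; SCORE is the objective value computed by the client
send_score = requests.get (
  HOST + '/optunapi/score/optunapi-test',
  params = {'trial_id': TRIAL_ID, 'score': SCORE}
)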
score_req  = send_score.json()

BEST_TRIAL_ID = score_req ['best_score_id']
BEST_PARAMS   = score_req [ 'best_params' ]

Each running client helps refine the search for minima performed by the Optuna algorithms, which focus on smaller and smaller portions of the space and improve the mapping of the hyperparameters space.
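
Putting the two requests together, a complete client loop might look like the sketch below. It is an illustration under assumptions rather than the content of simple_client.py: the hyperparameter names x and y are hypothetical, and the HTTP call is passed in as a get_json function so that the loop itself does not depend on a running server.

```python
from typing import Any, Callable, Dict, Optional

# Signature of the function performing a GET request and decoding the JSON reply
GetJson = Callable[[str, Optional[Dict[str, Any]]], Dict[str, Any]]


def paraboloid(x: float, y: float) -> float:
    """Objective function to minimize: a 2D paraboloid."""
    return x ** 2 + y ** 2


def run_client(get_json: GetJson, host: str, model_name: str, n_trials: int) -> Dict[str, Any]:
    """Ask for hyperparameters, evaluate the score and send it back, n_trials times."""
    last_reply: Dict[str, Any] = {}
    for _ in range(n_trials):
        # First request: receive the trial identifier and its hyperparameters
        hp = get_json(f"{host}/optunapi/hparams/{model_name}", None)
        params = hp["params"]

        # Local evaluation of the objective (parameter names are assumptions)
        score = paraboloid(params["x"], params["y"])

        # Second request: report the score to close the trial
        last_reply = get_json(
            f"{host}/optunapi/score/{model_name}",
            {"trial_id": hp["trial_id"], "score": score},
        )
    return last_reply
```

Against a real Optuna-server, get_json could be defined as lambda url, params: requests.get(url, params=params).json(), and several machines could run the same loop at once.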

Securing HTTP requests

OptunAPI is designed to be used within a VPN that is not directly exposed to the public Internet. On the other hand, opening the Optuna-server to the Internet makes it easy to exploit a wide variety of computing resources, from on-premises machines to instances from different cloud computing services (AWS, Azure, GCP, etc.). Doing so raises a security issue, since anyone can submit a request to the server or intercept its response, exposing the system to cyberattacks.

A possible solution to this issue relies on the SSH protocol. The idea is to set up the Optuna-server as a private server (from the perspective of REMOTE SERVER) not directly visible from the outside (LOCAL CLIENT's perspective). This configuration, schematically represented in the sketch below, still allows a local client to access the private server by passing through the remote server and authenticating with SSH credentials.

    ----------------------------------------------------------------------

                                |
    -------------+              |    +----------+               +---------
        LOCAL    |              |    |  REMOTE  |               | PRIVATE
        CLIENT   | <== SSH ========> |  SERVER  | <== local ==> | SERVER
    -------------+              |    +----------+               +---------
                                |
                             FIREWALL (only port 22 is open)

    ----------------------------------------------------------------------

OptunAPI provides a very simple implementation of this scheme: optunapi/tests/secured_client.py. It is based on sshtunnel and submits an HTTP request to the private server after we have specified our SSH credentials (ssh_username, ssh_pkey).

import sshtunnel
import requests

with sshtunnel.open_tunnel (
  (REMOTE_SERVER_IP, 22),
  ssh_username = 'mbarbetti',
  ssh_pkey = '/home/mbarbetti/.ssh/id_rsa',
  remote_bind_address = (PRIVATE_SERVER_IP, PRIVATE_SERVER_PORT),
  local_bind_address  = ('127.0.0.1', 10022)
) as tunnel:
  ping_server = requests.get ('http://localhost:10022/optunapi/ping')
  ping_msg = ping_server.json()
  print (ping_msg)
How to run the server in this case?

In this configuration the Optuna-server acts as the private server, so its IP and port are the ones passed as remote_bind_address within the with statement:

$ uvicorn server:optunapi --host PRIVATE_SERVER_IP --port PRIVATE_SERVER_PORT

License

This project is licensed under the terms of the MIT license.