---
id: design_philosophy
title: Design Philosophy
---
BoTorch adheres to the following main design tenets:
- Modularity & Simplicity
  - Make it easy for researchers to develop and implement new ideas by following
    a modular design philosophy & making heavy use of auto-differentiation. Most
    BoTorch components are `torch.nn.Module` instances, so that users familiar
    with PyTorch can easily implement new differentiable components (see the
    sketch after this list).
  - Facilitate model-agnostic Bayesian Optimization by maintaining lightweight
    APIs and first-class support for Monte-Carlo-based acquisition functions.
- Performance & Scalability
  - Achieve high levels of performance across different platforms with
    device-agnostic code by using highly parallelized batch operations.
  - Expand the applicability of Bayesian Optimization to very large problems by
    harnessing scalable modeling frameworks such as GPyTorch.
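As a concrete illustration of this modularity, below is a minimal sketch of a custom Monte-Carlo acquisition function. The class `qToyMaxValue` and the quantity it computes are hypothetical, chosen only to show how little code a new differentiable component requires; the structure loosely mirrors BoTorch's own MC acquisition functions and assumes a recent BoTorch version.

```python
import torch
from botorch.acquisition.monte_carlo import MCAcquisitionFunction
from botorch.utils.transforms import t_batch_mode_transform


class qToyMaxValue(MCAcquisitionFunction):
    """Hypothetical toy MC acquisition function: the expected best objective
    value across the q candidate points, estimated via posterior samples."""

    @t_batch_mode_transform()
    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X has shape `batch_shape x q x d`.
        posterior = self.model.posterior(X)
        samples = self.get_posterior_samples(posterior)  # sample_shape x batch_shape x q x m
        obj = self.objective(samples)                    # sample_shape x batch_shape x q
        # Everything above is differentiable w.r.t. X via auto-differentiation.
        return obj.max(dim=-1).values.mean(dim=0)        # batch_shape


# Usage (assuming `model` is a BoTorch model):
#   acqf = qToyMaxValue(model)
#   values = acqf(torch.rand(5, 2, 3, dtype=torch.double))
```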
Batching (as in batching data or batching computations) is a central component of all modern deep learning platforms and plays a critical role in the design of BoTorch. Examples of batch computations in BoTorch include:
- A batch of candidate points $X$ to be evaluated in parallel on the black-box
  function we are trying to optimize. In BoTorch, we refer to this kind of batch
  as a "q-batch".
- A batch of q-batches to be evaluated in parallel on the surrogate model of the
  black-box function. These facilitate fast evaluation on modern hardware such as
  GPUs and multi-core CPUs with advanced instruction sets (e.g. AVX). In BoTorch,
  we refer to a batch of this type as a "t-batch" (as in "torch-batch").
- A batched surrogate model, each batch of which models a different output (which is useful for multi-objective Bayesian Optimization). This kind of batching also aims to exploit modern hardware architecture.
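To make these batch shapes concrete, here is a minimal sketch (with toy data and an unfitted `SingleTaskGP`, purely for illustration) that evaluates a Monte-Carlo acquisition function on a t-batch of q-batches:

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP

# Toy training data: 10 observations over a 3-dimensional design space.
train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True).sin()
model = SingleTaskGP(train_X, train_Y)

acqf = qExpectedImprovement(model=model, best_f=train_Y.max())

# A t-batch of 5 q-batches, each with q=2 candidate points in d=3 dimensions:
# the tensor has shape `t x q x d = 5 x 2 x 3`.
X = torch.rand(5, 2, 3, dtype=torch.double)
values = acqf(X)  # one acquisition value per q-batch -> shape `torch.Size([5])`
```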
Note that none of these notions of batch pertains to the batching of training data, which is commonly done when training Neural Network models (sometimes called "mini-batching"). BoTorch aims to be agnostic with regard to the particular model used, so while model fitting may indeed be performed via stochastic gradient descent using mini-batch training, BoTorch itself abstracts away from this.
For an in-depth look at the different batch notions in BoTorch, take a look at the Batching in BoTorch section.
While BoTorch aligns with PyTorch as closely as possible, optimizing acquisition functions requires a somewhat different approach. We now describe this discrepancy and explain in detail why we made this design decision.
In PyTorch, modules typically map (batches of) data to an output, where the mapping is parameterized by the parameters of the modules (often the weights of a Neural Network). Fitting the model means optimizing some loss (which is defined with respect to the underlying distribution of the data). As this distribution is unknown, one cannot directly evaluate this function. Instead, one considers the empirical loss function, i.e. the loss evaluated on all data available. In typical machine learning model training, a stochastic version of the empirical loss, obtained by "mini-batching" the data, is optimized using stochastic optimization algorithms.
In BoTorch, `AcquisitionFunction` modules map an input design $X$ to the acquisition function value. Optimizing the acquisition function therefore means optimizing the output over the possible values of $X$ itself, rather than over the parameters of the module.
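The following small sketch (with placeholder data and an analytic acquisition function) shows what this means in practice: gradients flow to the candidate set $X$ itself, not to model parameters.

```python
import torch
from botorch.acquisition import UpperConfidenceBound
from botorch.models import SingleTaskGP

# Placeholder surrogate over a 2-dimensional design space.
train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = train_X.norm(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)
acqf = UpperConfidenceBound(model, beta=0.2)

# The optimization variable is the candidate set X, not the model's parameters.
X = torch.rand(1, 1, 2, dtype=torch.double, requires_grad=True)
value = acqf(X)          # acquisition value(s) for the t-batch, shape `1`
value.sum().backward()   # auto-differentiation w.r.t. the inputs
print(X.grad)            # gradient of the acquisition value w.r.t. X
```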
For large Neural Network models, the number of optimization variables is very
high, and can be in the hundreds of thousands or even millions of parameters.
The resulting optimization problem is often solved using first-order
stochastic gradient descent algorithms (e.g. SGD and its many variants).
Many of these are implemented in the `torch.optim` module. The typical way of
optimizing a model with these algorithms is by extracting the module's
parameters (e.g. using `parameters()`), and writing a manual optimization loop
that calls `step()` on a torch `Optimizer` object.
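As a reminder of that pattern, here is a minimal sketch of such a manual loop; the model, data, and hyperparameters are toy placeholders.

```python
import torch

# Toy regression model and data, for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = torch.nn.MSELoss()

# Extract the module's parameters and hand them to a torch Optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    # "Mini-batching": optimize a stochastic version of the empirical loss.
    for idx in torch.randperm(X.shape[0]).split(32):
        optimizer.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        optimizer.step()
```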
Optimizing acquisition functions is different, since the problem dimensionality
is often much smaller: optimizing over a q-batch of $q$ design points in a
$d$-dimensional feature space involves only $q \times d$ scalar variables. At
this scale, deterministic, higher-order optimization algorithms (such as
quasi-Newton methods) are typically far more effective than first-order
stochastic ones, but such algorithms are not provided by the `torch.optim`
package, which is why BoTorch provides a custom interface that wraps the
optimizers from the `scipy.optimize` module.
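A minimal sketch of that interface, using `optimize_acqf` from `botorch.optim`; the model and data below are toy placeholders, and the optimizer settings are illustrative only.

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf

# Toy surrogate and acquisition function over a 3-dimensional design space.
train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True).sin()
model = SingleTaskGP(train_X, train_Y)
acqf = qExpectedImprovement(model=model, best_f=train_Y.max())

# Box bounds of the unit cube, shape `2 x d`.
bounds = torch.stack([torch.zeros(3), torch.ones(3)]).to(torch.double)

# optimize_acqf runs multi-start optimization with scipy-based optimizers
# (e.g. L-BFGS-B) under the hood, rather than a torch.optim loop.
candidates, acq_value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=2,              # number of candidate points to generate jointly
    num_restarts=10,  # number of optimization restarts
    raw_samples=256,  # raw samples used to initialize the restarts
)
```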