---
id: batching
title: Batching
---
BoTorch makes frequent use of "batching", both in the sense of batch acquisition functions for multiple candidates as well as in the sense of parallel or batch computation (neither of these should be confused with mini-batch training). Here we explain some of the common patterns you will see in BoTorch for exploiting parallelism, including common shapes and decorators for more conveniently handling these shapes.
BoTorch supports batch acquisition functions that assign a joint utility to a set of $q$ design points in the parameter space. These are, for obvious reasons, referred to as q-Acquisition Functions (e.g. q-EI, q-UCB).
As discussed in the design philosophy, BoTorch has adopted the convention of referring to batches in the batch-acquisition sense as "q-batches", and to batches in the torch batch-evaluation sense as "t-batches".
Internally, q-batch acquisition functions operate on input tensors of shape $b \times q \times d$, where $b$ is the number of t-batches, $q$ is the number of design points to be considered jointly, and $d$ is the dimension of the parameter space. The output is a one-dimensional tensor with $b$ elements, one acquisition value per t-batch.
In order to simplify the user-facing API for evaluating acquisition functions, BoTorch implements the `@t_batch_mode_transform` decorator, which allows the use of non-batch mode inputs. If applied to an instance method with a single `Tensor` argument, an input tensor to that method without a t-batch dimension (i.e. a tensor of shape $q \times d$) is automatically converted to a t-batch of size 1 (i.e. of `batch_shape` `torch.Size([1])`). This is typically used on the `forward` method of an `AcquisitionFunction`.
The `@t_batch_mode_transform` decorator takes an `expected_q` argument that, if specified, checks that the q-batch size of the input is equal to the value specified in the decorator. This is typically used for acquisition functions that can only handle the case $q=1$ (e.g. any `AnalyticAcquisitionFunction`).
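As a minimal sketch of how this decorator is typically applied (the `MeanAcquisition` class below and its posterior-mean utility are made-up toys, not part of BoTorch):

```python
import torch
from botorch.acquisition import AcquisitionFunction
from botorch.utils.transforms import t_batch_mode_transform


class MeanAcquisition(AcquisitionFunction):
    """Toy acquisition function: the posterior mean at a single design point."""

    @t_batch_mode_transform(expected_q=1)
    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X arrives here with shape batch_shape x 1 x d: the decorator adds
        # a t-batch dimension to a non-batched (q x d) input and raises an
        # error if q != 1.
        posterior = self.model.posterior(X)
        # Squeeze the q and output dimensions: one value per t-batch.
        return posterior.mean.squeeze(-1).squeeze(-1)
```

Calling this on a `1 x d` tensor then returns a tensor of shape `torch.Size([1])`, just as if the input had carried an explicit t-batch dimension.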
BoTorch evaluates Monte-Carlo acquisition functions using (quasi-) Monte-Carlo sampling from the posterior at the input features $X$. The shape of the resulting tensor of posterior samples is governed by two quantities: `sample_shape` and `event_shape`.

`event_shape` is the shape of a single sample drawn from the underlying distribution:
- Evaluating a single-output model at a $1 \times n \times d$ tensor, representing $n$ data points in $d$ dimensions each, yields a posterior with `event_shape` of $1 \times n \times 1$. Evaluating the same model at a $b_1 \times \cdots \times b_k \times n \times d$ tensor (representing a t-batch shape of $b_1 \times \cdots \times b_k$, with $n$ data points of $d$ dimensions each in every batch) yields a posterior with `event_shape` of $b_1 \times \cdots \times b_k \times n \times 1$. In most cases, the t-batch shape will be single-dimensional (i.e., $k=1$).
- Evaluating a multi-output model with $o$ outputs at a $b_1 \times \cdots \times b_k \times n \times d$ tensor yields a posterior with `event_shape` equal to $b_1 \times \cdots \times b_k \times n \times o$.
- Recall from the previous section that, with the help of the `@t_batch_mode_transform` decorator, all acquisition functions are internally evaluated using at least one t-batch dimension.
`sample_shape` is the shape (possibly multi-dimensional) of the samples drawn independently from the distribution with `event_shape`, resulting in a tensor of samples of shape `sample_shape + event_shape`:
- Drawing a sample with `sample_shape` of $s_1 \times s_2$ from a posterior with `event_shape` equal to $b_1 \times \cdots \times b_k \times n \times o$ results in a tensor of shape $s_1 \times s_2 \times b_1 \times \cdots \times b_k \times n \times o$, where each of the $s_1 \cdot s_2$ sub-tensors of shape $b_1 \times \cdots \times b_k \times n \times o$ is an independent draw (see the sketch below).
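A minimal sketch of these shape conventions, assuming a recent BoTorch version (where the sampler takes a `sample_shape` constructor argument) and made-up training data:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.sampling import SobolQMCNormalSampler

# Toy single-output model fit to n=8 training points in d=2 dimensions.
train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

# Evaluating at a t-batch of shape 3 with n=4 points each yields a
# posterior with event_shape 3 x 4 x 1 (single output).
posterior = model.posterior(torch.rand(3, 4, 2, dtype=torch.double))
print(posterior.mean.shape)  # torch.Size([3, 4, 1])

# Drawing samples with sample_shape 5 x 2 yields a tensor of shape
# sample_shape + event_shape = 5 x 2 x 3 x 4 x 1.
sampler = SobolQMCNormalSampler(sample_shape=torch.Size([5, 2]))
samples = sampler(posterior)
print(samples.shape)  # torch.Size([5, 2, 3, 4, 1])
```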
The GPyTorch models implemented in BoTorch support t-batched evaluation with arbitrary t-batch shapes.
In the simplest case, a model is fit to non-batched training points with shape $n \times d$:

- Non-batched evaluation on a set of test points with shape $m \times d$ yields a joint posterior over the $m$ points.
- Batched evaluation on a set of test points with shape $\textit{batch\_shape} \times m \times d$ yields $\textit{batch\_shape}$ joint posteriors over the $m$ points in each respective batch (see the sketch below).
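A minimal sketch of both evaluation modes, using made-up training data:

```python
import torch
from botorch.models import SingleTaskGP

# Model fit to non-batched training points with shape n x d = 8 x 2.
train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

# Non-batched evaluation: a joint posterior over m=4 test points.
posterior = model.posterior(torch.rand(4, 2, dtype=torch.double))
print(posterior.mean.shape)  # torch.Size([4, 1])

# Batched evaluation with batch_shape = 3: three joint posteriors over
# m=4 points each, computed in parallel.
posterior = model.posterior(torch.rand(3, 4, 2, dtype=torch.double))
print(posterior.mean.shape)  # torch.Size([3, 4, 1])
```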
GPyTorch models can also be fit on batched training points with shape $\textit{batch\_shape} \times n \times d$, in which case the model itself is batched:

- Non-batched evaluation on test points with shape $\textit{batch\_shape}' \times m \times d$, where each dimension of $\textit{batch\_shape}'$ either matches the corresponding dimension of $\textit{batch\_shape}$ or is 1 (to support broadcasting), yields $\textit{batch\_shape}$ joint posteriors over the $m$ points in each respective batch.
- Batched evaluation on test points with shape $\textit{new\_batch\_shape} \times \textit{batch\_shape}' \times m \times d$, where $\textit{new\_batch\_shape}$ is the batch shape for batched evaluation, yields $\textit{new\_batch\_shape} \times \textit{batch\_shape}$ joint posteriors over the $m$ points in each respective batch (broadcasting as necessary over $\textit{batch\_shape}$), as illustrated in the sketch below.
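A corresponding sketch for a batched model, including broadcasting of the test-point batch shape (again with made-up data):

```python
import torch
from botorch.models import SingleTaskGP

# Batched training points with batch_shape = 3: three independent GPs,
# each fit to its own n=8 points in d=2 dimensions.
train_X = torch.rand(3, 8, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

# Test points with batch_shape' = 1 broadcast against the model's
# batch_shape = 3, yielding three joint posteriors over m=4 points.
posterior = model.posterior(torch.rand(1, 4, 2, dtype=torch.double))
print(posterior.mean.shape)  # torch.Size([3, 4, 1])
```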
The `BatchedMultiOutputGPyTorchModel` class implements a fast multi-output model (assuming conditional independence of the outputs given the input) by batching over the outputs. Given training inputs with shape $n \times d$ and training observations with shape $n \times o$, the `BatchedMultiOutputGPyTorchModel` permutes the training outputs to make the output dimension an internal t-batch dimension, so that the $o$ outputs are modeled as a batch of single-output models. When evaluating test points with shape $\textit{batch\_shape} \times m \times d$ in the `posterior` method, the test points are broadcast to the model(s) for each output. This results in a batched posterior whose mean has shape $\textit{batch\_shape} \times m \times o$.
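For example, `SingleTaskGP` is a `BatchedMultiOutputGPyTorchModel`; a minimal sketch with made-up data for $o=2$ outcomes:

```python
import torch
from botorch.models import SingleTaskGP

# Training observations with o=2 outcomes; internally the model batches
# over the output dimension (two conditionally independent GPs).
train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = torch.stack([train_X.sum(-1), train_X.prod(-1)], dim=-1)  # 8 x 2
model = SingleTaskGP(train_X, train_Y)

# The posterior over m=4 test points has a mean of shape m x o.
posterior = model.posterior(torch.rand(4, 2, dtype=torch.double))
print(posterior.mean.shape)  # torch.Size([4, 2])
```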
BoTorch uses random restarts to optimize an acquisition function from multiple starting points. To efficiently handle a large number of restarts, the candidate sets for all restarts are stacked along the t-batch dimension into a single $b \times q \times d$ tensor (with $b$ the number of restarts), so that the acquisition values and their gradients for all restarts are computed in one batched evaluation rather than in a loop over restarts.
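A minimal sketch using `optimize_acqf` (toy model; the specific values of `q`, `num_restarts`, and `raw_samples` are arbitrary):

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf

train_X = torch.rand(8, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)
acqf = qExpectedImprovement(model=model, best_f=train_Y.max())

# The num_restarts = 20 starting points form a 20 x 3 x 2 tensor of
# initial conditions that is evaluated as a t-batch during optimization.
candidates, value = optimize_acqf(
    acq_function=acqf,
    bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double),
    q=3,
    num_restarts=20,
    raw_samples=128,
)
print(candidates.shape)  # torch.Size([3, 2])
```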
See the Using batch evaluation for fast cross-validation tutorial for an example of exploiting these batching capabilities to speed up cross-validation.