BayesianOptimization

Contents

Purpose:

The purpose of the driver.

Tutorials:

Tutorials demonstrating the application of this driver.

Driver Interface:

Driver-specific methods of the Python interface.

Configuration:

Configuration of the driver.

Purpose

The driver is based on the ActiveLearning driver and runs a standard Bayesian optimization (i.e. minimization) of an expensive scalar function.

Tutorials

Driver Interface

The driver instance can be obtained by Study.driver.
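Example (a minimal sketch, assuming a study has already been created via client.create_study):

driver = study.driver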

class jcmoptimizer.BayesianOptimization(host, study_id, session)[source]
property active_learning_configuration: dict[str, Any]

Return a configuration for the ActiveLearning driver that can be used to reproduce the behavior of the current driver. Example:

config = driver.active_learning_configuration
study_active_learning = client.create_study(
    driver="ActiveLearning",
    ...
)
study_active_learning.configure(**config)
property best_sample: dict[str, float | int | str]

Best sample with minimal objective value found during the minimization. Example:

for key, value in driver.best_sample.items():
    print(f"{key} = {value}")
describe()

Get description of all modules and their parameters that are used by the driver. Example:

description = driver.describe()
print(description["members"]["surrogates"]["0"])
Return type:

dict[str, Any]

Returns: A nested dictionary with descriptions of the submodules, each consisting of a name and a descriptive text. If an entry describes a module, it has an additional "members" entry with dictionaries describing its submodules and parameters.

get_minima(environment_value=None, num_initial_samples=None, num_output=10, epsilon=0.1, delta=0.1, ftol=1e-09, min_dist=0.0)

Get a list of information about local minima of the objective. The width \(\sigma\) in each parameter direction is determined by a fit of the minimum to a Gaussian function that approaches the mean value of the objective asymptotically. The minima are found using predictions of the surrogate models. The validity of constraints is completely ignored. Example:

import pandas as pd
minima = driver.get_minima(num_output=10)
print(pd.DataFrame(minima))
Parameters:
  • environment_value (Optional[list[float]]) – Optional environment value for which local minima of design values are determined. If None, the local minima are also determined with respect to environment parameters.

  • num_initial_samples (Optional[int]) – Number of initial samples for searching (default: automatic determination).

  • num_output (int) – Maximum number of minima that are returned (Default: 10)

  • epsilon (float) – Parameter used for identifying identical minima (i.e. minima with distance < length scale * epsilon) and minima with non-vanishing gradient (e.g. minima at the boundary of the search space) (default: 0.1)

  • delta (float) – Step size parameter used for approximating second derivatives (default: 0.1)

  • ftol (float) – Precision goal for the minimum function value.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, list[float]]

Returns: A dictionary with information about local minima, containing lists of objective values, uncertainties of the objective values, the parameter values, and the width \(\sigma\) in each parameter direction (i.e. standard deviation after a fit to a Gaussian function).

get_observed_values()

Get observed values. For noisy input data, the values are obtained on the basis of predictions of the surrogate models. Therefore, they can slightly differ from the input data. Example:

data = driver.get_observed_values()

Return type:

dict[str, Any]

Returns: A dictionary with the following keys:

samples:

The observed samples (design and possibly environment values).

means:

Mean values of observations. For noiseless observations, this is the observed value itself.

variance:

Variance of observed values. For noiseless observations, this is typically a negligibly small number.
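A usage sketch based on the keys documented above; the printout is purely illustrative:

data = driver.get_observed_values()
for sample, mean, var in zip(
        data["samples"], data["means"], data["variance"]):
    print(f"sample {sample}: mean={mean}, std={var**0.5}")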

get_state(path=None)

Get state of the driver. Example:

best_sample = driver.get_state(path="best_sample")
Parameters:

path (Optional[str]) – A dot-separated path to a submodule or parameter. If None, the full state is returned.

Return type:

dict[str, Any]

Returns: If path is None, a dictionary with information about the driver state.

Note

A description of the meaning of each entry in the state can be retrieved by describe().

get_statistics(quantiles=None, rel_precision=0.001, abs_precision=1e-09, max_time=inf, max_samples=1000000.0, min_dist=0.0)[source]

Determine statistics such as the mean and variance of the objective under a parameter distribution. By default, the probability density of the parameters is a uniform distribution over the whole parameter domain. Other parameter distributions can be defined via study.configure(parameter_distribution = dict(…)).

Example:

study.configure(parameter_distribution = dict(
    distributions=[
        dict(type="normal", parameter="param1", mean=1.0, stddev=2.0),
        dict(type="uniform", parameter="param2", domain=[-1.0, 1.0])
    ]
))
stats = study.driver.get_statistics(abs_precision=0.001)
Parameters:
  • quantiles (Optional[list[float]]) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

  • abs_precision (float) – The Monte Carlo integration is stopped when the empiric absolute uncertainty of the mean value of all outputs is smaller than abs_precision.

  • rel_precision (float) – The Monte Carlo integration is stopped when the empiric relative uncertainty of the mean value of all outputs is smaller than rel_precision.

  • max_time (float) – The Monte Carlo integration is stopped when the time max_time has passed.

  • max_samples (float) – The Monte Carlo integration is stopped when the number of evaluated samples equals or exceeds the given value.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, Any]

Returns: A dictionary with the entries

mean:

Expectation values \(\mathbf{m}=\mathbb{E}[\mathbf{g}(\mathbf{x})]\) of the objective function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

variance:

Variance \(\mathbf{v}=\mathbb{E}[(\mathbf{g}(\mathbf{x})-\mathbf{m})^2]\) of the objective function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

quantiles:

A list of quantiles of shape (num_quantiles, num_outputs).

num_samples:

Number of sampling points \(N\) that were used in the Monte Carlo integration. The numerical uncertainty of the computed mean value is \(\sqrt{v/N}\).
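A usage sketch based on the entries documented above:

stats = driver.get_statistics(quantiles=[0.16, 0.5, 0.84])
print(f"mean: {stats['mean']}")
print(f"variance: {stats['variance']}")
print(f"quantiles: {stats['quantiles']}")
print(f"number of Monte Carlo samples: {stats['num_samples']}")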

historic_parameter_values(path)

Get the values of an internal parameter for each iteration of the study. Example:

min_objective_values = driver.historic_parameter_values(
    path="acquisition_function.min_objective")
Parameters:

path (str) – A dot-separated path to the parameter.

Return type:

list[Any]

Note

A description of the meaning of each parameter can be retrieved by describe().

property min_objective: float

Minimal objective value found during the minimization. Example:

min_objective = driver.min_objective
optimize_hyperparameters(num_samples=1, min_dist=0.0)

Optimize the hyperparameters of the driver. This is usually done automatically. Example:

driver.optimize_hyperparameters()
Parameters:
  • num_samples (int) – Number of initial samples for the optimization (default: automatic determination)

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

None

override_parameter(path, value)

Override an internal parameter of the driver that is otherwise selected automatically. Example:

driver.override_parameter(
    "surrogates.0.matrix.kernel.design_length_scales",
    [1.0, 2.0]
)
Parameters:
  • path (str) – A dot-separated path to the parameter to be overridden.

  • value (Any) – The new value of the parameter.

Return type:

None

Note

A description of the meaning of each parameter can be retrieved by describe().

predict(points, output_type='mean_var', min_dist=0.0, quantiles=None, num_samples=None)[source]

Make predictions on the objective value. Example:

prediction = driver.predict(points=[[1,0,0],[2,0,1]])
Parameters:
  • points (list[list[float]]) – Vectors of the space (design space + environment) of shape (num_points, num_dim)

  • output_type (Literal['mean_var', 'quantiles', 'samples']) –

    The type of output. Options are:

    mean_var:

    Mean and variance of the posterior distribution. This describes the posterior distribution only for normally distributed posteriors. The function returns a dictionary with keywords "mean" and "variance" mapping to arrays of length num_points. If the posterior distribution is multivariate normal, a covariance matrix of shape (output_dim, output_dim) is returned for each point; otherwise a list of variances of shape (output_dim,) is returned.

    quantiles:

    A list of quantiles of the distribution is estimated based on samples drawn from the distribution. The function returns a dict with entry "quantiles" and a tensor of shape (num_quantiles, num_points, output_dim)

    samples:

    Random samples drawn from the posterior distribution. The function returns a dict with the entry "samples" and a tensor of shape (num_samples, num_points, output_dim)

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

  • quantiles (Optional[list[float]]) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

  • num_samples (Optional[int]) – Number of samples used for posteriors that have a sampling-based distribution or if output_type is “samples”. If not specified, the same number as in the acquisition is used. If the posterior is described by a fixed number of ensemble points, the minimum of num_samples and the ensemble size is used.

Return type:

dict[str, list]

Returns: A dictionary with the entries "mean" and "variance" if output_type="mean_var", "quantiles" if output_type="quantiles", and "samples" if output_type="samples".
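A sketch requesting quantile predictions instead of mean and variance; the point values are illustrative and mirror the example above:

prediction = driver.predict(
    points=[[1, 0, 0], [2, 0, 1]],
    output_type="quantiles",
    quantiles=[0.16, 0.5, 0.84]
)
# tensor of shape (num_quantiles, num_points, output_dim)
quantiles = prediction["quantiles"]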

Configuration

The configuration parameters can be set by calling, e.g.

study.configure(example_parameter1 = [1,2,3], example_parameter2 = True)

max_iter (int)

Maximum number of evaluations of the studied system.

Default: Infinite number of evaluations.

max_time (float)

Maximum run time of the study in seconds. The time is counted from the moment the parameter is set or reset.

Default: inf

num_parallel (int)

Number of parallel evaluations of the studied system.

Default: 1

scaling (float)

Scaling parameter of the model uncertainty. For scaling \(\gg 1.0\) (e.g. scaling=10.0) the search is more explorative. For scaling \(\ll 1.0\) (e.g. scaling=0.1) the search becomes more greedy (e.g. any local minimum is intensively exploited).

Default: 1.0
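For example, a more greedy search could be configured as follows (the value is illustrative):

study.configure(scaling=0.1)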

vary_scaling (bool)

If true, the scaling parameter is randomly varied between 0.1 and 10.

Default: True

parameter_distribution (dict)

Probability distribution of design and environment parameters.

Default: {'include_study_constraints': False, 'distributions': [], 'constraints': []}

Probability distribution of design and environment parameters defined by distribution functions and constraints. The definition of the parameter distribution can have several effects:

  • In a call to the method get_statistics of the driver interface the value of interest is averaged over samples drawn from the space distribution.

  • In a call to the method run_mcmc of the driver interface the space distribution acts as a prior distribution.

  • In a call to the method get_sobol_indices of the driver interface the space distribution acts as a weighting factor for determining expectation values.

  • In an ActiveLearning driver, one can access the value of the log-probability density (up to an additive constant) by the name 'log_prob' in any expression, e.g. in Expression variable, Linear combination variable.

See parameter_distribution configuration for details.
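A minimal configuration sketch using the default keys shown above; the distribution entry mirrors the get_statistics example and its values are illustrative:

study.configure(parameter_distribution=dict(
    include_study_constraints=False,
    distributions=[
        dict(type="uniform", parameter="param1", domain=[-1.0, 1.0])
    ],
    constraints=[]
))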

detect_noise (bool)

If true, the noise of the function values and of the function value derivatives is estimated by means of two hyperparameters.

Default: False

warping_function (str)

The name of the warping function \(w\). A warping function performs a transformation \(y \to w(y,\mathbf{b})\) of any function value \(y = f(x)\) by means of a strictly monotonically increasing function \(w(y,\mathbf{b})\) that depends on a set of hyperparameters \(\mathbf{b}\). The hyperparameters are chosen automatically by a maximum likelihood estimate. The choice identity leads to no warping of the function values, while sinh uses a hyperbolic sine function for warping. Using sinh can result in better predictions at the cost of increased computational effort. It should therefore only be applied to more expensive black box functions.

Default: 'identity' Choices: 'identity', 'sinh'.
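For example, sinh warping could be enabled as follows:

study.configure(warping_function="sinh")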

min_val (float)

The minimization of the objective is stopped when the observed objective value is below the specified minimum value.

Default: -inf

min_PoI (float)

The study is stopped if the maximum probability of improvement (PoI) of the last 5 iterations is below min_PoI.

Default: 1e-16

min_acq_val (float)

The study is stopped if the maximum acquisition value (usually expected improvement) of the last 5 iterations is below min_acq_val.

Default: -inf

strategy (str)

Minimization strategy of the acquisition. The choices are expected improvement (EI), lower confidence bound (LCB), and probability of improvement (PoI).

Default: 'EI' Choices: 'EI', 'LCB', 'PoI'.

localize (bool)

If true, a local search is performed, i.e. samples are not drawn in regions with large uncertainty.

Default: False

num_training_samples (int)

Number of pseudo-random initial samples before the samples are drawn according to the acquisition function.

Default: Automatic choice depending on the dimensionality of the design space.
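As a closing sketch, several of the parameters documented above can be combined in a single call (all values are illustrative):

study.configure(
    max_iter=100,
    num_parallel=2,
    strategy="EI",
    detect_noise=True,
    min_PoI=1e-12
)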