ActiveLearning

Contents

Purpose:

The purpose of the driver.

Tutorials:

Tutorials demonstrating the application of this driver.

Driver Interface:

Driver-specific methods of the Python interface.

Configuration:

Configuration of the driver.

Purpose

The driver defines a general active learning process that can be configured for many different purposes, such as optimization, integration, or learning the behavior of one or more expensive black-box functions.

Tip

The driver is very versatile but also requires a detailed configuration. For some use cases there exist specialized drivers based on this driver. For example, for the Bayesian optimization of a scalar function, the BayesianOptimization driver is easier to set up.

Each black box function \(\mathbf{f}_{\rm bb}(\mathbf{p}_{\rm design}, \mathbf{p}_{\rm env})\) can depend on a vector of design parameters \(\mathbf{p}_{\rm design}\in\mathcal{X}\) and possibly environment parameters \(\mathbf{p}_{\rm env}\in\mathcal{E}\). The design space \(\mathcal{X}\) and environment \(\mathcal{E}\) can be configured when creating the study (see create_study()).

Tip

The output of the black-box functions can also be vectorial. For example, the black box function \(\mathbf{f}_{\rm bb}: \mathbf{p}_{\rm design} \mapsto (a,b,c)\in\mathbb{R}^3\) could map a design value to three output values. The goal of the study could be to find design parameters that minimize \(a^2 - b^2\) and fulfill the constraint \(c \leq 1\).

The most important configuration of the driver concerns three parameters:

Surrogates:

The dependence of the output values on the parameters is learned via one or more surrogate models (see ActiveLearning.surrogates parameter).

Variables:

The target values of the study are defined by variables that are based on the outputs of surrogate models or other variables (see ActiveLearning.variables parameter).

Objectives:

Finally, one defines objectives of the active learning, such as minimizing a variable or constraining a variable to a specified interval (see ActiveLearning.objectives parameter).
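To make this concrete, here is a rough sketch of how the three parts could fit together for the three-output example of the tip above. The exact configuration keys are documented in the Configuration section below; the surrogate output names ("a", "b", "c") and the expression syntax used here are assumptions, not part of this reference:

study.configure(
    surrogates=[dict(type="GP", name="abc", output_dim=3)],
    variables=[
        # select single outputs of the surrogate "abc" by name
        # (the output names "a", "b", "c" are assumptions)
        dict(type="SingleSelector", name="a", input="abc", select_by_name="a"),
        dict(type="SingleSelector", name="b", input="abc", select_by_name="b"),
        dict(type="SingleSelector", name="c", input="abc", select_by_name="c"),
        # combine the selected outputs into the target value
        dict(type="Expression", name="target", expression="a^2 - b^2"),
    ],
    objectives=[
        dict(type="Minimizer", variable="target", strategy="EI"),
        dict(type="Constrainer", variable="c", upper_bound=1.0),
    ],
)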

Tutorials

Driver Interface

The driver instance can be obtained by Study.driver.

class jcmoptimizer.ActiveLearning(host, study_id, session)[source]
describe()

Get description of all modules and their parameters that are used by the driver. Example:

description = driver.describe()
print(description["members"]["surrogates"]["0"])
Return type:

dict[str, Any]

Returns: A nested dictionary with descriptions of the submodules, each consisting of a name and a descriptive text. If an entry describes a module, it has an additional "members" entry with dictionaries describing its submodules and parameters. If an entry describes a parameter, it has an additional entry "settable", which is true if the parameter can be overridden by the client (see override_parameter()).
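For instance, a small sketch (assuming only the structure described above) that recursively collects the dot-separated paths of all client-settable parameters:

def settable_paths(entry, prefix=""):
    """Recursively collect paths of parameters marked as settable."""
    paths = []
    for key, sub in entry.get("members", {}).items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(sub, dict):
            if sub.get("settable"):
                paths.append(path)
            paths += settable_paths(sub, path)
    return paths

print(settable_paths(driver.describe()))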

get_minima(object_type='objective', name='objective', environment_value=None, num_initial_samples=None, num_output=10, epsilon=0.1, delta=0.1, ftol=1e-09, min_dist=0.0)

Get a list of information about local minima of a single-output surrogate, variable or objective. The width \(\sigma\) in each parameter direction is determined by a fit of the minimum to a Gaussian function that goes asymptotically to the mean value of the object. The minima are found using predictions of the surrogate models. The validity of constraints is completely ignored. Example:

import pandas as pd
minima = driver.get_minima(num_output=10)
print(pd.DataFrame(minima))
Parameters:
  • object_type (Literal['surrogate', 'variable', 'objective']) – The type of object for which a prediction is required.

  • name (str) – The name of the object for which predictions are required.

  • environment_value (Optional[list[float]]) – Optional environment value for which local minima of design values are determined. If None, the local minima are also determined with respect to environment parameters.

  • num_initial_samples (Optional[int]) – Number of initial samples for searching (default: automatic determination).

  • num_output (int) – Maximum number of minima that are returned.

  • epsilon (float) – Parameter used for identifying identical minima (i.e. minima with distance < length scale * epsilon) and minima with non-vanishing gradient (e.g. minima at the boundary of the search space).

  • delta (float) – Step size parameter used for approximating second derivatives.

  • ftol (float) – Precision goal for the minimum function value.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, list[float]]

Returns: A dictionary with information about local minima, containing lists of object values, uncertainties of the object values, the parameter values, and the width \(\sigma\) in each parameter direction (i.e. standard deviation after a fit to a Gaussian function).
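As a usage sketch (the exact dictionary keys are defined by the server and are not assumed here), the minima can be tabulated for inspection:

import pandas as pd

minima = driver.get_minima(num_output=10, epsilon=0.05)
table = pd.DataFrame(minima)  # one row per local minimum
print(table.head())           # object values, uncertainties, positions, widths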

get_observed_values(object_type='objective', name='objective', num_samples=None)

Get observed values and variances of variables or objectives. For noisy input data, the values are obtained on the basis of predictions of the surrogate models and can therefore differ slightly from the input data. Example:

data = driver.get_observed_values("variable", "loss")
Parameters:
  • object_type (Literal['variable', 'objective']) – The type of object for which a prediction is required.

  • name (str) – The name of the object for which predictions are required.

  • num_samples (Optional[int]) – Number of samples used for posteriors that have a sampling-based distribution. If not specified, the same number as in the acquisition is used. If the posterior is described by a fixed number of ensemble points, the minimum of num_samples and the ensemble size is used.

Return type:

dict[str, Any]

Returns: Dictionary with the following keys:

samples:

The observed samples (design and possibly environment values).

means:

Mean values of observations. For noiseless observations, this is the observed value itself.

variance:

Variance of observed values. For noiseless observations, this is typically a negligibly small number.
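For example, a short sketch that summarizes the observations of the default variable "loss" using the keys above:

import numpy as np

data = driver.get_observed_values("variable", "loss")
means = np.asarray(data["means"])
print(f"{len(data['samples'])} observations")
print(f"best observed mean value: {means.min():.4g}")
print(f"typical observation variance: {np.mean(data['variance']):.2g}")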

get_sobol_indices(object_type='objective', name='objective', max_uncertainty=0.001, max_time=inf, max_samples=1000000.0, min_dist=0.0)

Determines Sobol’ indices of a surrogate, variable or objective under a parameter distribution. By default the probability density of the parameters is a uniform distribution in the whole parameter domain. Other parameter distributions can be defined via study.configure(parameter_distribution = dict(...)).

Example:

study.configure(parameter_distribution = dict(
    distributions=[
        dict(type="normal", parameter="param1", mean=1.0, stddev=2.0),
        dict(type="uniform", parameter="param2", domain=[-1.0, 1.0])
    ]
))
sobol_indices = study.driver.get_sobol_indices(max_uncertainty=0.001)
Parameters:
  • object_type (Literal['surrogate', 'variable', 'objective']) – The type of object for which a prediction is required.

  • name (str) – The name of the object for which predictions are required.

  • max_uncertainty (float) – The Monte Carlo integration is stopped when the uncertainty of the first-order Sobol’ indices is smaller than max_uncertainty.

  • max_time (float) – The Monte Carlo integration is stopped when the time max_time has passed.

  • max_samples (float) – The Monte Carlo integration is stopped when the number of evaluated samples equals or exceeds the given value.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, Any]

Returns: A dictionary with the entries

first_order:

First-order (or main-effect) Sobol’ indices of shape (n, d), where n is the number of parameters and d is the number of outputs of the object. The \(i\)-th first-order Sobol’ index of a parameter \(x_i\) is the contribution to the output variance stemming from the effect of varying \(x_i\) alone, but averaged over variations in other input parameters.

total_order:

Total-order (or total effect) Sobol’ indices of shape (n, d), where n is the number of parameters and d is the number of outputs of the object. The \(i\)-th total-order Sobol’ index of a parameter \(x_i\) measures the contribution to the output variance of \(x_i\), including all variance caused by its interactions with any other parameter.

mean:

Expectation values \(\mathbf{m}=\mathbb{E}[\mathbf{g}(\mathbf{x})]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

variance:

Variance \(\mathbf{v}=\mathbb{E}[(\mathbf{g}(\mathbf{x})-\mathbf{m})^2]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

num_samples:

Number of sampling points \(N\) that were used in the Monte Carlo integration. The numerical uncertainty of the computed mean value is \(\sqrt{v/N}\).

Note

For information on Sobol’ indices and variance-based sensitivity analysis see https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis
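A usage sketch that ranks the parameters by their total-order index (assuming a scalar objective, i.e. d = 1; the keys follow the return description above):

import numpy as np

res = driver.get_sobol_indices(max_uncertainty=1e-3)
total = np.asarray(res["total_order"])[:, 0]
for i in np.argsort(total)[::-1]:
    print(f"parameter {i}: total-order index {total[i]:.3f}")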

get_state(path=None)

Get state of the driver. Example:

best_sample = driver.get_state(path="best_sample")
Parameters:

path (Optional[str]) – A dot-separated path to a submodule or parameter. If none, the full state is returned.

Return type:

dict[str, Any]

Returns: If path is None, a dictionary with the full driver state; otherwise, the state entry at the given path.

Note

A description of the meaning of each entry in the state can be retrieved by describe().

get_statistics(object_type='objective', name='objective', quantiles=None, rel_precision=0.001, abs_precision=1e-09, max_time=inf, max_samples=1000000.0, min_dist=0.0)

Determines statistics like the mean and variance of a surrogate, variable or objective under a parameter distribution. By default the probability density of the parameters is a uniform distribution in the whole parameter domain. Other parameter distributions can be defined via study.configure(parameter_distribution = dict(...)).

Example:

study.configure(parameter_distribution = dict(
    distributions=[
        dict(type="normal", parameter="param1", mean=1.0, stddev=2.0),
        dict(type="uniform", parameter="param2", domain=[-1.0, 1.0])
    ]
))
stats = study.driver.get_statistics(abs_precision=0.001)
Parameters:
  • object_type (Literal['surrogate', 'variable', 'objective']) – The type of object for which a prediction is required.

  • name (str) – The name of the object for which predictions are required.

  • quantiles (Optional[list[float]]) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

  • rel_precision (float) – The Monte Carlo integration is stopped when the empiric relative uncertainty of the mean value of all outputs is smaller than rel_precision.

  • abs_precision (float) – The Monte Carlo integration is stopped when the empiric absolute uncertainty of the mean value of all outputs is smaller than abs_precision.

  • max_time (float) – The Monte Carlo integration is stopped when the time max_time has passed.

  • max_samples (float) – The Monte Carlo integration is stopped when the number of evaluated samples equals or exceeds the given value.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, Any]

Returns: A dictionary with the entries

mean:

Expectation values \(\mathbf{m}=\mathbb{E}[\mathbf{g}(\mathbf{x})]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

variance:

Variance \(\mathbf{v}=\mathbb{E}[(\mathbf{g}(\mathbf{x})-\mathbf{m})^2]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

quantiles:

A list of quantiles of shape (num_quantiles, num_outputs).

num_samples:

Number of sampling points \(N\) that were used in the Monte Carlo integration. The numerical uncertainty of the computed mean value is \(\sqrt{v/N}\).
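For example, the standard deviation and a central 68% interval of the objective can be derived from the returned entries (a sketch using only the keys above):

import numpy as np

stats = driver.get_statistics(quantiles=[0.16, 0.5, 0.84])
std = np.sqrt(np.asarray(stats["variance"]))
print("mean:", stats["mean"], "std:", std)
print("68% interval:", stats["quantiles"][0], "to", stats["quantiles"][2])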

historic_parameter_values(path)

Get the values of an internal parameter for each iteration of the study. Example:

min_objective_values = driver.historic_parameter_values(
    path="acquisition_function.min_objective")
Parameters:

path (str) – A dot-separated path to the parameter.

Return type:

list[Any]

Note

A description of the meaning of each parameter can be retrieved by describe().
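For example, the convergence of the study can be visualized by plotting the historic values. This is a sketch; the parameter path is taken from the example above, and the available paths can be listed with describe():

import matplotlib.pyplot as plt

values = driver.historic_parameter_values(
    path="acquisition_function.min_objective")
plt.plot(values)
plt.xlabel("iteration")
plt.ylabel("minimal objective value")
plt.show()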

optimize_hyperparameters(path=None, num_samples=1, min_dist=0.0)

Optimize the hyperparameters of submodules of the driver. This is usually done automatically. Example:

driver.optimize_hyperparameters("driver.surrogates.0")
Parameters:
  • path (Optional[str]) – A dot-separated path to a submodule. If no path is specified, all submodules are optimized.

  • num_samples (int) – Number of initial samples for the optimization.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

None

override_parameter(path, value)

Override an internal parameter of the driver that is otherwise selected automatically. Example:

driver.override_parameter(
    "surrogates.0.matrix.kernel.design_length_scales",
    [1.0, 2.0]
)
Parameters:
  • path (str) – A dot-separated path to the parameter to be overridden.

  • value (Any) – The new value of the parameter.

Return type:

None

Note

A description of the meaning of each parameter can be retrieved by describe().

predict(points, object_type='objective', name='objective', output_type='mean_var', min_dist=0.0, quantiles=None, num_samples=None)

Make predictions on various objects. Example:

prediction = driver.predict(points=[[1,0,0],[2,0,1]])
Parameters:
  • points (list[list[float]]) – Vectors in the combined space (design space + environment) of shape (num_points, num_dim)

  • object_type (Literal['surrogate', 'variable', 'objective']) – The type of object for which a prediction is required.

  • name (str) – The name of the object for which predictions are required.

  • output_type (Literal['mean_var', 'quantiles', 'samples']) –

    The type of output. Options are:

    mean_var:

    Mean and variance of the posterior distribution. This fully characterizes the posterior distribution only for normally distributed posteriors. The function returns a dictionary with keywords "mean" and "variance" mapping to arrays of length num_points. If the posterior distribution is multivariate normal, a covariance matrix of shape (output_dim, output_dim) is returned for each point; otherwise a list of variances of shape (output_dim,) is returned.

    quantiles:

    A list of quantiles of the distribution, estimated from samples drawn from the distribution. The function returns a dict with the entry "quantiles" and a tensor of shape (num_quantiles, num_points, output_dim).

    samples:

    Random samples drawn from the posterior distribution. The function returns a dict with the entry "samples" and a tensor of shape (num_samples, num_points, output_dim).

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

  • quantiles (Optional[list[float]]) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

  • num_samples (Optional[int]) – Number of samples used for posteriors that have a sampling-based distribution or if output_type is “samples”. If not specified, the same number as in the acquisition is used. If the posterior is described by a fixed number of ensemble points, the minimum of num_samples and the ensemble size is used.

Return type:

dict[str, list]

Returns: A dictionary with the entries “mean” and “variance” if output_type = "mean_var", “quantiles” if output_type = "quantiles", and “samples” if output_type = "samples".
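A sketch of the "quantiles" output (assuming a two-dimensional parameter space and a scalar objective): predict the median and a central 68% band along a one-dimensional cut.

import numpy as np

xs = np.linspace(0.0, 1.0, 50)
points = [[x, 0.5] for x in xs]
pred = driver.predict(points, output_type="quantiles",
                      quantiles=[0.16, 0.5, 0.84])
# shape (num_quantiles, num_points, output_dim) -> three curves of length 50
lower, median, upper = np.asarray(pred["quantiles"])[:, :, 0]
print(median)  # e.g. plot the band with matplotlib's fill_between(xs, lower, upper)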

run_mcmc(name='chisq', num_walkers=None, max_iter=50000, max_time=inf, rel_error=0.05, thin_chains=False, multi_modal=False, append=False, max_sigma_dist=inf, min_dist=0.0)

Runs a Markov Chain Monte Carlo (MCMC) sampling of a chi-squared or negative log-probability variable. The output value is interpreted as a likelihood function. By default the prior probability density of the parameters is a uniform distribution in the whole parameter domain. Other parameter distributions can be defined via study.configure(parameter_distribution = dict(...)). Example:

study.run()
study.configure(parameter_distribution = dict(
    distributions=[
        dict(type="normal", parameter="param1", mean=1.0, stddev=2.0),
        dict(type="uniform", parameter="param2", domain=[-1.0, 1.0])
    ]
))
samples = study.driver.run_mcmc()

The estimated error of a Monte-Carlo integration of some function \(f\) is \(\delta = \sigma / \sqrt{N_{\rm ind}}\), where \(\sigma^2\) is the variance of \(f\) and \(N_{\rm ind}\) is the number of independent samples from the probability distribution. The error relative to the standard deviation \(\sigma\) is therefore \(\delta_{\rm rel}=1/\sqrt{N_{\rm ind}}\). Assuming that subsequent samples of a chain are correlated and this correlation vanishes after a correlation time \(\tau\), \(N_{\rm ind} = N/\tau\) of all \(N\) MCMC samples are independent. Note that \(\tau\) can only be estimated, so the relative Monte-Carlo integration error \(\delta_{\rm rel}=\sqrt{\tau/N}\) can be under- or overestimated.

Parameters:
  • name (str) – The name of the chi-squared or negative log-probability variable that defines the probability density.

  • num_walkers (Optional[int]) – Number of walkers. If not specified, the value is automatically chosen.

  • max_iter (int) – Maximum absolute chain length.

  • max_time (float) – Maximum run time in seconds. If not specified, the runtime is not limited.

  • rel_error (float) – Targeted relative Monte-Carlo integration error \(\delta_{\rm rel}\) of the samples.

  • thin_chains (bool) – If true, only every \(\tau\)-th sample of all MCMC samples is returned. This is helpful if the full number of samples gets too large and a representative uncorrelated subset is required.

  • multi_modal (bool) – If true, a more explorative sampling strategy is used.

  • append (bool) – If true, the samples are appended to the samples of the previous MCMC run.

  • max_sigma_dist (float) – If set, the sampling is restricted to a distance max_sigma_dist * sigma from the maximum likelihood estimate. E.g. max_sigma_dist=3.0 means that only the 99.7% probability region of each parameter is sampled.

  • min_dist (float) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where samples with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected.

Return type:

dict[str, Any]

Returns: A dictionary with the following entries:

samples:

The drawn samples, with “burn-in” samples removed, thinned by half of the correlation time.

medians:

The medians of all random parameters

lower_uncertainties:

The distances between the medians and the 16% quantiles of all random parameters

upper_uncertainties:

The distances between the medians and the 84% quantiles of all random parameters

tau:

Estimated correlation time of each parameter.
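A usage sketch that reports the marginal parameter estimates in the common "median + upper / - lower" form (the keys follow the return description above):

import numpy as np

result = driver.run_mcmc(rel_error=0.05, thin_chains=True)
for i, (med, lo, up) in enumerate(zip(result["medians"],
                                      result["lower_uncertainties"],
                                      result["upper_uncertainties"])):
    print(f"parameter {i}: {med:.4g} +{up:.3g} / -{lo:.3g}")
print("shape of the sample array:", np.shape(result["samples"]))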

Configuration

The configuration parameters can be set by calling, e.g.

study.configure(example_parameter1 = [1,2,3], example_parameter2 = True)

max_iter (int)

Maximum number of evaluations of the studied system.

Default: Infinite number of evaluations.

max_time (float)

Maximum run time of the study in seconds. The time is counted from the moment the parameter is set or reset.

Default: inf

num_parallel (int)

Number of parallel evaluations of the studied system.

Default: 1

transformations (list[dict])

A list of transformations of the parameter space. A surrogate can learn the mapping from the original or transformed parameter space to observation data for the model.

Default: []

Example

A transformation that transforms the absolute parameters \(x_1, x_2\) into the relative parameters \(r = x_2-x_1,\; R=0.5(x_1 + x_2)\).

[{'type': 'general', 'name': 'trans1', 'parameters': [
    {'type': 'expression', 'name': 'r', 'expression': 'x2 - x1', 'bounds': [-2, 2]},
    {'type': 'expression', 'name': 'R', 'expression': '0.5*(x1 + x2)', 'bounds': [-2, 2]}
]}]

Each element of the list must be a dict. In the following, the list elements are described:

An input transformation allows one to transform a parameter space into another parameter space. The transformation is defined by a list of parameters spanning the new parameter space.

See trans configuration for details.

surrogates (list[dict])

A list of surrogate models used to learn the behavior of the black box function.

Default: [{'type': 'GP', 'optimization_step_max': 300, 'min_optimization_interval': 2, 'max_optimization_interval': 20, 'optimization_level': 0.2, 'min_dist': 0.0, 'name': 'loss', 'output_dim': 1, 'detect_noise': False, 'mean_function': 'constant', 'warping_function': 'identity', 'joint_hyperparameters': True, 'correlate_outputs': False, 'tolerance': 1e-20, 'tolerance_hypercov': 1e-08, 'covariance_matrix': {'resize_step': 1000, 'iterative_start_size': 500, 'cuda_start_size': 2000, 'num_iterative_solver_steps': 0, 'online_start_size': 500, 'max_data_hyper_derivs': 10000}, 'num_fantasies': 16, 'kernel': {'min_ls_blocks_per_dim': 0.25, 'max_ls_blocks': 1000000.0, 'target_ls_blocks': 100, 'joint32ls_prior': True, 'matern_order': 5}, 'outlier_detector': None}]

Example

Two Gaussian process surrogates that together learn the dependence of 4 black-box outputs on the design parameters.

study.configure(surrogates=[
    dict(type="GP", output_dim=3, name="intensities"),
    dict(type="GP", output_dim=1, name="loss")
])
# add observation data to the "intensities" and "loss" models
obs = study.new_observation()
obs.add([1.0, 2.0, 3.0], model_name="intensities")
obs.add([6.0], model_name="loss")
study.add_observation(obs, design_value=[7.4, 3.1])

Each element of the list must be a dict. The dict entry type specifies the type of the element. The remaining entries specify its properties. In the following, all possible list element types are described:

Gaussian process (type 'GP'): A Gaussian process is a surrogate model that can predict a Gaussian distribution of model values based on the training data of the model \(f(x_1), f(x_2), \dots\) for sampled points \(x_1, x_2, \dots \in \mathbb{R}^D\).

In the simplest case, the prediction of a noiseless, scalar function value \(f(x^*)\) is given by a normally distributed random variable \(y\sim \mathcal{N}( \overline{y}(x^*),\sigma^2(x^*))\) with mean and variance

\[\begin{aligned}\overline{y}(x^*) &= \mu(x^*) + \sum_{ij} k(x^*,x_i)\,\mathbf{\Sigma}_{ij}^{-1}\left[f(x_j)-\mu(x_j)\right],\\\sigma^2(x^*) &= k(x^*,x^*) - \sum_{ij} k(x^*,x_i)\,\mathbf{\Sigma}_{ij}^{-1}\,k(x_j,x^*),\end{aligned}\]

where \(\mathbf{\Sigma}\) is the covariance matrix with entries \(\mathbf{\Sigma}_{ij} = k(x_i, x_j)\).

The functional behavior of the mean \(\mu(x)\) and the covariance function (also called kernel) \(k(x, x')\) depends on a set of hyperparameters \(\mathbf{h}\) whose values are chosen to maximize the likelihood of the observed function values. Depending on the configuration, the Gaussian process can depend on further hyperparameters that are optimized in the same manner.

See GP configuration for details.
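The posterior formulas above can be illustrated in a few lines of NumPy. This is a self-contained sketch with a squared-exponential kernel and zero mean function; the driver itself uses the configurable kernel and mean function described in the GP configuration:

import numpy as np

def k(a, b, ls=0.3):
    # squared-exponential kernel with length scale ls (illustrative only)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

x = np.array([0.1, 0.4, 0.8])         # sampled points x_i
f = np.sin(2 * np.pi * x)             # observed values f(x_i)
Sigma_inv = np.linalg.inv(k(x, x) + 1e-12 * np.eye(len(x)))

x_star = np.linspace(0.0, 1.0, 101)
K_star = k(x_star, x)
mean = K_star @ Sigma_inv @ f                                    # posterior mean
var = 1.0 - np.einsum("pi,ij,pj->p", K_star, Sigma_inv, K_star)  # posterior variance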

Neural Network Ensemble (type 'NN'): Ensembles of fully connected deep neural networks with an arbitrary number of hidden layers and neurons, different activation functions, and weight initialization schemes are used to estimate the uncertainty of the model. In this way the Bayesian posterior can be approximated and used in the context of active learning, while the chance of overfitting is effectively reduced by averaging over the predictions of the ensemble members. Deep ensemble surrogates are expected to perform better than Gaussian processes in higher-dimensional parameter spaces due to the large number of degrees of freedom of parametric models. Moreover, the training time of a deep ensemble surrogate scales linearly with the number of observations, in contrast to the quadratic or cubic complexity characteristic of Gaussian processes, which renders deep ensembles more suitable for large datasets.

See NN configuration for details.

variables (list[dict])

A list of variables that can depend on the outputs of surrogates or other variables that are further up in the list.

Default: [{'type': 'SingleSelector', 'name': 'v', 'input': 'loss', 'select_by_name': 'loss'}]

Example

An expression based on the variables variable1 and variable2.

[{'type': 'Expression', 'expression': 'variable1*sin(2*PI*variable2)^2'}]

Each element of the list must be a dict. The dict entry type specifies the type of the element. The remaining entries specify its properties. In the following, all possible list element types are described:

Single-selector variable (type 'SingleSelector'):

This variable selects one output of a surrogate model or a multi-output variable by name.

See SingleSelector configuration for details.

Multi-selector variable (type 'MultiSelector'):

This variable selects multiple outputs of a surrogate model or a multi-output variable.

See MultiSelector configuration for details.

Collector variable (type 'Collector'):

This variable collects outputs of multiple surrogate models or variables.

See Collector configuration for details.

Linear combination variable (type 'LinearCombination'):

This variable describes a linear combination \(c_0 + c_1 \cdot y_1 + \cdots + c_N \cdot y_N\) of \(N\) inputs \(y_1, y_2, \cdots, y_N\) stemming either from a surrogate with \(N\) outputs or from any set of \(N\) variables.

See LinearCombination configuration for details.

Chi-squared variable (type 'ChiSquaredValue'):

This variable describes the chi-squared deviation

\[\chi^2 = \sum_{i=1}^K \frac{\left(t_i - y_i\right)^2}{\eta_i^2}\]

between \(K\) outputs \(y_i\) of a surrogate and \(K\) targets \(t_i\), scaled by the target uncertainties \(\eta_i\).

If the target uncertainties are not independent but correlated, it is also possible to specify the covariance matrix \(G\) of the targets. In this case, the chi-squared deviation is given as

\[\chi^2 = \sum_{i,j=1}^K (t_i - y_i) G_{i j}^{-1} (t_j - y_j).\]

If the predictions of the surrogate, \(y_i\), follow a Gaussian distribution, the variable follows a generalized chi-squared distribution.

See ChiSquaredValue configuration for details.
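A numerical sketch of the correlated chi-squared deviation defined above:

import numpy as np

t = np.array([1.0, 2.0])                    # targets t_i
y = np.array([1.1, 1.8])                    # surrogate outputs y_i
G = np.array([[0.04, 0.01],
              [0.01, 0.09]])                # covariance matrix of the targets
r = t - y
chi2 = r @ np.linalg.solve(G, r)            # (t - y)^T G^{-1} (t - y)
print(chi2)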

Negative log-probability variable (type 'NegLogProbability'): This variable describes the negative log-probability of model parameters \(\mathbf{p}\) given a set of \(K\) measurement targets \(t_i\), \(i=1,\ldots,K\).

We assume that the measurement process can be accurately modeled by a function \(\mathbf{f}(\mathbf{p}) \in \mathbb{R}^K\). That is, the vector of measurements \(\mathbf{t}\) is a random vector with

\[\mathbf{t} = \mathbf{f}(\mathbf{p}) + \mathbf{w}\]

with \(\mathbf{w} \sim \mathcal{N}(0, \mathbf{G})\), where \(\mathbf{G}\) is a covariance matrix of the measurement errors. In most cases the covariance matrix is diagonal (i.e. the measurement errors are uncorrelated) with diagonal entries \(G_{ii} = \eta_i^2\).

Sometimes the measurement noise is known. Generally, however, one has to find a parameterized error model \(\eta_i(\mathbf{d})\) for the variance itself. A common choice is to assume that the error is composed of a background term \(b\) and a noise contribution that scales linearly with \(f_i(\mathbf{p})\), i.e. \(\mathbf{d}=(a,b)\):

\[\eta_i^2(a,b,\mathbf{p}) = b^2 + \left[a f_i(\mathbf{p})\right]^2\]

Since every entry of the measurement vector follows a normal distribution \(t_i \sim \mathcal{N}(f_i(\mathbf{p}),\eta_i^2(\mathbf{d}))\), the joint likelihood of measuring the vector \(\mathbf{t}\) is given as

\[P(\mathbf{t} | \mathbf{p}, \mathbf{d}) = \prod_{i=1}^K \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[-\frac{1}{2}\left(\frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})}\right)^2\right].\]

Sometimes, non-uniform prior distributions for the design parameter vector \(P_\text{prior}(\mathbf{p})\) and the error model parameters \(P_\text{prior}(\mathbf{d})\) are available. The posterior distribution is then proportional to

\[P(\mathbf{p}, \mathbf{d} | \mathbf{t}) \propto P(\mathbf{t} | \mathbf{p}, \mathbf{d}) P_\text{prior}(\mathbf{p}) P_\text{prior}(\mathbf{d})\]

Warning

If a parameter distribution for the design space is defined, make sure that it has a non-vanishing probability density within the boundaries of the design space. Otherwise the negative log-probability can become infinite, in which case the sample computation is numerically unstable.

Altogether, the goal of finding the parameters with maximum posterior probability density is equivalent to minimizing the negative log-probability

\[\begin{split}-\log\left(P(\mathbf{p}, \mathbf{d}\,|\,\mathbf{t})\right) =\; & \frac{K}{2}\log(2\pi) +\sum_{i=1}^K\log\left(\eta_i(\mathbf{d})\right) +\frac{1}{2}\sum_{i=1}^K \left( \frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})} \right)^2 \\ &-\log\left(P_\text{prior}(\mathbf{d})\right) -\log\left(P_\text{prior}(\mathbf{p})\right).\end{split}\]

See NegLogProbability configuration for details.
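A numerical sketch of the negative log-probability above, for the error model \(\eta_i^2 = b^2 + \left[a f_i(\mathbf{p})\right]^2\) and flat priors (so that the prior terms are constant and omitted):

import numpy as np

def neg_log_prob(t, f, a, b):
    # t: targets, f: model values f_i(p), (a, b): error model parameters
    eta = np.sqrt(b ** 2 + (a * f) ** 2)
    K = len(t)
    return (0.5 * K * np.log(2 * np.pi)
            + np.sum(np.log(eta))
            + 0.5 * np.sum(((t - f) / eta) ** 2))

print(neg_log_prob(t=np.array([1.0, 2.0]),
                   f=np.array([1.1, 1.9]), a=0.05, b=0.1))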

Expression variable (type 'Expression'):

This variable describes a general mathematical expression of known parameters and predicted variables stemming either from surrogates or from any previously defined variable. This variable type is the most flexible. However, its distribution is not determined analytically, but only through sampling from the posterior distribution(s) of the input variables and evaluating the expression for each of the samples. For linear or quadratic expressions it is generally advisable to use the more specialized variable types instead.

See Expression configuration for details.

Interpolation variable (type 'Interpolation'):

This variable up-samples the output of a surrogate model by interpolating between the output values. An interpolation can reduce the cost of obtaining observations at high resolution.

See Interpolation configuration for details.

Scan variable (type 'Scan'):

This variable scans the output of a surrogate model over design or environment parameters. A typical use case is to determine the system behavior based on an interpolation between data points for different environment parameter values. An interpolation can reduce the cost of obtaining observations at small steps between environment parameters.

See Scan configuration for details.

Least square fit variable (type 'Fit'):

This variable fits a parameter vector \(\mathbf{p}\) consisting of \(M\) parameters \(p_1, p_2,\dots, p_M\) to a model function \(f(\mathbf{x}, \mathbf{p})\) that depends on the model variables \(x_1, x_2,\dots, x_D\). The output of the variable is a least-square estimate of the fit parameters for data points \((\mathbf{x}_i, y_i), i=1,\dots,N\), where \(y_1,\dots,y_N\) are the mean values of the input. That is, the variable locally minimizes

\[\chi^2(\mathbf{p}) = \sum_{i=1}^N \frac{\left(f\left(\mathbf{x}_i, \mathbf{p}\right) - y_i\right)^2}{\eta_i^2},\]

where \(\eta_i^2\) is the variance of input \(i\) to the variable.

See Fit configuration for details.
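A self-contained sketch of the weighted least-squares problem that the Fit variable solves, here for a toy model \(f(x, \mathbf{p}) = p_1 e^{-p_2 x}\) (SciPy is used only for illustration; the driver performs the fit internally):

import numpy as np
from scipy.optimize import least_squares

x = np.linspace(0.0, 1.0, 20)
rng = np.random.default_rng(0)
y = 2.0 * np.exp(-3.0 * x) + 0.01 * rng.normal(size=x.size)  # data y_i
eta = np.full_like(y, 0.01)                                  # uncertainties eta_i

def residuals(p):
    # weighted residuals (f(x_i, p) - y_i) / eta_i of the chi-squared sum
    return (p[0] * np.exp(-p[1] * x) - y) / eta

fit = least_squares(residuals, x0=[1.0, 1.0])
print(fit.x)  # least-square estimate of (p1, p2)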

objectives (list[dict])

A list of objectives that define which system parameters are optimal.

Default: [{'type': 'Minimizer', 'name': 'objective', 'strategy': 'EI', 'localize': False, 'min_val': -inf, 'min_PoI': 1e-16, 'min_uncertainty': 0.0, 'min_acq_val': -inf}]

Example

A minimization objective together with an outcome constraint.

[{'type': 'Minimizer', 'variable': 'variable1', 'strategy': 'EI'},
 {'type': 'Constrainer', 'variable': 'variable2', 'lower_bound': 0.0, 'upper_bound': 1.5}]

Each element of the list must be a dict. The dict entry type specifies the type of the element. The remaining entries specify its properties. In the following, all possible list element types are described:

Minimization objective (type 'Minimizer'): The objective is to minimize a target variable.

See Minimizer configuration for details.

Multi-objective minimization (type 'MultiMinimizer'): The objective is to learn the Pareto front of multiple minimization objectives. A vector of objective values \(\mathbf{y}\in \mathbb{R}^q\) lies on the Pareto front if it is not dominated by any other attainable vector of outcomes. A vector \(\mathbf{y}_1\) is dominated by a vector \(\mathbf{y}_2\) if all entries of \(\mathbf{y}_2\) are smaller than or equal to the entries of \(\mathbf{y}_1\).

The strategy for finding samples close to or on the Pareto front is to maximize the hypervolume enclosed by all non-dominated sampling points and the upper reference point \(\mathbf{y}_{\rm upper}\). Progress is measured in terms of the decreasing hypervolume between the lower reference point \(\mathbf{y}_{\rm lower}\) and the non-dominated sampling points (see the sketch below).

See MultiMinimizer configuration for details.
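A self-contained sketch of the concepts above for two objectives: a domination test, extraction of the non-dominated points, and the hypervolume they enclose with an upper reference point (using the common convention that domination additionally requires at least one strictly smaller entry):

import numpy as np

def dominated(y1, y2):
    # True if y2 dominates y1
    return np.all(y2 <= y1) and np.any(y2 < y1)

def pareto_front(Y):
    return np.array([y for y in Y if not any(dominated(y, z) for z in Y)])

def hypervolume_2d(front, ref):
    # sort by the first objective and sum the rectangles below ref
    front = front[np.argsort(front[:, 0])]
    hv, y_prev = 0.0, ref[1]
    for y0, y1 in front:
        hv += (ref[0] - y0) * (y_prev - y1)
        y_prev = y1
    return hv

Y = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [2.5, 2.5]])
print(hypervolume_2d(pareto_front(Y), ref=np.array([4.0, 4.0])))  # 6.0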

Exploration objective (type 'Explorer'): The objective is to learn the global behavior of a target variable by exploring regions of maximal uncertainty.

See Explorer configuration for details.

Outcome constraint (type 'Constrainer'): The objective is to constrain the value of a variable to a one-sided or two-sided interval. The sampling strategy tries to maximize the probability that sampled values meet the constraint; this can, however, not be guaranteed.

See Constrainer configuration for details.

acquisition_function (dict)

A module that defines additional parameters of the acquisition function.

Default: {'parameter_uncertainties': [], 'num_samples': 10000, 'smooth_range': 0.2}

The acquisition function defines the sampling strategy: new samples for the evaluation of the black-box function are generated by maximizing the acquisition function. The acquisition strategy itself is defined in the objective. This module allows one to configure additional properties of the acquisition function.

See acquisition_function configuration for details.

acquisition_optimizer (dict)

A module that maximizes the acquisition function to determine new suggestions.

Default: {'adaptive_local_search': True, 'compute_suggestion_in_advance': True}

The module optimizes the acquisition function of the main objective of the study by a heuristic global optimization followed by local convergence of the best result.

See acquisition_optimizer configuration for details.

scaling (float)

Scaling parameter of the model uncertainty. For scaling \(\gg 1.0\) (e.g. scaling=10.0) the search is more explorative. For scaling \(\ll 1.0\) (e.g. scaling=0.1) the search becomes more greedy (e.g. any local minimum is intensively exploited).

Default: 1.0

vary_scaling (bool)

If true, the scaling parameter is randomly varied between 0.1 and 10.

Default: True
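For example, a purely greedy search could be configured like this (a sketch using the two parameters above):

study.configure(scaling=0.1, vary_scaling=False)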

parameter_distribution (dict)

Probability distribution of design and environment parameters.

Default: {'include_study_constraints': False, 'distributions': [], 'constraints': []}

Probability distribution of design and environment parameters defined by distribution functions and constraints. The definition of the parameter distribution can have several effects:

  • In a call to the method get_statistics of the driver interface the value of interest is averaged over samples drawn from the space distribution.

  • In a call to the method run_mcmc of the driver interface the space distribution acts as a prior distribution.

  • In a call to the method get_sobol_indices of the driver interface the space distribution acts as a weighting factor for determining expectation values.

  • In an ActiveLearning driver, one can access the value of the log-probability density (up to an additive constant) by the name 'log_prob' in any expression, e.g. in an Expression variable or a Linear combination variable (see the sketch below).

See parameter_distribution configuration for details.
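A sketch of the last point above (the key names follow the earlier examples; the exact expression syntax is described in the Expression configuration): a variable that penalizes the default "loss" output by the log-probability density of the parameters.

study.configure(
    parameter_distribution=dict(distributions=[
        dict(type="normal", parameter="param1", mean=1.0, stddev=2.0),
    ]),
    variables=[
        dict(type="SingleSelector", name="v", input="loss",
             select_by_name="loss"),
        dict(type="Expression", name="penalized_loss",
             expression="v - log_prob"),
    ],
)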