BayesianReconstruction

Contents

Purpose:: The purpose of the driver.
Tutorials:: Tutorials demonstrating the application of this driver.
Driver Interface:: Driver-specific methods of the Matlab interface.
Configuration:: Configuration of the driver.

Purpose

The driver searches for the model parameter with the largest posterior probability given a set of measurements \(\mathbf{t} \in \mathbb{R}^K\).

We assume that the measurement process can be accurately modeled by a function \(\mathbf{f}(\mathbf{p}) \in \mathbb{R}^K\). That is, the vector of measurements \(\mathbf{t}\) is a random vector with

\[\mathbf{t} = \mathbf{f}(\mathbf{p}) + \mathbf{w}\]

with \(\mathbf{w} \sim \mathcal{N}(0, \mathbf{G})\), where \(\mathbf{G}\) is a covariance matrix of the measurement errors. In most cases the covariance matrix is diagonal (i.e. the measurement errors are uncorrelated) with diagonal entries \(G_{ii} = \eta_i^2\).

Sometimes, the measurement noise is known. However, generally one has to find a parameterized error model \(\eta_i(\mathbf{d})\) for the variance itself. A common choice is to assume that the error is composed of a background term \(b\) and a noise contribution which scales linearly with \(f_i(\mathbf{p})\):

\[\eta_i^2(a,b) = b^2 + \left[a f_i(\mathbf{p})\right]^2\]

Since every entry of the measurement vector follows a normal distribution \(t_i \sim \mathcal{N}(f_i(\mathbf{p}),\eta_i(\mathbf{d}))\) the joint likelihood of measuring the vector \(\mathbf{t}\) is given as

\[P(\mathbf{t} | \mathbf{p}, \mathbf{d}) = \prod_{i=1}^K \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[-\frac{1}{2}\left(\frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})}\right)^2\right].\]

Sometimes, non-uniform prior distributions for the design parameter vector \(P_\text{prior}(\mathbf{p})\) and the error model parameters \(P_\text{prior}(\mathbf{d})\) are available. The posterior distribution is then proportional to

\[P(\mathbf{p}, \mathbf{d} | \mathbf{t}) \propto P(\mathbf{t} | \mathbf{p}, \mathbf{d}) P_\text{prior}(\mathbf{p}) P_\text{prior}(\mathbf{d})\]

Warning

If a parameter distribution for the design space is defined, make sure that it has a non-vanishing probability distribution within the boundaries of the design space. Otherwise the negative log-probability can be infinite. In this case the sample computation is numerically unstable.

Alltogether, the target of finding the parameters with maximum posterior probability density is equivalent of minimizing the value of the negative log-probability

\[\begin{split}\begin{split} -\log\left(P(\mathbf{p}, \mathbf{d}| \mathbf{t})\right) = & \frac{1}{2} K\log(2\pi) +\sum_{i=1}^K\log\left(\eta_i(\mathbf{d})\right) +\frac{1}{2}\sum_{i=1}^K \left( \frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})} \right)^2 \\ &-\log\left(P_\text{prior}(\mathbf{d})\right) -\log\left(P_\text{prior}(\mathbf{p})\right). \end{split}\end{split}\]

For reconstruction problems with uniform priors and known measurement uncertainty the BayesianLeastSquares driver is recommended. If moreover the vectorial forward model is inexpensive or has a known Jacobian, the ScipyLeastSquares driver is recommended.

Tutorials

Bayesian parameter reconstruction using Bayesian optimization

Driver Interface

The driver instance can be obtained by Study.driver.

class BayesianReconstruction(host, verbose, study_id)

active_learning_configuration

SpecializedActiveLearning.active_learning_configuration()
Return a configuration for the ActiveLearning driver that can be used to reproduce the behavior of the current driver. Example:
 config = driver.active_learning_configuration();
 study_active_learning = client.create_study(...
     driver, "ActiveLearning",...
 );
 study_active_learning.configure(config{:});

Returns:
  cell: Cell array with configuration for the ActiveLearningDriver

best_sample

Minimizer.best_sample()
Best sample with minimal objective value found during the minimization. Example:
samples = study.driver.best_sample;
fn = fieldnames(sample);
for i=1:length(fn)
   fprintf("%s = %s\n", fn{i}, sample.(fn{i}) );
end
Return type:

struct

describe

Driver.describe()
Get description of all modules and their parameters that are used by the driver. Example:
description = driver.describe();
fprintf(description.members.surrogates{1});
Returns:

A nested struct with description of submodules consisting
of a name and a descriptive text. If the entry describes a module, it has an additional "members" entry with struct describing submodules and parameters.

Return type:

struct

get_minima

SingleGPSingleVariable.get_minima(varargin)
Get a list of information about local minima of a single-output surrogate, variable or objective. The width \(\sigma\) in each parameter direction is determined by a fit of the minimum to a Gaussian function that goes asymptotically to the mean value of the object. The minima are found using predictions of the surrogate models. The validity of constraints is completely ignored. Example:
minima = driver.get_minima('num_output', 10);
Parameters:

environment_value (array) – Optional environment value for which local minima of design values are determined. If None, the local minima are also determined with respect to environment parameters.

num_initial_samples (int) – Number of initial samples for searching (default: automatic determination).

num_output (int) – Maximum number of minima that are returned. Defaults to 10.

epsilon (double) – Parameter used for identifying identical minima (i.e. minima with distance < length scale * epsilon) and minima with non-vanishing gradient (e.g. minima at the boundary of the search space). Defaults to 0.1.

ftol (double) – Precision goal for the minimum function value. Defaults to 1e-9.

delta (double) – step size parameter used for approximating second derivatives. Defaults to 0.1.

min_dist (double) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where sampling with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected. Defaults to 0.

Returns:

A struct with information about local minima
with lists of object values, uncertainties of the objective values, the parameter values and the width \(\sigma\) in each parameter direction (i.e. standard deviation after a fit to a Gaussian function)

Return type:

struct

get_observed_values

SingleGPSingleVariable.get_observed_values()
Get observed values. For noisy input data, the values are obtained on the basis of predictions of the surrogate models. Therefore, they can slightly differ from the input data. Example:
data = driver.get_observed_values();
Returns: Struct with the following fields:

samples:

The observed samples (design and possibly environment values).

means:

Mean values of observations. For noiseless observations, this is the observed value itself.

variance:

Variance of observed values. For noiseless observations, this is typically a negligibly small number.

get_state

Driver.get_state(path)
Get state of the driver. Example:
best_sample = driver.get_state("best_sample");
Parameters:

path (str) – A dot-separated path to a submodule or parameter. If none, the full state is returned.

Returns:

If path is None, a struct with information of driver state.

Return type:

struct

Note

A description of the meaning of each entry in the state can be retrieved by describe().

get_statistics

BayesianReconstruction.get_statistics(varargin)
Determines statistics like the mean and variance of a surrogate or negative log-probability objective under a parameter distribution. By default the probability density of the parameters is a uniform distribution in the whole parameter domain. Other parameter distributions can be defined via study.configure('parameter_distribution', {...}).

Example:
distribution = {...
    struct('type', 'normal', 'parameter', 'x1', 'mean', 0.0, 'stddev', 0.01), ...
    struct('type', 'uniform', 'parameter', 'x2', 'domain', [-1.0, 1.0])...
};
study.configure('parameter_distribution', distribution);
stats = study.driver.get_statistics('quantiles', [0.5]);
fprintf("Objective mean %f, standard deviation %f, median %f\n", ...
        stats.mean(1), sqrt(stats.variance(1)), stats.quantiles(1, 1));
Parameters:

object_type (str) – The type of object for which a prediction is required. Optional are “surrogate”, “variable” or “objective”. Defaults to “objective”.

quantiles (array) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

rel_precision (double) – The Monte Carlo integration is stopped when the empiric relative uncertainty of the mean value of all outputs is smaller than rel_precision. Defaults to 1e-3.

abs_precision (double) – The Monte Carlo integration is stopped when the empiric absolute uncertainty of the mean value of all outputs is smaller than abs_precision. Defaults to 1e-9.

max_time (double) – The Monte Carlo integration is stopped when the time max_time has passed. Defaults to inf.

max_samples (double) – The Monte Carlo integration is stopped when the number of evaluated samples equals or exceeds the given value. Defaults to 1e6.

min_dist (double) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where sampling with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected. Defaults to 0.

Returns:

A struct with the field names

mean:

Expectation values \(\mathbf{m}=\mathbb{E}[\mathbf{g}(\mathbf{x})]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

variance:

Variance \(\mathbf{v}=\mathbb{E}[(\mathbf{g}(\mathbf{x})-\mathbf{m})^2]\) of the object function \(\mathbf{g}(\mathbf{x})\) under the parameter distribution

quantiles:

A list of quantiles of shape (num_quantiles, num_outputs).

num_samples:

Number of sampling points \(N\) that were used in the Monte Carlo integration. The numerical uncertainty of the computed mean value is \(\sqrt{v/N}\).

Return type:

struct

historic_parameter_values

Driver.historic_parameter_values(path)
Get the values of an internal parameter for each iteration of the study. Example:
min_objective_values = driver.historic_parameter_values(...
    "acquisition_function.min_objective");
Parameters:

path (str) – A dot-separated path to the parameter.

Returns:

Array of parameter values

Return type:

array

Note

A description of the meaning of each parameter can be retrieved by describe().

min_objective

Minimizer.min_objective()
Minimal objective value found during the minimization. Example:
min_objective = driver.min_objective;
Return type:

double

optimize_hyperparameters

SingleGPSingleVariable.optimize_hyperparameters(varargin)
Optimize the hyperparameters of the driver. This is usually done automatically. Example:
driver.optimize_hyperparameters();
Parameters:

num_samples (str) – Number of initial start samples for optimization (default: automatic determination)

min_dist (double) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where sampling with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected. Defaults to 0.

override_parameter

SpecializedActiveLearning.override_parameter(path, value)
Override an internal parameter of the driver that is otherwise selected automatically. Example:
driver.override_parameter( ...
    "surrogates.0.matrix.kernel.design_length_scales", ...
    [1.0, 2.0]
);
Parameters:

path (str) – A dot-separated path to the parameter to be overridden.

value (Any) – The new value for the parameter.

Note

A description of the meaning of each parameter can be retrieved by Driver.describe().

predict

BayesianReconstruction.predict(points, varargin)
Make predictions on the multi-output surrogate or the negative log-probability objective. Example:
prediction = driver.predict([1,0,0; 2,0,1]);
Parameters:

points (array) – Vectors of the space (design space + environment) of shape (num_points, num_dim)

object_type (str) – The type of object for which a prediction is required. Optional are “surrogate”, “objective”. Defaults to “objective”.

name (str) – The name of the object for which predictions are required. Defaults to “objective”.

output_type (str) –
The type of output. Defaults to “mean_var”. Options are:

mean_var:

Mean and variance of the posterior distribution. This describes the posterior distribution only for normally distributed posteriors. The function returns a dictionary with keywords "mean" and "variance" mapping to array of length num_points. If the posterior distribution is multivariate normal, for each point a covariance matrix of shape (output_dim, output_dim) is returned, otherwise a list of variances of shape (output_dim,) is returned.

quantiles:

A list of quantiles of the distribution is estimated based on samples drawn from the distribution. The function returns a dict with entry "quantiles" and a tensor of shape (num_quantiles, num_points, output_dim)

samples:

Random samples drawn from the posterior distribution. The function returns a dict with the entry "samples" and a tensor of shape (num_samples, num_points, output_dim)

min_dist (double) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where sampling with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected. Defaults to 0.

quantiles (array) – A list with quantiles. If not specified, the quantiles [0.16,0.5,0.84] are used.

num_samples (int) – Number of samples used for posteriors that have a sampling-based distribution or if output_type is “samples”. If not specified, the same number as in the acquisition is used. If the posterior is described by a fixed number of ensemble points, the minimum of num_samples and the ensemble size is used.

Returns:

A struct with the field names “mean” and “variance” if
output_type = "mean_var" and “samples” if output_type = "samples"

Return type:

struct

run_mcmc

BayesianReconstruction.run_mcmc(varargin)
Runs a Markov Chain Monte Carlo (MCMC) sampling of probability density. By default the prior probability density of the parameters is a uniform distribution in the whole parameter domain. Other parameter distributions can be defined via study.configure('parameter_distribution', {...}). Example:
distribution = {...
    struct('type', 'normal', 'parameter', 'x1', 'mean', 0.0, 'stddev', 0.01), ...
    struct('type', 'uniform', 'parameter', 'x2', 'domain', [-1.0, 1.0])...
};
study.configure('parameter_distribution', distribution);
mcmc_result = study.driver.run_mcmc()
The estimated error of a Monte-Carlo integration of some function \(f\) is \(\delta = \sigma / \sqrt{N_{\rm ind}}\) where \(\sigma^2\) is the variance of \(f\) and \(N_{\rm ind}\) is the number of independent samples from the probability distribution. The error relative to the variance \(\sigma^2\) is therefore \(\delta_{\rm rel}=1/\sqrt{N_{\rm ind}}\). Assuming, that subsequent samples of a chain are correlated and this correlation vanishes at a correlation time \(\tau\), \(N_{\rm ind} = N/\tau\) of all \(N\) MCMC samples are independent. Note, that \(\tau\) can only be estimated and thus the relative Monte-Carlo integration error \(\delta_{\rm rel}=\tau/\sqrt{N}\) can be under or overestimated.

Parameters:

num_walkers (int) – Number of walkers. If not specified, the value is automatically chosen.

max_iter (int) – Maximum absolute chain length. Default is 50000.

max_time (double) – Maximum run time in seconds. If not specified, the runtime is not limited. Default is inf.

rel_error (double) – Targeted relative Monte-Carlo integration error \(\delta_{\rm rel}\) of the samples. Default is 0.05.

thin_chains (logical) – If true, only every \(\tau\)-th sample of all MCMC samples is returned. This is helpful if the full number of samples gets too large and a representative uncorrelated subset is required. Default is false.

multi_modal (logical) – If true, a more explorative sampling strategy is used. Default is false.

append (logical) – If true, the samples are appended to the samples of the previous MCMC run. Default is false.

max_sigma_dist (double) – If set, the sampling is restricted to a a distance max_sigma_dist * sigma to the maximum likelihood estimate. E.g. max_sigma_dist=3.0 means that only the 99.7% probability region of each parameter is sampled. Defaults to inf.

min_dist (double) – In order to speed up the prediction, one can use a sparsified version of the base surrogates where sampling with a distance smaller than min_dist (in terms of the length scales of the surrogate) are neglected. Defaults to 0.

Returns:

A struct with the following field names:

samples:

The drawn samples without “burn-in” samples thinned by half of the correlation time.

medians:

The medians of all random parameters

lower_uncertainties:

The distances between the medians and the 16% quantile of all random parameters

upper_uncertainties:

The distance between the medians and the 84% quantile of all random parameters

tau:

Estimated correlation time of each parameter.

Return type:

struct

Configuration

The configuration parameters can be set by calling, e.g.

study.configure('example_parameter1',[1,2,3], 'example_parameter2',true);

This driver requires a definition of a target_vector \([t_1,\cdots, t_K]\) of measurements to find a solution that maximizes the posterior probability density.

The noise variance of the measurements can be defined by either a parametrized error model or a constant uncertainty_vector or more generally a covariance_matrix between the \(K\) target-vector entries.

The prior density for the recontruction parameters can be defined in the parameter_distribution.

max_iter (int)

Maximum number of evaluations of the studied system.

Default: Infinite number of evaluations.

max_time (float)

Maximum run time of study in seconds. The time is counted from the moment, that the parameter is set or reset.

Default: inf

num_parallel (int)

Number of parallel evaluations of the studied system.

Default: 1

target_vector (cell{float})

Vector of targets \(t_i\) for \(i=1,\dots, K\).

Default: {0.0}

uncertainty_vector (cell{float})

Vector of target uncertainties \(\eta_i\) such that \(\chi^2 = \sum_{i=1}^K \frac{(t_i - y_i)^2}{\eta_i^2}\).

Default: vector of ones.

covariance_matrix (cell{cell{float}})

Covariance matrix \(G\) such that \(\chi^2 = \sum_{i,j=1}^K (t_i - y_i) G_{i j}^{-1} (t_j - y_j).\).

Default: diagonal identity matrix.

parameter_distribution (struct)

Probability distribution of design and environment parameters.

Default: struct('include_study_constraints', false, 'distributions', {}, 'constraints', {})

Probability distribution of design and environment parameters defined by distribution functions and constraints. The definition of the parameter distribution can have several effects:

In a call to the method get_statistics of the driver interface the value of interest is averaged over samples drawn from the space distribution.

In a call to the method run_mcmc of the driver interface the space distribution acts as a prior distribution.

In a call to the method get_sobol_indices of the driver interface the space distribution acts as a weighting factor for determining expectation values.

In an ActiveLearning driver, one can access the value of the log-probability density (up to an additive constant) by the name 'log_prob' in any expression, e.g. in Expression variable, Linear combination variable.

See parameter_distribution configuration for details.

scaling (float)

Scaling parameter of the model uncertainty. For scaling \(\gg 1.0\) (e.g. scaling=10.0) the search is more explorative. For scaling \(\ll 1.0\) (e.g. scaling=0.1) the search becomes more greedy (e.g. any local minimum is intensively exploited).

Default: 1.0

optimization_step_min (int)

Minimum number of observations of the objective before the hyperparameters are optimized.

Default: Automatic choice according to number of dimensions.

Note

Each derivative observation is counting as an independent observation.

optimization_step_max (int)

Maximum number of observations of the objective after which no more hyperparameter optimization is performed.

Default: 300

Note

Each derivative observation is counting as an independent observation.

min_optimization_interval (int)

Minimum number of observations of the objective after which the hyperparameters are optimized.

Default: 2

Note

Each derivative observation is counting as an independent observation.

max_optimization_interval (int)

Maximum number of observations of the objective after which the hyperparameters are optimized.

Default: 20

Note

Each derivative observation is counting as an independent observation.

optimization_level (float)

Controls how often the hyper-parameters are optimized. Small values (e.g. 0.01) lead to more frequent optimizations. Large values (e.g. 1) lead to less frequent optimizations.

Default: 0.2

approximate (bool)

If true, the generalized chi-squared variable with different uncertainties of the predictions of each channel K is approximated by a chi-squared variable with averaged uncertainties. This allows to analytically compute probability densities and any acquisition function that is directly based on the variable.

Default: true

effective_DOF (float)

Number of effective degrees of freedom (DOF) used for stochastic variable of the chi-squared distribution. This number roughly indicates how many output channels of the forward model are statistically independent.

Default: If not specified, the value us determined automatically.

Note

If approximate is false, this parameter has no effect.

effective_DOF_bounds (cell{float})

The number of effective degrees of freedom (DOF) is determined by a maximum likelihood estimate within the given lower and upper bounds.

Default: {20.0,50.0}

Note

If approximate is false or the effective_DOF is set manually, this parameter has no effect.

min_val (float)

The minimization of the objective is stopped when the observed objective value is below the specified minimum value.

Default: -inf

min_uncertainty (float)

The study is stopped if the uncertainty (square root of variance) of the objective at the last 5 sampling points was below min_uncertainty.

Default: 0.0

num_training_samples (int)

Number of pseudo-random initial samples before the samples are drawn according to the acquisition function.

Default: Automatic choice depending depending on dimensionality of design space.

min_PoI (float)

The study is stopped if the maximum probability of improvement (PoI) of the last 5 iterations is below min_PoI.

Default: 0.0

adaptive_local_search (bool)

The maximization of the acquisition function consists of a global heuristic search followed by a local convergence of the best samples. By default, the local search effort adapts to the value of max_num_model_evals. In some cases far better samples are computed, if the local search is only stopped when convergence to a local minimum was obtained.

Default: false

max_num_model_evals (int)

Maximum number of evaluations of the surrogates for finding the maximum of the acquisition function.

Default: 1500

num_iterative_solver_steps (int)

Gaussian process regression requires to solve systems of linear equations. In order to improve the accuracy of direct solution methods, an iterative refinement can be performed with the given number of iterations.

Default: 5

error_model (struct)

Note

If an error model is specified, it overrides setting for the target_vector or covariance_matrix of the measurement uncertainties. The error model parameters \(\mathbf{d}\) are then always fit to maximize the posterior probability for any given design point \(\mathbf{p}\)

\[P(\mathbf{d} | \mathbf{p}, \mathbf{t}) = \prod_{i=1}^K \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[ -\frac{1}{2}\left( \frac{t_i - \mu_i(\mathbf{p})}{\eta_i(\mathbf{d})} \right)^2 \right] + P_\text{prior}(\mathbf{d}).\]

Here, \(\mu_i(\mathbf{p})\) is the predicted mean of the input to the variable and \(P_\text{prior}(\mathbf{d})\) is the prior probability density for the error model parameters.

Default: struct('expression', "RSE*10^(log_err)", 'distributions', {struct('type', "normal", 'parameter', "log_err", 'mean', 0, 'stddev', 1)}, 'xtol', 1e-05) The parameter is optional and may also have the value None.

An analytic model for the error of the target vector entries. The model \(\eta_i({\rm RSE}, y_i, \epsilon_i)\) for each entry \(i=1,\dots,K\) of the target vector can depend on the residual standard error (RSE), the model prediction \(y_i\) and the corresponding initially assumed measurement error \(\epsilon_i\).

The RSE is defined as \(\sqrt{\chi_\text{min}^2/\mathrm{DOF}}\) where the number of degrees of freedom (DOF) is defined as the difference between the target-vector dimension \(K\) and the number parameters to reconstruct (the dimension of the design space).

Moreover, \(M\) other random parameters \(\mathbf{d}\) can be defined in the parameter distribution and initial_parameters.

The log-probability density of the random parameters \(\mathbf{d}\) and the design space parameters \(p\) is given as (up to constant additive terms)

\[\log[P(\mathbf{p}, \mathbf{d})] = -\frac{1}{2} \cdot \sum_i\left(\frac{(t_i-y_i(\mathbf{p}))^2}{\eta_i^2(\mathbf{d})} + \log[\eta_i^2(\mathbf{d})]\right) + \log[P_\text{prior}(\mathbf{p})] + \log[P_\text{prior}(\mathbf{d})].\]

See error_model configuration for details.