Bayesian parameter reconstruction using Bayesian optimization

Driver: BayesianReconstruction

Download script: bayesian_reconstruction.m

The target of this study is to showcase the solution of a Bayesian parameter reconstruction problem based on a set of measurements \(\mathbf{t} \in \mathbb{R}^K\).

We assume that the measurement process can be accurately modeled by a function \(\mathbf{f}(\mathbf{b}) \in \mathbb{R}^K\). As an example, we consider the MGH17 problem, which consists of fitting a vectorial model \(\mathbf{f}(\mathbf{b}) \in \mathbb{R}^{33}\) with

\[f_i(\mathbf{b}) = b_1 + b_2 \exp(-i \cdot b_4) + b_3 \exp(-i \cdot b_5),\, i = 0, \dots, 32\]

to a measurement vector with \(K = 33\) entries.

Due to random measurement noise \(\mathbf{w} \sim \mathcal{N}\left(0, \operatorname{diag}(\eta_1^2, \dots, \eta_K^2)\right)\), the vector of measurements \(\mathbf{t} = \mathbf{f}(\mathbf{b}) + \mathbf{w}\) is a random vector with probability density

\[P(\mathbf{t}) = \prod_{i=1}^K \frac{1}{\sqrt{2\pi}\eta_i}\exp\left[-\frac{1}{2}\left(\frac{t_i - f_i(\mathbf{b})}{\eta_i}\right)^2\right].\]

The noise variances \(\eta_i^2\) are assumed to be unknown and must be estimated from the measurement data. To this end, we assume that the variance is composed of a constant background term \(c_1^2\) and a noise contribution that scales linearly with \(f_i(\mathbf{b})\), i.e.

\[\eta_i^2(\mathbf{c}) = c_1^2 + \left[c_2 f_i(\mathbf{b})\right]^2.\]
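For example, with the error model parameter values used later in the script (\(c_1 = 0.005\), \(c_2 = 0.01\)) and a model value of \(f_i(\mathbf{b}) = 2\) (a value chosen purely for illustration), the noise level is dominated by the signal-dependent term:

c1 = 0.005; c2 = 0.01;           % values used in the script below
fi = 2.0;                        % illustrative model value, not from the data
eta_i = sqrt(c1^2 + (c2*fi)^2)   % approx. 0.0206, dominated by (c2*fi)^2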

Taking non-uniform prior distributions \(P_\text{prior}(\mathbf{b})\) for the design parameter vector and \(P_\text{prior}(\mathbf{c})\) for the error model parameters into account, the posterior distribution satisfies \(P(\mathbf{b}, \mathbf{c} \mid \mathbf{t}) \propto P(\mathbf{t} \mid \mathbf{b}, \mathbf{c})\, P_\text{prior}(\mathbf{b})\, P_\text{prior}(\mathbf{c})\).

Altogether, finding the parameters with maximum posterior probability density is equivalent to minimizing the negative log-probability

\[\begin{split} -\log\left(P(\mathbf{b}, \mathbf{c} \mid \mathbf{t})\right) = & \frac{1}{2} K\log(2\pi) + \sum_{i=1}^K \log\left(\eta_i(\mathbf{c})\right) + \frac{1}{2}\sum_{i=1}^K \left( \frac{t_i - f_i(\mathbf{b})}{\eta_i(\mathbf{c})} \right)^2 \\ & - \log\left(P_\text{prior}(\mathbf{b})\right) - \log\left(P_\text{prior}(\mathbf{c})\right). \end{split}\]
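To make the objective concrete, the following minimal MATLAB sketch evaluates this negative log-probability for given parameters. It reuses the model and error_model functions defined in the script below; the function name neg_log_prob and the prior handles are our own illustration, not part of the driver's API.

% Minimal sketch of the analytic objective (the name neg_log_prob and the
% prior handles log_prior_b, log_prior_c are illustrative, not JCMsuite API).
function nlp = neg_log_prob(b, log_c, t, log_prior_b, log_prior_c)
    f = model(b);                              % forward model values f_i(b)
    eta = error_model(log_c(1), log_c(2), f);  % noise stddevs eta_i(c)
    K = numel(t);
    nlp = 0.5*K*log(2*pi) + sum(log(eta)) + 0.5*sum(((t - f)./eta).^2) ...
          - log_prior_b(b) - log_prior_c(log_c);
end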

In the following, we show how this negative log-probability can be minimized using the BayesianReconstruction driver. We compare the results of the driver's MCMC sampling of the posterior probability to an MCMC sampling based on the analytic model function.

jcm_optimizer_path = '<JCM_OPTIMIZER_PATH>';
addpath(fullfile(jcm_optimizer_path, 'interface', 'matlab'));

server = jcmoptimizer.Server();
client = jcmoptimizer.Client(server.host);

% Definition of the search domain
design_space = { ...
  struct('name', 'b1', 'type', 'continuous', 'domain', [0, 10]), ...
  struct('name', 'b2', 'type', 'continuous', 'domain', [0.1, 4]), ...
  struct('name', 'b3', 'type', 'continuous', 'domain', [-4, -0.1]), ...
  struct('name', 'b4', 'type', 'continuous', 'domain', [0.05, 1]), ...
  struct('name', 'b5', 'type', 'continuous', 'domain', [0.05, 1]) ...
};

% Creation of the study object with study_id 'bayesian_reconstruction'
study = client.create_study( ...
    'design_space', design_space, ...
    'driver', 'BayesianReconstruction', ...
    'name', 'Bayesian parameter reconstruction using Bayesian optimization', ...
    'study_id', 'bayesian_reconstruction');

% The vectorial model function of the MGH17 problem
function val = model(x)
    s = 0:32;
    val = x(1) + x(2)*exp(-s.*x(4)) + x(3)*exp(-s.*x(5));
end

% The forward model parameters b to be reconstructed
b_true = [3.7541005211E-01, 1.9358469127E+00, -1.4646871366E+00, ...
          1.2867534640E-01, 2.2122699662E-01];

% The error model parameters in log-space to be reconstructed
log_c1 = log(0.005);
log_c2 = log(0.01);

% The error model, i.e. the noise stddev depending on the model value y = f(b)
function stddev = error_model(log_c1, log_c2, y)
    stddev = sqrt(exp(log_c1)^2 + (exp(log_c2).*y).^2);
end

% Generate a random target vector of measurements
rng("default");
model_vector = model(b_true);
err = error_model(log_c1, log_c2, model_vector);
measurement_vector = model_vector + err.*randn(size(model_vector));

% Configuration of the error model
err_model_conf = struct();
% Error model expression
err_model_conf.expression = 'sqrt(exp(log_c1)^2 + (exp(log_c2)*y_model)^2)';
% Prior distributions of the error model parameters
err_model_conf.distributions = { ...
   struct('type', 'normal', 'parameter', 'log_c1', 'mean', -5.0, 'stddev', 1), ...
   struct('type', 'normal', 'parameter', 'log_c2', 'mean', -4.0, 'stddev', 1) ...
};
% Initial values and parameter bounds for fitting the error model parameters
err_model_conf.initial_parameters = [-5.0, -4.0];
err_model_conf.parameter_bounds = [-7, -2; -6.0, -1];

% Multivariate normal prior distribution of the forward model parameters.
% Unspecified parameters (b1, b4, b5) are uniformly distributed in the design space.
distribution = struct();
distribution.type = 'mvn';
distribution.parameters = {'b2', 'b3'};
distribution.mean = [2.25, -2.0];
distribution.covariance = [0.5, -0.01; -0.01, 0.5];
parameter_distribution = struct();
parameter_distribution.distributions = {distribution};

study.configure( ...
    'max_iter', 80, ...
    'target_vector', measurement_vector, ...
    'error_model', err_model_conf, ...
    'parameter_distribution', parameter_distribution ...
);

% Evaluation of the black-box function for specified design parameters
function observation = evaluate(study, sample)
    observation = study.new_observation();
    % Array of design values
    x = [sample.b1, sample.b2, sample.b3, sample.b4, sample.b5];
    observation.add(model(x));
end

% Run the minimization
study.set_evaluator(@evaluate);
study.run();

best_b_sample = study.driver.best_sample;
min_neg_log_prob = study.driver.min_objective;

% Determine the sample [b1, b2, b3, b4, b5, log_c1, log_c2] that minimizes
% the negative log-probability
minimum = struct2array(best_b_sample);

% Path to the negative log-probability variable
path = 'driver.acquisition_function.main_objective.variable';
neg_log_probs = study.historic_parameter_values([path, '.observed_value']);
[min_val, idx] = min(neg_log_probs);

logs_c1 = study.historic_parameter_values([path, '.error_model_parameters.log_c1']);
logs_c2 = study.historic_parameter_values([path, '.error_model_parameters.log_c2']);

minimum = [minimum, logs_c1(idx), logs_c2(idx)];

% Before running a Markov-chain Monte-Carlo (MCMC) sampling, we converge the
% surrogate models by sampling around the minimum. To make the study more
% explorative, the scaling parameter is increased and the effective degrees of
% freedom parameter is set to one.
study.configure( ...
    'scaling', 10.0, ...
    'effective_DOF', 1.0, ...
    'min_uncertainty', abs(min_neg_log_prob)*1e-8, ...
    'max_iter', 120 ...
);
while ~study.is_done
   sug = study.get_suggestion();
   obs = evaluate(study, sug.sample);
   study.add_observation(obs, sug.id);
end

% Run the MCMC sampling with 32 walkers
num_walkers = 32; max_iter = 10000;
mcmc_result = study.driver.run_mcmc( ...
    'rel_error', 0.01, ...
    'num_walkers', num_walkers, ...
    'max_iter', max_iter ...
);
% This MATLAB code does not contain a detailed analysis of the MCMC sampling by
% corner plots. Please see the corresponding Python tutorial for a code example.
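Short of corner plots, the returned mcmc_result can still be summarized numerically, e.g. by per-parameter means and standard deviations. The field name samples below is an assumption about the returned structure (a matrix with one row per sample and one column per parameter); consult the driver documentation for the actual layout.

% Hypothetical post-processing sketch; the field 'samples' is an assumed
% name for the MCMC chain, not a documented part of the API.
samples = mcmc_result.samples;   % assumed [num_samples x 7] matrix
names = {'b1', 'b2', 'b3', 'b4', 'b5', 'log_c1', 'log_c2'};
for k = 1:numel(names)
    fprintf('%-7s: mean = %9.5f, stddev = %9.5f\n', ...
            names{k}, mean(samples(:,k)), std(samples(:,k)));
end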
Figure: MCMC sampling based on the analytic model. Markov-chain Monte-Carlo (MCMC) sampling of the probability density of the model parameters \(b_1,\dots,b_5\) and the log values of the error model parameters \(c_1, c_2\) based on the analytic model function.

Figure: MCMC sampling based on the surrogate. Markov-chain Monte-Carlo (MCMC) sampling of the probability density of the model parameters \(b_1,\dots,b_5\) and the log values of the error model parameters \(c_1, c_2\) based on the trained surrogate of the study. A comparison between the analytic and the surrogate model functions shows good quantitative agreement.