.. _BayesianReconstruction:

================================================
BayesianReconstruction
================================================

Contents
=========

:`Purpose`_: The purpose of the driver.
:`Tutorials`_: Tutorials demonstrating the application of this driver.
:`Driver Interface`_: Driver-specific methods of the Matlab interface.
:`Configuration`_: Configuration of the driver.                   


Purpose
=======


The driver searches for the model parameter with the largest posterior
probability given a set of measurements :math:`\mathbf{t} \in \mathbb{R}^K`.

We assume that the measurement process can be accurately modeled by a
function :math:`\mathbf{f}(\mathbf{p}) \in \mathbb{R}^K`. That is, the
vector of measurements :math:`\mathbf{t}` is a random vector with

.. math::
  \mathbf{t} = \mathbf{f}(\mathbf{p}) + \mathbf{w}

with :math:`\mathbf{w} \sim \mathcal{N}(0, \mathbf{G})`, where
:math:`\mathbf{G}` is a covariance matrix of the measurement
errors. In most cases the covariance matrix is diagonal (i.e. the
measurement errors are uncorrelated) with diagonal entries
:math:`G_{ii} = \eta_i^2`.

Sometimes, the measurement noise is known. However, generally one has
to find a parameterized error model :math:`\eta_i(\mathbf{d})` for the
variance itself. A common choice is to assume that the error is
composed of a background term :math:`b` and a noise contribution which
scales linearly with :math:`f_i(\mathbf{p})`:

.. math::

   \eta_i^2(a,b) = b^2 + \left[a f_i(\mathbf{p})\right]^2
	   
Since every entry of the measurement vector follows a normal
distribution :math:`t_i \sim
\mathcal{N}(f_i(\mathbf{p}),\eta_i(\mathbf{d}))` the joint
*likelihood* of measuring the vector :math:`\mathbf{t}` is given as

.. math::
   P(\mathbf{t} | \mathbf{p}, \mathbf{d}) = \prod_{i=1}^K
   \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[-\frac{1}{2}\left(\frac{t_i -
   f_i(\mathbf{p})}{\eta_i(\mathbf{d})}\right)^2\right].

Sometimes, non-uniform *prior distributions* for the design parameter
vector :math:`P_\text{prior}(\mathbf{p})` and the error model
parameters :math:`P_\text{prior}(\mathbf{d})` are available. The
*posterior distribution* is then proportional to

.. math::
	   
   P(\mathbf{p}, \mathbf{d} | \mathbf{t}) \propto P(\mathbf{t}
   | \mathbf{p}, \mathbf{d})
   P_\text{prior}(\mathbf{p}) P_\text{prior}(\mathbf{d})
	   
.. warning::
   If a :ref:`parameter distribution<BayesianReconstruction.parameter_distribution>`
   for the design space is defined, make sure
   that it has a non-vanishing probability distribution within the
   boundaries of the design space. Otherwise the negative log-probability can
   be infinite. In this case the sample computation is numerically unstable.

Alltogether, the target of finding the parameters with maximum
posterior probability density is equivalent of minimizing the value of
the negative log-probability

.. math::
   \begin{split}
   -\log\left(P(\mathbf{p}, \mathbf{d}| \mathbf{t})\right) = &
   \frac{1}{2} K\log(2\pi)
   +\sum_{i=1}^K\log\left(\eta_i(\mathbf{d})\right) 
   +\frac{1}{2}\sum_{i=1}^K \left(
      \frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})}
   \right)^2 \\
   &-\log\left(P_\text{prior}(\mathbf{d})\right)
   -\log\left(P_\text{prior}(\mathbf{p})\right).
           \end{split}

For reconstruction problems with uniform priors and known measurement uncertainty
the :ref:`BayesianLeastSquares` driver is recommended. If moreover the vectorial
forward model is inexpensive or has a known Jacobian, the :ref:`ScipyLeastSquares`
driver is recommended.




Tutorials
=========
.. toctree::

   ../tutorials/bayesian_reconstruction



Driver Interface
================

The driver instance can be obtained by :attr:`.Study.driver`.

.. currentmodule:: jcmoptimizer
.. autoclass:: BayesianReconstruction
   
active_learning_configuration
"""""""""""""""""""""""""""""""""""
   .. automethod:: SpecializedActiveLearning.active_learning_configuration
      :noindex:
		     
   
best_sample
"""""""""""""""""""""""""""""""""""
   .. automethod:: Minimizer.best_sample
      :noindex:
		     
   
describe
"""""""""""""""""""""""""""""""""""
   .. automethod:: Driver.describe
      :noindex:
		     
   
get_minima
"""""""""""""""""""""""""""""""""""
   .. automethod:: SingleGPSingleVariable.get_minima
      :noindex:
		     
   
get_observed_values
"""""""""""""""""""""""""""""""""""
   .. automethod:: SingleGPSingleVariable.get_observed_values
      :noindex:
		     
   
get_state
"""""""""""""""""""""""""""""""""""
   .. automethod:: Driver.get_state
      :noindex:
		     
   
get_statistics
"""""""""""""""""""""""""""""""""""
   .. automethod:: BayesianReconstruction.get_statistics
      
		     
   
historic_parameter_values
"""""""""""""""""""""""""""""""""""
   .. automethod:: Driver.historic_parameter_values
      :noindex:
		     
   
min_objective
"""""""""""""""""""""""""""""""""""
   .. automethod:: Minimizer.min_objective
      :noindex:
		     
   
optimize_hyperparameters
"""""""""""""""""""""""""""""""""""
   .. automethod:: SingleGPSingleVariable.optimize_hyperparameters
      :noindex:
		     
   
override_parameter
"""""""""""""""""""""""""""""""""""
   .. automethod:: SpecializedActiveLearning.override_parameter
      :noindex:
		     
   
predict
"""""""""""""""""""""""""""""""""""
   .. automethod:: BayesianReconstruction.predict
      
		     
   
run_mcmc
"""""""""""""""""""""""""""""""""""
   .. automethod:: BayesianReconstruction.run_mcmc
      
		     
   

   



Configuration
=============
The configuration parameters can be set by calling, e.g.


.. code-block:: matlab

   study.configure('example_parameter1',[1,2,3], 'example_parameter2',true);


This driver requires a definition of a :ref:`target_vector<BayesianReconstruction.target_vector>` :math:`[t_1,\cdots, t_K]` of measurements to find a solution that maximizes the posterior probability density.   

The noise variance of the measurements can be defined by either a parametrized :ref:`error model<BayesianReconstruction.error_model>` or a constant :ref:`uncertainty_vector<BayesianReconstruction.uncertainty_vector>` or more generally a :ref:`covariance_matrix<BayesianReconstruction.covariance_matrix>` between the :math:`K` target-vector entries.  

The prior density for the recontruction parameters can be defined in the :ref:`parameter_distribution<BayesianReconstruction.parameter_distribution>`.

.. _BayesianReconstruction.max_iter:


max_iter (int)
""""""""""""""
   Maximum number of evaluations of the studied system. 

   Default: Infinite number of evaluations.

.. _BayesianReconstruction.max_time:


max_time (float)
""""""""""""""""
   Maximum run time of study in seconds.  The time is counted from the moment, that the parameter is set or reset. 

   Default: ``inf``

.. _BayesianReconstruction.num_parallel:


num_parallel (int)
""""""""""""""""""
   Number of parallel evaluations of the studied system. 

   Default: ``1``

.. _BayesianReconstruction.target_vector:


target_vector (cell{float})
"""""""""""""""""""""""""""
   Vector of targets :math:`t_i` for :math:`i=1,\dots, K`. 

   Default: ``{0.0}``

.. _BayesianReconstruction.uncertainty_vector:


uncertainty_vector (cell{float})
""""""""""""""""""""""""""""""""
   Vector of target uncertainties  :math:`\eta_i` such that :math:`\chi^2 = \sum_{i=1}^K \frac{(t_i - y_i)^2}{\eta_i^2}`. 

   Default: vector of ones.

.. _BayesianReconstruction.covariance_matrix:


covariance_matrix (cell{cell{float}})
"""""""""""""""""""""""""""""""""""""
   Covariance matrix :math:`G` such that :math:`\chi^2 = \sum_{i,j=1}^K (t_i - y_i) G_{i j}^{-1} (t_j - y_j).`. 

   Default: diagonal identity matrix.

.. _BayesianReconstruction.parameter_distribution:


parameter_distribution (struct)
"""""""""""""""""""""""""""""""
   Probability distribution of design and environment parameters. 

   Default: ``struct('include_study_constraints', false, 'distributions', {}, 'constraints', {})``
   

   Probability distribution of design and environment parameters defined by distribution functions and constraints. The definition of the parameter distribution can have several effects:  
   
   * In a call to the method ``get_statistics`` of the driver interface     the value of interest is averaged over samples drawn from the     space distribution.  
   
   * In a call to the method ``run_mcmc`` of the driver interface     the space distribution acts as a prior distribution.  
   
   * In a call to the method ``get_sobol_indices`` of the driver interface     the space distribution acts as a weighting factor for determining     expectation values.  
   
   * In an :ref:`ActiveLearning` driver, one can access the value of the     log-probability density (up to an additive constant) by the name     ``'log_prob'`` in any expression, e.g. in     :ref:`ExpressionVariable`,     :ref:`LinearCombinationVariable`.   
   
   See :ref:`parameter_distribution configuration <SpaceDistribution3>` for details.

.. toctree::
   :maxdepth: 100 
   :hidden:
   
   SpaceDistribution3



.. _BayesianReconstruction.scaling:


scaling (float)
"""""""""""""""
   Scaling parameter of the model uncertainty.  For scaling :math:`\gg 1.0` (e.g. ``scaling=10.0``) the search is more explorative. For scaling :math:`\ll 1.0` (e.g. ``scaling=0.1``) the search becomes more greedy (e.g. any local minimum is intensively exploited). 

   Default: ``1.0``

.. _BayesianReconstruction.optimization_step_min:


optimization_step_min (int)
"""""""""""""""""""""""""""
   Minimum number of observations of the objective   before the hyperparameters are optimized. 

   Default: Automatic choice according to number of dimensions.

   .. note:: Each derivative observation is counting as an independent observation.

.. _BayesianReconstruction.optimization_step_max:


optimization_step_max (int)
"""""""""""""""""""""""""""
   Maximum number of observations of the objective  after which no more hyperparameter optimization is performed. 

   Default: ``300``

   .. note:: Each derivative observation is counting as an independent  observation.

.. _BayesianReconstruction.min_optimization_interval:


min_optimization_interval (int)
"""""""""""""""""""""""""""""""
   Minimum number of observations of the objective  after which the hyperparameters are optimized. 

   Default: ``2``

   .. note:: Each derivative observation is counting as an independent observation.

.. _BayesianReconstruction.max_optimization_interval:


max_optimization_interval (int)
"""""""""""""""""""""""""""""""
   Maximum number of observations of the objective  after which the hyperparameters are optimized. 

   Default: ``20``

   .. note:: Each derivative observation is counting as an independent observation.

.. _BayesianReconstruction.optimization_level:


optimization_level (float)
""""""""""""""""""""""""""
   Controls how often the hyper-parameters are optimized. Small values (e.g. 0.01) lead to more frequent optimizations.  Large values (e.g. 1) lead to less frequent optimizations. 

   Default: ``0.2``

.. _BayesianReconstruction.approximate:


approximate (bool)
""""""""""""""""""
   If true, the generalized chi-squared  variable with different uncertainties of the predictions  of each channel `K` is approximated by a chi-squared  variable with averaged uncertainties. This allows to analytically compute probability densities and any acquisition function that is directly based  on the variable. 

   Default: ``true``

.. _BayesianReconstruction.effective_DOF:


effective_DOF (float)
"""""""""""""""""""""
   Number of effective degrees of freedom (DOF)  used for stochastic variable of the chi-squared distribution. This number roughly indicates how many output channels of the  forward model are statistically independent. 

   Default: If not specified, the value us determined automatically.

   .. note:: If ``approximate`` is false, this parameter has no  effect.

.. _BayesianReconstruction.effective_DOF_bounds:


effective_DOF_bounds (cell{float})
""""""""""""""""""""""""""""""""""
   The number of effective degrees of freedom (DOF)  is determined by a maximum likelihood estimate within the given lower and upper bounds. 

   Default: ``{20.0,50.0}``

   .. note:: If ``approximate`` is false or the ``effective_DOF`` is set manually, this parameter has no effect.

.. _BayesianReconstruction.min_val:


min_val (float)
"""""""""""""""
   The minimization of the objective is stopped when the observed objective value is below the specified minimum value. 

   Default: ``-inf``

.. _BayesianReconstruction.min_uncertainty:


min_uncertainty (float)
"""""""""""""""""""""""
   The study is stopped if the uncertainty (square root of variance) of the objective at the last 5 sampling points was below ``min_uncertainty``. 

   Default: ``0.0``

.. _BayesianReconstruction.num_training_samples:


num_training_samples (int)
""""""""""""""""""""""""""
   Number of pseudo-random initial samples before the samples are drawn according to the acquisition function. 

   Default: Automatic choice depending depending on dimensionality of design space.

.. _BayesianReconstruction.min_PoI:


min_PoI (float)
"""""""""""""""
   The study is stopped if the maximum probability of improvement (PoI) of the last 5 iterations is below ``min_PoI``. 

   Default: ``0.0``

.. _BayesianReconstruction.adaptive_local_search:


adaptive_local_search (bool)
""""""""""""""""""""""""""""
   The maximization of the acquisition function consists of a global heuristic search followed by a local convergence of the best samples. By default, the local search effort adapts to the value of ``max_num_model_evals``. In some cases far better samples are computed, if the local search is only stopped when convergence to a local minimum was obtained. 

   Default: ``false``

.. _BayesianReconstruction.max_num_model_evals:


max_num_model_evals (int)
"""""""""""""""""""""""""
   Maximum number of evaluations of the surrogates for finding the maximum of the acquisition function. 

   Default: ``1500``

.. _BayesianReconstruction.num_iterative_solver_steps:


num_iterative_solver_steps (int)
""""""""""""""""""""""""""""""""
   Gaussian process regression requires to solve systems of linear equations. In order to improve the accuracy of direct solution methods, an iterative refinement can be performed with the given number of iterations. 

   Default: ``5``

.. _BayesianReconstruction.error_model:


error_model (struct)
""""""""""""""""""""
    
   
   .. note::  
   
      If an error model is specified,    it overrides setting for the    :ref:`target_vector<BayesianReconstruction.target_vector>` or    :ref:`covariance_matrix<BayesianReconstruction.covariance_matrix>` of the    measurement uncertainties. The error model parameters    :math:`\mathbf{d}` are then always fit to maximize the    posterior probability for any given design point    :math:`\mathbf{p}`             
   
      .. math::  
   
         P(\mathbf{d} | \mathbf{p}, \mathbf{t}) =       \prod_{i=1}^K \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[          -\frac{1}{2}\left(              \frac{t_i - \mu_i(\mathbf{p})}{\eta_i(\mathbf{d})}          \right)^2       \right] + P_\text{prior}(\mathbf{d}).     
   
      Here, :math:`\mu_i(\mathbf{p})` is the predicted mean of the input    to the variable and :math:`P_\text{prior}(\mathbf{d})` is the    prior probability density for the error model parameters. 

   Default: ``struct('expression', "RSE*10^(log_err)", 'distributions', {struct('type', "normal", 'parameter', "log_err", 'mean', 0, 'stddev', 1)}, 'xtol', 1e-05)``
   The parameter is optional and may also have the  value ``None``.

   An analytic model for the error of the target vector entries. The model :math:`\eta_i({\rm RSE}, y_i, \epsilon_i)` for each entry :math:`i=1,\dots,K` of the target vector can depend on the residual standard error (RSE), the model prediction :math:`y_i` and the corresponding initially assumed measurement error :math:`\epsilon_i`.   
   
   The RSE is defined as :math:`\sqrt{\chi_\text{min}^2/\mathrm{DOF}}` where the number of degrees of freedom (DOF) is defined as the difference between the target-vector dimension :math:`K` and the number parameters to reconstruct (the dimension of the design space).  
   
   Moreover, :math:`M` other random parameters :math:`\mathbf{d}` can be defined in the :ref:`parameter distribution<BayesianReconstruction.error_model.distributions>` and :ref:`initial_parameters<BayesianReconstruction.error_model.initial_parameters>`.  
   
   The log-probability density of the random parameters :math:`\mathbf{d}` and the design space parameters :math:`p` is given as (up to constant additive terms)  
   
   .. math::  
   
      \log[P(\mathbf{p}, \mathbf{d})] = -\frac{1}{2} \cdot    \sum_i\left(\frac{(t_i-y_i(\mathbf{p}))^2}{\eta_i^2(\mathbf{d})} +    \log[\eta_i^2(\mathbf{d})]\right) + \log[P_\text{prior}(\mathbf{p})] +    \log[P_\text{prior}(\mathbf{d})].   
   
   See :ref:`error_model configuration <ErrorModel1>` for details.

.. toctree::
   :maxdepth: 100 
   :hidden:
   
   ErrorModel1