.. _ActiveLearning:

================================================
ActiveLearning
================================================

Contents
=========

:`Purpose`_: The purpose of the driver.
:`Tutorials`_: Tutorials demonstrating the application of this driver.
:`Driver Interface`_: Driver-specific methods of the Python interface.
:`Configuration`_: Configuration of the driver.

Purpose
=======

The driver defines a general active learning process that can be configured for
many different purposes such as optimization, integration, or learning the
behavior of one or more expensive black-box functions.

.. tip:: The driver is very versatile but also requires a detailed
   configuration. For some use cases there exist specialized drivers that are
   based on this driver. For example, for the Bayesian optimization of a scalar
   function the :ref:`BayesianOptimization` driver is easier to set up.

Each black-box function
:math:`\mathbf{f}_{\rm bb}(\mathbf{p}_{\rm design}, \mathbf{p}_{\rm env})`
can depend on a vector of design parameters
:math:`\mathbf{p}_{\rm design}\in\mathcal{X}` and possibly environment
parameters :math:`\mathbf{p}_{\rm env}\in\mathcal{E}`. The design space
:math:`\mathcal{X}` and environment :math:`\mathcal{E}` can be configured when
creating the study (see :func:`~jcmoptimizer.Client.create_study`).

.. tip:: The output of the black-box functions can also be vectorial. For
   example, the black-box function
   :math:`\mathbf{f}_{\rm bb}: \mathbf{p}_{\rm design} \mapsto (a,b,c)\in\mathbb{R}^3`
   could map a design value to three output values. The goal of the study could
   be to find design parameters that minimize :math:`a^2 - b^2` and fulfill the
   constraint :math:`c \leq 1`.

The most important configuration of the driver concerns three parameters:

:Surrogates: The dependence of the output values on the parameters is learned
   via one or more surrogate models (see `ActiveLearning.surrogates`_
   parameter).
:Variables: The target values of the study are defined by variables that are
   based on the outputs of surrogate models or other variables (see
   `ActiveLearning.variables`_ parameter).
:Objectives: Finally, one defines objectives of the active learning, such as
   minimizing a variable or constraining a variable to a specified interval
   (see `ActiveLearning.objectives`_ parameter).

Tutorials
=========

.. toctree::

   ../tutorials/changing_environment
   ../tutorials/harmonic_oscillator_fit
   ../tutorials/benchmark
   ../tutorials/sensitivity_analysis
   ../tutorials/multi_objective_optimization
   ../tutorials/active_surrogate_training

Driver Interface
================

The driver instance can be obtained by :attr:`.Study.driver`.

.. currentmodule:: jcmoptimizer
.. autoclass:: ActiveLearning
   :members:
   :inherited-members:

Configuration
=============

The configuration parameters can be set by calling, e.g.

.. code-block:: python

    study.configure(example_parameter1 = [1,2,3], example_parameter2 = True)

.. _ActiveLearning.max_iter:

max_iter (int)
""""""""""""""
Maximum number of evaluations of the studied system.

Default: Infinite number of evaluations.

.. _ActiveLearning.max_time:

max_time (float)
""""""""""""""""
Maximum run time of the study in seconds. The time is counted from the moment
that the parameter is set or reset.

Default: ``inf``

.. _ActiveLearning.num_parallel:

num_parallel (int)
""""""""""""""""""
Number of parallel evaluations of the studied system.

Default: ``1``
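For instance, the three parameters above could be set in a single call,
assuming a ``study`` object has been created via
:func:`~jcmoptimizer.Client.create_study` (the values shown are purely
illustrative):

.. code-block:: python

    # Illustrative settings: stop after 200 evaluations or 2 hours and
    # allow 4 evaluations of the studied system to run in parallel.
    study.configure(
        max_iter=200,
        max_time=2 * 60 * 60,
        num_parallel=4,
    )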
.. _ActiveLearning.transformations:

transformations (list[dict])
""""""""""""""""""""""""""""
A list of transformations of the parameter space. A surrogate can learn the
mapping from the original or transformed parameter space to observation data
for the model.

Default: ``[]``

.. admonition:: Example

   A transformation that transforms the absolute parameters :math:`x_1, x_2`
   to relative parameters :math:`r = x_2-x_1, R=0.5(x_1 + x_2)`.

   .. code-block:: python

       [{'type': 'general',
         'name': 'trans1',
         'parameters': [
             {'type': 'expression', 'name': 'r',
              'expression': 'x2 - x1', 'bounds': [-2,2]},
             {'type': 'expression', 'name': 'R',
              'expression': '0.5*(x1 + x2)', 'bounds': [-2,2]}
         ]}]

Each element of the list must be a dict. In the following, the list elements
are described:

An input transformation allows one to transform a parameter space into another
parameter space. The transformation is defined by a list of parameters spanning
the new parameter space. See :ref:`trans configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   GeneralTransformedSpace

.. _ActiveLearning.surrogates:

surrogates (list[dict])
"""""""""""""""""""""""
A list of surrogate models used to learn the behavior of the black-box
function.

Default: ``[{'type': 'GP', 'optimization_step_max': 100, 'min_optimization_interval': 2, 'max_optimization_interval': 20, 'optimization_level': 0.2, 'min_dist': 0.0, 'name': 'loss', 'output_dim': 1, 'detect_noise': False, 'num_fantasies': 16, 'mean_function': 'constant', 'warping_function': 'identity', 'joint_hyperparameters': True, 'correlate_outputs': True, 'tolerance': 1e-20, 'kernel': {'matern_order': 5}, 'outlier_detector': None, 'covariance_matrix': {'resize_step': 1000, 'iterative_start_size': 500, 'cuda_start_size': 2000, 'num_iterative_solver_steps': 0, 'online_start_size': 500, 'max_data_hyper_derivs': 10000}}]``

.. admonition:: Example

   Two Gaussian process surrogates that together learn the dependence of four
   black-box outputs on the design parameters.

   .. code-block:: python

       study.configure(surrogates=[
           dict(type="GP", output_dim=3, name="intensities"),
           dict(type="GP", output_dim=1, name="loss")
       ])
       # add some data to the "intensities" and "loss" models
       obs = study.new_observation()
       obs.add([1.0, 2.0, 3.0], model_name="intensities")
       obs.add([6.0], model_name="loss")
       study.add_observation(obs, design_value=[7.4, 3.1])

Each element of the list must be a dict. The dict entry ``type`` specifies the
type of the element. The remaining entries specify its properties. In the
following, all possible list element types are described:

**Gaussian process** (type ``'GP'``):
A Gaussian process is a surrogate model that can predict a Gaussian
distribution of model values based on the training data of the model
:math:`f(x_1), f(x_2), \dots` for sampled points
:math:`x_1, x_2, \dots \in \mathbb{R}^D`. In the simplest case, the prediction
of a noiseless, scalar function value :math:`f(x^*)` is given by a normally
distributed random variable
:math:`y\sim \mathcal{N}( \overline{y}(x^*),\sigma^2(x^*))` with mean and
variance

.. math::
   \overline{y}(x^*) = \mu(x^*) + \sum_{ij} k(x^*,x_i)
   \mathbf{\Sigma}_{i j}^{-1}\left[f(x_j)-\mu(x_j)\right]

   \sigma^2(x^*) = k(x^*,x^*) - \sum_{ij} k(x^*,x_i)
   \mathbf{\Sigma}_{i j}^{-1} k(x_j,x^*),

where :math:`\mathbf{\Sigma}` is the covariance matrix with entries
:math:`\mathbf{\Sigma}_{ij} = k(x_i, x_j)`. The functional behavior of the
mean :math:`\mu(x)` and the covariance function (also called kernel)
:math:`k(x, x')` depends on a set of hyperparameters :math:`\mathbf{h}` whose
values are chosen to maximize the likelihood of the observed function values.
Depending on the configuration, the Gaussian process can depend on more
hyperparameters that are optimized in the same manner.
See :ref:`GP configuration ` for details.
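The following stand-alone NumPy sketch illustrates the two prediction formulas
above on a toy data set. It is not the driver's implementation: it assumes a
zero mean function and a simple squared-exponential kernel with fixed
hyperparameters, whereas the surrogate optimizes its hyperparameters and uses
a Matérn kernel by default.

.. code-block:: python

    import numpy as np

    # Toy illustration of the posterior mean/variance formulas above.
    # Assumptions: mu(x) = 0 and a squared-exponential kernel with a
    # fixed (not optimized) length scale.
    def k(x, y, length=0.5):
        return np.exp(-0.5 * (x - y) ** 2 / length**2)

    x_train = np.array([0.0, 0.4, 1.0])    # sampled points x_1, x_2, x_3
    f_train = np.array([1.2, 0.7, -0.3])   # observed values f(x_i)

    Sigma = k(x_train[:, None], x_train[None, :])   # covariance matrix
    alpha = np.linalg.solve(Sigma, f_train)         # Sigma^{-1} f(x_j)

    x_star = 0.6
    k_star = k(x_star, x_train)                     # vector k(x*, x_i)
    mean = k_star @ alpha
    var = k(x_star, x_star) - k_star @ np.linalg.solve(Sigma, k_star)
    print(f"predicted mean {mean:.3f}, variance {var:.3f}")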
**Neural Network Ensemble** (type ``'NN'``):
Ensembles of fully connected deep neural networks with an arbitrary number of
hidden layers and neurons, different activation functions, and weight
initialization schemes are used to estimate the uncertainty of the model. In
this way we are able to approximate the Bayesian posterior and use it in the
context of active learning, as well as effectively reduce the chances of
overfitting by averaging over the predictions of the ensemble members. Deep
ensemble surrogates are expected to perform better than Gaussian processes in
higher-dimensional parameter spaces due to the large number of degrees of
freedom associated with parametric models. The training time of the deep
ensemble surrogate scales linearly with the number of observations, in
contrast to the quadratic or cubic complexity characteristic of Gaussian
processes, which renders them more suitable for large datasets.
See :ref:`NN configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   GaussianProcess
   NeuralNetworkEnsemble

.. _ActiveLearning.variables:

variables (list[dict])
""""""""""""""""""""""
A list of variables that can depend on the outputs of surrogates or other
variables that are further up in the list.

Default: ``[{'type': 'SingleSelector', 'name': 'v', 'input': 'loss', 'select_by_name': 'loss'}]``

.. admonition:: Example

   An expression based on the outputs ``variable1``, ``variable2``.

   .. code-block:: python

       [{'type': 'Expression', 'expression': 'variable1*sin(2*PI*variable2)^2'}]

Each element of the list must be a dict. The dict entry ``type`` specifies the
type of the element. The remaining entries specify its properties. In the
following, all possible list element types are described:

**Single-selector variable** (type ``'SingleSelector'``):
This variable selects one output (by name) of a surrogate model or a
multi-output variable. See :ref:`SingleSelector configuration ` for details.

**Multi-selector variable** (type ``'MultiSelector'``):
This variable selects multiple outputs of a surrogate model or a multi-output
variable. See :ref:`MultiSelector configuration ` for details.

**Collector variable** (type ``'Collector'``):
This variable collects outputs of multiple surrogate models or variables.
See :ref:`Collector configuration ` for details.

**Linear combination variable** (type ``'LinearCombination'``):
This variable describes a linear combination
:math:`c_0 + c_1 \cdot y_1 + \cdots + c_N \cdot y_N` of :math:`N` inputs
:math:`y_1, y_2, \cdots, y_N` stemming either from a surrogate with :math:`N`
outputs or from any set of :math:`N` variables.
See :ref:`LinearCombination configuration ` for details.

**Chi-squared variable** (type ``'ChiSquaredValue'``):
This variable describes the chi-squared deviation

.. math::
   \chi^2 = \sum_{i=1}^K \frac{\left(t_i - y_i\right)^2}{\eta_i^2}

between :math:`K` outputs :math:`y_i` of a surrogate and :math:`K` targets
:math:`t_i` scaled by the target uncertainties :math:`\eta_i`. If the target
uncertainties are not independent but correlated, it is also possible to
specify the covariance matrix :math:`G` of the targets. In this case, the
chi-squared deviation is given as

.. math::
   \chi^2 = \sum_{i,j=1}^K (t_i - y_i) G_{i j}^{-1} (t_j - y_j).

If the predictions of the surrogate, :math:`y_i`, follow a Gaussian
distribution, the variable follows a generalized chi-squared distribution.
See :ref:`ChiSquaredValue configuration ` for details.
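As a numeric illustration of the two chi-squared expressions above, with
purely hypothetical targets, surrogate outputs, and uncertainties (the driver
computes these quantities from the surrogate predictions):

.. code-block:: python

    import numpy as np

    t = np.array([1.0, 2.0, 3.0])      # targets t_i
    y = np.array([1.1, 1.8, 3.2])      # surrogate outputs y_i
    eta = np.array([0.1, 0.2, 0.1])    # independent uncertainties eta_i

    # Uncorrelated case
    chi2_diag = np.sum((t - y) ** 2 / eta**2)

    # Correlated case: full covariance matrix G of the targets
    G = np.diag(eta**2)
    G[0, 1] = G[1, 0] = 0.005          # some correlation between t_1, t_2
    chi2_corr = (t - y) @ np.linalg.solve(G, t - y)
    print(chi2_diag, chi2_corr)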
**Negative log-probability variable** (type ``'NegLogProbability'``):
This variable describes the negative log-probability of model parameters
:math:`\mathbf{p}` given a set of :math:`K` measurement targets :math:`t_i`,
:math:`i=1,\ldots,K`. We assume that the measurement process can be accurately
modeled by a function :math:`\mathbf{f}(\mathbf{p}) \in \mathbb{R}^K`. That is,
the vector of measurements :math:`\mathbf{t}` is a random vector with

.. math::
   \mathbf{t} = \mathbf{f}(\mathbf{p}) + \mathbf{w}

with :math:`\mathbf{w} \sim \mathcal{N}(0, \mathbf{G})`, where
:math:`\mathbf{G}` is a covariance matrix of the measurement errors. In most
cases the covariance matrix is diagonal (i.e. the measurement errors are
uncorrelated) with diagonal entries :math:`G_{ii} = \eta_i^2`. Sometimes, the
measurement noise is known. However, generally one has to find a parameterized
error model :math:`\eta_i(\mathbf{d})` for the measurement uncertainty itself.
A common choice is to assume that the error is composed of a background term
:math:`b` and a noise contribution which scales linearly with
:math:`f_i(\mathbf{p})`:

.. math::
   \eta_i^2(a,b,\mathbf{p}) = b^2 + \left[a f_i(\mathbf{p})\right]^2

Since every entry of the measurement vector follows a normal distribution
:math:`t_i \sim \mathcal{N}(f_i(\mathbf{p}),\eta_i^2(\mathbf{d}))`, the joint
*likelihood* of measuring the vector :math:`\mathbf{t}` is given as

.. math::
   P(\mathbf{t} | \mathbf{p}, \mathbf{d}) = \prod_{i=1}^K
   \frac{1}{\sqrt{2\pi}\eta_i(\mathbf{d})}\exp\left[-\frac{1}{2}\left(
   \frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})}\right)^2\right].

Sometimes, non-uniform *prior distributions* for the design parameter vector
:math:`P_\text{prior}(\mathbf{p})` and the error model parameters
:math:`P_\text{prior}(\mathbf{d})` are available. The *posterior distribution*
is then proportional to

.. math::
   P(\mathbf{p}, \mathbf{d} | \mathbf{t}) \propto
   P(\mathbf{t} | \mathbf{p}, \mathbf{d})
   P_\text{prior}(\mathbf{p}) P_\text{prior}(\mathbf{d})

.. warning:: If a :ref:`parameter distribution` for the design space is
   defined, make sure that it has a non-vanishing probability density within
   the boundaries of the design space. Otherwise the negative log-probability
   can be infinite. In this case the sample computation is numerically
   unstable.

Altogether, finding the parameters with maximum posterior probability density
is equivalent to minimizing the value of the negative log-probability

.. math::
   \begin{split}
   -\log\left(P(\mathbf{p}, \mathbf{d}| \mathbf{t})\right) = &
   \frac{1}{2} K\log(2\pi)
   +\sum_{i=1}^K\log\left(\eta_i(\mathbf{d})\right)
   +\frac{1}{2}\sum_{i=1}^K \left(
   \frac{t_i - f_i(\mathbf{p})}{\eta_i(\mathbf{d})} \right)^2 \\
   &-\log\left(P_\text{prior}(\mathbf{d})\right)
   -\log\left(P_\text{prior}(\mathbf{p})\right).
   \end{split}

See :ref:`NegLogProbability configuration ` for details.
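The following sketch evaluates this negative log-probability for hypothetical
targets, model values, and error-model parameters; it assumes uniform priors,
so the prior terms reduce to constants and are omitted:

.. code-block:: python

    import numpy as np

    t = np.array([0.9, 2.1, 2.9])       # measurement targets t_i
    f_p = np.array([1.0, 2.0, 3.0])     # model values f_i(p)
    a, b = 0.05, 0.1                    # error model parameters d = (a, b)

    eta = np.sqrt(b**2 + (a * f_p) ** 2)   # eta_i(a, b, p)
    K = len(t)
    neg_log_prob = (0.5 * K * np.log(2 * np.pi)
                    + np.sum(np.log(eta))
                    + 0.5 * np.sum(((t - f_p) / eta) ** 2))
    print(neg_log_prob)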
**Expression variable** (type ``'Expression'``):
This variable describes a general mathematical expression of known parameters
and predicted variables stemming either from surrogates or from any previously
defined variable. This variable type is the most flexible. However, its
distribution is not determined analytically, but only through sampling from
the posterior distribution(s) of the input variables and evaluating the
expression for each of the samples. For linear or quadratic expressions it is
generally advisable to use the more specialized variables instead.
See :ref:`Expression configuration ` for details.

**Interpolation variable** (type ``'Interpolation'``):
This variable up-samples the output of a surrogate model by interpolating
between the output values. An interpolation can reduce the cost of obtaining
observations at high resolution.
See :ref:`Interpolation configuration ` for details.

**Scan variable** (type ``'Scan'``):
This variable scans the output of a surrogate model over design or environment
parameters. A typical use case is to determine the system behavior based on an
interpolation between data points for different environment parameter values.
An interpolation can reduce the cost of obtaining observations at small steps
between environment parameters. See :ref:`Scan configuration ` for details.

**Least-squares fit variable** (type ``'Fit'``):
This variable fits a parameter vector :math:`\mathbf{p}` consisting of
:math:`M` parameters :math:`p_1, p_2,\dots, p_M` to a model function
:math:`f(\mathbf{x}, \mathbf{p})` that depends on the model variables
:math:`x_1, x_2,\dots, x_D`. The output of the variable is a least-squares
estimate of the fit parameters for data points
:math:`(\mathbf{x}_i, y_i), i=1,\dots,N`, where :math:`y_1,\dots,y_N` are the
mean values of the input. That is, the variable locally minimizes

.. math::
   \chi^2(\mathbf{p}) = \sum_{i=1}^N \frac{\left(f\left(\mathbf{x}_i,
   \mathbf{p}\right) - y_i\right)^2}{\eta_i^2},

where :math:`\eta_i^2` is the variance of input :math:`i` to the variable.
See :ref:`Fit configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   SingleSelectorVariable
   MultiSelectorVariable
   CollectorVariable
   LinearCombinationVariable
   ChiSquaredVariable
   NegLogProbVariable
   ExpressionVariable
   InterpolationVariable
   ScanVariable
   FitVariable

.. _ActiveLearning.objectives:

objectives (list[dict])
"""""""""""""""""""""""
A list of objectives that define which system parameters are optimal.

Default: ``[{'type': 'Minimizer', 'name': 'objective', 'strategy': 'EI', 'localize': False, 'min_val': -inf, 'min_PoI': 1e-16, 'min_uncertainty': 0.0, 'min_acq_val': -inf}]``

.. admonition:: Example

   A minimization objective together with an outcome constraint.

   .. code-block:: python

       [{'type': 'Minimizer', 'variable': 'variable1', 'strategy': 'EI'},
        {'type': 'Constrainer', 'variable': 'variable2',
         'lower_bound': 0.0, 'upper_bound': 1.5}]

Each element of the list must be a dict. The dict entry ``type`` specifies the
type of the element. The remaining entries specify its properties. In the
following, all possible list element types are described:

**Minimization objective** (type ``'Minimizer'``):
The objective is to minimize a target variable.
See :ref:`Minimizer configuration ` for details.

**Multi-objective minimization** (type ``'MultiMinimizer'``):
The objective is to learn the *Pareto front* of multiple minimization
objectives. A vector of objective values :math:`\mathbf{y}\in \mathbb{R}^q`
lies on the Pareto front if it is not dominated by any other possible vector
of outcomes. A vector :math:`\mathbf{y}_1` is dominated by a vector
:math:`\mathbf{y}_2` if all entries of :math:`\mathbf{y}_2` are smaller than or
equal to the corresponding entries of :math:`\mathbf{y}_1`.
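As a small illustration of this dominance relation (with hypothetical outcome
vectors):

.. code-block:: python

    import numpy as np

    def dominates(y2, y1):
        """Return True if y2 dominates y1, i.e. no entry of y2 is larger."""
        return np.all(np.asarray(y2) <= np.asarray(y1))

    print(dominates([1.0, 2.0], [1.5, 2.0]))   # True:  y2 is nowhere larger
    print(dominates([1.0, 3.0], [1.5, 2.0]))   # False: second entry is larger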
The strategy to find samples close to or on the Pareto front is to maximize
the hypervolume enclosed by all non-dominated sampling points and the upper
reference point :math:`\mathbf{y}_{\rm upper}`. The progress is defined in
terms of the decreasing hypervolume between the lower reference point
:math:`\mathbf{y}_{\rm lower}` and the non-dominated sampling points.
See :ref:`MultiMinimizer configuration ` for details.

**Exploration objective** (type ``'Explorer'``):
The objective is to learn the global behavior of a target variable by
exploring regions with maximal uncertainty.
See :ref:`Explorer configuration ` for details.

**Outcome constraint** (type ``'Constrainer'``):
The objective is to constrain the value of a variable within a one-sided or
two-sided interval. The sampling strategy tries to maximize the probability
that sampled values meet the constraint. This cannot be guaranteed, however.
See :ref:`Constrainer configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   MinimizationObjective
   MultiMinimizationObjective
   ExplorationObjective
   OutcomeConstraint

.. _ActiveLearning.acquisition_function:

acquisition_function (dict)
"""""""""""""""""""""""""""
A module that defines additional parameters of the acquisition function.

Default: ``{'parameter_uncertainties': [], 'num_samples': 10000, 'smooth_range': 0.2}``

The acquisition function defines the sampling strategy. That is, new samples
for the evaluation of the black-box function are generated by maximizing the
acquisition function. The acquisition strategy itself is defined in the
``objective``. This module allows one to configure additional properties of
the acquisition function.
See :ref:`acquisition_function configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   AcquisitionFunction

.. _ActiveLearning.acquisition_optimizer:

acquisition_optimizer (dict)
""""""""""""""""""""""""""""
A module that maximizes the acquisition function to determine new suggestions.

Default: ``{'adaptive_local_search': True, 'compute_suggestion_in_advance': True}``

The module optimizes the acquisition function of the main objective of the
study by a combination of a heuristic global optimization followed by a local
convergence of the best result.
See :ref:`acquisition_optimizer configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   AcquisitionOptimizer

.. _ActiveLearning.scaling:

scaling (float)
"""""""""""""""
Scaling parameter of the model uncertainty. For scaling :math:`\gg 1.0`
(e.g. ``scaling=10.0``) the search is more explorative. For scaling
:math:`\ll 1.0` (e.g. ``scaling=0.1``) the search becomes more greedy
(e.g. any local minimum is intensively exploited).

Default: ``1.0``

.. _ActiveLearning.vary_scaling:

vary_scaling (bool)
"""""""""""""""""""
If true, the scaling parameter is randomly varied between 0.1 and 10.

Default: ``True``
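For example, a deliberately greedy search could be configured by fixing a
small scaling value and disabling its random variation (illustrative values):

.. code-block:: python

    # Make the search greedy: exploit promising regions instead of exploring.
    study.configure(scaling=0.1, vary_scaling=False)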
.. _ActiveLearning.parameter_distribution:

parameter_distribution (dict)
"""""""""""""""""""""""""""""
Probability distribution of design and environment parameters.

Default: ``{'include_study_constraints': False, 'distributions': [], 'constraints': []}``

Probability distribution of design and environment parameters defined by
distribution functions and constraints. The definition of the parameter
distribution can have several effects:

* In a call to the method ``get_statistics`` of the driver interface the
  value of interest is averaged over samples drawn from the space
  distribution.
* In a call to the method ``run_mcmc`` of the driver interface the space
  distribution acts as a prior distribution.
* In a call to the method ``get_sobol_indices`` of the driver interface the
  space distribution acts as a weighting factor for determining expectation
  values.
* In an :ref:`ActiveLearning` driver, one can access the value of the
  log-probability density (up to an additive constant) by the name
  ``'log_prob'`` in any expression, e.g. in :ref:`ExpressionVariable`,
  :ref:`LinearCombinationVariable`.

See :ref:`parameter_distribution configuration ` for details.

.. toctree::
   :maxdepth: 100
   :hidden:

   SpaceDistribution
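As a sketch, the parameter distribution and the ``'log_prob'`` expression name
could be combined as follows. The entries of ``distributions`` and the
expression are only placeholders: the available fields are described in the
parameter_distribution configuration, and it is assumed here that the scalar
output of the default surrogate ``loss`` can be referenced by name.

.. code-block:: python

    study.configure(
        parameter_distribution=dict(
            include_study_constraints=True,
            distributions=[],   # distribution entries are driver specific,
                                # see the parameter_distribution configuration
            constraints=[],
        ),
        variables=[
            # hypothetical expression that penalizes the loss by the
            # log-probability density of the parameter distribution
            {'type': 'Expression', 'name': 'v',
             'expression': 'loss - log_prob'},
        ],
    )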