.. _physics_informed_bayesian_optimization:

Physics-informed Bayesian optimization
==========================================================================================


:Driver: :ref:`ActiveLearning`


:Download script: :download:`physics_informed_bayesian_optimization.py`


In this tutorial we discuss physics-informed Bayesian optimization (PIBO) [SEK2025]_.
In contrast to standard Bayesian optimization (BO), the surrogate model is not
trained on scalar objective values :math:`f(\mathbf{x}) \in \mathbb{R}` but on the
*physical response* :math:`\mathbf{r}(\mathbf{x}) \in \mathbb{R}^K` of a system
(simulation or experiment). We assume that while obtaining
:math:`\mathbf{r}(\mathbf{x}^*)` for some :math:`\mathbf{x}^*` is expensive, the
mapping :math:`f(\mathbf{x}) = \Phi(\mathbf{r}(\mathbf{x}))` is an analytic function
that can be efficiently evaluated.

Intuitively, the mapping :math:`\Phi: \mathbb{R}^K \rightarrow \mathbb{R}` discards
information. If :math:`\Phi` is moreover a non-linear function that adds complexity,
it can be easier to learn the behaviour of the channels
:math:`\mathbf{r}(\mathbf{x})` than to learn the behaviour of :math:`f(\mathbf{x})`.
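
The following toy sketch illustrates this point. It is not part of the tutorial
script: the linear response ``response(x) = A @ x``, the mapping ``phi`` and the use of
linear least squares in place of a Gaussian process are all assumptions chosen to keep
the example small. A linear model fitted to the channels reproduces the objective
almost exactly, whereas a linear model fitted directly to the scalar objective does not.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   D, K = 3, 5
   A = rng.normal(size=(K, D))  # hypothetical linear response r(x) = A x


   def phi(r):
       """Hypothetical non-linear mapping from K channels to a scalar."""
       return np.linalg.norm(r, axis=-1)


   # Training data: inputs, channel responses and scalar objective values.
   X_train = rng.normal(size=(20, D))
   R_train = X_train @ A.T
   f_train = phi(R_train)

   # Linear surrogate of the channels (stand-in for a multi-output model).
   W, *_ = np.linalg.lstsq(X_train, R_train, rcond=None)
   # Linear surrogate of the scalar objective (stand-in for standard BO).
   w, *_ = np.linalg.lstsq(X_train, f_train, rcond=None)

   X_test = rng.normal(size=(100, D))
   f_true = phi(X_test @ A.T)
   err_channels = np.max(np.abs(phi(X_test @ W) - f_true))
   err_direct = np.max(np.abs(X_test @ w - f_true))
   print(f"error via channels: {err_channels:.2e}, direct: {err_direct:.2e}")

In this constructed case the channels are exactly linear, so the advantage is
extreme. For a real response the channels are only approximately simple.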

As an example, we consider a spectral filter consisting of alternating dielectric
layers :math:`S` made of silicon dioxide (SiO\ :sub:`2`) and metallic layers
:math:`A` made of silver (Ag). The device has the general structure :math:`S(AS)^N`
and is surrounded by air (refractive index :math:`n=1`). We consider :math:`N=4`
layer pairs, which leads to a :math:`D = 2N + 1 = 9` dimensional optimization problem
with one thickness parameter per layer.
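
As a minimal sketch, the layer sequence :math:`S(AS)^N` can be written down in plain
Python. The variable names are illustrative and not taken from the tutorial script.

.. code-block:: python

   N = 4  # number of (Ag, SiO2) layer pairs

   # S(AS)^N: one leading SiO2 layer followed by N pairs of Ag and SiO2.
   layers = ["SiO2"] + ["Ag", "SiO2"] * N

   D = len(layers)  # one thickness parameter per layer
   print(D, layers)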

The objective is to vary the layer thicknesses such that the transmittance
:math:`t(\lambda)` is uniform in a short-wavelength regime between
:math:`\lambda_{\rm short} = 800` nm and :math:`\lambda_{\rm gap} = 1300` nm, with an
average of at least 40%, while the average transmittance in a long-wavelength regime
between :math:`\lambda_{\rm gap}` and :math:`\lambda_{\rm long}=1800` nm is minimized:

.. math::
   
   \text{minimize}\,\, \text{STD}_{\rm short}[t] + \mathbb{E}_{\rm long}[t] &=
   \sqrt{
      \frac{1}{\lambda_{\rm gap} - \lambda_{\rm short}}
      \int_{\lambda_{\rm short}}^{\lambda_{\rm gap}} t^2(\lambda)\,{\rm d}\lambda
      - \left(
        \frac{1}{\lambda_{\rm gap} - \lambda_{\rm short}}
        \int_{\lambda_{\rm short}}^{\lambda_{\rm gap}} t(\lambda)\,{\rm d}\lambda
      \right)^2
   }
   + \frac{1}{\lambda_{\rm long} - \lambda_{\rm gap}}
   \int_{\lambda_{\rm gap}}^{\lambda_{\rm long}} t(\lambda)\,{\rm d}\lambda

   \text{such that}\,\, \mathbb{E}_{\rm short}[t] &=
   \frac{1}{\lambda_{\rm gap} - \lambda_{\rm short}}
   \int_{\lambda_{\rm short}}^{\lambda_{\rm gap}} t(\lambda)\,{\rm d}\lambda > 0.4\,.
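
For a sampled spectrum, objective and constraint can be evaluated as in the following
sketch. It assumes an equidistant wavelength grid, reads the integrals as averages over
the respective band (approximated by sample means), and assigns a sample at exactly
:math:`\lambda_{\rm gap}` to the short band. The function name ``filter_objective`` is
hypothetical.

.. code-block:: python

   import numpy as np

   LAMBDA_SHORT, LAMBDA_GAP, LAMBDA_LONG = 800.0, 1300.0, 1800.0  # nm


   def filter_objective(wavelengths, t):
       """Return (STD_short[t] + E_long[t], E_short[t]) for sampled transmittance."""
       wavelengths = np.asarray(wavelengths, dtype=float)
       t = np.asarray(t, dtype=float)
       in_short = wavelengths <= LAMBDA_GAP
       t_short, t_long = t[in_short], t[~in_short]
       objective = np.std(t_short) + np.mean(t_long)
       return objective, np.mean(t_short)


   # Idealized step filter: 50% transmittance below the gap, none above it.
   wavelengths = np.linspace(LAMBDA_SHORT, LAMBDA_LONG, 50)
   t_ideal = np.where(wavelengths <= LAMBDA_GAP, 0.5, 0.0)
   objective, mean_short = filter_objective(wavelengths, t_ideal)
   feasible = mean_short > 0.4
   print(objective, mean_short, feasible)

The idealized step filter has zero spread in the short band and zero mean in the long
band, so it attains the lowest possible objective value while satisfying the
constraint.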

Luckily, the transmission of light through the material stack can be simulated
efficiently using the transfer-matrix method. We employ the Python package
``tmm-fast``, which also allows the derivatives with respect to the :math:`D=9`
thicknesses to be computed by backpropagation. We learn :math:`K=50` samples
:math:`\mathbf{t} = [\mathbf{t}_{\rm short}, \mathbf{t}_{\rm long}] \in \mathbb{R}^K`
of the transmission spectrum :math:`t(\lambda)` between :math:`\lambda_{\rm short}`
and :math:`\lambda_{\rm long}` using a *multi-output* Gaussian process (GP).
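
To make the transfer-matrix method concrete, the following is a plain NumPy sketch of
the characteristic-matrix formulation at normal incidence. It is a stand-in for
illustration only and does not use the ``tmm-fast`` API. The refractive indices are
rough, wavelength-independent placeholder values (an assumption), and the example
thicknesses are hypothetical. In this matrix form, absorption corresponds to a negative
imaginary part of the refractive index.

.. code-block:: python

   import numpy as np

   # Placeholder refractive indices (assumed constant over the spectrum).
   N_AIR = 1.0
   N_SIO2 = 1.45 + 0.0j
   N_AG = 0.04 - 7.0j  # negative imaginary part = absorption in this convention


   def transmittance(thicknesses_nm, indices, wavelengths_nm, n_in=N_AIR, n_out=N_AIR):
       """Transmittance of a layer stack at normal incidence for each wavelength."""
       wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
       result = np.empty_like(wavelengths_nm)
       for i, lam in enumerate(wavelengths_nm):
           m = np.eye(2, dtype=complex)
           for d, n in zip(thicknesses_nm, indices):
               delta = 2.0 * np.pi * n * d / lam  # complex phase thickness
               layer = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                                 [1j * n * np.sin(delta), np.cos(delta)]])
               m = m @ layer  # first layer is next to the incident medium
           b, c = m @ np.array([1.0, n_out])
           # T = (n_out / n_in) * |2 n_in / (n_in b + c)|^2 for real n_in, n_out
           result[i] = 4.0 * n_in * n_out / abs(n_in * b + c) ** 2
       return result


   indices = [N_SIO2] + [N_AG, N_SIO2] * 4
   thicknesses = [100.0] + [10.0, 100.0] * 4  # hypothetical design in nm
   wavelengths = np.linspace(800.0, 1800.0, 50)
   t = transmittance(thicknesses, indices, wavelengths)
   print(t.shape, t.min(), t.max())

The sketch loops over wavelengths and layers for readability and provides no
derivatives. A batched, differentiable implementation is what makes gradient
information available to the optimizer.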

To showcase the advantage of PIBO, we compare its convergence behaviour to that of a
global heuristic optimization method, differential evolution (DE). Like other
heuristic methods, DE cannot make use of derivative information. This alone would
explain a speedup of a factor of :math:`\leq D + 1 = 10`, since 10 function
evaluations are required to estimate the gradient by finite differences. However, we
observe a speedup factor of more than 100. The reason is that even after the very
first evaluation, PIBO has already learned a local linear model of the transmission
spectrum and hence a **local quadratic model** of the loss function. After some more
iterations, it has a very good global model of the loss function and can identify
local minima very efficiently.
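
The count of :math:`D + 1` evaluations refers to a forward finite-difference estimate
of the gradient, which the following sketch makes explicit. The quadratic ``toy_loss``
is a hypothetical stand-in for the filter objective, and the helper names are
illustrative.

.. code-block:: python

   import numpy as np


   class CountingFunction:
       """Wrap a function and count how often it is evaluated."""

       def __init__(self, func):
           self.func = func
           self.calls = 0

       def __call__(self, x):
           self.calls += 1
           return self.func(x)


   def forward_difference_gradient(func, x, step=1e-6):
       """Estimate the gradient with one base evaluation plus one per dimension."""
       x = np.asarray(x, dtype=float)
       f0 = func(x)  # 1 evaluation at the base point
       grad = np.empty_like(x)
       for i in range(x.size):  # D further evaluations
           x_shifted = x.copy()
           x_shifted[i] += step
           grad[i] = (func(x_shifted) - f0) / step
       return grad


   toy_loss = CountingFunction(lambda x: float(np.sum(x ** 2)))
   grad = forward_difference_gradient(toy_loss, np.ones(9))
   print(toy_loss.calls)  # 10 evaluations for D = 9

A method that receives the exact gradient together with each function value obtains
the same local information from a single evaluation.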

.. [SEK2025] Sekulic, I, *et al*. "Physics-informed Bayesian optimization of
             expensive-to-evaluate black-box functions".
             *Mach. Learn.: Sci. Technol.* **6** 040503 (2025)



.. literalinclude:: ./physics_informed_bayesian_optimization.py
   :language: python
   :linenos:


 

.. figure:: images/physics_informed_bayesian_optimization/optimized_filter.svg
   :alt: Optimized filter

   **Top:** Spectral response of optimized filter design.
   **Bottom:** Geometry of optimized filter design.

.. figure:: images/physics_informed_bayesian_optimization/comparison_PIBO_DE.svg
   :alt: Comparison between physics-informed BO and differential evolution

   Comparison of the convergence behaviour of physics-informed Bayesian optimization
   (PIBO), which uses gradient information, and the heuristic global optimization
   method differential evolution (DE), which cannot exploit gradient information.
   PIBO converges **two orders of magnitude** faster to the optimal design than DE.