Physics-informed Bayesian optimization
- Driver:
- Download script:
In this tutorial we discuss physics-informed Bayesian optimization (PIBO) [SEK2025]. In contrast to standard Bayesian optimization (BO), the surrogate model is not trained on scalar objective values \(f(\mathbf{x}) \in \mathbb{R}\) but on the physical response \(\mathbf{r}(\mathbf{x}) \in \mathbb{R}^K\) of a system (simulation or experiment). We assume that while obtaining \(\mathbf{r}(\mathbf{x}^*)\) for some \(\mathbf{x}^*\) is expensive, the mapping \(f(\mathbf{x}) = \Phi(\mathbf{r}(\mathbf{x}))\) is an analytic function that can be efficiently evaluated.
The mapping \(\Phi: \mathbb{R}^K \rightarrow \mathbb{R}\) leads intuitively to an information loss. If moreover \(\Phi\) is a non-linear function that adds some complexity, it can be easier to learn the behaviour of the channels \(\mathbf{r}(\mathbf{x})\) than to learn the behavior of \(f(\mathbf{x})\).
As an example we consider a spectral filter consisting of alternating layers of a dielectric layer \(S\) made of silicon oxide (SiO2) and a metallic layer \(A\) made of silver (Ag). The device has the general structure \(S(AS)^N\) which is surrounded by air (refractive index \(n=1\)). We consider \(N=4\) layer pairs which leads to a \(D = 2N + 1 = 9\) dimensional optimization problem.
By varying the thicknesses, the objective is to have a uniform transmittance \(t(\lambda)\) in a short-wavelength regime between \(\lambda_{\rm short} = 800\) nm and \(\lambda_{\rm gap} = 1300\) nm of at least 40% on average and to minimize the transmittance in a long-wavelength regime between \(\lambda_{\rm gap}\) and \(\lambda_{\rm long}=1800\) nm:
Luckily, the transmission of light through the material stack can be simulated
efficiently using the transfer-matrix method. We employ the Python package
tmm-fast which also allows to compute derivatives with respect to the \(D=9\)
thicknesses using back propagation. We learn \(K=50\) samples \(\mathbf{t}
= [\mathbf{t}_{\rm short}, \mathbf{t}_{\rm long}] \in \mathbb{R}^K\) of the
transmission spectrum \(t(\lambda)\) between \(\lambda_{\rm short}\) and
\(\lambda_{\rm long}\) using a multi-output GP.
To showcase the advantage of PIBO, we compare its convergence behaviour to that of a global heuristic optimization method, differential evolution (DE). Like other heuristic methods DE cannot make use of derivative information. This would explain a speedup of a factor of \(\leq D + 1 = 10\) since 10 samples are required to estimate the gradient of the function. However, we observe a speedup factor of more than 100. The reason is that even after the very first evaluation, PIBO learns already a local linear model of the transmission spectrum and hence a local quadratic model of the loss function. After some more iterations it has a very good global model of the loss function and can identify local minima very efficiently.
Sekulic, I, et al. “Physics-informed Bayesian optimization of expensive-to-evaluate black-box functions”. Mach. Learn.: Sci. Technol. 6 040503 (2025)
1import sys,os
2import numpy as np
3import time
4import torch
5import numpy as np
6import scipy
7import requests
8from dispersion import Spectrum, Material # run pip install dispersion if not installed
9import tmm_fast # run pip install tmm-fast if not installed
10import matplotlib.pyplot as plt
11from matplotlib import patches
12
13torch.set_default_dtype(torch.float64)
14
15
16from jcmoptimizer import Client, Study, Observation
17client = Client()
18
19
20K = 50 # number spectral sampling points
21K_short = K_long = int(K/2) #samples in short and long wavelength range
22wl = torch.linspace(800, 1800, K) * 1e-9 #Wavelength scan
23N = 4 # number N in layer stack S(AS)^N
24D = 2*N + 1 #dimensionality of search space
25theta = torch.tensor([0.0]) #angle of incidence
26num_layers = 2*N + 1 + 2 #total number of layers including air
27num_stacks = 1 #only one stack is simulated in each iteration
28
29#Load material data refractiveindex.info
30if not os.path.exists("SiO2.yml"):
31 r = requests.get("https://refractiveindex.info/database/data/main/SiO2/nk/Malitson.yml")
32 open("SiO2.yml", 'wb').write(r.content)
33if not os.path.exists("Ag.yml"):
34 r = requests.get("https://refractiveindex.info/database/data/main/Ag/nk/Johnson.yml")
35 open("Ag.yml", 'wb').write(r.content)
36
37#create stack of refractive indices S(AS)^N
38M = torch.full((num_stacks, num_layers, wl.shape[0]), 1+0j)
39M[0,0,:] = M[0,-1,:] = 1.0 #first and last layer is air
40SiO2 = Material(file_path="SiO2.yml", unit="um")
41Ag = Material(file_path="Ag.yml", unit="um")
42spm = Spectrum(wl.numpy(), spectrum_type='Wavelength', unit='meter')
43#Set n-k data for each layer and wavelength
44M[0,1:-1:2,:] = torch.tensor(SiO2.get_nk_data(spm))
45M[0,2:-1:2,:] = torch.tensor(Ag.get_nk_data(spm))
46
47#design space (units nm)
48design_space_SiO2 = [{'name': f'sio2_{i+1}', 'domain': (100, 500)} for i in range(N+1)]
49design_space_Ag = [{'name': f'ag_{i+1}', 'domain': (2, 8)} for i in range(N)]
50design_space = design_space_SiO2 + design_space_Ag
51param_names = [d['name'] for d in design_space]
52# Creation of the study object with study_id 'physics_informed_bayesian_optimization'
53study = client.create_study(
54 design_space=design_space,
55 driver="ActiveLearning",
56 study_name="Physics-informed Bayesian optimization",
57 study_id="physics_informed_bayesian_optimization"
58)
59study.configure(
60 max_iter=30,
61 surrogates=[
62 dict(type='GP', name="model", output_dim=K,
63 # To speed things up, we run a single hyperparameter optimization after
64 # 5 iterations (= 5*(D+1) observations of function values and derivatives)
65 max_optimization_interval=5*(D+1),
66 optimization_step_max=5*(D+1),
67 covariance_matrix=dict(max_data_hyper_derivs=5*(D+1))
68 )
69 ],
70 variables=[
71 # Selector of transmissions in short wangelength range
72 dict(type="MultiSelector", name="trans_short", input="model",
73 select_range=(0, K_short-1), output_dim=K_short),
74 # Selector of transmissions in long wangelength range
75 dict(type="MultiSelector", name="trans_long", input="model",
76 select_range=(K_short, K-1), output_dim=K_long),
77 # Mean transmission in short wangelength range
78 dict(type="LinearCombination", name="avg_short", inputs=["trans_short"]),
79 # Mean transmission in long wangelength range
80 dict(type="LinearCombination", name="avg_long", inputs=["trans_long"]),
81 # Chi^2 = sum of squared transmission sum(t_i^2) in short wavelength range
82 dict(type="ChiSquaredValue", name="chi_squared_short", input="trans_short"),
83 # Loss: variance of short wavelengths + squared average of long wavelengths
84 dict(type="Expression", name="loss",
85 expression=f"sqrt(max(0, chi_squared_short/{K_short} - avg_short^2)) + avg_long"
86 )
87 ],
88 objectives=[
89 dict(type="Minimizer", variable="loss", name="objective"),
90 dict(type="Constrainer", variable="avg_short", lower_bound=0.4,
91 name="min_transmission_short"),
92 ],
93 acquisition_optimizer=dict(
94 num_training_samples=5, compute_suggestion_in_advance=False
95 )
96)
97# Transmission spectrum for thickness tensor D
98def transmissions(D: torch.Tensor) -> torch.Tensor:
99 T = torch.zeros((1,num_layers))
100 T[0, 1:-1:2] = D[:N+1] # SiO2 thicknesses
101 T[0, 2:-1:2] = D[N+1:] # Ag thicknesses
102 T[0, 0] = T[0, -1] = np.inf
103 # Get solution using transfer-matrix method
104 output = tmm_fast.coh_tmm('s', M, T * 1e-9, theta, wl, device='cpu')
105 return output["T"][0,0,:]
106
107# Evaluation of the black-box function for specified design parameters
108def evaluate(study: Study, **kwargs: float) -> Observation:
109 observation = study.new_observation()
110 D = torch.tensor([kwargs[param_names[i]] for i in range(2*N + 1)])
111 T = transmissions(D)
112 Jac = torch.autograd.functional.jacobian(transmissions, D)
113 observation.add(T.tolist())
114 for idx, param in enumerate(param_names):
115 observation.add(Jac[:,idx].tolist(), derivative=param)
116 return observation
117
118# Run the minimization
119study.set_evaluator(evaluate)
120study.run()
121kwargs = study.get_state("driver.best_sample")
122D = torch.tensor([kwargs[param_names[i]] for i in range(2*N + 1)])
123T = transmissions(D)
124fig, ax = plt.subplots(nrows=2, height_ratios=[3,1], figsize=(8, 4))
125
126ax[0].vlines(1300, 0, 0.8, colors="black", linestyles="dashed",
127 label=r"$\lambda_{\rm gap}=1300$ nm")
128ax[0].plot(wl*1e9, T, ".-", label="spectral response")
129ax[0].set_xlabel(r"$\lambda [nm]$")
130ax[0].legend()
131ax[0].grid()
132
133T = [0]*(2*N + 1)
134D = [kwargs[param_names[i]] for i in range(2*N + 1)]
135T[0::2] = D[:N+1]
136T[1::2] = D[N+1:]
137for idx, t in enumerate(T):
138 rect = patches.Rectangle(
139 xy=(wl[0] + sum(T[:idx]),0), width=T[idx], height=1,
140 facecolor="lightblue" if idx % 2==0 else "white"
141 )
142 ax[1].add_patch(rect)
143 ax[1].set_xlim(0, sum(T))
144 ax[1].text(y=0.5, x=sum(T[:idx]) + 0.5*T[idx],
145 s="SiO$_2$" if idx % 2==0 else "Ag",
146 horizontalalignment='center', verticalalignment='center')
147 ax[1].set_xlabel("Layer Stack Thicknesses [nm]")
148 ax[1].get_yaxis().set_visible(False)
149plt.tight_layout()
150plt.show()
151fig.savefig("optimized_filter.svg", transparent=True)
152
153# As a comparison, we run a global differential evolution (DE)
154# and save evaluated objective values.
155Y_DE: list[float] = []
156
157def objective(x: np.ndarray) -> float:
158 T = transmissions(torch.tensor(x))
159 T_short = T[:K_short]
160 T_long = T[K_short:]
161 loss = T_short.std() + T_long.mean()
162 # Penalized loss is zero if constraint is not fulfilled, otherwise it's negative.
163 penalized_loss = (loss - 1) * torch.heaviside(
164 T_short.mean() - 0.4, values = torch.tensor(0.0)
165 )
166 Y_DE.append(penalized_loss.item() + 1)
167 return penalized_loss.item()
168
169scipy.optimize.differential_evolution(
170 objective,
171 bounds=[info["domain"] for info in design_space],
172 polish=False, # Run standard DE without L-BFGS-B
173 popsize=6, # Generation size = 6 * D = 54
174 maxiter=100 # max 100 generations -> 100 * 54 = 5400 evaluations
175)
176
177Y_BO = study.driver.get_observed_values("variable", "loss")
178fig, ax = plt.subplots(figsize=(8, 4))
179ax.plot(Y_DE, ".", color="steelblue", label="Differential evloution")
180ax.plot(np.minimum.accumulate(Y_DE), "-", color="steelblue")
181ax.plot(Y_BO["means"], ".", color="darkorange", label="Physics-informed BO")
182ax.plot(np.minimum.accumulate(Y_BO["means"] + [min(Y_BO["means"])]*3500),
183 "-", color="darkorange")
184ax.set_xscale("log")
185ax.set_yscale("log")
186ax.set_xlabel("# Iterations")
187ax.set_ylabel("loss")
188ax.legend()
189ax.grid()
190plt.show()
191fig.savefig("comparison_PIBO_DE.svg", transparent=True)
192
193
194client.shutdown_server()
Top: Spectral response of optimized filter design. Bottom: Geometry of optimized filter design.
Comparison of the convergence behaviour of physics-informed Bayesian optimization (PIBO), which uses gradient information, and the heuristic global optimization method differential evolution (DE), which cannot exploit gradient information. PIBO converges two orders of magnitude faster to the optimal design than DE.