Optimal control of a system in a changing environment
- Driver: ActiveLearning
Download script: changing_environment.m
The target of the study is to show how to control a system that depends on environment parameters such as temperature or humidity. While the environment parameters can be measured, their influence on the system's performance is often unknown.
As an example objective, the 2d Rastrigin function
\[ f(x_1, x_2) = 20 + x_1^2 + x_2^2 - 10\cos(2\pi x_1 + \phi) - 10\cos(2\pi x_2) \]
is considered. The environment parameter \(\phi\) acts as an additional phase offset to the first cosine term in the objective function.
The phase shall slowly vary over time as
\[ \phi(t) = 2\pi \sin(t/180), \]
where \(t\) denotes the (scaled) current time. Please note that this specific time-dependent behaviour is not exploited by the optimization and is assumed to be unknown.
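The location of the minimum can be read off analytically, at least to good approximation: choosing \(x_2 = 0\) and shifting \(x_1\) so that the first cosine term sits at its maximum gives \(x_1 \approx -\phi/(2\pi)\) (wrapped back into the search domain), with objective value \(x_1^2\). The following is a short Python transcription of the objective for illustration only; the study itself uses the MATLAB definitions below.

```python
import math

def rast(x1, x2, phi):
    """Rastrigin-like function with a phase offset phi on the first cosine."""
    return (20 + x1**2 + x2**2
            - 10 * math.cos(2 * math.pi * x1 + phi)
            - 10 * math.cos(2 * math.pi * x2))

def best_x1(phi):
    """Near-optimal x1 for a given phi: place the first cosine maximum at x1,
    wrapped back into the search domain [-1.5, 1.5]."""
    val = -phi / (2 * math.pi)
    if abs(phi) > math.pi:
        val += math.copysign(1.0, phi)
    return val

# At phi = 0 the usual Rastrigin minimum at the origin is recovered:
print(rast(0.0, 0.0, 0.0))  # -> 0.0
# For phi = pi/2 the near-optimal point is x1 = -1/4 with value x1^2 = 0.0625:
print(rast(best_x1(math.pi / 2), 0.0, math.pi / 2))
```

This is the same approximation used by the `best_x1` helper at the end of the script to compute the "best possible values" reference curve.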
Before being able to control the system in an optimal way depending on the environment, one has to learn, for many environment values, where the global minimum is located. To this end, a standard Bayesian optimization is run for 500 iterations to explore the parameter space. In a second phase, the target is to evaluate the system in an optimal way, i.e. further exploration of the parameter space is not desired. This behaviour is mainly enforced by choosing a small scaling value.
The control phase could run for an arbitrary number of iterations, and it would be problematic to add all new observations to the study. On the one hand, this slows down the computation of a suggestion; since the environment value changes during the computation, this can lead to less optimal evaluation points. On the other hand, adding more and more data points close to each other leads to an ill-conditioned Gaussian-process surrogate. To avoid these drawbacks, data points are not added in the control phase if the study already predicts the observed value with very small uncertainty, meaning the observation would not add significant information.
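The gating rule can be summarized as: query the surrogate's predictive standard deviation at the suggested point and only add the observation if it exceeds a threshold. Below is a minimal Python sketch of this decision rule; `toy_predict` is a stand-in for the study's Gaussian-process surrogate (`study.driver.predict` in the script), and only the 0.01 threshold mirrors the script — everything else is illustrative.

```python
def should_add_observation(predict, point, min_std=0.01):
    """Return True if the surrogate is still uncertain at `point`,
    i.e. the new observation would add significant information."""
    mean, variance = predict(point)  # stand-in for study.driver.predict
    std = variance ** 0.5
    return std > min_std

# Stand-in surrogate: pretend the model is confident near the origin only.
def toy_predict(point):
    distance2 = sum(x * x for x in point)
    return 0.0, 1e-6 + distance2  # (mean, variance)

print(should_add_observation(toy_predict, (0.0, 0.0)))  # False: tiny variance
print(should_add_observation(toy_predict, (1.0, 0.5)))  # True: large variance
```

In the script below, the "add" branch calls `study.add_observation`, while the "skip" branch calls `study.clear_suggestion` so that the open suggestion does not block the study.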
server = jcmoptimizer.Server();
client = jcmoptimizer.Client('host', server.host);

% Rastrigin-like function depending on an additional phase offset phi
function val = rast(x1, x2, phi)
    val = 10*2 + x1^2 + x2^2 - 10*cos(2*pi*x1 + phi) - 10*cos(2*pi*x2);
end

% time-dependent, slowly varying phi
function phi = current_phi()
    t = now()*1e5; % serial date number (days) scaled to a faster time scale
    phi = 2*pi*sin(t/180);
end

% Definition of the search domain
design_space = { ...
    struct('name', 'x1', 'type', 'continuous', 'domain', [-1.5,1.5]), ...
    struct('name', 'x2', 'type', 'continuous', 'domain', [-1.5,1.5]) ...
};

% Definition of the environment variable "phi"
environment = { ...
    struct('name', 'phi', 'type', 'variable', 'domain', [-2*pi, 2*pi]) ...
};

% Creation of the study object with study_id 'changing_environment'
study = client.create_study( ...
    'design_space', design_space, ...
    'environment', environment, ...
    'driver', 'ActiveLearning', ...
    'study_name', 'Optimal control of a system in a changing environment', ...
    'study_id', 'changing_environment');

% In the initial training phase, the target is to explore the
% parameter space to find the global minimum.
study.configure( ...
    ... train with 500 data points
    'max_iter', 500, ...
    ... Computing suggestions in advance is switched off since the environment
    ... parameter phi can change significantly between the computation
    ... of a suggestion and the evaluation of the objective function
    'acquisition_optimizer', struct('compute_suggestion_in_advance', false) ...
);

% Evaluation of the black-box function for specified design parameters
function observation = evaluate(study, sample)
    pause(2); % make the objective expensive
    observation = study.new_observation();
    % get current phi
    phi = current_phi();
    observation.add(rast(sample.x1, sample.x2, phi), 'environment_value', {phi});
end

% Run the minimization
study.set_evaluator(@evaluate);
study.run();

% The target in the control phase is to evaluate the offset Rastrigin function
% only at well-performing (x1,x2)-points depending on the current value of the
% environment.
MAX_ITER = 500; % evaluate for 500 additional iterations
study.configure( ...
    'max_iter', 500 + MAX_ITER, ...
    ... The scaling is reduced to penalize parameters with large uncertainty
    'scaling', 0.01, ...
    ... The lower-confidence bound (LCB) strategy is chosen instead of the
    ... default expected improvement (EI). LCB is easier to maximize at the
    ... risk of less exploration of the parameter space, which is anyhow not
    ... desired in the control phase.
    'objectives', { ...
        struct('type', 'Minimizer', 'name', 'objective', 'strategy', 'LCB') ...
    }, ...
    'acquisition_optimizer', struct('compute_suggestion_in_advance', false) ...
);

% Keep track of suggested design points and of phi at request and evaluation time
design_points = {};
phis_at_request = {};
phis_at_eval = {};

iter = 0;
while ~study.is_done()
    iter = iter + 1;
    if iter > MAX_ITER
        break
    end

    phi = current_phi();
    suggestion = study.get_suggestion({phi});
    phis_at_request{end + 1} = phi;
    sample = suggestion.sample;
    design_points{end + 1} = [sample.x1, sample.x2];
    try
        obs = evaluate(study, sample);
        % update phi from the observation
        phi = obs.data.None{1}.env{1};
        phis_at_eval{end + 1} = phi;

        predictions = study.driver.predict({[sample.x1, sample.x2, phi]});
        std = sqrt(predictions.variance(1, 1));

        fprintf("Uncertainty of prediction %f\n", std);
        % add data only if the prediction has significant uncertainty
        if std > 0.01
            study.add_observation(obs, suggestion.id);
        else
            study.clear_suggestion( ...
                suggestion.id, sprintf("Ignoring observation with uncertainty %f", std));
        end
    catch err
        study.clear_suggestion( ...
            suggestion.id, ...
            sprintf("Evaluator function failed with error: %s", err.message));
        rethrow(err);
    end
end

fig = figure('Position', [0, 0, 1000, 500]);

% All observed training samples
observed = study.driver.get_observed_values();
subplot(1, 2, 1)
plot(observed.means, ".")
hold on;
xline(500, '--');
xlabel("training+control iteration")
ylabel("observed value of Rastrigin function")

% Observed values during the control phase
observed_vals = [];

% Values that would have been observed at request time, i.e. if there
% were no time delay between the request and the evaluation of a suggestion
observed_vals_at_request = [];

for i = 1:length(design_points)
    observed_vals(i) = rast( ...
        design_points{i}(1), design_points{i}(2), phis_at_eval{i});
    observed_vals_at_request(i) = rast( ...
        design_points{i}(1), design_points{i}(2), phis_at_request{i});
end

% Best value of the x1-parameter depending on the environment
function val = best_x1(phi)
    val = -phi/(2*pi);
    if abs(phi) > pi
        val = val + sign(phi);
    end
end

% Best possible values
best_vals = [];
for i = 1:length(phis_at_eval)
    phi = phis_at_eval{i};
    best_vals(i) = rast(best_x1(phi), 0, phi);
end

subplot(1, 2, 2)
semilogy(observed_vals, ".", 'DisplayName', 'observed values')
hold on;
semilogy(observed_vals_at_request, ".", 'DisplayName', 'observed values if no time delay')
semilogy(best_vals, 'DisplayName', 'best possible values')
ylim([1e-4, 1e1])
xlabel("control iteration")
legend()
saveas(fig, "training_and_control.png")
Left: During the initial training phase in the first 500 iterations, the parameter space is explored, leading to both small and large objective values. In the control phase, only small objective values are observed. Right: The observed values (blue dots) agree well with the lowest achievable values (green line). Most of the deviations are due to the time offset between the request of a new suggestion for a given environment value \(\phi\) and the actual evaluation of the Rastrigin function about two seconds later. To see this, the values that would have been observed at the time of request are shown as orange dots.