Optimal control of a system in a changing environment

Driver:

ActiveLearning

Download script: changing_environment.m

The aim of this study is to show how to control a system that depends on environment parameters such as temperature or humidity. While the environment parameters can be measured, their influence on the system’s performance is often unknown.

As an example objective, the 2D Rastrigin function

\[r(x_1, x_2) = 10\cdot 2 + x_1^2 + x_2^2 - 10\cos(2\pi x_1) - 10\cos(2\pi x_2)\]

is considered. The environment parameter \(\phi\) acts as an additional phase offset to the first cosine function in the objective function

\[r(x_1, x_2, \phi) = 10\cdot 2 + x_1^2 + x_2^2 - 10\cos(2\pi x_1 + \phi) - 10\cos(2\pi x_2).\]

The phase varies slowly over time as

\[\phi(t) = 2\pi\sin\left(\frac{t}{3{\rm min}}\right).\]
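Here, the time \(t\) is measured in seconds, so that \(3\,{\rm min}\) corresponds to the value of \(180\,{\rm s}\) used in the current_phi helper of the script below.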

Please note that this specific time-dependent behaviour is not exploited and is assumed to be unknown.

Before the system can be controlled optimally as a function of the environment, one has to learn where the global minimum is located for many environment values. To this end, a standard Bayesian optimization that explores the parameter space is performed for 500 iterations. In a second phase, the target is to evaluate the system only at optimal points, i.e. further exploration of the parameter space is not desired. This behaviour is mainly enforced by choosing a small scaling value.
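In the script below, the training phase corresponds to the first call of study.configure with max_iter set to 500; the control phase is configured in a second call that reduces the scaling value to 0.01 and switches the acquisition strategy to the lower-confidence bound.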

The control phase could run for an arbitrary number of iterations, and it would be problematic to add all new observations to the study. On the one hand, this slows down the computation of each suggestion; since the environment value changes during the computation, this can lead to less optimal evaluation points. On the other hand, adding more and more data points close to each other leads to an ill-conditioned Gaussian process surrogate. To avoid these drawbacks, data points are not added in the control phase if the study predicts their value with very small uncertainty, which means that the observation would not add significant information.
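In the control loop below, this is implemented as a simple threshold on the predicted standard deviation: an observation is added to the study only if the surrogate predicts its value with an uncertainty larger than 0.01; otherwise, the pending suggestion is cleared.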

jcm_optimizer_path = '<JCM_OPTIMIZER_PATH>';
addpath(fullfile(jcm_optimizer_path, 'interface', 'matlab'));

server = jcmoptimizer.Server();
client = jcmoptimizer.Client(server.host);

% Rastrigin-like function depending on the additional phase offset phi
function val = rast(x1, x2, phi)
    val = 10*2 + x1^2 + x2^2 - 10*cos(2*pi*x1 + phi) - 10*cos(2*pi*x2);
end

% time-dependent, slowly varying phi(t) = 2*pi*sin(t / 3 min)
function phi = current_phi()
    t = now()*1e5; % rough conversion of the serial date number to seconds
    phi = 2*pi*sin(t/180); % 3 min = 180 s
end

% Definition of the search domain
design_space = { ...
    struct('name', 'x1', 'type', 'continuous', 'domain', [-1.5,1.5]), ...
    struct('name', 'x2', 'type', 'continuous', 'domain', [-1.5,1.5]) ...
};

% Definition of the environment variable "phi"
environment = { ...
    struct('name', 'phi', 'type', 'variable', 'domain', [-2*pi, 2*pi]) ...
};

% Creation of the study object with study_id 'changing_environment'
study = client.create_study( ...
    'design_space', design_space, ...
    'environment', environment, ...
    'driver', 'ActiveLearning', ...
    'name', 'Optimal control of a system in a changing environment', ...
    'study_id', 'changing_environment');

% In the initial training phase, the target is to explore the
% parameter space to find the global minimum.
study.configure( ...
    ... train with 500 data points
    'max_iter', 500, ...
    ... Computing suggestions in advance is switched off since the environment
    ... parameter phi can change significantly between the computation
    ... of the suggestion and the evaluation of the objective function
    'acquisition_optimizer', struct('compute_suggestion_in_advance', false) ...
);

% Evaluation of the black-box function for specified design parameters
function observation = evaluate(study, sample)

    pause(2); % make the objective expensive
    observation = study.new_observation();
    % get the current phi
    phi = current_phi();
    observation.add(rast(sample.x1, sample.x2, phi), 'environment_value', {phi});

end

% Run the minimization
study.set_evaluator(@evaluate);
study.run();

% The target in the control phase is to evaluate the offset Rastrigin function
% only at well-performing (x1,x2)-points depending on the current value of the
% environment.
MAX_ITER = 500; % evaluate for 500 additional iterations
study.configure( ...
    'max_iter', 500 + MAX_ITER, ...
    ... The scaling is reduced to penalize parameters with large uncertainty
    'scaling', 0.01, ...
    ... The lower-confidence bound (LCB) strategy is chosen instead of the
    ... default expected improvement (EI). LCB is easier to maximize at the
    ... risk of less exploration of the parameter space, which is anyhow not
    ... desired in the control phase.
    'objectives', { ...
        struct('type', 'Minimizer', 'name', 'objective', 'strategy', 'LCB') ...
    }, ...
    'acquisition_optimizer', struct('compute_suggestion_in_advance', false) ...
);

% Keep track of suggested design points and of phi at request time and at
% evaluation time
design_points = {};
phis_at_request = {};
phis_at_eval = {};

iter = 0;
while not(study.is_done())
    iter = iter + 1;
    if iter > MAX_ITER
        break
    end

    phi = current_phi();
    suggestion = study.get_suggestion({phi});
    phis_at_request{end + 1} = phi;
    sample = suggestion.sample;
    design_points{end + 1} = [sample.x1, sample.x2];
    try
        obs = evaluate(study, sample);
        % update phi from the observation
        phi = obs.data.None{1}.env{1};
        phis_at_eval{end + 1} = phi;

        predictions = study.driver.predict({[sample.x1, sample.x2, phi]});
        std = sqrt(predictions.variance(1, 1));

        fprintf("Uncertainty of prediction %f\n", std);
        % add data only if the prediction has significant uncertainty
        if std > 0.01
            study.add_observation(obs, suggestion.id);
        else
            study.clear_suggestion( ...
                suggestion.id, sprintf("Ignoring observation with uncertainty %f", std));
        end

    catch err
        study.clear_suggestion( ...
            suggestion.id, sprintf("Evaluator function failed with error: %s", err.message) ...
        );
        rethrow(err);
    end

end

fig = figure('Position', [0, 0, 1000, 500]);

% all observed training samples
observed = study.driver.get_observed_values();
subplot(1, 2, 1)
plot(observed.means, ".")
hold on;
xline(500, '--');
xlabel("training+control iteration")
ylabel("observed value of Rastrigin function")

% observed values during the control phase
observed_vals = [];

% values that would have been observed at request time,
% i.e. if there were no time delay between the request and
% the evaluation of a suggestion
observed_vals_at_request = [];

for i = 1:length(design_points)
    observed_vals(i) = rast( ...
        design_points{i}(1), design_points{i}(2), phis_at_eval{i});
    observed_vals_at_request(i) = rast( ...
        design_points{i}(1), design_points{i}(2), phis_at_request{i});
end

% best value of the x1-parameter depending on the environment
function val = best_x1(phi)
    val = -phi/(2*pi);
    if abs(phi) > pi
        val = val + sign(phi);
    end
end

% best possible values
best_vals = [];
for i = 1:length(phis_at_eval)
    phi = phis_at_eval{i};
    best_vals(i) = rast(best_x1(phi), 0, phi);
end

subplot(1, 2, 2)
semilogy(observed_vals, ".", 'DisplayName', 'observed values')
hold on;
semilogy(observed_vals_at_request, ".", 'DisplayName', 'observed values if no time delay')
semilogy(best_vals, 'DisplayName', 'best possible values')
ylim([1e-4, 1e1])
xlabel("control iteration")
legend()
saveas(fig, "training_and_control.png")
Figure: training and control (training_and_control.png)

Left: During the initial training phase in the first 500 iterations, the parameter space is explored, leading to both small and large objective values. In the control phase, only small objective values are observed. Right: The observed values (blue dots) agree well with the lowest achievable values (green line). Most of the deviations are due to the time offset between the request of a new suggestion for a given environment value \(\phi\) and the actual evaluation of the Rastrigin function a couple of seconds later. To see this, the values that would have been observed at the time of the request are shown as orange dots.