Multivariate learning framework for long‐term adaptation in the artificial pancreas

Abstract The long‐term use of the artificial pancreas (AP) requires an automated insulin delivery algorithm that can learn and adapt with the growth, development, and lifestyle changes of patients. In this work, we introduce a data‐driven AP adaptation method for improved glucose management in a home environment. A two‐phase Bayesian optimization assisted parameter learning algorithm is proposed to adapt basal and carbohydrate‐ratio profile, and key feedback control parameters. The method is evaluated on the basis of the 111‐adult cohort of the FDA‐accepted UVA/Padova type 1 diabetes mellitus simulator through three scenarios with lifestyle disturbances and incorrect initial parameters. For all the scenarios, the proposed method is able to robustly adapt AP parameters for improved glycemic regulation performance in terms of percent time in the euglycemic range [70, 180] mg/dl without causing risk of hypoglycemia in terms of percent time below 70 mg/dl.


| INTRODUCTION
For several decades, engineers and clinicians have been working on artificial pancreas (AP) systems to achieve automated blood glucose (BG) regulation for patients with Type 1 diabetes mellitus (T1DM).
With the encouraging outcome of recent large-scale/long-duration outpatient clinical trials, 1-4 current AP systems require user intervention around meals and exercise. Since clinical factors such as basal rate (BR) and carbohydrate ratio (CR) are not fixed but do change for patients with Type 1 diabetes, adaptation and learning are needed and expected from AP systems to improve outcome and enable better control as insulin sensitivity changes. In addition, the effects of physical activity on glucose utilization, production, insulin sensitivity, and absorption strongly depend on its timing, type, and intensity, 5,6 which vary with lifestyle changes. This forms another motivation to incorporate adaptation in AP design.
To achieve this goal, different approaches to AP adaptation have been investigated in the literature. Automated BR modulation was considered in recent outpatient clinical trials 1,2 and the first U.S. Food and Drug Administration (FDA) approved commercial hybrid AP system in the United States. 3 With different degrees of automation and clinician effort, pump setting adaptation was also considered in multiple clinical studies, [7][8][9] although the detailed techniques were not explicitly reported. In a 12-week outpatient study, 4 an automated algorithm monitored by study physicians was employed to perform joint adaptation of BR and CR. In earlier investigations, 10,11 a run-torun approach was proposed to update BR and meal bolus sizes based on sparse BG measurements; this approach was recently revisited in Reference 12 to adapt CR profile during the day and BR at night based on continuous glucose monitor (CGM) measurements, and was verified through multiweek simulations that took account of insulin sensitivity circadian rhythm. A closely related approach that can exploit CGM measurements and continuous insulin delivery is iterative learning control, which can be regarded as a two-timescale enhancement of run-to-run methods. 13 In Reference 14, iterative learning model predictive control (MPC) was proposed to adapt the reference trajectory of the closed-loop controller used for glucose regulation, which was tested in a pilot study recently. 15 Despite the aforementioned progress, the problem of systematically adapting multiple key AP parameters (e.g., segments in BR and CR profiles, controller parameters) under lifestyle disturbances while explicitly enforcing safety considerations remains to be explored. In this work, a two-phase data-driven multivariate parameter learning framework for long-term adaptation of AP is developed. The proposed learning algorithm features a controller-led approach to adaptation of BR profile and a Bayesian optimization (BO) approach to adaptation of the CR profile and the controller parameters. The effectiveness of the algorithm is evaluated using the entire 111-adult cohort of the FDA-accepted Universities of Virginia (UVA)/Padova T1DM simulator 16 for multiple challenging scenarios. The obtained results suggest that the proposed AP adaptation method can robustly improve glucose regulation performance in terms of percent time in the euglycemic range and average glucose levels without causing the risk of hypoglycemia. A related topic to AP adaptation is individualized AP design, the key idea of which is to design the AP algorithms based on patient-specific parameters or historical data for improved glucose regulation performance. 17,18 Generally, AP adaptation attempts to dynamically adjust the individualized parameters (e.g., BR and CR profiles) that might have been improperly determined or change slowly in time, while AP individualization features more on static personalization of glucose control performance based on individualized parameters or personal patient data, which can be regarded as AP adaptation if the individualization procedures are performed progressively in time. In addition, AP adaptation can work seamlessly on top of an individualized controller; in our work, the lower-level MPC algorithm adopts a control-relevant model personalized by total daily insulin.

| METHODS AND MATERIALS
Patients with T1DM suffer from malfunctions of the glucose metabolic system due to the failure of the pancreas to secrete insulin and require external insulin infusion to regulate excessive BG. An AP attempts to automate glucose regulation through generating insulin micro-boluses according to real time BG measurements. To assist with the home use of an AP, a dual-layer control system scheme is first introduced. In the following, the parameter adaptation problem is described and the outline of the solution proposed is discussed.
The human glucose metabolic process is affected by disturbances that evolve under different time scales, including (a) food intake, physical exercise, stress, and alcohol consumption occurring on a random basis within a day, (b) the diurnal circadian rhythm of the body sensitivity to insulin and life habits that repeat on a daily/weekly basis, (c) randomly occurring sickness that can affect insulin sensitivity for one or a couple of days (e.g., influenza and gastrointestinal disease), and (d) chronic metabolic variations due to aging and lifestyle change.
To achieve satisfactory glucose regulation performance, these disturbances need to be considered in the design of AP algorithms.
Considering the multi-timescale nature of the disturbances, we introduce a dual-layer control scheme in this work (see Figure 1). The lower layer is composed of multiple controllers that deal with shortterm disturbances, including 1. a feedforward BR calculator that provides the basal insulin delivery rate based on a predetermined piecewise constant BR profile to counteract against patient's daily life habit and the diurnal insulin sensitivity.
2. a feedforward meal controller that calculates reference meal and correction boluses based on meal information and additional insulin requests provided by the patient.
3. a feedback controller that deals with all uncompensated disturbances based on real-time CGM measurements and safety considerations.
The feedback controller tops up the BR doses provided by the BR calculator to finalize real-time insulin microboluses. With a properly designed BR profile, the feedback controller does not need to act with abruptly varying and scenario-dependent aggressiveness, which reduces the difficulty of feedback control design and the chance of faulty controller-induced overdose behavior. Feedforward control (e.g., meal boluses) can reduce high-glucose events and is therefore important in glucose management, particularly when meals cannot be accurately predicted. In the AP literature, lower-layer control system design has been extensively investigated, the focus of which has been developing safe and efficient feedback control algorithms to improve glucose management; interested readers are referred to Reference 19 for a detailed review of control design for the AP. In our algorithm, we assume that lower layer control tasks are handled by a periodic zone MPC with asymmetric costs 20 together with the meal bolus strategy introduced therein, although the proposed approach can be applied to other lower-layer control algorithms.
As is shown in Figure 1, the upper layer is responsible for longterm parameter adaptation of lower layer control algorithms. This layer evolves on a longer timescale (e.g., weeks) and handles chronic changes in the patient's glucose metabolic process and lifestyle FIGURE 1 Dual-layer control scheme. The lower layer (highlighted in blue color) deals with disturbances occurring on smaller time scales based on real time CGM data, while the upper layer (highlighted in green color) handles chronic changes in the glucose metabolic process based on recent performance metrics from the lower layer through long-term AP adaptation algorithms, based on historical performance metric records from the lower layer. With the successful deployment of large and long-term out-patient clinical studies, longterm AP adaptation has gained in importance only recently, and is the main focus of this article.
With the dual-layer control scheme, we are now ready to introduce the parameter adaptation problem to be investigated. Specifically, the parameters to be adapted include 1. segments {β n } that compose the BR profile β ≔ [β 1 , β 2 , …, β N ] used in the BR calculator. Note that each β n is associated with a time period T β n during which it is active. In this work, we assume that T β n is fixed and predetermined by clinicians. 2. segments {γ n } that compose the CR profile γ ≔ [γ 1 , γ 2 , …, γ M ] used in the meal controller. Similar to β, each segment γ n is used for a particular time period T γ n , which is assumed to be fixed by clinicians as well.
3. parameters in the closed-loop controller. Three parameters in the zone MPC developed in Reference 20 are considered, includingR that represents the control penalty parameter for insulin delivery above the BR, D that determines the upper bound of a glucose zone for which the velocity penalty is active, and a coefficient γ IOB that determines the responsiveness of the insulin-on-board (IOB) constraint.
The goal of this work is to develop an automatic parameter learning algorithm that can correctly adapt the parameters in the lowerlayer control algorithms and is robust to lifestyle disturbances, with minimal patient/clinician involvement and without causing risks of hypoglycemia during the adaptation procedure.
To achieve this goal, a data-driven multivariate parameter learning framework in proposed. Considering the different roles of the lower-layer control algorithms, the adaptation procedure is divided into two phases. In Phase I, the parameters in the feedforward control algorithms (namely, the BR and CR profiles) are opti-

| Learning feedforward control parameters
In this subsection, we focus on adapting the parameters in the BR and CR profiles, which is Phase I of the adaptation procedure. The aim here is to obtain reasonable rather than optimal profiles, such that an appropriate operating point is provided for the feedback controller, which helps enhance the safety of closed-loop glucose control. More importantly, considering the fact that the feedback control may not be available for some periods (e.g., when the controller runs out of battery or lost CGM connection), the obtained parameter needs to be "safe" in the sense that no hypoglycemia would be caused when the patient loses closed-loop control. In the following, we first separately introduce the methods for BR and CR adaptation, based on which a hybrid time-and event-triggered method is introduced to iterate the procedures that adapt BR and CR.

| Adaptation of BR profile
Intuitively, the BR profile β is designed to manage "healthy" fasting glucose levels without considering meal-induced glucose excursions. However, when the fasting glucose levels (measured as glucose levels at night or before meals) are not satisfactory, it is difficult to diagnose which segment β n in β is the root cause.
This problem becomes more complicated when the effects of meal boluses are considered, due to a further loss of "identifiability." 21 A helpful observation, however, is that the "effective" realtime BR is ultimately determined by the feedback controller, which is able to adjust a potentially inappropriate BR provided by the BR profile to a certain extent. This observation allows us to perform BR profile adaptation using a controller-led approach, without the need of explicitly performing root cause diagnosis. To reduce the risk of controller-induced hypoglycemia, safety constraints are designed to eliminate the adoption of aggressive BR profiles. Through controller-led BR adaptation, the problem of lack of identifiability/diagnosability can be automatically solved, which leads to a direct separation of the adaptation of BR and CR profiles. The difficulty of algorithm design and implementation, however, is independent of the number of segments in the BR profile.

Controller-led BR profile update
BR adaptation determines the values of BR segments β k + 1 n È É at iteration k + 1 based on β k n È É and available glucose and insulin delivery information. The idea is to update the BR segment β n with the averaged nonmeal-related insulin microboluses commanded by the feedback controller during the same time interval, namely, T β n . Although different methods can be used to distinguish meal/nonmealrelated insulin, we consider a simple approach in this work and define nonmeal-related insulin as the insulin deliveries that happen τ m hours after the previous meal. This defines a set of admissible meal-related insulin deliveries I d n for each BR segment β n : where I d t denotes an insulin delivery at time instant t on day d, T β n denotes the time period for which β n is active, and T d m t ð Þ denotes the time of the previous meal that happens before time t on day d. We take τ m = 3 hr in our implementation. An initial BR estimate β k n for β k n can be written as where n d denotes the number of insulin microboluses in a day, and n w denotes the number of days in iteration k of the adaptation process, and 1(Á) denotes the indicator function. Although the actual real-time BR will be decided by the feedback control algorithm (using the knowledge of β n ), the key consideration on the obtained β n is that it should not overestimate the "true" BR and cause hypoglycemic events, which is particularly important when the feedback controller is unavailable. To address this concern, two types of safety constraints are incorporated, as will be introduced below.

Statistical IOB constraint
Similar to the case with lower-layer control algorithms, the IOB information can be exploited to prevent aggressive behavior of the parameter adaptation algorithm. On the basis of the IOB calculation used in zone MPC, 20 a statistical IOB constraint is proposed for BR adaptation. Specifically, a dynamic database D β n ð Þ is built to eliminate overestimated values of β n , which stores the information of a triplet that have led to low-glucose events in history for i 2 {1, 2, …, k}, with IOB i n denoting the averaged IOB value immediately before β n becomes active in adaptation iteration i, and γ k m$n denotes the values CR used within T β n in iteration k. When β k + 1 n is computed for iteration k + 1 and β k + 1 n > β k n , the IOB constraint is enforced and implemented as where "⪯" denotes an element-wise partial order holds for the two vectors compared. Here, IOB k n is used to estimate the average IOB value immediately before β n becomes active in adaptation iteration k + 1. If any of the IOB constraints are violated, β k + 1 n ¼ 0:95β k n to avoid causing hypoglycemia risk.

Smoothness constraint
This constraint ensures that the BR segment β n does not change too much compared with its neighboring segments β n − and β n + , with n − = N − mod(N − n + 1, N) and n + = mod(n, N) + 1, respectively. The underlying intuition is that the insulin requirement of the metabolic system to maintain a healthy fasting glucose level does not change abruptly in time. Mathematically, this constraint is implemented as where λ β s denotes the "smoothness coefficient" and is selected as 1.3 in our work to compromise smoothness with performance.

| Adaptation of CR profile
Different from BR adaptation, the effect of different CR segments on glucose profiles are relatively isolated, as meals are usually several hours apart. This helps decouple the adaptation problems for different γ n values, for which a data-driven optimization approach can be developed.

Zone objectives
Instead of optimizing the CR profile for a specific glycemic metric, our goal for CR adaptation is to only ensure that the average BG levels

Parameter optimization with safety constraints
To adapt the CR profile toward the target zone, a virtual optimization problem is formulated for each γ n . The cost function is selected as the average postprandial BG levels where f γ n γ n , θ γ n À Á is used to represent the underlying unknown dependency of y γ n on γ n and other parameters θ γ n . The adaptation process is then performed by solving a sequence of constrained optimization problems with this unknown cost function: To ensure the smoothness of the adaptation procedure, three safety constraints are considered in the optimization problem above.
The first and second constraints, respectively, restrict the rate of change of γ n and f γ n γ n , θ γ n À Á . In our implementation, λ γ is chosen as 30%, and δ y γ n is selected as 12 mg/dl. The third constraint directly bounds γ n from below to avoid hypoglycemia risks caused by an underestimated CR; the lower bound γ k + 1 n is updated dynamically by taking the maximum value of γ i n that has caused risk of hypoglycemia during the adaptation process. Note that γ k n and γ k + 1 n are known and f γ n γ k n , θ γ n À Á can be calculated based on historical CGM measurements is generally not known. To solve this problem, a datadriven BO-assisted algorithm will be provided in Section 2.3. is set to 2 in our implementation. CR adaptation is performed in a combined time-and event-triggered fashion; the adaptation procedure ends either when the zone objectives are achieved or when a maximum number of iterations N γ is reached, which is set to 5 in our implementation. As meal boluses can be adjusted by the patients based on their knowledge and preference, the overall feedforward parameter adaptation phase begins by adapting the BR profile and iterates according to the alternating procedure described. An illustration of the procedure is provided in Figure 2. The phase terminates when the zone objective criterion for CR adaptation remains valid after performing the BR adaptation.

| Choice of parameters
A few parameters are adjustable in the proposed BR and CR adapta-

| Learning feedback control parameters
In this subsection, we focus on Phase II of the adaptation procedure,

| Parameter selection based on sensitivity analysis
Generally, parameter selection determines the parameter φ 2 R, D, γ IOB n o to be adapted in Phase II. In this work, a sensitivity analysis approach is utilized to achieve automatic parameter selection, by rerunning the closed-loop control algorithm with different parameter settings using the most recent historical glucose measurements for a specific patient, which is the so-called advisory-mode analysis. 22 To do this, for each φ 2R,D, γ IOB where ℐ(φ + ) and ℐ(φ − ) denote the total amount of insulin calculated in the advisory mode analysis for φ = φ + and φ = φ − , respectively. This procedure determines the parameter that has the "strongest" controllability of insulin delivery, hence the closed-loop glycemic regulation can be adjusted most efficiently.

| Parameter optimization
The goal of adapting the selected parameter φ is to achieve satisfactory average glucose level without having risk of hypoglycemia, which implies improved percentage time in euglycemic range [70, 180] mg/dl as the glucose profile is restricted below by considering constraints on hypoglycemia. To achieve this goal, we again formulate the adaptation procedure as a sequential optimization problem: Here, φ k + 1 denotes the value of φ at the (k + 1)th iteration, f φ (φ k + 1 , θ φ ) denotes the average glucose value obtained using φ k + 1 with θ φ being potential dependent parameters. The constraints in Equations 12 and 13 restrict the rate of change of φ and f φ (φ, θ φ ), respectively. In our implementation, λ φ is selected as 30%, and δ y φ is chosen as 6 mg/dl; the roles of these two parameters are identical to those of λ γ and δ y γ n discussed in Section 2.
where f(ψ k , θ ψ ) is an unknown function of ψ k parameterized by θ ψ , y k − 1 is a noisy measurement of f(ψ k − 1 , θ ψ ), δ is a known parameter, and g(ψ k ) is a known (linear) function of ψ k . Noticing the fact that a noisy measurement/estimate y k of f(ψ k , θ ψ ), which is average glucose level for a fixed ψ k , is available, we solve this problem through designing a BO-assisted optimization algorithm, to make efficient use of historical measurements of {f(ψ k , θ ψ )| k 2 π k } for an admissible set of iteration indexes π k ; here, π k is defined as the set of iteration indexes for which the same set of parameters are adapted while other adaptation parameters are kept constant. The main idea is to obtain a data-driven estimatef ψ k , θ k jD k ð Þof f (ψ k , θ ψ ) based on available historical data D k . In the BO literature, different forms off ψ k ,θ k jD k ð Þhave been proposed. 23 As the size of data set for adaptation is small compared with those available for machine learning problems, a linear kernel is adopted in this work: with θ k ≔ [θ k, 1 , θ k, 2 ] > . Note that to make efficient use of data, the value of θ k is made ψ-dependent and is obtained based on data segment D k D k , which is the subset of data D k corresponding to iteration indexes in π k . This further explains the rationale for using a linear φ k is then implemented to obtain y k , which is collected to update D k (line 12).

| RESULTS
In this section, we evaluate the proposed adaptation algorithm through multiple-month simulations on the 111-patient cohort of the US FDA accepted UVA/Padova simulator. 16 To mimic intra-day variations in insulin sensitivity and the effect of dawn phenomenon, the diurnal patterns of the parameters in the endogenous glucose produc-

| Results for Scenario I
The results for Scenario I are shown in Figures 3 and 4. The adaptation algorithm is safe in the sense that no hypoglycemia risk is caused during the process.
The trends of parameter changes during the adaptation procedure are provided in Figure 4. The important observation here is that FIGURE 3 Trends of the glycemic management metrics in the adaptation procedures for the 111-patient cohort (Scenario I). A box-and-whisker approach is used to plot the data, where on each box, the central white line is the median, the edges of a box denote the 25% and 75% percentiles, and the whiskers denote the 5% and 95% percentiles   through the use of statistical IOB constraints and smoothness constraints, strong performance is achieved by the BR profiles with their segments aligning around the reference value (which is 1 in Figure 4), instead of profiles with extremely large and small neighboring segments. In addition, the CR profile segments are aligned around the reference value; due to the safety constraints, risky (small) values for the FIGURE 4 Trends of the adaptation parameters in the adaptation procedures for the 111-patient cohort (Scenario I). The same box-and-whisker approach as that in Figure 3 is used to plot the data. For illustration purpose, the relative values of parameters against default values in the UVA/Padova simulator are provided FIGURE 5 Trends of the glycemic management metrics in the adaptation procedures for the 111-patient cohort (Scenario II). The same box-and-whisker approach as that in Figure 3 is used to plot the data CR segments are avoided. In addition, only moderate changes are observed for parameters in the feedback controller, which is expected as the obtained feedforward control parameters (BR and CR profiles) have set up an optimized operating point for the feedback controller, and thus the default values can achieve satisfactory glucose management. It is interesting to note that the second segment of the BR profile is larger than the other elements, which helps counteract against diurnal insulin sensitivity changes and dawn phenomenon. FIGURE 6 Trends of the adaptation parameters in the adaptation procedures for the 111-patient cohort (Scenario II). The same box-and-whisker approach as that in Figure 3 is used to plot the data. Keys are the same as those in Figure 4 FIGURE 7 Trends of the glycemic management metrics in the adaptation procedures for the 111-patient cohort (Scenario III). The same box-andwhisker approach as that in Figure 3 is used to plot the data

| Results for Scenario II
The results for Scenario II are provided in Figure 5

| Intra-patient changes
To further illustrate the effect of the adaptation process, glucose and insulin profiles of a particular patient simulated using the adaptation parameters obtained on Weeks 1, 8, 16, and 24 are provided in For instance, if a patient had a low BR for some time period while he/she happened to be jogging (which would increase insulin sensitivity) at the same time, the glucose traces might still appear to be acceptable, which would prevent an adaptation algorithm from identifying the real problems in glucose regulation. Finally, a considerable number of parameters in AP need to be considered as candidate variables in the adaptation procedure (which is 12 in our in silico tests in this work), while from a user experience perspective, the time of adaptation is limited (e.g., 24 weeks). As an iteration of the adaptation FIGURE 8 Trends of the adaptation parameters in the adaptation procedures for the 111-patient cohort (Scenario III). The same box-andwhisker approach as that in Figure 3 is used to plot the data. Keys are the same as those in Figure 4 procedure may take 1 week to average out lifestyle disturbances, this basically implies that the task of AP adaptation through adjusting 12 coupled parameters need to be achieved within 24 iterations.
To overcome these challenges, a data-driven multivariate learning framework is developed in this work, on the basis of a dual-layer control scheme for long-term home use of AP. In this scheme, feedback/feedforward control algorithms operate in the lower layer to achieve real-time glucose regulation, while the adaptation algorithm is implemented in the upper layer based on data from the lower layer. The proposed adaptation procedure is composed of two For optimization problems encountered in CR profile and feedback control adaptation, analytical relationships between optimization parameters and performance metrics are generally not known, which precludes the use of standard optimization methods. 25 To overcome this additional challenge, the optimization problems are solved utilizing a BO approach, which was developed to optimize unknown objective functions based on noisy measurements in the machine learning community, 23 and has been recently used in on-line dynamics learning, automatic controller tuning and nonlinear adaptive control. [26][27][28][29] In our work, we integrate BO with safety requirements and clinical experience, so that the "black-box" glucose regulation process can be safely adapted and improved.
The proposed adaptation method is evaluated on the basis of the (d) Week 24 FIGURE 9 Twenty-four-hour glucose and insulin profiles simulated using the system parameters obtained on Weeks 1, 8, 16, and 24 of Scenario I for a particular patient. For comparison purpose, the same meal protocol and measurement noise sequence are applied to generate the data in the four subplots. Green and purple triangles denote meals of 50 g and 75 g CHO, respectively (d) Week 24 FIGURE 11 Twenty-four-hour glucose and insulin profiles simulated using the system parameters obtained on Weeks 1, 8, 16, and 24 of Scenario III for a particular patient. Meal protocol and sensor noise sequence are the same as those in Figure 9  (d) Week 24 FIGURE 10 Twenty-four-hour glucose and insulin profiles simulated using the system parameters obtained on Weeks 1, 8, 16, and 24 of Scenario II for a particular patient. Meal protocol and sensor noise sequence are the same as those in Figure 9 adaptively adjust the inappropriate parameters to achieve improved and satisfactory glucose regulation, without causing risks of hypoglycemia throughout the adaptation procedure.

| CONCLUSION
In this article, a data-driven multivariate learning approach has been proposed for long-term parameter adaptation of an AP, on the basis of a dual-layer control scheme. Through a two-phase adaptation procedure, the algorithm gradually adjusts BR profile, CR profile, and parameters in the closed-loop control algorithm. The adaptation of BR profile was performed by based on the behavior of the closed-loop control algorithm while considering IOB and smoothness constraints, and CR profile and feedback control parameters were adjusted through a BO-assisted approach. It was shown that the algorithm was able to adjust inappropriate parameters for improved glucose regulation, despite of lifestyle disturbances caused by skipped meals, sickness, and inter-and intra-day insulin sensitivity changes.

CONFLICT OF INTERESTS
The authors have no conflicts of interest to declare.
Algorithm 1: BO-assisted parameter adaptation 1: while termination conditions not satisfied do.
2: update iteration counter k ≔ k + 1, update π k and query D k for data D k used in adaptation of ψ; 3: update S k based on y k − 1 ; 4: if jπ k j ≤ n BO then.
12: implement ψ k to obtain y k , and update data set 13: end while.