Multivariate data analysis of growth medium trends affecting antibody glycosylation

Abstract Use of multivariate data analysis for the manufacturing of biologics has been increasing due to more widespread use of data‐generating process analytical technologies (PAT) promoted by the US FDA. To generate a large dataset on which to apply these principles, we used an in‐house model CHO DG44 cell line cultured in automated micro bioreactors alongside PAT with four commercial growth media, focusing on antibody quality through N‐glycosylation profiles. Using univariate analyses, we determined that different media resulted in diverse amounts of terminal galactosylation, high mannose glycoforms, and aglycosylation. Due to the amount of in‐process data generated by PAT instrumentation, multivariate data analysis was necessary to ascertain which variables best modeled our glycan profile findings. Our principal component analysis revealed one component that represents the development of glycoforms into terminally galactosylated forms (G1F and G2F), and another that encompasses maturation out of high mannose glycoforms. The partial least squares model additionally incorporated metabolic values to link these processes to glycan outcomes, especially involving the consumption of glutamine. Overall, these approaches indicated a tradeoff between cellular productivity and product quality in terms of glycosylation. This work illustrates the use of multivariate analytical approaches that can be applied to complex bioprocessing problems for identifying potential solutions.

KEYWORDS
bioprocessing, galactosylation, glutamine, glycosylation, MVDA

| INTRODUCTION
Generation of biological drug products from bioreactors is a complex procedure because living cells are the site of manufacturing. Much is still unknown about the relevant cellular processes, so understanding how different variables in bioprocessing affect critical quality attributes (CQA) such as glycosylation and monoclonal antibody (mAb) product titer is of great interest to the biopharmaceutical industry. 1,2 To study how these variables could potentially contribute to changes in cellular productivity and the CQAs of the product molecule, multivariate approaches are needed to analyze the large amount of data required. 3 This use of multivariate data analysis (MVDA) in associating cell culture process and material variables to CQAs is sometimes referred to as "fermentanomics." 4 MVDA is necessary due to the difficulty in detecting the subtle, yet important, relationships through univariate means in addition to the problems inherent to large datasets such as varying degrees of experimental error, multicollinearity, and missing data. 5
N-glycosylation is an example of one CQA present in therapeutic mAbs that has large consequences on the efficacy and stability of the protein. 6 In the mAb immunoglobulin G (IgG), N-glycosylation is found at Asn 297 in the crystallizable fragment (Fc) of the heavy chain. 7 This modification takes place in the endoplasmic reticulum (ER) and Golgi apparatus, where a 14-sugar precursor, Glc3Man9GlcNAc2, is attached and modified as the protein traverses the Golgi. 8 The modification process typically involves the loss of the mannose saccharides and replacement with N-acetylglucosamine (GlcNAc) and galactose.
Currently at: Nicholas Trunfio, Sartorius Stedim Data Analytics, Bohemia, NY 11716.
Cellular stresses, such as nutrient depletion, can interrupt these enzymatic processes since they affect the available pool of substrate molecules such as nucleotide sugars. 9,10 The altered rates at which the enzymes modify the polysaccharide chains can result in different glycoforms, which collectively form the glycan profile of the protein.
These alterations of the cellular environment that impact the mAb glycan profile can have a profound impact on the resulting drug's quality. As the activity of a mAb drug is commonly mediated through antibody-dependent cellular cytotoxicity (ADCC) or complement-dependent cytotoxicity, the antibody glycoform can affect these processes by either facilitating or hindering recruitment of necessary interactors. For example, glycans that feature a core fucose will reduce ADCC activity due to the moiety interfering with Fcγ receptor interaction. 11 The glycosylation state of the protein can additionally affect stability, immunogenicity, and clearance rate. 12,13 Due to the importance of N-glycosylation in drug quality, further understanding of the variables and processes that affect its outcome is warranted. In this vein, supplementation strategies in which the additions consisted of sugars, metals, and amino acids have been shown to directly affect the produced glycan profile. 14,15 Aglycosylation, where the protein lacks a glycan residue at the Asn 297 site, is another possible outcome. Depletion of nutrients such as glucose has been shown to result in this change. 16 Due to the role of the glycan modification in protein binding, its absence has a marked effect on the protein properties: protein aggregation, reduced stability, and altered pharmacokinetic properties. 17-19 A wide variety of glycan outcomes are possible and there are many bioprocessing and culture medium variables that can affect this process, necessitating multivariate analysis.
To generate a dataset with an adequate number of replicates given the large number of potential inputs involved, we used the ambr® 15 automated micro bioreactor system (Sartorius, Hertfordshire, UK). This automated, parallel cell culture platform allows a suitable dataset for multivariate analysis to be generated because many cell culture experiments can be run simultaneously with minimal spurious batch-to-batch variability caused by differences in seeding density, cell culture operation, environmental conditions, and media preparation. An in-house model IgG1-producing CHO DG44 cell line was used for these cultures, with the system run in batch mode to accommodate the vessels' small size, which could not sustain a reasonable sampling frequency over a longer fed-batch culture while maintaining the minimum reactor volume necessary for operation. Earlier studies were used to determine our selection of media: Ex-Cell Advanced (SAFC), CD OptiCHO (Thermo Fisher), PowerCHO2 (Lonza), and ProCHO5 (Lonza). 20 Due to the small size of the micro bioreactors (15 ml, with a minimum volume of 10 ml), the breadth of in-process analytics that could be performed, such as technical replicates of in-process measurements, was limited. As such, glycosylation was the only product quality attribute evaluated on the final harvested product.
Due to the quantity of data collected and the difficulty of identifying relationships in complex datasets with only univariate analyses, we used two multivariate analysis techniques to uncover these interactions: principal component analysis (PCA) and partial least squares (PLS) regression. This is possible because the measured variables, collectively referred to as the feature space, vary collinearly with one another; for example, an increase in the abundance (percentage) of one glycoform must result in an equal cumulative decrease in the abundance of the remaining glycoforms. PCA exploits this collinearity by projecting the feature space onto a set of latent variables, called principal components, that describe the orthogonal variations in the original feature space. 21 Collectively, the latent variable data are referred to as the score space, as observations on these new principal components are called scores. Each observation has a score value associated with it for each of the principal components. Observations that are similar across many of the original variables appear clustered together in the projection, so any observations that deviate from the others can be seen. PLS was used to find the functional relationship between bioreactor process parameters and glycan outcomes, such as high mannose (HM) glycoforms and altered proportions of terminal galactosylation.
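The projection described above can be illustrated with a minimal sketch. It is not the software used in this study: the function names and the glycoform percentages are hypothetical, and power iteration is used only to keep the example dependency-free (an SVD routine from a numerical library would normally be preferred). Because each row of percentages sums to 100, the loadings of the first component sum to zero, which is the collinearity the paragraph refers to.

```python
import math


def center_columns(rows):
    """Subtract each column mean so the projection describes variation."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    Power iteration on X^T X of the centered data (an assumption made for
    brevity; it is not the algorithm named in the source).
    """
    x = center_columns(rows)
    p = len(x[0])
    # Cross-product matrix X^T X, proportional to the covariance matrix.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    # Start from the column of X^T X that belongs to the most variable feature.
    k = max(range(p), key=lambda i: xtx[i][i])
    v = xtx[k][:]
    for _ in range(iterations):
        w = [sum(xtx[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # The sign is arbitrary: make the largest-magnitude loading positive.
    if max(v, key=abs) < 0:
        v = [-c for c in v]
    # Scores are the centered observations projected onto the loadings.
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Hypothetical glycoform percentages [G0F, G1F, G2F, HM]; each row sums to 100.
runs = [
    [55.0, 30.0, 5.0, 10.0],
    [50.0, 33.0, 7.0, 10.0],
    [40.0, 40.0, 12.0, 8.0],
    [35.0, 42.0, 15.0, 8.0],
]
loadings, scores = first_principal_component(runs)
```

In this toy dataset the G0F loading has the opposite sign to the G1F and G2F loadings, so runs rich in G0F and runs rich in galactosylated forms fall at opposite ends of the score axis.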

| RESULTS
We sought to generate a complex dataset from a model IgG1 antibody-producing bioprocess for use in MVDA using the following commercially available media: Ex-Cell Advanced, OptiCHO, PowerCHO2, and ProCHO5. We used our in-house CHO DG44 cell line at an inoculation density of ~1 × 10⁶ cells/ml. Nine micro bioreactors were prepared with each of the four media. One of the bioreactors containing Ex-Cell Advanced was lost, resulting in a total of 35 bioreactors successfully run. The bioreactors were operated within normal ranges found in bioprocessing with minor differences in the levels of nutrients supplemented. The micro bioreactors were run for 8 days, after which the mAb was collected and purified. Figure 1 displays technical dot plots with the final integrated viable cell density (IVCD) and specific productivity profiles categorized by culture medium, where each dot represents an individual micro bioreactor culture. The IVCD is a calculated variable that represents the cumulative viable cell density over the course of the whole bioreactor run, measured in cell-days/ml (further details for this variable, as well as the following metabolite values, can be found in Section 4.9). Cultures grown in Ex-Cell Advanced and PowerCHO2 had comparable IVCD, while this value was lowest in ProCHO5 and highest in OptiCHO. OptiCHO also featured the lowest specific productivity, which was roughly the same in the other media.
In-process measurements for the concentrations of glutamine (Gln), glucose (Glc), and lactate (Lac) were performed using a Bioprofile Flex Analyzer. We used the measured values to calculate the total specific amounts consumed/produced per cell within the micro bioreactors over the total culture life; the resulting technical dot plots are shown in Figure 2. Overall, OptiCHO featured the lowest consumption of glutamine and glucose while also producing the lowest amounts of lactate. Ex-Cell Advanced cultures consumed/produced intermediate amounts, while the highest consumers/producers were the micro bioreactors containing PowerCHO2 and ProCHO5. Our next step was to assess if any of the differences we observed in the cell growth and nutrient profiles affected the glycosylation state of the IgG1 antibody that was produced. Due to the limited bioreactor volume, the product mAb was only harvested at the end of the run; because the mAb accumulates in the vessel over time, the final harvest material is representative of the average mAb produced throughout the varied process conditions of the cell culture. Additionally, the average mAb will more closely resemble protein produced at the end of the cell culture due to more protein being produced at the higher cell densities that are reached in the latter stages of the process, and due to the cells' increased productivity as they enter the stationary phase during this time. The purified antibody was analyzed for its glycan profile and heavy chain size variants using mass spectrometry and capillary electrophoresis, respectively. The combined results for these analyses are shown in Table 1. The data in Table 1 are categorized by culture medium and analytical technique, as fluorescence mass spectrometry was used to quantify the numerical percentages of all the glycoforms, while reduced capillary electrophoresis (rCE-SDS) was used to compute the amount of aglycosylated antibody heavy chains.
The results of all biological replicates, each with three technical replicates, were averaged to obtain the values shown in Table 1; the values that are highest for each glycan type are bolded and underlined. Based on the glycan species we observed in our analysis, we used the following groupings: G0F, G1F, G2F, HM (this category contains Mannose 4 to Mannose 9, which consist of mannose oligosaccharide clusters bound to the 2 GlcNAc core), and Other (this category consists of the uncommon glycoforms such as G0F-N and nonfucosylated glycoforms like G0, G1, and G2). The "Other" category contained less than 5% of the overall glycans present. These groups of glycan species, HM and Other, were created to help in data visualization and to simplify the data used to detect significant trends. The sum of terminal galactosylated species (G1F and G2F, as opposed to G0F, which features no galactosylation) varied greatly based on media: ProCHO5 cultures produced the most G1F and G2F of all the media tested, while OptiCHO produced the most G0F (more than G1F and G2F combined, 53.189% vs. 32.423%). These values could indicate that ProCHO5 medium promotes trafficking through the Golgi apparatus and/or the galactosyltransferase enzymatic activity responsible for galactosylation more efficiently than the other media. 22
FIGURE 1 Representative bioreactor growth profiles sorted by media. Technical dot plots depict the final integrated viable cell density (IVCD) and specific productivity across all culture conditions for Ex-Cell Advanced, OptiCHO, PowerCHO2, and ProCHO5 media.
FIGURE 2 Representative consumption and production within the micro bioreactors sorted by media. Total specific consumption of glutamine (Gln) and glucose (Glu) and lactate (Lac) production across all culture conditions for Ex-Cell Advanced, OptiCHO, PowerCHO2, and ProCHO5 in terms of total mass per cell (mg/cell).
Culture medium is also associated with the abundance of HM glycoforms, which can be characteristic of cellular stress that causes incomplete processing during N-glycan biosynthesis. 6,23,24 This cellular stress can manifest differently, either as a reduction in galactosylation, or as increases in either HM glycoforms or aglycosylation. For example, Ex-Cell Advanced cultures displayed the largest amounts of HM glycoforms, while PowerCHO2 had the most aglycosylation. Table 1 shows that culture medium significantly affects the total product glycan profile. Nutrient-related cellular stress can cause both an increase of immature glycoforms such as HM and an increase in antibody aglycosylation, but the conditions that result in production of aglycosylated mAbs are not well characterized and do not appear to overlap with those that result in immature glycoforms. 16,22 As mentioned earlier, glucose depletion has been shown to be responsible for aglycosylation outcomes, though we did not find any evidence of glucose deficiency in the PowerCHO2 vessels (Figure 2). We note that the conditions which result in aglycosylation do not appear to be correlated with those that result in low terminal galactosylation or increased HM species. Our work illustrates the need for fully characterizing the in-process parameters that result in altered glycosylation states that will modify the therapeutic properties of the antibody, especially when the drug mechanism requires Fc-binding ligands.
Due to the complex interplay between the different cellular functions affecting growth, metabolism, and glycan outcomes, we reasoned that data-driven multivariate analysis would be required to understand the correlation structure that relates the harvested mAbs' quality and the in-process variables. The exact relationship between media selection and product quality is obscured for multiple reasons. First, the identities of the chemicals contained within commercial growth media are proprietary and unknown. Second, the relationship between the growth measurements, metabolite measurements, and final product quality is governed by metabolic pathways containing complex reaction networks that have not been fully characterized. For these reasons, we used MVDA techniques, such as PCA and PLS, to find a set of latent variables that describe the variability seen in the measured data and calculated variables when the exact biological relationship between model features is unknown. We only used the mass spectrometry data for the MVDA since mass spectrometry and rCE-SDS are vastly different techniques that measure attributes on different analytes.
PCA was performed to assess the suitability of using MVDA to characterize the impact of media selection on antibody glycosylation and productivity. The model's features, X, consist only of the titer and glycosylation profiles from each of the 35 micro bioreactors; the culture medium and other in-process variables were not included in this initial model, which is summarized in Table 2. Sevenfold cross-validation was used to determine Q² from predicted values of the excluded data. In order to prevent overfitting, the number of [...] to achieve terminal galactosylation. This is also evident in the model's [...] this is in good agreement with the univariate analysis derived from Table 1. Further evidence that the first principal component is correlated with the cells' ability to achieve terminal galactosylation is provided in Figure 3C where [...] Table 1.
TABLE 1 Glycosylation profiles by growth medium
FIGURE 3 Principal component analysis model to characterize impact of media selection on the glycosylation profile and titer. A and B show the PCA model's loadings and score space. C shows a principal component regression (PCR) demonstrating that the first principal component describes the degree to which the cells were able to achieve terminal galactosylation. D and E show that the second principal component characterizes the degree to which the cells can convert high mannose glycoforms to G0F and that the metabolic processes associated with elevated conversion are inversely correlated with the metabolic processes associated with protein production.
TABLE 2 Summary statistics for the profile models
Further evidence that the second principal component is correlated with the cells' ability to convert HM glycoforms into the G0F glycoform is provided in Figure 3D where the PCR expressing the total amount of all G0F, [...] In addition, the second principal component's loadings, p₂, suggest that the underlying metabolic phenomena responsible for converting the HM glycoforms into the G0F glycoform are inversely correlated with the metabolic phenomena responsible for increased protein production and that optimizing for titer could have deleterious effects on product quality, and vice versa. Further evidence for this can be seen in Figure 3E; it shows that the PCR that expresses titer as a linear function of t₂ is also able to describe a majority of the variability in titer (R² = 0.739). The fact that the slope of the regression in Figure 3D is negative and the slope of the regression in Figure 3E is positive is further evidence of the inverse relationship between cell productivity and efficiency in converting HM glycoforms to the G0F glycoform.
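The reasoning behind the principal component regressions above can be sketched as follows: regress each response on the same score vector, then compare the signs of the slopes and the R² values. This is a minimal illustration under stated assumptions; the function name `regress_on_score` and all numbers are made up and do not reproduce the values in Figure 3.

```python
def regress_on_score(t, y):
    """Ordinary least squares of y on a single score vector t.

    Returns (slope, intercept, r_squared).
    """
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # R^2 is the fraction of the response variance explained by the fit.
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical second-component scores and two responses (made-up values).
t2 = [-2.0, -1.0, 0.0, 1.0, 2.0]
processed_glycans = [62.0, 58.5, 55.0, 52.5, 47.0]  # falls as t2 rises
titer = [0.8, 1.1, 1.2, 1.6, 1.8]                   # rises as t2 rises

slope_glycan, _, r2_glycan = regress_on_score(t2, processed_glycans)
slope_titer, _, r2_titer = regress_on_score(t2, titer)
```

Opposite slope signs against the same score vector are what indicate that the two responses move in opposite directions along that component; the regressions alone do not establish causation.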
Having established a set of latent variables that can be used to discriminate between the productivity and glycosylation profiles resulting from cells grown in different media formulations, our next step was to determine if there were metabolic differences in the vari- [...]
These results imply that cells whose metabolic processes utilize more glucose should be more likely to create protein that has achieved terminal galactosylation. It is important to note that because a design of experiments (DoE) was not performed to independently set X-block measurements, a causal relationship between glucose utilization and glycosylation efficiency was not established; we are claiming that differences in media composition impact aspects of cellular metabolism responsible for glucose utilization and aspects of cellular metabolism responsible for achieving terminal galactosylation and that there appears to be a positive correlation between these two processes. The loading weights at the end of the culture, Days 5, 6, and 7, provide more detail to the interpretation; they imply that lactate production is positively correlated with terminal galactosylation efficiency and growth is inversely correlated with terminal galactosylation efficiency. Together, we interpret this as suggesting that cells achieving a high degree of terminal galactosylation are utilizing the increased amount of glucose as an energy source at the end of the culture, as evidenced by the positive correlation with lactate production, but this energy is being used by metabolic processes unrelated to growth, as evidenced by the inverse correlation with cell growth.
Examining the score values for the second principal component, [...]
Taken together, these results suggest that cells that utilize more glucose for energy during the stationary phase of the culture will also result in cells with lower productivity. It should be noted again that we are not suggesting the aspects of metabolism related to consuming glucose for energy have a causal relationship with the aspects of metabolism related to protein production; rather we are suggesting that media selection has an impact on both of these aspects of cellular metabolism and that there appears to be an inverse correlation between these two processes. In addition, the results indicate that cells that consume more glutamine during the lag and exponential phases tend to produce more protein. The elevated consumption of glutamine during the lag phase appears to come at a cost: the cells that consume elevated levels of glutamine during the lag phase also tend to consume less glutamine during the stationary phase and produce protein that has a lower degree of G0F, G1F, and G2F glycoforms.
However, elevated glutamine consumption during the exponential growth phase does not appear to suffer from this drawback, as cells with elevated glutamine consumption during the exponential growth phase tended to produce more protein without an effect on the cells' ability to convert HM glycoforms to G0F.
[...] the lag phase tended to produce more protein, but that protein tended to have more immature HM glycoforms. Furthermore, cell cultures that consumed elevated levels of glutamine during the exponential growth phase tended to produce more protein without any change in the glycosylation profile. Lastly, cell cultures that consumed elevated levels of glutamine during the stationary phase tended to produce less protein, but this product contained less of the immature HM glycoforms. It is important to note that, because the in-process variables could not be set independently of one another, we are not concluding that the relationships found are causal; rather, the data indicate that the same underlying metabolic conditions responsible for productivity and glycosylation outcomes are also responsible for the differences seen in cell growth and metabolite utilization.

| DISCUSSION
Glutamine is a commonly measured nutrient in bioreactors since it serves as an alternative energy source for cells that have high energy demands and synthesize large amounts of protein. However, glutamine is unstable when not in a dipeptide form and breaks down into ammonia and pyroglutamate. 25 The increase in ammonia that results from excess glutamine can cause an elevation in pH, which decreases the functionality of glycosyltransferases such as those responsible for galactosylation. While the pH in the bioreactors is carefully controlled, there might still be small increases in the intracellular pH that can result in a decrease of galactosylated glycoforms. 26 Alternatively, as glutamine supplementation causes increases in glucosamine levels, it is likely that loss of galactosylation results from this effect as well. 27 Here, we additionally report that elevated glutamine consumption during the lag phase can result in a greater abundance of HM glycoforms. Collectively, these results show the importance of PAT to measure levels of nutrients such as glutamine to maintain concentrations that will not adversely affect product quality.
MVDA allowed us to validate findings already established through different experimental approaches while also discovering new growth, medium, and productivity related trends. The increasing complexity of bioprocessing and the costs associated with running bioreactors necessitate a better understanding of how different input variables affect product quality. Future bioreactor studies to verify that metabolite and growth profile factors can cause changes in the HM glycoform rates will be used to validate our current findings. In this study, we uncovered growth medium parameters that were significantly linked to changes in the final antibody product glycan profile, which will be studied further with the overarching goal of tailored CQA control.

| Cell culture instrumentation and process
This procedure is demonstrated in Journal of Visualized Experiments. 28 Briefly, the ambr® 15 system (Sartorius, Hertfordshire, UK) was run in batch mode while using four culture stations to support running 36 micro bioreactors (only sparged). An in-house CHO DG44 cell line was inoculated at a density of 1 × 10⁶ cells/ml. The same process parameter set points were used for all reactor vessels: agitation rate = 1,000 rpm, dissolved oxygen (DO) = 50%, pH = 7.1 ± 0.05, temperature = 37°C. Due to the small average micro bioreactor volume of 15 ml, only ~2 ml of medium per day was pulled from each micro bioreactor. Because of this, the product antibodies could only be characterized after harvesting on the 8th day. The micro bioreactors were inoculated using a standard seed train protocol. 20

| Concentration of purified antibody
This procedure is shown in Journal of Visualized Experiments. 29 After purification, a Thermo Scientific NanoDrop One microvolume UV-Vis spectrophotometer was used to determine the sample concentration; the antibody concentration was calculated using a protein extinction coefficient of 13.7 at 280 nm for a 1% IgG solution. A solution of 0.1 M acetic acid neutralized to pH 5.5 with Tris Base (Sample Buffer) was used to blank the instrument.
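The concentration calculation implied by the 1% extinction coefficient follows the Beer-Lambert law. The sketch below is an illustration only; the function name, the 1 cm path length default, and the dilution argument are assumptions not stated in the text.

```python
def igg_concentration_mg_per_ml(a280, e1_percent=13.7, path_cm=1.0, dilution=1.0):
    """Convert a blank-corrected A280 reading to mg/ml via Beer-Lambert.

    e1_percent is the absorbance of a 1% (10 mg/ml) solution over 1 cm, so
    a280 / (e1_percent * path_cm) is the concentration in percent (g/100 ml);
    multiplying by 10 converts percent to mg/ml.
    """
    if a280 < 0:
        raise ValueError("absorbance must be non-negative after blanking")
    return a280 / (e1_percent * path_cm) * 10.0 * dilution


# A reading of 1.37 at a 1 cm path corresponds to about 1 mg/ml of IgG.
example_concentration = igg_concentration_mg_per_ml(1.37)
```

Path length is assumed to be normalized to 1 cm; if an instrument reports absorbance at another path length, `path_cm` would need to be set accordingly.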

| Aglycosylation of antibody heavy chain
A reduced capillary electrophoresis-sodium dodecyl sulfate (rCE-SDS) method was used to determine the percentage of aglycosylated (nonglycosylated) heavy chain of purified antibody. The size of the aglycosylated heavy chain was tested using PNGase F (Promega, cat#V4483A) and monitoring the peak shift from glycosylated to aglycosylated heavy chain with and without PNGase F treatment (data not shown).

| Cell culture data preprocessing
To overcome time lag effects due to the measurement times, it was necessary to fit a smooth function to each batch's set of observed metabolite and growth measurements. This was accomplished using the Shape Language Modeling toolbox to fit a smooth function to each of the measured time-series using a cubic interpolating spline with six knots. 30 In addition to finding the best profile, as defined by least squares, over-fitting was prevented by using growth, nutrient, and metabolic byproduct heuristics as linear constraints in the objective function that minimizes the sum of squared errors; these constraints are shown in Table 3. The exact locations of the maxima and inflection points were found using an iterative approach with a 0.25-day grid for each time-series individually. Sevenfold cross-validation was used to determine the locations that resulted in the best fitting spline. 31 After determining our smoothing functions, they can be applied to [...]

| Cell parameter estimation
There is additional information about the cell culture that can be cal- [...]
For all integral calculations, we used the Newton-Cotes integration formula for 5 points, as shown in Equation 1.
TABLE 3 (fragment) • Viable cell density accelerates until an inflection point before the maximum where it begins to decelerate • $\exists\, t_{\mathrm{inf}} \in (0, t_{\max})$: [...]
The integration boundaries are $a$ and $b$, $f(x)|_{x=a}$ is the parameter estimate when the spline is evaluated at $x = a$, and the step size is defined by $h = (b - a)/5$. 32
All derivatives were calculated using a second order Lagrange polynomial that was fit to the data because some independent variables (i.e., IVCD in the cell derivative calculations) exhibit uneven spacing. This is shown in Equation 2. To evaluate the derivative at time point $x \in (x_0, x_2)$, three consecutive measurement pairs $(x_0, f(x_0))$, $(x_1, f(x_1))$, $(x_2, f(x_2))$ are used. 32 The first time point is designated $x = x_0$, the internal data points [...]
The regressors are projected into the score space according to [...] Similar to PCA, k-fold cross-validation 28 was used to select the optimal number of principal components; A was selected to maximize the model's predictive power, as measured by $Q^2$.
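The integration and differentiation rules described above can be sketched in a few lines. Equations 1 and 2 did not survive in this text, so the forms below are assumptions: the integral uses the closed Newton-Cotes rule with the stated step size h = (b - a)/5, which evaluates six equally spaced nodes (the source describes the formula as being "for 5 points", so the node count may differ from the original), and the derivative is the standard analytic derivative of the quadratic Lagrange polynomial through three points. Function names and example data are hypothetical.

```python
def newton_cotes_integral(f, a, b):
    """Closed Newton-Cotes rule with step h = (b - a) / 5 (six nodes).

    Assumed form; exact for polynomials up to degree 5.
    """
    h = (b - a) / 5.0
    weights = (19, 75, 50, 50, 75, 19)
    total = sum(w * f(a + i * h) for i, w in enumerate(weights))
    return 5.0 * h / 288.0 * total


def lagrange_derivative(x, points):
    """Derivative at x of the quadratic through three (x_i, f_i) pairs.

    Works for unevenly spaced x_i, which is why a Lagrange polynomial is
    used instead of a fixed-step finite difference.
    """
    (x0, f0), (x1, f1), (x2, f2) = points
    return (
        f0 * (2 * x - x1 - x2) / ((x0 - x1) * (x0 - x2))
        + f1 * (2 * x - x0 - x2) / ((x1 - x0) * (x1 - x2))
        + f2 * (2 * x - x0 - x1) / ((x2 - x0) * (x2 - x1))
    )


def vcd(t):
    """Hypothetical smoothed viable cell density profile (cells/ml)."""
    return 1.0e6 * (1.0 + t)


# IVCD over an 8-day run for the hypothetical profile, in cell-days/ml.
ivcd = newton_cotes_integral(vcd, 0.0, 8.0)

# Slope at x = 2 of f(x) = x**2 sampled at unevenly spaced points.
slope = lagrange_derivative(2.0, [(0.0, 0.0), (1.0, 1.0), (3.0, 9.0)])
```

In the study's workflow the integrand would be the fitted spline rather than a closed-form function; the linear profile here only keeps the example easy to check by hand.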

| Feature selection
In total, there were 150 potential model features: 39 cell culture measurement estimates and 111 additional calculated variables generated from the measurement estimates. Of these potential model features, 9 trivial features were eliminated as they did not vary between batches. For example, the Day 0 lactate measurement is trivial as it is below the limit-of-detection. Similarly, the Day 0 value can be excluded for all variables derived by integration because nothing has accumulated when the cell culture is starting.
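The elimination of trivial features described above amounts to dropping any feature whose value does not vary between batches. The sketch below is a minimal illustration; the function name, the tolerance argument, and the feature names and values are hypothetical.

```python
def drop_trivial_features(features, tolerance=0.0):
    """Keep only features whose values differ between batches.

    `features` maps a feature name to its list of per-batch values.
    """
    return {
        name: values
        for name, values in features.items()
        if max(values) - min(values) > tolerance
    }


# Hypothetical per-batch values for three candidate features.
candidates = {
    "lactate_day0": [0.0, 0.0, 0.0, 0.0],           # below limit of detection
    "ivcd_day0": [0.0, 0.0, 0.0, 0.0],              # nothing accumulated yet
    "glucose_consumed_day3": [1.2, 1.5, 0.9, 1.1],  # varies between batches
}
kept = drop_trivial_features(candidates)
```

A nonzero `tolerance` could be used when constant features carry small amounts of measurement noise.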
In order to focus the PLS model's analysis on features relevant for cellular metabolism, only the cell specific metabolite features were used for the analysis; none of the metabolite concentration data were used. After this, 79 features remained: they were the time-series for growth rate, IVCD₂₄, IVCD, specific glutamine consumption rate, specific glucose consumption rate, specific lactate production rate, specific glutamine consumed₂₄, specific glucose consumed₂₄, specific lactate produced₂₄, cumulative specific glutamine consumed, cumulative specific glucose consumed, and cumulative specific lactate [...] any feature whose VIP value exceeds 1, or whose confidence interval falls entirely above 0.5, was retained for further model building.