The formulation of optimal mixtures with Generalized Disjunctive Programming: A solvent design case study

Systematic approaches for the design of mixtures, based on a Computer-Aided Mixture/blend Design (CAMbD) framework, have the potential to deliver better products and processes. In most existing methodologies the number of mixture ingredients is fixed (usually a binary mixture) and the identity of at least one compound is chosen from a given set of candidate molecules. We present a novel CAMbD methodology for formulating the general mixture design problem where the number, identity and composition of mixture constituents are optimised simultaneously. To this end, Generalized Disjunctive Programming (GDP) is integrated into the CAMbD framework to formulate the discrete choices. This generic methodology is applied to a case study to find an optimal solvent mixture that maximises the solubility of ibuprofen. The best performance in this case study is obtained with a solvent mixture, showing the benefit of using mixtures instead of pure solvents to attain enhanced behaviour.


Introduction
Mixtures play an important role in the process industries and blends of refrigerants 1-3 , polymers 4,5 and solvents 6 are employed in a wide range of applications. Solvent mixtures, for example, are a design phase that consists of optimising a performance index subject to property constraints, and finally a post-design phase where verification and analysis of the results obtained from the design phase take place.
Some studies in the area of CAMbD have focused on developing new methodologies for formulating and solving the mixture design problem. One set of approaches is based on the use of Mixed Integer Nonlinear Programming (MINLP), and this has mostly been applied to the design of binary mixtures. Such a methodology was presented by Duvedi and Achenie 1 , who studied the design of environmentally friendly refrigerant mixtures. The authors proposed a mathematical programming problem where the identities of candidate molecules and of the components in the mixture are defined by binary variables, whereas continuous decision variables are used to represent mixture properties and composition. This design methodology was also employed by Churi and Achenie 2 to design optimal refrigerant mixtures that have the highest cooling effect in a double-evaporator refrigeration system. In this study, optimal binary mixtures that can give higher efficiencies were identified for a refrigeration cycle with two evaporators operating at two different temperatures. Siougkrou et al. 37 investigated the design of binary solvent mixtures as part of conceptual process design. Their approach focused on the design of a CO2-expanded solvent and its impact on process performance. They used enumeration to solve the resulting MINLP due to the small number of discrete choices.
A general and systematic methodology based on an MINLP formulation was also proposed by Vaidyanathan and El-Halwagi 4 for the design of polymer blends, where binary polymer mixtures that match a set of target properties were determined. In the resulting design problem, the identity and the compositions of the components in the mixture were considered as decision variables. Local solutions of the problem were obtained with commercial software packages and an example of small dimensionality was solved globally using an optimization algorithm based on interval analysis.
Much effort has also been devoted to developing strategies that can address the complexity of the MINLP mixture design problem. An interval-analysis based optimisation framework was developed by Sinha et al. 38,39 to solve such problems. In their study, an eight-step interval-based domain reduction algorithm, LIBRA, was developed and used successfully to identify the globally optimal binary mixture in the design of an environmentally acceptable blanket wash solvent blend. The solvents that participate in the mixture were selected from a list of promising candidates.
Several authors have developed decomposition-based approaches in which the search space is gradually narrowed. A decomposition-based computer-aided molecular/mixture design methodology was proposed by Karunanithi et al. 10 . Within their framework, the original mixture design problem is decomposed into the five following non-linear sub-problems: (i) structural constraints, (ii) pure component property constraints, (iii) mixture property constraints, (iv) miscibility constraints, and (v) process model constraints and objective function (MINLP subproblem).
In each subproblem, pure components and/or mixtures that do not satisfy a subset of the constraints of the original problem are eliminated. This leads to a smaller final MINLP problem, in which a large portion of the search region has been eliminated. This decomposition-based methodology was later applied by the same authors to the design of crystallisation solvents 11 .
Their study involved the design of optimal binary solvent mixtures that maximise the potential recovery of a drug, subject to several property constraints, such as crystal morphology, solubility, viscosity, toxicity, normal boiling point and melting point.
Buxton et al. 40 proposed a systematic decomposition-based procedure to select optimal solvent blends for nonreactive, multicomponent gas absorption processes. Their approach was based on an extension of the work of Pistikopoulos and Stefanis 9 , who considered the design of pure solvents for environmental impact minimisation. Within their proposed framework, process operations that make use of solvents are identified first and the solvent candidates are then determined subject to specific property and environmental constraints. Finally, the performance of the solvents is verified on a plant-wide basis and the optimal solvent candidate is selected. The extension of this formulation to the design of binary solvent mixtures 40 requires the inclusion of additional constraints on the physical properties and the operating conditions.
In the recent work of Papadopoulos et al. 33 , a CAMbD approach was developed as a Multi-Objective Optimisation (MOO) problem for obtaining the optimal binary fluid mixture for organic Rankine cycles. In this approach, the two compounds and their optimal composition in the mixture are designed by employing a two-stage methodology. In the first stage, the molecular structure of a pure compound that matches a set of properties and yields the best performance measure is designed; this compound is then selected to be the first component in the binary mixture. The second stage consists of designing a number of feasible molecules for the second component in the mixture and defining the optimal mixture composition. The proposed mixture design methodology is followed by a nonlinear sensitivity analysis to evaluate the effect of the uncertainties arising from the model and in particular from the use of group contribution methods. A useful feature of this approach is that the first component is guaranteed to be a good fluid, and so gives good baseline performance, while the second component is guaranteed to provide a performance enhancement, regardless of its performance as a pure compound. In principle this means that, if this framework were applied to the design of a solvent mixture that maximises the solubility of paracetamol, a binary mixture with the characteristics of the acetone and water mixture (of Figure 1) could be identified. Acetone, which is considered to be a good solvent, could be identified in the first stage of the methodology and water, which improves the overall performance, could be determined in the second stage.
Only a few methods have been proposed to address the design of mixtures with more than two components. This issue was studied systematically by Klein et al. 41 , who proposed a Successive Regression and Linear Programming (SRLP) algorithm for the solution of the Nonlinear Programming (NLP) formulation of the problem. In this work, the objective was to determine the minimum cost solvent mixture, subject to linear constraints on the solubility parameters and nonlinear constraints on the density and boiling-point temperature. The candidate solvents were selected from a predefined set of molecules.
Recently, Ng et al. 42 presented a two-stage methodology for mixture design in an integrated biorefinery. The first stage consists of the mixture design framework, where optimal mixtures are formulated based on standard CAMbD techniques. First, the component that performs the main functionality of the mixture is identified from given products or designed with respect to physico-chemical properties and structural constraints. Next, based on the target properties (product needs), the number of components to participate in the final mixture is defined, and suitable additive components that meet these properties are then designed. In the final part of this stage, the miscibility of the mixture components is investigated. In the second stage, the optimal biomass conversion pathways that produce the optimal mixtures determined in the first stage are identified by using a superstructure optimisation approach. The design methodology was applied to a case study for the design of biofuels from palm-based biomass.
A systematic four-step methodology applicable to more than two components was proposed recently by Yunus et al. 34 for the design of blended liquid products. The first step consists of the definition of the problem, where the product needs are identified and translated into physico-chemical target properties, and target values for these properties are determined. In the next step, a set of property models is retrieved from the model library to allow the prediction of pure component and mixture target properties. The third step involves the design of multicomponent mixtures based on a decomposition methodology, where pure components that satisfy the property constraints are first identified and then a stability analysis is performed to define possible mixtures. The third step is concluded by optimising the performance objective subject to the linear and nonlinear property models. The mixture design methodology is applied to binary and ternary mixtures. In the fourth and final step, rigorous models are employed to verify the mixture property values, resulting in a set of optimal blends that satisfy all property targets. The proposed methodology was applied to two case studies: a) designing gasoline blends that can be used in car engines in hot climates and b) designing environmentally friendly base-oil mixtures that have good lubrication properties, from organic chemicals and mineral oils.
In spite of these advances, there remains significant scope for further research in this area. Most existing methodologies are applicable to binary mixtures only 1,2,4,10,11,33,38-40 , with the exception of the work of Klein et al. 41 and Yunus et al. 34 . Furthermore, in many methods 1,2,10,34,38,41 the space of possible components is restricted. In some cases, all components that participate in a mixture are selected from a given set of molecules, while in others, one of the mixture ingredients is defined a priori, and the other compounds (usually one compound) are designed or selected from a set. In these methodologies, the number and the identity of one component (or of all components) are usually chosen in advance, and then the identities of the remaining molecules in a mixture, their compositions, and, where relevant, process topology and operating conditions, are defined. In this sequential approach, where the desired number of mixture ingredients is specified and a single compound is selected from a given list, there is a risk of excluding from the design space molecular structures which, when combined in a mixture, can lead to better performance. Only a small number of studies are reported in the literature in which the simultaneous design of all compounds is considered 33,40 . However, these studies refer to problems where the number of mixture ingredients is fixed to two. In summary, a reduced version of the general CAMbD problem has been addressed to date: the number of mixture ingredients is fixed (usually 2) and the identity of a compound (or in a few cases, of all compounds) that can participate in a mixture is chosen from a given set of candidate molecules.
A hurdle in the further development and widespread use of tools for CAMbD is the complexity of the mathematical programs that need to be formulated and solved. When modelling a mixture design problem directly as a Mixed-Integer Nonlinear Programming (MINLP) problem, several numerical difficulties can arise, related to the nonlinearity (nonconvexity) of the property models and the large design space. Solving the optimisation problems can be quite challenging, as these are combinatorial due to the presence of binary variables, and highly nonlinear due to the expressions that relate composition, structure and physical properties 12 . In view of these challenges, the main purpose of this paper is to create a comprehensive and systematic mathematical programming approach for the formulation and solution of the general CAMbD problem. In order to address the difficulties arising from the complexity of expressing the problem within a mathematical framework, we adopt a logic-based methodology in which Generalized Disjunctive Programming (GDP) 43 is used to formulate the discrete choices inherent in mixture design problems. In this framework we first show how to formulate a design problem in which the number of components ($N$) in the mixture is fixed and these components are selected from a predefined set of molecules. In working towards the generalized CAMbD problem, we focus on making $N$ a design variable where at most $N_{\max}$ compounds are chosen from a given list. This design methodology is an initial approach to a more general concept where the simultaneous design of the number, identity and compositions of mixture ingredients will be considered. Our proposed logic-based approach fits within the broader framework proposed by Harper et al. 36 and can be used in the second (design) step of such an approach.
This paper is organised as follows: first, the background theory is given, including a brief introduction to the CAMbD framework, followed by a short description of Generalized

Modelling approaches
Computer Aided Mixture/blend Design (CAMbD)
The many systematic approaches that have been derived for the design of compounds that exhibit desirable performance are known collectively as Computer-Aided Molecular Design (CAMD) methods. The CAMD concept was initially introduced by Gani and Brignole in 1983 7 and there has since been significant progress towards this goal 9,10,14,33,35,36,44-46 . In the case of mixture design applications, a CAMD problem is expanded into a Computer-Aided Mixture/blend Design (CAMbD) problem, usually by including additional mixture property constraints in a "standard" MINLP CAMD problem. Achenie and co-authors 10,14,35 defined CAMbD as follows: "Given a set of chemicals and a specified set of property constraints, determine the optimal mixture and/or blend".
The main objectives within the CAMbD framework are to optimise the physical properties of mixtures or compounds 47,48 and/or to optimise process performance, for example by minimising process cost 49 or maximising production 50,51 .
As discussed in the introduction, most existing CAMbD methods involve formulating and solving a problem by using mathematical programming techniques, usually a Mixed Integer Nonlinear Programming (MINLP) problem. According to Karunanithi et al. 10 , a general CAMbD problem can be formulated as an MINLP problem (Eq. (1)) in which $f$ is the objective function to be optimised (minimisation is assumed without loss of generality), subject to structural constraints ($g_1(y)$), pure component property constraints ($g_2(y)$), mixture property constraints ($g_3(x, y)$) and process model constraints ($g_4(x, y)$).
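The structure of this MINLP can be sketched as follows. This is a reconstruction from the description above rather than the exact statement of Eq. (1): writing every constraint as an inequality, treating $x$ as continuous and $y$ as binary, and the dimensions $n$ and $q$ are all assumptions (process model constraints in particular are often equalities).

```latex
\begin{aligned}
\min_{x,\,y}\quad & f(x, y) \\
\text{s.t.}\quad & g_1(y) \le 0 && \text{(structural constraints)} \\
& g_2(y) \le 0 && \text{(pure component property constraints)} \\
& g_3(x, y) \le 0 && \text{(mixture property constraints)} \\
& g_4(x, y) \le 0 && \text{(process model constraints)} \\
& x \in \mathbb{R}^{n}, \quad y \in \{0, 1\}^{q}
\end{aligned}
```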

GDP Formulation
In the general formulation of a Generalized Disjunctive Program, the discrete choices are represented by Boolean variables ($Y_{j,k}$). $K$ is the index set for the disjunctions and $J_k$ is the index set of the terms in each disjunction $k \in K$. The function $g(x)$ represents general constraints that must hold regardless of the logic, while $h_{j,k}(x)$ are conditional constraints that hold when $Y_{j,k}$ is True. In mixture design problems the disjunctive constraints are related to the assignment of compounds and the number of components that participate in a mixture. $\Omega(Y)$ represents logic relations for the Boolean variables expressed as propositional logic 61 .
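A common way of writing such a program, using the symbols defined above, is sketched below. The objective $f(x)$, the variable bounds $x^{L}$ and $x^{U}$, and the "$\le 0$" sense of the constraints are assumptions made for illustration.

```latex
\begin{aligned}
\min_{x,\,Y}\quad & f(x) \\
\text{s.t.}\quad & g(x) \le 0 \\
& \bigvee_{j \in J_k}
  \begin{bmatrix} Y_{j,k} \\ h_{j,k}(x) \le 0 \end{bmatrix},
  && k \in K \\
& \Omega(Y) = \text{True} \\
& x^{L} \le x \le x^{U}, \quad
  Y_{j,k} \in \{\text{True}, \text{False}\},
  && j \in J_k,\ k \in K
\end{aligned}
```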
It should be mentioned that any MINLP problem can be formulated as a GDP problem and vice versa. It may be beneficial to adopt a GDP formulation because, when compared to mixed-integer programming, it provides a more structured framework for modeling discrete-continuous choices and it expresses more directly both the quantitative and the qualitative parts of the optimisation task 62,63 . In an MINLP problem (Eq. (1)) the logic needs to be expressed through the objective function and algebraic constraints, in the form $f(x, y)$ and $g(x, y)$, respectively.
In GDP, on the other hand, the logic is captured inside the disjunctions by relating Boolean variables ($Y_{j,k}$) to equations in the continuous form ($h_{j,k}(x)$), whereas the logic that connects the disjunctive sets is expressed through the relations $\Omega(Y)$ 64 . In order to formulate a general mixture design problem (Eq. (1)) as a GDP, several characteristics of the constraints must be taken into account. The constraints that do not depend on the logic conditions can be formulated as general constraints ($g(x)$), whereas the constraints that depend on the logic conditions, such as on the assignment of compounds or on the number of components in a mixture, are formulated within the disjunctions as conditional constraints ($h_{j,k}(x)$).

Reformulation of GDP as an MINLP
Once an appropriate GDP formulation has been obtained, it can be converted into an MINLP in which $y$ is a vector of binary variables that has a one-to-one correspondence with the Boolean variable vector $Y$, $A$ is an $m \times n$ matrix, $b$ is an $m$-dimensional real-valued vector and the parameter $M_{j,k}$ is a "sufficiently large" upper bound. The logic propositions in the GDP, $\Omega(Y) = \text{True}$, are converted into linear inequalities, $Ay \le b$ 67 . The tightest value for $M_{j,k}$ can be calculated from Eq. (2) 52 .
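To make the role of $M_{j,k}$ concrete, the sketch below applies the usual big-M form of a disjunct, $h(x) \le M(1 - y)$, to a small example, and takes the tightest $M$ as the largest value of $h$ over the variable bounds. The constraint `h`, the bounds in `BOUNDS` and the helper names are hypothetical and chosen only for illustration; the corner enumeration is valid here because `h` is linear.

```python
from itertools import product


def h(x):
    """Conditional constraint h(x) <= 0 (hypothetical linear example)."""
    x1, x2 = x
    return x1 + 2.0 * x2 - 4.0


# Assumed variable bounds x^L <= x <= x^U for the two continuous variables.
BOUNDS = [(0.0, 5.0), (0.0, 3.0)]


def tightest_big_m(func, bounds):
    """Largest value of func over the box; a linear func attains it at a corner."""
    return max(func(corner) for corner in product(*bounds))


def big_m_satisfied(func, x, y, big_m):
    """Big-M form of the disjunct: func(x) <= M * (1 - y), with y in {0, 1}."""
    return func(x) <= big_m * (1 - y)


M = tightest_big_m(h, BOUNDS)
print(M)  # 7.0

# y = 1 enforces h(x) <= 0; y = 0 relaxes it for every point in the box.
print(big_m_satisfied(h, (5.0, 3.0), 1, M))  # False
print(big_m_satisfied(h, (5.0, 3.0), 0, M))  # True
```

A larger value of `M` would still be valid but would weaken the continuous relaxation, which is why the tightest bound is of interest.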

GDP formulation of the CAM b D problem
The design methodology proposed in our work integrates Generalised Disjunctive Programming

Problem definition
The aim of this study is to formulate mixture design problems generically, in order to find the optimal number of mixture ingredients, the optimal identities of the components (chosen from a given list) and their compositions, such that all given specifications are satisfied and the specified performance objective is optimised.
The problem formulation is constructed in a systematic way by considering two problem number of components to be designed. The total number of components in the mixture is thus For clarity, we use the term "components" to refer to the ingredients/molecules in the mixture we are designing and the term "compounds" to refer to ingredients/molecules in the set S from which we choose the components. Those components in the mixture that are not fixed (i.e., components N + 1 to N c ) are referred to as the "designed components".
The vector $h_{i,s}$ in each disjunction represents the constraints that are active when compound $s$ is assigned to component $i$.
Logic propositions: Logic conditions ($\Omega(Y) = \text{True}$) are included to avoid degeneracy by enforcing a specific ordering of the compounds. Degeneracy is prevented by the relations in Eq. (4), which ensure that the relative position of a compound in the set $S$ is maintained in the mixture (set $I$) if the compound is selected; in these relations the symbol ¬ denotes negation (i.e., not $Y_{i,s}$, or $Y_{i,s} = \text{False}$). They are translated into the algebraic constraints of Eq. (5), which restrict the feasible space by eliminating identical degenerate solutions. Logic conditions are also derived to ensure that each candidate compound is selected at most once (Eq. (6)), and these are likewise written in an equivalent algebraic form.
GDP formulation
The GDP formulation of the restricted problem thus consists of the objective function, the general constraints, the disjunctions for the assignment of compounds and the logic propositions above.
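The effect of the ordering and "at most once" conditions can be illustrated by brute-force enumeration. The sketch below is not the paper's formulation: it assumes that each component receives exactly one compound and that the ordering rule requires the position in $S$ to increase with the component index. The function name `assignments` is hypothetical.

```python
from itertools import product


def assignments(n_components, n_compounds, ordered):
    """Enumerate assignments of one compound to each mixture component.

    Each assignment is a tuple a, where a[i] is the position in the candidate
    set S of the compound assigned to component i. A compound may be used at
    most once. If `ordered` is True, the position in S must increase with the
    component index (assumed symmetry-breaking rule).
    """
    feasible = []
    for a in product(range(n_compounds), repeat=n_components):
        if len(set(a)) < n_components:
            continue  # a compound has been selected more than once
        if ordered and any(a[i] >= a[i + 1] for i in range(n_components - 1)):
            continue  # relative ordering of S is not maintained
        feasible.append(a)
    return feasible


without_order = assignments(2, 4, ordered=False)
with_order = assignments(2, 4, ordered=True)
print(len(without_order), len(with_order))  # 12 6
```

For two components chosen from four compounds, the twelve assignments collapse to six distinct mixtures, since each mixture otherwise appears once per permutation of its components.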

Reformulation of GDP as an MINLP
The GDP model is and its algebraic form is: The rest of the components are assigned compounds from the list only if they are participating in the mixture.
and it is transformed into an algebraic constraint in which $\tilde{y}_n$ is a binary variable equivalent to $\tilde{Y}_n$.
Logic propositions: Logic conditions to avoid degeneracy are also required in the general formulation. A specific ordering of the compounds is enforced by using the same logic propositions as were described in the restricted problem, i.e., Eq. (4). Eq. (6) is also used in the general model to ensure that each candidate compound is selected at most once. Additional logic conditions are required to ensure that at most one compound is assigned to components $N+2$ to $N_c$; these differ from the corresponding constraint in the restricted formulation (Eq. (3)). The logic expressions are replaced by their equivalent disjunctions and the "OR" operator is distributed over the "AND" as described by Raman and Grossmann 68 . As shown in Table 1, the resulting clauses can then be expressed as a set of linear inequality constraints by replacing the Boolean variables with binary ones.
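The conversion of logic propositions into linear inequalities can be checked on small examples with a truth table. The propositions below are generic illustrations, not the entries of Table 1, and the helper `check_equivalence` is a hypothetical name.

```python
from itertools import product


def implies(a, b):
    """Logical implication a => b."""
    return (not a) or b


def check_equivalence(logic, inequality, n_vars):
    """True if the proposition and the 0-1 inequalities agree on every assignment."""
    return all(
        logic(*values) == inequality(*(int(v) for v in values))
        for values in product([False, True], repeat=n_vars)
    )


# Y1 => Y2 is the clause (not Y1) or Y2, i.e. (1 - y1) + y2 >= 1.
ok_implication = check_equivalence(
    lambda Y1, Y2: implies(Y1, Y2),
    lambda y1, y2: (1 - y1) + y2 >= 1,
    2,
)

# Y1 => not Y2 is the clause (not Y1) or (not Y2), i.e. y1 + y2 <= 1.
ok_exclusion = check_equivalence(
    lambda Y1, Y2: implies(Y1, not Y2),
    lambda y1, y2: y1 + y2 <= 1,
    2,
)

# Y1 => (Y2 and Y3): distributing OR over AND gives two clauses,
# (not Y1 or Y2) and (not Y1 or Y3), hence two inequalities.
ok_distribution = check_equivalence(
    lambda Y1, Y2, Y3: implies(Y1, Y2 and Y3),
    lambda y1, y2, y3: (1 - y1) + y2 >= 1 and (1 - y1) + y3 >= 1,
    3,
)

print(ok_implication, ok_exclusion, ok_distribution)  # True True True
```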

GDP formulation
The GDP formulation of the general model, (G-GDP), includes the logic relations $\Omega(Y) = \text{True}$ given in Table 1. Since at least one component should be present in the mixture, the mole fraction of the first designed component ($x_{N+1}$) always has a non-zero value.
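One way to picture the choice of the number of components is to enumerate which designed components are present. The sketch below assumes two rules that are consistent with the text but not taken from the model itself: the first designed component is always present, and a later component can be present only if the previous one is. The mole fraction bounds `x_lower` and `x_upper` and the function names are likewise hypothetical.

```python
from itertools import product


def active_patterns(n_max):
    """Enumerate 0-1 vectors y, with y[n] = 1 if designed component n is present.

    Assumed rules: the first designed component is always present, and
    component n + 1 can be present only if component n is present.
    """
    patterns = []
    for y in product([0, 1], repeat=n_max):
        if y[0] != 1:
            continue
        if any(y[n + 1] > y[n] for n in range(n_max - 1)):
            continue
        patterns.append(y)
    return patterns


def composition_allowed(x, y, x_lower=1e-3, x_upper=1.0):
    """Mole fraction bounds switched by y: x_lower * y <= x <= x_upper * y."""
    return all(x_lower * yn <= xn <= x_upper * yn for xn, yn in zip(x, y))


print(active_patterns(3))  # [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(composition_allowed((0.6, 0.4, 0.0), (1, 1, 0)))  # True
print(composition_allowed((0.6, 0.3, 0.1), (1, 1, 0)))  # False
```

With these rules, a mixture of up to three designed components has three admissible patterns (one, two or three components present) instead of the eight possible 0-1 vectors.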

Reformulation of GDP as an MINLP
Formulation (G-GDP) can be transformed into an MINLP problem by using the big-M approach, with the logic relations expressed as the linear inequalities of Table 1.
The formulations described in this section are applied to a solvent mixture design case study presented in the next section.

Case Study: Maximizing the solubility of Ibuprofen
Ibuprofen (ibu) is a colourless anti-inflammatory compound that can be crystallised by cooling crystallisation 69 . Solubility is one of the key properties that determine the performance of the crystallisation process 11,69 . Karunanithi et al. 11 have already addressed the problem of identifying appropriate solvents or solvent mixtures that enhance the crystallisation process of ibuprofen. This well-studied application is, therefore, a suitable example to investigate the use whereas all solvent molecules in the mixture are in a single liquid phase. Therefore, the solubility, which depends on the enthalpy of fusion and melting temperature of the solid and its liquid-phase activity coefficient, is expressed in terms of the mole fraction of ibuprofen, $x_{ibu}$, and calculated as follows 70,71 (Eq. (14)):
$$\ln x_{ibu} + \ln \gamma_{ibu} = \frac{\Delta H_{fus}}{R}\left(\frac{1}{T_m} - \frac{1}{T}\right)$$
where $\gamma_{ibu}$ is the liquid phase activity coefficient of ibuprofen at temperature $T$, composition $x$ and pressure $P$, $R$ is the gas constant, $\Delta H_{fus}$ is the enthalpy of fusion of ibuprofen at temperature $T_m$, and $T_m$ and $T$ are the normal melting point of ibuprofen and the mixture temperature, respectively. The pressure is assumed to be atmospheric ($P$ = 1 atm). The activity coefficient is evaluated using the UNIFAC 70,72 group contribution method, and it is calculated as the sum of two contributions, a combinatorial term (superscript C) and a residual term (superscript R), as shown below (Eq. (15)):
$$\ln \gamma_{ibu} = \ln \gamma_{ibu}^{C} + \ln \gamma_{ibu}^{R}$$
The UNIFAC model proposed by Smith et al. 73 in a form convenient for implementation is employed in this design problem and the relevant equations are presented in Appendix A for completeness.
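The solubility relation can be evaluated directly once the activity coefficient is known. The sketch below assumes the common form $\ln(x\gamma) = (\Delta H_{fus}/R)(1/T_m - 1/T)$; the numerical inputs are illustrative assumptions and are not the values used in the case study. In the actual design problem $\gamma_{ibu}$ depends on the composition, so the relation is implicit in $x_{ibu}$ and is solved together with the UNIFAC equations.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol K)


def solubility(gamma, dh_fus, t_melt, temp):
    """Mole fraction solubility from ln(x * gamma) = (dH_fus / R) * (1/T_m - 1/T).

    gamma  : liquid-phase activity coefficient of the solute (treated as given)
    dh_fus : enthalpy of fusion in J/mol
    t_melt : melting temperature in K
    temp   : mixture temperature in K
    """
    return math.exp(dh_fus / R_GAS * (1.0 / t_melt - 1.0 / temp)) / gamma


# Illustrative inputs (assumed for this sketch, not taken from the case study).
DH_FUS = 25500.0  # J/mol
T_MELT = 347.15   # K
TEMP = 300.0      # K

x_ideal = solubility(1.0, DH_FUS, T_MELT, TEMP)    # ideal solution, gamma = 1
x_nonideal = solubility(2.0, DH_FUS, T_MELT, TEMP)  # gamma > 1 lowers solubility
print(x_ideal, x_nonideal)
```

An activity coefficient below one has the opposite effect, which is why a solvent mixture that lowers $\gamma_{ibu}$ increases the solubility.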
The mutual miscibility of the solvent molecules also needs to be examined in order to ensure that the final mixture is in one phase. However, algebraic relations to describe this constraint are not available for multicomponent systems and therefore, in common with other works 10,11,40 , a miscibility constraint for every binary pair of solvent molecules is employed in this case study (Eq. (16)) 73 ; that is, each binary pair of solvents must be miscible for the chosen relative composition, temperature and pressure.

Scenarios considered
We consider several instances of the case study, with varying complexity. In particular, numerical difficulties may arise due to the highly nonlinear nature of the miscibility function. In order to All the design sets used in this case study are shown in Tables 2 and 3, and the list of candidate solvents is shown in Table 4. Although a list of promising pure solvents in which ibuprofen has a high solubility has sometimes been used in previous work 10, 11 , a list of common solvents that yield a range of solubilities is employed in this work in order to investigate mixtures where one compound is a poor performer when used on its own, but may lead to high solubility  Table 5. The number of groups of type k in ibuprofen (v ibu,k ) and in a solvent s (v s,k ) is presented in Appendix C in Tables C1 and C2, respectively. The group volume parameters (R k ), the group surface area parameters (Q k ) and the group interaction parameters (a k,m ) used in the UNIFAC model for the prediction of the activity coefficient are obtained from Poling et al. 76 and listed in Appendix C (Tables C3, C4, C5) for completeness.

Restricted problem: Fixed number of solvents
This problem aims to identify the optimal mixture of components for a fixed number of solvent molecules (i.e., 1, 2 or 3 selected solvents), along with the mixture composition, in order to maximise the solubility of ibuprofen. The formulation is presented for the selection of three solvents but it can readily be extended to any fixed number of solvents. The disjunctions for the choice of solvents are shown below: where v s,k defines the identity of the solvents, i.e.
where and Ω (Y ) in Table 1) are also derived. Valid upper bounds can readily be derived for the big-M parameters (M h is and MF n ) using Eq. (2). The upper bounds used in this case study are relaxed bounds rather than exact bounds, to avoid numerical difficulties arising from tight bounds and machine precision.

Task 2: Mixture design with the miscibility constraint
In this task, the introduction of the highly nonlinear and nonconvex miscibility function increases the complexity of the formulations and makes their solution quite challenging.

Restricted problem: Fixed number of solvents
The restricted problem is formulated in the same fashion as for the first task and it consists of the objective function ($x_{ibu}$), solubility (Eq. (14)), activity coefficient (Eq. (15)) and miscibility (Eq. (16)) constraints, and the logic relations Eq. (4), (6). The miscibility function is calculated for every binary pair of designed components, i.e. for the pairs $(c_1, c_2)$, $(c_1, c_3)$ and $(c_2, c_3)$ for $N = 3$, using the composition of the binary mixture derived from the overall mixture composition. Specifically, the miscibility constraint for a mixture of solvents $i$ and $j$ is given by Eq. (17), where $d\gamma_i^{i,j}$ is the derivative of the natural logarithm of the activity coefficient of component $i$ with respect to the mole fraction of $i$ in the binary mixture. The mole fraction of solvent $i$ in the binary mixture ($x_i^{i,j}$) is expressed as (Eq. (18)):
$$x_i^{i,j} = \frac{x_i}{x_i + x_j}$$
where $x_i$ and $x_j$ are the mole fractions of components $i$ and $j$, respectively, in the overall mixture. Because the total number of components in the mixture is fixed, Eq. (17) and (18) can be treated as general equations and included outside the disjunctions. The resulting MINLP formulation is given by model (R-BM2) in Appendix D.
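The pairwise miscibility check can be sketched as follows, assuming the criterion takes the common binary stability form $\partial \ln\gamma_i / \partial x_i + 1/x_i \ge 0$. To keep the example short, a one-parameter Margules model, $\ln\gamma_1 = A(1 - x_1)^2$, stands in for UNIFAC; the parameter `a` and all function names are hypothetical.

```python
def binary_fraction(x_i, x_j):
    """Mole fraction of i in the (i, j) binary derived from the overall mixture."""
    return x_i / (x_i + x_j)


def dln_gamma_margules(x1, a):
    """d(ln gamma_1)/dx_1 for ln gamma_1 = a * (1 - x1)**2 (stand-in for UNIFAC)."""
    return -2.0 * a * (1.0 - x1)


def is_miscible(x_i, x_j, a):
    """Assumed stability criterion: d(ln gamma_i)/dx_i + 1/x_i >= 0."""
    x_bin = binary_fraction(x_i, x_j)
    return dln_gamma_margules(x_bin, a) + 1.0 / x_bin >= 0.0


# Overall mole fractions of two solvents in a mixture that also contains solute.
print(binary_fraction(0.3, 0.1))
print(is_miscible(0.3, 0.3, a=1.0))  # True: mildly non-ideal pair
print(is_miscible(0.3, 0.3, a=3.0))  # False: strongly non-ideal pair
```

For this stand-in model the criterion holds at every composition when `a` does not exceed 2, which matches the known threshold for phase splitting in the one-parameter Margules model.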

General problem: Unknown number of solvents
The more general problem formulation requires further adaptation to include the miscibility constraints. Recalling that the problem includes disjunctions for the assignment of the candidate solvents and disjunctions for the number of the solvents selected, we first note that the disjunctions for assigning solvents from the list to mixture components are unchanged from task 1. On

Results and discussion
All models were implemented and solved in GAMS 77 version 24.2.3, using DICOPT 78-80 , which is a local MINLP solver. The models were run on a single core of a dual six-core Intel Xeon X5675 machine at 3.07 GHz with 48 GB of memory. Due to the highly nonlinear nature of the equations in the models, multiple initial guesses were used to identify good solutions. The best solutions obtained in the first task (i.e., without the miscibility constraint) and in the second task (i.e., with the miscibility constraint) are summarised in Tables 6 and 7, respectively. It can be observed that in the restricted problem for both tasks, the best solution is yielded by a mixture of two solvents. In the formulations where the miscibility function was not included, a mixture of chloroform and water was identified as the optimal solvent mixture, whereas a binary mixture of chloroform and methanol was identified as optimal when the miscibility constraint was added.
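The use of multiple initial guesses with a local solver can be illustrated on a toy problem. The sketch below is unrelated to the GAMS/DICOPT implementation: it applies plain gradient descent from random starting points to a hypothetical one-dimensional nonconvex function and keeps the best local solution found.

```python
import random


def objective(x):
    """Hypothetical nonconvex function with two local minima."""
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    """Derivative of the objective."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def local_descent(x0, step=0.01, iters=5000):
    """Fixed-step gradient descent; converges to the local minimum near x0."""
    x = x0
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def multistart(n_starts, seed=0, lower=-2.0, upper=2.0):
    """Run the local method from random starts and keep the best solution."""
    rng = random.Random(seed)
    starts = [rng.uniform(lower, upper) for _ in range(n_starts)]
    candidates = [local_descent(x0) for x0 in starts]
    return min(candidates, key=objective)


single = local_descent(1.5)  # trapped in the poorer local minimum
best = multistart(10)        # reaches the better one
print(single, objective(single))
print(best, objective(best))
```

A multi-start strategy improves the chance of finding a good solution but gives no guarantee of global optimality, which is the motivation for also solving some instances with a global solver.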
Indeed, the miscibility constraint is not satisfied for the pair of chloroform and water at the optimal composition of task 1. Chloroform and methanol, on the other hand, are fully miscible.
The mixtures with three components give slightly lower solubility than the mixtures with two components. In these cases, the mole fraction of the third solvent component is at the lower bound. The results obtained when solving the general problem, with N unknown, validate the solutions obtained in solving the three restricted problems, by confirming that the highest solubility is achieved by a binary mixture, in which the composition of ibuprofen is 0.34928 and 0.33383 in the first and second task, respectively. The results of both tasks show that a higher solubility can be obtained in a mixture of two or three components rather than in a pure solvent.
The problems of the first task, where the miscibility constraint was not included in the formulation, were also solved globally in GAMS, using BARON 81 , which is a global MINLP solver. The results validate the solutions obtained with DICOPT, by proving that the highest solubility is achieved by the mixture of chloroform and water, as shown in Table 8.

Conclusions
Computer aided mixture design is an important tool that has the potential to improve process and product design, but that often leads to challenging mixed integer optimization problems due to nonconvexities in the space of the continuous variables and a large combinatorial solution space. The number of interlinked decisions to be considered makes it difficult to formulate the problem in a way which can be easily understood, modified and solved. A general modeling framework for mixture design problems has been proposed in this work to address these difficulties.
Several problem formulations based on the GDP formalism have been presented. They provide a systematic approach to posing CAMbD problems in which the number of mixture components, the identities of the components and their compositions are to be determined. The proposed approach has been applied successfully to a solvent mixture design problem for maximising the solubility of ibuprofen. The methodology adopted in this case study included two problem formulations: (i) with a fixed number of solvents (restricted problem) and (ii) with an unknown number of components (general problem). Both problems were first solved without taking any miscibility constraint into account in the problem formulation and then including a miscibility constraint for every binary solvent pair. Logic conditions between the disjunctive sets were expressed as algebraic constraints, whereas disjunctions for the assignment and number of solvent molecules were transformed into mixed-integer constraints using the big-M approach.
High quality solutions of all problems were obtained using a local MINLP solver.
The findings from the case study provide evidence of the usefulness and versatility of a GDP-based approach to optimal mixture design. Integrating GDP techniques into the CAM b D framework can facilitate the formulation of the design problem, making it possible to optimise simultaneously the number, identities and compositions of components in the mixture. Numerical difficulties associated with the absence of components in the final mixture, which are a concern when miscibility constraints are included in the formulation, can be avoided, leading to computationally efficient solutions. In the case study, it was found that mixtures outperform pure solvents.
Future perspectives for this work include developing algorithms to solve these complex design problems globally and using alternative logic-based optimization techniques. The big-M formulation employed in this study is the most common relaxation technique, but it is known to give weak lower bounds for a minimization problem 43,82 . Other techniques, such as the hull relaxation, can be used. In the case of convex problems, the resulting bounds are at least as tight as those of the big-M relaxation 52,65 , but the case of nonconvex problems 83,84 presents additional challenges.
Finally, the formulation of the design problems could be extended to the design of molecules from the basic building blocks (UNIFAC groups) so that the pre-selection of promising molecules to include in the list of candidates can be avoided. In this way, a comprehensive approach to mixture design problems can be adopted, where the number of molecules, their identities and their compositions are optimised simultaneously.

A UNIFAC Model
These equations are proposed by Smith et al. 73 in a form convenient for programming and they are slightly changed in order to avoid some numerical difficulties when the activity coefficient of ibuprofen is calculated.
Activity coefficient Combinatorial part of activity coefficient Residual part of activity coefficient
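For orientation, the UNIFAC equations in the textbook form given by Smith, Van Ness and Abbott can be sketched as below. This is not the modified form used in the case study, which was adjusted to avoid numerical difficulties. The test system (diethylamine and n-heptane at 308.15 K) and its group parameters are recalled from the textbook worked example and should be treated as assumptions; they are not the ibuprofen data of Appendix C.

```python
import math


def unifac_gamma(x, nu, R, Q, a, T):
    """Activity coefficients from the UNIFAC equations (textbook form).

    x    : mole fractions, one per species
    nu   : nu[i][k], number of groups of type k in species i
    R, Q : group volume and surface area parameters, one per group
    a    : a[m][k], group interaction parameters in K
    T    : temperature in K
    """
    n_sp, n_gr = len(x), len(R)
    r = [sum(nu[i][k] * R[k] for k in range(n_gr)) for i in range(n_sp)]
    q = [sum(nu[i][k] * Q[k] for k in range(n_gr)) for i in range(n_sp)]
    # e[i][k]: surface area fraction of group k in species i
    e = [[nu[i][k] * Q[k] / q[i] for k in range(n_gr)] for i in range(n_sp)]
    tau = [[math.exp(-a[m][k] / T) for k in range(n_gr)] for m in range(n_gr)]
    beta = [[sum(e[i][m] * tau[m][k] for m in range(n_gr)) for k in range(n_gr)]
            for i in range(n_sp)]
    sum_xr = sum(x[i] * r[i] for i in range(n_sp))
    sum_xq = sum(x[i] * q[i] for i in range(n_sp))
    theta = [sum(x[i] * q[i] * e[i][k] for i in range(n_sp)) / sum_xq
             for k in range(n_gr)]
    s = [sum(theta[m] * tau[m][k] for m in range(n_gr)) for k in range(n_gr)]

    gammas = []
    for i in range(n_sp):
        J = r[i] / sum_xr
        L = q[i] / sum_xq
        # Combinatorial part
        ln_c = 1.0 - J + math.log(J) - 5.0 * q[i] * (1.0 - J / L + math.log(J / L))
        # Residual part
        ln_r = q[i] * (1.0 - sum(
            theta[k] * beta[i][k] / s[k] - e[i][k] * math.log(beta[i][k] / s[k])
            for k in range(n_gr)))
        gammas.append(math.exp(ln_c + ln_r))
    return gammas


# Assumed test data: groups CH3, CH2, CH2NH; species diethylamine, n-heptane.
R_K = [0.9011, 0.6744, 1.2070]
Q_K = [0.848, 0.540, 0.936]
NU = [[2, 1, 1], [2, 5, 0]]
A_MK = [[0.0, 0.0, 255.7], [0.0, 0.0, 255.7], [65.33, 65.33, 0.0]]

gamma = unifac_gamma([0.4, 0.6], NU, R_K, Q_K, A_MK, 308.15)
print(gamma)  # approximately [1.133, 1.047]
```

Groups that are absent from a species contribute nothing to its residual term because the corresponding `e[i][k]` is zero, which is the kind of term that may need special handling in an optimisation model when components can drop out of the mixture.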

B Solvent properties
Experimental data for toxicity, and boiling and melting temperatures of the candidate solvents are presented in Table B1.

C Parameters of the UNIFAC model used in this case study
The number of groups of type k in ibuprofen (v ibu,k ) and in a solvent s (v s,k ) are presented in Tables C1 and C2, respectively; the group volume parameters (R k ), the group surface area parameters (Q k ) and the group interaction parameters (a k,m ) used in the UNIFAC model for the prediction of the activity coefficient are listed in Tables C3, C4 and C5, respectively.

D Problem formulations
For definition of indices and sets see Table 2.

MINLP formulations for task 1
Restricted problem (N=3) select only one disjunction: where variable b i,k is used to evaluate the miscibility constraint for binary pairs of designed components and it is non-zero only when a mixture of 2 or 3 solvents is designed.
select only one disjunction: