Sculpting inertial fluid flow using sequences of pillars is a powerful method for flow control in microfluidic devices. Since its recent debut, flow sculpting has been used in novel manufacturing approaches such as microfiber and microparticle design, flow cytometry, and biomedical applications. Most flow sculpting applications can be formulated as an inverse problem of finding a pillar sequence that results in a desired fluid transformation. Manual exploration and design of pillar sequences, while useful, have proven infeasible for finding complex flow transformations. In this work, we extend our automated optimization framework based on genetic algorithms (GAs) to rapidly design micropillar sequences that can generate arbitrary user-defined fluid flow transformations. We design the framework with the following properties: (a) a parameter encoding that respects locality to ensure fast convergence and (b) a multiresolution approach that accelerates convergence while maintaining accuracy. The framework also utilizes graphics processing unit (GPU) architecture via NVIDIA's CUDA for function evaluations. We package this framework in a user-friendly and freely available software suite that enables the larger microfluidics community to utilize these developments. We also demonstrate the framework's capability to rapidly design arbitrary fluid flow shapes across multiple microchannel aspect ratios.

## Introduction

Manipulating fluid flow in microchannels using a sequence of cylindrical obstacles (pillars) has been recently shown to be a powerful method for passive fluid flow control [1,2]. Previous methods of microfluidic fluid flow control were generally applied to fluid mixing [1] and used chaotic flow transformations with limited capability to purposely sculpt flow [3,4], or required active control [5]. This new technique, known as “pillar programming,” uses pillars that span the height of a microchannel to deform fluid. The basis of pillar programming is the fore-aft asymmetry in fluid streamlines around a single pillar in inertial flow, with 1 < Re < 100 (where $Re=(UDH)/\nu $ for a fluid kinematic viscosity *ν*, average downstream velocity *U*, and channel hydraulic diameter *D _{H}*). The inertial force present in the fluid causes it to be deformed as it moves past a pillar, unlike Stokes flow (Re ≈ 0, a flow regime typical of microfluidic devices), where the fluid will return to its location in the microchannel cross section. Furthermore, for Re < 100, the fluid flow around a confined pillar is still highly laminar and predictable [1]. Thus, the pillars of varying diameter and position will each produce a unique and time-invariant flow deformation for similar fluid flow conditions. The pillars can then be placed in a sequence, with each pillar's individual deformation acting as a building block toward a net deformation to the fluid at the microchannel outlet (see Fig. 1 for an overview of a pillar programming microfluidic device).

If the micropillars in a sequence are placed far enough apart in the streamwise direction to suppress possible crosstalk (spacing ≥6*D*/*w* for a pillar diameter *D* and microchannel width *w* [1]), the fluid flow deformation from each pillar can be independently simulated and used as a nested function. In other words, each pillar accepts the deformed flow shape from the previous pillar as an input for its own deformation. This provides a useful computational shortcut for simulating the arbitrary sequences of micropillars, as time-invariant deformations for single micropillars need only be simulated once. Thus, a library of precomputed fluid flow deformations for various micropillar configurations and flow conditions can be assembled from computationally difficult simulations for the Navier–Stokes equations over a three-dimensional (3D) domain for each micropillar. These simulations form state transition matrices, which relay the displacement information of the fluid within the cross section of the microchannel from inlet to outlet of each pillar domain [6] (see Appendix A). Subsequently, a fluid flow simulation for a microfluidic device consisting of an arbitrary sequence of pillars is well represented by piecing together each pillar's precomputed individual deformation, rather than simulating the device's entire 3D microchannel domain. Such computation is remarkably fast on modest consumer hardware, with typical runtime being well under 1 s for pillar sequences of reasonable length [6]. This method of simulation has been experimentally validated in microfluidic devices in Refs. [1,2,6].

Previous work incorporated this nested representation into a fast, user-friendly, and cross-platform utility for manual design called uFlow [2]. With an experimentally identified set of four micropillar diameters and eight lateral positions in the channel (for a total of 32 different micropillar configurations, which are easily manufacturable), a variety of possible fluid flow shape transformations have been predicted and experimentally validated for an inlet configuration [2]. These manually designed micropillar sequences have since been used to tailor polymer fiber cross sections [7], create 3D particles [8–10], and for solution transfer around particles in flow [11]. Such applications are still quite fundamental and broad in their scope, and the full potential of pillar programming in research and industry remains untapped. However, the immense phase space for pillar combinations and inlet configurations makes manual design of complex fluid structures practically difficult and time consuming. For example, in choosing from the 32 different pillar configurations as used in Ref. [2], a ten-pillar sequence has approximately 32^{10}≈10^{15} different permutations. Hence, the sheer number of possible flow shapes via pillar programming is well beyond manual exploration. This motivates the need for a computational framework that solves the inverse problem. That is, given a desired fluid flow shape, what is the micropillar sequence and inlet design that produces the closest possible match?

Stoecklein et al. [6] tackled the inverse problem in pillar programming by using an off-the-shelf GA,^{2} which has been previously used to optimize the geometry for the fluid flow applications [12–15]. The encoding of the design variables (a “chromosome” in the GA) was a set of integers, each of which acted as an index representing a micropillar configuration (i.e., a micropillar of some diameter and position for set flow conditions and channel geometry). The GA uses a cost function, referred to as a fitness function, with which to evaluate chromosomes during optimization. Stoecklein et al. [6] used a correlation function to compare the flow shapes created by the GA to a target fluid flow shape. The inlet flow design was manually specified by the user and held constant during optimization.

Though this work was successful in optimizing the pillar sequences for known transformations, we identified computational limitations that preclude widespread use. These can be categorized into three issues: long execution time, the need for user-supplied inlet design, and use of proprietary software. A single pillar sequence design search took as long as 24 h. The requirement for manual inlet flow design meant that multiple user-guided searches were often necessary for a single image. That is, an initial guess at what the flow should be at the inlet frequently had to be altered, especially since the fluid can undergo considerable lateral displacement. Hence, multiple iterations on the supplied inlet configuration are often required before a satisfactory result is found (see Fig. 2(a)). Additionally, the framework used matlab's built-in GA code. Beyond being commercial software, matlab uses interpreted code which is generally slower in execution than compiled code such as c++. Reducing GA runtime would have several benefits, the simplest of which is ease of use. Moreover, with faster operation comes the opportunity to repeat searches and use larger populations, which will more effectively search the design space. These issues motivated the development of a powerful, robust, and fast framework for pillar programming design that vastly improves on these problem areas and provides a roadmap to future enhancements. There are five key contributions in this work: use of multiresolution transition matrices, design of a real-valued GA chromosome, incorporating both inlet and pillar sequence design into the framework, adapting the fitness function to NVIDIA's CUDA, and the development of a freely available, open-source utility for automated pillar programming design.

## Problem Definition and Baseline Genetic Algorithm

**μ**

_{0}) and a sequence of

*k*transition matrices ${Mi}i=1k$, a simulated output image (vectorized as

**I**

_{sim}) is found by

We limit the possible values of pillar diameter and location to *D*/*w* = [0.375, 0.5, 0.625, 0.75] and *y*/*w* = [−0.375:0.125:0.5] (where *y*/*w* = 0.5 is two half-pillars on either side of the channel), for a total of 32 different pillar configurations (as in Refs. [2,6]). This defines a set $P={M[1]\u2026M[32]}$ from which each precomputed transition matrix in a sequence **M*** _{i}* can be selected.

**μ**

_{0}, such that the resulting flow deformation matches a desired flow shape. Given this forward model, we formulate the inverse problem as an unconstrained optimization problem

where *f* is a functional that quantifies the difference between the target image **I**_{T} and **I**_{sim}. This design space has been shown to be highly diverse, with a wide variety of possible fluid flow shapes [2,6]. Not only do many possible fluid flow shapes exist but there are also multiple micropillar sequences that will produce practically similar flow shapes [2]. However, some solutions (micropillar sequences) for a given fluid flow shape may be less desirable, as longer fluid flow sequences—which require longer microchannels—can be problematic. The longer microchannels require higher pressures to drive flow, potentially leading to flexure in the channel, which will in turn alter the geometry and, therefore, the sculpted flow shape. Moreover, the longer channels allow for additional transverse mass diffusion in the fluid streams, which will blur and distort the sculpted flow shape. Finally, the smaller micropillar sequences require less of a physical footprint on a lab-on-a-chip, which makes for more simple fabrication and multiplexing/parallelization, and provides room for additional microscale components. Consequently, it is often worthwhile to allow for fewer micropillars during optimization, rather than targeting only the complex spaces that come with longer micropillar sequences.

The finite cardinality of $P$ provides a discrete set of choices. The inverse problem will therefore have a discrete input space. This, coupled with the corrugated, nonconvex phase space, makes continuous approaches impractical. On the other hand, evolutionary heuristics such as the GA are well suited for such a problem [16]. The GA operates by evolving an initially random population of candidate solutions through a process of selection, crossover, and mutation, until some set of termination conditions are met. Each candidate solution is represented in the GA as a chromosome: a sequence of bits that is evaluated using a problem-specific fitness function. Selection is a mechanism for choosing pairs of chromosomes in the population that will crossover segments of their genetic material, creating offspring for the next generation that share traits from their parents. Chromosomes with good fitness are more likely to be selected to pass on their genetic code, while those with poor fitness tend to be washed out of the population. Those not selected for crossover still have a chance of being mutated. Mutation randomly alters bits of a chromosome, which can kick candidate solutions out of local optima into new areas of the search space. There is also an option to tag high-performing chromosomes as “elites.” If elites are used in a GA, the most-fit individuals within each generation pass on to the next generation untouched. This preserves the current best solution over the course of the GA. The process of selection, crossover/mutation, and evaluation will repeat for many generations with the overall population fitness improving. Termination criteria typically include a measure of convergence, e.g., some number of stall generations for which the best fitness in the population does not improve, as well as a generation cap. The final generation will contain a most-fit chromosome, which is the optimized solution for the problem.

In the current work, we seek to carefully improve the computational performance of our existing GA framework [6] and extend its functionality in a user-friendly way. We will refer to the framework in Ref. [6], which uses the matlab's built-in genetic algorithm, as a *baseline* for comparison in terms of speed, accuracy, and scope. The *baseline GA* was formulated with a chromosome that used a sequence of integer values 1–32 as indices for a set of precomputed transition matrices. The inlet flow design was user-defined. Each pillar sequence design involved executing the *baseline GA* through a user-defined number of allowed pillars in a sequence, repeating each search at least ten times (see Algorithm 1). The search was repeated in order to attempt sufficient coverage of the design space in using this stochastic evolutionary heuristic algorithm.

**Algorithm 1** Baseline GA Framework

**Inputs**:

**I*** _{T}* = Target flow shape image

**μ**_{0} = Inlet flow design

*N _{P}*

_{,start}= Initial pillar sequence length

*N _{P}*

_{,stop}= Final pillar sequence length

**Outputs:**

**M**_{i}_{,Opt} = Optimal pillar sequence

**I**_{sim,Opt} = Optimal flow shape image

*f*_{Opt} = Optimal fitness value

1: **procedure** Baseline

2: *f*_{Opt} = *∞*

3: **for***N _{P}* =

*N*

_{P}_{,start}

**to**

*N*

_{P}_{,stop}

**do**

4: **for***N*_{GA} = 1 **to** 10 **do**

5: $f,Mi,Isim=GA(IT,\mu 0,NP)$

6: **if***f* < *f*_{Opt}

7: *f*_{Opt} = *f*

8: **M**_{i}_{,Opt} = **M**_{i}

9: **I**_{sim,Opt} = **I**_{sim}

10: **return****M**_{i}_{,Opt}, **I**_{sim,Opt}, *f*_{Opt}

1: **function***GA*(**I*** _{T}*,

**μ**

_{0},

*N*)

_{P}2: Create random population of chromosomes

3: **while** not terminated **do**

4: Calculate *f*(population)

5: Select crossover, mutation, and elite chromosomes

6: Perform crossover and mutation on population

7: Select most-fit *f*, **M*** _{i}*,

**I**

_{sim}

8: **return***f*, **M*** _{i}*,

**I**

_{sim}

**I**

*and GA-produced image*

_{T}**I**

_{sim}, each of dimensions

*N*×

_{Y}*N*pixels, as

_{Z}The value of the correlation coefficient *r* varies between 1 (for exactly similar images) and −1 (for precisely complementary images), both of which are acceptable. We square the value of *r*, thereby treating both values equally. We formulate our GA as a minimization problem, meaning that perfectly matching flow shapes give *f* = 0 (optimal fitness), and worse matches tend toward *f* = 100.

The baseline GA [6] postprocessed the flow shape images generated by the forward model (along with the target image) with a low-pass filter based on the fast Fourier transform (FFT). This was done to compare coarse features of the targets, such as overall shape and size. Small discrepancies between shapes are generally irrelevant, given that the real-world flow shapes blur due to mass diffusion. Use of the low-pass FFT mimics this blur, making practically similar shapes effectively the same. However, we find that this step is not necessary for a successful GA search, as the correlation function used in *f* still effectively compares the flow shapes without the need for postprocessing. This results in a decrease in fitness function execution time, but a degradation in reported optimal fitness due to the “raw” output of the transition matrix method being compared to what is likely a smoothly drawn user-supplied target shape.

Other parameters used in the baseline GA include a population size of 100 (five elites), 200 maximum generations per GA search, scattered crossover, and tournament selection. Separate searches were conducted for a single target image using different numbers of pillars in the chromosome. Specifically, the framework swept through a user-specified number of pillars *N _{P}*

_{,start}to

*N*

_{P}_{,stop}, and repeated the entire search ten times (see Algorithm 1). We match these parameters in our comparisons to the baseline.

## Design of a Robust, Accelerated Optimization Framework

### Use of Multiresolution Simulations.

Stoecklein et al. [6] showed how higher-resolution streamtraces through the 3D pillar domain contribute to a more accurate representation of the sculpted fluid. In the conversion to transition matrices, less information from the fluid advection is lost due to a more refined grid onto which the fluid displacement is mapped. Nonetheless, a coarse streamtrace resolution can still relay useful information for bulk fluid movement. More importantly, the smaller transition matrices require fewer matrix algebra operations, which reduces the runtime.

We leverage this idea in two ways. First, we performed a study to determine the coarsest streamtrace sampling that maintains parity in accuracy with the baseline resolution as used in Ref. [6]. Then, we test a multiresolution GA that uses approximate (but extremely fast) low-resolution transition matrices in conjunction with high-resolution matrices during the GA. The smaller, low-resolution data (**M*** _{L}*) guide the population to several basins of attraction. From there, the high-resolution transition matrices (

**M**

*) continue the search with greater respect for detail in the optimized flow shape. Thus, a single GA is modified during evolution as in Algorithm 2. This is an example of the so-called “multiscale” GA, which has been used to improve the rate of convergence [17,18]. We recognize that the choice of generation switch and resolution is a complete study unto itself, and defer an exhaustive study to a future publication. Here, we demonstrate the possibilities that even nonoptimized values of resolution and generation switch produce in terms of significant computational gains.*

_{H}**Algorithm 2** Multiresolution GA

**Inputs**

Gen_{switch} = Generation at which **M*** _{L}* →

**M**

_{H}**Parameters**:

*Gen* = current generation

1: **function** Multi-Res GA(**I*** _{T}*,

**μ**

_{0},

*N*,

_{P}**M**

*,*

_{L}**M**

*)*

_{H} 2: **while** not terminated **do**

3: **if** Gen < Gen_{switch}

4: use *f* = *f*(**M*** _{L}*) {Fast, approximate}

5: **Else**

6: use *f* = *f*(**M*** _{H}*) {Slow, accurate}

7: Calculate *f*(population)

8: Select crossover, mutation, and elite chromosomes

9: Perform crossover and mutation on population

10: Select most-fit *f*, **M*** _{i}*,

**I**

_{sim}

11: **return***f*, **M*** _{i}*,

**I**

_{sim}

### Expanding the Design Space to Include Inlet Design.

The chromosome used in the baseline GA [6] contained information pertaining only to a pillar sequence. However, the complete forward model for simulated fluid flow must include some inlet flow design. We define this as number of channels at the inlet, and which channels contain the fluid being sculpted (see Fig. 1). As such, for a more complete search of the design space using the previous chromosome, the user would specify different inlet conditions for multiple design searches (see Fig. 2). This is an inefficient method of design which is exacerbated by the number of possible inlet configurations and the need for *a priori* knowledge of the design space. A user may end up running dozens of GAs, yet still miss an optimal solution. We address this by encoding the design of the inlet into the chromosome, thereby allowing the GA itself to tune the complete pillar geometry and fluid flow design in silico.

We utilize a simple scheme for encoding the inlet into the chromosome. We specify a fixed number of channels that join together at the inlet, the limit of which will usually depend on the microfluidic fabrication capability (e.g., Fig. 1 shows three channels being joined at the inlet). The GA then allocates the same number of “bits” in the chromosome to correspond with the channels at the microchannel inlet. These bits are limited to binary values (0 or 1) in the GA and constrained so that their sum is never less than one or equal to the total number of channels (thus ruling out “empty” and “full” solutions).

When the fitness function evaluates a chromosome, an inlet vector is created with ones or zeroes in the fractions of the channel corresponding to each channel bit. For example, a five-channel inlet can be represented in the chromosome as [0, 1, 0, 1, 0], with two-fifths of the inlet having tracked fluid separated by an empty central channel (see Fig. 3(a)). As the GA evolves its population, it will tune the inlet-specific portions of the chromosome for optimal fitness. Thus, the inlet design becomes an output of the GA framework.

### Improving the Pillar Sequence Chromosome Design.

We also change how the pillar sequence is encoded as a chromosome. Previously, the integer values were used as proxies for a fixed set of micropillar configurations. That is, an arbitrary mapping of the possible configurations to the integer set was performed, and these integers were used as indices in the chromosome. For example, a chromosome of [1, 7, 31, 25] represented a four-pillar sequence, with each integer referring to a micropillar of some diameter and position in the channel. This standard choice of representation results in an encoding that does not respect *locality*. That is, the baseline GA chromosome introduces arbitrary jumps in the design space, as incrementing an integer value may alter a pillar's design by a large change in diameter, position, or both. This could slow down convergence in the GA, as changes caused by mutation or crossover would have nonlinear sensitivity. For example, if a particular chromosome is very close to a good solution, the only necessary change may be modifying a single pillar's diameter. In using integer proxies, there is a chance that mutation or crossover will alter both the pillar's diameter and position. In contrast, splitting up each pillar's diameter and position into separate entries improves the odds that a selection event will alter only the required parameter. Thus, using an encoding based on individual representation of pillar diameter and position necessarily doubles the chromosome length, but ensures locality of perturbations. This new chromosome design will be referred to as the $\mathbb{R}$-chromosome, and the previous integer-proxy design as the $\mathbb{Z}$-chromosome.

To implement the $\mathbb{R}$-chromosome, we use two floating point values to describe a single pillar in the chromosome: one for its diameter and one for its lateral position in the channel (see Fig. 3(b)). When the GA evaluates a population, the chromosome parameters are interpreted in real-time to translate from real-valued pillar configurations to their nearest possible design (per the supplied precomputed transition matrices). In order to give the pillar configurations equal likelihood of selection during random processes, the bounds and interpretation code are created based on the intervals in the sampling (see Figs. 3(c) and 3(d)). This has the added advantage of being able to choose from arbitrary micropillar configurations and refine the set of transition matrices available to the GA without altering the chromosome. For example, some number of generations in a GA search could have a small, coarsely discretized selection of micropillar diameters and locations. Then, the GA can have an on-the-fly switch to a larger selection of pillar configurations with new chromosome bounds, possibly biased toward currently favored pillar sizes and locations.

### Optimized, Open-Source GA Framework.

Our long-term intention is to provide a lightweight, platform-agnostic, and freely available design framework that will enable wider utility by the microfluidics community. To accomplish this, we constructed a dependency-free, open-source GA application for pillar programing from the ground up in c++. Currently, the application runs on Windows, Linux, and OS X platforms, with an intuitive GUI designed using Python's Tkinter (see Fig. 4(a) for an image of the GUI). Although it is not necessary for use, the GUI allows users to draw their own fluid flow shape targets, load images, set GA parameters, and receive an output microfluidic device design without learning the command line interface. The application is available on the website.^{3}

The codebase contains a fundamental GA class that provides basic functionality, such as selection and evaluation methods and the general logic for a GA. Top level applications that use the GA can be added in modularly with minimal need for altering the framework. Moreover, almost every part of the algorithm can be customized. Users can easily modify the process of the GA to include more advanced features, such as thread-parallel evaluation of population members, or customized selection methods.

At the core of our GA framework is a single class called Solver. The Solver contains all functions and information needed to run the GA (see Fig. 4(b) for the class hierarchy). The Solver contains a configuration map which contains the settings for the GA. This includes which methods to use for selection/crossover, population size, fitness function evaluation, and other custom settings. If no methods are specified, the framework will fill in default methods. There are three basic methods used by the GA: selection, crossover, and mutation. The default selection method is tournament, which chooses parents for crossover by selecting the most-fit out of *k* random population members. The crossover method is required to be implemented by the user. This is done purposefully to provide greater flexibility in how the chromosome is represented. Finally, the default mutation rate of 5% randomly alters the population's chromosomes. All of these functions to rely to some degree on random information to be fed to the system. For this, a single instruction, multiple data (SIMD) oriented fast Mersenne twister (SFMT) library is included in the codebase to generate random data [19]. This generator has a very large period for random data, making the likelihood of repeat random information incredibly small.

The Chromosome interface is intended to be subclassed by client software implementing the required methods. That is, users should be able to encode their chromosome design into a custom subclass which will be evaluated by the Solver. A subclass of Chromosome and a provided allocation and crossover methods are the minimum required components needed to use the framework. For this application, the subclass of Chromosome is our implementation of a pillar sequence with an inlet.

### Using CUDA for the Fitness Function Evaluation.

Massive growth in GPU parallel computing capability in the last decade lends itself well to the execution of the GA. While the central processing unit (CPU) can perform arithmetic at high-speed (given appropriate problem size), it is still a serial process. Distributing the fitness function across multiple CPU cores will scale with the number of cores available to the user, which is typically below double digits for a normal user (without access to high performance computing resources). GPUs, on the other hand, have evolved to include many hundreds of arithmetic units, though they are each considerably slower than a modern CPU core. Smartly parallelizing a fitness function across a GPU's many processors can increase the optimization speed, which will have a greater impact as the GPU continues to add more processors with each hardware iteration. Furthermore, replacing an existing computer's GPU device is a far simpler and cost-effective upgrade than replacing the CPU, which often requires a complete hardware overhaul. By enabling computation of the fitness function on the GPU, we allow users multiple routes to fast design.

We leverage CUDA, NVIDIA's language for general-purpose computing on graphics processing units (GPGPU), to distribute our fitness function across many processors on consumer grade hardware (NVIDIA GTX 980). More specifically, we use cuSPARSE, a library within CUDA specifically designed and optimized to handle sparse-matrix operations on the GPU. The process of random initialization, selection, and evolution (crossover/mutation) is handled by a host (CPU) process written in c++, while the fitness function's sparse-matrix operations are handled by the GPU. The GA evaluates the fitness functions serially using cuSPARSE. In the future, we anticipate that the evaluations themselves be parallelized by the use of CUDA streams, which will allow multiple cuSPARSE calls to be active on the GPU at a single time. This will help realize the full potential of the GPU, as fitness functions can be easily evaluated in parallel. Moving these operations to the GPU should also see improvement to fitness function runtime with subsequent generations of NVIDIA hardware.

## Results and Discussion

Results are shown using six different test cases which are designed to demonstrate the speed and scope of the developed framework, along with comparisons with the baseline framework.

*Case 1: Transition matrix resolution.*Compares the use of different resolution transition matrices to the baseline resolution (*N*= 801 and_{Y}*N*= 101) in terms of speed and accuracy._{Z}*Case 2: Multiresolution GA.*Tests the multiresolution fitness function, which switches matrix resolution after 50 generations in the GA, for GA performance and accuracy.*Case 3:*$\mathbb{R}$*-chromosome versus*$\mathbb{Z}$*-chromosome.*Tests the $\mathbb{R}$-chromosome against the $\mathbb{Z}$-chromosome for GA performance and accuracy.*Case 4: Inlet design.*Uses the $\mathbb{R}$-chromosome with multi-inlet target images to test the GA's ability to tune the inlet to match target flow shapes of arbitrary inlet design.*Case 5: Custom GA.*Compares the low dependency, freely available GA created for this work against the matlab's GA.*Case 6: Demonstration of the modified GA.*Uses all modifications of the GA for complex fluid flow shapes.

For these studies, target images have been generated using precomputed transition matrices making an exact match in pillar sequence possible. The images have each been blurred and thresholded into binary images, giving them slightly larger area, more contiguous shapes, and smoother edges. This was done to simulate the nature of user-supplied designs. The fitness value, *f*, will tend to vary within the range of 15–45 for a successful result. Figure 5 shows examples of the randomly generated target images with corresponding solutions from the GA, along with their fitness value. A fitness value of *f* = 16.53, as seen in the first set of images in Fig. 5, shows an excellent agreement with the target image. But even twice this value of *f* = 33.38 shows a decent result, with most of the discrepancies lying with high-level features that would be unlikely to make a difference in fabricated devices.

### Case 1: Transition Matrix Resolution.

*N*= 801 and

_{Y}*N*= 101. A set of target flow shape images were created using quasi-random Sobol sequences for the pillar designs. Ten fluid flow shapes were selected from five to ten pillar sequence lengths (ten from five-pillar sequences, an additional ten from six-pillar sequences, etc.), for a total of 60 images of increasing complexity. A single inlet flow configuration of [0, 0, 1, 0, 0] (corresponding to the chromosome design) was used in order to allow for comparisons to the previous GA framework, which had no capability to tune inlet design. The target images were searched for using the transition matrices of resolution

_{Z}*N*= [101:100:801]. The microchannel aspect ratio was fixed at $h/w=1/4$, and since the simulation data are restricted to the top-half of the microchannel (due to symmetry in the microchannel and pillar geometry), $NZ=\u2308NY8\u2309$. The baseline and updated frameworks in case 1 both used the matlab's built-in GA. In all searches, a population size of 100 and generation limit of 200 were enforced, the GA swept five to ten pillar sequences, and repeated each sequence size ten times (for a total of 60 GA searches per image; per Algorithm 1,

_{Y}*N*

_{P}_{,start}= 5,

*N*

_{P}_{,stop}= 10, and

*N*

_{GA}= 10). To compare the GA accuracy to the baseline streamtrace resolution of

*N*= 801, we define a relative fitness

_{Y}*f*

_{R}The results are shown in Fig. 6. Figure 6(a) shows how changing transition matrix resolution affects the GA framework accuracy as measured relative to the baseline, as measured by *f _{R}* (Eq. (5)). We hypothesize that despite a considerable loss in accuracy (as much as 50% for

*N*= 101, see Fig. 6(a)), the smaller matrices can still be useful in a multiresolution fitness function (see earlier discussion). We validate this hypothesis in case 2.

_{Y}A GA framework runtime comparison is shown in Fig. 6(b). Resolutions of *N _{Y}* = 701 and

*N*= 601 have similar accuracy to the baseline of

_{Y}*N*= 801, but

_{Y}*N*= 601 has a runtime improvement of 50% over the baseline. Thus, going forward, the GA framework for pillar programming can cut computational effort in half by using

_{Y}*N*= 601, without an effective loss in accuracy to the baseline. Accuracy within 10% of the baseline is still obtainable in utilizing a streamtrace resolution of

_{Y}*N*= 401, for which the runtime reduces by a factor of four.

_{Y}### Case 2: Multiresolution GA.

This study tested the multiresolution fitness function as outlined in Algorithm 2, using $NY(MH)=601$ and $NY(ML)=51$ (not shown in case 1). The parameter Gen_{switch} = 50, which matches the stall generation limit (a termination condition). The results are shown in Fig. 7. As mentioned previously, the choice of Gen_{switch}, *N _{Y}* (

**M**

*), and*

_{L}*N*(

_{Y}**M**

*) lends itself to another study entirely, with a parameter study and adaptive switching being likely directions. The results for this particular study show that the multiresolution GA is a powerful method for accelerating convergence without sacrificing the accuracy.*

_{H}Coupled with the results from case 1, we can now give users several options weighing runtime against precision with respect to fine-scale flow shape features. On the one hand, using the transition matrices only of *N _{Y}* = 401 results in a fast but possibly less accurate result. On the other hand, using

*N*= 601 will give more accurate results to the target flow shape, but at twice the execution time. Using the multiresolution matrices offers a middle ground between the two and could even be used to accelerate the

_{Y}*N*= 401 resolution further. We anticipate offering these choices to users in future releases of the custom c++ framework.

_{Y}### Case 3: ℝ-Chromosome Versus ℤ-Chromosome.

The same set of 60 target flow shape images from case 1 was used to compare the $\mathbb{Z}$-chromosome to the $\mathbb{R}$-chromosome. From Section, the $\mathbb{Z}$-chromosome used a set of integer values as indices in a look-up table for each pillar configuration, which leads to highly nonlinear design space due to the jumps that will occur with changing pillar diameters or locations. The $\mathbb{R}$-chromosome, on the other hand, splits each pillar in the chromosome design into two real values: one each for diameter and location. We hypothesized that this will translate into faster GA convergence due to more smoothly varying neighborhoods in the input space. For this study, the baseline transition matrix resolution of *N _{Y}* = 801 and

*N*= 101 was used for both chromosome types. The inlet was fixed for both chromosomes, meaning the GA need only design the pillar sequence. The results are shown in Fig. 8. Despite the increased chromosome size and complexity (which can lead to a larger number of generations needed for convergence), there is no meaningful change in search capability per target image (Fig. 8(a)). Use of the $\mathbb{R}$-chromosome shows vast improvement in GA convergence (Fig. 8(b)). While the previously used $\mathbb{Z}$-chromosome repeatedly ran into the hard limit of 200 generations, the $\mathbb{R}$-chromosome converged by 88 generations on average, with none of the GAs requiring more than 100 generations. This reduces the runtime by nearly 50% (Fig. 8(c)). In fact, the distribution of generations required for the convergence shifts from skewed shape to a more normal distribution. Overall, these results fall in line with our hypothesis of a faster GA by using a more physically modeled chromosome.

_{Z}### Case 4: Arbitrary Inlet Design.

To test for arbitrary inlet design, a new set of 60 target flow shape images were generated with the same scheme as case 1, except the Sobol sequences now include values for a five-channel inlet design. Therefore, these target flow shapes have varying amounts of fluid and locations of the target flow shape. This is meant to test the GA framework's ability to design the inlet configurations. Here, the baseline framework has no means of effective comparison, as each image would require multiple user-guided GA searches. The results are in Fig. 9, showing fitness values for each target image in Fig. 9(a) and examples of the best and worst fitness in Figs. 9(b) and 9(c). Allowing the GA to design the inlet for the microfluidic device in this manner is highly successful, with even the worst reported fitness resulting in an overall viable design (*f* = 45.56, Fig. 9(c)). However, the increased complexity of the design leads to longer framework runtime, as seen in case 5 (Fig. 10). The median runtime of approximately 3600 s for a fixed inlet (as in case 2, Fig. 8) increases by nearly 40% to roughly 5000 s. Nonetheless, given the inlet design iterations that this new chromosome design avoids, using the GA to design the fluid flow inlet is by far the more desirable option.

### Case 5: Custom GA.

To compare our custom GA implementation against the matlab's GA, we use the test images from case 1. Each framework used the $\mathbb{R}$-chromosome, with the inlet being designed by the GA. The custom GA performed fitness function calculation using CUDA on an NVIDIA GTX 980, while matlab used an Intel i7-4970 at 3.60 GHz. We compare the runtime and fitness values in Fig. 10. The fitness values for the custom framework match, and in some cases even improve upon the values found using the matlab's GA. We do not see an improvement in runtime by moving the fitness function to the GPU, which is not entirely unexpected. The matrices being used are very sparse (as many nonzero values as rows in the matrix), and their size does not make for particularly difficult computation on the CPU. Despite this, the GPU runtime is competitive with the (currently) modern CPU used here. Moreover, future development to the custom framework could enable multiple CUDA streams to distribute the fitness functions across many GPUs. This would allow for generation of fitness functions to be evaluated simultaneously, which would accelerate the framework and make larger population sizes more practical.

### Case 6: Demonstration of the Modified GA.

In case 6, letters from the English alphabet were used as targets. Unlike all other test images used in this work, the letters were hand-drawn, making their existence in the design space completely unknown. Due to the top–bottom symmetry in sculpted flow shapes, we are only able to design for symmetric letters. Since the flow physics are entirely contained within the transition matrices, future optimization problems could incorporate new geometry that allows for top–bottom asymmetry, such as half-pillars, or “steps” in the microchannel. Here, we drew the capitalized four letters in “IOWA,” choosing the axis of symmetry in the microchannel to align with similar axes in the letters. The images were resized for three different microchannel aspect ratios: *h*/*w* = [0.25, 0.5, 1.0], where *h* is the height of the microchannel. In searching across microchannel aspect ratios, we expect the designed shapes to have considerably different styles of deformation. Amini et al. [1] established that changing the microchannel aspect ratio will introduce new modes of fluid deformation, with some modes being unique to particular aspect ratios. The GA searches used nine channels for *h*/*w* = 1.0 and seven channels for *h*/*w* = [0.25, 0.5]. The results for *h*/*w* = 0.25 are in Fig. 11, and the results for *h*/*w* = 0.5 and 1.0 are in Appendix B (Fig. 13). Matching fluid flow shapes are found for each letter, despite the unique flow deformations available across different channel aspect ratios. This is a highly useful result, as manual design would require memorization of the various modes for each aspect ratio in addition to combinatorial difficulty. All results show that while the framework is effective in designing for bulk fluid displacement and overall shape, subtle features are still difficult to optimize for. As an example, the hollow portion of the letter “A” is never truly found, though the outline of the letter does not seem difficult. This indicates room for improvement on the fitness function itself, which remains a simple correlation function. Promising avenues include using elliptic Fourier representations of shape, wavelet representation, or using methods from image classification like scale-invariant feature transforms.

## Conclusion

The concepts of inertial microfluidics and flow sculpting via pillar programming are increasingly being used by the microfluidics community for a diverse set of technologically and socially relevant applications. However, the manual exploration and design of pillar sequences have proven infeasible for generating complex flow transformations necessary for these applications. We formulate the flow sculpting problem as an optimization problem to enable automated design of pillar sequence design that results in a desired fluid flow transformation. In this work, we developed an automated optimization framework that enables robust and efficient lab-on-a-chip design. This provides a valuable design resource to the microfluidics community to leverage new understanding of inertial fluid flow.

We detail four computational modifications that improved the computational efficiency and functionality of the optimization framework. This includes a parameter encoding (i.e., chromosome) that respects parameter locality ensuring fast convergence, a multiresolution approach that accelerates convergence while maintaining accuracy, extending the chromosome to include inlet design, and enabling the use of GPU architecture via NVIDIA's CUDA for function evaluations. We package this framework into a user-friendly, freely available software suite that enables the larger microfluidics community to utilize these developments. We showcase the computational improvements using a series of case studies and conclude with nontrivial designs.

Future avenues of research include designing more sophisticated fitness functionals. For example, users may not have a preference for location, size, or orientation of the sculpted fluid flow shape. This suggests a fitness function that is invariant to translation, scale, or rotation. In addition, the use of simple transition matrices makes the addition of user-created 2D advection maps trivial. As long as new transition matrices can be effectively represented in the chromosome, entirely new microchannel geometry (e.g., half-pillars, steps, and curved regions) can be easily added to the GA toolkit for microfluidic device design. In summary, the framework we have developed not only supplants the previous design optimization efforts but also provides a roadmap for improvements to speed and functionality for future flow sculpting applications.

## Acknowledgment

The authors gratefully thank the Di Carlo group for their valuable suggestions in this work. This research is supported in part by the National Science Foundation (NSF) through NSF-1306866 and NSF-1149365.

### Appendix A: Transition Matrices

Transition matrices show how particles move from inlet to outlet. For this work, the streamtraces that form transition matrices are performed using reverse advection. This is done to avoid an issue with forward advection, which may displace multiple fluid elements to the same location (column) in the transition matrix. Such a matrix has “blank” elements at the outlet, where there is no apparent fluid. Reverse advection ensures a uniform and complete distribution of displacement information at the outlet, which translates to a more contiguous and accurate flow shape image.

### Appendix B: Additional Results From Case 6

Stoecklein et al. [6] discussed the rationale of using a metaheuristic, evolutionary approach to this problem.

www.me.iastate.edu/bglab/software