
## On Periodic Reference Tracking Using Batch-Mode Reinforcement Learning with Application to Gene Regulatory Network Control

### Citations

5599 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998

Citation Context: ...example, gene 2 is repressed by gene 1. near-optimal policies from one-step system transitions is the central question of batch-mode reinforcement learning [2], [3]. Note that reinforcement learning [4] focuses on a more general problem of policy inference from interactions with the system. Usually, in batch-mode reinforcement learning, very few assumptions are made on the structure of the control p...

566 | A synthetic oscillatory network of transcriptional regulators
- Elowitz, Leibler

Citation Context: ...eference tracking problem. In our extension, we explicitly take into account the future reference trajectory values and the period T. Moreover, the regression is separated into T independent regression problems, which can save computational effort. The algorithm is implemented in Python and is available at www3.imperial.ac.uk/people/a.sootla. To illustrate our reference tracking method, we consider its application to synthetic biology gene regulatory networks, specifically to a network called a generalised repressilator. The repressilator is a ring of three mutually repressing genes pioneered in [9], and theoretically generalised to rings consisting of more than three genes in [10]. Repression is an interaction between two genes such that the protein product of the repressing gene prevents protein expression of the repressed gene. Simply put, one gene can turn off the other. A generalised repressilator with a sufficiently large, even number of genes (such as the four-gene ring in Figure 1) can exhibit decaying but very long-lived oscillations [11]. The ... (52nd IEEE Conference on Decision and Control, December 10-13, 2013, Florence, Italy; 978-1-4673-5716-6/13/$31.00 ©2013 IEEE; p. 4086)
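The ring-of-repressing-genes structure described in this context can be sketched in code. The following is a minimal, illustrative protein-only reduction with Hill-type repression (gene i repressed by gene i−1); the parameter values, the Hill form, and the omission of mRNA dynamics are assumptions for illustration, not the model used in the quoted paper.

```python
import numpy as np

def repressilator_rhs(p, alpha=10.0, beta=1.0, h=2.0):
    """Reduced generalised-repressilator ring: gene i is repressed by
    gene i-1 through a Hill term. All parameters are illustrative."""
    p = np.asarray(p, dtype=float)
    repressor = np.roll(p, 1)  # repressor[i] = p[i-1], closing the ring
    return alpha / (1.0 + repressor**h) - beta * p

def simulate(p0, steps=5000, dt=0.01):
    """Forward-Euler integration of the ring dynamics."""
    p = np.asarray(p0, dtype=float)
    traj = np.empty((steps, p.size))
    for k in range(steps):
        traj[k] = p
        p = p + dt * repressilator_rhs(p)
    return traj

# Four-gene ring as in the Figure 1 described above (even gene number):
traj = simulate([1.0, 0.1, 1.0, 0.1])
```

With an even number of genes the ring has competing stable patterns, which in this reduced sketch shows up as slowly settling transients rather than sustained oscillations.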

267 | Extremely randomized trees
- Geurts, Ernst, et al.
- 2006

Citation Context: ...l) + γ max_{u∈U} Q̂_0(n⁺_l, u). (4) This expression gives Q̂_1 only for n_l, u_l in F, while the entire function Q̂_1(·, ·) is estimated using regression (e.g., the EXTremely RAndomized Trees, or EXTRA Trees, algorithm [13]). Now the iterative procedure is derived by generalising (4) as follows: Q̂_k(n_l, u_l) = r(n_l, u_l) + γ max_{u∈U} Q̂_{k−1}(n⁺_l, u), (5) where at each step Q̂_k(·, ·) is estimated using regression. If the iterati...
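The iteration quoted in this context can be sketched as follows. This is a minimal illustration, not the paper's implementation: the transition batch F, the reward values, and the finite action set are illustrative stand-ins, and scikit-learn's `ExtraTreesRegressor` plays the role of the EXTRA Trees regressor.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(F, actions, gamma=0.75, n_iters=30):
    """Sketch of FQI over a batch F of one-step transitions
    (n_l, u_l, r_l, n_l_plus); names are illustrative."""
    states, us, rewards, next_states = map(np.asarray, zip(*F))
    X = np.column_stack([states, us])
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = rewards  # eq. (4) with Q_0 = 0: Q_1 = r
        else:
            # eq. (5): Q_k = r + gamma * max_u Q_{k-1}(n_l_plus, u)
            q_next = np.stack([
                q.predict(np.column_stack(
                    [next_states, np.full(len(next_states), u)]))
                for u in actions], axis=1)
            targets = rewards + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q
```

A near-optimal policy then picks, in each state n, the action u maximising the returned regressor's prediction for (n, u).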

224 | Tree-based batch mode reinforcement learning
- Ernst, Geurts, et al.
- 2005

Citation Context: ...plines such as engineering [5], HIV treatment design [6], and medicine [7]. In this paper, one batch-mode reinforcement learning algorithm is considered, namely the Fitted Q Iteration (FQI) algorithm [8]. This algorithm focuses on the computation of a so-called Q-function, which is used to compute a near-optimal control policy. The algorithm computes the Q-function using an iterative procedure based ...

208 | Scikit-learn: Machine learning in Python
- Pedregosa, Varoquaux, et al.
- 2011

Citation Context: ...most 300 samples. These state transitions are then gathered in the set F. The discount factor γ is equal to 0.75, the choice of which is guided by considerations similar to the ones in [8]. The stopping criterion is simply a bound on the number of iterations, which for the purpose of this paper is 30. At every iteration every Q_{v_i} function is approximated using EXTRA Trees, which was shown to be an effective regression algorithm for the FQI framework [13]. The parameters for the algorithm are set to the default values from [8]. The algorithm is implemented in Python using the machine learning [19], parallelisation [20], graphics [21] and scientific computation [22] toolboxes. C. Results. One sinusoidal reference trajectory, different periods. In this example, we force the concentration of protein 2 to track a sinusoid with different periods. Here α = 0 and the sinusoid is chosen to resemble the natural oscillations in terms of amplitude and offset from zero: d²_t = 8 + 7·sin(2πt/T). We test the algorithm for the periods T = 50, 150, 250. We can increase the concentration of protein 2 directly through application of the control signal u_2. We can also dec...
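The reference trajectory quoted in this context can be generated directly; this sketch assumes the standard period-T sinusoid d_t = 8 + 7·sin(2πt/T), with the amplitude, offset, and horizon taken from or implied by the quoted text.

```python
import numpy as np

def sinusoid_reference(T, horizon):
    """Period-T reference for protein 2: amplitude 7, offset 8,
    as in the quoted example d_t = 8 + 7*sin(2*pi*t/T)."""
    t = np.arange(horizon)
    return 8.0 + 7.0 * np.sin(2.0 * np.pi * t / T)

# The three tested periods; the horizon of 500 steps is an assumption.
references = {T: sinusoid_reference(T, horizon=500) for T in (50, 150, 250)}
```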

201 | SciPy: Open source scientific tools for Python
- Jones, Oliphant, et al.
- 2009

Citation Context: ...meters for the algorithm are set to the default values from [8]. The algorithm is implemented in Python using the machine learning [19], parallelisation [20], graphics [21] and scientific computation [22] toolboxes. C. Results. One sinusoidal reference trajectory, different periods. In this example, we force the concentration of protein 2 to track a sinusoid with different periods. Her...

150 | Kernel-based reinforcement learning
- Ormoneit, Sen
- 1999

Citation Context: ...nce of Q̂_k to a fixed state-action value function Q̂* given a fixed set F as k grows to infinity. A sufficient condition for convergence is the use of kernel-based approximators in the regression step [14]. Note that to ensure convergence using the EXTRA Trees algorithm, it is required to fix the structure of the trees, i.e., the kernel representation of the Q̂ function approximation must not change fr...
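The kernel-based approximators referred to in this context are weighted averagers of the training targets. A minimal Nadaraya-Watson sketch with a Gaussian kernel is given below; the bandwidth and kernel choice are illustrative. Because each prediction is a convex combination of the targets, the regression step is a non-expansion, which is the property behind the convergence guarantee discussed above.

```python
import numpy as np

def kernel_regress(X_train, y_train, X_query, bandwidth=0.5):
    """Nadaraya-Watson kernel averager: predictions are convex
    combinations of y_train, so the map y -> prediction is a
    non-expansion in the sup norm. Parameters are illustrative."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum(axis=1, keepdims=True)  # rows sum to 1
    return w @ y_train
```

Plugging such a regressor into the FQI regression step in place of trees keeps each iteration a contraction-compatible update, at the price of computing distances to all stored transitions.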

128 | Matplotlib: a 2D graphics environment
- Hunter
- 2007

Citation Context: ...ions are then gathered in the set F. The discount factor γ is equal to 0.75, the choice of which is guided by considerations similar to the ones in [8]. The stopping criterion is simply a bound on the number of iterations, which for the purpose of this paper is 30. At every iteration every Q_{v_i} function is approximated using EXTRA Trees, which was shown to be an effective regression algorithm for the FQI framework [13]. The parameters for the algorithm are set to the default values from [8]. The algorithm is implemented in Python using the machine learning [19], parallelisation [20], graphics [21] and scientific computation [22] toolboxes. C. Results. One sinusoidal reference trajectory, different periods. In this example, we force the concentration of protei...

98 | Repetitive control system: A new type servo system for periodic exogenous signals
- Hara, Yamamoto, et al.
- 1988

Citation Context: ...ene regulatory networks. I. INTRODUCTION. There are many problems in engineering that require forcing a system to follow a desired periodic reference trajectory (e.g., in repetitive control systems [1]). One approach to solving these problems is to first define a distance between the state of the system n_t and the desired reference point d_t at time t. The next step is to seek a control policy tha...
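The distance-based formulation quoted in this context can be made concrete with a per-step cost. The quadratic form below is an illustrative choice, not necessarily the one used in the quoted paper; any distance between state n_t and reference d_t would fit the same scheme.

```python
import numpy as np

def tracking_reward(n_t, d_t):
    """Per-step reward as the negative squared tracking error between
    state n_t and reference d_t. The quadratic form is illustrative."""
    err = np.asarray(n_t, dtype=float) - np.asarray(d_t, dtype=float)
    return -float(np.sum(err ** 2))
```

A policy that maximises the discounted sum of such rewards is then, by construction, one that keeps the state close to the periodic reference.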

94 | Stochastic protein expression in individual cells at the single molecule level
- Cai, Friedman, et al.
- 2006

Citation Context: ...protein concentration produced by the expression of gene 2. The oscillations have a period of approximately 150 arbitrary time units but are, however, not stable. ...for instance via fluorescent markers [15], [16] (e.g., green fluorescent protein, GFP, or red fluorescent protein, mCherry). On the other hand, we assume that the control inputs are implemented as light pulses of specific non-overlapping wave...

91 | Reinforcement Learning and Dynamic Programming Using Function Approximators, CRC Press
- Busoniu, Babuska, et al.
- 2010

Citation Context: ...e gene product onto the other one. For example, gene 2 is repressed by gene 1. near-optimal policies from one-step system transitions is the central question of batch-mode reinforcement learning [2], [3]. Note that reinforcement learning [4] focuses on a more general problem of policy inference from interactions with the system. Usually, in batch-mode reinforcement learning, very few assumptions are ...

87 | Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method
- Riedmiller
- 2005

Citation Context: ...ment learning algorithms a high degree of flexibility in comparison with other control methods. Batch-mode reinforcement learning has had numerous applications in many disciplines such as engineering [5], HIV treatment design [6], and medicine [7]. In this paper, one batch-mode reinforcement learning algorithm is considered, namely the Fitted Q Iteration (FQI) algorithm [8]. This algorithm focuses on...

87 | Optimal dynamic treatment regimes
- Murphy
- 2003

Citation Context: ...exibility in comparison with other control methods. Batch-mode reinforcement learning has had numerous applications in many disciplines such as engineering [5], HIV treatment design [6], and medicine [7]. In this paper, one batch-mode reinforcement learning algorithm is considered, namely the Fitted Q Iteration (FQI) algorithm [8]. This algorithm focuses on the computation of a so-called Q-function, ...

46 | Microfluidic devices for measuring gene network dynamics in single cells
- Bennett, Hasty

Citation Context: ...n concentration produced by the expression of gene 2. The oscillations have a period of approximately 150 arbitrary time units but are, however, not stable. ...for instance via fluorescent markers [15], [16] (e.g., green fluorescent protein, GFP, or red fluorescent protein, mCherry). On the other hand, we assume that the control inputs are implemented as light pulses of specific non-overlapping wavelength...

34 | Oscillations and multiple steady states in a cyclic gene model with repression
- Smith
- 1987

Citation Context: ...work called a generalised repressilator. The repressilator is a ring of three mutually repressing genes pioneered in [9], and theoretically generalised to rings consisting of more than three genes in [10]. Repression is an interaction between two genes such that the protein product of the repressing gene prevents protein expression of the repressed gene. Simply put, one gene can turn off the...

33 | Spatiotemporal control of cell signalling using a light-switchable protein interaction
- Levskaya, Weiner, et al.
- 2009

Citation Context: ...hand, we assume that the control inputs are implemented as light pulses of specific non-overlapping wavelengths activating a photo-sensitive promoter controlling the expression of genes 1 and 2 [17], [18]. When a photo-sensitive promoter is activated through a light pulse, the concentration of gene product 1 (or 2) is increased by a small amount through the expression of gene 1 (or 2). The control s...

31 | A light-switchable gene promoter system
- Shimizu-Sato, Huq, et al.
- 2002

Citation Context: ...other hand, we assume that the control inputs are implemented as light pulses of specific non-overlapping wavelengths activating a photo-sensitive promoter controlling the expression of genes 1 and 2 [17], [18]. When a photo-sensitive promoter is activated through a light pulse, the concentration of gene product 1 (or 2) is increased by a small amount through the expression of gene 1 (or 2). The con...

10 | Batch mode reinforcement learning based on the synthesis of artificial trajectories
- Fonteneau, Murphy, et al.
- 2013

Citation Context: ...om one gene product onto the other one. For example, gene 2 is repressed by gene 1. near-optimal policies from one-step system transitions is the central question of batch-mode reinforcement learning [2], [3]. Note that reinforcement learning [4] focuses on a more general problem of policy inference from interactions with the system. Usually, in batch-mode reinforcement learning, very few assumptions...

8 | Modelling the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of a HIV-infected patient
- Stan, Belmudes, et al.
- 2008

Citation Context: ...high degree of flexibility in comparison with other control methods. Batch-mode reinforcement learning has had numerous applications in many disciplines such as engineering [5], HIV treatment design [6], and medicine [7]. In this paper, one batch-mode reinforcement learning algorithm is considered, namely the Fitted Q Iteration (FQI) algorithm [8]. This algorithm focuses on the computation of a so-c...

7 | Switchable genetic oscillator operating in quasi-stable mode
- Strelkowa, Barahona
- 2010

Citation Context: ...gene can turn off the other. A generalised repressilator with a sufficiently large, even number of genes (such as the four-gene ring in Figure 1) can exhibit decaying but very long-lived oscillations [11]. The objective of this paper is to force one or several protein concentrations to follow a priori defined reference trajectories, namely sinusoids and ramps. A deeper background on synthetic biology...

3 | Toggling the genetic switch using reinforcement learning
- Sootla, Strelkowa, et al.
- 2013

Citation Context: ...or several protein concentrations to follow a priori defined reference trajectories, namely sinusoids and ramps. A deeper background on synthetic biology and its control applications is provided in [12], where we consider a regulation problem of a toggle switch system. The paper is organised as follows. Section II covers the FQI algorithm. In Section III, an extension of FQI to the reference trackin...