USC Information Sciences Institute
Theory of stochastic processes grew out of thermodynamics and probability theory. Thermodynamics, developed during the 1700’s and early 1800’s, consisted of empirical laws derived from a large body of experimental data. It was not until mid-1800’s that scientists hypothesized a generative model – called kinetic theory of gases. The main advance was an observation that bulk properties of matter (such as the thermodynamic laws for gases) arose from the dynamics of its constituent elements (gas molecules). This was also the time that probabilistic methods were first used to calculate gas properties. Kinetic theory of gases matured into statistical mechanics in the late 1800’s-early 1900’s. Along with quantum theory, the theory of stochastic processes is the mathematical foundation of modern physics.
We will apply the theory of stochastic processes to study the collective behavior of groups of robots. The main advantage of using stochastic processes is captured in the following observation (Van Kampen, 1992):
An individual robot’s behavior is subject to external forces that cannot be anticipated; noise and fluctuations in the environment; other robots with equally complex trajectories, making it difficult to predict which robots will interact. Randomness can even be programmed directly into robot’s behavior rules: for example, during collision avoidance a robot turns a random angle before moving forward. In short, an individual robot’s behavior is so complex and unpredictable, it can be viewed as a stochastic process.
Although the behavior of an individual robot is unpredictable, group behavior is often described by simple rules.
Reactive robot is a minimalist robot in which perception and action are tightly coupled. Reactive robot makes decision about what action to take based on the action it is currently executing and input from sensors. Therefore, a reactive robot can be described as a Markov process. A Markov process is a stochastic process that takes values n0, n1, … at times t0, t1, and its value at time ti+1 depends only on its value at time ti and no other times.
Stochastic Master Equation (SME) describes the dynamics of a stochastic Markov process. It can be derived by examining how p(n,t) – the probability a process takes a value n at time t – changes in time. Dynamics of the collective state – probability density are described by the collective Stochastic Master Equation. In most cases, however, the probability of the collective state is difficult to specify. Instead, we assume that the system is composed of independent and indistinguishable stochastic processes, and average the single process SME over the entire ensemble. What we get is the Rate Equation
The Rate Equation describes how the average number of processes taking value n changes in time.
A Markov process with a finite set of values can be represented as a Finite State Automaton (FSA). For a reactive robot, this FSA can be thought of as the controller. Consider the simplified foraging scenario below. The robot starts searching for pucks, if it detects a puck, it transition to pickup state, and after it has finished picking up the puck, it starts homing, or moving to a “home” location, where it deposits the puck and goes back to searching.
The FSA of the controller also describes the collective state of the system – where each state now represents the average number of robots executing that action or behavior. Moreover, the FSA serves as a blueprint for writing the equations describing the dynamics of the system: each state becomes a dynamic variable, and each arrow a term in the equations – positive for incoming and negative for outgoing.
Foraging can be accomplished by a single robot or a group of robots. Benefits of group foraging include the fact that group is robust to individual’s failure; group can speed up collection by working in parallel. Disadvantages of group foraging include increased interference due to collision avoidance maneuvers, which take time to execute. Naturally the question arises if there is some optimal group size,
beyond which interference outweighs the benefits of the group’s increased robustness and parallelism. We constructed a mathematical model for foraging and validated results in the Player/Stage sensor-based simulator (Lerman and Galstyan, Adaptive Robots, 2002).
For other applications, as well as general survey, please see Lerman et al, ALife, 2001; Lerman, Martinoli and Galstyan, Swarm Robotics Workshop, 2004.
The methodology presented above does not apply to adaptive or learning robots, or robots interacting with spatially non-uniform processes. We have extended our theory to cover these cases.
A generalized Markov process is a stochastic process whose value at time t+1 depends not only on its value at time t, but also on its values at time t-1… t-m. We call these past states collectively, the history of length m of the process. Similarly to the previous treatment, we can derive a SME that describes how an individual generalized Markov process changes in time, and also the Rate Equation describing collective dynamics of the ensemble (Lerman and Galstyan, IROS, 2003; Lerman, Adaptive Behavior, 2005).
Note, that it is the same as before, except that transition rates, w, are now averaged over all histories.
Consider a multi-foraging scenario, where there are Red and Green pucks, and each robot can collect only a single type of puck at one time. Suppose we want the fraction of robots going after Red pucks be the same as the fraction of Red pucks in the arena. Furthermore, we want to dynamically achieve an appropriate division of labor, even as the distribution of pucks changes. One mechanism for accomplishing this is for the robot to make local observations of pucks (and maybe tasks other robots are engaged in), estimate the global distribution of tasks and jobs and decide whether to go after Red or Green pucks based on these observations. Such a mechanism can be described as a generalized Markov process, with m the number of observations robots make.
We have constructed a mathematical model of task allocation. Figure below presents results of the model, as well as simulations performed in Gazebo, a 3D physically realistic world with full dynamics (Lerman et al, Int. J. of Robotics Research, 2006).
Each plot is for a different number of observations robots are considering in making decision to switch states. The red line represents the fraction of Red pucks. It starts at 30%, jumps to 80% at t=500, then back to 50% at t=2000. The green line shows the fraction of robots in the simulations that are engaged in the Red foraging task. You can see that it follows the puck distribution with some adaptation lag. Also, for longer histories (memories), there are oscillations, which eventually die away as the system converges to a steady state. The blue line represents predictions of the mathematical model for the set of experimental parameters. The model reproduces experimental observations very well.
In addition to memory, we have considered other extension of theory. Specifically, we can now spatially extended systems, where transition rates have a spatial dependence. This mathematical formulation is necessary to study systems with significant spatial variation, for example, self-reconfigurable robots made of many smaller robots (“modules”) that use spatial fields defined by modules for local control
Copyright USC 2001-2006