**Kristina Lerman**

**USC Information Sciences Institute**

The theory of stochastic processes grew out of thermodynamics and probability theory. Thermodynamics, developed during the 1700s and early 1800s, consisted of empirical laws derived from a large body of experimental data. It was not until the mid-1800s that scientists hypothesized a generative model, the kinetic theory of gases. The main advance was the observation that bulk properties of matter (such as the thermodynamic laws for gases) arise from the dynamics of its constituent elements (gas molecules). This was also when probabilistic methods were first used to calculate gas properties. The kinetic theory of gases matured into statistical mechanics in the late 1800s and early 1900s. Along with quantum theory, the theory of stochastic processes is a mathematical foundation of modern physics.

We will apply the theory of stochastic processes to study the collective behavior of groups of robots. The main advantage of using stochastic processes is captured in the following observation (Van Kampen, 1992):

"…certain averaged features vary in a regular way, which can be described by simple laws."

An individual robot’s behavior is subject to external forces that cannot be anticipated; to noise and fluctuations in the environment; and to other robots with equally complex trajectories, which makes it difficult to predict which robots will interact. Randomness can even be programmed directly into a robot’s behavior rules: for example, during collision avoidance a robot turns through a random angle before moving forward. In short, **an individual robot’s behavior is so complex and unpredictable that it can be viewed as a stochastic process**.

Although the behavior of an individual robot is unpredictable, group behavior is often described by simple rules.

A reactive robot is a minimalist robot in which perception and action are tightly coupled. A reactive robot decides what action to take based on the action it is currently executing and on input from its sensors. Therefore, a reactive robot can be described as a Markov process. A Markov process is a stochastic process that takes values *n_0, n_1, n_2, …*, and whose value at the next time step depends only on its current value, not on its earlier history.

The **Stochastic Master Equation** (SME) describes the dynamics of a Markov process. It can be derived by examining how *p(n,t)*, the probability that a process takes the value *n* at time *t*, changes in time. The dynamics of the collective state, the joint probability density, are described by the collective Stochastic Master Equation. In most cases, however, the probability of the collective state is difficult to specify. Instead, we assume that the system is composed of *independent* and *indistinguishable* stochastic processes, and average the single-process SME over the entire ensemble. The result is the **Rate Equation**.

The Rate Equation describes how the **average number** of processes taking the value *n* changes in time.
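The ensemble-averaging idea can be made concrete with a small simulation sketch. Here we use two hypothetical states, A and B, with made-up transition probabilities (these names and values are illustrative, not taken from any of the papers cited here): the average occupancy of many independent two-state Markov processes settles at the steady state predicted by the Rate Equation.

```python
import random

# Two hypothetical states A and B with made-up per-step transition
# probabilities p_ab (A -> B) and p_ba (B -> A). The Rate Equation for
# the average number of processes in A,
#     dN_A/dt = p_ba * N_B - p_ab * N_A,
# predicts the steady state N_A* = N * p_ba / (p_ab + p_ba).

def simulate(num_procs=1000, p_ab=0.10, p_ba=0.05, steps=200, seed=42):
    rng = random.Random(seed)
    in_a = [True] * num_procs            # every process starts in state A
    for _ in range(steps):
        for i in range(num_procs):
            if in_a[i]:
                if rng.random() < p_ab:
                    in_a[i] = False      # transition A -> B
            elif rng.random() < p_ba:
                in_a[i] = True           # transition B -> A
    return sum(in_a)                     # number of processes in A

n_a = simulate()
prediction = 1000 * 0.05 / (0.10 + 0.05)  # rate-equation steady state
```

Although each process hops between A and B unpredictably, the count in state A stays close to the rate-equation prediction, fluctuating only on the order of the square root of the ensemble size.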

A Markov process with a finite set of values can be represented as a Finite State Automaton (FSA). For a reactive robot, this FSA can be thought of as the controller. Consider the simplified foraging scenario below. The robot starts out searching for pucks; if it detects a puck, it transitions to the pickup state, and after it has finished picking up the puck, it starts homing, i.e., moving to a “home” location, where it deposits the puck and goes back to searching.
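The foraging scenario above can be sketched as a small FSA in code. The state and event names below are our own labels for illustration, not identifiers from any particular controller:

```python
from enum import Enum, auto

class State(Enum):
    SEARCHING = auto()   # wandering, looking for pucks
    PICKING_UP = auto()  # grasping a detected puck
    HOMING = auto()      # carrying the puck to the home location

# Each (state, event) pair maps to the next state; any other event
# leaves the robot in its current state.
TRANSITIONS = {
    (State.SEARCHING, "puck_detected"): State.PICKING_UP,
    (State.PICKING_UP, "pickup_done"): State.HOMING,
    (State.HOMING, "puck_deposited"): State.SEARCHING,
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)

# One full foraging cycle returns the robot to searching.
s = State.SEARCHING
for ev in ["nothing", "puck_detected", "pickup_done", "puck_deposited"]:
    s = step(s, ev)
```

Because the next state depends only on the current state and the current sensor event, this controller is exactly a Markov process over a finite set of values.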

The FSA of the controller also describes the collective state of the system, where each state now represents the average number of robots executing that action or behavior. Moreover, the FSA serves as a blueprint for writing the equations describing the dynamics of the system: each state becomes a dynamic variable, and each arrow a term in the equations, positive for incoming transitions and negative for outgoing ones.
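As a sketch of this blueprint idea, the three-state foraging FSA translates into three coupled Rate Equations, integrated here with a simple Euler scheme. The rates alpha (finding a puck), beta (finishing pickup), and gamma (depositing) are illustrative values we made up, not measured parameters:

```python
# Each FSA state becomes a dynamic variable (average number of robots in
# that state); each arrow contributes a positive term to its destination
# equation and a negative term to its source equation:
#   dN_search/dt = -alpha*N_search + gamma*N_home
#   dN_pickup/dt =  alpha*N_search - beta*N_pickup
#   dN_home/dt   =  beta*N_pickup  - gamma*N_home

def integrate(n_total=20.0, alpha=0.2, beta=0.5, gamma=0.1,
              dt=0.01, steps=5000):
    n_search, n_pickup, n_home = n_total, 0.0, 0.0
    for _ in range(steps):
        ds = -alpha * n_search + gamma * n_home
        dp = alpha * n_search - beta * n_pickup
        dh = beta * n_pickup - gamma * n_home
        n_search += ds * dt
        n_pickup += dp * dt
        n_home += dh * dt
    return n_search, n_pickup, n_home

ns, npk, nh = integrate()
```

Because every arrow appears once with a plus sign and once with a minus sign, the total number of robots is conserved; the system relaxes to a steady state in which the occupancy of each state is inversely proportional to its outgoing rate.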

Foraging can be accomplished by a single robot or by a group of robots. Benefits of group foraging include robustness to individual failures and faster collection through parallel work. Disadvantages include increased interference due to collision-avoidance maneuvers, which take time to execute. Naturally, the question arises whether there is some optimal group size beyond which interference outweighs the benefits of the group’s increased robustness and parallelism. We constructed a mathematical model of foraging and validated its results in the Player/Stage sensor-based simulator (Lerman and Galstyan, *Adaptive Robots*, 2002).
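A toy throughput model (our own illustration, not the model from the paper) shows how such an optimum can arise: if interference reduces each robot’s effective collection rate linearly with the number of other robots, total throughput peaks at a finite group size.

```python
# Toy model: each robot collects at base rate alpha, reduced by
# interference proportional to the number of other robots (N - 1).
# Total throughput T(N) = alpha * N * (1 - c * (N - 1)) first grows
# with N (parallelism), then falls (interference).

def throughput(n, alpha=1.0, c=0.04):
    return alpha * n * max(0.0, 1.0 - c * (n - 1))

best = max(range(1, 31), key=throughput)  # group size with peak throughput
```

With these made-up parameters the optimum lands at an intermediate group size; the qualitative shape, a rise followed by a decline, is the point, not the particular numbers.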

For other applications, as well as a general survey, please see Lerman et al., *ALife*, 2001; Lerman, Martinoli and Galstyan, *Swarm Robotics Workshop*, 2004.

The methodology presented above does not apply to adaptive or learning robots, or robots interacting with spatially non-uniform processes. We have extended our theory to cover these cases.

A generalized Markov process is a stochastic process whose value at time *t+1* depends not only on its value at time *t*, but also on its values at times *t−1, …, t−m*. We call these past states, collectively, the history of length *m* of the process. As in the previous treatment, we can derive an SME that describes how an individual generalized Markov process changes in time, and also the Rate Equation describing the collective dynamics of the ensemble (Lerman and Galstyan, *IROS*, 2003; Lerman, *Adaptive Behavior*, 2005).

Note that it is the same as before, except that the transition rates, *w*, are now averaged over all histories.
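A minimal sketch of a generalized Markov process (illustrative dynamics, not any particular robot controller): the next value depends on the majority of the last *m* values, plus occasional random flips, so the state at *t+1* genuinely depends on the whole history window rather than on the current value alone.

```python
import random
from collections import deque

def run(m=5, steps=100, flip_p=0.1, seed=7):
    rng = random.Random(seed)
    history = deque([0] * m, maxlen=m)   # the length-m history window
    values = []
    for _ in range(steps):
        # next value follows the majority of the last m values...
        majority = 1 if sum(history) * 2 > m else 0
        # ...except for an occasional random flip with probability flip_p
        nxt = 1 - majority if rng.random() < flip_p else majority
        history.append(nxt)
        values.append(nxt)
    return values

vals = run()
```

With no noise (`flip_p=0`), the process locks onto its initial majority forever, which makes the history dependence easy to see.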

Consider a multi-foraging scenario in which there are Red and Green pucks, and each robot can collect only a single type of puck at a time. Suppose we want the fraction of robots going after Red pucks to be the same as the fraction of Red pucks in the arena. Furthermore, we want to *dynamically achieve* an appropriate division of labor, even as the distribution of pucks changes. One mechanism for accomplishing this is for each robot to make local observations of pucks (and perhaps of the tasks other robots are engaged in), estimate the global distribution of tasks from them, and decide whether to go after Red or Green pucks based on these observations. Such a mechanism can be described as a generalized Markov process, with *m* the number of observations the robots make.
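One such mechanism can be sketched as follows (a sketch under assumed parameters, not the controller used in the experiments): each robot keeps its last *m* puck observations and chooses the Red task with probability equal to its local estimate of the Red fraction. The group-level allocation then tracks the true puck distribution.

```python
import random

def simulate(num_robots=500, m=10, red_fraction=0.7, steps=300, seed=1):
    """Fraction of robots choosing the Red task after `steps` observation
    rounds, when the true fraction of Red pucks is `red_fraction`."""
    rng = random.Random(seed)
    # Each robot's history: its last m puck sightings (True = Red).
    histories = [[rng.random() < red_fraction for _ in range(m)]
                 for _ in range(num_robots)]
    for _ in range(steps):
        for h in histories:
            h.pop(0)
            h.append(rng.random() < red_fraction)  # observe a random puck
    # Each robot goes after Red with probability equal to its own
    # estimate of the Red fraction, sum(h) / m.
    on_red = sum(1 for h in histories if rng.random() < sum(h) / m)
    return on_red / num_robots

frac = simulate()  # close to red_fraction = 0.7
```

Because each decision uses only *m* noisy local observations, individual estimates fluctuate, but the ensemble fraction settles near the true puck distribution, which is the adaptive division of labor described above.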

We have constructed a mathematical model of task allocation. The figure below presents results of the model, as well as simulations performed in Gazebo, a 3D physically realistic simulator with full dynamics (Lerman et al., *Int. J. of Robotics Research*, 2006).

Each plot corresponds to a different number of observations the robots consider when deciding to switch states. The red line represents the fraction of Red pucks: it starts at 30%, jumps to 80% at t=500, then drops back to 50% at t=2000. The green line shows the fraction of robots in the simulations that are engaged in the Red foraging task; it follows the puck distribution with some adaptation lag. For longer histories (memories), there are also oscillations, which eventually die away as the system converges to a steady state. The blue line represents the predictions of the mathematical model for the experimental parameters. The model reproduces the experimental observations very well.

In addition to memory, we have considered other extensions of the theory. Specifically, we can now treat spatially extended systems, where the transition rates have a spatial dependence. This mathematical formulation is necessary to study systems with significant spatial variation: for example, self-reconfigurable robots made of many smaller robots (“modules”) that use spatial fields defined by the modules for local control, e.g., to adjust their shape to move through pipes. It also allows us to consider interesting applications involving environments with spatial gradients, e.g., medical nano-robots responding to injury by following chemical gradients to the injury site (Galstyan, Hogg and Lerman, *Swarm Intelligence Symposium*, 2005).
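As an illustrative sketch (one-dimensional, with made-up parameters, not the model from the cited paper), gradient following can be modeled as a biased random walk: the robot usually steps toward higher chemical concentration, i.e., toward the injury site, with occasional random steps standing in for noise.

```python
import random

def follow_gradient(start=50, steps=500, noise=0.2, seed=3):
    """Position after following a 1-D gradient whose peak (the injury
    site) is at x = 0. With probability `noise` the robot steps
    randomly; otherwise it steps toward the peak."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        if rng.random() < noise:
            x += rng.choice((-1, 1))   # noisy random step
        elif x > 0:
            x -= 1                     # step up the gradient, toward 0
        elif x < 0:
            x += 1
    return x

final = follow_gradient()  # ends near the injury site at x = 0
```

The drift toward the peak dominates the noise, so the robot ends up concentrated near the injury site; a spatially dependent transition rate of this kind is exactly what the extended formulation captures.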

*This work is supported in part by DARPA under contract number F30602-00-2-0573 and in part by the NSF under grants No. 0074790 and IIS-0413321.*

**Analyzing Swarms: A Stochastic Processes Approach to Studying Robotic Swarm Behavior**:
a tutorial presented with Tad Hogg (HP Labs) at the 2005 Swarm Intelligence Symposium.

Please see publications for links to papers.

Copyright USC 2001-2006