Carlos Gershenson's homepage




3. Behaviour-Based Intelligence





"Before being humans, we are animals"






"Ah, qué tiempos aquellos". Carlos Gershenson, Mexico City, 1999. Oil on canvas, 60 x 100 cm. Mata García collection.



In Chapter 1 we presented the idea that intelligence might be perceived from the adaptiveness of an individual's behaviour. For example, if an animal successfully avoids its predators, we will say that it behaved intelligently (at least, more intelligently than the eaten ones...). If a robot is capable of successfully navigating through crowded corridors, we will say that it also behaved intelligently.

In this chapter we first give a brief review of action selection mechanisms and explain what they are. Then, we present the Behavioural Columns Architecture (1) (BeCA) (González, 2000), a behaviours production system (BPS) for AAAs inspired by ethology and implemented in a double blackboard architecture. We do this by first defining and describing Behaviours Production Systems and giving a brief description of the Blackboard Node Architecture. Next we introduce the elements of BeCA, in order to model reflex, reactive, and motivated behaviours in an evolutionary bottom-up fashion. Then we refine our BPS by implementing two learning schemes: associative learning, and a simple reinforcement learning of the motivation degree. Finally we describe the properties of BeCA.

BeCA was used to provide the control of the animats of our Behaviours Virtual Laboratory, presented in Chapter 5.



3.1. Action Selection Mechanisms



"Look to nature, and let simulated nature take its course"

--Andy Clark

An action selection mechanism (ASM) computes which action should be executed by a behaviour-based system (BBS), depending on the internal state and the external perceptions of the agent controlled by the BBS.

Building ASMs has two benefits, which feed back into each other: a better understanding of adaptive behaviour (how animals are able to adapt to their environment), and the development of adaptive artificial creatures.

Reviews of ASMs can be found in (González, 2000) and (Tyrrell, 1993).

Here we present a brief review of works related to ASMs, taken from González et al. (2000):





We can see that ASMs have been inspired by many different areas, and that they present many diverse properties. No "best" ASM has been proposed, since different systems have different requirements. We can say that each ASM is the best at what it was created for: controlling an artificial creature in the context for which it was proposed.



3.2. Behaviours Production Systems



"It is the nature of the mind that makes individuals kin, and the differences in the shape, form, or manner of the material atoms out of whose intricate relationships that mind is built are altogether trivial"

--Isaac Asimov



Throughout this chapter, we will discuss and illustrate the building of a behaviours production system (BPS) that exhibits many of the principles and properties present in animal behaviour, following an evolutionary bottom-up approach. We define a BPS as a system that produces adaptive behaviours to control an autonomous agent. A BPS must solve the well-known action selection problem (ASP), but it needs to be more than an action selection mechanism (ASM). A BPS is characterized by the following features: (1) adaptiveness to the environment (preprogrammed, learned, and/or evolved), (2) a set of autonomous and independent modules that interact with each other, (3) behaviours are produced emergently through the interaction among the different modules that compose the system, which also gives other properties the opportunity to emerge, (4) behaviour patterns emerge from the execution of simple behaviours through time, (5) new behaviours can be incorporated into the existing repertoire of behaviours, (6) new principles or properties that improve the behaviour production can be added, taking into account the existing structure and functioning, and (7) several parameters regulate the behaviour production, and if they are set by an observer through an interface, the results of this adjustment can be observed (such as in a virtual laboratory). In this sense, the neuroconnector network of Halperin (Hallam, Halperin and Hallam, 1994) may be considered an example of a BPS.

The behaviours production system presented here has been structured from a network of blackboard nodes (González and Negrete, 1997; Negrete and González, 1998). We believe that the blackboard architecture constitutes an ideal scenario for the implementation of behaviours production systems, due to its capacity for coordinating and integrating many activities in real time. It also provides great flexibility in the incorporation of new functionality, and it handles action selection as knowledge selection in the solution of the problem. Another property of the blackboard architecture is opportunism in problem solving, a property of behaviour production in animals that is desirable in autonomous systems.

The evolutionary bottom-up approach we follow can be described in the following terms: we will first try to solve one problem, and once we have a BPS that solves this problem, we will strive to co-evolve the BPS alongside the problem as the problem itself evolves and becomes more complex, but without losing the capability of solving the previous problem(s). In this way, and taking into account the scheme shown in Figure 1, we will first build a BPS for the problem of reflex behaviours, which constitutes an initial layer. Then, we will add a second layer to model reactive behaviours. Next, we will add another layer dealing with the problem of motivated behaviours, but without losing the functionality of the two previous ones. Finally, we will refine these layers, incorporating learning schemes to obtain a higher adaptiveness in the behaviour production.

With this work we have pursued two goals: (1) to map the main principles and properties that characterize animal behaviour onto a bottom-up, evolutionary construction of behaviour-based systems, and (2) to use the BPS to experiment with the animal behaviour properties that it is able to reproduce, thereby also providing a better understanding of adaptive behaviour. This implies a journey from biology to behaviour-based systems and back (Maes, 1991).



3.3. Blackboard Node Architecture



The concept of blackboard architecture (Nii, 1989; Engelmore, Morgan and Nii, 1988) was conceived by AI researchers in the 1970s. The goal of this architecture was to handle the problem of shared information among multiple expert agents involved in problem solving. The blackboard architecture was first implemented in the language understanding system Hearsay II (Engelmore, Morgan and Nii, 1988), and it has since been used in a great variety of problem domains and abstracted into many environments for systems implementation. Figure 6 shows the basic components of the blackboard architecture.





The behaviours production system presented here has been structured from a network of blackboard nodes (González and Negrete, 1997; Negrete and González, 1998). A blackboard node is a blackboard system integrated by the following components: (1) a set of independent modules called knowledge sources, which have specific knowledge about the problem domain; (2) the blackboard, a shared data structure through which the knowledge sources communicate with each other by means of the creation and modification of solution elements; (3) the communication mechanisms, which establish the interface between the nodes, and the interface between a given node and the external or internal media; and (4) a control mechanism, which determines the order in which the knowledge sources will operate on the blackboard.
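The four components listed above can be illustrated with a minimal sketch. This is not the BeCA implementation: every class, field, and function name below is hypothetical, and the control mechanism is reduced to running the knowledge sources in list order, which is only one of many possible control policies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation: level name -> {signal name: intensity}
Blackboard = Dict[str, Dict[str, float]]


@dataclass
class KnowledgeSource:
    """An independent module with a condition and an action on the blackboard."""
    name: str
    condition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], None]


@dataclass
class BlackboardNode:
    levels: List[str]
    blackboard: Blackboard = field(init=False)
    sources: List[KnowledgeSource] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The blackboard: one shared dictionary of signals per level.
        self.blackboard = {level: {} for level in self.levels}

    def receive(self, level: str, signal: str, intensity: float) -> None:
        """Communication mechanism: write an incoming signal onto a level."""
        self.blackboard[level][signal] = intensity

    def step(self) -> List[str]:
        """Control mechanism (assumed policy): run, in list order,
        every knowledge source whose condition currently holds."""
        fired = []
        for source in self.sources:
            if source.condition(self.blackboard):
                source.action(self.blackboard)
                fired.append(source.name)
        return fired


def sees_food(bb: Blackboard) -> bool:
    return "food" in bb["External Perceptions"]


def approach(bb: Blackboard) -> None:
    bb["Actions"]["approach"] = bb["External Perceptions"]["food"]


node = BlackboardNode(levels=["External Perceptions", "Actions"])
node.sources.append(KnowledgeSource("approach food", sees_food, approach))
node.receive("External Perceptions", "food", 0.8)
fired = node.step()
```

The knowledge sources never call each other; they communicate only by reading and writing the shared structure, which is what makes it easy to add new ones later.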

The main characteristics exhibited by the blackboard architecture and desired in the implementation of behaviours production systems include the following: (1) a high capacity of coordination and integration of many activities in real time, (2) great flexibility in the incorporation of new functionality, (3) the handling of the action selection as knowledge selection in problem solving, and (4) the opportunism in problem solving. These characteristics support the evolutionary and bottom-up construction approach of our BPS discussed in the next sections.



3.4. Behavioural Columns Architecture: An Evolutionary Bottom-Up Approach



"Everything should be made as simple as possible, but not simpler"

--Albert Einstein



In this section we will present the basic components of our BPS, which we refer to as Behavioural Columns Architecture (BeCA) (González, 2000): the set of internal behaviours, the blackboards and their levels, the interface/communication mechanisms, the emergent behavioural columns, and the blackboard-nodes.

Internal behaviours are information processing mechanisms that operate within the BPS, whose function involves the creation, combination, and modification of solution elements at different blackboard levels (2). An internal behaviour in BeCA is equivalent to a knowledge source in the blackboard node architecture or a hidden layer in an artificial neural network. Internal behaviours can also be seen as agents embedded within node agents, and composed of elementary agents. Internal behaviours are constituted by elementary behaviours, which can be seen as the rules that are packed in a knowledge source in the blackboard architecture, or as artificial neurons in a neural layer. An elementary behaviour has three elements: a list of parameters, a condition component, and an action component. The list of parameters specifies the condition elements, the action elements, and the coupling strengths related to the elementary behaviour. The condition of an elementary behaviour describes the configuration of signals that is necessary on the blackboard for the elementary behaviour to contribute to the solution process of the problem. The way in which an elementary behaviour contributes to the solution of the problem is specified in its action, which can consist of the creation or modification of solution elements at certain blackboard levels. A coupling strength is represented by a vector Fa = (Fai1, Fai2,..., Fain) of n real components, where each of these components represents the efficiency with which an elementary behaviour can satisfy a particular condition. Depending on the nature of the elementary behaviour, the components of the vector Fa may have fixed or modifiable values. The existence of modifiable coupling strengths is important because it allows the refinement of previously defined layers by incorporating learning schemes. The vector Fa of coupling strengths of an elementary behaviour is equivalent to the weight vector of an artificial neuron.
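As a rough illustration of the three elements of an elementary behaviour, the sketch below packs the parameter list, the condition, and the action into one small class. The class and field names are hypothetical, and two details are assumptions rather than statements from the text: the condition is taken to hold when every listed signal is present, and the activation is computed as a weighted sum, following the analogy with an artificial neuron.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ElementaryBehaviour:
    # Parameter list: condition elements, action element, coupling strengths.
    condition_signals: List[str]
    action_signal: str
    coupling: List[float]          # the vector Fa, one component per condition
    modifiable: bool = False       # whether learning may change Fa

    def condition_met(self, level: Dict[str, float]) -> bool:
        """Assumed condition: every listed signal is present on the level."""
        return all(level.get(s, 0.0) > 0.0 for s in self.condition_signals)

    def activation(self, level: Dict[str, float]) -> float:
        """Weighted sum of the condition signals, as in an artificial neuron."""
        return sum(fa * level.get(s, 0.0)
                   for fa, s in zip(self.coupling, self.condition_signals))

    def act(self, source: Dict[str, float], target: Dict[str, float]) -> None:
        """Action: create or modify a solution element on the target level."""
        if self.condition_met(source):
            target[self.action_signal] = self.activation(source)


eat = ElementaryBehaviour(["food", "hunger"], "eat", [0.5, 2.0])
source_level = {"food": 0.4, "hunger": 0.3}
target_level: Dict[str, float] = {}
eat.act(source_level, target_level)

missing_level = {"food": 0.4}
untouched: Dict[str, float] = {}
eat.act(missing_level, untouched)
```

With modifiable coupling strengths, a learning scheme would only need to update the `coupling` list; the condition and action parts stay as they are.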

The blackboard acts as an internal memory, where the internal behaviours read, create, and modify information at different blackboard levels. Each blackboard level contains information at a different processing stage (3). The actions of the internal behaviours on the blackboard incrementally lead to the solution of a given problem. The blackboard can itself be seen as the environment of the internal behaviours.

On the blackboard, the interface/communication mechanisms can also read and create signals. They provide the interface between a blackboard node, and other media, such as the external medium, internal medium (needs or goals), and other nodes. The interface/communication mechanisms are also structured by elementary behaviours (4).

The control mechanism is distributed in the functionality of the elementary and internal behaviours.

Different types of elementary behaviours are organized forming emergent behavioural columns, which vertically cross different blackboard levels. They emerge when the signal created by an elementary behaviour constitutes the condition of an elementary behaviour of a different type, and the signal created by this one is in turn the condition of another elementary behaviour, until reaching the last blackboard level. Elementary behaviours from different internal behaviours and communication/interface mechanisms interact with each other through the blackboard. The result is the behavioural columns which thus emerge from this interaction, and represent the route that signals follow through different blackboard levels. Behavioural columns might converge or diverge.

BeCA has two defined blackboard nodes: a node that receives signals from the external medium and determines which action should be taken upon it, and a node for processing signals from the internal medium. Different internal behaviours, blackboard levels, and mechanisms will be defined in these blackboard nodes as we incrementally build our BPS.

We define our BPS separated from the perceptual system, the internal medium (needs, motivations or goals), and the motor system. This allows BeCA to be defined in a generic way, making possible its implementation in different environments and problem domains (perceptual and motor systems are dependent on their environment).

In the next sections, we will build our behaviours production system (BPS) following an evolutionary bottom-up approach. We will first try to solve one problem, and once we have a BPS that solves this problem, we will evolve the BPS as the problem evolves and becomes more complex, but without losing the capabilities of solving the previous problem(s). In this way, we will first build a BPS for the problem of reflex behaviours, which constitutes the initial layer. Then, we will add a layer to our BPS for reactive behaviours. Next, we will add another layer dealing with the problem of motivated behaviours. Finally, we will refine these layers, incorporating learning schemes to obtain a higher adaptiveness in the behaviour production. We will illustrate each layer and refinement process with experiments using our Behaviours Virtual Laboratory in Chapter 6.



3.5. Modelling Reflex Behaviours



Reflex is one of the simplest forms of behaviour exhibited in animals. In this type of behaviour a fast action is triggered when a particular external stimulus is perceived. The key characteristic of a reflex is that the intensity and duration of the triggered action completely depend on the intensity and duration of the stimulus. There is a rigid relationship between the stimuli and the action executed (Manning, 1979; McFarland, 1981; Anderson and Donath, 1990). Duration and intensity of reflex behaviours might depend on internal states, but for one type of stimuli, the triggered action will be of a specified type. This means that in reflex behaviours there is no action selection problem, because for every stimulus perceived, the corresponding behaviour will always be executed.

In BeCA we will model reflex behaviours in the following way: for every signal received from the perceptual system, a corresponding signal will be sent to the motor system.

In the initial approach of our BPS, the reflex behaviours are modelled as a first layer, which includes the definition of the following components: the External Perceptions, Actions, and Internal Perceptions blackboard levels, the reflex actions internal behaviour, and the interface mechanisms exteroceptors, interoceptors, and actuators. These last elements will allow us to establish connections between the perceptual and motor systems.

From this first layer, we will assume the existence of an internal medium (needs/goals), although it does not play a role in the control of reflex and reactive behaviours. This is the reason why the connections between the nodes will appear only at the third layer, for the modelling of motivated behaviours.

At the External Perceptions level, the signals from the external medium are projected after being sensed and processed by a perceptual system. At the Actions level, signals that indicate which external behaviour must be executed are created. When a signal is created at this level, the external behaviour associated with this element will be invoked, and the action will be executed by a motor system. At the Internal Perceptions level, signals from the internal medium are projected, which are sensed by the interoceptors.

The exteroceptors establish the interface between the perceptual system and BeCA. Once they receive signals from the perceptual system, they process them (multiplying them by a specific coupling strength) and register the resulting signals at the External Perceptions level. In a similar way, the interoceptors establish the interface between the internal medium and BeCA, registering signals at the Internal Perceptions level.

The role of the reflex actions internal behaviour is to allow the immediate activation of the behavioural columns representing reflex actions, which do not require an internal input for the execution of the external action associated with this column. The winners of a competition among the elementary behaviours, which were previously activated by corresponding signals in the External Perceptions level representing reflex behaviours, will register the specified signal in their action component directly at the Actions level.

The actuators establish the interface between BeCA and the motor system. When a signal is created at the Actions level, the actuators send it to the motor system, executing the motor action of the signal.

At this first construction stage we assume the existence of a default external behaviour executed by the motor system, when no stimuli have been perceived, which could be, for example, "stand by" or "wander". A diagram of our BPS at the stage of reflex behaviours is shown in Figure 7.
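The reflex layer described in this section can be sketched as a three-stage pipeline from perception to motor command. The function names, the `reflex_map` table, and the choice of "wander" as the default behaviour are illustrative assumptions; the competition is simplified to picking the single strongest reflex stimulus.

```python
from typing import Dict

DEFAULT_BEHAVIOUR = "wander"  # assumed default when nothing is perceived


def exteroceptors(raw: Dict[str, float],
                  coupling: Dict[str, float]) -> Dict[str, float]:
    """Multiply each perceived signal by its coupling strength and
    register the result at the External Perceptions level."""
    return {s: v * coupling.get(s, 1.0) for s, v in raw.items() if v > 0.0}


def reflex_actions(external: Dict[str, float],
                   reflex_map: Dict[str, str]) -> Dict[str, float]:
    """The winning reflex stimulus writes its action directly at the
    Actions level, keeping the intensity of the stimulus."""
    candidates = {s: v for s, v in external.items() if s in reflex_map}
    if not candidates:
        return {}
    winner = max(candidates, key=candidates.get)
    return {reflex_map[winner]: candidates[winner]}


def actuators(actions: Dict[str, float]) -> str:
    """Send the selected action to the motor system, or the default one."""
    if not actions:
        return DEFAULT_BEHAVIOUR
    return max(actions, key=actions.get)


reflex_map = {"obstacle": "avoid"}
coupling = {"obstacle": 2.0}

external = exteroceptors({"obstacle": 0.4}, coupling)
actions = reflex_actions(external, reflex_map)
executed = actuators(actions)

idle = actuators(reflex_actions(exteroceptors({}, coupling), reflex_map))
```

Because the action intensity is copied from the stimulus, removing the stimulus removes the action on the next step, which is the rigid stimulus-action relationship that characterizes a reflex.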





3.6. Modelling Reactive Behaviours



"Desiring without measure is a matter of children, not of a man"

--Democritus



Reactive animal behaviours are those behaviours that show a rigid and complete dependence on external stimuli (Manning, 1979; McFarland, 1981). In Section 3.5 we discussed and modelled the simplest behaviour of this type: the reflex response. Two other types of reactive behaviour are taxes and fixed-action patterns, which involve more specific and complex external stimuli and more elaborate response patterns than reflex behaviours.

Taxes, or orientation responses, consist of the orientation of an animal towards or away from some external stimulus, such as light, gravity, or chemical signals. A fixed-action pattern is an increased and stereotyped response to an external stimulus (Lorenz, 1981; Manning, 1979; McFarland, 1981). This response comprises an elaborate temporal sequence of component actions. Unlike reflex behaviour, the intensity and duration of a fixed-action pattern are not controlled by the presence of a given stimulus. In other words, the execution of the fixed-action pattern could continue even if the stimulus is removed. The escape response in animals is an example of a fixed-action pattern. This type of reactive behaviour involves a sequence of evasive actions, and requires a persistence of the environmental signals.

The reactive behaviours are modelled in our BPS by incorporating a new layer over the first one. The creation of this second layer includes the definition of the following components: the Perceptual Persistents blackboard level, and two new internal behaviours: perceptual persistence and external behaviour selector. The inclusion of these components in BeCA allows us to model reactive behaviours by taking into account two new elements not present in the first layer: the persistence of external signals and a process of behaviour selection among different reactive behaviours.

The Perceptual Persistents level models a type of short-term memory. At this level, the strongest external signals, initially projected onto the External Perceptions level, persist for more time. The signals at the Perceptual Persistents level are created or modified by the perceptual persistence internal behaviour. The condition of an elementary behaviour of this type is satisfied when at least one of the following has taken place: an external signal specified in its condition has been created at the External Perceptions level; or a signal specified in its condition has been created at the Perceptual Persistents level, or the intensity of that signal has been modified. The elementary behaviours that have satisfied their condition enter into a competition process. The activation level of each elementary behaviour is specified by expressions (1) and (2). The new signal OiT created at the Perceptual Persistents level by the perceptual persistence internal behaviour will be the activation level AiT in expression (2) if it is greater than a threshold thetaT, and zero otherwise. The time during which this signal will be active at the Perceptual Persistents level will depend on the value of the parameter kappa, which is a decay factor, in expression (1):



(1)



where OiT is the strength of the previous signal on the Perceptual Persistents level, FaiiS is the coupling strength related to the signal OiS on the External Perceptions level, and FaijT is the negative coupling strength with which the signal OjT laterally inhibits the signal OiT. The final activation level AiT is calculated by hyperbolically converging the temporary activation level AtmpiT to a value MaxiT using expression (2):



(2)
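Since the expressions themselves are not reproduced here, the following is only a sketch of one update rule that is consistent with the verbal description: a decayed previous signal, an external input weighted by its coupling strength, negative lateral inhibition from the other persistent signals, a hyperbolic saturation towards a maximum, and a threshold. The specific functional forms (multiplying the previous signal by kappa, and the saturation a / (1 + a)) are assumptions and should not be read as the author's expressions (1) and (2). Using a single external and a single lateral coupling strength for all signals is a further simplification.

```python
from typing import Dict, List


def persistence_step(prev: Dict[str, float],
                     external: Dict[str, float],
                     fa_ext: float,
                     fa_lateral: float,
                     kappa: float,
                     max_level: float,
                     theta: float) -> Dict[str, float]:
    """One update of the Perceptual Persistents level (assumed form)."""
    new: Dict[str, float] = {}
    for i in prev.keys() | external.keys():
        # Lateral inhibition from the other persistent signals (fa_lateral < 0).
        inhibition = sum(fa_lateral * o for j, o in prev.items() if j != i)
        # Temporary activation: decayed memory + weighted input + inhibition.
        a_tmp = (kappa * prev.get(i, 0.0)
                 + fa_ext * external.get(i, 0.0)
                 + inhibition)
        # Assumed hyperbolic convergence towards max_level.
        a = max_level * a_tmp / (1.0 + a_tmp) if a_tmp > 0.0 else 0.0
        # Signals below the threshold are dropped.
        new[i] = a if a > theta else 0.0
    return new


params = dict(fa_ext=1.0, fa_lateral=-0.2, kappa=0.5, max_level=1.0, theta=0.05)

# Present a stimulus once, then remove it and watch the trace fade.
level: Dict[str, float] = {}
trace: List[float] = []
for stimulus in [{"food": 1.0}, {}, {}, {}]:
    level = persistence_step(level, stimulus, **params)
    trace.append(level["food"])
```

In this run the signal outlives the stimulus for two further steps before it falls under the threshold, which is the short-term memory effect the level is meant to provide. A larger kappa makes the trace last longer.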



Another internal behaviour required by this layer is external behaviour selector. The role of external behaviour selector is to decide which external behaviour will be executed in the current moment, a process which occurs by taking into account the signals recorded at the Perceptual Persistents level, through a competition process.

A diagram of our BPS at the stage of the modelling of reactive behaviours is shown in Figure 8. As can be seen, at this stage of the modelling, two types of external behaviours can be produced by BeCA: reflex responses, modelled as direct pathways between the External Perceptions and the Actions levels; and reactive behaviours, mediated by the internal behaviours perceptual persistence and external behaviour selector and involving a simple type of action selection.

If no external signals have been perceived, the motor system will execute a default behaviour (e.g. stand by or wander).





3.7. Modelling Motivated Behaviours

"When ruling, rule yourself beautifully"

--Thales of Miletus



Motivated behaviours are those that necessarily require an internal state in order to be executed (Manning, 1979; McFarland, 1981). That is to say, unlike reflex and reactive behaviours, which show a rigid dependence on external stimuli, motivated behaviours are controlled mainly by the internal state of the animal. For example, whether an animal executes the consummatory behaviour drink water depends not only on the presence of the external stimulus water, but also on the internal need thirst. The absence of a stimulus might itself be an external stimulus capable of triggering a motivated behaviour. For example, the exploratory behaviour in animals is a motivated behaviour, exhibited when the external signal appropriate for the current internal need is not present in the surrounding environment. Motivation is hence a class of internal process that produces changes in behaviour (McFarland, 1981). Motivated behaviours are commonly characterized by: sequencing of component behaviours in time, goal-directedness, spontaneity, changes in responsiveness, persistence in the execution of behaviours, and several types of learning (Kupfermann, 1974; Beer, 1990).

Motivated behaviours are modelled in our BPS by incorporating a third layer over the two previous ones. The creation of this third layer can be seen as a process of improvement and refinement of the node related with the processing of external signals and of the node responsible for processing of internal signals or motivations.

The improvement and refinement of the node that processes external signals includes: the definition of three new blackboard levels (Consummatory Preferents, Drive/Perception Congruents, and Potential Actions); the refinement of the internal behaviours perceptual persistence and external behaviour selector; the definition of two new internal behaviours (attention to preferences and reactive response inhibition); and the further definition of the communication mechanisms receptor and transmitter. The node responsible for processing internal signals, in turn, will grow functionally and structurally through the inclusion of the following components: three new blackboard levels (External Perceptions, Intero/Extero/Drive Congruents, and Drive); the intero/extero/drive congruence and consummatory preferences selector internal behaviours, which carry out the processing of internal signals; and the communication mechanisms receptor and transmitter. We will begin the construction of this layer with the improvement and refinement of the first node, assuming that it receives signals from the node related to the internal states, which indicate the internal need that should be satisfied.

At the Consummatory Preferents level, signals coming from the node responsible for the processing of internal signals are recorded. These signals indicate which internal need should be satisfied. The signals placed in the Drive/Perception Congruents level are derived from the combination of signals recorded at the Perceptual Persistents and the Consummatory Preferents levels. If a signal has been recorded at the Potential Actions level, then one of the two following things will occur: this signal will reinforce the external behaviour firing, initiated by a signal on the Drive/Perception Congruents level; or this signal by itself will be able to invoke an external behaviour.

The role of the perceptual persistence internal behaviour continues to be the representation of the external signals at the Perceptual Persistents level, although its activity has been refined: the persistence of a signal at the Perceptual Persistents level now also takes into account whether an associated signal has been created at the Drive/Perception Congruents level. This latter signal participates in the competition among the perceptual persistence elementary behaviours, reinforcing the persistence of the corresponding signal at the Perceptual Persistents level. Expression (3) shows this refinement of the perceptual persistence internal behaviour.



(3)



where FaiiI is the coupling strength of the signal OiI of the Drive/Perception Congruents level. The rest of the notation is the same as used in expression (1), and AiT is still determined by expression (2).

The signals placed at the Consummatory Preferents level are combined with signals placed at the Perceptual Persistents level to decide the possible external actions to execute. This task is carried out by the attention to preferences internal behaviour. At this level, we can see how the internal needs mediate the selection of the external behaviour to be executed. The elementary behaviours encapsulated in this internal behaviour work as AND operators or OR operators depending on the value of gammai in expression (4). This parameter is used to modulate the degree of reactivity in the observed behaviour of the entity. The final action of the elementary behaviour i consists of the creation of the solution element OiI at the Drive/Perception Congruents level. The value of the solution element OiI is given by expression (4).



(4)



where OiI is the value to be inscribed in the Drive/Perception Congruents level; OiT is the signal from the Perceptual Persistents level and FaiT its corresponding coupling strength; OjC is the signal from the Consummatory Preferents level and FajC its corresponding coupling strength; and gammai and phi modulate the reactivity degree in the observed behaviour of the agent.

When the value of phi equals one, the total value of the signals from the Consummatory Preferents level, representing the most imperative internal needs, is taken. This makes the external behaviour motivated. As phi decreases, less importance is given to the signals from the Consummatory Preferents level, making the external behaviour less motivated. If phi is equal to zero, there will be no flow of signals from the Consummatory Preferents level, and the agent will not have any knowledge of its internal needs. Therefore, by modifying the value of phi we can produce a motivational lesion in the BPS.

For a value of gammai greater than zero, greater importance is given to the external stimuli, represented by the signals at the Perceptual Persistents level, than to the signals from the motivational node, found at the Consummatory Preferents level. This means that, even in the absence of motivation for an external behaviour, the behaviour might be executed reactively. Behavioural columns modelling reactive behaviours would have a gammai value greater than zero, while purely motivated behaviours require their gammai to be equal to zero.
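The roles of gammai and phi can be made concrete with a small numerical sketch. The formula used below is an assumed form chosen only because it reproduces the behaviour described in the text (a conjunction of external and internal signals when gammai is zero, a reactive response when gammai is positive, and a motivational lesion when phi is zero); it is not claimed to be expression (4). The function and argument names are hypothetical.

```python
from typing import List


def attention_to_preferences(o_t: float, fa_t: float,
                             o_c: List[float], fa_c: List[float],
                             gamma: float, phi: float) -> float:
    """Assumed form: O_I = Fa_T * O_T * (gamma + phi * sum_j Fa_C[j] * O_C[j]).

    o_t, fa_t : signal from Perceptual Persistents and its coupling strength
    o_c, fa_c : signals from Consummatory Preferents and their coupling strengths
    gamma     : reactivity of this behavioural column
    phi       : how much of the motivational signal is let through
    """
    motivation = phi * sum(f * o for f, o in zip(fa_c, o_c))
    return fa_t * o_t * (gamma + motivation)


# Purely motivated column (gamma = 0): needs stimulus AND motivation.
motivated = attention_to_preferences(0.8, 1.0, [0.5], [1.0], gamma=0.0, phi=1.0)
unmotivated = attention_to_preferences(0.8, 1.0, [0.0], [1.0], gamma=0.0, phi=1.0)

# Motivational lesion (phi = 0): the need is present but never arrives.
lesioned = attention_to_preferences(0.8, 1.0, [0.5], [1.0], gamma=0.0, phi=0.0)

# Reactive column (gamma > 0): the stimulus alone is enough.
reactive = attention_to_preferences(0.8, 1.0, [0.0], [1.0], gamma=0.5, phi=1.0)
```

Under this assumed form a column never fires without an external signal, whatever the value of gammai, so the exploratory case (acting on the absence of a stimulus) would have to be handled by a separate column.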

The role of the reactive response inhibition internal behaviour is to establish the following hierarchical organizational principle: internal behaviours perceiving external signals from the Perceptual Persistents level that are congruent with the internal needs, as represented at the Drive/Perception Congruents level, will have a greater chance of being activated, and hence of inscribing their signal at the Potential Actions level, than internal behaviours without a corresponding congruence with the internal needs. A competition takes place among the elementary behaviours, and the value of the signal created at the Potential Actions level (OiH) will be equal to the activation level AiH (determined by expression (5)) if the activation is greater than zero, and zero otherwise.



(5)



where OiT is the signal read from the Perceptual Persistents level, and FaiiT is its corresponding coupling strength; and OjI is the signal from the Drive/Perception Congruents level, and FaijI its corresponding coupling strength, which is negative for i ≠ j and positive for i = j.
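The inhibition principle can be sketched from the terms defined above: each candidate receives its own persistent signal, is reinforced by a congruent signal with the same index, and is inhibited by congruent signals belonging to other candidates. Combining these terms by simple addition is an assumption, as is the use of one shared value for each kind of coupling strength; the names are hypothetical.

```python
from typing import Dict


def reactive_response_inhibition(o_t: Dict[str, float],
                                 o_i: Dict[str, float],
                                 fa_t: float,
                                 fa_same: float,
                                 fa_other: float) -> Dict[str, float]:
    """Signals for the Potential Actions level (assumed additive form).

    o_t      : signals at Perceptual Persistents
    o_i      : signals at Drive/Perception Congruents
    fa_same  : positive coupling strength, used when i == j
    fa_other : negative coupling strength, used when i != j
    """
    potential: Dict[str, float] = {}
    for i, signal in o_t.items():
        congruence = sum((fa_same if j == i else fa_other) * o
                         for j, o in o_i.items())
        a = fa_t * signal + congruence
        # Only positive activations are inscribed; the rest become zero.
        potential[i] = a if a > 0.0 else 0.0
    return potential


persistents = {"food": 0.6, "rival": 0.5}
congruents = {"food": 0.4}   # only "food" matches a current internal need

potential = reactive_response_inhibition(
    persistents, congruents, fa_t=1.0, fa_same=1.0, fa_other=-2.0)

no_needs = reactive_response_inhibition(
    persistents, {}, fa_t=1.0, fa_same=1.0, fa_other=-2.0)
```

When no congruent signal exists at all, nothing is inhibited and every persistent signal passes through, so the purely reactive behaviour of the second layer is preserved.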

The activity of the external behaviour selector has been modified to take into account the internal needs in the selection of an external behaviour. Now, the external behaviour selector decides which external behaviour will be executed at the current moment by taking into account the signals recorded at both the Drive/Perception Congruents and Potential Actions levels. The elementary behaviours that make up the external behaviour selector behave as OR operators. However, when a signal recorded at the Potential Actions level has a corresponding signal recorded at the Drive/Perception Congruents level, the strength of that external behaviour selector elementary behaviour will be greater than the strength of elementary behaviours with signals represented at only one of the blackboard levels. After this, a competition takes place in order to decide which signal(s) calculated with expression (6) will be inscribed at the Actions level.



(6)



where OjH is the signal read from the Potential Actions level, and FaiiH is its corresponding coupling strength; OjI is the signal read from the Drive/Perception Congruents level, and FaiiI is its corresponding coupling strength; and AiM is the intensity of the signal to be created by external behaviour selector.
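A weighted sum is one simple way to obtain the OR-like behaviour described for the external behaviour selector: either signal alone activates the elementary behaviour, and both together make it stronger. The sketch below assumes that form and a competition that keeps only the strongest candidate; neither is claimed to match expression (6), and the names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def external_behaviour_selector(
        o_h: Dict[str, float],
        o_i: Dict[str, float],
        fa_h: float = 1.0,
        fa_i: float = 1.0) -> Optional[Tuple[str, float]]:
    """Pick the behaviour to inscribe at the Actions level.

    o_h : signals at Potential Actions
    o_i : signals at Drive/Perception Congruents
    Returns (behaviour, strength), or None when nothing is active.
    """
    strengths = {
        i: fa_h * o_h.get(i, 0.0) + fa_i * o_i.get(i, 0.0)
        for i in o_h.keys() | o_i.keys()
    }
    active = {i: a for i, a in strengths.items() if a > 0.0}
    if not active:
        return None
    winner = max(active, key=active.get)
    return winner, active[winner]


# "flee" has the stronger potential action, but "eat" is backed by both levels.
both_levels = external_behaviour_selector(
    o_h={"eat": 0.5, "flee": 0.6}, o_i={"eat": 0.4})

# With a single level active, the selector still produces a behaviour (OR).
one_level = external_behaviour_selector(o_h={"flee": 0.6}, o_i={})

nothing = external_behaviour_selector(o_h={}, o_i={})
```

When the selector returns None, the motor system would fall back on its default behaviour, as in the earlier layers.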

The node responsible for the processing of internal signals or motivations receives signals from the internal medium through the interoceptors, and from the node related to the processing of external signals through the receptor mechanism, and it sends signals to that node through the transmitter mechanism. The role of the node responsible for the processing of internal signals includes the representation of internal signals, the combination of internal and external signals, and the competitive processes among motivationally incompatible behaviours. This makes the observed final external behaviour strongly dependent on the internal states. All the internal states registered by the interoceptors compete among themselves to determine which external behaviour will be executed. This competition is of the winner-take-all type.

The blackboard of this node organizes the signals in four levels of abstraction: Internal Perceptions, External Perceptions, Intero/Extero/Drive Congruents, and Drive. The signals recorded at the Internal Perceptions level correspond to the current values of the internal states, sensed and preprocessed by the interoceptors mechanism, multiplied by a coupling strength. At the External Perceptions level the values of the external signals, still represented at the Perceptual Persistents level of the node related with the external signals, are recorded (these signals are transmitted and received by the communication mechanisms). The signals placed at the Intero/Extero/Drive Congruents level are derived from a combination of signals at the Internal Perceptions, External Perceptions and Drive levels. The signals created at the Drive level represent the strongest internal needs that should be satisfied.

The signals at both the Internal Perceptions and External Perceptions levels are combined by the intero/extero/drive congruence internal behaviour. This combination may be increased if a corresponding signal has been created at the Drive level. The model for the combination of internal and external signals is given by expression (7).



(7)



where AiC is the intensity of the signal to be created at the Intero/Extero/Drive Congruents level of the motivational node; OiE is the signal from the Internal Perceptions level and FaiE its coupling strength; OjS is the signal from the External Perceptions level and FaijS its coupling strength; OiD is the signal from the Drive level and FaiD its coupling strength; tau is a lesion factor; and alfa regulates the combination of the internal and external signals. This combination model is discussed in detail in (González et al., 2000).

For a value of alfa equal to zero, the internal and external signals interact in a multiplicative way. If one of the signals (internal or external) is very small, it decreases the importance of the other signal. In this way, external signals that contribute to weak motivations will give the corresponding external behaviour little chance of being selected. The same occurs with small external signals for strong motivations. If we consider a value of alfa greater than zero, then the internal state will have more importance than the external signal. In this way, strong motivations will give the corresponding external behaviour a strong chance of being selected, even in the total absence of external signals. This results in the external behaviour being a motivated one.
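The effect of alfa can be made concrete with a small sketch. Expression (7) is not reproduced here, so the form used below is an assumption chosen to match the behaviour just described: the internal signal multiplies the sum of alfa and the weighted external signals, and the Drive feedback is added on top. The lesion factor tau is omitted, and the function and parameter names are hypothetical.

```python
def congruence_intensity(o_internal, external_terms, alfa=0.0,
                         o_drive=0.0, fa_drive=0.0, fa_internal=1.0):
    """Assumed form of the internal/external combination.

    external_terms is an iterable of (coupling_strength, signal) pairs for the
    external signals associated with this internal state.
    """
    external = sum(fa * o for fa, o in external_terms)
    return fa_internal * o_internal * (alfa + external) + fa_drive * o_drive


# With alfa = 0 the interaction is purely multiplicative: a strong need with
# no matching external signal produces no congruence signal at all.
purely_multiplicative = congruence_intensity(0.8, [(1.0, 0.0)], alfa=0.0)

# With alfa > 0 the same strong need still produces a signal, so the
# behaviour can be selected in the absence of external signals.
motivated = congruence_intensity(0.8, [(1.0, 0.0)], alfa=0.5)
```

Under this assumed form, a weak internal state also scales down a strong external signal, which is the other half of the multiplicative behaviour described above.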

Once the external and internal signals are combined by the intero/extero/drive congruence internal behaviour and the resulting signals are placed on the Intero/Extero/Drive Congruents level, a competition process takes place in order to select the consummatory preferent signal which will be finally placed on the Drive level and sent to the other node. The first of these two processes is carried out by the consummatory preferences selector internal behaviour, whereas the second process is executed by the transmitter mechanism.

The consummatory preferences selector internal behaviour is composed of a set of specific elementary behaviours associated with specific needs and a default elementary behaviour. The condition of an elementary behaviour is satisfied when the signal Ci has been created at the Intero/Extero/Drive Congruents level, or its intensity AiC has been updated, and, in either case, the value AiC surpasses a previously established threshold theta. All elementary behaviours that have satisfied this condition enter a winner-take-all competition, which decides which elementary behaviours will execute their final action on the Drive level. The final action consists in the creation of the signal Di with intensity OiD, the last value being calculated from expressions (8) and (9).



(8)

(9)



where AiD is the intensity of the signal to be inscribed on the Drive level, and OiC is the value of the signal from the Intero/Extero/Drive Congruents level, inhibited by the rest of the signals OjC multiplied by a negative coupling strength FaijC.

If no element fulfills the condition OiD > theta, then there will be no winner behaviour and the competition ends without a specific behaviour executing its final action on the Drive level. When more than one elementary behaviour has satisfied the condition OiD > theta, then the competition takes place until it converges to a state in which only one elementary behaviour will be the winner. This happens by successive actions of the intero/extero/drive congruence and the consummatory preferences selector internal behaviours.
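The competition can be sketched as an iterated mutual inhibition. This is only an illustration under stated assumptions: expressions (8) and (9) are not reproduced here, the uniform inhibition constant stands in for the negative coupling strengths FaijC, and the function name and default values are hypothetical.

```python
def winner_take_all(signals, inhibition=0.2, theta=0.1, max_iters=100):
    """Return the name of the single surviving signal, or None.

    signals maps a need name to its intensity. Each round, every candidate is
    inhibited by the sum of the others (assumed uniform negative coupling);
    candidates that drop to theta or below leave the competition.
    """
    active = {name: v for name, v in signals.items() if v > theta}
    for _ in range(max_iters):
        if len(active) <= 1:
            break
        total = sum(active.values())
        updated = {
            name: v - inhibition * (total - v) for name, v in active.items()
        }
        active = {name: v for name, v in updated.items() if v > theta}
    if len(active) == 1:
        return next(iter(active))
    return None


# Hunger and thirst both pass the threshold; repeated inhibition leaves
# only the stronger one. Fatigue never enters the competition.
drive = winner_take_all({"hunger": 0.9, "thirst": 0.6, "fatigue": 0.05})
```

Exactly tied signals inhibit each other symmetrically in this sketch and may both fall below the threshold, in which case no winner is returned; the sketch does not attempt to model how BeCA resolves such cases.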

The communication between both nodes is carried out by the receptor and transmitter mechanisms. Each node has a receptor and a transmitter.

As part of the construction of this third layer, a new default external behaviour is incorporated: the explore external behaviour, preserving the default external behaviour defined in the second layer. The explore external behaviour is oriented towards searching for a specific external signal that is required to satisfy an imperative internal need. The explore external behaviour might be executed when, for a signal received from the Drive level and placed on the Consummatory Preferents level, there is no corresponding signal at the Perceptual Persistents level. That is, when there is an internal need to be satisfied for which the corresponding external signal has not yet been perceived.
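The condition for exploring can be stated as a simple check over the two blackboard levels involved. The sketch below is a hypothetical illustration: the dictionary representation of the levels and the function name are assumptions, and it only reports which needs lack an external signal, without modelling the competition that decides whether explore is actually executed.

```python
def unmet_needs(consummatory_preferents, perceptual_persistents):
    """List the needs that have a drive signal but no matching external signal.

    Both arguments map a need or stimulus name to a signal value
    (hypothetical representation of the two blackboard levels).
    """
    return sorted(
        name
        for name, drive_value in consummatory_preferents.items()
        if drive_value > 0.0 and perceptual_persistents.get(name, 0.0) <= 0.0
    )


# The agent is thirsty but perceives only food, so exploring for water
# becomes a candidate behaviour.
candidates = unmet_needs({"water": 0.7}, {"food": 0.4})
```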

Reflex behaviours are still controlled by reflex actions, and reactive behaviours can be implemented in columns with a gammai greater than zero.

Figure 9 shows the components of BeCA once the improvement and refinement processes concerning the construction of the third layer are concluded. At this stage of the bottom-up construction, BeCA is able to model three types of external behaviours: reflex, reactive and motivated, the last two of these mediated by action selection. As can be seen in Figure 9, we have named the node related with the external medium cognitive, and the node related with the internal medium motivational. The cognitive node interacts directly with the external medium through its sensors and actuators. The motivational node is responsible for the different processes related with the motivations that take place at the level of this node.





3.8. Modelling Learning

Adaptation is one of the desirable characteristics in behaviour production. Adaptation in a behaviours production system can be obtained from three main approaches: preprogrammed adaptive behaviours, learned adaptive behaviours, and evolved adaptive behaviours (Meyer and Guillot, 1990). In our BPS, the preprogrammed adaptive behaviours were obtained from the construction of the three layers discussed in Sections 3.5, 3.6 and 3.7, whereas the learned adaptive behaviours can be seen as a refinement process of these layers. In this sense, we will refine the functionality of the BPS to provide it with two types of adaptation by learning: (1) associative learning, which allows new behaviours and emergent properties to arise within BeCA, thus increasing its level of adaptiveness, and (2) a simple reinforcement learning approach, which allows the motivation degree in the behaviour to be adjusted dynamically.



3.8.1. Associative learning



All external signals sensed and preprocessed by the perceptual system are projected at each instant on the External Perceptions level of the cognitive node. Several of these external signals represent environment stimuli that are able to trigger the execution of reflex, reactive, or motivated behaviours, whereas other external signals are not associated with the execution of external behaviours. External signals of this last type are frequently known as neutral stimuli, because they do not have an initial meaning for the agent (i.e. they are not able to produce a behaviour). Associative learning is concerned with the acquisition of meaning by these stimuli under certain conditions explained below. The two forms of associative learning to be incorporated in BeCA are classical primary conditioning and classical secondary conditioning.

Classical primary conditioning can be explained in the following terms: if an initially neutral stimulus appears before each presentation of an unconditioned stimulus (US), the neutral stimulus will be associated with the unconditioned stimulus, and it will therefore become able to produce the same response produced by the unconditioned stimulus. This stimulus, initially neutral, is now called a conditioned stimulus (CS) (Kandel, 1976; Kandel, 1985).

In BeCA, neutral stimuli are initially not able to form behavioural columns through all the levels crossed by unconditioned stimuli. In this sense, the principle used to model associative learning in BeCA consists in the modification of the coupling strength values (Fa) of certain elementary behaviours, in order to form behavioural columns. In this way, the trajectory through the different levels of the two blackboards initiated by a neutral stimulus projected on the External Perceptions level of the cognitive node will be able to reach the Actions level.

The modification of the coupling strengths of elementary behaviours takes place when the following two events occur: (1) a neutral stimulus is projected on the External Perceptions level of the cognitive node before each projection of an unconditioned stimulus (for example, water source, food source, etc.), and (2) a signal (representing an internal need) associated with the unconditioned stimulus is projected from the Drive level of the motivational node to the Consummatory Preferents level of the cognitive node. These modifications of the coupling strengths can be seen as learning processes, and they are similar to the adjustment of the connection weights in an artificial neural network.

The classical conditioning in BeCA is expressed in terms of three types of learning, which are referred to as: (1) learning of the motor action pattern, consisting in the modification of coupling strengths of the external behaviour selector elementary behaviours, (2) learning of the biological meaning, which consists in the modification of the coupling strengths of the intero/extero/drive congruence elementary behaviours, and (3) learning at a motivational level, which consists in the modification of coupling strengths of the attention to preferences elementary behaviours. The rule for the modification of the coupling strengths used by the three types of learning is given by expression (10).



(10)



where Faij is the modifiable coupling strength of the elementary behaviour i with respect to the signal j; Ojin is the value associated with the condition signal j; Oiout is the value associated with the action signal i; beta is a parameter, between 0 and 1, that determines the proportion taken from the coupling strength of the previous instant; lambda is a factor regulating the speed of the conditioning; and mu, also between 0 and 1, determines the speed of the extinction of conditioning. The first part of expression (10) regulates the conditioning, whereas the second part regulates the extinction of the conditioning. For values of beta equal to mu, the two parts of this equation could be reduced to the first.

The coupling strengths will be modified when Oiout is greater than zero. If Ojin is also greater than zero, the coupling strength will be increased, while if Ojin is equal to zero, the coupling strength will be decreased.
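The rule can be illustrated with a minimal sketch. Expression (10) is not reproduced here, so the two-part form below is an assumption built from the description of its parameters: a conditioning part that keeps a proportion beta of the previous strength and adds a Hebbian-like term scaled by lambda, and an extinction part that keeps a proportion mu. The function name and default values are hypothetical.

```python
def update_coupling(fa, o_in, o_out, beta=0.9, lam=0.1, mu=0.8):
    """Assumed two-part update of a coupling strength.

    fa    : coupling strength at the previous instant
    o_in  : value of the condition signal j
    o_out : value of the action signal i
    """
    if o_out <= 0.0:
        # No action signal: the coupling strength is left untouched.
        return fa
    if o_in > 0.0:
        # Conditioning: decayed previous strength plus a co-activity term.
        return beta * fa + lam * o_in * o_out
    # Extinction: the action occurred without the condition signal.
    return mu * fa


conditioned = update_coupling(0.5, o_in=1.0, o_out=1.0)   # increases
extinguished = update_coupling(0.5, o_in=0.0, o_out=1.0)  # decreases
unchanged = update_coupling(0.5, o_in=1.0, o_out=0.0)     # stays at 0.5
```

Under this assumed form, setting beta equal to mu makes the extinction branch a special case of the conditioning branch with Ojin equal to zero, which agrees with the remark that the two parts could then be reduced to the first.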

As can be seen in Figure 10 and Figure 11, each of these learning processes is able to form or reinforce a segment of the behavioural column corresponding to a neutral stimulus. Figure 10 shows the trajectory segments crossed by neutral stimuli (grey solid circles) and unconditioned stimuli (black solid circles) before classical conditioning. Figure 11 shows the trajectory segments crossed by the neutral stimulus (now a conditioned stimulus) when the three types of learning have occurred.



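[Figure 10. Trajectory segments crossed by neutral stimuli (grey solid circles) and unconditioned stimuli (black solid circles) before classical conditioning.]

[Figure 11. Trajectory segments crossed by the neutral stimulus (now a conditioned stimulus) after the three types of learning have occurred.]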



Secondary conditioning is another type of associative learning incorporated in BeCA, as a part of the refinement process of the three created layers. This type of conditioning can be described in the following terms: if an initially neutral stimulus appears before each presentation of a stimulus that was already conditioned (CS), the neutral stimulus will become a conditioned stimulus. Thus, the neutral stimulus will be able to evoke the external behaviour previously evoked by the CS, becoming itself a CS as well. In other words, in secondary conditioning the role of the unconditioned stimulus (US) is played by the previously conditioned stimulus (CS) (Kandel, 1976; Kandel, 1985).

In BeCA, the events that originate the secondary conditioning process are the same as the ones already described for primary classical conditioning. The main difference between both processes of conditioning can be explained in the following terms. In primary classical conditioning, the stimulus that plays the role of the conditioner is by nature a US. This US is able to evoke an external behaviour without the need of a previous learning process (conditioning). This means that the behavioural columns have been previously established. The trajectories of the columns are given by the high values of the coupling strengths of the elementary behaviours associated with each column. In secondary conditioning, the stimulus that plays the conditioner role is a CS, that is, a stimulus that was initially neutral but was conditioned by a US in a previous process of primary classical conditioning. Although the behavioural columns for this CS have already been created as well, they were not preestablished, but were created through the process of primary classical conditioning instead (learning of the motor action pattern, learning of the biological meaning and learning at a motivational level). In this way, the second neutral stimulus would be able to form a behavioural column. The main properties of primary and secondary conditioning in BeCA are mentioned in Section 3.9.



3.8.2. Dynamic adjustment of the motivation degree



The other type of learning included in BeCA consists in the dynamic adjustment of the parameter alfa in the model for the combination of internal and external stimuli, used by the intero/extero/drive congruence internal behaviour, presented in Section 3.7. This parameter regulates the degree to which the external behaviour executed by the agent depends on his internal state. We have named this parameter "motivation degree". The effects produced by values of alfa equal to zero or alfa near to one in the observed external behaviour were already discussed in Section 3.7. We will rewrite expression (7) taking into account that there is a different parameter alfa for each intero/extero/drive congruence elementary behaviour. Expression (11) incorporates this change.



(11)



The learning model that controls the dynamic adjustment of the motivation degree (parameter alfai in expression (11)) can be explained in the following terms: when there is a strong internal state (e.g. need, goal) and the external signal able to satisfy this need is not present, parameter alfai is reinforced, so that the external behaviour related with column i begins to be more motivated. Therefore, after this adjustment, the BPS will begin to attach more importance to the internal state which has not been satisfied. In this case, the default exploratory behaviour can be activated for weaker values of the internal need, although other external signals may have already been perceived. When an external signal is perceived and the value of the internal state associated with this signal is irrelevant, the parameter alfai of expression (11) is decreased, and the behaviour production begins to become less motivated. Therefore, in later situations with this adjustment, the BPS will begin to give more importance to the available external signal, although the associated internal state is not a strong one. In this case, the default exploratory behaviour will need stronger values of the internal need to be activated. In any other case the parameter alfai is not modified. These cases are represented in expression (12).



(12)



where OiE is the value of the internal signal from the Internal Perceptions blackboard level, Σj FaijS OjS represents the value of all the external signals that are associated with the internal state i, and theta is a threshold value.
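The three cases can be sketched as a small decision function. Expression (12) is not reproduced here, so the exact comparisons are assumptions: we take "strong" and "present" to mean above the threshold theta, and "irrelevant" and "not present" to mean at or below it. The function name and the string return values are hypothetical.

```python
def alfa_adjustment(o_internal, external_terms, theta):
    """Decide how the motivation degree of a column should change.

    external_terms is an iterable of (coupling_strength, signal) pairs for the
    external signals associated with this internal state.
    Returns "increase", "decrease" or "keep".
    """
    external = sum(fa * o for fa, o in external_terms)
    if o_internal > theta and external <= theta:
        # Strong need, nothing in sight to satisfy it: scarce environment.
        return "increase"
    if external > theta and o_internal <= theta:
        # Satisfier available but the need is irrelevant: rich environment.
        return "decrease"
    return "keep"


scarce = alfa_adjustment(0.8, [(1.0, 0.0)], theta=0.2)
rich = alfa_adjustment(0.1, [(1.0, 0.9)], theta=0.2)
neither = alfa_adjustment(0.8, [(1.0, 0.9)], theta=0.2)
```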

The increase in the value of parameter alfai is determined by expression (13). This increase can be seen as a hyperbolic divergence from alfamin, as seen in Figure 12. In expression (13), delta determines the length of the divergence (how much time it will take alfai to go from alfamin to alfamax), and rho determines the speed of the divergence. This smooth modification behaviour simulates a historic memory of the environment (remembered scenario), so that the value of alfai is increased only after several iterations within a certain environment.



(13)

[Figure 12. Increase of alfai as a hyperbolic divergence from alfamin.]



The decrease of the parameter alfai is determined by expression (14). This is similar to the increase described by expression (13), except that it diverges hyperbolically from alfamax, as seen in Figure 13.



(14)

[Figure 13. Decrease of alfai as a hyperbolic divergence from alfamax.]



In Figure 14, we can see an example of the behaviour of the parameter alfai, as it is increased, decreased, or remains constant, depending on the perceived scenario and the internal needs. Note that the increase is faster than the decrease, because the values of alfai are closer to alfamax than they are to alfamin.
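The asymmetry between increase and decrease can be reproduced with a simple stand-in for the two update rules. Expressions (13) and (14) are not reproduced here, so the step rule below is only an assumption that captures the idea of diverging from a bound: the further alfai already is from the bound it is leaving, the larger the step. The parameter delta is not modelled, and eps is a hypothetical offset that lets the value leave the bound in the first place.

```python
def increase_alfa(alfa, alfa_min=0.0, alfa_max=1.0, rho=0.1, eps=0.01):
    """Move alfa away from alfa_min; the step grows with the distance
    already covered (assumed stand-in for the divergence from alfa_min)."""
    step = rho * (alfa - alfa_min + eps)
    return min(alfa_max, alfa + step)


def decrease_alfa(alfa, alfa_min=0.0, alfa_max=1.0, rho=0.1, eps=0.01):
    """Move alfa away from alfa_max; the step grows with the distance
    already covered (assumed stand-in for the divergence from alfa_max)."""
    step = rho * (alfa_max - alfa + eps)
    return max(alfa_min, alfa - step)


# Near alfa_max, one increase moves alfa further than one decrease does.
alfa = 0.8
up_step = increase_alfa(alfa) - alfa
down_step = alfa - decrease_alfa(alfa)
```

With these values the upward step is roughly four times the downward one, which matches the observation that increases are faster than decreases when alfai is close to alfamax.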



[Figure 14. Example of the behaviour of alfai as it is increased, decreased, or remains constant.]



A detailed discussion related to the learning of the motivation degree in BeCA can be found in (Gershenson and González, 2000).



3.9. Properties of BeCA



"... and many are not amazed because they do not know about it"

--José Luis Mateos



One of the most notable properties of a behaviours production system like BeCA is that, although it is formed of elementary behaviours, none of which has by itself a significance in the survival of the creature it is controlling, the elementary behaviours interact in such a way that emergent behaviours are produced from this very interaction. In a similar way, words in a spoken language may have little meaning by themselves, but since they have enormous possibilities of combination, an almost infinite number of meanings can be created with them. The articulation of different elementary behaviours in behavioural columns gives the BPS the possibility to produce a wide variety of behaviours and behaviour patterns, which are not selected, but emergent.

BeCA is a context-free BPS. This means that it can be implemented in different environments and problem domains. This is possible because BeCA is defined in a general way and it is independent of the motor and perceptual systems. To use BeCA as the behaviour production system of an artificial or virtual creature, the developers need only to connect the signals from the perceptual system and from the internal medium to BeCA, then define behavioural columns by setting appropriate coupling strengths, and finally connect the output to the motor system. The creature should present the properties of animal behaviour described in this section. Refinements of the resulting system would lead to still more emergent properties.

Our BPS is robust. When different components of BeCA are lesioned (Gershenson, González and Negrete, 2000b), its functionality degrades "gracefully".

BeCA is able to model reflex behaviours. External signals perceived by the reflex actions internal behaviour will be directly sent to the Actions level of the cognitive node.

BeCA presents regulated reactive behaviours. The parameter gammai in the attention to preferences internal behaviour regulates how reactive a behavioural column will be. If gammai is equal to zero, then the behavioural column will not be reactive.

Our BPS has motivated behaviours: regulated and/or learned by reinforcement (Gershenson and González, 2000). The parameter alfai in the intero/extero/drive congruence internal behaviour regulates the degree of the motivation of an external behaviour. If alfai is near zero, the behaviour will be less motivated than if it is near one. The wealth or scarcity of the environment is taken into account in the learning of this parameter. If the environment is scarce, alfai will be increased. If the environment is rich, then alfai will be decreased.

BeCA has associative learning implemented within it. At this stage, primary and secondary conditioning are present, and the following properties emerge from the interaction of the different components of BeCA: blocking, decrease of the stimulus activity in time, overshadowing, extinction of the conditioning, reacquisition of the conditioning, temporal interruption of the conditioning, inhibition of the conditioning, and stronger conditioning for intermediate values of the inter-stimulus interval (González, 2000). BeCA also exhibits delay conditioning, in its two variants, and trace conditioning. In the first variant of delay conditioning, the length of the neutral stimulus (or the stimulus in the process of being conditioned) is equal to the inter-stimulus interval, whereas in the second variant the length of the neutral stimulus is equal to the inter-stimulus interval plus the length of the unconditioned stimulus. In trace conditioning, the presentation of the neutral stimulus terminates before the arrival of the unconditioned stimulus (Balkenius, 1994).

The following properties of BeCA are emergent:

Opportunism. The blackboard architecture allows the possibility of opportunistic behaviour to arise. The elementary behaviours take the opportunity to execute their actions when their conditions allow it.

Preactivation of internal behaviours. Once a competition is carried out at a motivational level, the winning signal will be sent to the Consummatory Preferents level of the cognitive node. If there is no external signal corresponding to the internal need represented by the winning signal, then the attention to preferences internal behaviour will be "preactivated", focusing attention on the satisfaction of the need. If a corresponding signal appears, the corresponding external behaviour will be executed without having to wait for signals from the motivational node.

Goal-directedness. If a behaviour is motivated, we can say that it is directed by the goals (or needs) of the entity.

No indecision in the action selection. The different competition processes ensure that there will be no indecision, or randomness, in the action selection. For example, if a creature controlled by BeCA has the same degree of hunger and the same degree of thirst, and has food and water in the same amount at the same distance, he will not decide randomly which behaviour will be executed. The motivations will compete until only one will be able to execute its corresponding behaviour.

Satiation. When a creature controlled by BeCA executes a consummatory behaviour to satisfy one of its needs, this need will decrease. Once the need is satisfied, it will not motivate the execution of the behaviour any longer.

Changes in responsiveness. When an internal need is satiated, the entity controlled by BeCA will have a change in its responsiveness, selecting a different behaviour.

Persistence in the execution of a consummatory behaviour. The feedback from the Drive/Perception Congruents level to the perceptual persistence internal behaviour allows the consummatory behaviour previously executed to have a higher possibility to be executed, as long as the external and internal signals corresponding to the behaviour are still strong. For example, if an agent controlled by BeCA is hungry and thirsty, and he finds food and water, there will be no switching from eating to drinking and back with every time step. The agent will execute a consummatory behaviour until the corresponding internal need is adequately satisfied.

Interruption in the execution of a consummatory behaviour. If a creature controlled by BeCA is executing a consummatory behaviour, this can be interrupted in the presence of a sudden need or a reflex or more imperative reactive behaviour. For example, if the creature is drinking, and he perceives a predator nearby, he might interrupt the satiation of his thirst in order to run away.

Varying attention. This property is defined by ethologists as the lesser importance that an animal gives to danger (e.g. a predator) when the animal has an extreme motivation (e.g. starvation) (McFarland, 1981). This property emerges from the competition at a motivational level. If an agent controlled by BeCA is very hungry, even if he is perceiving a predator, he may try to satisfy his hunger, because of the intensity of the signal representing the internal need.



3.10. About the Behavioural Columns Architecture



"All things are what one thinks of them"

--Metrodorus of Chius



The evolutionary bottom-up style of engineering of BeCA, and many of its properties, were facilitated by the blackboard architecture, which provides great flexibility and capacity of integration, and is intrinsically opportunistic.

There are many reasons to think that the BPS presented and discussed in previous sections is something more than a simple action selection mechanism. BeCA integrates in a single model an extensive repertoire of properties and principles desired in adaptive autonomous agents. Although different subsets of these properties can be found characterizing other ASMs and BPSs reported in the literature (Tinbergen, 1950; Tinbergen, 1951; Lorenz, 1950; Lorenz, 1981; Baerends, 1976; Brooks, 1986; Brooks, 1989; Rosenblatt and Payton, 1989; Maes, 1990; Beer, 1990; Beer, Chiel and Sterling, 1990; Hallam, Halperin and Hallam, 1994; Negrete and Martínez, 1996; Goetz and Walters, 1997), none of those systems presents all of the properties together, and the incorporation of all these properties in a single model provides great robustness in the behaviour production. The result is a BPS with a very high degree of adaptation.

The bottom-up and evolutionary approach followed in the construction of our BPS allows a given configuration of the BPS to be extended by incorporating new layers over existing layers, while preserving the capabilities of the previous ones. The newly incorporated layers define more complex types of behaviours, which are required when the problem to solve itself becomes more complex. In this sense, we could think that when the manipulation of concepts and logic is required in order to select behaviours, then cognitive behaviours could be incorporated into the BPS as a new layer over the layer of motivated behaviours, in the same way in which this last layer was incorporated over the layer of reactive behaviours, when the motivations were taken into account for the action selection. Of course, not only would the complexity of the BPS be increased, but we would also need to take into account other issues, such as societies, language, and culture. Therefore, our BPS is capable of being evolved when the problem to solve becomes more complex.

Our BPS is context-free, because it is independent of the motor and perceptual systems of the artificial creature to be controlled. Since the perceptual and motor systems are environment-dependent, our BPS can be easily used in different environments (robots, virtual animats, software agents, etc.), by just designing the appropriate perceptual and motor systems for the given environment of the artificial creature.

The two types of learning schemes present in the BPS, associative learning and dynamic adjustment of the motivation degree, were obtained through a refinement process of the previously defined layers. Both types of learning have improved the behaviour production, making it more adaptive. That is to say, our BPS is characterized by adaptation by learning (Meyer and Guillot, 1990). The associative learning allows new behaviours and emergent properties to arise, which increase the adaptive level of the BPS. The dynamic variation of parameter alfai in the model for the combination of external and internal stimuli (expression (11)) allows the autonomous agent to contend with an environment about which the agent possesses certain knowledge, which is summarized in the value of this parameter.

We can also say that BeCA presents emergent cognition, in a Turing style (Turing, 1950). That is, an observer of an artificial creature controlled by BeCA (e.g. animats in our Behaviours Virtual Laboratory) may judge that the creature knows what he is doing. Our intention was not that BeCA would provide cognition to an artificial creature, not even the simple cognition that emerges for observers, but it does. Of course it is low cognition, present in animal behaviour. But we believe that this cognition is also emergent in animals, and that higher cognition should also be emergent. Cognition is not a mechanism. It is an exhibition of capabilities. And this exhibition must be perceived by an observer in order to be considered as cognition.

BeCA was implemented in the Behaviours Virtual Laboratory, to be presented in Chapter 5, providing the behaviour production of animats. In the next section we will present a simple model of social action, which allows complex social phenomena to emerge from the interactions of agents.

In Chapter 6 we will present experiments showing some properties and capabilities of BeCA, of our model for social action and of our Behaviours Virtual Laboratory.

 


1. BeCA has been developed by Pedro Pablo González, José Negrete, Ariel Barreiro, and the author.

2. The names of internal behaviours in BeCA will be typed with italics throughout the text.

3. The names of the blackboard levels will be written starting with capitals.

4. The names of interface/communication mechanisms will be also typed with italics.



Carlos Gershenson