Learning

Cognition is the mental faculty by which we know the world, and cognitive psychology is concerned with the acquisition, representation, transformation, and utilization of knowledge by humans (and animals). Learning is the first step in that process.

In terms of human information processing, the mind performs a sequence of activities:

picking up information through sensory and perceptual processes;
storing this information in memory;
transforming it through thought;
communicating it through language; and
executing relevant actions through the skeletal musculature.

Traditionally, this activity was described in terms of the formation of associations of three types:

between environmental events;
between environmental events and the organism's responses to them; and
between the organism's own actions and their effects on the environment.

Reflexes, Taxes, and Instincts

Some of these associations are innate or inborn, part of the organism's native biological endowment.

Reflexes

The reflex is the simplest possible connection between an environmental stimulus and an organismic response. Examples are:

the patellar reflex, the "knee-jerk" reflex commonly tested during routine physical examinations;
the eyeblink reflex, where the eye closes in response to a puff of air striking the cornea;
the other spinal reflexes, such as those that are preserved in cases of paraplegia.

Reflexes are automatic, in that they occur inevitably in response to an adequate stimulus, and occur the first time that stimulus is presented. They do not require the involvement of the higher centers in the nervous system: they persist even when the spinal cord is severed from the brain. Most reflexes are fairly simple, but even fairly complicated activities can be reflexive in nature.

The 19th-century French physiologist Marie-Jean-Pierre Flourens conducted a series of classic studies of reflexes in the decorticate pigeon. He removed both lobes of the cerebral cortex in the bird, and then attempted to determine which patterns of behavior remained in the repertoire. Certain behaviors were preserved:

the animal righted itself when its equilibrium was disturbed;
it walked when it was pushed, and flew when thrown into the air;
it would swallow when water was introduced into its beak;
and when irritated, it would move away from the stimulus.

However, other behaviors disappeared:

it did not flee from the irritation;
it did not avoid obstacles placed in its path;
it showed no voluntary action (i.e., behavior in the absence of any apparent stimulus);
and it showed no signs of emotionality.

Thus, Flourens characterized the decorticate pigeon as a reflex machine that merely reacted to external stimulation by means of reflexes, but displayed no spontaneous or self-initiated behavior.

Human beings also come "prewired" with a repertoire of reflexes: automatic responses to stimulation that appear soon after birth, before the infant has had any opportunity for learning.

Some of these are reflexes of approach, which are elicited by weak stimuli and have the effect of increasing contact with the stimulus. Among these is rooting: when the infant's cheek is touched, it will turn its head in the direction of the touch and open its mouth; if its mouth makes contact with any object, it will close and begin to suck (this reflex will occur even if the infant is asleep or comatose).
Another reflex of approach is grasping: if the palm of the hand is touched, the fingers will flex and close around the object; the grasping reflex can be very strong.
Similarly, if the sole of the foot is touched, the response will be "plantarflexion": the toes will stretch and turn downward.
Other stimulus-response patterns are reflexes of avoidance, which are elicited by intense or noxious stimuli, and have the effect of decreasing contact with the stimulus.
For example, the infant's eyes will close automatically in response to a bright light, and the mouth will close at the introduction of an unpleasant taste (e.g., quinine).
If the palms or soles are scratched, pinched, or pricked, there will be spreading of the fingers or toes, and withdrawal of the hands or feet (in the case of the feet, the toes will also show "dorsiflexion", or turning upward -- the "Babinski reflex").
A very interesting set of behaviors is the stepping reflex. Infants appear to "learn to walk", but this appearance is deceiving. If the infant's body is supported, and it is moved forward along a flat surface, it will show synchronized stepping. If its toes strike the riser of a set of stairs, it will lift its feet. Infants don't really learn to walk: neonates can't walk simply because their skeletal musculature has not yet matured enough to support them.

Despite the large repertoire of reflexes, infants do not show much initiation of directed activity. The behaviors of the young infant are pretty much confined to reflexes, which are gradually replaced with voluntary action.

Reflexes are an important part of the organism's behavioral repertoire, but they have their limitations.

They permit only a small number of responses to be elicited by stimulation.
They do not permit action to be controlled by internal goals or intentions.

With subsequent development, reflexes tend to disappear. But they are not abolished entirely: the knee-jerk and rooting reflexes can be elicited in adults; and adult paraplegics display a full repertoire of reflexes. However, adult behavior is dominated by voluntary action, and reflexes slip into the background.

Reflexes involve relatively small portions of the nervous system. In principle, the reflex arc requires only three neurons -- though in practice, spinal reflexes involve entire afferent and efferent nerves, as well as the spinal cord. Other innate stimulus-response connections consist of more complicated action sequences that involve larger portions of the nervous system and skeletal musculature.

Taxes

A taxis (plural, taxes) is a gross orientation response: after presentation of a stimulus, the whole organism turns and moves. Taxes come in two forms:

In positive taxes, the organism moves toward the stimulus.

A common example is a moth flying into a candle.

In negative taxes, the organism moves away from the stimulus.

A common example is a cockroach scurrying out of the light (actually, it is responding to the slight breezes created by motion, such as a human entering a room and turning on a light, rather than to the light itself).

Phototaxes involve responses to light; geotaxes involve responses to gravity (these can be observed in worms and ants as they move up and down inclines).

There are actually lots of other taxes, which can be observed mostly at the cellular level:

Chemotaxes, or responses to the presence of certain chemicals in the environment;
Thermotaxes, or responses to warmth or cold;
Rheotaxes, or responses to the movement of fluids;
Magnetotaxes, or responses to magnetic fields; and
Electrotaxes, or responses to electrical fields.

Taxes are not simple reflexes, because they involve the entire skeletal musculature of the organism. But they are still innate, and involuntary.
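The difference between positive and negative taxes can be sketched in a few lines of code. This is only a toy illustration, not a model of any real organism: the one-dimensional "world", the function name taxis_step, and the step size are all assumptions made for the example.

```python
def taxis_step(position, stimulus, sign, step=1.0):
    """Return the organism's new position after one orienting movement.

    sign = +1 models a positive taxis (move toward the stimulus);
    sign = -1 models a negative taxis (move away from the stimulus).
    Positions are points on a line; all names here are hypothetical.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 (positive) or -1 (negative)")
    if position == stimulus and sign == 1:
        # Already at the stimulus: a positive taxis has nowhere left to go.
        return position
    # Direction that points from the organism toward the stimulus.
    toward = 1.0 if stimulus >= position else -1.0
    return position + sign * toward * step


# A moth (positive phototaxis) and a hypothetical light-avoiding organism
# (negative phototaxis) both start at 0.0, with a light at 5.0.
moth = taxis_step(0.0, 5.0, sign=1)
avoider = taxis_step(0.0, 5.0, sign=-1)
```

Note that the response depends only on where the stimulus is right now: nothing in the function refers to goals, memory, or past experience, which is the sense in which a taxis is innate and involuntary.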

Taxes and Reflexes in the Neonate Kangaroo

The behavior of the newborn kangaroo illustrates an effective combination of reflexes and taxes. The kangaroo, like all marsupials (e.g., the opossum), has no placenta. The female gives birth after one month of gestation, and carries the developing fetus in a pouch. But how does the fetus get into the pouch?

Immediately after birth, the newborn climbs up the mother's abdomen -- perhaps by virtue of a negative geotaxis. If it reaches the opening of the pouch, it reverses its behavior and climbs in -- maybe a positive geotaxis. If it does not encounter the opening of the pouch, it will continue climbing until it reaches the top, stop -- or maybe fall off -- and eventually die. The mother kangaroo has no way of helping the infant -- the appropriate behaviors simply aren't in her instinctual repertoire, and -- a point I'll expand on later -- she has no opportunity to learn them through trial and error.

Once in the pouch, if the neonate encounters a nipple, it will attach to it and begin to nurse -- probably a variant on the rooting reflex. If not, it will simply stop at the bottom of the pouch and eventually die.

Assuming that all goes well, the baby kangaroo emerges from the pouch after about six more months of gestation.

Note that the neonate gets in the pouch by its own automatic actions, with no assistance from its mother. The behavior is entirely under stimulus control, and if it fails to contact the appropriate stimulus it will simply die.

Instincts

Other innate behaviors involve more complicated action sequences, and more specific, discriminating responses. These are known as instincts or fixed action patterns. Instincts have several important properties. As a rule, they are:

complex, stereotyped patterns of action,
rigidly organized,
innate,
unmodified by learning,
species-specific (i.e., some species show them but others do not); and
universal within the species (i.e., every member of the species shows the behavior under appropriate conditions).

Instincts are studied by ethology, a branch of behavioral biology devoted to understanding animal behavior in natural environments, viewed from an evolutionary perspective. As a biological discipline, ethology asks four questions about behavior -- all of them variants on "Why does an animal behave the way it does?"

Causation: What are the mechanisms by which the behavior works?
Function: What is the survival value of the behavior?
Ontogeny: How does the behavior arise in the life of the individual?
Phylogeny: How did the behavior arise in the evolution of the species?

These constitute different levels at which behavior can be analyzed.

Note, however, the focus of ethology on behavior -- and, in particular, on natural behavior. Ethologists analyze animal behavior in its ecological and evolutionary context; they do experiments, but their experiments are performed under field conditions (or something very closely resembling them), not in the sterile confines of the laboratory. Ethologists are not really psychologists, because they are interested only in behavior, not in mind per se. Nevertheless, psychology is a big tent, and many ethologists have found their disciplinary home in a department of psychology, as well as in departments of biology (especially integrative biology as opposed to molecular and cellular biology).

A Nobel Prize for Ethology

Three important ethologists, Konrad Lorenz, Nikolaas (Niko) Tinbergen, and Karl von Frisch, won the 1973 Nobel Prize in Physiology or Medicine for their pioneering research on instincts (four years earlier, in 1969, Tinbergen's brother Jan had shared the first Nobel Prize in Economics for his pioneering research on econometrics). For an intellectual biography of Tinbergen, see Niko's Nature: A Life of Niko Tinbergen and His Science of Animal Behaviour by H. Kruuk (2003).

The concept of instinct is well illustrated in Konrad Lorenz's research on imprinting in newly hatched ducks and geese. Once out of the egg, the hatchling follows the first moving object it sees. This is usually the mother, but the hatchling will also follow a wooden decoy, a block of wood on wheels, or even a human -- provided that it is the first moving object that the bird sees. The emphasis on the "first" moving object is somewhat overstated, because there is a critical period for imprinting: the imprinted object must be present soon after hatching; if exposure to a moving object is delayed for several hours or days, imprinting may not occur at all. If imprinting occurs, the imprinted object will be followed even under adverse circumstances, over or around barriers, etc. When the imprinted object is removed from the bird's field of vision, the bird will emit a distress call. If imprinting has occurred to an unusual object, that object will be preferred to the bird's actual parent, or any other conspecific animal.

Link to a film of Lorenz demonstrating imprinting: http://www.youtube.com/watch?v=eqZmW7uIPW4.

The power and perils of imprinting are vividly illustrated by an incident that occurred in Spokane, Washington, in 2009. George Armstrong, a banker, had been watching a female duck nesting on a ledge outside his office window. In the usual course of events, the ducklings would hatch, imprint on their mother, and then follow her as she led them to water. But -- they're on a ledge! And they can't fly yet! The mother duck knew nothing of this. She's built to wait until her eggs have hatched, and then go to water; and ducklings are built to follow her. The mother jumped off the ledge and -- she's built for that, too -- flew down to the street. The ducklings were stranded. Armstrong went out on the street, stood below the ledge, and caught each of the ducklings as they stepped off the ledge, instinctually following their mother (actually, he had to collect a couple from the ledge). Then he served as a crossing guard while the mother collected her young and led them to water. The power of imprinting is that the ducklings will follow their mother -- or Konrad Lorenz -- everywhere. The peril of imprinting is that the behavior has been selected for a particular environmental niche -- in the case of ducks, the grassy area near water where they usually nest; if that environment changes, for whatever reason, the instinctive behavior may be very maladaptive.

Link to a video of Armstrong catching the ducks: http://blogs.abcnews.com/theworldnewser/2009/05/the-duck-parade.html (sorry about the ad).

There are actually two kinds of imprinting.

What we've been discussing is filial imprinting, which concerns the relationship between adult and youngster.
There is also sexual imprinting, which concerns the relations between males and females. At the International Crane Foundation, in Baraboo, Wisconsin, there once lived a whooping crane named Tex (now deceased), who had imprinted on humans and bonded with the Foundation's co-founder, George Archibald. In order to bring Tex, a female, into breeding condition, Archibald had to perform an imitation of the crane mating dance!

Imprinting is extremely indiscriminate: basically, the bird imprints on the first object that moves within the critical period. However, other instincts are much more discriminating.

Another good example of an instinct is the alarm reaction in some birds subject to predation by other birds (studied by Tinbergen). If an object passes overhead, the birds will emit a distress call and attempt to escape. However, these birds do not show alarm to just any stimulus: it must have a birdlike appearance; moreover, birdlike figures with short (hawk-like) necks elicit alarm, while those with long (goose-like) necks do not (the length and shape of the tail and wings are largely irrelevant).

Imprinting and the alarm reaction involve, basically, only one organism. Other instincts involve the coordinated activities of two (or more) species members.

A good example is food-begging in herring gulls (studied by Tinbergen). Hatchling birds don't forage for their own food, but must be fed a predigested diet by their parents. But the parents do not do this of their own accord. Rather, the chick must peck at the parent's bill: the parent then regurgitates food, and presents it to the chick; the chick then grasps the food and swallows it. But the chick will not peck at any bird-bill. Rather, the bill must have a patch of contrasting color on the lower mandible. The precise colors involved do not matter much, so long as the contrast is salient. Food-begging exemplifies the coordination of instinctive behaviors: the patch is the releasing stimulus for the hatchling to peck; and the peck is the releasing stimulus for the parent to present food.

An excellent example of a complex, coordinated sequence of instinctual behaviors is provided by the "zig-zag" dance, part of the mating ritual of the stickleback fish (Tinbergen).

A male stickleback, when it is ready to mate, develops a red coloration on its belly.

He then establishes his territory by fighting off other sticklebacks. But he fights only sticklebacks, not other species of fish; and only males; and only males who display red bellies and enter his territory in the head-down "threat posture" (other colorations indicate that the other male is not ready to mate; other postures indicate that the other male is only passing through the territory; in either case, there is no territorial fighting).

Experiments by Tinbergen, employing "dummy" models of fish, show that it actually doesn't matter much whether the other fish looks like a stickleback, so long as it has a red-colored belly. Sticklebacks without red bellies may enter the male's territory, because they don't constitute threats.

Other experiments, in which fish were enclosed in capsules to control their orientation, show that a male who elicits aggression when it enters a territory with its head down will not elicit aggression if it enters the territory with its head level -- perhaps indicating that it is just "passing through".

After the territory has been cleared of threatening males, the male builds a nest out of weeds.

Then he entices a female into the nest -- but only a female stickleback who enters his territory with a swollen abdomen, and in the head-up "receptive posture".

The female enters the nest only if the male displays a red belly, and performs a "zig-zag" dance.

Once in the nest, the female spawns eggs -- but only if she is stimulated at her hind quarters.

Once the eggs are laid, the female leaves the nest and the territory.

The male fertilizes the eggs, fans them to maintain an adequate oxygen supply around them, and cares for the young after hatching (until they're ready to go off to school).

When the young are hatched, the red belly fades, and the male no longer incites males or attracts females -- until the next mating cycle starts.

Notice the serial organization of this pattern of stickleback behaviors. It is as if each act is the releasing stimulus for the next one. There is no flexibility in this sequence: once initiated, it does not stop, provided that the appropriate releasing stimulus is present. If any element in the sequence is left out, the entire sequence will stop abruptly. All three parties go through this pattern of behaviors, even if one of them doesn't remotely resemble a stickleback. For example, a female, ready to mate, will enter the nest if she observes a tongue depressor, painted red on one half, imitate the zig-zag dance!
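One way to picture this serial organization is as a chain of (releasing stimulus, response) pairs, in which each response is performed only if its releaser is present, and the whole sequence halts at the first missing releaser. The sketch below is purely illustrative: the step labels are shorthand for the stages described above, and the names CHAIN and run_chain are assumptions made for the example, not anything drawn from Tinbergen's work.

```python
# Each link pairs a releasing stimulus with the response it releases.
# The labels are hypothetical shorthand for the stages described in the text.
CHAIN = [
    ("female_swollen_head_up", "male_zigzag_dance"),
    ("male_red_belly_zigzag", "female_enters_nest"),
    ("female_hindquarters_stimulated", "female_spawns"),
    ("eggs_in_nest", "male_fertilizes_eggs"),
]


def run_chain(chain, stimuli_present):
    """Return the responses performed, in order.

    The sequence is rigid: it continues only while each releasing
    stimulus is present, and stops abruptly at the first one missing.
    """
    performed = []
    for releaser, response in chain:
        if releaser not in stimuli_present:
            break  # a missing releaser halts everything that follows
        performed.append(response)
    return performed


all_stimuli = {releaser for releaser, _ in CHAIN}
complete = run_chain(CHAIN, all_stimuli)

# Leave out the stimulation at the hind quarters: the female enters the
# nest, but no eggs are spawned and nothing later in the chain occurs.
interrupted = run_chain(CHAIN, all_stimuli - {"female_hindquarters_stimulated"})
```

Notice that the function checks only whether a releaser is present, not what produced it. That is why, in this sketch as in the experiments, a red-painted tongue depressor would serve as well as a real male.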

Instincts in Humans?

Taxes and instincts are important elements in behavior, especially of invertebrates, birds, and reptiles. Some psychologists and behavioral biologists argue that much human behavior is also instinctual in nature. One of the first to make this argument was McDougall, who argued that human behavior was rooted in instinctual behaviors related to biological motives. One of his examples, which is offered here without comment (except to note that similar descriptions could be made of the behavior of men), is reminiscent (at least in tone) of what Tinbergen discovered in sticklebacks:

The flirting girl first smiles at the person to whom the flirt is directed and lifts her eyebrows with a quick, jerky movement upward so that the eye slit is briefly enlarged. Flirting men show the same movement of the eyebrows. After this initial, obvious, turning toward the person, in the flirt there follows a turning away. The head is turned to the side, sometimes bent toward the ground, the gaze is lowered, and the eyelids are dropped. Frequently, but not always, the girl may cover her face with a hand and she may laugh or smile in embarrassment. She continues to look at the partner out of the corners of her eyes and sometimes vacillates between looking at, and looking away.

Among modern biological and social scientists, this point of view is expressed most strongly by the practitioners of sociobiology, especially E.O. Wilson, who argue that much human social behavior is instinctive, and part of our genetic endowment. More recently, similar ideas have been expressed by proponents of evolutionary psychology such as Leda Cosmides, John Tooby, and David Buss. At their most strident, evolutionary psychologists claim that our patterns of experience, thought, and action evolved in an environment of evolutionary adaptedness (EEA) -- roughly the African savanna of the Pleistocene epoch, where Homo sapiens first emerged about 300,000 years ago -- and have changed little since then. Although this assertion is debatable, to say the least, the literature on instincts makes it clear that evolution shapes behavior as well as body morphology. Many species possess innate behavior patterns that were shaped by evolution, permitting them to adapt to a particular environmental niche. Given the basic principle of the continuity of species, it is a mistake to think that humans are entirely immune from such influences -- although humans have other characteristics that largely free us from evolutionary constraints. For a discussion of evolutionary psychology, see the lectures on Psychological Development.

Meanings of "Instinct"

The concept of instinct has had a difficult history in psychology, in part because early usages of the term were somewhat circular: some theorists seemed to invoke instincts to explain some behavior, and then to use that same behavior to define the instinct. But, in the restricted sense of a complex, discriminative, innate response to some environmental stimulus, the term has retained some usefulness. For example, the psychologist Steven Pinker has referred to language as a human instinct.

Nevertheless, the term instinct has evolved a number of different meanings, as outlined by the behavioral biologist Patrick Bateson (Science, 2002):

present at birth (or at a particular stage of development);
not learned;
developed before it can be used;
unchanged once developed;
shared by all members of the species (at least those of the same sex and age);
organized into a distinct behavioral system (e.g., foraging);
served by a distinct neural (brain) module;
adapted during evolution;
differentiated across individuals due to their possession of different genes.

Bateson correctly notes that one meaning of the term does not necessarily imply the others. Taken together, however, the various meanings capture the essence of what is meant by the term "instinct".

From Instinct to Learning

Innate response tendencies such as food-begging can be very powerful behavioral mechanisms, especially for invertebrates and non-mammalian vertebrate species. In their natural environment, some species seem to live completely by virtue of reflex, taxis, and instinct.

Limitations on Innate Behaviors

But at the same time, these innate behavioral mechanisms are extremely limited. They have been shaped by evolution to enable the species to fit a particular environmental niche, which is fine so long as the niche doesn't change. When the environment does change, evolution requires an extremely long time to change behavior (or body morphology, for that matter) accordingly -- much longer than the lifetime of any individual species member.

Consider, for example, the behavior of newborn sea turtles. Female turtles lay their eggs on the beach above the tide line, and these eggs hatch at night in the absence of the parents. As soon as they have hatched, the hatchlings begin walking toward the water (what you might call a "positive aquataxis"): when they reach it, they begin to swim (another innate behavior), and live independently. However, the young turtles are not really walking toward the water: they are walking toward the reflection of the moon on the water (thus, a positive phototaxis). This hatching behavior evolved millions of years ago. Since then, however, the beaches where the turtles hatch have become crowded with hotels, marinas, oil refineries, and other light sources. Accordingly, these days, the hatchling turtles will also move toward these light sources, and die before they ever reach water. The animals' behavior evolved when the only light in the environment was from the sun and the moon, and they just don't know any better. In order to prevent a disaster, beach-side hotels and oil refineries now take steps to employ different kinds of light, or block their lights entirely.

Now, perhaps there is some subtle difference (like polarization) between moonlight and electric light. If so, individual animals who can make this distinction, moving toward one and not the other, will survive, reproduce, and, over time, generate more individuals who can make this distinction. But this takes time -- assuming that any individual can make the distinction in the first place -- and each individual gets only one chance. If it makes the right "choice", this behavioral tendency will pass on to successive generations, and the species may eventually come to distinguish between "good" and "bad" light -- provided that the species doesn't go extinct first. That just illustrates the point that evolved behavior patterns take a very long time to change.

In June 2011, a group of diamondback terrapins caused the temporary shutdown of Runway 4 Left at New York's Kennedy International Airport. And it's happened before. The runway crosses a path that the turtles take from Jamaica Bay on one side to lay their eggs on the sandy beach on the other side. Usually, in egg-laying season, the runway is not in frequent use, due to prevailing winds. But that day was an exception, and the turtles brought takeoffs and landings to a halt for about an hour until they could be moved to their destination (we don't know what happened when they tried to get back in the water). It's another example of the difficulty that animals have in adjusting evolved patterns of behavior to rapidly changing environmental circumstances. (See "Delays at JFK? This Time, Blame the Turtles" by Andy Newman, New York Times 06/30/2011).

Here's another example: seabirds, like albatrosses, feed their young through the same sort of instinctual food-begging shown by herring gulls. Adult albatrosses forage over open water, dive to catch fish swimming near the surface, and then regurgitate the fish into the mouths of their young. But it's not only fish that are near the surface. There's a lot of garbage in the ocean, as well. The birds don't know the difference -- they're operating solely on reflex. That garbage is of relatively recent vintage, so there hasn't been enough time -- assuming it were even possible -- for the birds to evolve a distinction between fish and garbage. The result is that adult albatrosses pick up garbage and regurgitate it into the bills of their chicks, who promptly die of starvation -- such as this albatross chick photographed on Midway Atoll in the Pacific.

And here's yet another example, a little closer to home. Wind farms like the one in Altamont Pass produce a large amount of electrical energy for California, reducing carbon emissions from coal-fired plants, and our dependence on Middle East oil. But they also create a hazard for birds, especially raptors, who like to forage for small mammals over open areas. Never mind that wind farms are built where there is strong, steady wind, and therefore often on migratory flight paths. The result is that a large number of raptors and other birds are killed every year because they run into the blades of the windmills.

In general, we can identify several limitations on innate response patterns:

The releasing stimulus must be physically present in the current environment. There is no way for the animal to respond to an image or idea or memory of a releasing stimulus.
Instincts and similar fixed action patterns only permit responses to be elicited by external stimuli; they do not permit action to be directed by internal goals.
Because the response patterns are built in over evolutionary time, the organism cannot respond flexibly to new stimuli, or quickly generate new behaviors in response to old or new stimuli.

Thus the problem: everyday life requires many organisms to go beyond simple, innate patterns of behavior, and acquire new responses to new stimuli in their environment.

Evolutionary Traps

Ecologists and evolutionary biologists are becoming increasingly aware of the problems caused by rapid environmental change. The United Nations Summit on Sustainable Development, held in Johannesburg, South Africa, in 2002, drew international attention to the fact that "nature", far from being "natural", has in fact been remade by human hands. According to Andrew C. Revkin, "People have significantly altered the atmosphere, and are the dominant influence on ecosystems and natural selection" (see his article, "Forget Nature. Even Eden is Engineered", and other articles in a special section on "Managing Planet Earth", New York Times, 08/20/02). Even in the early part of the 20th century, Revkin notes, the geochemist Vladimir I. Vernadsky had suggested that "people had become a geological force, shaping the planet's future just as rivers and earthquakes had shaped its past". Now in the 21st century, with the growth of megacities, the increase in population, and the disappearance of the forests, to name just a few trends, we are beginning to recognize, and deal with, the impact of human activity on the environment.

The human impact on the environment doesn't just affect the conditions of human existence. Nature is a system, and what we do affects animal and plant life as well, and sometimes in non-obvious ways.

In a recent paper in Trends in Ecology & Evolution (10/02), Paul W. Sherman and his colleagues, Martin A. Schlaepfer and Michael C. Runge, detail a number of "evolutionary traps", mostly caused by the impact of human activity which alters the natural environment -- activity which goes beyond the simple destruction of habitat, which would be bad enough. More subtle changes alter the environment in such a way that a species' evolved patterns of behavior are no longer adaptive, reducing the chances of individual survival and reproduction, and eventually leading to the decline and extinction of the species as a whole. As Sherman puts it, "Evolved behaviors are there for adaptive reasons. If we [disrupt] the normal environment, we can drive a population right to extinction" ("Trapped by Evolution" by Lila Guterman, Chronicle of Higher Education, 10/18/02).

The concept of evolutionary trap is a variant on the more established notion of an ecological trap, in which animals are misled, through human environmental change, to live in less-than-optimal habitats, even though more suitable habitats are available to them. For example, Florida's manatees have progressively moved north, attracted by the warm water discharged by power plants; but when the plant goes down for maintenance, the water cools to an extent that they can no longer survive in it.

Some examples of evolutionary traps:

The male buprestid beetle (Julodimorpha bakewelli) of Australia recognizes the female of its species as a brown, shiny object with small bumps on its surface. However, this is also what some Australian beer bottles look like. Accordingly, males will frequently be found attempting to mate with beer bottles, instead of with more appropriate partners. The solution is to get Australians not to litter.
American wood ducks, Aix sponsa, build nests in the cavities of dead trees. When wildlife managers constructed nesting boxes for them, in an attempt to help them cope with habitat loss, the population actually declined. The reason is that female wood ducks had adapted to the loss of natural nesting places by following each other to the few sites that were still available. When the artificial nesting boxes appeared, they all gravitated to the same ones, and laid too many eggs in individual boxes to incubate properly. The solution was to hide the boxes in the woods, increasing the likelihood that individual ducks would find their own nesting sites.
Male Cuban tree frogs, Osteopilus septentrionalis, attempt to mate with females that are actually roadkill (at least they don't move!). Not only does this increase the chance that they themselves will be run over by cars and trucks, but of course the exercise yields no offspring.
Due to global warming, yellow-bellied marmots, Marmota flaviventris, come out of hibernation too early in the season for food to be available, and so many will starve.

Learning Defined

In vertebrates, and especially mammalian species, everyday action goes beyond such innate behavior patterns. These organisms can also acquire new patterns of behavior through learning.

Psychologists define learning as:

a relatively permanent change in behavior that occurs as a result of experience.

This definition excludes changes in behavior that occur as a result of insult, injury, or disease, the ingestion of drugs, or maturation. Learning permits individual organisms, not just entire species, to acquire new responses to new circumstances, and thereby to add behaviors to the repertoire created by evolution. In addition, social learning permits one individual species member to share learning with others of the same species (this is one definition of culture). The pace of social learning far outstrips that of evolution, so that learning provides a mechanism for new behavioral responses to spread quickly and widely through a population. Although all species are capable of learning, at least to some degree, learning is especially important in the natural lives of vertebrate species, and especially in mammalian vertebrates. Like us. And, it turns out, most human learning is social learning: we learn from each other's experiences, and we have even developed institutions, like libraries and schools, that enable us to share our knowledge with each other.

For a good treatment of instinctual behavior, see N. Tinbergen, The Study of Instinct (1969).

For a positive treatment of sociobiology, see E.O. Wilson, Sociobiology: The New Synthesis (1975).

For extensions of sociobiology to psychology, see The Adapted Mind: Evolutionary Psychology and the Generation of Culture, edited by Jerome H. Barkow, Leda Cosmides, and John Tooby (1992), and Evolutionary Psychology: The New Science of the Mind (1999) by David M. Buss.

Classical Conditioning

One important form of learning, classical conditioning, was accidentally discovered by Ivan P. Pavlov, a Russian physiologist who was studying the physiology of the digestive system in dogs (work for which he won the Nobel Prize in Physiology or Medicine in 1904). Pavlov's method was to introduce dry meat powder to the mouth of the dog, and then measure the salivary reflex which occurs as the first step in the digestive process. Initially, Pavlov's dogs salivated only when the meat powder was actually in their mouths. But shortly, they began to salivate before the powder was presented to them -- just the sight of the powder, or the sight of the experimenter, or even the sound of the experimenter walking down the hallway, was enough to get the dogs to salivate. In some sense, this premature salivation was a nuisance. But Pavlov had the insight that the dogs were salivating to events that were somehow associated with the presentation of the food. Thus, Pavlov moved away from physiology and initiated the deliberate study of the psychic reflex -- not, as the term might suggest, something out of the world of parapsychology, but rather a situation where the idea of the stimulus evokes a reflexive response. Pavlov called these responses conditioned (or conditional) reflexes.

In honor of Pavlov's discovery, this form of learning is now called "classical" conditioning. A classical conditioning experiment involves the repeated pairing of two stimuli, such as a bell and food powder. One of these stimuli naturally elicits some reflex, while the other one doesn't. With repeated pairings, the previously neutral stimulus gradually acquires the power to evoke the reflex. Thus, classical conditioning is a means of forming new associations between events (such as the ringing of a bell and the presentation of meat powder) in the environment.

The apparatus for Pavlov's experiments included a special harness to restrict the dog's movement, a tube (or fistula) placed in its mouth to collect saliva, a mechanical device for introducing meat powder to its mouth, and some kind of signal such as a bell. (Some writers have questioned whether Pavlov actually used a bell, as the myth has it. Pavlov was actually unclear on this detail in his own writing. But a 1997 article by the American psychologist R.K. Thomas documented this historical tidbit conclusively.)

In Phase 1 of the conditioning procedure, Pavlov presented the bell and the food separately. The dog would salivate to the food but make no response to the bell.
In Phase 2, Pavlov presented the bell immediately followed by food. The dog would still salivate to the food; but after several trials, it would begin to salivate to the bell as well.
In Phase 3, Pavlov presented the bell alone, no longer followed by food. After several more trials, the conditioned salivary response would eventually disappear.

The Basic Vocabulary of Classical Conditioning

The procedure just described illustrates the basic vocabulary of classical conditioning:

The unconditioned stimulus (or US) is a stimulus (like the presentation of meat powder) that reliably evokes a reflexive response (like salivation).
The unconditioned response (or UR) is the innate reflexive response that is reliably evoked by an unconditioned stimulus.
The conditioned stimulus (or CS) is a stimulus (such as the ringing of a bell) that does not itself reliably evoke any particular reflexive response. In classical conditioning, the CS is paired with the US.
The conditioned response (or CR) is the response that comes to be evoked by a previously neutral conditioned stimulus (CS), after many pairings between the CS and the US. The CR generally resembles the UR.

The process by which a conditioned stimulus acquires the power to evoke a conditioned response is known as acquisition. In traditional accounts of conditioning, acquisition of the CR occurs by virtue of the reinforcement of the CS by the subsequent US. The strength of the CR is measured in various ways:

The magnitude of the CR (such as the number of drops of saliva or its liquid volume). The magnitude of the CR is typically limited by the magnitude of the UR.
The probability that the CR will occur at all (e.g., the likelihood that any amount of salivation will follow presentation of the bell).

On the initial acquisition trial, when the CS and the US are paired for the very first time, there is only an unconditioned response to the US; there is no conditioned response to the CS.

On later trials, we begin to observe a response that resembles the UR, occurring after presentation of the CS but before presentation of the US. This is the first appearance of the CR.

Even later, we may observe the CR immediately after the presentation of the CS, well before the presentation of the US.

The characteristic curve portraying the acquisition of the CR is an ogive, in which there is a slow increase in response strength on the initial trials, followed by a rapid increase in middle trials, and a further slow increase in the last trials.

The learning curve is commonly characterized as negatively accelerated, and that's true so far as the middle and latter portions of the learning curve are concerned.
But the very early portions are more accurately characterized as positively accelerated.
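The shape of the acquisition curve can be sketched numerically. The snippet below uses a logistic function as a stand-in for the ogive; the function name cr_strength and the midpoint, steepness, and asymptote values are illustrative assumptions, not parameters taken from any experiment. It prints the trial-to-trial gain in response strength, which grows over the early trials (positive acceleration) and shrinks over the later ones (negative acceleration).

```python
import math

def cr_strength(trial, midpoint=10.0, steepness=0.5, asymptote=1.0):
    """Hypothetical logistic (ogive) model of CR strength on a given trial."""
    return asymptote / (1.0 + math.exp(-steepness * (trial - midpoint)))

# CR strength on trials 0 through 20.
strengths = [cr_strength(t) for t in range(0, 21)]

# Gain from one trial to the next: rising gains mean positive acceleration,
# falling gains mean negative acceleration.
gains = [b - a for a, b in zip(strengths, strengths[1:])]

for trial, (strength, gain) in enumerate(zip(strengths[1:], gains), start=1):
    print(f"trial {trial:2d}  strength {strength:.3f}  gain {gain:.3f}")
```

In this sketch the gains peak around the assumed midpoint (trial 10), which is where the curve switches from positively to negatively accelerated.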

Generalization, Frequency, and Musical Pitch

In discussing generalization of response among stimuli, it is easiest to use the example of the frequency of tones, because differences in frequency -- whether a tone is high or low -- are easy to appreciate. And the example is accurate so far as it goes. If you condition an animal to a tone CS of 250 cycles per second (cps; also known as hertz, abbreviated hz, after the physicist Heinrich Rudolf Hertz, 1857-1894), it will emit a stronger conditioned response to a tone of 300 hz than to one of 350 hz -- because a tone of 300 hz more closely resembles a tone of 250 hz than does a tone of 350 hertz.

With humans, though, things can get a little more complicated, because musical pitch is also related to the frequency of tones, but similarity among pitches is not just a matter of relative frequency.

Tones that are an octave apart, such as Middle C and third-space C on the treble clef, are perceived as more similar than any other pair of tones.
Tones that are a perfect fifth apart, such as Middle C and second-line G on the treble clef, are also perceived as highly similar.
And tones that are a major third apart, such as Middle C and first-line E on the treble clef, are also perceived as similar, though not as similar as those separated by an octave or a perfect fifth.

Thus, when tones are presented in the context of the diatonic scale familiar in Western music, the generalization gradient may be distorted by the vicissitudes of pitch similarities.

Consider an experiment in which a subject is initially conditioned to respond to a tone of 262 hertz, roughly corresponding to Middle C. Such a subject may well show larger conditioned responses to tones of 524 hz (roughly third-space C), 392 hz (second-line G), and 330 hz (first-line E), than to either B-flat (233 hz) or D (292 hz), even though the former tones are more distant from the original CS, in terms of frequency, than the latter.
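The arithmetic behind this example can be checked with a short sketch. It assumes modern equal temperament with Middle C at 261.63 hz (the standard value when the A above it is tuned to 440 hz); the function name semitones_above and the tone labels are made up for illustration, and the frequencies quoted in the text are rounded versions of these values.

```python
MIDDLE_C_HZ = 261.63  # assumed equal-tempered Middle C (A above it = 440)

def semitones_above(base_hz, semitones):
    """Frequency a given number of equal-tempered semitones above base_hz."""
    return base_hz * 2 ** (semitones / 12)

# Semitone offsets from Middle C for the tones discussed in the text.
tones = {
    "B-flat (below)": -2,
    "D": 2,
    "E (major third)": 4,
    "G (fifth)": 7,
    "C (octave)": 12,
}

for name, offset in tones.items():
    hz = semitones_above(MIDDLE_C_HZ, offset)
    distance = abs(hz - MIDDLE_C_HZ)
    print(f"{name:16s} {hz:6.1f} hz   {distance:6.1f} hz from the CS")
```

The printout makes the point of the example concrete: B-flat and D lie within about 32 hz of the CS, while E, G, and the octave C lie roughly 68, 130, and 262 hz away, yet it is the latter three that are musically most similar to Middle C.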

However, this may only occur if we establish a musical context for the tones in the first place -- for example, by embedding the C in the other pitches of the diatonic scale. Or by beginning the experiment by playing a tune in the key of C major. There are some experiments here....

Discrimination provides a further check on generalization. Consider an experiment in which we present two previously neutral stimuli: one, the CS+, is always reinforced by the unconditioned stimulus; the other, the CS-, is never reinforced. As conditioning proceeds, the CS+ will come to elicit the CR, but the CS- will not acquire this power. If the CS+ and CS- are close to each other on the generalization gradient, both will initially elicit a conditioned response. But as conditioning proceeds, the CR to the CS+ will grow in strength, while the CR to the CS- will extinguish. The CR is only elicited by CSs that are actually associated with the US.

Habituation is a very primitive form of learning.

Conditioned responses can also be evoked by stimuli that are very dissimilar to the original conditioned stimulus. Consider the phenomenon known as sensory preconditioning, in which the critical experience occurs before the acquisition trials in which a CS is paired with a US.

In Phase 1 of a sensory preconditioning experiment, two neutral stimuli, CS1 and CS2, are initially presented together, without any reinforcing US. Neither of these CSs elicits any particular reflexive response. Because no US is involved, there will be no evidence of any CR being formed.
In Phase 2, the CS2 is reinforced by pairing it with a US, until a CR appears.
In Phase 3, we test CS1. If we have done the experiment right, the CR will also appear in response to CS1, even though CS1 has never been paired with the US.

Something similar happens in higher-order conditioning, except that the first two phases are reversed, so that higher-order conditioning occurs after acquisition trials in which CS is paired with US.

In Phase 1 of a higher-order conditioning experiment, a neutral stimulus, CS1, is paired with a reinforcing US, just as in the standard classical conditioning paradigm, until the usual CR appears.
In Phase 2, CS1 is preceded by another neutral stimulus, CS2, without any reinforcing presentation of any US.
In Phase 3, we test CS2. Again, if we have done the experiment right, the CR will also appear in response to CS2 -- even though, like CS1 in sensory preconditioning, CS2 has never been paired with the US.

The Scope of Classical Conditioning

By means of acquisition, extinction, generalization, discrimination, sensory preconditioning, and higher-order conditioning, stimuli come to evoke and inhibit reflexive behavior even though they may not have been directly associated with an unconditioned stimulus. By means of classical conditioning processes in general, reflexive responses come under the control of environmental events other than the ones with which they are innately associated.

The phenomena of classical conditioning are ubiquitous in nature, occurring in organisms as simple as the sea mollusk and as complicated as the adult human being. Pavlov himself thought that all learning entailed classical conditioning, but this position is too extreme. Still, classical conditioning is important because, in a very real sense,

The laws of classical conditioning are the laws of emotional life.

Classical conditioning underlies many of our emotional responses to events -- our fears and aversions, our joys and our preferences.

The Physiological Basis of Learning

The ability to learn -- to change one's behavior as a result of experience -- obviously must reflect changes in the organism's nervous system, and indeed the ability to learn is an important example of the plasticity of the nervous system -- the ability of the nervous system to be modified. But what exactly is going on in the nervous system when an organism learns something?

The fact that at least some phenomena of classical conditioning can be observed in every organism that has a nervous system has allowed behavioral neuroscientists to gain important insight into precisely how the nervous system is modified when organisms learn something. In work that won the Nobel Prize in Physiology or Medicine in 2000 (shared with Arvid Carlsson and Paul Greengard), Eric Kandel of Columbia University examined synaptic changes in the marine mollusk, Aplysia, as it acquired a simple conditioned response.

The most important of these changes is long-term potentiation, an increase in the sensitivity of a postsynaptic neuron as a result of repeated stimulation by a presynaptic neuron. This is the neural representation of a simple association -- an association between neurons that is created as a result of repeated pairing of CS and US.

Instrumental Conditioning

At roughly the same time as Pavlov was beginning to study classical conditioning, E.L. Thorndike, an American psychologist at Columbia University, was beginning to study yet another form of learning -- what has come to be known as instrumental conditioning. Beginning in 1898, Thorndike reported on a series of studies of cats in "puzzle boxes". The animals were confined in cages whose doors were rigged to a latch which could be operated from inside the cage. The animal's initial response to this situation was agitation -- particularly if it was hungry and a bowl of food was placed outside the cage. Eventually, though, it would accidentally trip the latch, open the door, and escape -- at which point it would be captured and placed back in the cage to begin another trial.

Over successive trials, Thorndike observed that the latency of the escape response progressively diminished. Apparently, the animals were learning how to open the door -- a learning which seemed to be motivated by reward and punishment.

On the basis of his studies of cats in puzzle boxes, Thorndike formulated a set of 8 Laws of Learning, of which three are particularly important for our purposes:

The Law of Readiness states that motivational states such as hunger arouse behavior.
The Law of Effect states that responses that lead to reward are strengthened, occurring more quickly and reliably, while responses that are unrewarded, or even punished, are weakened.
The Law of Exercise states that associations between stimuli (such as the puzzle box) and responses (such as tripping the latch) are strengthened by practice and weakened by disuse.

For the record, the other laws were:

The Law of Multiple Responses: organisms must be able to vary their responses to a stimulus, to give them the opportunity to stumble on the response which will be rewarded.
The Law of Set (or Attitude): an organism's momentary set or attitude will determine which rewards are effective (the opportunity to play tennis may not be rewarding to a golfer).
The Law of Prepotency of Elements: organisms must be able to distinguish between those elements of a situation that are really important, and those that are merely adventitious.
The Law of Response by Analogy: organisms respond to novel situations by drawing analogies to familiar situations.
The Law of Associative Shifting: a response that has been conditioned to a number of different stimuli will be likely to be given in response to a new stimulus.

The general principle of instrumental conditioning is that adaptive behavior is learned through the experience of success and failure. Instrumental learning is also sometimes called operant conditioning, because the organism "operates" on the environment, changing it in some way (for example, changing the cage from one whose door is closed to one whose door is open), and this behavior is "instrumental" in obtaining some desired state of affairs (like food or simply escape from confinement).

Beginning in the 1930s, the study of instrumental conditioning was taken up by B.F. Skinner, a radical behaviorist. Behaviorism was a school of psychology founded by John B. Watson, then at Johns Hopkins University, who believed that psychology could become a legitimate science only by eliminating references to hypothetical mental states (which cannot be publicly observed) and confining the analysis to the relations between publicly observable behavior and the publicly observable environmental conditions under which it is observed. (Watson was forced to resign from Hopkins over a sexual scandal, and went on to a career in advertising. He invented the notion of the "coffee break" as a promotion for Maxwell House Coffee.) Like Watson, Skinner thought that behavior could be, and should be, explained solely in terms of the associations between stimuli and responses, and without reference to hypothetical states (such as hunger) existing in a hypothetical mind of an organism (including humans). Thus the term S-R behaviorism. Skinner was something of a visionary, and he is famous for his utopian novel, Walden Two, which describes a community organized along behaviorist lines (he was an English major in college, contemplated a career as a writer, and indeed wrote some very beautiful stuff); and for his meditation on human nature, Beyond Freedom and Dignity. Both are very provocative books. A collection of Skinner's scientific papers, most of which are very readable, is entitled Cumulative Record.

A Note on Two "Functionalisms"

Tracing the relations between environmental stimuli (inputs) and organismic responses (outputs) is often called functional behaviorism, or simply functionalism, but this brand of functionalism (which is currently popular among some philosophers of mind and some theorists in artificial intelligence, a branch of cognitive science) should be clearly distinguished from the 19th-century "Chicago functionalism" of John Dewey and James Rowland Angell (Angell was, however, Watson's graduate mentor), which had its roots in the work of William James and which underlies this course.

Skinner refined Thorndike's apparatus into what has become known as the Skinner box, though Skinner himself did not use the term and actually disliked it. He preferred the term operant chamber. A generic operant chamber, intended to house an animal during learning trials, includes lights for presenting signals, levers or keys for collecting responses, a hopper for presenting food pellets, and a floor grid for presenting electrical shock.

In Phase 1 of a typical instrumental conditioning experiment a food-deprived animal is placed in the operant chamber. Notice that I did not describe the animal as "hungry". Like all behaviorists, Skinner abjured the use of mental-state terms like "hunger", as unobservable and unscientific. Instead, he defined states of the organism in terms of publicly observable external referents, like hours and days of food-deprivation. Anyway, in Phase 1 of a typical instrumental conditioning experiment a food-deprived animal, like a pigeon, is placed in the operant chamber. Unlike Pavlov's dogs, which were restrained by harnesses, Skinner's pigeons were able to move about freely, and so they displayed a wide variety of behaviors -- including pecking at the key (pigeons love to peck). Under these conditions, the experimenter observes the base rate of key-pecking behavior (or whatever other behavior is of interest) in the absence of reinforcement.
In Phase 2 of the experiment, the key is connected to the food hopper in such a way that pecking the key causes a food pellet to drop into the hopper; and the pigeon eats it. Pigeons, especially food-deprived pigeons, love to eat. During this phase of the experiment we observe an increase in key-pecking behavior over the baseline. The animal's key-pecking behavior leads to reward, and so, in accordance with Thorndike's Law of Effect, this behavior is strengthened. This is the acquisition phase.
In Phase 3 of the experiment we change the situation a little, so that key-pecks produce food only when the key is illuminated (alternatively, there may be two keys in the chamber, one illuminated and the other dark). When the light is on, key-pecking produces food; when it is off, key-pecking has no effect. During this phase of the experiment, the bird will peck only when the key is illuminated (or, alternatively, the bird will peck only at the key which is illuminated). This is discrimination learning.
In Phase 4 of the experiment we disconnect the key from the hopper entirely, so that key-pecking no longer leads to food at all. Under these circumstances key-pecking eventually returns to the baseline level. This is extinction. As in the case of classical conditioning, we can also observe the spontaneous recovery of an extinguished response, as well as savings in relearning if the key is reconnected to the hopper.

The "Superstition" Experiment

B.F. Skinner demonstrated the power of Thorndike's Law of Effect with the following classic "superstition" experiment. A food-deprived (remember, if you're a behaviorist you can't say hungry) pigeon was placed in an operant chamber. As pigeons are wont to do, it displayed a variety of random pigeon behaviors: it wandered around the chamber, it groomed itself, it flapped its wings and stretched its neck, it cooed, and it pecked at various locations. Every 30 seconds, a food pellet was dropped into the hopper of the operant chamber; this occurred regardless of the pigeon's behavior. Over trials, each bird developed a stereotyped pattern of behavior, but the precise nature of this pattern was different for each bird. The only regularity was this: whatever behavior had been emitted at the time that the first pellet dropped now began to occur more frequently.

This is a classic illustration of the Law of Effect. Initially, the association between behavior and reward was purely accidental. Nevertheless, following the principle that rewarded responses are strengthened, while unrewarded and punished responses are weakened, that particular behavior began to occur more frequently. Therefore, the bird was more likely to be displaying that behavior the next time a food pellet dropped into the hopper. So, that behavior was strengthened even more. Eventually, whatever behavior had originally coincided with reinforcement came to dominate the behavior of that individual bird -- all because of an initially accidental link between behavior and reward.
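The feedback loop just described can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of Skinner's actual procedure or data: the function name superstition, the five behavior labels, the equal starting strengths, and the fixed "boost" added on each pellet are all hypothetical. On each pellet delivery, the behavior that happens to be under way is sampled in proportion to its current strength and is then strengthened, even though it did nothing to produce the food.

```python
import random

def superstition(n_pellets=101, boost=1.0, seed=0):
    """Toy model: response-independent food strengthens whatever is occurring."""
    rng = random.Random(seed)  # each seed stands for a different "bird"
    behaviors = ["wander", "groom", "flap", "coo", "peck"]
    weights = {b: 1.0 for b in behaviors}  # equal strengths at the start
    for _ in range(n_pellets):
        # The behavior under way when the pellet drops is sampled in
        # proportion to its current strength...
        current = rng.choices(behaviors,
                              weights=[weights[b] for b in behaviors])[0]
        # ...and is strengthened, although it did not cause the food.
        weights[current] += boost
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

shares = superstition()
for behavior, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{behavior:7s} {share:.2f}")
```

Running the function with different seeds gives different leading behaviors, which loosely mirrors the observation that the stereotyped pattern differed from bird to bird. How strongly one behavior comes to dominate depends on the assumed boost relative to the starting strengths.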

And the "Air Crib"

There's a kind of urban legend circulating that Skinner raised his children in an infant-sized Skinner box: it's not true. Skinner, an inveterate tinkerer, did invent what he called the "Air Crib", a climate-controlled environment which he hoped would ease some of the burdens of child-rearing and foster child development. The Air Crib looked like a regular, if somewhat large, crib. It had a ceiling, three opaque walls, and a glass pane which could be opened to move the infant in and out. There were controls for temperature and humidity, a canvas floor, and sheeting which could be removed and washed when soiled. In this way, the infant had considerable freedom of movement. Skinner publicized the Air Crib in an article in the Ladies' Home Journal entitled "Baby in a Box: The Mechanical Baby-Tender" (1945). It has been estimated that at least 300 infants were raised in a version of the Air Crib (see Robert Epstein, "Babies in Boxes", Psychology Today, 1995). And contrary to rumors that Skinner's daughter Deborah eventually sued her father and committed suicide, she was alive and well in 2004, when she wrote a newspaper Op-Ed piece in the (Manchester) Guardian that was very positive about both Skinner and the device.

The Vocabulary of Instrumental Conditioning

The experiment described above illustrates the basic vocabulary of instrumental conditioning, whose terms largely parallel those of classical conditioning -- though be careful, because their meaning sometimes changes slightly.

Reinforcement (Rft) is an event which increases the strength (probability) of the behavior (the conditioned response) which preceded it.

Positive reinforcement is presented following the conditioned response;
Negative reinforcement is terminated following the conditioned response.

Note that "positive" and "negative" do not necessarily mean "pleasant" (e.g., food) and "aversive" (e.g., shock). As it happens, positive reinforcers are typically pleasant (presentation of food is a good thing if you're a food-deprived pigeon); but then again, so are negative reinforcers (termination of shock is also a good thing). Reinforcers always increase the probability of the behavior being reinforced. This is the hardest thing about instrumental conditioning to get straight, because it is the most counterintuitive use of language. Blame Skinner, don't blame me. When most people think of "negative reinforcement", they really mean "punishment". Punishment has a technical meaning in the literature on instrumental conditioning, as it entails the presentation of a negative reinforcer.

A conditioned response (CR) is the behavior which is strengthened by reinforcement. The strength of the CR is usually indicated by response rate, or the frequency with which the organism displays the behavior.

A conditioned stimulus (CS) is an environmental event which leads to the performance of a conditioned response. Put another way, the CS is a signal or cue that the CR will be reinforced. Sometimes, as in Phase 2 of the typical experiment described above, the CS is the operant chamber itself. That is, the presence of the pigeon in the chamber is a cue that key-pecking will produce food. Other times, as in Phase 3 described above, the CS is some discrete feature of the environment -- such as a lighted key, or a buzzer or tone.

These technical definitions of CS and CR give us the term stimulus-response (or S-R) learning theory. The animal learns that emitting the CR (key-pecking) in the presence of the CS (the illuminated key) leads to reinforcement (food in the hopper). Or, to be a strict, radical, Skinnerian, functional behaviorist, reinforcement of the CR in the presence of the CS leads to an increase in the rate of the CR.

Classical conditioning can also be described in S-R terms. The key is to remember how instrumental conditioning defines reinforcement -- as any stimulus that increases the likelihood of the conditioned response. Thus, in classical conditioning, the CR (e.g., salivating) is reinforced by the US (meat powder) in the presence of the CS (the bell). By virtue of this reinforcement, the CR comes to be emitted in the presence of the CS.

Note that in instrumental conditioning there is no discussion of unconditioned stimuli or unconditioned responses. This is because the behaviors in question are not reflexive in nature, as they are in classical conditioning. Rather, these behaviors are emitted spontaneously by the organism. They are what we ordinarily call voluntary, as opposed to the involuntary behaviors involved in classical conditioning -- except that radical behaviorists like Skinner didn't like to talk about "voluntary" responses, or anything else that smacked of "free will", because they felt that all behaviors were under control of environmental stimuli and reinforcements.

The Phenomena of Instrumental Conditioning

Similarly, the major phenomena of instrumental conditioning parallel the classical case.

There is the acquisition of a conditioned response by means of reinforcement;
the extinction of that response by withholding reinforcement;
the generalization of the CR across a generalization gradient as a function of the similarity between the test stimulus and the original CS;
and discrimination learning in response to a discriminative stimulus which indicates when the CR will be reinforced.

Schedules of Reinforcement

To a great degree, the major phenomena of instrumental conditioning parallel those observed in the classical case: acquisition, extinction, generalization, and discrimination. However, studies of instrumental conditioning also illustrate a new concept: schedules of reinforcement, each schedule resulting in a different pattern of behavior.

The term refers to the contingent relationship between the organism's emission of its response and the environment's delivery of reinforcement. In the continuous case, reinforcement is delivered after every CR. In the partial case, reinforcement is occasionally withheld. Partial reinforcement retards acquisition, but it also increases resistance to extinction.

Continuous and partial reinforcement are also terms that occur in the vocabulary of classical conditioning, and they have the same effects. But there is another category of reinforcement schedules, intermittent reinforcement, that is unique to instrumental conditioning. There are four general types of intermittent schedules of reinforcement.

In fixed ratio (FR) schedules, reinforcement is delivered after a specific number of CRs (thus, an FR7 schedule delivers reinforcement after the organism has made 7 CRs).
In variable ratio (VR) schedules, the ratio of responses to reinforcements varies randomly around some average (thus, in a VR7 schedule, the organism may be reinforced after 5, 6, 7, 8, or 9 CRs, etc., but the ratio will average 7 CRs to every reinforcement).
In fixed interval (FI) schedules, reinforcement is delivered following the first CR after a specific time interval has elapsed (thus, in an FI30 schedule, the organism is reinforced for the first CR it makes once 30 seconds have passed since the last reinforcement, but not before, regardless of the number of CRs it has emitted in the meantime).
In variable interval (VI) schedules, the required interval varies randomly around some average (thus, in a VI30 schedule, the first CR might be reinforced once 20, 25, 30, 35, or 40 seconds have elapsed since the last reinforcement, averaging out to 30 seconds).
Another schedule is the differential reinforcement of low rates (DRL), in which reinforcement is delivered only if a long interval (say, 30 seconds) elapses between CRs. In the differential reinforcement of high rates (DRH), reinforcement is delivered only if the interval is very short (say, 1 second).
Other schedules of reinforcement represent variations and combinations of these.
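The four basic intermittent schedules can be written out as simple rules for deciding which responses get reinforced. The sketch below is illustrative only: the function names, the "spread" parameter used to vary the VR and VI requirements, the uniform random draws, and the made-up session of one response every 5 seconds are all assumptions, not features of any particular apparatus. Each function returns a list of True/False values, one per response, marking the reinforced ones.

```python
import random

def fixed_ratio(n_responses, ratio):
    """FR: every `ratio`-th response is reinforced."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]

def variable_ratio(n_responses, mean_ratio, spread, rng):
    """VR: the required count is redrawn from mean_ratio +/- spread."""
    reinforced, count = [], 0
    required = rng.randint(mean_ratio - spread, mean_ratio + spread)
    for _ in range(n_responses):
        count += 1
        hit = count >= required
        reinforced.append(hit)
        if hit:
            count = 0
            required = rng.randint(mean_ratio - spread, mean_ratio + spread)
    return reinforced

def fixed_interval(response_times, interval):
    """FI: the first response made once `interval` seconds have passed since
    the last reinforcement (or the session start) is reinforced."""
    reinforced, last = [], 0.0
    for t in response_times:
        hit = t - last >= interval
        reinforced.append(hit)
        if hit:
            last = t
    return reinforced

def variable_interval(response_times, mean_interval, spread, rng):
    """VI: like FI, but the required interval is redrawn after each reinforcer."""
    reinforced, last = [], 0.0
    required = rng.uniform(mean_interval - spread, mean_interval + spread)
    for t in response_times:
        hit = t - last >= required
        reinforced.append(hit)
        if hit:
            last = t
            required = rng.uniform(mean_interval - spread, mean_interval + spread)
    return reinforced

rng = random.Random(1)  # seeded so the variable schedules are repeatable
times = [float(t) for t in range(5, 125, 5)]  # one response every 5 s for 2 min

print("FR7 :", sum(fixed_ratio(len(times), 7)), "reinforcers")
print("VR7 :", sum(variable_ratio(len(times), 7, 2, rng)), "reinforcers")
print("FI30:", sum(fixed_interval(times, 30)), "reinforcers")
print("VI30:", sum(variable_interval(times, 30, 10, rng)), "reinforcers")
```

Note the difference between the two families: on the ratio schedules the number of reinforcers depends only on how many responses are made, whereas on the interval schedules responding faster than the interval requires earns nothing extra. For the 24 evenly spaced responses in this made-up session, FR7 yields 3 reinforcers and FI30 yields 4.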

The Cumulative Record

In textbook figures that depict the effects of various schedules of reinforcement, the organism's cumulative responses are plotted as a function of time (plotted on the horizontal or X axis). This is known as a cumulative record of responses. Every time the organism makes a response, the line moves up a notch on the vertical (Y) axis. Thus, a horizontal tracing means that the organism has made no responses. The slope of the tracing indicates the response rate: shallow slopes indicate a slow rate of response (relatively few responses per unit time), while steep slopes indicate a relatively rapid response rate (relatively many responses per unit time).
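A cumulative record is easy to build from a list of response times. The sketch below is a minimal illustration, with hypothetical function names and a made-up 30-second session (no responding, then slow responding, then fast responding); it samples the running count once per second and computes the slope of the record between two time points.

```python
def cumulative_record(response_times, session_length):
    """Cumulative response count at each whole second from 0 to session_length."""
    times = sorted(response_times)
    record, count, i = [], 0, 0
    for t in range(session_length + 1):
        # Count every response that has occurred up to and including second t.
        while i < len(times) and times[i] <= t:
            count += 1
            i += 1
        record.append(count)
    return record

def response_rate(record, start, end):
    """Slope of the record between two seconds, in responses per second."""
    return (record[end] - record[start]) / (end - start)

# Hypothetical session: nothing for 10 s, three slow responses, then a burst
# of one response every half second.
responses = [12, 16, 20] + [20.5 + 0.5 * k for k in range(20)]
record = cumulative_record(responses, 30)

print("0-10 s :", response_rate(record, 0, 10), "responses/s (flat tracing)")
print("10-20 s:", response_rate(record, 10, 20), "responses/s (shallow slope)")
print("20-30 s:", response_rate(record, 20, 30), "responses/s (steep slope)")
```

The record never goes down, because it is a running total; only its slope changes as the response rate changes.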

B.F. Skinner invented the cumulative record technique, and the term served as the title for a collection of his papers, Cumulative Record (1959).

Each schedule of reinforcement produces its own characteristic pattern of behavior. For example, DRL schedules typically produce a string of "ritualistic" responses that are ineffective in terms of controlling reinforcement but nevertheless effectively fill the long interval between reinforcements.

FR schedules produce a two-valued learning curve, showing a pause immediately after reinforcement, and then an abrupt shift to a very high response rate.
FI schedules produce a scallop-shaped learning curve, in which response rate diminishes immediately after reinforcement, and then gradually increases as the time for the next reinforcement approaches.

Both features are eliminated by switching from fixed to variable schedules, which produce constant, stable rates of response.

With VR, the organism displays a relatively high rate of responding.
With VI, the rate is somewhat lower.

Both VR and VI schedules are highly aversive for the organism being conditioned.

The Matching Law and the Monty Hall Problem

Animals (and humans) can also be put on concurrent schedules of reinforcement. For example, pecking a green key might be reinforced on a VI5 schedule, while pecking a red key might be reinforced on a VI10 schedule. In such cases, the organism will distribute its responses between the two keys in proportion to their rate of reinforcement -- for example, pecking the green key about twice as frequently as the red key. The fact that animals will distribute their responses in proportion to the rate at which those responses are reinforced is called the matching law, which was first announced by Richard Herrnstein (1970), B.F. Skinner's protege at Harvard; see also the review by Peter deVilliers (1977) -- who was, in turn, Herrnstein's protege.
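
The matching relation can be worked through for the two-key example. The sketch below assumes that the reinforcement rate on a VI schedule is simply the reciprocal of its average interval (one reinforcer per 5 seconds versus one per 10 seconds); the function name and the key labels are made up for the illustration.

```python
from fractions import Fraction

def matching_shares(reinforcement_rates):
    """Matching law: the share of responses on each alternative equals
    that alternative's share of the total reinforcement rate."""
    total = sum(reinforcement_rates.values())
    return {key: rate / total for key, rate in reinforcement_rates.items()}

# Assumed rates: a VI5 key pays off about once per 5 s, a VI10 key once per 10 s.
rates = {"green (VI5)": Fraction(1, 5), "red (VI10)": Fraction(1, 10)}
shares = matching_shares(rates)

# Ratio of predicted responding on the richer key to the leaner key.
ratio = shares["green (VI5)"] / shares["red (VI10)"]
```

On these assumptions the richer key receives two thirds of the responses and the leaner key one third, a 2:1 ratio that mirrors the 2:1 ratio of reinforcement rates.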

The matching law, in turn, was one of the first contacts between experimental psychology and neoclassical economic theory, as it seemed to reveal a fundamental, perhaps universal, law governing rational choice.

An interesting illustration of the matching law is provided when pigeons are confronted with a version of the Monty Hall problem, popularized by Let's Make a Deal, a television game show. The show's host, Monty Hall, would offer a contestant a valuable prize, such as a car or a vacation, hidden behind one of three closed curtains; behind another curtain is nothing; but behind the third curtain is a booby-prize, like a goat. After the contestant makes his choice, Hall opens one of the curtains to reveal nothing, and then offers the contestant the opportunity to change his mind. Note that, at this point, the prize lies behind one of the remaining curtains, while the goat is behind the other one.

Most contestants choose to stick with their original choice (pose this to your friends, and see what they do). But this is the wrong choice. The prior probability that the prize lies behind the contestant's original choice is 1/3, the same as for any of the curtains. Hall's reveal does not change that figure, because he knows where the prize is and never opens its curtain. Accordingly, the probability that the prize lies behind the other curtain -- the one that the contestant did not originally choose -- has now doubled to 2/3. Many people don't get this, even after multiple trials with the problem. But it turns out that pigeons catch on pretty quickly -- they're really good at matching responses to reinforcement rates, perhaps because they don't over-analyze the problem, using erroneous theories that lead them to misestimate probabilities. We'll return to the liabilities of estimation later, in the lectures on "Thought and Language".
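
For readers who remain unconvinced, the probabilities can be checked by brute force. The sketch below enumerates every equally likely combination of prize location and initial choice under the standard assumption that the host always opens a curtain that is neither the contestant's pick nor the prize; the function name is hypothetical, and the enumeration ignores the distinction between the goat and the empty curtain.

```python
from fractions import Fraction
from itertools import product

def monty_hall_win_probabilities():
    """Exact chance of winning by staying versus switching, with three curtains."""
    curtains = range(3)
    stay_wins = switch_wins = cases = 0
    for prize, pick in product(curtains, repeat=2):
        # The host opens a curtain that is neither the pick nor the prize.
        # When he has two options, which one he opens does not affect the result.
        opened = next(c for c in curtains if c != pick and c != prize)
        # Switching means taking the one curtain that is left.
        other = next(c for c in curtains if c != pick and c != opened)
        stay_wins += (pick == prize)
        switch_wins += (other == prize)
        cases += 1
    return Fraction(stay_wins, cases), Fraction(switch_wins, cases)

p_stay, p_switch = monty_hall_win_probabilities()
```

Of the nine cases, staying wins in the three where the first pick was already correct, and switching wins in the other six.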

The Scope of Instrumental Conditioning

By means of instrumental conditioning in general, and schedules of reinforcement in particular, voluntary behaviors come under the control of environmental events. The phenomena of instrumental conditioning are ubiquitous, or nearly so: every vertebrate organism, and some invertebrates as well, is capable of acquiring behaviors under conditions of reward and punishment.

Thorndike and Skinner believed that most adaptive behavior is the product of instrumental conditioning. Again, their position is probably too extreme. But the laws of instrumental conditioning do appear to account for the acquisition, maintenance, and loss of both adaptive and maladaptive voluntary behavior -- habitual behaviors of all sorts, and actions performed under conditions of incentive.

Classical and Instrumental Conditioning Compared and Combined

In several respects, classical and instrumental conditioning appear to represent two quite different forms of learning.

*Classical Conditioning*		*Instrumental Conditioning*
Reinforcement is not contingent on the organism's behavior. The US is delivered following the CS, no matter what the organism does.		Reinforcement is contingent on the organism's behavior. The "reward" or punishment is not delivered unless the organism makes the response to be conditioned.
The response to be conditioned is elicited involuntarily by the US.		The response to be conditioned is spontaneously emitted by the organism as a "voluntary" behavior.
The response being conditioned is "involuntary" (or reflexive) in nature.		The response being conditioned is a "voluntary" (or spontaneous) response.
Because classical conditioning is limited to involuntary, reflexive responses, relatively few responses can be conditioned.		Because instrumental conditioning is open to any behavior (or combination of behaviors) the organism is capable of emitting, a large, possibly infinite, number of responses can be conditioned.

One Form of Learning After All?

The two forms of conditioning represent quite different procedures for studying learning:

In classical conditioning, the organism forms an association between two stimuli, the CS and the US.
In instrumental conditioning, the organism forms an association between a stimulus (the CS) and behavior (the CR).

Donahoe and Vegas (2004) have argued that these differences are more apparent than real, and that classical conditioning also entails an association between the CS and the CR.

On the other hand, it seems equally likely that in instrumental conditioning the organism is forming an association between two stimuli -- between the CS and the reinforcement.

Ultimately, as Donahoe and Vegas argue, it may be that classical and instrumental conditioning are simply two forms of the same underlying learning process. But for now, the procedural differences between them are great enough that we will continue to consider them to be different forms of learning. As will be argued later, in classical conditioning the organism learns to predict events; in instrumental conditioning the organism learns to control them.

Avoidance Learning

Although classical and instrumental conditioning appear (to me, anyway) to represent two different forms of learning, most examples of adaptive behavior appear to involve combinations of classical and instrumental conditioning. That is, through classical conditioning the organism learns to anticipate some future event; through instrumental conditioning it learns to cope with that event.

This sort of combination has been studied in the laboratory in the form of avoidance learning. The procedure in a typical avoidance learning experiment is as follows:

A dog is placed in a long apparatus known as a shuttlebox, consisting of two compartments separated by a low barrier.
A tone CS is followed by a shock US, as in a standard classical conditioning experiment.
If the dog moves to the other compartment during the shock, the tone and the shock are both terminated immediately. This is known as an escape response.
If the dog moves to the other compartment during the tone-shock interval, after the tone comes on but before the shock comes on, the tone is terminated immediately and the shock never comes on at all, until the next trial. This is known as an avoidance response.
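
The trial logic just described can be summarized in a few lines. This is a hypothetical sketch: the 10-second tone-shock interval, the function name, and the latencies are invented for illustration, and a trial with no crossing is simply labeled as such because the procedure above does not say how long the shock would last.

```python
def classify_trial(crossing_latency, tone_shock_interval):
    """Label one shuttlebox trial.

    crossing_latency: seconds from tone onset until the animal crosses the
    barrier, or None if it never crosses.
    Returns (label, seconds of shock received), with None when unknown.
    """
    if crossing_latency is None:
        return ("no response", None)
    if crossing_latency < tone_shock_interval:
        # Crossed during the tone, before shock onset: the shock never comes on.
        return ("avoidance", 0.0)
    # Crossed after shock onset: tone and shock both terminate at the crossing.
    return ("escape", crossing_latency - tone_shock_interval)

# Hypothetical latencies across training, with an assumed 10 s tone-shock interval.
latencies = [None, 14.0, 11.5, 9.0, 4.0]
trials = [classify_trial(latency, 10.0) for latency in latencies]
labels = [label for label, _ in trials]
```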

Early in training, the animal neither escapes nor avoids, but (naturally) shows agitation when the shock is presented.

This agitated behavior leads, inadvertently, to escape -- much like Thorndike's cats inadvertently tripped the latch to open the door to their puzzle boxes. Over successive trials, the latency of the escape decreases.
Eventually the animal makes the "escape" response during the tone-shock interval, before the shock even comes on; this is, effectively, the first true avoidance response. Over further successive trials, the latency of the avoidance response decreases, until the animal makes it shortly after the tone is presented.

At this point, the experimenter may turn the shock off entirely. Even so, the animal will continue to make avoidance responses, as if the shock were still connected. In this sense, avoidance learning shows a failure of extinction.

The two-factor theory of avoidance learning proposed by O. Hobart Mowrer (1947) illustrates how avoidance combines classical and instrumental conditioning. According to Mowrer, by virtue of the pairing of the tone CS with the shock US, two kinds of learning occur.

Because the unconditioned response to shock is fear, the animal acquires a classically conditioned fear response to the tone.
On the instrumental side, the escape response is reinforced by the termination of the shock (and the reduction of unconditioned fear), while avoidance is reinforced by the termination of the tone (and the reduction of conditioned fear).

As we will see later, Mowrer was somewhat wrong to attribute avoidance learning to the reduction of conditioned fear, but his essential point, that avoidance combines classical and instrumental conditioning, remains valid.

Theories and Theories of Learning

Beginning in 1898 with the research of Pavlov and Thorndike, the study of learning proliferated over the next half-century. This work was summarized by E.R. Hilgard in his 1948 Theories of Learning -- a book which went through five editions (the last in 1981, co-authored with Gordon H. Bower), and inspired a large number of popular "Theories Of" courses in developmental, social, and personality psychology.

Already in the 2nd (1956) edition, written before the formal beginning of the cognitive revolution in psychology, Hilgard classified these theories into two major categories:

Stimulus-Response theories, including not just Pavlov and Skinner but Thorndike, Guthrie, and Hull as well.

In the 1948 edition, Hilgard had referred to these as "associative" theories.

Cognitive theories, including Tolman and the Gestalt psychologists.

In the 1948 edition, Hilgard had referred to these as "field" theories, including Lewin and other Gestalt theorists, as well as Tolman.

In Hilgard's view, these theories were distinguished by three theoretical preferences:

"Peripheral" vs. "Central" Intermediaries

S-R theories preferred to think of behavior as mediated by movements, as exemplified by Watson's notion of thinking as sub-vocal speech, so that the connection between the environmental stimulus and the organismal response is mediated by a whole chain of stimuli and responses.
Cognitive theorists preferred to think of behavior as mediated by central, ideational processes, such as memories and expectations.

Acquisition of Habits vs. Cognitive Structures

S-R theories characterize what the organism learns in terms of habitual responses.
Cognitive theorists characterize what is learned as factual knowledge (we might add procedural knowledge too), represented in internal, cognitive structures.

Trial and Error vs. Insight in Problem-Solving

When an organism confronts a new situation, S-R theorists believe that it first generalizes from familiar situations, responding to cues in the new situation that are similar to those present in old situations; if that fails, then the organism begins from zero and acquires new habits through trial and error.

Cognitive theorists tend to favor a process of perceptual restructuring of the past, leading to insight about the present.

Hilgard himself, for example, speculated that even on the first trial, the animal generates something like a hypothesis that he tests through subsequent experience:

Pavlov's dog hears a bell and thinks "Now what?".

Thorndike's cat sees a latch and wonders, "What if I press this?".

Tolman's rats come to a choice point, and say to themselves, "I wonder where it goes if I turn right?".

Hilgard points out that all of these theories were "behaviorist" in nature, in that they took behavior, rather than introspections, as their data. There's a difference between methodological and radical behaviorism.

Even Tolman viewed his theory as behavioristic, in that he rejected any form of introspection, even in the form of verbal reports.

Notions of ideas, expectations, and purposes referred to interpretations of the animal's behavior.

There is a distinction between molar and molecular behaviorism.

Molecular behaviorism describes behavior at the level of the individual muscle group.

Watson's behaviorism, for example, was often disparaged as "muscle-twitch" behaviorism.

Molar behaviorism describes behavior at a level independent of the specific muscle groups involved, and focuses more on what the organism is doing.

As Tolman discovered, a rat who had learned to run a maze could also swim through it to the goal box, even though running and swimming involve completely different muscle movements.
In this way, behavioristic analyses are independent of physiology.

For cognitive theorists such as Tolman, behavior was purposive, in that it was directed toward a goal.

One might also suggest that S-R and cognitive theorists differ in their choice of experimental subjects -- S-R theorists preferring nonhuman animals, and cognitive theorists preferring humans, as subjects. But this is a false distinction.

Hull, whom Hilgard classifies as an S-R theorist (though I disagree a little), worked almost exclusively with humans;
Tolman, the paradigmatic case of a cognitive theorist, worked almost exclusively with rats.

The fact of the matter is that, under the spell of Watsonian behaviorism, almost all research on learning in the first half of the 20th century was on animals -- mostly rats and pigeons. This was, in large part, because the use of nonhuman animals forced psychologists to rely on objective behavior, rather than subjective introspections, as their data. Still, I think we can see Hilgard's two categories in researchers' views about the human-animal distinction.

Lloyd Morgan's Canon (1903) states that "In no case is an animal activity to be interpreted in terms of higher psychological processes if it can be fairly interpreted in terms of processes which stand lower in the scale of psychological evolution and development."
Hilgard himself (1948) proposed a Reversal of Lloyd Morgan's Canon: "Only if a process demonstrable in human learning can also be demonstrated in nonhuman animals is the comparative method useful in studying it."

In any event, Hilgard noted that all learning theorists must accept all of the same facts discovered through research; they differ in terms of interpretation. And all learning theorists seek to answer the same small set of questions:

What are the limits of learning?

How are these affected by species differences, age differences, innate factors, and practice?

What is the role of practice in learning?

Is the relationship monotonically linear? Can too much practice hurt?

How important are drives and incentives, rewards and punishments?

Do rewards and punishments have equal and opposite effects? How strong are intrinsic and extrinsic motives?

What is the place of understanding and insight?

How much is learning governed by automatic and unconscious processes?

Does learning one thing help you learn something else?

What conditions facilitate transfer of training?

What happens when we remember and when we forget?

Is forgotten knowledge permanently lost? How can we remember things that did not happen?

Note what is missing here: there is nothing about the brain (the term barely appears in Hilgard's index). Partly, of course, this reflected the primitive state of neuroscience at the time. But the reasons went deeper than that.

Scientific psychology implicitly adopts a materialist stance.

Whatever they study, psychologists just assume that "the brain does it".
There just wasn't much more to be said in 1948.

But in one of those lovely contradictions that make psychology the field it is, psychologists are also dualists.

While agreeing that the brain is the physical basis of mind, psychologists believe that mind can be analyzed independently of the body.

Based on behavioral data (including self-reports, reaction time, and other aspects of human performance), psychologists make inferences about underlying mental processes.

For the most part, the classical learning theories have been consigned to the dustbin of history. But it's worth reviewing at least some of them, for their relevance to the modern cognitive psychology of learning and memory. Herewith are some summary notes, based mostly on the 3rd edition of Hilgard's Theories of Learning, published in 1966, before the cognitive revolution really took hold in psychology. This edition was the first to be co-authored with Gordon H. Bower, his Stanford colleague. Bower, for his part, had begun his career as a mathematical psychologist focused on animal learning, and became a distinguished first-generation cognitive psychologist whose most famous research focused on verbal learning and memory.

Pavlov

First things first: We have to start with Pavlov, whose studies of classical conditioning got the whole ball rolling. Of course, Pavlov wasn't a psychologist at all. He was a physiologist, who worked first on the cardiovascular and circulatory systems, and then on the gastrointestinal system. I usually give the beginning of Pavlov's work on conditioned reflexes as 1898, the same year as Thorndike, with the first publication being Wolfson's dissertation published in 1899.

First, a couple of notes on terminology:

We call what Pavlov discovered conditioned reflexes, but a better translation from the Russian would have been conditional reflexes, because they appear only under certain circumstances -- in contrast to innate reflexes, which occur unconditionally. "Conditioned reflexes" flows more easily off the tongue, and it's stuck -- except we now understand that they're not reflexes at all.

Pavlov, in turn, was influenced by another Russian physiologist, Sechenov, whose book, Reflexes of the Brain (1866) interpreted voluntary motor behavior as a product of "psychic reflexes".

Pavlov didn't use the term classical conditioning. The term "classical" was bestowed on the Pavlovian paradigm by Hilgard and Marquis (1940), to honor Pavlov's priority in the study of learning, and to contrast it with Thorndike's "Trial and Error" paradigm, which they called instrumental conditioning.

Later, Skinner referred to Pavlovian conditioning as respondent or Type S conditioning (because the subject's response doesn't change the environment), in contrast with his own operant or Type R procedures (where the subject operates on the environment). "Instrumental" was, apparently, too mentalistic for Skinner's taste.

By the time he published Conditioned Reflexes (1927) and Lectures on Conditioned Reflexes (1928), Pavlov had developed pretty much the entire vocabulary of conditioning and learning.

The conditioned and unconditioned stimulus and response.
Reinforcement, extinction, and spontaneous recovery, generalization and discrimination.
Various paradigmatic variations, such as delayed, trace, simultaneous, and backward conditioning.

Pavlov's "simultaneous" conditioning, however, was not truly simultaneous: the CS was still presented before the US, even if by only a fraction of a second.

We now know that truly simultaneous conditioning does not occur (see below).
And we now know that backwards conditioning inhibits the conditioned response (see also below).

Pavlov was the first to notice (or, at least, to name) the orientation reflex, or the animal's tendency to attend to any novel stimulus.

The discovery, by Shenger-Krestovnikova and others working in Pavlov's laboratory, of experimental neurosis caused by difficult discriminations.

For an overview of Pavlovian work on experimental neurosis, see Mineka & Kihlstrom (1978), who reinterpreted the phenomenon in cognitive terms of unpredictability and uncontrollability (see also below).

The distinction between the first signal system, underlying conditioned reflexes and shared by humans with other animals, and the second signal system, or language, which gives unique power to human learning.

Soviet psychologists made a big deal out of the second signalling system, and a book by K.I. Platonov, The Word as a Physiological and Therapeutic Factor: The Theory and Practice of Psychotherapy According to I.P. Pavlov (1959), is an under-appreciated classic of psychosomatic medicine, in which words, representing ideas, affect physiological processes both inside and outside the central nervous system.

You could take Pavlov out of physiology and into psychology, but you couldn't take physiology out of Pavlov. Of all the classical learning theorists, Pavlov is the only one to have taken specific positions on the neural basis of conditioning.

Pavlov's laboratory did some experiments, anticipating Lashley's later program, attempting to abolish conditioning by creating specific brain lesions. But these were, apparently, no more successful for him than they would be for Lashley.
Actually, most of Pavlov's neurophysiological theorizing was not based on the outcomes of actual physiological experiments, but rather on inferences from behavioral data, based on his understanding of the nervous system. It's all pretty simple-minded, and all pretty obvious, based on the excitation and inhibition of cortical centers representing the stimuli and responses.
Pavlov explained sleep as a state of cortical inhibition.

And hypnosis was considered to be a state of partial cortical inhibition.

Pavlov interpreted the classical fourfold typology of personality in terms of particular interactions of excitation and inhibition.
The second signal system was associated with the frontal lobe -- which is more fully developed in humans than in other animals.

For an appreciation of Pavlov's contributions to psychology, written by a leading psychologist of the Soviet era, see Razran, G. (1965). Russian physiologists' psychology and American experimental psychology. Psychological Bulletin, 63, 42-64.

Thorndike

While Pavlov dominated learning theory in the Soviet Union, Thorndike's theory dominated in the United States. Thorndike called his theory connectionism, because learning was held to strengthen the associations between sensory stimuli and motor responses. In order to avoid confusing Thorndike's "connectionism" with the "modern" connectionism initiated by Rumelhart and McClelland, it's probably best to think of Thorndike's theory as the "mother" of all stimulus-response (S-R) theories of learning.

I've already listed Thorndike's eight laws of learning, which I'll just list here again without much further comment.

There are three primary laws:

The Law of Readiness states that motivational states such as hunger arouse behavior.
The Law of Effect states that responses that lead to reward are strengthened, occurring more quickly and reliably, while responses that are unrewarded, or even punished, are weakened.
The Law of Exercise states that associations between stimuli (such as the puzzle box) and responses (such as tripping the latch) are strengthened by practice and weakened by disuse.

And the subordinate laws:

The Law of Multiple Responses: organisms must be able to vary their responses to a stimulus, to give them the opportunity to stumble on the response which will be rewarded.

The Law of Set (or Attitude): an organism's momentary set or attitude will determine which rewards are effective (the opportunity to play tennis may not be rewarding to a golfer).

The Law of Prepotency of Elements: organisms must be able to distinguish between those elements of a situation that are really important, and those that are merely adventitious.

The Law of Response by Analogy: organisms respond to novel situations by drawing analogies to familiar situations.

The Law of Associative Shifting: a response that has been conditioned to a number of different stimuli will be likely to be given in response to a new stimulus.

These laws were set out fairly early in Thorndike's career, and subsequent research led to the revision or abandonment of some of them.

The Law of Exercise was essentially disproved by experiments showing that mere repetition did not necessarily result in improvements in performance. At the very least, there has to be some feedback.
The Law of Effect was truncated in light of experiments showing that reward and punishment did not, in fact, have "equal and opposite" effects. Reward did strengthen responding, but punishment had little or no effect on weakening responses.
Thorndike introduced a new principle, belongingness: responses are strengthened more easily if they somehow "belong" to the stimulus.

Thus, if a subject is learning sentences like "John is a butcher. Henry is a carpenter", John is associated with butcher even though butcher and Henry are closer together. That is, butcher somehow "belongs" to John, rather than to Henry.
Thorndike's concept of belongingness was revived with Seligman's (1971) notion of preparedness in learning, to be discussed below.

In fact, Rozin and Kalat (1971), who were Seligman's colleagues at the University of Pennsylvania, specifically invoked Thorndike's term, "belongingness" in their account of the biological constraints on learning. But we're not there yet.

Thorndike (1933) introduced the concept of spread of effect: reward strengthens not only the S-R connection to which it "belongs", but also other recent S-R connections.

The spread of effect to S-R connections to which it does not "belong" illustrates the automatic nature of the Law of Effect. Rewards strengthen connections other than those to which they are directed, along some sort of generalization gradient.

Skinner

Watson, the founder of behaviorism, never developed a full-fledged theory of learning.

That job fell to B.F. Skinner, most prominently in his Behavior of Organisms (1938). Skinner's is an S-R theory, but he rejected the idea of "no stimulus, no response", by which earlier behaviorists had assumed that every response was preceded by some stimulus, even if that stimulus couldn't be identified. Instead, Skinner focuses on two types of response:

Elicited responses, as in Pavlovian conditioning, which Skinner classified as respondents.
Emitted responses, which he classified as operants.

Stimulus conditions are irrelevant to understanding of operants.
For this reason, the strength of operants can't be measured in the usual way, by the probability that they will be elicited by some stimulus.
Instead, the strength of an operant has to be measured in terms of response rate -- the frequency with which the behavior occurs.
Learning occurs when an operant becomes associated with prior stimulus conditions -- i.e., when it becomes a discriminated operant.

This occurs by virtue of reinforcement.

Thus, Skinner's theory can be viewed as an extended meditation on Thorndike's Law of Effect: the association between a stimulus and response is increased when the operant is reinforced in the presence of the stimulus.

This led Skinner to distinguish between two types of learning.

Type S is, essentially, Pavlovian classical conditioning.

This is a little counterintuitive, because "Type S" learning involves elicited responses, or respondents.

It's called "Type S" because reinforcement is correlated with stimuli, not responses.

Type R is, essentially, Thorndikian instrumental conditioning.

This is counterintuitive for the same reason; but you've got to think like Skinner.

It's called "Type R" because reinforcement is correlated with responses, not stimuli.

And in another counterintuitive move, Skinner distinguished between two types of primary reinforcers:

In positive reinforcement, presentation of the reinforcer increases the probability of a response to a stimulus.
In negative reinforcement, withdrawal of the reinforcer increases the probability of a response to a stimulus.

In ordinary language, "negative reinforcement" is tantamount to punishment. But for Skinner, punishment comes in one of two forms:

The withdrawal of a positive reinforcer.
The presentation of a negative reinforcer.

There are also secondary (or conditioned) reinforcers, which take on reinforcing properties by virtue of their repeated association with a primary reinforcer.

And there are generalized reinforcers, like money, which have become associated with a wide variety of primary reinforcers.

In shaping, novel responses can be achieved by reinforcing successive approximations.

In this way, organisms can learn to discriminate among stimuli, or differentiate among responses.

While Thorndike's Law of Effect gives rise to the impression that positive reinforcers are pleasant, or satisfy some biological motive, while negative reinforcers are unpleasant, Skinner is a true behaviorist, rejecting all reference to mental states. Reinforcements are known only by their effects: something is reinforcing if it increases the probability of the response with which it is paired.

Drive isn't a stimulus to behavior: rather, it is simply an operation, such as food or water deprivation.
Emotion isn't a behavioral response, either; it, too, is just a set of operations.
And, of course, language is just verbal behavior, emitted and reinforced just like any other form of behavior (Skinner, 1957).

It was this proposal, of course, that drove Noam Chomsky crazy.

In his critique of Skinner's Verbal Behavior (which absolutely must be read), Chomsky (1959) found time to show that Skinner's definition of reinforcement is circular, and thus empty:

A reinforcer is anything that increases the probability of a response.
You can identify a reinforcer because it increases the probability of response.

Estes (1944), while a doctoral student working under Skinner's supervision, showed that punishment suppresses behavior, but does not weaken habits.

Punishment does not have much of an effect on extinction.
Punishment does not eliminate a response from the organism's repertoire.
When punishment is effective at all, it acts on discriminative stimuli, rather than on responses.

That is, the organism learns under which stimulus conditions a response will be punished.

Intermittent punishment is more effective than constant punishment in controlling behavior.

Probably for much the same reason that intermittent reinforcement increases resistance to extinction.

Punishment has to continue indefinitely to have its intended effects on behavior.
Punishment is most effective if accompanied by a clear discriminative stimulus.

You might say that punishment is more effective if the organism is given an alternative means of achieving its goal. Which is true, except that Skinner doesn't want to talk about goals. Too mentalistic.

As noted earlier, Skinner and his students and colleagues placed great emphasis on the schedule of reinforcement -- that is, the precise relationship between response and reinforcement (e.g., Ferster & Skinner, 1957). Each of these schedules produced a corresponding pattern of behavior.

Hull

Hull also classifies as an S-R theorist, as in his famous formulation:

${}_{S}E_{R} = {}_{S}H_{R} \times D$.

But that little element, D, distinguishes Hull from the other S-R behaviorists, because it posits that learning is a function of an internal physiological (if not mental) state, drive. And it's the presence of this drive state that makes reinforcements reinforcing. So, by postulating an internal state, Hull makes it clear that learning isn't just a matter of associating stimuli and responses. And it offers a non-circular definition of reinforcement: reinforcements reduce physiological drives.
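
A small worked example may help show what the multiplication implies. It reads the symbols in the usual way for Hull's notation, with E as reaction potential (the tendency to perform the response), H as habit strength, and D as drive; the formulation above does not spell these out, so treat that reading, the function name, and the numerical values as assumptions of the sketch.

```python
def reaction_potential(habit_strength, drive):
    """Hull's multiplicative rule: reaction potential = habit strength x drive."""
    return habit_strength * drive

# Hypothetical values on an arbitrary 0-to-1 scale.
well_learned_habit = 0.8
weak_habit = 0.2

sated = reaction_potential(well_learned_habit, drive=0.0)        # no drive
hungry = reaction_potential(well_learned_habit, drive=0.5)       # moderate drive
hungry_novice = reaction_potential(weak_habit, drive=0.5)        # same drive, weak habit
```

Because the terms multiply rather than add, a well-learned habit produces no behavior at all when drive is zero, and the same level of drive produces more behavior the stronger the habit.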

Hull's mathematico-deductive theory of learning (1940) is, in some ways, a masterpiece of quantitative psychological theory, expressly inspired by, and explicitly modeled on, Newton's Principia and Whitehead and Russell's Principia Mathematica. Each of its elements is stated verbally, then translated into symbolic logic, and then followed by experimental tests conducted on a variant of the verbal-learning paradigm known as rote learning -- essentially, an extension of Ebbinghaus's method. (Earlier, Hull had adapted Ebbinghaus's method for the study of concept acquisition -- ever the tinkerer, he invented the memory drum in the process, creating a whole industry of makers of equipment for university psychology laboratories.)

The system comprised:

18 postulates, with 10 corollaries.
54 theorems, with 110 corollaries.
8 problems.

Hull's research gave rise to the standard, ogival, form of the learning curve showing the acquisition of a response over time. Actually, there has been some confusion over the shape of the learning curve. Often, the curve is described as negatively accelerated, with large gains on initial trials followed by smaller gains as learning approaches asymptote. But Culler & Girden (1951), in an exhaustive analysis of published learning curves (following Culler, 1928), determined that it is ogival after all.

On initial trials, response strength grows slowly at first, followed by a rapid, positively accelerated increase in strength.

This portion of the curve is often obscured in experiments where the subjects do not start "cold", with no base of prior learning.

On somewhat later trials, there is a steady increase in response strength.

In other words, no acceleration -- just a steady rise.

And on still later trials, the increase is negatively accelerated, gradually tapering off as it reaches plateau.
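The three phases of an ogival curve can be checked numerically. The Python sketch below uses a logistic function purely as a convenient S-shaped stand-in; that choice, the midpoint and rate parameters, and the 0-to-1 scale are assumptions made here for illustration, not Hull's equation or the function fitted in the published analyses.

```python
import math


def logistic(trial: float, midpoint: float = 10.0, rate: float = 0.5) -> float:
    """S-shaped (ogival) response strength between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-rate * (trial - midpoint)))


trials = range(0, 21)
strength = [logistic(t) for t in trials]

# Gain per trial (first difference) and change in gain (second difference).
gains = [b - a for a, b in zip(strength, strength[1:])]
accel = [b - a for a, b in zip(gains, gains[1:])]

# accel[i] is centered on trial i + 1: positive early (positively accelerated),
# near zero around the midpoint (steady rise), negative late (tapering off).
for i, a in enumerate(accel):
    print(f"trial {i + 1:2d}: strength={strength[i + 1]:.3f}  acceleration={a:+.4f}")
```

In this sketch the sign of the second difference is what separates the three phases: gains grow early on, hold roughly constant in the middle, and shrink as the curve approaches its plateau.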

Hull's system attracted a great number of adherents, and he gained additional fame after moving from Wisconsin (where he got his PhD, with Joseph Jastrow as his advisor) to Yale, where his colleagues at the Institute of Human Relations applied his drive-reduction theory to a wide variety of issues in personality and social behavior -- most famously, Miller and Dollard's work on frustration and aggression and on conflict (approach-approach, approach-avoidance, and avoidance-avoidance). Together, these two lines of research laid the foundation for a translation of Freudian psychoanalytic theory into the vocabulary of Hull's S-R theory of learning.

Unfortunately, the mathematical rigor of his theory proved its undoing. In a famous paper, Gleitman, Nachmias, and Neisser (1954) showed that Hull's theory of extinction was simply wrong: it contained a number of internal, logical contradictions, and its empirical predictions were not borne out. A theory that can't explain extinction isn't a very good theory of learning, after all. And by this time, anyway, Skinnerian functional behaviorism was at its apex -- soon to be overthrown itself, by the cognitive revolution in psychology.

Tolman

The cognitive revolution was foreshadowed by the genuinely cognitive theory of learning proposed by E.C. Tolman (who had been Gleitman's teacher at Berkeley). As a learning theorist, Tolman was the chief competitor to both Hull and Skinner.

There were other cognitive theories of learning, of course, proposed by members of the Gestalt school (both Nachmias and Neisser were students of Kohler and Wallach at Swarthmore, which was where they met Henry Gleitman, who had just joined the faculty there).

But the Gestalt psychologists really focused on perception, and their work on insight in problem-solving didn't really connect with mainstream learning research (more's probably the pity).
And it probably didn't help much that some Gestalt psychologists, like Koffka, denied the importance of experience in perception. Given that learning was classically defined as changes in behavior that result from experience, that made Gestalt theory a nonstarter as a learning theory.
Of course, the Gestalt theorists had a lot to say about memory, and that should have helped some. But, then again, mid-century behaviorists probably found the whole concept of memory, as an internal (mental) representation of the past, too mentalistic for their verbal-learning tastes.

Tolman is best known for his studies of latent learning, discussed later, which cast doubt on the role of reinforcement. Here, I'll talk in general terms about his theoretical approach.

It is behavioristic in the sense that Tolman rejected structuralism, and thus any form of introspection (even self-reports), relying exclusively on publicly observable behavior.

Tolman's behaviorism, though, is molar rather than molecular. He is interested in the behavioral act, not in any specific muscular or physiological activities.

In one series of experiments, Tolman taught rats to run a maze. Then he flooded the maze, and showed that they had learned to swim it as well.
Molar behavior is docile, or teachable, in a way that molecular behavior is not.

Tolman's behaviorism was methodological but, unlike Skinner's, not systematic. Tolman allowed intervening variables to appear between stimulus and response.

He was interested in the purpose of behavior, how it is goal-directed.
For Tolman, behavior is cognitive, in that it is based on the organism's knowledge and expectations concerning the environment.

Put another way: Tolman was interested in "what the organism is doing" (Hilgard & Bower, 1966, p. 192, emphasis added).

And what the organism is doing, mostly, is following signs to a goal.

It doesn't learn responses, as Skinner and other S-R theorists would have it.
Rather, it learns signs and their significance.

Everybody's got their method. Pavlov had dogs in harnesses; Thorndike had cats in puzzle-boxes; Skinner had his operant chamber. Tolman had the maze -- a series of alleys and choice points where his rats could -- well, make choices. In fact, Tolman used the same maze throughout his career. It was a thing of real beauty, with lots of alleys and choice points, which could be walled off with curtains to create different pathways from start box to goal box (diagram courtesy of UCB Prof. Donald Riley, who was one of Tolman's students).

Tolman's research program focused on three aspects of learning.

Reward expectancy: Rats learn to run the maze for a particular preferred reward. When the reward is shifted to a less-preferred reward, they will leave the goal box and search for the preferred reward.
Place learning: rats learn where to go, not what movement to make, in order to get rewarded.

Tolman kept the path intact but changed the behavior, as in the running-swimming experiment described earlier.
Tolman put spatial habits in opposition to movement habits, and showed that the former won out.
Tolman blocked a familiar path through the maze, and saw that rats quickly followed an alternative path.

Latent learning: animals learn about their environment by exploring it, in the absence of reward. If they receive a reward in a particular place, they know where to go the next time.

When they come to a choice point for the first time, Tolman found that individual rats consistently favored one choice over another, behaving as if they were generating hypotheses about where to go.

And they also engaged in vicarious trial-and-error behavior (VTE), vacillating between one choice and another, before settling on one.

This idea was derided by some radical behaviorists, who characterized Tolman's rats as "lost in thought at the choice point".

Tolman considered himself a behaviorist, but it is clear that he was a behaviorist of quite a different stripe than others.

He was a purposive behaviorist.
He characterized learning as the acquisition of expectations rather than of habits, giving his theory a distinct cognitive, if not mentalistic, flavor.

Under the influence of his colleague Egon Brunswik, Tolman characterized these expectations in probabilistic terms.

The role of reinforcement is not to "stamp in" an association between stimulus and response, but to develop, test, confirm or disconfirm, or refine these expectations.

When psychology was ready for the cognitive revolution, Tolman and a few others (like Jerome Bruner) had pointed the way.

A final note: There's a reason that the Education/Psychology Building at UCB is named after him. Along with Brunswik, Tolman was probably Berkeley's most famous psychologist: his experiments, from almost 100 years ago, are still described in introductory textbooks. But Tolman's contributions to the University go far beyond the experiments on latent learning. In the late 1940s and early 1950s, at the height of the McCarthy Period in American politics, the Regents of the University of California (there were only Berkeley and UCLA then) required all UC faculty to sign a loyalty oath. Tolman viewed this as an infringement of academic freedom, and (along with some other faculty) refused to sign. He was then dismissed from his post, and took a visiting position at Harvard (where he had gotten his PhD under Munsterberg). He then sued the University for reinstatement. In Tolman v. Underhill (1952), the California Supreme Court overturned the loyalty oath, and required the University to reinstate him and the other plaintiffs.

A Note on Functionalism

This is as good a place as any to make some remarks about a general trend in learning theory known as functionalism, and to clear up some misunderstandings about it.

As a "school" of psychology, functionalism was skeptical of the structuralist claim that we can understand mind in the abstract. Based on Charles Darwin's (1809-1882) theory of evolution, which argued that biological forms are adapted to their use, the functionalists focused instead on what the mind does, and how it works. While the structuralists emphasized the analysis of complex mental contents into their constituent elements, the functionalists were more interested in mental operations and their behavioral consequences. Prominent functionalists were:

William James, the most important American philosopher of the 19th century, who taught the first course on psychology at Harvard. James's seminal textbook, Principles of Psychology (1890), is still widely and profitably read by new generations of psychologists. True to his philosophical position of pragmatism, James placed great emphasis on mind in action, as exemplified by habits and adaptive behavior.
John Dewey (1859-1952), now best remembered for his theories of "progressive" education, who founded the famous Laboratory School at the University of Chicago.
James Rowland Angell (1869-1949), who was both Dewey's student (at Michigan) and James's student at Harvard, and who rejoined Dewey after the latter moved to the University of Chicago; later Angell was president of Yale University, where he established the Institute of Human Relations, a pioneering center for the interdisciplinary study of human behavior. In contrast to Titchener, who wanted to keep psychology a "pure" science, Angell argued that basic and applied research should go forward together.

Psychological functionalism is often called "Chicago functionalism", because its intellectual base was at the University of Chicago, where both Dewey and Angell were on the faculty (functionalism also prevailed at Columbia University). It is to be distinguished from the functionalist theories of mind associated with some modern approaches to artificial intelligence (e.g., the work of Daniel Dennett, a philosopher at Tufts University), which describe mental processes in terms of the logical and computational functions that relate sensory inputs to behavioral outputs.

The functionalist point of view can be summarized as follows:

Adaptive value of mind. Functionalists assume that the mind evolved to serve a biological purpose -- specifically, to aid the organism's adaptation to its environment. Thus, functionalists are interested in what James called (in the Principles) "the relationship of mind to other things" -- how the mind represents the objects and events in the environment. Functionalism also laid the basis for the application of psychological knowledge to the promotion of human welfare.
Mind in context. From a functionalist point of view, the mind essentially mediates between the environment and the organism. Therefore, the functionalists were concerned with the relations between internal mental states and processes and the states and processes in the internal physical environment (i.e., the organism) on the one hand, and the external social environment (i.e., the real world) on the other.
Operations over content. Whereas structuralism attempted to analyze the contents of the mind into their elementary constituents, functionalism attempted to understand mental operations -- that is, how the mind works. It's this sense of functions as operations that gives functionalism its name.
Individual differences. For Wundt and other structuralists, it didn't matter who the observer was: so long as observers were properly trained, they were interchangeable. But the functionalists, with their roots in Darwin's theory of natural selection, were interested in variation.
Mind and body. Because the mind is what the brain does, functionalists assumed that understanding the nervous system, and related bodily systems, would be helpful in understanding the workings of the mind. At the very least, mind and body ought to be related somehow, and psychologists should be free to investigate the neural underpinnings of mental life, and other aspects of the mind-body relationship.

So where's the confusion? The confusion comes from another form of functionalism, "philosophical" functionalism, which holds a prominent position in cognitive science -- in particular, among proponents of what John Searle calls "strong artificial intelligence". Essentially, functionalists identify mental states with certain input-output functions, irrespective of the medium which performs those functions. It then follows that any physical system which performs those functions has mental states -- regardless of whether that physical system is a brain, a computer, or -- to take a vivid image -- a bunch of beer cans connected by string and powered by windmills. The connection to Stimulus-Response theory is obvious. Philosophical functionalism does have one advantage over behaviorism, in that it at least acknowledges the existence and causal power of mental states.

So don't get confused. When somebody identifies himself as a "functionalist", these days, he's likely to be a philosopher who identifies mind with certain functions, and who thinks that computers can have minds. And he's also likely to be inclined toward something like stimulus-response behaviorism.

But, as Dewey and his friends understood, functionalism doesn't have to stand for any such thing. In the American tradition of Dewey and James, functionalism can just be an umbrella term for a particular approach to learning, memory, and other aspects of mind and behavior:

Mind mediates between the environment and the person.
Mind enhances the adaptation of the organism to its environment.
Psychologists should feel free to investigate the biological substrates of mental life.

What is Learned in Conditioning?

So far, we have simply described the phenomena of conditioning -- acquisition, extinction, generalization, discrimination, reinforcement, and the like. But what actually happens in learning? Or, put another way, what is the organism learning from experience?

The Stimulus-Response Theory of Learning

Learning was once thought to be as automatic as reflexes, taxes, and instincts. Just as these are innate stimulus-response associations, part of the organism's biological endowment, so classical and instrumental conditioning were thought to represent acquired stimulus-response connections, formed as a result of experience but no less automatic.

As its name implies, S-R learning theory holds that what is learned in conditioning is an association between a stimulus and a response -- an association that is strengthened by reinforcement.

In the case of Pavlov's dogs, the association is between the bell CS and salivation, and the salivary CR is reinforced by the meat powder US.
In the case of Thorndike's cats, the association is between the puzzlebox CS and the lever-pressing, and the lever-pressing CR is reinforced by escape.
In the case of Skinner's pigeons, the association is between the key CS (or, perhaps, the illuminated key) and key-pecking, and the key-pecking CR is reinforced by food pellets.

Traditional stimulus-response theories of learning were based on four assumptions:

Association by Contiguity: associations are formed between events that occur close together in space and time. Or, put another way, the repeated co-occurrence of two events creates an association between them, so that the appearance of one evokes the idea of the other. In classical conditioning, the contiguity is between two events in the environment: the conditioned stimulus and the conditioned response induced by reinforcing the conditioned stimulus with the unconditioned stimulus. In instrumental conditioning, the contiguity is between the organism's behavior (the conditioned response) and the situation (the conditioned stimulus) in which it is reinforced.
Arbitrariness: By virtue of reinforcement, any stimulus can become associated with any response, so long as the stimulus can be sensed by the organism (a blind rat can't respond to a visual stimulus) and the response is in the organism's repertoire as a voluntary or involuntary action (a rat can't be conditioned to fly). The arbitrariness assumption is also known as equipotentiality, a term already introduced in the discussion of the functional specialization of the brain.
The empty organism: Behavior (remember that the proponents of the S-R theory were mostly behaviorists in the mold of Watson and Skinner) can be understood solely in terms of stimulus inputs to and response outputs from the organism. In order to understand learning and other aspects of behavior, we do not need to go "inside" the organism to understand its inner structures and functions. We need only focus on stimuli and responses, and can treat the organism as if it were empty. In other words, the organism can be thought of as a "black box" connecting stimuli and responses -- a black box that need never be opened.
The passive organism: The organism is not active during learning. Rather, all the "action" is in the environment, which "stamps in" associations between contiguous stimuli and responses. This assumption gives us the metaphor of "conditioning", and the idea that behavior (reflexes in the case of classical conditioning, non-reflexive behaviors in the case of instrumental conditioning) is under the control of environmental events. There is no notion of intentionality or free will in stimulus-response behaviorism, nor any valid distinction between "voluntary" and "involuntary" behaviors -- because the very notion of a behavior being "voluntary" smacks of free will and mentalism, both anathema to radical behaviorism.

The stimulus-response theory of learning, and the assumptions on which it was predicated, dominated the study of learning for more than 50 years since Watson. Beginning in the 1960s, however, experiments began to challenge this view of learning as a passive, associationistic process. These experiments showed that there were two broad types of constraints on what can be learned -- biological and cognitive. And in revealing these constraints, research overturned the four assumptions of S-R learning theory and completely changed our view of learning.

Biological Constraints on Learning

One important line of research challenged the arbitrariness assumption that organisms could learn to attach any response in their repertoire to any stimulus in the environment, by showing that some conditioned responses are easier to acquire than others.

This research begins with work by the American psychologist John Garcia and his colleagues on a phenomenon known as taste-aversion learning (or bait shyness). Before Garcia became a graduate student of Tolman's at UC Berkeley, he grew up on a sheep ranch in the American southwest, where ranchers routinely used poison to control coyotes and other predators. Garcia knew from this experience that when animals eat poisoned food or drink poisoned liquids, and nonetheless survive, they will avoid that substance later (hence the term, "bait-shyness"). Garcia and his associates developed a laboratory analogue of bait-shyness in an attempt to study the anticipatory nausea which some cancer patients develop in the course of receiving chemotherapy. Garcia's paradigm was a variant on classical fear conditioning:

Rats were exposed to a compound CS while drinking water. By "compound", we mean that the CS was not a simple stimulus, such as Pavlov's bell. Rather, it was characterized as "bright, noisy, sweet" water: the water was flavored with saccharine, and there was a flashing light and clicking sound in the background. The animals were exposed to all three elements of the compound CS simultaneously while water was made available for them to drink. And because they were all somewhat water-deprived, they all drank during exposure to the compound CS.
Exposure to the CS was followed by one of two unconditioned stimuli:

Foot shock: the delivery of an electrical shock through the floor grid of the test cage -- which elicits pain immediately as a UR.
A sub-lethal dose of X-rays, which induced nausea in the rats some time later. Note that X-rays cannot be sensed by the organism: they are invisible, make no sound, have no taste or smell, and cannot be felt. This fact is important, because it makes it clear that any association established is between the conditioned stimulus (bright, noisy, sweet water) and the unconditioned response (nausea), as traditional S-R theories of learning hold. There can be no association established involving an event that the organism cannot pick up through its sensory apparatus.

Later, learning was tested through an avoidance procedure. The animals were presented with two sources of water, and allowed to drink from either one.

From one source, the water was flavored with saccharine, but there were no sounds or lights presented.
From the other source, the water was unflavored, but drinking was accompanied by flashes and clicks.

Garcia and his associates found that the animals' avoidance behavior depended on the US to which they had been exposed.

If the US had been foot-shock, they avoided water associated with the bright, noisy CS and preferred the water associated with the sweet CS.
If the US had been X-rays, they avoided the sweet water and preferred the bright, noisy water.

In other words, the animals formed associations between shock and sight and sound, and between nausea and taste; but they made no connection between shock and taste, or between nausea and sight and sound. This outcome violates the arbitrariness assumption of traditional S-R theories of learning, because all elements of the compound CS occur at precisely the same time and place. Thus, they all have precisely the same spatial and temporal contiguity with respect to the US. Therefore, under the assumption of arbitrariness or equipotentiality, they should all have been equally powerful as CSs. But they were not.

This experimental outcome is commonly interpreted as indicating that the potency of a stimulus is related to the evolutionary history of the species. Rats are nocturnal animals, and under ordinary circumstances choose their food according to its taste. Therefore, their evolution has disposed them to form associations between the taste of food and its gastrointestinal consequences, but not between sight or sound and nausea. The explanation is supported by experiments on birds (like quail), who are sight-feeders. They quickly form associations between nausea and visual stimuli, but not between nausea and taste.

From Coyotes to Sheep to Wolves

Garcia became interested in bait shyness because of its use by sheepherders and other ranchers in the natural control of coyotes and other predators, but you don't have to be a predator to be susceptible to bait shyness.

In 2007, Morgan Doran, a farm advisor with the University of California Agricultural and Natural Resources Cooperative Extension, based in Davis, began a program of research on bait-shyness in sheep. Sheep and goats are often used for brush control and weed abatement -- you can see them, for example, in the Oakland and Berkeley Hills in an attempt to prevent wildfires from spreading through dry overgrowth. And vintners have been interested in using this same technique for weed control in vineyards.

That's all very good on paper, but the practical problem is how to get the sheep to eat the weeds, and not the very tasty tender shoots of young grapevines!

In Doran's study, a group of sheep is allowed to feed freely on vine leaves, and then they are fed a capsule filled with lithium chloride -- which, while not lethal, induces pretty severe nausea. A control group is also allowed to feed on the grape leaves, but gets a placebo capsule. Results from a pilot study indicate that the sheep will, in fact, avoid the grape leaves in the field, and focus their feeding on the weeds.

A similar project is underway in Marin County's dairyland, where cattle have been trained to prefer a particular kind of thistle.

Turning the tables, bait-shyness (and preparedness) has been enlisted in the effort to protect the Mexican wolf, which was hunted to near extinction by ranchers seeking to protect their cows and sheep from predation. An experiment with captive Mexican wolves shows promise in getting the animals to avoid sheep, and might be effective in wildlife management as well.

Who says that animal research has no practical significance!? Or that it's bad for the animals?

Contiguity versus Contingency in Conditioning

For example, the principle of association by contiguity, already challenged by Garcia's experiments on taste-aversion learning, is further undermined by certain peculiarities of classical conditioning.

In what is known as the standard paradigm for classical conditioning, the CS precedes the US by a short interval, approximately 1 second, and the termination of the CS is simultaneous with the onset of the US. This situation usually yields excellent conditioning.

In delay conditioning, the duration of the CS is lengthened, although its termination is still simultaneous with the onset of the US. This situation also yields good conditioning, even though the temporal contiguity between CS and US onset has been degraded somewhat by the delay.

In trace conditioning there is also a delay, but in this case the CS goes off before the US comes on. In other words, whereas in delay conditioning there is a delay between CS onset and US onset, in trace conditioning there is a delay between CS offset and US onset. Nevertheless, trace procedures also yield good conditioning. Because of the interval between CS offset and US onset, trace conditioning must be mediated by something like a memory trace of the CS -- hence the name given to the procedure. But the important point is that, as in delay conditioning, trace conditioning gives good results despite the degradation in temporal contiguity between CS and US.

In simultaneous conditioning, the onset of the CS and the onset of the US occur at precisely the same time. Obviously, this situation optimizes temporal contiguity. Nevertheless, in contrast to the standard, delay, and trace paradigms, conditioning does not occur in the simultaneous paradigm -- even though there is perfect contiguity between the CS and the US.

In backwards conditioning, the onset of the CS actually follows the onset of the US. However, the temporal distance between the two stimuli is preserved -- for example, a US-CS interval of about 1 second. In other words, the CS and the US are still highly contiguous in terms of the spatial and temporal relations between them. Nevertheless, no conditioning occurs. In fact, there is evidence that the formation of the CR is actually inhibited in the backwards paradigm. For example, in a standard fear-conditioning experiment, where a tone CS precedes a shock US by about 1 second, the animal will quickly acquire a conditioned response of heart-rate acceleration to the tone (remember that one of the components of the fight-or-flight response, mediated by activation of the sympathetic branch of the autonomic nervous system, is an increase in heart rate). However, in backwards conditioning, where the shock precedes the tone, the animal will actually show heart-rate deceleration in response to the tone.

These kinds of results highlight the distinction between contiguity and contingency.

In contiguity, the CS co-occurs with the US: they are contiguous, or close together, in space and time.
In contingency, the CS predicts the US: the occurrence of the US is contingent on the prior occurrence of the CS.

Given the results just summarized, we can conclude several things about the role of contiguity and contingency in conditioning.

Conditioning is best when the CS and US are both contiguous and contingent -- as in the standard paradigm, where the CS predicts that the US will occur shortly.
Conditioning is also good when the CS and US are contingent but not contiguous -- as in delay and trace conditioning, where the CS predicts that the US will occur after some delay.
Conditioning is poor when the CS and US are contiguous but not contingent -- as in simultaneous conditioning, where the CS cannot predict the US because the two stimuli occur simultaneously.
Conditioning is actually inhibited in backwards conditioning, where the CS occurs close in time to the US, but the CS actually predicts the absence of the US.
Conditioning is also inhibited in extinction, where the CS no longer predicts that the US is forthcoming. In "extinction below zero", the conditioned inhibition is strengthened even further.
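The timing relations that define these paradigms can be made concrete with a small sketch. The Python below is only an illustration: the Trial class, the classify function, the one-second cutoff separating a "standard" from a "delay" procedure, and the handling of a CS that overlaps the US are all assumptions introduced here, and the outcome table simply restates the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    cs_onset: float   # seconds
    cs_offset: float  # seconds
    us_onset: float   # seconds


# Assumed cutoff between the short "standard" interval and a "delay" procedure.
STANDARD_MAX_INTERVAL = 1.0

OUTCOMES = {
    "standard": "best conditioning (contiguous and contingent)",
    "delay": "good conditioning (contingent, not contiguous)",
    "trace": "good conditioning (contingent, not contiguous)",
    "simultaneous": "poor conditioning (contiguous, not contingent)",
    "backwards": "inhibition (CS predicts the absence of the US)",
}


def classify(trial: Trial) -> str:
    """Name the paradigm implied by the CS and US timing of one trial."""
    if trial.cs_offset < trial.cs_onset:
        raise ValueError("CS cannot end before it begins")
    if trial.cs_onset > trial.us_onset:
        return "backwards"
    if trial.cs_onset == trial.us_onset:
        return "simultaneous"
    if trial.cs_offset < trial.us_onset:
        return "trace"  # gap between CS offset and US onset
    # CS lasts at least until US onset (overlap is treated the same way here).
    if trial.us_onset - trial.cs_onset <= STANDARD_MAX_INTERVAL:
        return "standard"
    return "delay"


examples = [
    Trial(0.0, 1.0, 1.0),
    Trial(0.0, 5.0, 5.0),
    Trial(0.0, 1.0, 3.0),
    Trial(0.0, 1.0, 0.0),
    Trial(1.0, 2.0, 0.0),
]
for t in examples:
    name = classify(t)
    print(f"{t} -> {name}: {OUTCOMES[name]}")
```

Note that the classifier looks only at the order and spacing of onsets and offsets; nothing in it measures closeness in time as such, which mirrors the point that the predictive relation, not mere proximity, separates the paradigms that work from those that do not.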

According to conventional S-R learning theory, associations are formed by virtue of the spatiotemporal contiguity between events in the environment, stimuli and responses, or actions and their outcomes. That is to say, associations are formed between two elements that occur closely together in space and time. However, an increasing body of evidence, including the outcomes of various classical-conditioning paradigms, indicates that contiguity is not the important element in learning. Rather, the important element is contingency: the degree to which one event (etc.) predicts another (etc.).

Put another way, conditioning occurs when the CS acts as a signal that the US is forthcoming. In backwards conditioning, however, the CS signals that the US is not forthcoming. In backwards fear conditioning, the CS actually serves as a safety signal -- informing the animal that the shock will not be forthcoming for a while. The CS has value as a signal only when there is a contingent relationship between the CS and the US, regardless of whether the CS and US are temporally and spatially contiguous. The conclusion is that contingency is more important than contiguity: conditioning occurs only when the CS predicts the US. When the CS is uninformative about the US, no conditioning occurs. And when the CS predicts the absence of the US, as in extinction or backwards conditioning, the CR is actually inhibited.

The Rescorla Experiment

A compelling demonstration of the role of contingency in classical conditioning was provided in a classic experiment by Robert Rescorla (1967), for his doctoral dissertation at the University of Pennsylvania (after many years at Yale, Rescorla returned to his alma mater in a faculty role). In this experiment, Rescorla varied the predictability of a shock US, given the presentation of a tone CS.

In one condition of the experiment, the CS was a perfect predictor of the US, in that the CS always immediately preceded the US (that is, within 1 second or so). No CS was ever presented that was not immediately followed by a US; and no US was ever presented that was not immediately preceded by a CS. Thus, expressed in terms of probabilities:

p(US | CS) = 1.0; and

[Read this as "the probability that the US will occur given the prior occurrence of the CS is 1".]

p(US | no CS) = 0.0.

[Read this as "the probability that the US will occur given no prior occurrence of the CS is 0".]

This condition resulted in very good conditioning.

In another condition of the experiment, the CS was a less-than-perfect predictor, because Rescorla interspersed a number of unreinforced CSs -- that is, CSs that were not immediately followed by USs. Thus, of all the CSs that were presented, half were not followed by USs. However, the US never occurred unless it was immediately preceded by a CS. Again, expressed in terms of probabilities:

p(US | CS) = 0.5 and p(US | no CS) = 0.0.

This condition still resulted in fairly good conditioning.

In a third condition of the experiment, the CS was rendered ineffective as a predictor of the US, because Rescorla interspersed a number of unsignalled USs -- that is, USs that were not immediately preceded by CSs. In fact, half of the USs were unsignalled. Now, the situation was that CSs and USs occurred randomly, independently of each other. Expressed in terms of probabilities:

p(US | CS) = 0.5 and p(US | no CS) = 0.5.

Under these conditions, no conditioning occurred, even though the CS and US were frequently presented together in the same place at the same time.

The upshot of Rescorla's experiment, which stands as a modern classic in psychology, is that conditioning is not simply the formation of an association between spatially and temporally contiguous stimuli. Rather, conditioning occurs only when the CS provides information about the US. The amount of information provided may be estimated as the difference between two probabilities:

p(US | CS) - p(US | no CS).

In the first condition of Rescorla's experiment, this difference is 1.0, and results in good conditioning.
But in the second condition, this difference is reduced to 0.5: still positive, thus resulting in conditioning, but not as high as 1, thus not as good as in the first condition.
In the third condition, the difference is 0.0, and no conditioning results.
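
The arithmetic behind these three conditions can be sketched in a few lines of Python. This is only an illustration of the difference score described above, not Rescorla's own analysis; the function name `contingency` and the condition labels are assumptions introduced here for the example.

```python
def contingency(p_us_given_cs, p_us_given_no_cs):
    """Difference between the two conditional probabilities:
    p(US | CS) - p(US | no CS)."""
    return p_us_given_cs - p_us_given_no_cs

# (p(US | CS), p(US | no CS)) for the three conditions described in the text.
# The labels are hypothetical shorthand, not Rescorla's terminology.
conditions = {
    "perfect predictor": (1.0, 0.0),
    "unreinforced CSs added": (0.5, 0.0),
    "unsignalled USs added": (0.5, 0.5),
}

deltas = {name: contingency(*probs) for name, probs in conditions.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:.1f}")
```

The three printed values are 1.0, 0.5, and 0.0, matching the ordering of conditioning strength reported for the three conditions.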

Conditioning occurs only if, and to the degree that, the CS is a reliable predictor of the US. Put another way, conditioning occurs only if the US is more likely following a CS than in the absence of the CS. What's amazing about this is that it appears that even organisms as simple as the white rat, or simpler, are in some sense computing the conditional probabilities involved. The computation is not necessarily conscious, of course -- the rats haven't taken Statistics 2, after all. But it is a computation nonetheless.

The Kamin Experiments

The importance of the predictive relationship between the CS and the US is underscored by two other phenomena discovered by Leo Kamin.

Kamin's first experiment concerned the phenomenon of overshadowing. Consider two standard conditioning preparations:

in the first, a bright light CS is followed by shock US;
in the second, a soft tone CS is followed by shock US.

Both preparations yield good conditioning. And when we combine these two effective CSs into a single compound CS, bright light and soft tone presented simultaneously and followed by shock, just as Garcia did with his compound of "bright, noisy, sweet" water, what we find is good conditioning.

But what happens if, after we condition the organism to the compound, we test the two elements separately? When we do, we get a good CR to the light, but not to the tone. This is not a problem of differential preparedness, as in the Garcia experiment, because neither light nor tone is particularly prepared or contraprepared to serve as a signal for shock. Instead, once more, the result violates the assumption of association by contiguity. Both the light and the tone were equally contiguous with the shock. But it appears that the more salient, noticeable CS, in this case the bright light, overshadows the less salient or noticeable one. Both are contiguous with the shock, and both are good predictors of the shock as well, but conditioning occurs to the CS that is more salient.

The second experiment concerned the phenomenon of blocking. As background to this research, recall that in standard classical fear conditioning, a foot-shock US is preceded by a tone or light CS. Under these conditions, we get good conditioning of fear, as represented by such conditioned emotional responses as heart-rate acceleration, in response to previously neutral CSs.

We now give an animal acquisition trials with a compound CS, consisting of a tone and a light presented simultaneously, followed by a shock US in the usual manner. After 16 pairings of tone and light followed by shock, we test the animal's response to a variety of stimuli:

When we test the animal's response to the compound CS, we see evidence of fear conditioning, as expected.
When we test the animal's response to each element of the compound, presented individually, we also see evidence of fear conditioning to each of the elements presented alone.

But something different happens when the procedure is changed, and conditioning trials with the compound CS are preceded by conditioning with one element alone.

In Phase 1 of a blocking experiment, the animal receives 16 trials with an elementary CS, such as a noise followed by a shock; at the end of this phase, the animal will show a conditioned fear response to the noise.
In Phase 2 of a blocking experiment, the animal now receives 8 additional trials with a compound CS, in which the noise and light appear simultaneously, followed by shock.

What happens when we now test the animal's response to presentation of the light alone?

The first prediction of association by contiguity is that the animal should now show fear conditioning to the compound CS. This does in fact occur.
However, the further prediction of association by contiguity is that light alone should now evoke the fear CR as well, because for eight trials it has appeared close together in space and time with the shock US. But, in fact, no conditioning accrues to the light. If we test the noise, however, the animal will continue to show conditioned fear.

Here are the actual results of some of Kamin's experiments.

When animals are conditioned to fear a noise, and then are tested with a light, there is little evidence of a conditioned response. After all, they've been conditioned to a noise, and don't know anything about lights.
When animals are conditioned to the noise/light compound, and then are tested with a light, they show a big conditioned response. After all, light has been paired with shock.
But when animals are first conditioned to the noise, and then receive further conditioning trials with the noise/light compound, testing with the light yields no conditioned response. It's as if they never received any pairings of the light and shock at all.

Apparently, the prior conditioning to the noise has "blocked" conditioning to the light. This surprising outcome is explained in terms of the information provided by the various CSs. In the case of the compound CS, the new element, light, is redundant with the noise. Expressed in terms of Rescorla's conditional probabilities:

p(shock | noise) = p(shock | noise + light) = 1.0.

Now, the outcome would be different under different conditions.

For example, if the light preceded the noise, which in turn preceded the shock, conditioning would accrue to the light as well as the noise: this is because the light predicts the noise which predicts the shock.
Similarly, if there was a change in the US, such as its latency or intensity, conditioning would also accrue to the light as well as the noise: this is because the light predicts this change, providing extra information about when it will occur, or how strong it will be.
Finally, consider an experiment in which the animal is conditioned to the noise, and then receives trials where the noise/light compound is not followed by shock. Ordinarily, unreinforced presentation of the noise would yield extinction of fear to the noise. But in this case, testing response to the noise alone yields a big conditioned fear response. The noise alone still predicts shock; in combination with the light, noise predicts the absence of shock.
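
The idea that the light is "redundant" in the blocking procedure, but informative in the last variation, can be made concrete with a small sketch. The code below simply tallies hypothetical trial records and compares p(shock | noise + light) with p(shock | noise alone). The helper names (`p_shock`, `added_information`) and the trial counts used for the variation are assumptions made for illustration; this is not Kamin's analysis.

```python
def p_shock(trials, present):
    """Estimate p(shock | exactly these stimuli were present) from trial records."""
    outcomes = [shock for stimuli, shock in trials if stimuli == frozenset(present)]
    return sum(outcomes) / len(outcomes)

def added_information(trials):
    """How much the light changes the prediction already made by the noise alone."""
    return p_shock(trials, {"noise", "light"}) - p_shock(trials, {"noise"})

NOISE = frozenset({"noise"})
COMPOUND = frozenset({"noise", "light"})

# Blocking: 16 noise -> shock trials, then 8 noise + light -> shock trials.
blocking = [(NOISE, True)] * 16 + [(COMPOUND, True)] * 8

# Variation: the noise/light compound is NOT followed by shock.
# (Trial counts are reused here purely for illustration.)
compound_unreinforced = [(NOISE, True)] * 16 + [(COMPOUND, False)] * 8

print(added_information(blocking))               # 0.0: the light is redundant
print(added_information(compound_unreinforced))  # -1.0: the light signals no shock
```

A difference of zero corresponds to blocking: the light tells the animal nothing that the noise has not already told it. A nonzero difference, of either sign, means that the light carries information of its own.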

This leads us to a clarification of the principle of association by contingency:

Conditioning occurs only when the CS signals a change in the US.

Kamin concluded, further, that conditioning only occurs when the US surprises the organism. In the presence of a surprising event, the organism then searches the environment for possible predictors of that event. Among these, it will pay attention to the most reliable predictor, which becomes the effective CS. If there is more than one reliable predictor, it will attend to the most salient predictor, leading to the phenomena observed in the "overshadowing" experiment. And it will ignore stimuli that lack predictive power, leading to the phenomena observed in the "blocking" experiment.

Kamin's experiments are important because they simultaneously undermine three assumptions of classical S-R learning theory.

Because the elements of the compound CS are equally contiguous with the US, but differ in terms of the degree to which they predict the US, the assumption of association by contiguity must be wrong.
Because conditioning occurs only when the US surprises the organism, the assumption of the empty organism must be wrong: in order to understand conditioning, we must know what is going on in the mind of the organism -- what it's expecting, and whether it's surprised.
And the assumption of the passive organism is also wrong: the surprised organism is actively searching its environment for predictors, and focusing its attention on some events to the exclusion of others.

Pretty good for one experiment. No wonder it's a classic.

Learned Helplessness

Similar considerations apply to instrumental conditioning. The behaving organism is searching for predictability, but it is also searching for control. It wants to know what to do about forthcoming events, not just where and when to expect them. In instrumental conditioning, the organism is acquiring these expectancies of control.

The role of these expectations can be seen clearly in the phenomenon of learned helplessness, discovered by Martin E.P. Seligman, Steven Maier, and Bruce Overmier when they were graduate students at the University of Pennsylvania, working under Richard L. Solomon. Mowrer's two-factor theory of avoidance learning, discussed above, predicts that avoidance learning will be facilitated if the organism has already undergone fear conditioning. The idea is that the organism already knows to fear the CS, and all it has to do is to learn to avoid the US. To test Mowrer's theory, they performed the following experiment:

In Phase 1, a dog receives classical fear conditioning trials in which a tone CS is paired with a foot-shock US, until the animal reliably shows a conditioned emotional response.
In Phase 2, the same dog is placed in a shuttlebox for avoidance training. This is a long box divided into two sections by a small barrier, over which the dog can leap. A tone comes on, followed by foot-shock delivered through the floor of one section. If the animal leaps the barrier to the other section while the shock is on, the shock will be terminated. If the animal leaps the barrier to the other section while the tone is on, the shock will be eliminated for that trial. Then the procedure is repeated for several more trials.

In standard avoidance-learning situations, without Phase 1, animals learn the avoidance response readily, and shuttle nonchalantly back and forth from one section to the other as tones come on. However, Overmier & Seligman (1967) discovered that in their new situation, with Phase 1 inserted before Phase 2, avoidance learning was actually retarded.

The animals would passively accept the shock, stand or sit on the electrified grid, and show considerable signs of distress. In fact, the dogs looked somewhat depressed.
They made few inadvertent escape or avoidance responses, so they rarely received any reinforcement.

In a subsequent experiment, Seligman & Maier (1967) used a yoked-control design to ensure that animals in the two conditions received exactly the same amount of shock. In each pair, one dog could escape shock while the other received the same amount of shock as the first dog, no matter what it did. In a subsequent avoidance experiment, the "escape" animals responded like controls who had received no pretreatment of any kind, while the "yoked" animals showed considerable evidence of learned helplessness.

Proper avoidance responding can be established in dogs who have been pretreated with inescapable shock, but only by forcibly dragging the dogs from one side of the shuttlebox to the other.

Why does this happen? Seligman and his associates reasoned that learned helplessness reflects the acquisition of negative expectations of control. In classical fear conditioning, the shock is both inescapable and unavoidable. Tone is followed by shock, and there is nothing the animal can do about it, because in classical conditioning reinforcement is not contingent on the subject's behavior. It is only contingent on the CS. Accordingly, the animal in such a situation acquires a negative expectation that nothing can be done about the shock. This negative expectation, in turn, generalizes to the avoidance learning situation.

Learned helplessness is significant because it may underlie certain forms of clinical depression. But it also has great theoretical significance, because it shows that instrumental behavior is determined by the organism's expectancies, not by environmental events.

Helplessness at the World Trade Center

In the aftermath of the terrorist attacks of September 11, 2001, emergency-service workers at the World Trade Center employed "search and rescue" dogs to locate victims, living and dead, who might have been buried under the rubble. These animals were trained through instrumental conditioning procedures to sniff out human bodies: basically, when they found a body they received a reward (a similar training procedure is used for the "drug-sniffing" dogs employed by the police). At the WTC, however, there were very few such bodies to be found -- not because there weren't any victims, of course, but because the victims' bodies had been pulverized into dust by the collapse of the building. As a result, the search-and-rescue dogs became obviously depressed -- because they were not able to do the job they were trained to do. In the language of learned helplessness, the animals were not able to engage in behaviors that controlled reward. In order to maintain the animals' motivation for the job, emergency-service workers would sometimes lie down in the rubble -- just to give the dogs somebody to find -- or, in the language of learned helplessness, to maintain their sense of control.

The bottom line is that conditioning is the wrong metaphor for learning. A better metaphor might be computing. The learning organism is trying to figure things out, and it does this by, in some sense, computing conditional probabilities.

In classical conditioning, the organism is learning to predict its world, by computing which events (CSs) predict other events (USs).
In instrumental conditioning, the organism is learning to control its world, by computing which actions (CRs) lead to desirable, and undesirable, changes in the environment.

Prediction and control. Conditional probabilities. Signals. Information. That's what "figuring it all out" is all about.

Experimental Neurosis

Predictability and controllability are central to conditioning, but they also have clinical implications. I referred earlier to a body of research, initiated in Pavlov's laboratory, on experimental neurosis. Inspired by Seligman's learned helplessness model of depression, which focused on uncontrollable aversive events, Mineka and Kihlstrom (1978) proposed that experimental neurosis was caused by exposure to unpredictable aversive events.

Consider, first, the experimental paradigm developed by Shenger-Krestovnikova, as reported by Pavlov (1927). This was an experiment on discrimination learning, in which a circle was the CS+ and an ellipse the CS-. The dog acquired the discrimination rapidly, but as the circle was made more elliptical, and the ellipse more circular, the discrimination broke down and the dog now became very upset.

Thomas and Dewald (1977) repeated S-K's procedure with cats, and attributed experimental neurosis to uncontrollability of the US. But in classical conditioning the US is always uncontrollable. The difficult discrimination in the S-K experiment also meant that the US was unpredictable as well.

In an even earlier experiment, by Erofeeva (also cited by Pavlov, 1927), an electric shock served as the CS for food. The dogs again acquired the CR, but as the CS was "generalized" to other parts of the body, conditioning again broke down and the dogs showed considerable signs of behavior disorder. Apparently, the dogs were no longer able to predict where the painful CS would be applied.
Petrova (also cited in Pavlov, 1927) performed salivary conditioning with a 5-second interval between CS and US, and gradually increased the ISI to 180 seconds, at which point "the animal became quite crazy". In general, animals prefer short ISIs to long ones, and apparently, 3 minutes exceeded the limits of the dog's ability to use the CS as a signal.
In a fourth instance, also involving discrimination learning, a CS+ was presented immediately following a CS-.
W. Horsley Gantt, an American who had worked with Pavlov, and translated Pavlov's lectures into English, also induced experimental neurosis by forcing his dogs to make extremely difficult discriminations.
Liddell, another American Pavlovian, employed difficult discriminations in what we would now call fear conditioning, which -- as in Seligman's learned helplessness experiments -- added uncontrollability of the shock US to unpredictability.
Masserman (1943) punished cats after they made instrumental appetitive responses (opening a food container). They too developed experimental neurosis, but when they were given control over the punishment, thus predictability as well, their behavior returned to normal.
Wolpe (1958), employing variants on Masserman's procedures, subjected cats to foot-shocks during feeding.

Whereas a history of uncontrollable aversive events can lead to depression, as Seligman argued, Mineka and Kihlstrom suggested that unpredictable aversive events are a source of anxiety.

The Role of Reinforcement

A similar point can be made with respect to the role of reinforcement in learning. The conventional view, expressed in Thorndike's Law of Effect, is that nothing is learned in the absence of reinforcement. In classical conditioning, the CS must be followed by a reinforcing US. In instrumental conditioning, the CR must be followed by reward or punishment. However, a number of experiments now make clear that reinforcement is not necessary for learning to occur.

More Vicissitudes of Classical Conditioning

Consider, for example, two phenomena of classical conditioning discussed earlier.

In sensory preconditioning, the CS1 elicits a CR even though it has never been paired with the US.
Similarly, in higher-order conditioning, the CS2 elicits a CR even though it has never been paired with the US.

In both instances, the animal has learned to respond to a stimulus even though its response to that stimulus has never been reinforced. However, we can explain these phenomena by an extension of the principle of association by contingency, which states that animals in conditioning experiments learn the predictive relationships among events in their environment.

In sensory preconditioning, the animal is learning not just that CS2 predicts the US, but that the CS1 predicts CS2. Therefore, by transitivity, the CS1 predicts the US as well.

In higher-order conditioning, the animal learns not just that the CS1 predicts the US, but that the CS2 predicts CS1. Therefore, again by transitivity, the CS2 predicts the US as well.

Latent Learning

A similar point is made with respect to instrumental learning by classic studies on latent learning performed by Edward C. Tolman of the University of California, Berkeley (after whom the Education/Psychology Building at Berkeley is named). Tolman's experiments involved a maze-learning procedure, in which hungry rats were placed in the start box of a maze, and food placed in the goal box. Over trials, the rats would learn, through trial and error, the route through the maze. In theory, these responses -- turn left here, turn right there, go straight, whatever -- were reinforced by the delivery of food in the goal box. Intuitively, this makes sense, but Tolman asked whether the reinforcement was really necessary for learning to occur.

The experiment, by Tolman and Honzik, involved three groups of rats:

Group 1 was rewarded on every trial with food in the goal box. As expected, they showed a gradual reduction in errors.
Group 2 received no reward on any trial. They showed no reduction in errors, taking a relatively long time to make their way from the start box to the goal box on each trial.
For Group 3, reward was introduced on Trial 11, after 10 trials with no reward. These animals behaved, for the first 10 trials, like their counterparts in Group 2. However, on Trial 11, they showed an immediate reduction in errors, and subsequently behaved similarly to Group 1.

Tolman concluded that the animals in this group learned how to get from the start box to the goal box on the first 10 trials, but just needed a reason to do it. This reason was provided on Trial 11 and subsequent trials. In other words, Tolman's animals learned the maze without any reinforcement. Over 10 trials of exploration, they developed a "mental map" of their environment, which was subsequently available for use for a variety of purposes. However, they didn't perform a goal-directed response until the introduction of reinforcement established a goal.

Put another way,

Reinforcement controls performance rather than learning.

Curiosity and Intrinsic Motivation

A similar point was made in research on rhesus monkeys published in the early 1950s by Harry Harlow of the University of Wisconsin (later to become famous for his studies of "monkey love" and "motherless monkeys"). In one set of studies, Harlow presented his monkeys with a wooden "puzzle lock" consisting of a series of latches which, when moved in the right order, would open a door. Some animals were rewarded with food (rhesus monkeys love Froot Loops) for making correct moves; others received no reward at all. Harlow observed no difference in the monkeys' problem-solving behavior. In fact, if they were hungry, hunger appeared to interfere with solving the puzzle. If they were not hungry, but were "rewarded" with food anyway, they usually stored the food for later consumption. Harlow concluded that the monkeys were simply curious about the puzzle. In his view, curiosity is an aspect of intrinsic motivation, or the desire to perform an activity without the promise or prospect of reward. This is not to say that animals are not also motivated by extrinsic considerations such as hunger and thirst, only that these are not the only motives. The point is that Harlow's monkeys learned whether or not they received an extrinsic reward such as food.

Statistical Learning

The point of all of this is that organisms are built to learn from experience, and they do this naturally, in the ordinary course of everyday living, without requiring reinforcement, by computing the contingent probabilities among the objects and events that they observe in their environments. This learning mechanism is sometimes known as statistical learning, because the organism samples the environment and then makes probabilistic inferences about what is going on in it -- what are technically known as the transitional probabilities from one thing to another (Aslin & Newport, 2012).

Here's an example of statistical learning in the domain of language. As we'll see later in the lectures on Language and Communication, an early phase in language learning occurs when an infant learns to recognize the particular phonemes -- basic sound units -- and combinations of phonemes that occur in his or her native language. Saffran, Aslin, and Newport (1996) presented eight-month-old human infants with a steady stream of speech-like sounds consisting of four randomly ordered three-syllable nonsense words, such as:

pa bi ku go la tu da ro pi ti bu do go la tu ti bu do da ro pi pa bi ku pa bi ku da ro pi go la tu ti bu do.

Note that, in such a string, the transitional probability of syllables within words (e.g., pabi within the word pabiku) is a perfect 1.0, while the transitional probability of syllables across words (e.g., tuda between the words golatu and daropi) is only 0.33. They then tested the infants' recognition of individual words by presenting them with "legal" real words, like pa bi ku, and non-legal "part-words" like tu da ro.
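
To see where such numbers come from, here is a minimal sketch that computes transitional probabilities directly from the sample syllable stream printed above. It illustrates the statistic only, not the procedure of Saffran et al.; the function name `transitional_probability` is an assumption introduced for the example, and the values hold for this short sample string only.

```python
from collections import Counter

# The sample stream from the text, split into syllables.
stream = ("pa bi ku go la tu da ro pi ti bu do go la tu ti bu do "
          "da ro pi pa bi ku pa bi ku da ro pi go la tu ti bu do").split()

# Count each adjacent pair of syllables, and how often each syllable
# appears as the first member of a pair.
pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def transitional_probability(first, second):
    """p(second | first): how often `first` is immediately followed by `second`."""
    return pair_counts[(first, second)] / first_counts[first]

print(round(transitional_probability("pa", "bi"), 2))  # within a word: 1.0
print(round(transitional_probability("tu", "da"), 2))  # across words: 0.33
```

In this sample, pa is followed by bi every time it occurs, whereas tu is followed by da on only one of its three occurrences. A learner tracking these statistics has a cue to where the word boundaries fall.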

How do you test word-recognition in infants? One way is to give them an artificial nipple to suck on, and measure the rate at which they do so: when they're surprised, they stop sucking for a moment. In this experiment, the infants were placed in front of a blinking light, and changes in their looking behavior were used as an index of surprise.

Anyway, the upshot of the experiment was that, after only two minutes of exposure, the infants were able to discriminate between legal and non-legal words. Learning occurred, and very sophisticated learning at that, just by listening to the audio stream, without any reinforcement at all.

Other experiments have shown similar learning effects with sequences of musical tones as well as syllables; and in the visual domain, as infants learned the spatial arrangements of shapes in scenes.

And it's been shown that statistical learning extends to neonates as well as to infants.

Moreover, infants can generalize from the stimulus materials to which they've been exposed to novel stimulus materials. For example, infants who have been exposed to one set of pseudowords in a pattern such as dadapi or pabibi also recognized novel pseudowords arranged in the same AAB or ABB pattern, such as kikino or golala. In other words, they acquired something like a concept or a rule that went beyond the specific instances to which they had been exposed to cover novel elements or combinations of elements.

In statistical learning, infants are doing exactly what Pavlov's dogs and Thorndike's cats and Rescorla and Kamin's rats were doing: learning the structure of the world, acquiring expectations about what goes with what and what is going to happen next, simply through observation.

The Bottom Line on Reinforcement

Learning occurs naturally in most behaving organisms. Some species are so well adapted to their environmental niches, and their environmental niches are so stable, that they have little need (or opportunity) to learn much more than where they are likely to find food. For other species, a capacity for altering behavior through learning is itself an important adaptation. Through the experience of various contingencies, organisms acquire information about events in their environment, and about the outcomes of behavior. Reinforcement merely motivates the organism to act on what it learns, in order to achieve certain outcomes, and avoid others.

Reinforcement plays a particularly limited role in language learning. Babies do not learn their native language through trial and error, mediated by reinforcement. Rather, they simply pick up language by being exposed to it. Human babies seem to be innately programmed to learn natural language, merely through exposure to a linguistic community.

Expertise

Reinforcement may not be necessary for learning, but practice is. Hardly anything is learned in a single trial, and that is especially true for complex motor and cognitive skills like learning to play a musical instrument or reading music. In a famous paper, Anders Ericsson and his colleagues (1993) interviewed musicians and determined that, by age 20, the best violinists had engaged in deliberate practice for a cumulative amount of more than 10,000 hours, compared to 7,800 hours for merely "good" violinists, and 4,600 hours for the least-accomplished group. Assuming that they began playing the violin at 5 years of age, that comes to more than 666 hours per year, or nearly two hours per day, every day, week in and week out. Findings such as these led Ericsson (2007) to conclude that "extended and intense practice" was the feature that most distinguished elite performers from "normal adults". Ericsson's research, in turn, formed the basis of the "10,000 Hour Rule" popularized by Malcolm Gladwell in his book, Outliers (2008). That is, it appears to take about 10,000 hours to become an expert at something. And indeed, when you examine the histories of elite performers, 10,000 hours seems about right -- the equivalent of about 250 40-hour workweeks.
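
For readers who want to check the arithmetic, here is a short sketch. It uses only the figures given in the text (10,000 hours, a starting age of 5, a cutoff at age 20, and 40-hour workweeks); the variable names are illustrative assumptions.

```python
TOTAL_HOURS = 10_000   # cumulative deliberate practice by age 20 (from the text)
START_AGE = 5          # assumed starting age (from the text)
END_AGE = 20

years = END_AGE - START_AGE           # 15 years of practice
hours_per_year = TOTAL_HOURS / years
hours_per_day = hours_per_year / 365  # practicing every day of the year
workweeks = TOTAL_HOURS / 40          # expressed as 40-hour workweeks

print(f"{hours_per_year:.0f} hours per year")
print(f"{hours_per_day:.1f} hours per day")
print(f"{workweeks:.0f} workweeks")
```

Under these assumptions the figures come to about 667 hours per year, roughly 1.8 hours per day, and 250 workweeks.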

Of course, talent matters, too. A twin study by Mosing et al. found that individual differences in musical ability -- defined as the ability to make subtle discriminations of pitch, rhythm, and melody -- had a substantial genetic component, accounting for about 50% of population variance (for more on how such calculations are made, see the lectures on Psychological Development). Most of the remaining variance was accounted for by the nonshared environment. Somewhat surprisingly, Mosing et al. reported that music practice had no effect on musical ability. That is to say, there was no difference in test performance between monozygotic twins who differed in the amount of musical practice (e.g., between two twins, one of whom became an orchestra musician, and the other of whom became a brain surgeon). Interestingly, Mosing et al. also found a substantial genetic contribution to the amount of practice that their subjects engaged in, explaining about 69% of population variance.

Still further doubt on the 10,000 Hour Rule was cast by a meta-analysis of expertise studies by Macnamara et al. (2014). These investigators surveyed a large number of studies of the effects of practice on skilled performance, covering games, music, sports, education, and professional activities. Across 88 studies involving more than 11,000 subjects, they found that the average correlation between deliberate practice and performance was .35, explaining about 12% of total variance. This outcome, they argue, is inconsistent with Ericsson's claim that individual differences in performance are mostly explained by individual differences in practice.
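
The step from a correlation of .35 to "about 12% of total variance" is the standard one of squaring the correlation coefficient. A minimal sketch, using only the value reported in the text (the variable names are illustrative):

```python
r = 0.35                     # average practice-performance correlation (from the text)
variance_explained = r ** 2  # proportion of variance the two measures share

print(f"{variance_explained:.1%}")
```

This comes to a little over 12%, which leaves roughly 88% of the variance in performance to be explained by factors other than deliberate practice.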

It has to be said that the claim that practice has no effect on expertise, and that all the action is in the genes -- which is what Mosing et al. expressly state in the title of their paper -- is implausible on the face of it.

In the first place, Mosing et al. assessed expertise in musical perception, not in their subjects' ability to sing or play an instrument. It is one thing to have an innate "ear" for pitch, melody, and rhythm. It is another thing entirely to have innate "fingers" for the violin or clarinet.
And, for that matter, Macnamara et al. didn't take account of either talent or expertise. Nobody ever said that, just by virtue of practice alone, one could be concertmaster of the San Francisco Symphony. And, for that matter, one can still be a pretty good violinist, but still not perform at that level.
One can say, without fear of contradiction, that the ability to play the violin or the clarinet is not innate. The oldest known violin dates to 1555, and even if you take into account its ancestors, such as the Middle Eastern rebec or even the ancient Greek lyre, that's simply not enough time for a "violin" gene to have evolved. The clarinet is of even more recent vintage, even if you count its heritage in the recorder or the ancient flute.
Anyone who's ever played a musical instrument, or sung seriously (i.e., outside the shower, church, or karaoke bar) knows that it takes practice. Maybe that practice builds on some innate digital agility, or something like that. But at the very least, you've got to learn the fingerings, and the fine points of technique. And that takes practice -- probably about 10,000 hours' worth, if you want to become good.
None of this means that practice is all there is, and that anyone can become an expert at anything, so long as he's willing to put in the time. But again, if you're willing to put in the time, there's no reason why you couldn't become pretty damn good.

Observational Learning

Usually, we think of learning as entailing the direct experience of environmental events, organismal responses, and their outcomes. In classical conditioning, Pavlov's dog gets the food after hearing the bell. In instrumental conditioning, Thorndike's cats get freedom after pressing the latch. But can animals learn from the experience of other animals? This is the question of vicarious or observational learning.

Observational Fear Conditioning

The phenomenon of observational learning was first demonstrated convincingly in the laboratory by Susan Mineka, who was then at the University of Wisconsin (she is now at Northwestern University), in a study of snake fear in rhesus monkeys. Rhesus monkeys born and raised in the wild are universally afraid of snakes. This is quite adaptive: after all, the monkeys live in an environment where there are lots of deadly snakes, vipers as well as constrictors. Therefore, traditional theory has held that the fear of snakes in rhesus monkeys is innate, programmed by evolution in much the same way that instincts are. The only problem with the theory is that monkeys who are born and raised in laboratory conditions do not fear snakes. When exposed to a snake, they show no signs of fear. Therefore, it seems that snake-fear must be acquired through experience. But, if you think about it, it's not entirely clear how you learn from experience to fear a deadly snake. Because after the first encounter, you're dead (snakes are like that). Therefore, Mineka proposed that monkeys acquire their fear of snakes vicariously, from observing the reactions of other monkeys when they encounter snakes. Thus, snake fear is not innate, but a learned part of what might be thought of as "monkey-culture".

Mineka conducted an ingenious series of experiments to investigate the social learning of snake fear in rhesus monkeys. For her test of fear, she employed a piece of equipment known as the Wisconsin General Test Apparatus (WGTA), in which the monkey is seated in a restraining chair, something like a baby's high chair, while being presented with various stimuli and making responses. Mineka offered the monkey a highly desirable food treat (Froot Loops are dandy for this purpose), but in order to obtain the treat it had to reach past a snake or some other object. Response latency, or the time it took the animal to reach past the object, was the measure of fear: the longer the latency, the more fear.

Mineka's initial study compared monkeys reared in the wild and in the lab in their response to various test stimuli such as real, toy, and model snakes (the real snake was a small boa constrictor), black and yellow cords, and a painted wood block. As expected, the wild-reared monkeys were more afraid of the snakes than were the lab-reared monkeys.

For her first vicarious conditioning study, Mineka paired a (snake-phobic) wild-reared adult with a (non-snake-phobic) lab-reared adolescent (in her first study, the adult was actually the parent of the adolescent).

She pretested the adolescent in another apparatus, known as the Sackett Circus (after Gene Sackett, the researcher who invented it), which is a chamber with four compartments. Three of these compartments contained a real, toy, or model snake. The fourth compartment contained a wood block. The wild-reared adults avoided the compartments with snakes, but the adolescents were indifferent to them.

Then the adolescent was allowed to observe, for the first time, the reaction of the adult to a snake presented in the WGTA. After exposure to the fearful adult, the adolescents now behaved very differently. Now they strongly avoided the snake compartments.

In other words, the adolescents learned to fear snakes -- not from having unpleasant experiences with snakes themselves, but merely from watching an adult react negatively to them. They learn, from observing other monkeys behave fearfully, that snakes are things to be feared. One is reminded of "You've Got to be Carefully Taught", from the Rodgers and Hammerstein musical South Pacific (1949). During World War II, Lt. Joe Cable has come to the base to conduct an espionage mission against the Japanese forces on a neighboring island. He falls in love with Liat, the daughter of Bloody Mary, but despairs of gaining acceptance for their biracial love back in the United States:

You've got to be taught to hate and fear
You've got to be taught from year to year
It's got to be drummed in your dear little ear
You've got to be carefully taught

You've got to be taught to be afraid
Of people whose eyes are oddly made
And people whose skin is a different shade
You've got to be carefully taught

You've got to be taught before it's too late
Before you are six or seven or eight
To hate all the people your relatives hate
You've got to be carefully taught
You've got to be carefully taught

Link to a recording of William Tabbert singing this song, from the original Broadway cast.

Mineka performed a number of variants on this basic experiment, with increasingly sophisticated methods, to explore the parameters of observational conditioning.

She discovered that she could also obtain vicarious learning when an unrelated adult served as the model for the adolescent.
And she discovered that prior benign experience with snakes could immunize adolescents against the effects of later vicarious exposure.

In her most fascinating experiment, Mineka discovered that, despite the central role of vicarious experience, observational learning was also constrained by preparedness. In this study, she modified her apparatus, employing mirrors and video so that she could independently vary what the model and the target see. For example, an adult model might see a snake, and react fearfully, while the adolescent sees a flower rather than the snake. From the adolescent's point of view, then, the adult is reacting fearfully to the flower, not the snake. Will an adolescent who sees such a thing subsequently show fear of flowers?

The answer is no: Vicarious fear conditioning occurs only to snakes and snakelike objects. It does not occur to the flower.

Snake fear in rhesus monkeys is not innate, but it does appear to be highly prepared, so that it can be acquired with little vicarious experience.
Flower fear in rhesus monkeys, if indeed there is any such thing, is unprepared, or perhaps even contraprepared -- acquired with difficulty, if at all.

Vicarious or observational learning is fascinating, but it is also theoretically important, because it is another instance of learning in the absence of reinforcement. That is, the animal learns, even though reinforcement is provided to the other animal.

Language Acquisition

In humans, perhaps the most powerful and dramatic example of observational learning occurs in the domain of language. By the time they are 4 or 5 years of age, every normal human child has become a fluent speaker of his or her native language -- that is, whatever language the child's parents and others speak in his or her presence.

Beginning at birth, and perhaps even in the womb, the infant learns to detect the particular sounds of his or her native language, and how they are combined to form words.
Before an infant can walk, he or she will be able to recognize many words. Toddlers, aged 12-24 months, learn about three new words per day; preschoolers about 5-8 new words a day; older children and adolescents about 10-15 words a day. By the time children are 5 years old they will have a vocabulary of about 10,000 words; this grows to about 70,000 words by adulthood.
In late toddlerhood, children begin to string words together to form sentences. And the sentences get longer and more complex. Again, by the time children are 4 or 5 years old, they are virtually complete masters of the grammar of their native language. But even before they can speak long, complex sentences, they can understand them when they are spoken by others.
The learning of words and their meanings (which we call the semantics of language), and of the grammatical rules that string words together (which we call the syntax of language) actually feed off each other, so that children use their knowledge of words to infer grammatical rules, and use their knowledge of syntax to learn what new words mean.
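The word-learning rates quoted above can be checked against the vocabulary totals with some simple arithmetic. The sketch below is only a rough consistency check: the age brackets assigned to "preschoolers" (ages 2-5) and "older children and adolescents" (ages 5-18) are assumptions made for illustration, as is the use of a 365-day year with learning on every day.

```python
DAYS_PER_YEAR = 365

# (stage, years in stage, low words/day, high words/day)
# The rates come from the text; the age brackets are assumed.
STAGES = [
    ("toddler, 12-24 months", 1, 3, 3),
    ("preschool, ages 2-5 (assumed)", 3, 5, 8),
    ("school years, ages 5-18 (assumed)", 13, 10, 15),
]


def cumulative_vocabulary(stages):
    """Return a running (stage, low total, high total) for each stage."""
    low = high = 0
    totals = []
    for label, years, low_rate, high_rate in stages:
        low += years * DAYS_PER_YEAR * low_rate
        high += years * DAYS_PER_YEAR * high_rate
        totals.append((label, low, high))
    return totals


for label, low, high in cumulative_vocabulary(STAGES):
    print(f"end of {label}: {low:,} to {high:,} words")
```

Under these assumptions the running total at age 5 falls between roughly 6,600 and 9,900 words, and the total at age 18 between roughly 54,000 and 81,000, which brackets the figures of about 10,000 and about 70,000 given in the text.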

The acquisition of language occurs effortlessly, and it occurs without reinforcement, before children are ever formally taught the rules of grammar in elementary school (which used to be called "grammar school", after all), and get graded for learning them. It all happens by the child hearing spoken language, and connecting what is said to what is going on in the world around them. In this sense, language acquisition is a lot like Tolman's latent learning.

By contrast, even our closest primate relatives, chimpanzees, have no ability to learn language. They may learn some "words" in the form of symbols, spoken or visual, that represent things like bananas. But even after years of effortful training, they have essentially no ability to use syntactical rules to form and understand meaningful sentences. When it comes to language, the "smartest" chimpanzee can't hold a candle to the dullest human 5-year-old.

In fact, human language learning is so effortless and automatic that many linguists speculate that there is an innate capacity for language -- a "language acquisition device" that is a product of evolution, and which is a unique feature of human nature. Knowledge of English or Chinese or Swahili or Farsi isn't innate, but the mechanism that allows children to learn these languages does appear to be.

Put another way, language acquisition is highly prepared in humans. Just as rhesus monkeys are highly prepared to learn to fear snakes, so human beings are prepared to learn language. In chimpanzees, the best we can say is that language learning is unprepared, and it may even be contraprepared -- which is why chimpanzees can't learn syntax no matter how much training they receive.

Social interaction is critical to language acquisition: without models, it does not occur. Not only does the child require exposure to spoken language (and thus to the people who speak it), but the child needs to be exposed to what others are doing, and looking at, when they speak. You can't just play a CD of spoken English under the child's crib and expect it to learn semantics and syntax (though it will learn the basic sound patterns). The child has to interact with other people. And these people don't even have to speak. Deaf children whose parents and teachers use sign language will effortlessly pick up the semantics and syntax of sign language, just as hearing children pick up whatever language their parents speak.

And this interaction has to occur within a particular interval of time -- roughly, before the onset of puberty. "Wild" children, who are raised in isolation from others until they reach adolescence, never really "get" language. Within the more normal range of human experience, children who are raised in a bilingual environment -- say, with parents who speak both English and Spanish -- will effortlessly learn both languages, and speak both without an accent. But if the learning of one language is delayed -- say, until high school or college -- it is very hard to gain facility in the second language, and the person is likely to speak it with a decided accent. So, as with imprinting, there appears to be a critical period in language learning.

The capacity to learn language appears to be innate, a gift of human evolution. And there is a critical period in language learning. But despite this innate component, language acquisition requires exposure to a linguistic environment. In this sense, it fits the true definition of learning as a change in knowledge that occurs as a result of experience. And instead of being taught deliberately, through the direct experience of rewards and punishments, it occurs vicariously -- just by virtue of observation, without any particular reinforcement.

Social Learning Theory

As language acquisition illustrates, observational learning is particularly important in humans. If you think about it, we do not learn all that much through the direct experience of trial and error, reward and punishment. Rather, most of our learning comes through interactions with others. To take a somewhat extreme example, physicians don't learn how to perform surgery by trial and error. Rather, they learn surgery by watching experienced surgeons perform, and by being taught by them. When a surgeon takes a scalpel to his or her first patient, he or she already knows what to do and how to do it.

Albert Bandura, of Stanford University, argues that human social learning takes two forms:

Learning by Example, in which we model our behavior on that of other people -- much like Mineka's rhesus monkeys.
Learning by Precept, in which we are deliberately taught by other people -- the kind of learning that goes on in school and college.

Language plays a particularly important role in learning by precept, as it provides a very flexible, efficient way of communicating our thought and knowledge to others. Humans have a far greater capacity for language than any other species, and so it is not surprising that so much of our social learning is accomplished through language.

Consciousness also plays an important role in learning by precept. To deliberately teach someone something presupposes that you are aware of it yourself. Without conscious awareness, there could be no conscious intent, and so no sponsored teaching of the sort that is critical to learning by precept.

Social Learning and Imitation

Although most studies of learning performed before 1950 employed lower animals such as rats, dogs, and pigeons for subjects, the ultimate object of inquiry was humans. The major theories of learning assumed, explicitly or implicitly, that the same principles of learning adduced to explain simple behavior in these species would also be found relevant to complex human behavior. This program of application to the human case was pursued most prodigiously by B.F. Skinner, in his analyses of personality and social behavior (1953) and language (1957). According to Skinner, human behavior is performed under the conditions of stimulus control. Rather than focusing on internal dispositions such as traits and motives, or cognitive constructs such as expectation, a proper analysis of personality will focus on the individual's reinforcement history, as well as on discriminative stimuli and reinforcement contingencies present in the current environment. Human behavior is complex only insofar as the stimulus conditions in which it occurs are complex.

Other investigators also took up the Skinnerian program. For example, Staats and Staats (1963) attempted to apply the principles of learning to problems in personality, motivation, and social interaction, among other topics. Their work is not exactly Skinnerian in nature, because it attempts to come to grips with certain aspects of language that are outside the scope of Skinner's analysis. Nevertheless, the list of psychologists whom they cite as the inspiration for their efforts begins with Skinner, and includes most of the major figures identified with the behaviorist analysis of learning. Staats' most recent statement of his theory, in fact, is entitled Social Behaviorism (1975).

At the same time, it became clear that certain aspects of complex human behavior resisted conventional behavioral analysis. As one example, already discussed, language does not seem to be acquired through the principles of conditioning and reinforcement that are central to behaviorist analyses. The same is true of many human social behaviors. The problem of accounting for learning without direct experience of reinforcement ultimately led to the development of a different cognitive theory of personality: cognitive social learning theory.

A step in this new direction was taken with the social learning theory of Miller and Dollard (1941). According to Miller and Dollard, personality consists of habits formed through learning. The learning process, in turn, is described in terms of a version of S-R learning theory proposed by Clark L. Hull. According to Hull, a habit represents a strong connection between some stimulus and some response. This association is acquired by virtue of drive-reduction: in the presence of the stimulus, the behavior has led to the satisfaction of some drive (you can see the connection to Thorndike's Law of Effect).

Although Hull conceived of these drives as biological in nature, Miller (1951) later added the concept of acquired (or secondary) drive. That is, through conditioning some external stimuli come to possess some of the properties of an internal drive state. For example, while fear is an innate drive, elicited by noxious stimulation, it can also be conditioned to previously neutral stimuli. Habits can be learned because they lead to fear reduction (a primary drive), and also because they eliminate fear stimuli (secondary drives). Drive-reduction theory thus provides the basic elements of personality viewed as a system of habits, in the form of principles of learning. A drive is any need which activates behavior. It can be innate, or it can be acquired through experience. However, drive itself does not give any particular direction to behavior. This directionality is given by the operation of other principles. Hull's theory, like Freud's, assumes that people are motivated to maintain homeostasis, eliminating states of tension. Drive-reduction serves to reward behavior. Responses are behaviors that lead to rewards. Finally, cues are stimuli that determine the selection of responses. Thus, personality can be viewed as a system of habits acquired and maintained through drive-reduction. Individual differences in habitual responses to environmental stimulation comprise the whole of personality.

Miller and Dollard argued that in order to understand human personality, it was necessary to understand the principles of learning. However, because the habits that comprise personality are social behaviors, it is also important to understand the social circumstances in which that learning takes place. Thus, Miller and Dollard called their approach social learning theory. In this regard, it is interesting to note that the theory represents the collaboration between Miller, a psychologist, and Dollard, a sociologist. Thus, personality becomes an interstitial field, combining different levels of analysis.

Like Skinner's stricter behavioral approach, social learning theory as stated would seem to imply that the person must have direct experience with reinforcement in order to establish habits. As noted, this is unlikely to be the case. In order to cope with this problem, Miller and Dollard postulated a drive of imitation. Imitation is a process by which similar actions are performed by two individuals in response to appropriate cues. At the start, imitation is a behavior which can be reinforced by the environment, just as other behaviors are. When rewarded regularly, however, it takes on the properties of an acquired drive. Thereafter, the individual is motivated to imitate the behavior of others -- to copy their behavior in order to obtain the same rewards that they receive from their actions. Imitation is widespread because the culture reinforces it strongly, as a means of maintaining social conformity and discipline. For this reason, although imitation is an acquired drive (and therefore optional in principle), it is almost a necessary consequence of socialization.

Miller and Dollard discussed two principal forms of imitation. In both forms, one person matches another's behavior.

In matched-dependent behavior, however, only the model recognizes the cues that elicit the behavior. A good example is crowd behavior, where people engage in certain actions (like applause or yelling) simply because other people are doing so, without knowing why.
Copying is a much more deliberate act, in which one person consciously conforms his or her behavior to that of another person. This entails awareness of the cues that elicit the behavior of the model.

Imitative behavior is central to social learning, and thus to personality. It is readily observed in even the youngest children, and indeed whenever one person possesses more authority or knowledge than another. Imitation, especially matched-dependent behavior, is the chief means by which patterns of behavior are passed from one individual to another.

Social Learning and Expectations

Although some social-learning theorists continued to embrace the tradition of functional behaviorism into the 1960s and 1970s, the break from the behaviorist view of social learning was apparent in Rotter's Social Learning and Clinical Psychology, which appeared in 1954 (see also Rotter, 1955, 1960; Rotter, Chance, & Phares, 1972). Where Staats and Staats (1963), writing almost a decade later, were still acknowledging the primary influence of Skinner and other functional behaviorists, Rotter (1954) acknowledged the influence of no behaviorists at all. Rather, he aligned himself with the dynamic psychologist Adler and the gestalt psychologists Kantor and Lewin (see also Rotter, Chance, & Phares, 1972, p. 1). From the beginning, Rotter intended his theory as a fusion of the drive-reduction, reinforcement learning theories of Thorndike and Hull with the cognitive learning theories of Tolman and Lewin. Although Rotter's version of social learning theory often uses behaviorist vocabulary, it is with a clear cognitive twist.

In the first place, Rotter is less interested in behavior than in choice, an internal mental state which obviously manifests itself in behavior. Rotter's cognitive-social learning theory employs three basic concepts:

Behavior potential is the probability of a particular behavior occurring in some situation, given the available reinforcement contingencies.
Expectancy is the person's subjective probability that a particular reinforcement would occur as a function of his or her engaging in some specific behavior in some specific situation.
Reinforcement value refers to the degree to which the individual would prefer some outcome above all others, provided that the probabilities of the outcomes were equivalent.

These three terms are combined to yield the basic predictive formula (1954, p. 108):
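
The formula itself is not reproduced in this text. As it is usually given in secondary summaries of Rotter (1954), it takes roughly the following form; the subscript notation here follows those summaries and should be checked against the original page.

```latex
BP_{x,\,s_1,\,R_a} = f\left( E_{x,\,R_a,\,s_1} \;\&\; RV_{a,\,s_1} \right)
```

Read in words: the potential for behavior x to occur in situation 1 with respect to reinforcement a is some function of the expectancy that reinforcement a will follow behavior x in situation 1, together with the value of reinforcement a in that situation. The formula leaves the form of the function f unspecified.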

Rotter's intellectual debt to the behaviorists is clear. Instead of predicting behavior in general, behavior is predicted only under certain conditions. When these conditions change, the behavior may likely change as well. Moreover, the behaviorist construct of reinforcement is central to his theory. However, Rotter's departure from the behaviorists is equally clear: whereas behaviorists such as Skinner hoped to dispense with mental constructs entirely, Rotter places them at the center of his theory. Although the behaviorists defined reinforcements objectively in terms of their effects on behavior (Thorndike's empirical law of effect), Rotter defines them subjectively: the value attached to any potentially reinforcing event is subjective, and one person's meat can be another person's poison. Moreover, whereas behaviorists defined reinforcement contingencies objectively, in terms of the contingent probability of the event given a particular response, Rotter clearly defines them subjectively, in terms of the individual's cognitive expectations. Finally, Rotter defined the situation in psychological terms, as it is experienced by the individual, and as the individual ascribes meaning to it.
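
To make the role of subjective expectancies and values concrete, here is a minimal sketch of a choice between two behaviors. It is an illustration only, not Rotter's own procedure: the multiplicative rule, the two behaviors, and every number below are assumptions, since the theory says only that behavior potential is some function of expectancy and reinforcement value.

```python
def behavior_potential(expectancy: float, reinforcement_value: float) -> float:
    """Combine expectancy and value into a behavior potential.

    Assumption: a simple product. Rotter's formula does not commit
    to any particular way of combining the two terms.
    """
    return expectancy * reinforcement_value


def choose(options: dict) -> str:
    """Pick the behavior with the highest behavior potential.

    `options` maps a behavior name to a pair of subjective estimates:
    (expectancy, reinforcement value).
    """
    return max(options, key=lambda name: behavior_potential(*options[name]))


# Two hypothetical people in the same objective situation. They value the
# outcomes identically but hold different expectancies about speaking up.
person_a = {"ask for a raise": (0.7, 0.9), "stay quiet": (0.9, 0.3)}
person_b = {"ask for a raise": (0.2, 0.9), "stay quiet": (0.9, 0.3)}

print(choose(person_a))
print(choose(person_b))
```

The two people face identical reinforcement contingencies in the objective sense, yet the sketch predicts different choices, because only their subjective expectancies differ.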

Cognitive Social Learning Theory

Rotter labeled his approach a social learning theory, and employed some of the concepts and principles of reinforcement theory in it. Nevertheless, his approach is less a theory of learning than it is a theory of choice. That is to say, Rotter is primarily concerned with how expectancies and values govern the choices we make among available behaviors. However, the theory has relatively little to say about how those expectancies, values, and behavioral options are acquired -- except to say that they are acquired through learning. It remained for another social learning theorist, Albert Bandura (Bandura, 1971, 1977, 1985; Bandura & Walters, 1963) to add to the concept of expectancies an explicit theory of the social learning process. Like Miller and Dollard, Bandura stressed the role of imitation in social learning. However, his concept of imitation departs radically from theirs in that it no longer functions as a secondary drive. By emphasizing cognitive processes over reinforcement, observation over direct experience, and self-regulation over environmental control, Bandura took a giant step away from the behaviorist tradition and offered the first fully cognitive theory of social learning.

Bandura's behaviorist roots are seen most clearly in his earliest statement of social learning theory, Social Learning and Personality Development (Bandura & Walters, 1963). On the surface, this book seems to draw heavily on Skinnerian analyses of instrumental conditioning. For example, there is a great deal of attention paid to the role of reinforcement schedules in the maintenance of behavior. Bandura and Walters argued that most social reinforcements are delivered on an intermittent schedule, and that most social systems operate on some combination of fixed- and variable-interval schedules of reinforcement. Family routines such as dining, parent-child interactions, shopping trips, and the like occur in a relatively unchanging cycle. Insofar as these activities can take on reinforcing properties, then, they are delivered on a fixed-interval schedule: the child cleans his plate at dinnertime during the week, and then gets to sit on his mother's lap during the family television hour on Saturday night. Other social reinforcements, however, seem to be delivered on a variable-interval schedule. When a child seeks her mother's attention, she may get it immediately, or at some time in the future when her mother doesn't have her hands full. Still other situations seem to involve the differential reinforcement of high or low rates of behavior. If a father pays attention to his child only when she kicks and screams, he is virtually guaranteeing that she will misbehave when she wants attention.
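
The difference between the two interval schedules can be sketched in a few lines of code. This is a toy illustration, not a model taken from Bandura and Walters: the once-a-minute responding, the 15-minute interval, and the use of exponentially distributed waits for the variable schedule are all assumptions chosen for simplicity.

```python
import random


def fixed_interval(response_times, interval):
    """Reinforce the first response made once `interval` time units
    have passed since the previous reinforcement."""
    reinforced = []
    last = 0.0
    for t in response_times:
        if t - last >= interval:
            reinforced.append(t)
            last = t
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """Like fixed_interval, but the required wait is redrawn at random
    after every reinforcement, averaging `mean_interval`."""
    reinforced = []
    last = 0.0
    wait = rng.expovariate(1.0 / mean_interval)
    for t in response_times:
        if t - last >= wait:
            reinforced.append(t)
            last = t
            wait = rng.expovariate(1.0 / mean_interval)
    return reinforced


# Hypothetical child who asks for attention once a minute for an hour.
responses = [float(minute) for minute in range(1, 61)]

print(fixed_interval(responses, interval=15))
print(variable_interval(responses, mean_interval=15, rng=random.Random(0)))
```

Under the fixed schedule the payoff arrives on a predictable cycle, like the Saturday-night television hour. Under the variable schedule the same request may pay off almost at once or only after a long wait, like the mother whose hands are sometimes full.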

For a number of reasons, Bandura and Walters argued, most social reinforcements are dispensed on complex schedules combining variable ratios and variable intervals. In some respects, this complexity reflects the unreliability of social reinforcement. Often, the reinforcing agent is simply not present when the target behaviors occur -- in such a case, reinforcement must be deferred to a later time. And because humans are not automated machines, they will sometimes simply fail to deliver reinforcements that are due. Perhaps more important, the complexity of social reinforcement schedules reflects the complexity of social demands. It is rarely enough simply to perform a certain social behavior: it must be done in a particular way. A child asked to set the dinner table will not be rewarded simply for piling dishes and utensils; the forks have to be on the left side of the plate, and the blade of the knife turned inward. As Bandura and Walters note, effective social learning entails both adequate generalization and fine discriminations.

Social learning is also complex because of the wide variety of factors that affect the effectiveness of social reinforcements. For example, Bandura and Walters noted that children with strong dependency habits (note the phrase) are more susceptible to social reinforcement. Moreover, the prestige of the reinforcing agent is important, as is the match between the person and the agent on such attributes as gender. The person's internal states of deprivation, satiation, and emotional arousal are also important. The point is that social reinforcement is complex but not chaotic or haphazard. Social behavior is maintained by virtue of schedules of reinforcement, even if the precise nature of that schedule is sometimes hard to discern.

Although Miller's theory gained impressive support from analyses of animal behavior, Bandura and Walters were critical of its application to the case of human social behavior. For example, they argued that deliberate social learning also played a role in displacement. Thus, parents often direct their children's aggressive behaviors towards some targets rather than others, and displacement itself is maintained by contingencies of reinforcement. Clear examples of this may be found in scapegoating and other examples of prejudice towards minorities and other outgroups. By and large, these sorts of aggressive behaviors are not simply selected by the vicissitudes of the generalization gradient. Rather, children get their prejudices from their parents: as Rodgers and Hammerstein wrote in South Pacific, "You've got to be carefully taught" whom to hate and fear.

While agreeing on the importance of reinforcement in the control of behavior, Bandura and Walters differed most from their behaviorist predecessors over the manner in which behavior was acquired in the first place. Taken at their word, Skinner and other functional behaviorists actually appear to deny that new behaviors are learned at all. Rather, responses already in the organism's repertoire come to be elicited by certain environmental cues by virtue of the law of effect. What are acquired are new patterns of behavior, by virtue of shaping and successive approximations. That is, a piece of behavior is synthesized from more elementary behaviors already in the organism's repertoire. Bandura and Walters, while agreeing that shaping procedures can be effective, doubted that they were responsible for the acquisition of most complex human social behaviors. Like Miller and Dollard, Bandura argues that social learning is largely mediated by imitation.

On the basis of anthropological studies as well as informal observation, Bandura and Walters argued that socialization -- the acquisition of socially sanctioned beliefs, values, and patterns of behavior -- was largely mediated by imitative learning. In some cultures, for example, young boys and girls are provided with miniature replicas of the tools used by their parents, and they spend a great deal of time tagging along with their parents practicing their use -- thus preparing for their adult roles. Similarly, children in the United States (and other developed societies) are given toys that they can use to imitate adult behavior. In this way, for example, children in all cultures acquire behaviors consistent with the occupational roles deemed appropriate by their culture for persons of their gender.

Gender-role socialization is far from the only example of learning by imitation. In some tribal cultures, children even obtain their sex education by watching adults engage in various aspects of mating behavior. Certain aspects of language acquisition, such as the meanings and pronunciation of words, are learned largely through observation and imitation of other people. In addition, certain complex motor and cognitive skills appear to be acquired in this manner. Medical residents do not learn to perform surgery through a trial-and-error process. Rather, they learn by watching skilled practitioners operate, and by reading about the procedures in textbooks. In a very real sense, a surgeon knows how to do surgery before he or she ever puts a scalpel to a patient -- that is, before there can be any direct experience of trial and error. On a more mundane level, driver education courses in high schools make sure that students have acquired basic skills in handling an automobile before they ever take to the road.

In tribal cultures, parents and older siblings are probably the models for most imitation. They are, after all, the primary agents of socialization. However, this purpose may also be served by exemplary models sanctioned by the parents: children are constantly being encouraged to emulate various national heroes and mythological figures, as well as the children next door. In technologically advanced societies, models for imitation are provided by books, television, movies, and other media as well as by real life. One of the sources of the constant controversy over children's television viewing concerns the kinds of models presented to children in cartoons and action series. A major function of written and oral language is this kind of cultural transmission. By virtue of linguistic communication, we can tell someone what to do in a particular situation -- describe the behavior, and indicate when it should be performed -- instead of letting the person discover the relations between cues, acts, and outcomes for him- or herself. For this reason, social learning by imitation is highly efficient. In a complex, highly developed society, it also seems necessary.

While agreeing with Miller and Dollard that imitation is an important source of social learning, Bandura and Walters took issue with the theory that imitation -- either as a general tendency or of a specific act -- is acquired through reinforcement. For example, developmental studies show that children imitate others before they ever are reinforced for doing so. Very young infants, up to about four months of age, engage in pseudoimitation, in which they repeat some simple act (like babbling) displayed by their caretaker. However, this imitation will not occur unless the infant him- or herself had just recently performed the same act. Somewhat older infants will engage in genuine imitation of others, in circumstances where they have not just performed the same act themselves. The extent to which behavior will occur will depend on the degree to which the child's sensorimotor operations have developed. For example, children cannot reliably stick out their tongues in imitation of adults, until they have acquired some mental representation of their facial anatomy (Piaget, 1951; but see Meltzoff & Moore, 1977). Children are not reinforced for this: it simply happens, apparently as a reflection of an innate tendency to do so.

Even imitation of specific behaviors is not learned by virtue of reinforcement. The behaviorist model of imitation involves three elements: a discriminative stimulus (S^d) that serves as a cue, the response of imitating the model (R), and the reinforcing stimulus (S^r). By virtue of the law of effect, repeated reinforcement of the imitative behavior will make that behavior more likely to occur. However, a classic experiment on aggression by Bandura (1962) shows that this is not the case. Children watched a film in which a model displayed novel aggressive behaviors (that is, behaviors not previously in the children's repertoires) towards a "Bobo the Clown" doll. In one condition, the model was punished for this behavior; in another, he or she was rewarded; in a third condition, there were no consequences to the behavior of any sort. In a later test, children who viewed the punished model showed less imitative aggression than those who viewed the rewarded model; interestingly, those who viewed the unreinforced model displayed the same amount of aggression as those who saw the model rewarded. This first test was performed under conditions of no incentive. In a second test, the children were promised a reward for imitating the model: under these circumstances, the group differences disappeared. Thus, novel aggressive behaviors were acquired by the children even though they were not reinforced for imitating the behavior. However, the performance of these behaviors was under reinforcement control: those who saw the model punished were less likely to engage in the behaviors themselves, until instructed that the reinforcement contingencies had been changed.

In a later statement, Bandura (1977) argued that there are two forms of learning. Learning by response consequences is the kind of trial-and-error acquisition of knowledge familiar from the operant behaviorism of Skinner. However, this learning is given a cognitive emphasis. Direct experience provides information concerning environmental outcomes and what must be done to gain or avoid them. As a result, the person forms mental representations of experience that permit anticipatory motivation and behavioral self-control. Modeling involves learning through vicarious experience -- by observing the effects of others' actions. While a term such as "modeling" encompasses learning through example, Bandura also uses it to cover learning through precept -- deliberate teaching and learning, often mediated by linguistic communication.

Although Bandura goes beyond Rotter in discussing the process of social learning, his analysis of performance is similar to Rotter's in many respects. That is, Bandura agrees that the person's behavior is governed primarily by his or her expectancies concerning the future. Our responses to various situations are governed by information we possess concerning forthcoming events, and the outcomes of our actions. These expectancies are formed, respectively, through processes resembling classical and instrumental conditioning -- except that conditioning is given an active, cognitive interpretation as opposed to the conventional passive interpretation in terms of the laws of practice and effect. Moreover, conditioning is not the only -- or even the most important -- way that these expectancies can develop. Rather, they can be acquired vicariously through precept and example.

Expectations before the fact are, of course, subject to revision by the information gained subsequently. The actual consequences of an environmental event, for example, or of a person's actions, serve to confirm or revise the person's expectations. These consequences can be directly experienced by the person in question, or they may be experienced vicariously through observation or symbolic mediation. Moreover, in discussing the consequent determinants of behavior, Bandura stresses the role of aggregate as opposed to momentary outcomes. In his view, people are more influenced by what happens in the long run than by minor setbacks, delays, and irregularities. In large part, this is due to the cognitive capacities of humans, whose powerful memories permit them to transcend even long intervals, and integrate information from different points in time.

A unique feature of Bandura's social-learning theory is the active role played by the self. Behaviorist doctrine, of course, eschewed any reference to the self as an active organizer of experience or agent of action. Such talk was banned as mentalistic and ultimately beyond the pale of science. Insofar as the self was discussed at all, it was as (in Skinner's terms) a system of responses. As a cognitive theorist, however, Bandura (1977) permits the self to take an active, executive role in the regulation of behavior. In this way, the self plays a role as both an antecedent and a consequent determinant of behavior.

In the cognitive view offered by Tolman and by Rotter, outcome expectancies are vitally important determinants of behavior. That is, we tend to engage in behaviors that we expect will lead to outcomes we desire, and prevent outcomes we dislike. Bandura agrees that outcome expectancies are important. However, he has also added a new concept: self-efficacy expectations (Bandura, 1977, 1978). While it is obviously important that the individual expect that a particular behavior will lead to a certain outcome, it is equally important that the person have the expectancy that he or she can reliably produce the behavior in question. Note that the actual state of affairs is irrelevant here. It does not matter whether the person can, in fact, perform some particular action. What matters is whether the person thinks he or she can. Self-efficacy expectations are conceptually similar to the sense of mastery, and have important motivational properties, in that they determine whether the person will even attempt the behavior in question.

An example of self-efficacy can be found in the literature on learned helplessness. As a rule, dogs placed in a shuttlebox will acquire escape and avoidance responses fairly readily, shuttling back and forth in response to stimuli signaling forthcoming shock. However, dogs who have first received classical fear conditioning are retarded in learning escape and avoidance. In some instances, they simply sit and take the shock passively. Learned helplessness can also be produced in humans. For example, subjects who have been exposed to unsolvable anagram problems are retarded in completing subsequent problems that are solvable. Although the learned helplessness effect is quite complex, it appears to involve the subject's belief that he or she cannot master the situation. In fact, that is objectively not the case: the shock in the shuttlebox is avoidable, and the dog has in his repertoire the necessary behavior; the second set of puzzles is soluble, and the student has the intelligence to solve them. Yet, experience has taught the subject to believe otherwise (if we can speak of beliefs in lower animals), and this belief controls behavior.

Self-efficacy can serve as an example of how antecedent expectations develop through social learning. Obviously, one source of self-efficacy is performance accomplishments: the personal experience of success and failure. Repeated failure experiences will lower the person's expectancy that he or she can effectively control outcomes. But the same sorts of expectancies can be generated through vicarious experience. Observing other people's success or failure will lead to appropriate expectations about oneself -- at least to the degree that one perceives oneself to be similar to those other people. But perceived self-efficacy can also be shaped in the absence of any experiential basis whatsoever, merely through verbal persuasion. A person who is repeatedly told that he or she is incapable of accomplishing some goal, especially if that information comes from an authoritative source, may actually come to believe it about him- or herself. Perceived self-efficacy can also change on a moment-to-moment basis, depending on the person's emotional state. Feelings of elation may increase feelings of mastery (sometimes beyond all reason, as in the megalomania of a manic patient), while anxiety or depression may reduce them. Finally, self-efficacy can vary from one situation to another. Even though a person has not encountered a particular problem before, he or she may have a high degree of self-efficacy if it closely resembles some other problem that the person has been able to master in the past.

Another way in which Bandura departs radically from the behaviorist analysis of social learning is by embracing the concept of self-reinforcement. Recall that Skinner objected to self-reinforcement on the ground that it was ineffective as a means of behavioral control. However, Bandura acknowledged that people can effectively regulate their own behavior in the absence of, or in opposition to, schedules of external reinforcement. For example, a run-of-the-mill jogger can reward herself by finishing in the top half of a local road race, even though she will never get a medal for her performance. Alternatively, a college professor may feel remorse about flunking a student, even though he receives praise from his dean for upholding academic standards. It is so common to find writers, painters, and composers pursuing their own vision even though they are denied any professional recognition, that the image of the starving artist has become part of our cultural mythology. By means of goal-setting and self-reinforcement, people can free themselves from environmental control. This independence of the person from environmental control distinguishes Bandura's social learning theory from its behaviorist forebears.

In principle, self-reinforcement frees people from external control. As a practical matter, however, the essential first step in self-regulation, setting the standard, tends to be based on imitation. That is, we set standards for ourselves that are similar to those set for themselves by those we admire. These models may be our parents, teachers, or spiritual leaders. However, models may also come from other sources, such as books, films, and media. One important consequence of literacy, coupled with free access to books and magazines, is that we encounter potential models whose standards may be quite different from those whom we would otherwise meet. Modeling our standards on those individuals is another way in which we free ourselves from the constraints of our local social environment.

In addition to standard-setting, Bandura postulates three other component processes in self-regulation. The person must monitor his or her own performance, and evaluate it according to the standard set for him- or herself. The dimensions on which the performance is evaluated can vary widely, as can the precise standards. Very often, the individual will measure him- or herself against actual or assumed population norms; or, some single individual will serve as the standard of comparison; in other circumstances, the standard will be set by the person's own previous behavior. It is important, of course, not to set standards that cannot be met. Research in a variety of domains, from academic achievement to weight loss, indicates that people should set goals for themselves that are clearly specified, and of only moderate difficulty. Vague or ambiguous goals, of course, are not goals at all. Setting an unattainable goal obviously has motivational drawbacks, while setting a goal that is too easy to accomplish will yield little or no satisfaction in its accomplishment. (It should be noted that the same considerations apply to goals set by others, as when parents enforce standards for their children's behavior.)

Once the evaluation has been made, the person will reinforce his or her performance appropriately. These consequences come in two forms, tangible and symbolic. Depending on how she did on an exam, a student may reward herself with a movie or punish herself by canceling a date; or she may just praise or censure herself. The effectiveness of self-praise or self-reproach, in the absence of tangible consequences, is currently subject to considerable debate. However, research clearly shows that people -- even young children -- who fail to meet their own performance standards will deny themselves reward. Apparently, such internal states as self-esteem and self-efficacy have their own motivating properties. While behavior that is controlled only by external contingencies will be unreliable in the absence of those contingencies, our selves are always with us. Thus, in principle self-reinforcement should lead to more effective behavioral regulation, because it is less subject to situational variation.

Moreover, human intelligence and consciousness permit us to project the consequences of our actions far into the future. Traditional behavioral theories, of course, assert that present behavior is under the control of past events, and that future prospects that have no parallel in the past are very weak determinants of behavior. However, this is clearly not the case. The emergence of political movements supporting environmental protection and nuclear disarmament is a clear example of the control of behavior by the future. We have had no experience of the greenhouse effect or nuclear winter, but the prospects of them in the future have led us to try to protect the ozone layer, and reduce the number of nuclear warheads, today. The behaviorist analysis of future determinants is largely correct when it is applied to lower animals, with their limited cognitive capacities. Bandura's openness to such determinants is another mark of the extent to which social learning theory has embraced cognitivism, and abandoned its behaviorist roots.

Social Learning as the Cognitive Basis of Culture

Social learning is the cognitive basis of culture, which anthropologists define as the customary beliefs, social forms, and material traits of a racial, ethnic, or social group, transmitted through informal learning and formal training from one generation to the next. This intergenerational transmission cannot be accomplished through the genes: there is no inheritance of acquired characteristics. Instead, it must be accomplished by learning -- which is to say, social learning, through example and precept. It is through social learning, both through informal modeling and in formal institutions (such as schools and libraries) organized for the purpose, that a society passes down its knowledge, beliefs, and attitudes from one generation to the next. In this way, each generation builds on the advances made by those who went before, and doesn't have to start "from scratch".

This raises the question of whether nonhuman animals have "culture" as well. Observations of animals behaving in their natural environment suggest that animals do indeed learn vicariously from observing the experiences of others, and in this respect possess sets of cultural traditions that are passed from one generation to the next.

Chickadees who watch another chickadee open a milk bottle learn more quickly to open it themselves.
Red squirrels who watch another red squirrel open hickory nuts learn to do that more quickly.
Israeli roof rats (no kidding -- that's a real species!) quickly learn how to open pine cones to obtain the seeds inside, if they have the opportunity to watch an older roof rat do so.
Chimpanzees living in a rain forest in Cote d'Ivoire (the former French colony of Ivory Coast) employed a hammer-and-anvil system to crack the extremely hard shells of the panda nut in order to obtain the high-calorie kernel inside (Mercader et al., Science 296:1452-1455, 2002). Using archeological methods, anthropologists discovered that this behavior had been going on for more than 100 years at a particular "anvil" site, to which the chimps brought both the nuts and rocks to be used as hammers. It takes the animals as long as seven years to learn how to crack a panda nut properly, but the important thing is that individual animals do not appear to start the learning process from scratch. Rather, the behavior is passed down, chiefly from mother to child, by a process of imitation, or vicarious learning by example. The proper nut-cracking technique has been observed only in some bands of West African chimps, suggesting that it is part of these groups' "ape culture", passed from generation to generation by social learning.

It is an open question whether individuals can learn from watching animals of other species. But these instances certainly leave open the possibility of learning vicariously, taking others as models for one's own behavior. In that sense, at least some nonhuman species have at least the rudiments of culture.

Along with consciousness, language, and culture, the capacity for learning, and especially for social learning, is one of the greatest gifts of evolution to the human species.

For More on Social Learning, Go to the Appendix: The Evolution of Cognitive Social Learning Theory

The Nature of Learning

Behaving organisms are not just machines, operating by reflex, taxis, or instinct. Rather, even organisms with very simple nervous systems are able to modify their behavior in accordance with what they have learned. Much learning can be described in terms of classical and instrumental conditioning, and combinations thereof. But not all learning is of this sort: language learning is a particularly salient example of learning merely through exposure to others, without any reinforcement.

What is learned is not a simple connection between stimulus and response. Rather, the learning organism forms a mental representation of the world and its relation to it: of objects, events, its own behavior, and the contingent relations between them.

In light of modern experiments on predictability, controllability, and social learning, we should revise our definition of learning.

Learning is not a change in behavior that occurs as a result of experience. That definition is a holdover from the radical behaviorism of Watson and Skinner, who thought that notions of mind, mental life, and the like were not scientific, and that psychology could only be a science if it became a science of behavior.
Rather, learning is the acquisition of knowledge through experience -- either the direct experience of classical and instrumental conditioning, or the vicarious experience of social learning. This knowledge is then used to guide behavior.

We cannot understand learning solely by focusing on events outside the organism, tracing connections between stimuli and responses, and treating the organism as if it were empty. Rather, we must go inside the "black box", to see how the mind is structured, and how its structures operate. We need to understand the principles by which information about the world is acquired through sensation and perception, retained through memory, transformed through thought, and communicated by language. These matters are the province of cognitive psychology.

For a comprehensive survey of the psychology of learning, see The Psychology of Learning and Behavior by B. Schwartz and S.J. Robbins (Norton, 1978), and subsequent editions. The most up-to-date of these is Learning and Memory by B. Schwartz and D. Reisberg (1991).

For a thorough discussion of behaviorism, see Behaviorism, Science, and Human Nature by B. Schwartz and H. Lacey (Norton, 1982).

For a comprehensive survey of theories of learning, see the various editions of Theories of Learning by E.R. Hilgard and G.H. Bower (1st ed. by E.R. Hilgard, published by Appleton-Century-Crofts, 1948; 5th ed. by G.H. Bower and E.R. Hilgard, published by Prentice-Hall, 1981).

From Animal Learning to Animal Memory

The fact that animals can learn means that they have a capacity to encode, store, and retrieve memories. But the sorts of memories implied by classical and instrumental conditioning represent semantic and procedural knowledge:

Bells are followed by food.
Tones are followed by shock.
If I press this lever, I'll get out of this puzzle box.
If I press this bar, I'll get a piece of rat-chow.

Mostly, however, when we think about memory we mean episodic memory, which raises the question: can nonhuman animals have episodic memory, in the sense of an ability to remember specific experiences as such? Some theorists (like Tulving, 1983) think not -- that the ability to remember specific episodes of experience is a uniquely human faculty. But we've long since learned to accept the Darwinian principle of evolutionary continuity, so it would be surprising if at least some nonhuman species, most likely primates or other mammals, did not have the ability to remember specific episodes in their lives.

Let's first define the terms. An episodic memory is a memory for an episode -- an event with a unique location in space and time. So, at the very least, an episodic memory has to have been encoded after a single experience.

By this standard, any example of one-trial learning -- such as the one-trial step-down passive avoidance learning often used in animal models of traumatic retrograde amnesia (e.g., Miller & Marlin, 1979) -- might count as episodic memory. In this paradigm, a rat is perched on a platform above a floor grid which is wired to deliver an electric shock. If the animal steps down (and they always step down), it gets a foot-shock, at which point it jumps back up onto the platform and won't step down again. It has learned the association between floor and shock in a single trial, and it passively avoids further shock by refusing to step down. (If the rat receives electroconvulsive shock immediately after jumping back up, it will step back down onto the floor as if nothing had happened, apparently amnesic for the shock experience.)

Now, it might be that the rat remembers the specific experience of getting shocked when it stepped down onto the floor -- in which case the memory might count as episodic. Alternatively, it might be the case that the animal has acquired more generic knowledge that the floor delivers foot-shock -- in which case we're talking about something more like semantic memory, abstract knowledge about the world. A human analogue would be source amnesia, in which a subject remembers factual knowledge acquired during a learning session, but not the learning session itself. So, the occurrence of one-trial learning isn't enough to qualify as an animal model of episodic memory.

So, returning to our definition of episodic memory, it seems that, at a minimum, an episodic memory has to contain information about the target event, as well as information about the time and place at which it occurred. Call it a what-where-when structure (Tulving, 1972). It is this W-W-W structure that makes the verbal-learning paradigm a model of episodic memory: subjects must remember what words were on a particular list studied at a particular time and in a particular place. So, a successful animal model of episodic memory would have to demonstrate, at a minimum, that an animal remembers not just what happened, but also where and when it happened.

Such a model was introduced by Clayton & Dickinson (1998) based on cache-recovery behavior in scrub jays (note: not a primate or even mammalian species!).

The birds were allowed to cache two different foods for later consumption: wax worms, a preferred food which decays relatively quickly, and peanuts, a less-preferred food which does not.
The foods were cached in different locations.
After a short or long retention interval, the birds were allowed to retrieve the food.
After the short interval, the birds went for the worms; after the long interval, they went for the peanuts.

This sort of experiment, which has been repeated many times in various species (including rats), seems to indicate that the animals have the ability to remember what was cached, where it was cached, and when it was cached -- thus meeting the minimal requirements for an episodic memory.

But maybe episodic memory requires more than this. Remember James's definition of secondary memory:

Memory requires more than a mere dating of a fact in the past. It must be dated in my past. In other words, I must think that I directly experienced its occurrence.

This feature of "reminiscence, recollection, reproduction, or recall" is necessarily subjective, and would seem to be ruled out by the fact that we simply have no way of knowing what the subjective experience of remembering is like for subjects who can't talk to us about their introspections. This is one reason why Clayton and others refer to "episodic-like memory".

Related to this is Tulving's notion that episodic memory represents mental time travel (MTT), or traveling back in time to relive a prior episode. Tulving (2005) now believes that this self-referential autonoetic experience is the real hallmark of episodic memory -- and that the ability to mentally travel backward in memory is also related to our ability to project ourselves, mentally, into the future. And he also believes that this ability -- MTT in either direction -- is uniquely human. At the same time, we've known since Tolman, and certainly since the cognitive revolution in animal learning (Rescorla, Seligman, Kamin, and the others) that animals form expectations during both classical and instrumental conditioning. And the very idea of expectations implies some ability to anticipate the future.

"Episodic Memory" in Animals

For a recent overview of this research, see:

Crystal, J.D. (2010). Episodic-like memory in animals. Behavioural Brain Research, 210, 235-243.
Roberts, W.A. (2002). Are animals stuck in time? Psychological Bulletin, 128, 473-489.
Suddendorf, T., & Corballis, M.C. (2007). The evolution of foresight: What is mental time travel, and is it unique to humans? Behavioral & Brain Sciences, 30, 299-313.

This page last revised 09/16/2014.