
What is positive reinforcement and clicker training?

Introduction to positive reinforcement

One way both human and non-human animals learn is through consequences. In science, this type of learning is called operant conditioning (also known as instrumental learning). Operant conditioning uses reinforcement or punishment to increase or decrease the frequency of behaviours.

In operant conditioning there are four quadrants (two types of reinforcement and two types of punishment):

Positive reinforcement (+R) involves the addition of a pleasurable (appetitive) stimulus following a behaviour, making it more likely for this behaviour to occur again. For example: when the horse comes to the gate, his owner gives him a carrot (+R). As a result, the horse always comes when he sees his owner standing by the gate.

Negative reinforcement (-R) involves the removal (or avoidance) of an aversive stimulus following a behaviour, making it more likely for this behaviour to occur again. In the equestrian world, negative reinforcement is often marketed under the euphemism "pressure/release". The pressure is the application of the aversive stimulus, and the release is its removal. For example: a rider kicks his horse's flanks to get him moving. When the horse starts moving, the kicking stops (-R). After a few sessions, the rider barely has to move his legs for the horse to start walking.

Positive punishment (+P) involves the addition of an aversive stimulus following a behaviour, making it less likely for this behaviour to occur again. For example: a horse fails to canter when cued by his rider, and as a punishment the rider hits him with a whip (+P). The next time the rider cues the canter, the horse quickly goes into canter to avoid being hit again.

Negative punishment (-P) involves the removal of something desirable (appetitive) following a behaviour, making it less likely for this behaviour to occur again. For example: when I scratch my horse, he starts biting my clothes (wanting to engage in mutual grooming). When he does this, I stop scratching him and step back a little (-P). After a few repetitions, my horse learns not to use his teeth when engaging in mutual grooming with me.

Both traditional and natural horsemanship primarily use negative reinforcement (to get desired behaviour) and positive punishment (to remove undesired behaviour). However, a growing number of trainers are starting to incorporate appetitives into their training and to abandon the use of aversives.

A trainer or rider using appetitive-based training will obtain desired behaviours by adding a pleasurable stimulus (positive reinforcement) and will discourage undesired behaviours by removing a pleasurable stimulus (negative punishment). However, unlike trainers who rely on aversives, appetitive trainers tend to have a greater understanding of learning theory (because the method was first developed by scientists). They are therefore cautious in their use of negative punishment and try more ethical alternatives where applicable, such as antecedent arrangement, redirection to an appropriate stimulus, or teaching an incompatible behaviour.

More of a visual learner?

Watch our video on the topic!


Illustration by Fed up Fred.

Benefits and drawbacks of training with positive reinforcement

Unlike negative reinforcement, which has many issues including inducing fear in the animal, damaging the animal/handler relationship and causing aggression, positive reinforcement has few drawbacks. However, one important issue we see from time to time is attractive-type aggression, which arises because the trainer possesses a desired stimulus such as food. This issue can be solved through appropriate horse management, a change of reinforcer, improved food delivery, and similar measures.

Benefits of positive reinforcement based training include:

Creates cooperative and happy animals. Such cooperation makes it possible for an animal to willingly accept procedures that may cause some discomfort or pain but are necessary for its well-being, such as injections.
Improves the relationship. The relationship between horse and trainer improves as the trainer becomes associated with all sorts of positive stimuli and experiences, such as food, scratches and play.
Dopamine and serotonin. Training with positive reinforcement engages the dopamine reward system, so the animal can benefit from dopamine's roles in memory, learning, attention, sleep and reward recognition. This is also thought to increase serotonin, which is key in reducing depression and aggression.
Better reactions to novel stimuli. An animal trained with rewards tends to have a more positive outlook on life ("glass half full"). This is especially true of animals that have been in training for a couple of years: they tend to be less easily scared by novelty and learn new behaviours quickly.
Better at teaching complex behaviours. While negative reinforcement is very efficient at teaching avoidance-based behaviours such as backing away from the handler, it is less efficient than positive reinforcement for nuanced, complex behaviours, because the animal is primarily focused on escape (flight) rather than on completing the goal behaviour.


Clicker training

Clicker trainers use positive reinforcement with the addition of a bridge signal, which is most often a small noisemaker called a clicker. The purpose of the clicker is to improve timing: it precisely marks the moment the horse performs the desired behaviour, so he knows what earned him the reinforcer. For example: a rider is working on improving his horse's canter by rewarding him every time he takes the canter on the correct lead. Using a bridge signal, the rider can mark the moment the horse takes the canter on the correct lead. He can then bring the horse back to walk, halt and deliver the reinforcer.

The use of a bridge signal may not always be necessary (e.g. when the reinforcer can be delivered immediately) or recommended (e.g. when counter-conditioning a horse to an intense fear, because you do not want to pair the bridge signal with the feared stimulus).