Instrumental Conditioning

Unit 1: Introduction
- instrumental conditioning: explicit training of the relationship between voluntary behaviours and their consequences
- learning the contingency between a behaviour and its consequences

Unit 2: Instrumental Conditioning
- Thorndike placed cats in a puzzle box, where performing a specific behaviour, such as pulling on a rope, allowed them to escape; a small dish of food sat outside the box; he focused on overt behaviour
- he believed that as soon as the cat learned to escape, it would immediately pull the rope the next time
- instead, over several trials, the random behaviours that did not lead to escape occurred less and less frequently, leaving only the correct target behaviour in place: a long trial-and-error process of discovery
- the rope-pulling behaviour gets "stamped in", random behaviour gets "stamped out"; this refinement produces a contingency between the specific behaviour (rope pulling) and the specific consequence (food reward)
- Law of Effect: behaviours with positive consequences are stamped in; behaviours with negative consequences are stamped out

Unit 3: Types of Instrumental Conditioning
- the types are defined by presenting or removing a positive or negative reinforcer
- reinforcer: any stimulus which, when presented after a response, leads to a change in the rate of that response
- reward training: presentation of a positive reinforcer following a response; increases the frequency of the behaviour
- punishment training: presentation of a negative reinforcer; decreases the behaviour being reinforced
- omission training: removal of a positive reinforcer leads to a decrease in the behaviour being performed (e.g. a boy watching TV while teasing his sister: access to the TV show is the positive reinforcer; if the TV is turned off every time he teases his sister, the teasing will decrease) (e.g. a timeout: the child must sit alone without friends or toys)
- escape training: removal of a negative reinforcer leads to an increase in the response behaviour (e.g. the floor of one side of a cage is electrified, and the rat escapes the shock by moving to the other side; or, when loud music is being played next door, banging on the ceiling stops it, so the negative reinforcer is the loud music)
- learning works best with immediate presentation of the consequence following the response
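The four training types above fall out of two binary choices: whether the reinforcer is positive or negative, and whether it is presented or removed after the response. A minimal sketch of that 2x2 table (function name and wording are illustrative, not from the course):

```python
# Hypothetical sketch: classify an instrumental-conditioning procedure
# from the two choices the notes describe.

def training_type(reinforcer: str, action: str) -> str:
    """reinforcer: "positive" or "negative"
    action:     "presented" or "removed" (after the response)"""
    table = {
        ("positive", "presented"): "reward training (response increases)",
        ("negative", "presented"): "punishment training (response decreases)",
        ("positive", "removed"):   "omission training (response decreases)",
        ("negative", "removed"):   "escape training (response increases)",
    }
    return table[(reinforcer, action)]

# Turning off the TV (removing a positive reinforcer) is omission training:
print(training_type("positive", "removed"))
# omission training (response decreases)
```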
Unit 4: Acquisition and Shaping
- reward training plotted as a cumulative record: a horizontal line while the subject is not responding, followed by upward slopes when responses are being made
- the cumulative record shows when each peck occurred and the rate of responding over time
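The cumulative record described above is just a running count of responses against time: flat where the subject is silent, rising where it responds. A small sketch, with invented peck timestamps for illustration:

```python
# Hypothetical sketch: build a cumulative record from response timestamps.

def cumulative_record(peck_times, end_time, step=1.0):
    """Return (time, cumulative response count) pairs sampled every `step` s."""
    record = []
    t = 0.0
    while t <= end_time:
        count = sum(1 for p in peck_times if p <= t)
        record.append((t, count))
        t += step
    return record

# No pecks until t=3 gives a flat segment (horizontal line),
# then the curve slopes upward as responses accumulate.
pecks = [3.0, 3.5, 4.0, 4.2, 4.4]
for t, n in cumulative_record(pecks, 5.0):
    print(t, n)
```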
- autoshaping: learning the contingency without special guidance by the researcher (placing a bird in a cage with a response key, the bird will eventually peck at the key)
- shaping by successive approximation: a complex behaviour is organized into smaller steps that gradually build up to the full response; each small step is reinforced through reward training; used extensively by animal trainers

Unit 5: Generalization and Discrimination
- discriminative stimulus (SD or S+): signals when a contingency between a particular response and a consequence is valid (e.g. the environment of the parents' home becomes an SD for the response of vegetable eating, which is reinforced with access to a dessert reward)
- S-delta (SΔ or S−): a cue which indicates when the contingency relationship is not valid (e.g. the environment of the grandparents' home)
- the association between a behaviour and an instrumentally conditioned outcome can be lost through extinction (e.g. in reward training, if the behaviour no longer yields the reward)
- the SD sets the occasion for a response, signaling when the response-reinforcer outcome relationship is valid; the response is voluntary!!

Unit 6: Schedules of Reinforcement
- ratio schedule vs interval schedule (e.g. a bird rewarded with food):
- ratio: FR1 = rewarded with food for each pecking response; FR10 = rewarded with food after every 10th pecking response
- interval: FI 1 minute = rewarded with food for the first pecking response after 1 minute; FI 10 minutes = rewarded with food for the first pecking response after 10 minutes
- fixed vs variable schedule:
- fixed: the requirement is constant (every 10th peck, or the first peck after every 1 minute)
- variable: on a VR10 schedule the pigeon must peck an average of 10 times to get food, but the exact number between reinforcements changes
- four types: FR (fixed ratio), VR (variable ratio), FI (fixed interval), VI (variable interval)
- FR (e.g. being paid $5 for every 3 shirts sewn): shows a pause-and-run pattern; following reinforcement, a subject pauses with inactivity before beginning the next run of responding (e.g. if the pigeon is not hungry; almost like procrastination); on the cumulative record, horizontal, then slope, then horizontal
- VR: reinforced around a characteristic mean (e.g. a slot machine); a VR schedule typically produces a very constant and high response rate (nearly a straight diagonal line; VR10 is steeper than VR40 because of its higher response rate)
- FI (e.g. a course with a weekly quiz): study behaviour ramps up just before each quiz, producing a scallop pattern (a diagonal line with half-circle scallops), since there is no direct reinforcement for responding earlier in the interval
- VI: produces a very steady response rate (VI2 is steeper than VI6); a schedule that delivers more frequent reinforcement supports a higher response rate (linear)
- behaviours on a partial reinforcement schedule (PRS) are more robust than those on a continuous reinforcement schedule (CRS); it is better to reinforce using a PRS, since the subject may not notice the reinforcement is gone
- shorter variable ratio schedules show a faster rate of response, while longer variable ratio schedules are more resistant to extinction
- Overjustification Effect: occurs when individuals are rewarded for a behaviour that previously had intrinsic value; once the behaviour has been extrinsically reinforced, it loses its intrinsic value and is performed only for the extrinsic reward
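The schedule definitions above can be sketched as small simulators: each schedule answers one question per response, "is reinforcement delivered this time?" This is a minimal illustration (class names and numbers are my own, not from the course); ratio schedules count responses, interval schedules watch the clock:

```python
import random

class FixedRatio:
    """FR-n: reinforce every n-th response (e.g. FR10 -> every 10th peck)."""
    def __init__(self, n):
        self.n = n
        self.count = 0
    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR-n: reinforce after a random number of responses averaging n."""
    def __init__(self, n, rng=None):
        self.rng = rng or random.Random(0)
        self.n = n
        self._reset()
    def _reset(self):
        # uniform over 1 .. 2n-1 has mean n
        self.required = self.rng.randint(1, 2 * self.n - 1)
        self.count = 0
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self._reset()
            return True
        return False

class FixedInterval:
    """FI-t: reinforce the first response after `interval` seconds elapse."""
    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval
    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False

class VariableInterval:
    """VI-t: reinforce the first response after a random interval averaging t."""
    def __init__(self, interval, rng=None):
        self.rng = rng or random.Random(0)
        self.interval = interval
        self.available_at = self.rng.uniform(0, 2 * interval)
    def respond(self, t):
        self.available_at = t + self.rng.uniform(0, 2 * self.interval)
        return t >= self.available_at or True  # placeholder, replaced below

# (corrected VI respond)
def _vi_respond(self, t):
    if t >= self.available_at:
        self.available_at = t + self.rng.uniform(0, 2 * self.interval)
        return True
    return False
VariableInterval.respond = _vi_respond

# FR3: every third response is reinforced
fr = FixedRatio(3)
print([fr.respond(t) for t in range(6)])
# [False, False, True, False, False, True]
```

Note how extinction resistance falls out of the VR logic: on a long (lean) VR schedule, many unreinforced responses in a row are normal, so the subject has no quick signal that reinforcement has stopped.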