PSYC 2021 - Exam 2 Notes
Chapter 6: Probability
Probability
Probability quantifies the likelihood that something is true; it can be determined through calculation or from tables.
The probability (p) of a target event = number of target events / number of total events.
The total number of possible events (N) is the number of possible events per trial (a) raised to the power of the number of trials (n): N = aⁿ (e.g. 6³ = 216 for three rolls of a six-sided die).
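As a quick check of this counting rule, a minimal Python sketch (using the die-rolling values a = 6 and n = 3 from the example above):

```python
# Total number of possible outcomes: N = a**n
a = 6   # possible events per trial (faces of a die)
n = 3   # number of trials (rolls)
N = a ** n
print(N)   # 216

# Probability of one specific target outcome (e.g. rolling 6-6-6),
# following p = target events / total events
p = 1 / N
print(p)   # ~0.0046
```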
The most common outcomes are binomial events, such as win or lose, male or female, etc.
This division is not as clear-cut as presented here; there are situations that fall in between. (p) defines the probability of the target event, while (q) defines the probability of everything outside the target event. The formula states: p + q = 1
Coefficient
The coefficient (C) is the number of different ways in which the two possible outcomes (correct and incorrect) can be mixed across trials. It is calculated as: C = N! / r!(N – r)! where N is the total number of trials and r and (N – r) are the two subgroups of trials.
The factorial sign (!) denotes the product of successive whole numbers, starting with the number preceding the sign and descending to 1. For example, 6! = 6 x 5 x 4 x 3 x 2 x 1 = 720
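A minimal Python sketch of the coefficient, extended to the exact binomial probability it feeds into (the N, r, and p values are illustrative; the probability formula C x p^r x q^(N – r) is the standard binomial one, not stated above):

```python
from math import factorial

def coefficient(N, r):
    """C = N! / (r! * (N - r)!): ways to arrange r target outcomes in N trials."""
    return factorial(N) // (factorial(r) * factorial(N - r))

def binomial_prob(N, r, p):
    """Exact probability of r target outcomes in N trials: C * p**r * q**(N - r)."""
    q = 1 - p
    return coefficient(N, r) * p**r * q**(N - r)

print(coefficient(6, 2))          # 15
print(binomial_prob(10, 7, 0.5))  # ~0.117, e.g. 7 heads in 10 coin flips
```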
Mean, Standard Deviation, and z-value
The mean is calculated as: μ = Np while the standard deviation is: σ = √Npq
The z-value follows: z = (X – μ) / σ, or, substituting the binomial mean and standard deviation: z = (X – Np) / √Npq
If the z-value exceeds 4, the probability is minuscule (below 0.0001).
Getting the probability from the z-table is far easier than exact calculation: the normal approximation has no discontinuous measuring units, and the distribution is symmetric, falling into two equal halves.
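A Python sketch of the normal approximation, assuming an illustrative scenario of 70 heads in 100 fair coin flips; the erf-based expression stands in for a z-table lookup:

```python
from math import sqrt, erf

N, p = 100, 0.5
q = 1 - p
X = 70  # e.g. 70 heads in 100 coin flips

mu = N * p                # mean = Np = 50
sigma = sqrt(N * p * q)   # standard deviation = sqrt(Npq) = 5
z = (X - mu) / sigma      # z = 4.0

# Upper-tail probability from the standard normal CDF
prob = 1 - 0.5 * (1 + erf(z / sqrt(2)))
print(z, prob)  # 4.0, ~0.0000317 (below .0001, as stated above)
```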
Combinations and Permutations
One complication when calculating exact probabilities comes from the number of ways in which a certain event may occur (e.g. unlocking a numerical padlock). These are called Combinations (NCr). NCr = N! / r!(N – r)!
A problem with probability is that the calculations can escalate into complexities that exceed our capacity for representation, despite their usefulness in making differentiated judgments.
The numbers increase even faster when order matters. Permutations (NPr) are used in such order-sensitive cases. NPr = N! / (N – r)!
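Python's math module implements both counting formulas directly; a short comparison with illustrative N and r:

```python
from math import comb, perm

N, r = 10, 3

# Combinations: order does not matter
print(comb(N, r))  # 120 = 10! / (3! * 7!)

# Permutations: order matters, so the count is larger
print(perm(N, r))  # 720 = 10! / 7!
```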
These calculations assume that choices are made randomly, but our understanding of probability is notoriously jeopardized by "common sense" and intuition.
With strings of numbers, for example, selections with a wide variety of digits intuitively seem more desirable, even though every sequence is equally likely.
Chapter 7: Introduction to Inferential Statistics
Inferential Statistics studies representative samples drawn from a population in order to draw conclusions about the population at large. Populations can be well-defined or ill-defined. Sampling requires that:
1. All elements of the population have an equal chance of being included in the sample.
2. All elements are independently and randomly selected.
3. The number (N) of elements selected is large enough to minimize sampling error.
How large an N is needed depends in part on the amount of variability within the population: the more variability there is, the larger the sample size needed. Xbar is the best unbiased estimator of μ, although not a very efficient one for sets based on few data. The standard deviation of a distribution of sample means is called the standard error of the mean.
Standard error of the mean: σμ = σ / √N. The smaller the σμ, the more tightly the sample means cluster around μ.
Distributions of raw scores (X) use standard deviations (σ) and variances (σ²), while distributions of sample means (Xbar) use the standard error of the mean (σμ) and its square (σμ²). However, the distribution of sample means cannot be used directly for statistics based on limited groups. Best estimate of σ² = variance within the group + variance between the groups,
or more simply: σ² + σμ² = (SS / N) + (SS / N[N – 1]) = SS / (N – 1)
Z-scores can also be calculated for sample means using: z = (Xbar – μ) / σμ
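A Python sketch tying these estimates together: from illustrative raw scores it computes SS, the variance estimate SS / (N – 1), the estimated standard error of the mean, and the z-score of the sample mean (strictly a t once σ must be estimated, as Chapter 8 explains):

```python
from math import sqrt

scores = [4, 7, 6, 5, 8, 6]  # illustrative raw scores
mu = 5.0                     # hypothesized population mean
N = len(scores)
xbar = sum(scores) / N       # 6.0

# Sum of squares: SS = sum of (X - Xbar)^2
SS = sum((x - xbar) ** 2 for x in scores)       # 10.0

var_est = SS / (N - 1)                # best estimate of sigma^2 = 2.0
sem_est = sqrt(SS / (N * (N - 1)))    # estimated standard error ~0.577

z = (xbar - mu) / sem_est             # ~1.73
print(xbar, var_est, sem_est, z)
```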
Hypothesis Testing: Null (H0) and Experimental (H1) Hypotheses
Hypotheses do not describe certainties, but events that are merely probable or possible.
Although the methodology centers on rejecting the null hypothesis, the basic rule is "innocent until proven guilty": prove H1 true, or else keep H0.
H0 makes an explicit statement while the H1 makes a directional claim. H1 ≠ H0
z = (Xbar – μ as stated by H0) / σμ, where alpha (the region of rejection) is placed according to the direction stated by H1.
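A minimal decision-rule sketch in Python, assuming σ is known so the z-test applies; the numbers, the one-tailed H1, and α = .05 are illustrative:

```python
from math import sqrt

# H0: mu = 100; H1: mu > 100 (one-tailed), alpha = .05
mu0, sigma, N = 100, 15, 25
xbar = 106

sem = sigma / sqrt(N)             # sigma_mu = sigma / sqrt(N) = 3
z_obtained = (xbar - mu0) / sem   # 2.0

z_critical = 1.645  # one-tailed critical value for alpha = .05 (from a z-table)
if z_obtained >= z_critical:
    print("Reject H0")   # risk of a Type I error is at most alpha
else:
    print("Retain H0")   # risk of a Type II error remains
```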
Type I error: H0 is rejected, but H0 is in fact true.
Type II error: H0 is retained, but H0 is in fact false.
In a Type I error, a justified H0 is rejected; in a Type II error, a false H0 is passively retained and thus wrongfully accepted.
Alpha is the risk of making a Type I error. Decreasing the risk of a Type I error increases the risk of a Type II error.
Statistical hypothesis testing deals with two hypotheses, makes specific claims about μ through H0, and ends in rejecting or retaining H0. The investigator's goal is always to reject H0 (unlike Rosenhan).
Chapter 8: Testing Hypotheses using a Single Sample
Null Hypothesis (H0)
Alternative Hypothesis (H1)
σμ = σ / √N
One Sample Case: hypothesis testing involving a single sample compares the sample mean Xbar to the population mean μ, where Xbar comes from a representative sample and μ describes the general population. In a binomial situation with probabilities p and q, μ can easily be calculated as Np.
Standard error of the mean (estimated): ŝXbar = √(SS / N[N – 1]) = ŝ / √N = s / √(N – 1)
William Sealy Gosset observed that discrepancies between Xbar and μ, when converted into z-values using the estimated standard error of the mean, show much more variability with a small N than with larger samples. Samples therefore have different distributions depending on the sample size N on which they are based, with small samples producing more extreme values.
T-test: t = (Xbar – μ) / ŝXbar
Degrees of freedom (df): for the single sample mean, df = N – 1, since only one sample mean is involved in the calculation of the standard error of the mean. As df increases, the t-distribution increasingly resembles the normal distribution. df is irrelevant to the z-test. Larger t-values are needed to reject H0 when df is small.
If σ is known, the z-test is used. If σ is unknown and the estimated standard error of the mean must be relied on, the t-test is used. If both σμ and the estimated standard error of the mean are available, σμ is used with the z-test.
Choice of alpha: conventional α = .05
Probability of a Type I error: < .01
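A sketch of the one-sample t-test in Python, covering the decision rule and the C95 interval described next (the data are illustrative, and scipy's stats.t.ppf stands in for the t-table):

```python
from math import sqrt
from scipy import stats  # stats.t.ppf stands in for the t-table

scores = [98, 96, 101, 94, 99, 97, 95, 100]  # illustrative sample
mu0 = 102  # mu as stated by H0; H1: mu < mu0 (one-tailed)

N = len(scores)
xbar = sum(scores) / N                       # 97.5
SS = sum((x - xbar) ** 2 for x in scores)    # 42.0
sem_hat = sqrt(SS / (N * (N - 1)))           # estimated standard error ~0.866

df = N - 1
t_obtained = (xbar - mu0) / sem_hat          # ~-5.20

alpha = 0.05
t_critical = stats.t.ppf(alpha, df)          # negative: lower tail only, ~-1.895
print("Reject H0" if t_obtained <= t_critical else "Retain H0")

# C95: Xbar +/- t(alpha = .05, two-tailed) * estimated SEM
t_two = stats.t.ppf(1 - alpha / 2, df)
print(xbar - t_two * sem_hat, xbar + t_two * sem_hat)
```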
Decision Rule: since H1 states that μf < μ, we are interested only in the negative extreme of the distribution; thus reject H0 if tobtained ≤ tcritical.
95% Confidence Interval (C95): μ = Xbar ± tα=.05 x (estimated standard error of the mean), where Xbar is the center of the confidence interval and tα defines the width of the interval.
Where one-sample cases are used: pension benefits for retirees, GPA, infertility rates, average height of the adult population, climate change, and industrial quality-control standards.
Chapter 9: Variables, Within-Subjects Designs, and the T-Test for Correlated Samples
The choice of a statistical test is based on a number of criteria:
1. The scale of the dependent variable X (ratio, interval, ordinal, or nominal)
2. The number of groups k (k = 1, k = 2, k > 2)
3. The relation of the groups to each other when there are two or more groups
Types of Variables
Independent Variable (IV): has the connotation of being the cause of, or a contributor to, the resulting effect.
Dependent Variable (DV): has the connotation of the effect.
Variables observed in correlational studies are simply referred to as co-variables.
Disturbance (noise / random) Variables: variables other than the IV that tend to muddle the clear relation that ideally exists between IV and DV. They are the individual / situational differences that affect the dependent variable in an unforeseen way. Disturbance variables manifest themselves in the standard deviation of the dependent variable; the size of the standard deviation reflects the amount of random variation within the data. One way to reduce disturbance variables is to eliminate or reduce individual differences when possible (usually by using the same subjects in the various conditions of the IV, or by using subjects that are related and similar to each other), or to increase N.
Confounding variables: variables that affect the dependent variable while being systematically correlated with the IV (a disturbance variable, by contrast, does not correlate with the IV). A confound is associated with the administration of the IV. Because confounds exercise their effect on the group as a whole, just as the IV does, they present a problem of interpretation (resolved through the use of control groups and isolation).
Types of Related Samples
Collectively referred to as within-subject designs, experiments that use the same subjects in each condition may, for example, be before-and-after experiments. This design is popular when testing the effect of substances such as drugs on coordination, the effect of information or persuasive arguments on attitudes, or the effect of any other factor on behavior.
Within-group designs decrease disturbance, but risk increasing confounds.
Within-subject designs are popular in research that aims at establishing consumer preference. In such studies time cannot act as a confounding variable, but the spatial order of the items can.
Twin studies are the most desirable and effective design. Familial studies can be used too.
Artificially-matched samples: pairs of unrelated subjects that are matched on similarities such as a genetic trait or illness. However, beyond the matched characteristics, such pairs do not share the broader common characteristics of the target group.
Using related samples reduces or eliminates individual differences, which are an important source of disturbance variables. Within-subject designs are more powerful than between-subject designs, although some exceptions apply. They are also more powerful in rejecting H0.
The critical t-value that must be surpassed in order to reject H0 is greater when the degrees of freedom (df) are lower. This difference becomes crucial when N is very small and when the amount of individual differences is not particularly large.
Carrying out the t-test for correlated samples:
z = (Xbar – μ) / σXbar
t = (Xbar – μ) / ŝXbar
t = (Dbar – μD) / ŝDbar
Dbar is the mean of the difference scores in the sample; μD is the mean of the difference scores in the population; ŝDbar is the estimated standard error of the mean of the difference scores. This test is virtually always a t-test, since it is unlikely that we know σ, and H0 virtually always states that μD = 0, which means t = Dbar / ŝDbar.
ŝDbar = √(SSD / N[N – 1])
SSD = ΣD² – ([ΣD]² / N)
Steps (a worked sketch follows the list):
1. State H0 and H1 and determine whether the test is one-tailed or two-tailed.
2. Select the alpha level, calculate df, and look up tcritical.
3. Calculate D, D², ΣD, ΣD².
4. Calculate Dbar, SSD, ŝDbar.
5. Calculate tobtained.
6. State a conclusion.
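A minimal Python sketch walking through these six steps for illustrative before-and-after scores (scipy's stats.t.ppf again stands in for the t-table):

```python
from math import sqrt
from scipy import stats  # stats.t.ppf stands in for the t-table

# Step 1: H0: mu_D = 0; H1: mu_D != 0 (two-tailed)
before = [12, 15, 11, 14, 13, 16]  # illustrative before scores
after  = [14, 16, 13, 15, 15, 17]  # illustrative after scores

# Step 2: alpha, df, t critical
alpha = 0.05
N = len(before)
df = N - 1
t_critical = stats.t.ppf(1 - alpha / 2, df)   # ~2.571 for df = 5

# Step 3: difference scores and their sums
D = [a - b for a, b in zip(after, before)]
sum_D = sum(D)                    # 9
sum_D2 = sum(d * d for d in D)    # 15

# Step 4: Dbar, SSD, estimated standard error of Dbar
Dbar = sum_D / N                          # 1.5
SSD = sum_D2 - sum_D ** 2 / N             # SSD = sum(D^2) - (sum D)^2 / N = 1.5
se_Dbar = sqrt(SSD / (N * (N - 1)))       # ~0.224

# Step 5: t obtained (H0 states mu_D = 0)
t_obtained = Dbar / se_Dbar               # ~6.71

# Step 6: conclusion
print(t_obtained, t_critical)
print("Reject H0" if abs(t_obtained) >= t_critical else "Retain H0")
```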