# Statistics and Epidemiologic Glossary

## Statistics and epidemiologic glossary and explanations

Probability – The probability is a measure of the likelihood of an event and is expressed as the proportion of events of interest compared to all possible events.
Example – What is the probability of getting a 6 from a single throw of a die?
The proportion of the desired event (throwing a 6) to all possible events (throwing a 1, 2, 3, 4, 5 or 6) = 1/6
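
The die example can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and it also computes the odds for the same event (defined in the next entry) to show how the two measures differ.

```python
from fractions import Fraction

# Faces of a fair six-sided die and the event of interest (throwing a 6).
faces = [1, 2, 3, 4, 5, 6]
favorable = [f for f in faces if f == 6]
unfavorable = [f for f in faces if f != 6]

# Probability: favorable outcomes divided by all possible outcomes.
probability = Fraction(len(favorable), len(faces))

# Odds: favorable outcomes divided by unfavorable outcomes.
odds = Fraction(len(favorable), len(unfavorable))

print(f"probability = {probability}")                 # 1/6
print(f"odds = {odds.numerator}:{odds.denominator}")  # 1:5
```

Exact fractions are used here so that the results print as 1/6 and 1:5 rather than as rounded decimals.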

Odds – The odds are a measure of the likelihood of an event, expressed as the ratio of the frequency with which an event of interest occurs to the frequency of the alternative events among all possible events.
Example – What are the odds of getting a 6 from a single throw of a die?
The ratio of the number of ways of throwing a 6 (of which there is 1) to the number of ways of not throwing a 6 (of which there are 5: throwing a 1, 2, 3, 4 or 5); therefore the odds are 1 to 5, or 1:5.

True positive (TP) – A positive test result in an individual with the disease.

True negative (TN) – A negative test result in an individual without the disease.

False positive (FP) – A positive test result in an individual without the disease.

False negative (FN) – A negative test result in an individual with the disease.

Sensitivity – The sensitivity of a test is the probability that the test will be positive among individuals who have the disease. (See the grid above.)

Specificity – The specificity of a test is the probability that the test will be negative among individuals who do not have the disease. (See the grid above.)

Positive predictive value (PPV) – The positive predictive value of a test is the probability of disease among those who test positive. (See the grid above.)

Negative predictive value (NPV) – The negative predictive value of a test is the probability of no disease among those who test negative. (See the grid above.)

False positive rate (FPR) – The false positive rate is the probability of a positive test result when the disease is absent.
The false positive rate = 1 – specificity.

False negative rate (FNR) – The false negative rate is the probability of a negative test result when the disease is present.
The false negative rate = 1 – sensitivity.
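
The test-accuracy measures above can all be derived from the four counts TP, FP, FN and TN. The sketch below assumes the standard 2x2 layout for those counts; the function name `diagnostic_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the test-accuracy measures defined above from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # positive tests among the diseased
    specificity = tn / (tn + fp)   # negative tests among the non-diseased
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # diseased among those testing positive
        "npv": tn / (tn + fn),     # non-diseased among those testing negative
        "fpr": 1 - specificity,    # false positive rate
        "fnr": 1 - sensitivity,    # false negative rate
    }


# Hypothetical counts, not taken from any study.
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=160)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are computed within the diseased and non-diseased groups, while the predictive values are computed within the test-positive and test-negative groups, so the predictive values change with how common the disease is in the sample.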

Likelihood ratio of a positive test result (LR+) – The increase in the odds of having the disease after a positive test result.
LR+ = sensitivity / (1 – specificity)
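
As a rough illustration, the sketch below computes LR+ and also LR- (defined in the next entry), then applies LR+ to a pre-test probability by converting it to odds. The sensitivity, specificity and pre-test probability are hypothetical values, and the function name is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test characteristics.
lr_positive, lr_negative = likelihood_ratios(sensitivity=0.90, specificity=0.80)

# Hypothetical pre-test probability of disease, converted to odds.
pre_test_probability = 0.10
pre_test_odds = pre_test_probability / (1 - pre_test_probability)

# A positive result multiplies the pre-test odds by LR+.
post_test_odds = pre_test_odds * lr_positive
post_test_probability = post_test_odds / (1 + post_test_odds)

print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.3f}")
print(f"post-test probability = {post_test_probability:.3f}")
```

Working in odds rather than probabilities is what makes the update a single multiplication.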

Likelihood ratio of a negative test result (LR-) – The decrease in the odds of having the disease after a negative test result.
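LR- = (1 – sensitivity) / specificity

(The numbers used in this example are only for demonstration purposes.)
In this example, a = 355 individuals with both periodontal disease and cardiovascular disease, b = 3140 individuals with periodontal disease but without cardiovascular disease, c = 140 individuals without periodontal disease but with cardiovascular disease, and d = 2507 individuals with neither.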
The probability of having cardiovascular disease among individuals with periodontal disease =
a/(a+b) = 355/3495 = 10.1%. This is a measure of the risk of developing cardiovascular disease among those having periodontal disease.
The probability of having cardiovascular disease among individuals without periodontal disease =
c/(c+d) = 140/2647 = 5.3%. This is a measure of the risk of developing cardiovascular disease among those who do not have periodontal disease.

The odds of having periodontal disease among individuals with cardiovascular disease =
a/c = 355/140 = 2.54
The odds of having periodontal disease among individuals without cardiovascular disease =
b/d = 3140/2507 = 1.25

Relative risk (RR) – The relative risk is the ratio of the risk of developing disease in the exposed group to the risk of developing disease in the non-exposed group.
Example – The ratio of the risk of developing cardiovascular disease among individuals with periodontal disease to the risk of developing cardiovascular disease among individuals without periodontal disease = 10.1%/5.3% = 1.9

Odds ratio (OR) – The odds ratio is the ratio of the odds of having been exposed among the diseased group to the odds of having been exposed in the non-diseased group.
Example – The ratio of the odds of having periodontal disease among individuals with cardiovascular disease to the odds of having periodontal disease among individuals without cardiovascular disease = 2.54/1.25 = 2
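
The relative risk and odds ratio examples can be reproduced from the four counts used in the text (355, 3140, 140 and 2507). The sketch below is one way to do so; the variable names are illustrative, and because it works from the unrounded counts its results can differ slightly in the last digit from figures computed with rounded percentages.

```python
# Counts from the periodontal disease / cardiovascular disease example.
a = 355    # periodontal disease, cardiovascular disease
b = 3140   # periodontal disease, no cardiovascular disease
c = 140    # no periodontal disease, cardiovascular disease
d = 2507   # no periodontal disease, no cardiovascular disease

# Risk of cardiovascular disease in each exposure group.
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed

# Odds of exposure among the diseased and the non-diseased.
odds_exposure_diseased = a / c
odds_exposure_non_diseased = b / d
odds_ratio = odds_exposure_diseased / odds_exposure_non_diseased

print(f"RR = {relative_risk:.2f}")
print(f"OR = {odds_ratio:.2f}")
```

The odds ratio computed this way is algebraically the same as the cross-product (a × d) / (b × c).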

Absolute risk reduction (ARR) – The absolute risk reduction is the decrease in risk given an exposure, treatment or activity in relation to a control group without an exposure, treatment or activity.
Example – The absolute risk reduction for having cardiovascular disease in individuals with periodontal disease compared to having cardiovascular disease among individuals without periodontal disease =
a/(a+b) – c/(c+d) = 10.1% – 5.3% = 4.8%. This is the incidence rate of cardiovascular disease that is associated with (or attributed to) periodontal disease.
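
A minimal sketch of this calculation follows, together with the number needed to treat that is defined in the next entry. It works from the unrounded counts, so the risk difference comes out slightly higher than the 4.8% obtained from the rounded percentages; rounding the number needed to treat up to a whole person is a common convention and is an assumption here.

```python
from math import ceil

# Counts from the periodontal disease / cardiovascular disease example.
a, b, c, d = 355, 3140, 140, 2507

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

# Difference in risk between the two groups.
absolute_risk_reduction = risk_exposed - risk_unexposed

# NNT is the inverse of the ARR, rounded up to a whole individual.
number_needed_to_treat = ceil(1 / absolute_risk_reduction)

print(f"ARR = {absolute_risk_reduction:.2%}")
print(f"NNT = {number_needed_to_treat}")
```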

Number needed to treat (NNT) – The number needed to treat is the number of individuals who need to be treated in order to prevent one additional bad outcome, or to achieve an additional favorable outcome. NNT is defined as the inverse of the absolute risk reduction.
Example – The number of individuals with periodontal disease who need to be treated to prevent one case of cardiovascular disease = 1/ARR = 1/0.048 = 21

The purpose of a statistical test is to determine if the null hypothesis can be rejected.

Null hypothesis (H0) – The null hypothesis is a statement that there is no difference or no association between two or more groups. If the statistical test applied to the observed data suggests that the null hypothesis can be rejected, then the alternative or research hypothesis (H1) is supported.
Example – If the null hypothesis is “fluoride added to toothpaste has no effect against caries”, a statistical test may be used to determine whether the data allow the null hypothesis to be rejected, supporting the alternative hypothesis (H1) that “fluoride added to toothpaste protects against caries.”

Type I error – A Type I error is rejecting the null hypothesis when it is true.
Example – If the null hypothesis is true (“fluoride added to toothpaste has no effect against caries”) but the statistical test falsely suggests that the null hypothesis should be rejected (“fluoride added to toothpaste does protect against caries”), a Type I error has occurred.

Type II error – A Type II error is failing to reject the null hypothesis when it is false.
Example – If the null hypothesis is false (“fluoride added to toothpaste protects against caries”) and the statistical test falsely suggests that the null hypothesis should not be rejected (“fluoride added to toothpaste does not protect against caries”), a Type II error has occurred.

Alpha (α) level – The alpha level is the threshold probability for tolerating a Type I error and is usually set at 0.05. If the p-value from the statistical test is less than alpha, the null hypothesis is rejected and the alternative hypothesis is supported (also expressed as “statistically significant”). If the p-value is alpha or greater, the null hypothesis is not rejected (also expressed as “not statistically significant”).

P-value – The p-value (probability value) is calculated after the statistical test is performed and is the data-based probability of obtaining a result as extreme as (or more extreme than) the one observed, if the null hypothesis is true. It is often loosely interpreted as the probability that the observed result is due to chance alone; strictly, it is calculated under the assumption that the null hypothesis is true and is not the probability that the null hypothesis itself is true. If the p-value is less than alpha, the null hypothesis is rejected and the alternative hypothesis is supported (“statistically significant”).
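
To make the definition concrete, the sketch below computes an exact one-sided p-value for a simple sign-test style setup. The scenario is hypothetical: 20 matched pairs of individuals, with the fluoride user having fewer caries in 15 of them, and a null hypothesis under which either member of a pair is equally likely to have fewer caries. The function name and all numbers are illustrative assumptions, not data from the text.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided exact p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


alpha = 0.05  # threshold for tolerating a Type I error

# Hypothetical data: 15 of 20 pairs favor the fluoride user.
p_value = binomial_p_value(successes=15, trials=20)
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}")
print("reject H0" if reject_null else "do not reject H0")
```

The sum runs over 15, 16, ..., 20 because the p-value counts results as extreme as, or more extreme than, the one observed.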

Confidence interval (CI) – A confidence interval is a range of values (the lower and upper values) used to estimate a population parameter and is associated with a specific confidence level.

Confidence level – The confidence level is the probability that the interval estimate will include the population parameter.
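
As an illustration, the sketch below computes a 95% confidence interval for a proportion, reusing the 355 cases among 3495 individuals from the cardiovascular disease example earlier in this glossary. It assumes the normal-approximation (Wald) interval, which is only one of several methods and is used here purely for simplicity; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(events, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = events / n
    # Critical value of the standard normal for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = wald_interval(events=355, n=3495)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

Raising the confidence level widens the interval, and a larger sample narrows it.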

Mean – The mean is the average value of a data set, or the calculated sum of the data values divided by the total number of data values.
Example
Data set – 56, 45, 78, 3, 9, 10, 53, 76, 12
Mean – (56+45+78+3+9+10+53+76+12)/9=38

Median – The median is the value halfway through an ordered data set, where there are an equal number of data values below and above.
Example
Data set (odd number) – 56, 45, 78, 3, 9, 10, 53, 76, 12
Ordered data – 3, 9, 10, 12, 45, 53, 56, 76, 78
Median – 45 (4 data values above and 4 data values below)

Data set (even number) – 56, 45, 78, 3, 9, 10, 53, 76, 12, 35
Ordered data – 3, 9, 10, 12, 35, 45, 53, 56, 76, 78
Median – 40 (halfway between the “middle” data values [35 and 45])

Mode – The mode is the most frequently occurring data value in a specified data set.
Example
Data set – 24, 26, 26, 28, 30, 28, 26, 20
Mode – 26
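
The mean, median and mode examples above can be checked with Python's standard `statistics` module. This is a small sketch using the same data sets as the text; the variable names are illustrative.

```python
import statistics

# Data sets from the mean and median examples.
odd_data = [56, 45, 78, 3, 9, 10, 53, 76, 12]
even_data = odd_data + [35]

# Data set from the mode example.
mode_data = [24, 26, 26, 28, 30, 28, 26, 20]

mean_value = statistics.mean(odd_data)
median_odd = statistics.median(odd_data)    # middle value of the ordered data
median_even = statistics.median(even_data)  # average of the two middle values
mode_value = statistics.mode(mode_data)

print(mean_value, median_odd, median_even, mode_value)
```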

Prevalence – Prevalence measures how much of a disease or a condition there is in a population at a particular point in time. The prevalence is calculated by dividing the number of persons with a disease or a condition at a particular time point by the number of individuals examined.
Example – Diabetes affects 25.8 million individuals out of a population of 311 million people. This translates to a prevalence of 8.3%.

Incidence – Incidence measures the rate of occurrence of new cases of a disease or a condition. Incidence is calculated as the number of new cases of a disease or a condition in a specified time period (usually a year) divided by the size of the population under consideration who are initially free of the disease or the condition.
Example – In 2008, 1.9 million individuals developed diabetes out of a population of 226.5 million people over 20 years of age without a previous diagnosis of diabetes. This translates to an incidence rate of 0.84%.
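
The prevalence and incidence examples reduce to two divisions, sketched below with the figures quoted in the text (in millions of people). The variable names are illustrative.

```python
# Prevalence: existing cases divided by the population examined.
existing_cases = 25.8       # millions with diabetes
total_population = 311.0    # millions
prevalence = existing_cases / total_population

# Incidence: new cases in the period divided by the population at risk.
new_cases = 1.9                 # millions developing diabetes in 2008
population_at_risk = 226.5      # millions over 20 years of age, initially free of diabetes
incidence = new_cases / population_at_risk

print(f"prevalence = {prevalence:.1%}")
print(f"incidence = {incidence:.2%}")
```

The two denominators differ: prevalence uses everyone examined, while incidence uses only those who were free of the condition at the start of the period.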