PDF version of the Glossary is available here.
Accuracy (of a test): The proportion of patients correctly classified by the test at its chosen threshold, that is, the proportion of true positive classifications plus the proportion of true negative classifications among all patients tested. [See prevalence, sensitivity, specificity]
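The relation among accuracy, sensitivity, specificity, and prevalence can be sketched in a few lines of Python. This is a minimal illustration only: the counts are hypothetical and the function name `classification_summary` is an assumption, not part of any library.

```python
def classification_summary(tp, fn, fp, tn):
    """Summarize a 2x2 table of test results against true condition status."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # true positives plus true negatives
        "sensitivity": tp / (tp + fn),     # detected among those with the condition
        "specificity": tn / (tn + fp),     # cleared among those without the condition
        "prevalence": (tp + fn) / total,   # proportion that have the condition
    }

# Hypothetical counts for 200 patients (illustrative only)
summary = classification_summary(tp=40, fn=10, fp=20, tn=130)
print(summary)
```

With these made-up counts, accuracy equals sensitivity weighted by prevalence plus specificity weighted by (1 – prevalence), which is why accuracy depends on how common the condition is.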
Alpha-level (α): Level of significance chosen prior to performing a statistical test. Generally .001, .01, .05, or .10. The alpha-level determines which entry in the table of a probability distribution is used to decide whether a hypothesis should be rejected. The term is largely vestigial. Modern practice is to report an exact p-value without accepting or rejecting the hypothesis. [See p-value, test statistic, Type I error]
Analysis of covariance: In tests of difference among means, an adjustment can be made in advance to “eliminate” or “hold constant” known or suspected background factors that might confound the tests. EX: the effects of a new mouth rinse on plaque scores could be tested after first correcting for baseline plaque scores, DMF, and average dental visits. The variables statistically controlled are known as covariates. An analogue of this technique for measures of association is the partial correlation or hierarchical regression analysis. Performed on statistical software; a statistician is helpful. {See templates – Choosing a statistical test.}
Attenuation: Diminution of observed correlation because of restrictions occurring during measurement. Attenuation occurs when observations are taken from a restricted range or when category or rank formulas are used to calculate correlation coefficients on interval data. EX: the effect of moisture content on setting time of materials is underestimated if only a small range of moisture content is studied or if phi or rank correlation formulas are calculated. [See correlation coefficient]
Bayes’ Theorem: Formula used to calculate the probability that a condition exists, given information about the baseline prevalence of the condition and test data. Rare conditions are still rare despite positive test results, even though a positive test increases the chances that the condition exists. Other factors to consider include the sensitivity of the test (ability to detect present conditions) and the proportion of false positive results generated by the test. [See sensitivity, Template]
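To see why a rare condition stays fairly unlikely even after a positive test, the calculation can be sketched as follows. The prevalence, sensitivity, and false positive rate are hypothetical values chosen for illustration, and `posterior_probability` is an assumed helper name.

```python
def posterior_probability(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs: 2% prevalence, 90% sensitivity, 10% false positive rate
ppv = posterior_probability(prevalence=0.02, sensitivity=0.90, false_positive_rate=0.10)
print(round(ppv, 3))  # about 0.155
```

Under these assumed inputs a positive test raises the probability of the condition from 2% to roughly 16%, so the condition is more likely than before but still improbable.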
Bias: Differences among data in which patterns can be detected, where the patterns are confounded with (cannot be separated from) the conclusions one wishes to draw. EX: concluding that better education is associated with better oral health may be biased by the confounding of education with income. [See independence among variables, random variation]
Category data: Numbers assigned as names of categories. EX: Male = 1, patients’ Social Security numbers, placebo drug = 3, favorable outcome = 1. No arithmetic should be performed on category data, e.g., “The average sex of subjects was 1.56” is not meaningful. In certain complex statistical tests such as multiple regression, a variable that codes the presence or absence of a condition is referred to as a “dummy variable.” [See also indexes, numerical data, ranks, transformations]
Chi-square test (χ²): The name of a statistical test and of a probability distribution that has a particular shape. The chi-square test is used to determine whether the distribution of counts or occurrences is random across categories. EX: If a die is fair (not loaded), each of the numbers 1 through 6 should occur with the same frequency. If three handpieces are equally safe, the number of accidents with each should be jointly independent of which handpiece is used and how often it is used. The result of a chi-square test is “looked up” in a table of chi-square distributions to determine the probability of such a test value occurring by chance, the p-value. Usually performed as a hand calculation. {See templates – Choosing a statistical test, chi-square.} [See degrees of freedom, independence]
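The die example can be worked as a short hand-style calculation. This is a sketch under stated assumptions: the observed counts are invented, and the critical value 11.07 is the standard table entry for 5 degrees of freedom at the .05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 through 6 in 60 rolls of a die
observed = [8, 12, 9, 11, 14, 6]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face if fair

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1          # degrees of freedom for a one-way table
CRITICAL_05_DF5 = 11.07         # table value for df = 5, alpha = .05

print(chi2, df, chi2 > CRITICAL_05_DF5)
```

Here the statistic is 4.2, well below the table value, so these made-up rolls give no evidence that the die is loaded.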
Confidence interval: Range (upper and lower limit) constructed so that similar ranges are likely to contain the true mean for a set of observations. Traditionally, CI 95 is the calculated range where 95% of similarly constructed ranges include the true average. {CI 95 = X ± 1.96 * [SD / SQRT (n)]} NB: The confidence interval is not the range that has a 95% chance of containing the mean – despite this being a commonly reported interpretation. Small confidence intervals are preferred. They can be obtained by having (1) a small variance, (2) a large sample size, or (3) a lower confidence level, e.g., 90% rather than 95%. When a confidence interval does not contain the value 0.0 for differences between groups or 1.0 for odds ratios, it is equivalent to a statistically significant test for difference.
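The CI 95 formula above translates directly into code. The sketch below uses hypothetical scores and an assumed helper name `ci95`; the multiplier 1.96 is the large-sample value given in the entry, and a t-table value would normally replace it for small samples.

```python
import math
import statistics

def ci95(values):
    """Return (lower, upper) limits: mean +/- 1.96 * SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical scores (illustrative only)
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lower, upper = ci95(scores)
print(round(lower, 2), round(upper, 2))
```

Because the half-width divides by the square root of n, quadrupling the sample size halves the width of the interval, all else being equal.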
Cohen’s coefficient kappa (κ): Proportion of agreement among raters beyond what chance alone would produce, divided by the maximum possible agreement beyond chance. Kappa is 1.0 for perfect agreement and 0.0 for agreement no better than chance, with higher values representing greater agreement (negative values are possible when agreement is worse than chance). Kappa is used to adjust for skewed distributions. For example, if nearly all candidates for initial licensure pass, there will be a high observed agreement among examiners due to chance applications of a liberal criterion. [See Cronbach’s alpha, Kendall rank coefficient, Pearson correlation coefficient, Pearson rank correlation coefficient, phi, point-biserial correlation coefficient]
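The licensure example can be made concrete with the usual chance-corrected form, kappa = (p_o – p_e) / (1 – p_e), where p_o is observed agreement and p_e is the agreement expected by chance from the raters' marginal totals. The table of counts below is hypothetical, and `cohens_kappa` is an assumed helper name.

```python
def cohens_kappa(table):
    """table[i][j] = number of cases rater A placed in category i and rater B in j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical pass/fail decisions by two examiners on 100 candidates
table = [[90, 4],   # examiner A: pass -> examiner B: pass, fail
         [3, 3]]    # examiner A: fail -> examiner B: pass, fail
kappa = cohens_kappa(table)
print(round(kappa, 3))  # about 0.424
```

The raw agreement in this made-up table is 93%, but because almost everyone passes, most of that agreement would occur by chance and kappa is only about .42.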
Conditional probability: Likelihood of an event occurring given that a specific condition holds. EX: If two of ten restorations with Material X are observed to fail and six of ten restorations with Material Y are observed to fail, the probability of a failure (not conditional) is .40 – (2 + 6) / 20. The conditional probability of failure for a restoration using Material X is .20 (2 / 10). Written p (F | X), probability of failure given condition X. [See also probability]
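Reading the example as counts of failed restorations (2 of 10 with Material X, 6 of 10 with Material Y), the unconditional and conditional probabilities can be computed as below. The dictionary names are assumptions made for this sketch.

```python
# Counts taken from the restoration example in the entry above
failures = {"X": 2, "Y": 6}
placed = {"X": 10, "Y": 10}

# Unconditional probability of failure: all failures over all restorations
p_failure = sum(failures.values()) / sum(placed.values())

# Conditional probabilities: restrict attention to one material at a time
p_failure_given_x = failures["X"] / placed["X"]   # p(F | X)
p_failure_given_y = failures["Y"] / placed["Y"]   # p(F | Y)

print(p_failure, p_failure_given_x, p_failure_given_y)
```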
Continuous distribution: Frequency or probability distribution where the values of the horizontal axis could theoretically take on any value along a continuum. Time is a continuous variable, even when measured in days; distance is a continuous variable, no matter how fine the measurement units are. [See discrete distribution, frequency distribution, probability distribution]
Contrasts: In the one-way analysis of variance, a statistically significant difference among group averages does not by itself show which groups are different from each other. An after-the-fact test of contrasts is performed to answer this question. There are numerous, roughly equivalent tests of contrasts (least-significant difference, Bonferroni, Duncan’s multiple range test, Student-Newman-Keuls, Tukey, and Scheffé). In the literature, contrasts are reported as sets of groups that cannot be distinguished (statistically) from each other. This is often depicted graphically as lines or brackets creating sets of average values. Performed using statistical software; a statistician may be helpful. [See one-way analysis of variance.]
Correlation coefficient (r): A measure of degree of association among two or more variables. When knowing the value of one variable affords little or no advantage over chance of predicting another variable, the variables are independent and the correlation coefficient is close to 0.0 (weak or null correlation). When knowing one variable allows prediction of the other, the r-value gets closer to -1.0 or to 1.0. [See negative, Pearson, Kendall, point-biserial, phi, Cronbach’s alpha, Cohen’s Kappa, partial, and spurious correlation, covariance, reliability, validity, attenuation, correlation matrix]
Correlation matrix: A tabular presentation of many correlation coefficients. Each variable is both a row and a column heading. The intersection cell for each row and column is the correlation between the two variables. The cell entries where the row and column are the same variable, forming the diagonal, are always 1.00. Because the correlation between A and B is the same as the correlation between B and A, the correlation matrix is symmetrical above and below the diagonal, and the bottom triangle is normally omitted. [See correlation coefficient]
Covariance: A measure of association between pairs of variables, similar to the correlation coefficient. Covariance values can be very large or small because they are not standardized like the correlation coefficient is. Most typically, covariance calculations are referred to in the literature as steps on the way to more sophisticated analyses or it is said in a loose fashion that two variables “co-vary” (deviations from the mean are systematic across pairs) without imputing precision to the expression. [See correlation coefficient]
Cramer’s V: Measure of association for contingency tables where the cell values are counts. The V-statistic takes values between 0.0 and 1.0. The statistical significance of V is normally calculated with the chi-square test. Usually performed as a hand calculation. {See templates – Choosing a statistical test.} [See chi-square, phi, association.]
Cronbach’s coefficient alpha (α): Generalized measure of association used where there are more than two variables. Coefficient alpha normally takes values between 0.0 and 1.0; a negative value can be computed when items are negatively related to each other, which signals a problem with the scale rather than a meaningful result. EX: consistency among four raters or a series of items on an inventory purporting to measure the same construct. [See Cohen’s kappa, Kendall rank coefficient, Pearson correlation coefficient, Pearson rank correlation coefficient, phi, point-biserial correlation coefficient]
Deciles, Quartiles, Quintiles: The tenth, twentieth, and thirtieth percentiles, etc., are the first, second, and third deciles, and so on. Quartiles are divisions of a sample into four categories with equal numbers of observations in each category. Quintiles are divisions into five groups of equal size. CAUTION: It is customary that the first percentile or decile is at the bottom of the distribution of values while the first quartile or quintile is at the top. The interquartile range is the difference between the highest and lowest values of the middle half of the scores arranged in numerical order. [See also percentile]
Degrees of freedom (df): Probability distributions for test statistics (t, F, χ²) are families of curves that have slightly different shapes, depending on sample size. In order to locate the correct distribution among these families, the degrees of freedom are determined from the research sample size.
Dependent variable (predicted variable): When it is assumed that the values of one variable are a function of others, the variable that “depends” on the values of other variables is called the dependent or predicted variable. A variable can be dependent on one or more variables. When graphed, the dependent or predicted variable is placed on the vertical axis. EX: The longevity of a restoration (dependent variable) is assumed to depend on properties of the materials used (independent variable). DMF (predicted variable) is assumed to be dependent on age (predictor variable). [See independent variable.]
Discrete distribution: Frequency or probability distribution where the values of the horizontal axis can only take on a small, finite number of values. Individual DMF, number of children, and televisions per household are discrete variables. Caution: Combinations of multiple samples of discrete variables can be thought of as continuous. EX: DMF in individuals must be in whole numbers, but the distribution of average DMF in different communities is continuous. [See continuous distribution, frequency distribution, probability distribution]
Factor analysis: Technique for identifying patterns intrinsic in a set of data. A computer program proposes “factors” as sets of variables that are related because respondents who score high on one variable tend to score high on related variables. Each factor is similar to a regression line, with variables coming close (“loading on”) various factors. The researcher must select factors based on statistical characteristics and meaningfulness and label them.
Factorial analysis of variance (factorial, or n-way ANOVA): Statistical test for differences among average scores in two or more groups classified across multiple classification dimensions. EX: Three curing times, six different operators, and two materials are used and average marginal integrity in each group is compared. The hypothesis in such studies concerns the classification dimensions. This would be a three-way or three-factorial ANOVA because there are three classification dimensions. A conclusion can be drawn regarding curing times, another conclusion can be drawn regarding operators, and a third about the materials. It is also possible to draw conclusions about interactions between the dimensions. Possibly, a long curing time is effective for one material and a short curing time is best for a different material. The results of a factorial analysis of variance are reported as F-values that are “looked up” in a table, using the appropriate number of degrees of freedom. Performed on statistical software, a statistician is often helpful. {See templates – Choosing a statistical test.} [See interaction effects.]
Frequency distribution: Ordered display of data categories or values along the horizontal axis and display of frequency or count accruing at each data point on the vertical axis. [See continuous distribution, discrete distribution, Matthew curve, Pareto distribution, probability distribution]
Hypothesis: Statement of theoretical principle about differences between groups or associations among them that is phrased in terms that are testable empirically. EX: The average healing time following surgical procedure A is less than following procedure B. “Dr. Smith is a better surgeon than Dr. Jones” is not a hypothesis because it is not testable as phrased. [See null hypothesis]
Independence among observations: No relationship exists between the nature and manner of observations in a sample. EX: Alternating between recording data from male and female subjects or filling remaining appointment time following a difficult case with an easy one are examples of NON-independence of observations. Randomization is the commonly accepted means for achieving independence of observations. Statistical tests using non-independent observations are subject to Type-I error – claiming a difference exists when in fact it doesn’t. [See random, sample]
Independence among variables: No relationship exists in nature among independent variables. If knowing the value of one variable does not allow prediction of another variable with anything better than chance, the variables are independent. It is assumed that the longevity of amalgam restorations is independent of phases of the moon, but tides are not. [See bias, random variation]
Independent variables (predictor variables): When it is assumed that the values of one variable are a function of others, the variables that can be manipulated to cause a change in the other are called independent or predictor variables. When graphed, the independent or predictor variable is placed on the horizontal axis. EX: The longevity of a restoration (dependent variable) is assumed to depend on properties of the materials used (independent variable). DMF (predicted variable) is assumed to be dependent on age (predictor variable). [See dependent variable]
Indexes: Data expressed as the ratio of numerical scores divided by another score that represents a comparison group. EX: The S&P Index, DAT and National Board Scores, some lab values. [See also category data, numerical data, ranks, transformations]
Interaction effects: When the average score at one level of a classification dimension differs depending on which category it falls in on another classification dimension, the variables are said to interact. EX: Patients may respond better to treatment A if they have no prior history of the disease. Or perhaps only women with no prior history respond better. In this example, the first part is a two-way interaction (treatment-by-history), the second part is a three-way interaction (treatment-by-history-by-sex). Graphically, interactions are displayed as lines that are not parallel (no uniform effect across another variable). The non-parallel lines do not have to cross for an interaction to exist. [See factorial analysis of variance.]
Intercept (constant) (of a regression equation): A base value added to every predicted value (y) when estimating from predictor values (x) in a regression situation. It is the value of y when x = 0.0. [See regression equation, scattergram, slope]
Kendall rank coefficient (τ): An alternative formula for calculating correlations between two variables that are ranks. [See Cohen’s kappa, Cronbach’s alpha, Pearson correlation coefficient, Pearson rank correlation coefficient, phi, point-biserial correlation coefficient]
Kruskal-Wallis test (H): A statistical test used to determine whether differences exist among ranks in several groups. The Kruskal-Wallis test returns a test statistic H. EX: Judges rank order preparations performed by practitioners using four procedures. The research question is whether the average ranks of preparations are different from each other. Can be performed as a hand calculation or, more conveniently, on statistical software. {See templates – Choosing a statistical test.}
Latent structure (λ): Potentially meaningful patterns in data sets that are revealed by statistical procedures. These are distinct from patterns imposed on data by observers, either a priori (before data are gathered) or a posteriori (after data are collected). Means and regression equations are the most familiar examples of summary patterns suggested by the data. Cluster analysis, factor analysis, canonical correlations, and multidimensional scaling are more complex examples.
Mann-Whitney U-test: A statistical test used to determine whether differences exist between the ranks in two groups. For sample sizes under 30, the U-value that is calculated by the test is compared with tables found in statistics books. When sample sizes are greater than 30, the distribution of U approximates the normal curve and U can be converted to a z-value. EX: Judges rank order preparations performed by practitioners using two procedures. The research question is whether the preparations using one procedure are ranked higher, on average, than preparations using the other technique. Can be performed as a hand calculation or, more conveniently, on statistical software. {See templates – Choosing a statistical test.}
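The hand calculation of U can be sketched with the counting form of the statistic: for every pairing of an observation from one group with an observation from the other, count which one is larger (ties count one half). The ranks below are hypothetical and `mann_whitney_u` is an assumed helper name; looking up the resulting U in a table is left to the reader.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): counts of pairwise 'wins' for each group, ties split."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical judge rankings (1 = lowest) of eight preparations, four per procedure
ranks_procedure_1 = [1, 2, 4, 6]
ranks_procedure_2 = [3, 5, 7, 8]
u_1, u_2 = mann_whitney_u(ranks_procedure_1, ranks_procedure_2)
u = min(u_1, u_2)   # the smaller value is the one usually referred to tables
print(u_1, u_2, u)
```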
Matthew curve: Positively skewed curve (most scores bunched at the low end, with a few very high values). Characteristic of income and performance on complex tasks where those who do well at time 1 have an advantage at time 2 (compound interest). So called after Matthew 13:12 in the Bible. [See also frequency distribution, Pareto distribution, skewed distribution]
Mean (arithmetic mean): Average value in a set of numbers – total of all values divided by the number of observations. [See also median, mode]
Measure of effect: Observed difference or association in a sample of data. EX: “The average difference between the test and control groups is 2 millimeters” or “the correlation between mothers’ ages and supernumerary teeth in children is .235” or “the slope of the regression line for years of practice and net practice income is 758.00” are all examples of measures of effect.
Median: The value in a distribution such that half of the scores are larger and half smaller than that value. EX: If 100 scores are available and arranged in numerical order, score number 50 may be 510 and score number 51 may be 514. The median would be 512 – halfway between number 50 and number 51, i.e., half the scores are higher and half lower than 512. The median is the 50th percentile score. [See also mean, mode, percentile]
Mode: Most common number in a set of numbers. [See also mean, median]
Multivariate analysis of variance (MANOVA): When there are multiple dependent variables (outcomes), a complication is created in statistical testing. Because the outcomes might be related to each other, separate tests on each outcome run the risk of measuring the same thing several times and thus thinking that several results were observed. EX: A group of patients is followed over time, with measures at one-month intervals (one study, not twelve). MANOVA tests are sophisticated and require both a computer and statistical consultation. Where they are reported, it is likely that the researcher is knowledgeable; where they should be used and are not, it is likely that the researcher is committing Type I errors. Performed on statistical software; a statistician is helpful.
Negative correlation (r-values between 0.0 and -1.0): Associations where large values on one variable are more likely to be paired with small values on another variable than would occur by chance. The less likely the pairings are to be independent of each other, the closer the correlation coefficient is to -1.0. EX: Reported frequency of brushing has about a -.30 correlation with pocket depths and water fluoridation has about a -.60 correlation with DMF. [See correlation coefficient]
Null hypothesis: Testable hypothesis intended to be disproved via research. Normally, the null hypothesis is the opposite of what the research intends to find. EX: The number of Streptococcus mutans in a sample of plaque will be fewer than 10 per cubic millimeter. If the number of S. mutans is 10 or more, the null hypothesis is rejected. NB: There is no statistical test for a hypothesis stating that two or more values are the same. Only differences can be tested. [See hypothesis]
NNT (Number Needed to Treat): A measure of the effect of a treatment; the number of patients who would need to receive the treatment for one additional patient to benefit, on average. EX: If a new drug reduces the probability of tooth loss due to periodontal disease by 15 percentage points (an absolute risk reduction of .15), NNT = 7 (1 / .15 = 6.7, rounded up).
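The NNT arithmetic can be sketched as the reciprocal of the difference in risk between untreated and treated patients, rounded up to a whole patient. The two risks below are hypothetical values chosen to give a difference of .15, and `number_needed_to_treat` is an assumed helper name.

```python
import math

def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / (control risk - treated risk), rounded up to a whole patient."""
    risk_reduction = control_risk - treated_risk
    if risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    # round() first so floating-point noise (e.g. 5.000000000000001) is not rounded up
    return math.ceil(round(1.0 / risk_reduction, 9))

# Hypothetical risks of tooth loss: 40% untreated, 25% treated
nnt = number_needed_to_treat(control_risk=0.40, treated_risk=0.25)
print(nnt)  # 7
```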
Numerical data: Numbers expressing the relative amounts of items, weights, lengths, scores, etc. Numerical data can be used in arithmetic calculations. Statistical tests based on numerical data with distributions that are reasonably known are called parametric tests. [See also category data, indexes, ranks, transformations]
Odds: Number of targeted events divided by the number of non-targeted events. EX: number of preterm deliveries divided by the number of normal term deliveries. If 3 of 10 women have preterm deliveries, the odds are .428 (3 / 7) while the proportion of preterm deliveries is .300 (3 / 10). [See also odds ratio, proportion]
Odds ratio: Ratio of two odds numbers. EX: If the odds of preterm delivery are .428 in a normal population and .630 among women who have periodontal disease, the odds ratio is 1.472 (.630 / .428). Odds ratios are often used to express risk factors. An odds ratio greater than 1 represents increased risk. [See also odds]
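The difference between odds and proportion, and the odds ratio built from two odds, can be checked with a few lines of arithmetic. The numbers come from the two entries above; the function names are assumptions made for this sketch.

```python
def odds(events, non_events):
    """Targeted events divided by non-targeted events."""
    return events / non_events

def proportion(events, total):
    """Targeted events divided by all cases."""
    return events / total

# 3 of 10 women have preterm deliveries, so 7 do not
preterm_odds = odds(3, 7)                  # about .43
preterm_proportion = proportion(3, 10)     # .30

# Odds ratio using the rounded odds quoted in the entry above
odds_ratio = 0.630 / 0.428

print(round(preterm_odds, 3), preterm_proportion, round(odds_ratio, 3))
```

Odds can also be recovered from a proportion p as p / (1 – p), which gives the same value as dividing events by non-events.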
One-tailed test (directional test): If there is only one way to disprove a null hypothesis, the statistical test is one-tailed. EX: “There is either no correlation or a negative correlation between age at first dental visit and DMF at age 30” is a one-tailed test – there is only one way to disprove it (finding a positive correlation). The operational significance of a one-tail test comes in determining the p-value from the test statistic. One-tail tests have lower p-values for a given test statistic than two-tail tests. [See null hypothesis, two-tailed test]
One-way analysis of variance (one-way ANOVA): Statistical test for differences among average scores in two or more groups classified across a single dimension. EX: Three curing times are used and average marginal integrity in each group is compared. The hypothesis in such studies concerns the classification dimension. The results of a one-way analysis of variance are reported as an F-value that is “looked up” in a table, using the appropriate number of degrees of freedom. If the analysis of variance is statistically significant, it can be concluded that the groups are different from each other, but it is not possible to determine (statistically) which groups differ from each other. That determination is made via separate post hoc calculations known as contrasts. Usually performed on statistical software. {See templates – Choosing a statistical test.}
Operating characteristic curve (OCC): A plot of the probability of accepting a hypothesis (on the vertical axis) as a function of the difference between the true value measured and the hypothesized value (on the horizontal axis). As the difference between the actual and hypothesized values increases, the probability of accepting the hypothesis decreases. As the efficiency of the test increases, the probability of accepting a false hypothesis decreases. Test efficiency is a function of choosing the best test, reducing required confidence, reducing variance, and increasing sample size. [See alpha, power curve, Type I error]
p-value (lower case): The probability that a series of identical replications of an empirical test would find effects of the same size or larger if the null hypothesis were true. (Often incorrectly interpreted as the probability that a tested hypothesis is “true.”) Conventional p-values include .001, .01, .05, and .10. P-values of ≤ .05 are commonly referred to as “statistically significant,” while p-values between .05 and .10 are called “marginally significant.” It is accepted practice to report actual p-values instead of pegging them to a traditional standard (EX: p = .007 rather than p < .01). Conventional p-values are often designated in tables as * ≤ .05, ** ≤ .01, and *** ≤ .001. [See alpha, null hypothesis, test statistic, Type I error]
Paired-comparison t-test: A variation on the traditional t-test where the two groups being compared are related to each other in some significant fashion. EX: Half of a group brushes with device A for a week and then with device B, while the other half performs the same procedures in reverse order. Plaque scores are measured at the end of each week. The variable analyzed is the difference between each subject’s plaque scores. (This is known as a cross-over design.) The reason for using the paired-comparison test is that significant amounts of variance (skill, dedication, complexity of dentition) are common across subjects. Without accounting for this common variance, the standard deviation in the test statistic would be too large, causing an increased Type II error. {Formula: t = X_d / [SD_d / SQRT(n)], where X_d is the average of the differences, SD_d is the standard deviation of the difference scores, and n is the number of pairs.} Performed as a hand calculation, with support from Excel. {See templates – Choosing a statistical test.}
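The paired-comparison calculation works on the per-subject differences: average difference divided by the standard deviation of the differences over the square root of the number of pairs, with the result referred to the t distribution on n – 1 degrees of freedom. The plaque scores below are hypothetical and `paired_t` is an assumed helper name.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """Return (t, df) for paired observations on the same subjects."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)        # average of the differences
    sd_d = statistics.stdev(diffs)         # sample SD of the difference scores
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical plaque scores for six subjects after a week with each device
device_a = [2.1, 1.8, 2.5, 2.0, 1.6, 2.2]
device_b = [1.7, 1.6, 2.1, 1.9, 1.5, 1.7]
t_value, df = paired_t(device_a, device_b)
print(round(t_value, 2), df)
```

Subjects differ a good deal from one another in this made-up data, but each subject's difference between devices is fairly consistent, which is what the paired test exploits.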
Pareto distribution: An exhaustive and mutually exclusive set of categories is displayed on the horizontal axis and the frequency in each category is displayed on the vertical axis. The categories are arranged on the horizontal axis from the one with the largest frequency on the left to the one with the smallest frequency on the right. The total of frequencies in all categories combined equals the total of all cases considered. [See also frequency distribution, Matthew curve, skewed distribution]
Partial correlation: The degree of association between two variables with one or more additional variables “held constant” statistically. {r_AB|C = (r_AB – r_AC * r_BC) / [SQRT(1 – r_AC²) * SQRT(1 – r_BC²)]} EX: SES (socioeconomic status), insurance coverage, and oral health are all related in America. It is possible to calculate the effect of insurance on oral health while partialling out or “holding constant” the effects of SES if the correlations among all three variables are known. [See correlation coefficient, spurious correlation]
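The partial correlation formula needs only the three pairwise correlations. In the sketch below A is insurance coverage, B is oral health, and C is SES; the three correlation values are hypothetical, and `partial_correlation` is an assumed helper name.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation between A and B with C held constant."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2)
    return numerator / denominator

# Hypothetical correlations: insurance-health .50, insurance-SES .60, health-SES .60
r_ab_given_c = partial_correlation(r_ab=0.50, r_ac=0.60, r_bc=0.60)
print(round(r_ab_given_c, 3))  # about 0.219
```

With these assumed values, more than half of the apparent insurance-health association disappears once SES is held constant.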
Pearson correlation coefficient: The formula normally used to calculate the correlation between two variables, each of which is measured on an interval scale. Correlation coefficients reported in the literature can be assumed to be computed using the Pearson formula unless otherwise stated or unless such a test would be illogical. EX: Age and periodontal pocket depth would be calculated with the Pearson formula, but age and PI would not. [See Cohen’s kappa, Cronbach’s alpha, Kendall rank coefficient, Pearson rank correlation coefficient, phi, point-biserial correlation coefficient]
Pearson rank correlation coefficient: Formula used to calculate correlation between two variables that are ranks; more commonly known as the Spearman rank correlation coefficient (ρ). EX: class rank of dental students and place on the PASS rankings for graduate programs. [See Cohen’s kappa, Cronbach’s alpha, Kendall rank coefficient, Pearson correlation coefficient, phi, point-biserial correlation coefficient]
Percentile: The value in a distribution below which a chosen percentage of the scores falls. EX: To find the 15th percentile of 50 scores, arrange them in numerical order and count up from the bottom to position 7.5 (15% of 50). The value halfway between score number 7 and score number 8 is the 15th percentile. Percentile is sometimes written %ile, but it is not the same as percentage. [See also decile, median]
Phi-coefficient (φ): Measure of association in a contingency table – rows and columns composed of category data. Normally, association in such tables is expressed in terms of chi-square (χ²), but chi-square values range between 0.0 and infinitely large positive numbers. Phi is a transformation of chi-square that compresses the values to the 0.0 to 1.0 range typical of measures of association. EX: participation in three treatment regimes and presence or absence of a disease. [See Cohen’s kappa, Cronbach’s alpha, Kendall rank coefficient, Pearson correlation coefficient, Pearson rank correlation coefficient, point-biserial correlation coefficient]
Point-biserial correlation coefficient: Formula used to calculate correlation between one variable that is measured on an interval scale and another variable that is dichotomous. EX: gender and DMF scores or, commonly, between proportion of respondents answering a question correctly on a test and their total correct score on the entire test. [See Cohen’s kappa, Cronbach’s alpha, Kendall rank coefficient, Pearson correlation coefficient, Pearson rank correlation coefficient, phi]
Population: Exhaustive set of observations from a “universe.” EX: All of the students who took the state boards at a given testing date and location constitute the entire domain of generalization for that testing occasion. There is no probability of error (other than computational error) when characterizing a domain based on a population sample. [See also sample]
Power: The probability of finding a true difference or association when one actually exists. Statistical power is defined as 1.0 – β. Beta is the probability of a Type II error, that is, of failing to find an actual difference or association, so power is the complement of the probability of missing a real effect. [See alpha, operating characteristic curve, power curve, Type II error]
Power curve: A plot of the probability of finding a true effect (on the vertical axis) as a function of the difference between the true value measured and the hypothesized value (on the horizontal axis). As the difference between the actual and hypothesized values increases, the probability of finding the effect increases. As the efficiency of the test increases, the probability of finding the effect increases. Test efficiency is a function of choosing the best test, reducing required confidence, reducing variance, and increasing sample size. By convention, a priori sample sizes are estimated to produce a power of .80, given α = .05 and measure of effect and variance estimated from the literature. [See alpha, operating characteristic curve, power, Type II error]
Prediction (from a regression equation): Estimate of dependent variable (y) using the regression equation. If the constant and slope, c and b, are known from the regression equation, the “predicted” value of y associated with a given x-value can be determined using the equation y = c + b * x. The predicted value of y is usually reported as a y with a ^ over it. All predicted values will be on the regression line. [See regression equation, residual, standard error]
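Prediction from y = c + b * x can be illustrated with a small least-squares fit. This is a sketch under stated assumptions: the data points are invented, and `fit_line` and `predict` are hypothetical helper names rather than functions from any statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least-squares constant (c) and slope (b) for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return c, b

def predict(c, b, x):
    """Predicted value of y (y-hat) for a given x."""
    return c + b * x

# Hypothetical observations (illustrative only)
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

c, b = fit_line(xs, ys)
y_hat = predict(c, b, 6)   # prediction for a new x-value
# Residuals: observed y minus the predicted y on the regression line
residuals = [y - predict(c, b, x) for x, y in zip(xs, ys)]
print(round(c, 2), round(b, 2), round(y_hat, 2))
```

Every predicted value lies on the fitted line; the residuals measure how far each observed point falls from it, and for a least-squares line they sum to zero.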
Prevalence: Proportion of cases that have a condition. [See accuracy, sensitivity, specificity]
Probability: Likelihood of an event occurring, normally calculated as a proportion of events from a domain that resembles the one about which a decision is to be made. EX: If the proportion of restorations of a given type that fail is three of nine, the probability that similar restorations will fail is .333. Probabilities can only take values between 0 and 1 (cannot be negative). Abbreviated p. [See also conditional probability, proportion]
Probability distribution: Ordered display of data categories and values along the horizontal axis and display of probability of each value on the vertical axis. The total area under the probability distribution curve always equals 1.0. [See continuous distribution, discrete distribution, frequency distribution, Matthew distribution, Pareto distribution]
Proportion: Number of actual occurrences of an event divided by number of occurrences possible for that event. EX: If nine restorations are placed and three fail, the proportion of failing restorations is .333. Proportions can only take values between 0 and 1 (cannot be negative). Abbreviated p (by convention q is the complementary proportion, 1 – p.) [See also probability]
Random: Random relationships cannot be predicted. EX: In assigning patients to treatments in a random fashion, there should be no knowledge of any characteristic of the treatment, the patient, or any previous patient or treatment that permits guessing the assignment with anything more than chance probability. [See also bias, independent sampling]
Random variation: Differences among data in which there is no pattern. There is nothing about the nature of the data that would permit predicting outcomes with anything more than chance accuracy. [See bias, variance]
Range: Numerical difference between the largest and the smallest values in a set of scores. This value is only occasionally reported in the literature, but may be useful when performing data analysis as a check against data entry errors. [See also standard deviation]
Ranks (ordinal data): Numbers indicating relative order within a set. Normally, 1 represents the best item and ties share the average rank of all tied items. EX: A=44, B=91, C=50, D=44, and E=72 would be ranked 1=B, 2=E, 3=C, 4.5=A and D, assuming that high numbers are preferred. Ranks cannot be used in arithmetic calculations such as addition or division. [See also category data, indexes, numerical data, transformations]
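As a minimal sketch, the tie rule can be written out in Python using the five scores from the example. The function name average_ranks and its high_is_best flag are hypothetical, introduced only for illustration.

```python
def average_ranks(scores, high_is_best=True):
    """Rank items so that 1 is best; tied items share the average of their ranks."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=high_is_best)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of items tied with the item at position i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2   # average of rank positions i+1 .. j+1
        for name, _ in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks

ranks = average_ranks({"A": 44, "B": 91, "C": 50, "D": 44, "E": 72})
```

Here A and D occupy positions 4 and 5, so each receives the average rank of 4.5.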
Regression equation: Description of the “best fit” summary of the relationship between pairs (or more complex sets) of data. Like the mean, the regression line does not describe actual data (except in trivial cases); the regression equation is a convenient summary. It is constructed as the line that minimizes the sum of squared differences between each point and the regression line in a scattergram. In the most common, straight line or linear case, the point estimate of the mean is replaced by a “running” point with the formula y = c + b * x, where y is the dependent or predicted value in each pair of observations and x is the independent or predictor value in each pair. In this equation, c is called the intercept and b is the slope of the regression line. [See intercept, prediction, residual, slope, standard error]
Reliability (consistency, precision): Consistency among observations. Normally measured as a correlation coefficient among variables. Reliability is a function of the number of raters or ratings, so reliability can be improved by increasing the number of observations. Technically, reliability is a characteristic of the judgment made from data collection and not of the data collection procedure. [See correlation coefficient]
Residual: The difference between the predicted value of y from the regression equation and its actual value. (Residuals are used in constructing the regression line. The regression line is the line that minimizes the sum of the squared residuals.) [See regression equation, prediction, standard error]
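A short Python sketch may help connect the regression equation, prediction, and residuals. The data are made up, fit_line is a hypothetical helper, and the residual is taken here as actual minus predicted (the sign convention is an assumption; the glossary entry does not fix it).

```python
def fit_line(xs, ys):
    """Least-squares intercept c and slope b for y = c + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    c = mean_y - b * mean_x    # intercept
    return c, b

# Made-up paired observations
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

c, b = fit_line(xs, ys)
predicted = [c + b * x for x in xs]                         # all lie on the line
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual minus predicted
sum_squared_residuals = sum(r ** 2 for r in residuals)      # quantity the line minimizes
```

For a least-squares line the residuals sum to zero (up to rounding), and any other choice of c and b gives a larger sum of squared residuals.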
Sample: Non-exhaustive set of observations from a “universe” – all the possible observations that could be made considering time, observer and method, representative objects, etc. EX: The observed improvement in six patients following periodontal surgery is a sample of the effects of all such surgeries. Inferences or generalizations are conclusions about cases that resemble those sampled but which were not actually observed. There is always a probability that descriptions from the sample may not match descriptions from the domain about which one makes a generalization. [See also population, random]
Scattergram: Graphic depiction of association between matched pairs of values. Each pair of observations is represented as a point where the values on the two dimensions intersect. The horizontal axis is referred to as the x-axis, and the vertical axis is called the y-axis. Scattergrams are used to depict both correlation and regression. For correlation, it does not matter which variable is labeled x and which is labeled y. In regression, it does matter. The dependent or predicted variable is always labeled y, while the independent or predictor variable is always labeled x. [See correlation coefficient, regression equation]
Sensitivity (of a test): The proportion of cases with a condition that are correctly classified by the test as having the condition. Number of true positive cases divided by the number of true positive and false negative cases. Tests with high sensitivity are likely to find a condition if it exists. [See accuracy, prevalence, specificity]
Skewed distribution: Asymmetry in a distribution of scores. In positively skewed distributions, the tail (the few extreme values) is on the right toward high or positive values. In a negatively skewed distribution, the tail points toward the negative values. [See also continuous distribution, discrete distribution, frequency distribution, Matthew distribution, Pareto distribution]
Slope (of a regression equation): The size of the increment in every predicted value (y) associated with a unit change in the predictor value (x) in a regression situation. A slope of 0.0 is associated with a flat (horizontal) line. As the absolute value of the slope increases, the steepness of the regression line increases. Positive b-values (slopes) appear graphically as a line rising from left to right; negative slopes appear graphically as a line falling from left to right. [See intercept, regression equation, scattergram]
Specificity (of a test): The proportion of cases that do not have a condition that are correctly classified by the test as not having it. Number of true negative cases divided by the number of false positive and true negative cases. Tests with high specificity are unlikely to report a condition as existing when it does not. [See accuracy, prevalence, sensitivity]
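The definitions of sensitivity, specificity, prevalence, and accuracy can be sketched together from the four cells of a test-by-condition table. The counts below are invented for illustration, and diagnostic_summary is a hypothetical function name.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Summary proportions from counts of true/false positives and negatives."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # cases with the condition that test positive
        "specificity": tn / (tn + fp),    # cases without the condition that test negative
        "prevalence": (tp + fn) / total,  # cases that have the condition
        "accuracy": (tp + tn) / total,    # cases classified correctly
    }

# Invented counts: 50 patients with the condition, 150 without
summary = diagnostic_summary(tp=40, fp=30, fn=10, tn=120)
```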
Spurious correlation: Observed association between two variables that could be attributed to a common, but unmeasured third variable. The adjective “spurious” correctly refers to the inference made from the correlation, not the correlation itself. EX: If A and B are correlated, it is not possible to conclude (on that fact alone) that A causes B. It is also possible that B causes A or that a C exists that influences both A and B in a systematic fashion. [See correlation coefficient, partial correlation]
Square distribution: Distribution where all values have an equal frequency or probability. “Flat” distributions are between normal and square distributions. [See also continuous distribution, discrete distribution, frequency distribution, Matthew distribution, Pareto distribution]
Standard error (in regression): A measure of dispersion in regression estimates analogous to the standard deviation in calculating the mean. The standard error in regression can be used to estimate the wobble in the estimates of (1) the intercept (c), (2) the slope (b), (3) the mean of all future estimated values of y at a given x-value, or (4) any individual future estimated value of y at a given x-value. [See regression equation]
Standard deviation: A calculated value describing the spread in a set of scores. Larger standard deviations result from more scattered sets, with a few extreme values contributing more to the standard deviation than many values near the mean (because differences from the mean are squared in the formula). The standard deviation is the square root of the average squared deviation of each value from the mean. It is abbreviated SD. The square of the standard deviation is known as the variance. [See also range, variance]
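The calculation can be sketched as follows, taking the definition literally (dividing by the number of scores). This is an assumption about the intended formula; many texts and software packages divide by n – 1 when the scores are a sample. The scores are made up and variance_and_sd is a hypothetical name.

```python
def variance_and_sd(scores):
    """Average squared deviation from the mean, and its square root."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / len(scores)  # divides by n, not n - 1
    return variance, variance ** 0.5

variance, sd = variance_and_sd([2, 4, 4, 4, 5, 5, 7, 9])
```

In this set the single score of 9 contributes 16 of the 32 total squared units, which illustrates how extreme values dominate the result.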
Standard error (SE): A theoretical value expressing the measure of effect expected by chance under assumed circumstances. The formula for calculating standard error is determined by the test chosen. The actual value is calculated based on data observed when sampling during research. The standard error is affected by sample size and standard deviation in the sample. Larger sample sizes produce smaller standard errors (diminishingly so) and smaller standard deviations produce smaller standard errors. [See measure of effect, test statistic]
Standardized score (normalized score): Transformation on all scores in a set so they are expressed in units representative of the set from which they are drawn. {z_i = (x_i – X) / SD, where X is the mean of the set and SD is its standard deviation.} Used when comparing or combining scores from sources that have different averages or variances. EX: Different examiners who are not calibrated evaluate different patients in a clinical trial.
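A minimal sketch of the transformation, assuming the set's own mean and (population) standard deviation are used; the scores and the name standardize are hypothetical.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Express each score as its distance from the set mean in SD units."""
    m = mean(scores)
    sd = pstdev(scores)   # population SD of the set itself
    return [(x - m) / sd for x in scores]

# Made-up scores from one examiner
examiner_a = [2, 4, 4, 4, 5, 5, 7, 9]
z_a = standardize(examiner_a)
```

After standardizing, every set has a mean of 0 and a standard deviation of 1, which is what makes scores from uncalibrated examiners comparable.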
Stratified sample: Structuring of a sample to include predetermined numbers of items with various characteristics. EX: Four or five times the number of men may be sampled in a study of TMD because the condition is less common among men than women. [See also sample]
t-test (z-test) (lower case): Statistical test for differences between an observed average value and a predetermined standard or between two observed average values. When the sample size is smaller than 30, the test statistic is known as a t-value and it is “looked up” in a statistical table to determine the p-value using the appropriate number of degrees of freedom based on sample size. When the sample size is larger than 30, the test statistic is called a z-value and degrees of freedom are not considered. EX: The installation directions for a piece of equipment say that its running temperature should be 110°; is an average observed temperature over 40 trials of 112° cause for alarm? The average completion time for a restoration using a new material on six restorations is 20 minutes. Is this faster than the average of 24 minutes on ten trials with the old method? Can be performed as a hand calculation, with support from Excel. {See templates – Choosing a statistical test, t-test.}
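The running-temperature example can be sketched as a one-sample z-test, since the sample size (40) is larger than 30. The example does not give a standard deviation, so the value of 5 degrees used here is purely an assumption, as is the function name one_sample_z.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(observed_mean, standard, sd, n):
    """z-value and two-tailed p-value for an observed mean against a standard."""
    se = sd / sqrt(n)                      # standard error of the mean
    z = (observed_mean - standard) / se    # measure of effect divided by standard error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# 112 degrees observed over 40 trials against a standard of 110; sd = 5 is assumed
z, p = one_sample_z(observed_mean=112, standard=110, sd=5, n=40)
```

With a sample smaller than 30, the same ratio would be treated as a t-value and looked up with n – 1 degrees of freedom instead of using the normal curve.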
Test for proportions: The name of a statistical test used to determine whether an observed proportion is different from a predetermined proportion or whether two proportions are different from each other. For sample sizes above 30, the test produces a z-value (test statistic) that is “looked up” in a table to determine the probability (p-value) of the difference in proportions being attributable to chance. EX: The proportion of women with TMD problems is greater than the proportion of women in the population generally. Usually performed as a hand calculation. {See Templates – choosing a statistical test, test of proportions.}
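As a sketch of the one-proportion case, the hand calculation can be written in Python. It uses the common large-sample form in which the standard error is based on the predetermined proportion; the counts (60 women among 100 TMD patients, against 50% in the population) are invented, and proportion_z is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """z-value and two-tailed p-value for an observed proportion against p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)   # standard error under the predetermined proportion
    z = (p_hat - p0) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Invented data: 60 of 100 TMD patients are women; 50% of the population are women
z, p = proportion_z(successes=60, n=100, p0=0.5)
```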
Test of significance for correlations: When the correlation coefficient is used as a measure of association between a pair of variables, the null hypothesis can be tested that there is no association between the variables (r = 0.0). {Formula: t = [r * SQRT(n – 2)] / SQRT(1 – r^2).} Performed as a hand calculation following Excel calculation of r. {See templates – Choosing a statistical test.} [See correlation coefficient.]
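The formula translates directly into Python. The values r = .50 and n = 27 are made up, and correlation_t is a hypothetical name; the resulting t-value would still be looked up in a t table with n – 2 degrees of freedom.

```python
from math import sqrt

def correlation_t(r, n):
    """t-value for testing the null hypothesis r = 0.0, with n - 2 degrees of freedom."""
    t = (r * sqrt(n - 2)) / sqrt(1 - r ** 2)
    degrees_of_freedom = n - 2
    return t, degrees_of_freedom

# Made-up example: r = .50 observed on 27 pairs
t, df = correlation_t(r=0.5, n=27)
```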
Test of significance for parameters of a regression equation: A regression equation uses known values of x to predict a value of y, given a calculated constant (intercept) and one or more coefficients (slopes). It is possible to determine whether the constants and the coefficients are different from predetermined values – usually 0.0 for the constant and 1.0 for the slopes. This test is performed using the difference between the observed and the hypothesized values as the measure of effect and dividing by the appropriate standard error. Performed using statistical software. {See templates – Choosing a statistical test.} [See regression equation]
Test for significance of a regression equation: A regression equation uses known values of x to predict a value of y, given a calculated constant (intercept) and one or more coefficients (slopes). It is possible to test for the significance of the overall equation – does it provide any more predictive advantage than would be gained by chance? The test returns an F-value that is used, with appropriate degrees of freedom, to determine a p-value. This is a test of the overall usefulness of the equation, not of its individual parameters. Performed using statistical software. {See templates – Choosing a statistical test.} [See regression equation]
Test statistic: Value calculated from research data in order to test a hypothesis. The test is selected based on characteristics of the hypothesis and the data. Typically, a test statistic is a ratio of the observed measure of effect and the standard error (a chance measure of expected effect). The test statistic is “looked up” in an appropriate table to convert it to a p-value. The tables for different test statistics (z, t, F, χ², U, etc.) are all entered with the test statistic and return a p-value. For ratio statistics such as z, t, and F, absolute values under 1.0 are not significant. [See measure of effect, p-value, standard error]
Two-tailed test (non-directional test): If there are two ways to disprove a null hypothesis, the statistical test is two-tailed. EX: “There is no correlation between age at first dental visit and DMF at age 30” calls for a two-tailed test – there are two ways to disprove it (finding a positive correlation or finding a negative correlation). The operational significance of a two-tailed test comes in determining the p-value from the test statistic. Two-tailed tests have higher p-values for a given test statistic than do one-tailed tests. [See one-tailed test, null hypothesis]
Transformation: Application of the same mathematical operation to every number in a set, used to change the relative distances among numbers in the set. Normally this is done when a set of numbers has undesirable characteristics such as extreme skew, large tails, or is otherwise not a normal curve. Common transformations include logs, square roots, and arcsines. [See also category data, indexes, numerical data, ranks]
Type I error: Claiming a statistically significant difference or association exists when it does not. This is caused by violating the assumptions of a statistical test, using the wrong test, or by random error. EX: An experiment is performed showing that one method of obturating canals is better than another. If the claim is made that the method is superior, there is still a possibility of being wrong, making a Type I error. A priori, Type I error is set by picking an alpha-level for the test. [See Alpha, Operating characteristic curve, power, Type II error]
Type II error (β): Failing to find a true and existing difference or association based on a statistical test. This is caused by violating the assumptions of a statistical test, using the wrong test, or by random error. EX: An experiment is performed on two obturation techniques, but it is inconclusive. The possibility that no difference will be found when one really exists is called the Type II error. A priori, Type II error is controlled by choosing a sample size that produces the desired power at the chosen alpha level. The probability of a Type II error is abbreviated β. [See alpha, operating characteristic curve, power, power curve, Type I error]
Validity (accuracy, lack of bias): Extent to which available information is useful for predicting other, unmeasured information of interest. Decisions made from a sample of observed data are valid to the extent that the same judgment would be made if nearly exhaustive sets of data were available. Concurrent validity refers to using one set of data to predict what is happening at the same time but is unobserved. Predictive validity refers to using one set of data to predict what will happen in the future. Lack of validity is represented by coefficients near 0.0; high validity is in the .70 to .90 range (coefficients above that are normally trivial, such as heart function and death). Validity is independent of number of observations. Increasing sample size does not increase validity. [See correlation coefficient]
Variance: The average of squared deviations of scores from the mean; a theoretical measure of spread among scores in a set. Variance is in squared units, not the same units used to make observations. Typically, variance is a theoretical concept or a general way of talking about spread of scores and is abbreviated σ², sigma squared. [See also standard deviation]
Weighted mean (weighted average): Average value adjusted for importance of some or all of the items. EX: Annual net income of dentists is usually expressed as the weighted mean of the incomes of general practitioners and specialists where weighting is the number of each type of practitioner in the population (not the number of practitioners of each type in the sample). This technique is used when reporting the results of stratified sampling. [See also mean, stratified sample]
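A brief sketch of the income example follows. The income figures and practitioner counts are invented, and weighted_mean is a hypothetical helper; the point is only that the weights come from the population, not from the sample.

```python
def weighted_mean(values, weights):
    """Average of values, each counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Invented figures: mean net income (in thousands) found in each stratum of the sample
stratum_means = [180, 300]         # general practitioners, specialists
population_counts = [8000, 2000]   # number of each type in the population

overall_income = weighted_mean(stratum_means, population_counts)
unweighted = sum(stratum_means) / len(stratum_means)
```

The unweighted average (240) overstates the typical income because it treats the smaller specialist group as if it were as numerous as the general practitioners.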
Coefficient of determination (R²). The proportion of variance measured is abbreviated R². In any set of measures, there are many factors that combine in various patterns to produce some high values and some low ones. The differences among the measures can be quantified. The observed differences in a set of measures are normally calculated as the standard deviation. The squared standard deviation is called the variance. (As a convention, when scientists talk about spread in general or as a theoretical concept, they refer to it as variance.) The total variance measured in any situation is composed of the measured variance -- so much of it due to treatment effect, so much due to demographic factors, etc. -- plus the unmeasured variance (called unexplained variance or error variance). A ratio can be made of the measured variance divided by the total variance (measured plus error variance). This ratio is the coefficient of determination or R². The coefficient of determination is usually calculated using multiple regression. This is a statistical technique that tests for the predictive power of sets of variables (known as models) on an outcome variable. In the typical case, all variables in the regression model are assumed to be measured on an interval scale. Some alternatives include nonparametric regression and logistic regression. When the predictor variables, in combination, very accurately predict the outcome variable, the R² value approaches 1.0. When there is no predictive power in the set of variables selected, R² is near 0.0 (negative values are not possible). The R² distribution is not normal, and a coefficient of determination of .40 is not twice as strong as one of .20. There is a statistical test, which yields an F-value that can be converted to p, for whether the R² produced by the set of predictors is significantly different from 0.0.
The formula that describes the change in outcome variables resulting from changes in the predictor variables is called a regression equation. Typically, regression analysis and its resulting coefficients of determination are performed on computers using statistical packages.
Let's assume that a researcher investigates the analgesic effect of a drug by measuring reported post-operative pain. Dosage is entered into a regression analysis as a predictor of reported pain. The regression equation shows a line with a negative slope -- the larger the dose, the lower the pain. If the R²-value is .30, this means that 30% of the differences in reported pain are related to differences in dosages. This might be highly significant or insignificant, depending on the sample size. If the regression equation had been calculated using time since surgery as the predictor, the R² might have been higher -- the greater the interval, the less pain. Age, history of previous visits, trauma involved in the surgery, and many other predictors could be entered into the regression analysis. Together, their predictive power, R², is less than the sum of each of them independently because the predictors share some variation with each other. Typically, R² reaches a value in the range of .40 to .70 with a combination of three to five well-chosen predictors. The addition of other predictors adds little practical significance. R²-values greater than .80 are suspect. Either the predictors are really alternative measures of the outcome variable or the number of predictors is approaching the number of subjects (a statistical artifact).
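The ratio of measured variance to total variance can be sketched for the single-predictor case. The dose and pain values are invented, r_squared is a hypothetical name, and a real analysis with several predictors would normally be run in a statistical package.

```python
def r_squared(xs, ys):
    """Measured (explained) variation divided by total variation, one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept for y = c + b * x
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - b * mean_x
    total = sum((y - mean_y) ** 2 for y in ys)                      # total variation
    error = sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys))     # unexplained variation
    return (total - error) / total

# Invented data: larger doses go with lower reported pain
dose = [1, 2, 3, 4, 5, 6]
pain = [8, 6, 7, 4, 5, 2]
r2 = r_squared(dose, pain)
```

Because the sums of squares in the numerator and denominator share the same divisor, the ratio of variations equals the ratio of variances.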
Meta-analysis. Meta-analysis is a quantitative method to support extraction of conclusions from many research studies published in the same area. A variety of approaches is available, but the following steps are customary. (a) Define the criteria that must be met by previously published studies to be included in the sample. (b) Determine a common measure of effect size. This must be a quantitative outcome used in all selected studies. Examples include standardized differences between means or correlation coefficients. (c) Code all qualifying studies, including effect size and classification factors. Classification factors would be major types of treatments, age groups, settings, or any other differences commonly occurring in the literature. (d) Calculate average effect sizes. (e) Test for differences in effect size related to various classification factors. One of the most critical issues in meta-analysis is defining criteria for inclusion of studies. It is sometimes argued that meta-analysis overestimates effect sizes because non-significant studies tend not to be submitted or accepted for publication. This is known as the bottom drawer effect (also called the file drawer problem).
There are few meta-analyses performed in dentistry, although they are common in medicine and very common in other disciplines.
|