What is Which Stats Test? Question 1 - What is the purpose of your analysis? Broadly, it is either to look for differences between sets of data or to look for an association between variables. When choosing a test, it is therefore important to consider how many variables you wish to analyze. One set of tests is used on single variables (often referred to as descriptive statistics), a second set is used to analyze the relationship between two variables, and a third set is used to model multivariable relationships.
For example, in the comparison of two antihypertensive drugs, the endpoint can be the change in blood pressure (BP) in the two treatment groups. The change in BP is a continuous endpoint. It is also necessary to distinguish whether a continuous endpoint is approximately normally distributed or not. If, however, one only considers whether the diastolic BP falls below 90 mm Hg or not, the endpoint is categorical.
It is binary, as there are only two possibilities. A statistical test is used to compare the results for the endpoint under different test conditions, such as treatments; often there are two therapies to compare. If results can be obtained for each patient under all experimental conditions, the study design is paired (dependent). For example, two times of measurement may be compared, or the two groups may be matched with respect to other characteristics.
Typical examples of pairs are studies performed on one eye or on one arm of the same person. Typical paired designs include comparisons before and after treatment, and matched designs, in which each subject in one group is matched to a particular subject in the other group; matching necessitates that the two groups be equal in size.
The data are then no longer independent and should be treated as if they were paired observations from one group. With an unpaired or independent study design, results for each patient are only available under a single set of conditions. The results of two or more groups are then compared. The group sizes can be either equal or different. The most important statistical tests are listed in Table 1.
For the comparison of a binary endpoint in two paired groups, one has to perform the McNemar test; the procedure is similar for the group comparison of categorical endpoints with multiple values [Table 1]. Another example for the McNemar test is drug dose versus insomnia, recorded in the same patients. Figure 2 shows a decision algorithm for test selection. The so-called parametric tests can be used if the endpoint is normally distributed. Where the subjects in both groups are independent of each other (the persons in the first group are different from those in the second group), and the parameter is normally distributed and continuous, the unpaired t-test is used.
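The unpaired comparison just described can be sketched in Python with `scipy.stats` (the BP changes below are invented for illustration; Welch's variant is chosen so equal variances need not be assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical systolic BP changes (mm Hg) in two independent treatment groups.
drug_a = np.array([-12.0, -8.5, -15.0, -9.0, -11.5, -7.0, -13.0, -10.0])
drug_b = np.array([-6.0, -4.5, -9.0, -3.0, -7.5, -5.0, -8.0, -2.5])

# Unpaired (independent-samples) t-test; equal_var=False gives Welch's
# variant, which does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(drug_a, drug_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here indicates that the mean BP reduction differs between the two drugs.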
If a comparison is to be made of a normally distributed continuous parameter in more than two independent (unpaired) groups, analysis of variance (ANOVA) can be used. One example would be a study with three or more treatment arms. ANOVA is a generalization of the unpaired t-test. ANOVA only informs you whether the groups differ, but does not say which groups differ.
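A minimal Python sketch of this omnibus comparison (the numbers are invented; note that the F-test only signals that some group differs, not which one):

```python
from scipy import stats

# Hypothetical BP reductions (mm Hg) in three treatment arms.
arm1 = [10, 12, 9, 11, 13, 10]
arm2 = [14, 16, 15, 13, 17, 15]
arm3 = [10, 11, 12, 9, 10, 11]

# One-way ANOVA: tests whether at least one group mean differs.
f_stat, p_value = stats.f_oneway(arm1, arm2, arm3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```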
This requires methods of multiple testing. The paired t-test is used for normally distributed continuous parameters in two paired groups. If a normally distributed continuous parameter is compared in more than two paired groups, methods based on ANOVA are also suitable; the factor then describes the paired groups, e.g., the repeated times of measurement. If the parameter of interest is not normally distributed, but at least ordinally scaled, nonparametric statistical tests are used.
This necessitates putting the values in order of size and giving them a running number (rank). The test variable is then calculated from these rank numbers. If the necessary preconditions are fulfilled, parametric tests are more powerful than nonparametric tests. However, the power of parametric tests may drop drastically if those conditions are not fulfilled.
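The ranking step can be seen directly with `scipy.stats.rankdata` (a Python sketch with invented numbers); an extreme value contributes only its rank, not its magnitude:

```python
import numpy as np
from scipy import stats

values = np.array([3.2, 150.0, 4.1, 2.8, 5.0])

# Nonparametric tests replace each value by its running number in order
# of size; the outlier 150.0 simply becomes the top rank.
ranks = stats.rankdata(values)
print(ranks)  # [2. 5. 3. 1. 4.]
```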
The Mann–Whitney U test (also known as the Wilcoxon rank sum test) can be used for the comparison of a non-normally distributed, but at least ordinally scaled, parameter in two unpaired samples. The Wilcoxon signed rank test can be used for the comparison of two paired samples of non-normally distributed parameters, but on a scale that is at least ordinal. You should definitely select a nonparametric test in three situations. First, the outcome is a rank or a score and the population is clearly not Gaussian.
Second, some values are "off the scale," that is, too high or too low to measure. Even if the population is Gaussian, it is impossible to analyze such data with a parametric test, since you don't know all of the values. Using a nonparametric test with these data is simple: assign values too low to measure an arbitrary very low value, and assign values too high to measure an arbitrary very high value.
Then perform a nonparametric test. Since the nonparametric test only knows about the relative ranks of the values, it won't matter that you didn't know all the values exactly.
Third, the data are measurements, and you are sure that the population is not distributed in a Gaussian manner. If the data are not sampled from a Gaussian distribution, consider whether you can transform the values to make the distribution Gaussian. For example, you might take the logarithm or reciprocal of all values.
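A quick Python sketch (simulated, right-skewed data; `scipy.stats.shapiro` is one common normality test) shows how a log transform can restore approximate normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated right-skewed (log-normal) measurements.
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Shapiro-Wilk rejects normality for the raw values but not for their logs.
_, p_raw = stats.shapiro(x)
_, p_log = stats.shapiro(np.log(x))
print(f"raw p = {p_raw:.2g}, log p = {p_log:.2g}")
```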
There are often biological or chemical reasons, as well as statistical ones, for performing a particular transform. Consider these points: If you collect many data points (over a hundred or so), you can look at the distribution of the data, and it will be fairly obvious whether the distribution is approximately bell shaped. A formal statistical test (the Kolmogorov–Smirnov test, not explained in this book) can be used to test whether the distribution of the data differs significantly from a Gaussian distribution.
With few data points, it is difficult to tell whether the data are Gaussian by inspection, and the formal test has little power to discriminate between Gaussian and non-Gaussian distributions. You should look at previous data as well. Remember, what matters is the distribution of the overall population, not the distribution of your sample. In deciding whether a population is Gaussian, look at all available data, not just data in the current experiment.
Consider the source of scatter. When the scatter comes from the sum of numerous sources (with no one source contributing most of the scatter), you expect to find a roughly Gaussian distribution. When in doubt, some people choose a parametric test (because they aren't sure the Gaussian assumption is violated), and others choose a nonparametric test (because they aren't sure the Gaussian assumption is met).
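The formal test mentioned earlier can be run in Python as follows (simulated data; note that plugging the sample mean and SD into the reference distribution makes the Kolmogorov–Smirnov p-value only approximate, which is why this is a sketch rather than a rigorous procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=15, size=150)

# Kolmogorov-Smirnov test against a normal distribution with the
# sample's own mean and SD (approximate; an exact version would
# need the Lilliefors correction).
stat, p = stats.kstest(x, "norm", args=(x.mean(), x.std()))
print(f"D = {stat:.3f}, p = {p:.3f}")
```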
There are four cases to think about. Large sample: what happens when you use a parametric test with data from a non-Gaussian population? The central limit theorem (discussed in Chapter 5) ensures that parametric tests work well with large samples even if the population is non-Gaussian.
In other words, parametric tests are robust to deviations from Gaussian distributions, so long as the samples are large. The snag is that it is impossible to say how large is large enough, as it depends on the nature of the particular non-Gaussian distribution.
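A small simulation (Python, with invented parameters) illustrates this robustness: with samples of 50 drawn from a skewed exponential population with equal means, the t-test's type I error rate stays close to the nominal 5%:

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(1)

# Repeatedly t-test two samples from the same skewed population;
# by the central limit theorem the false-positive rate should be near 5%.
n, reps, alpha = 50, 2000, 0.05
rejections = 0
for _ in range(reps):
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print(f"empirical type I error: {rejections / reps:.3f}")
```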
Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group. Large sample: what happens when you use a nonparametric test with data from a Gaussian population? With large samples, nonparametric tests are only slightly less powerful than parametric tests. Factor analysis Factor analysis is a form of exploratory multivariate analysis that is used either to reduce the number of variables in a model or to detect relationships among variables.
All variables involved in the factor analysis need to be interval and are assumed to be normally distributed. The goal of the analysis is to try to identify factors which underlie the variables. There may be fewer factors than variables, but there may not be more factors than variables. We will include subcommands for varimax rotation and a plot of the eigenvalues. We will use a principal components extraction and will retain two factors.
Using these options will make our results compatible with those from SAS and Stata; they are not necessarily the options that you will want to use. Communality (which is the opposite of uniqueness) is the proportion of a variable's variance that is accounted for by the retained factors.
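Because the text assumes SPSS output, here is a self-contained numerical sketch in Python/NumPy of the same ideas (principal-components extraction, two retained factors, communalities); the five simulated test scores share one underlying ability factor, so they should all load heavily on the first factor:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated scores on five tests driven by a single underlying factor.
n = 300
ability = rng.normal(size=n)
scores = np.column_stack(
    [ability + rng.normal(scale=0.5, size=n) for _ in range(5)]
)

# Principal-components extraction: eigendecomposition of the correlation
# matrix; loadings are eigenvectors scaled by sqrt(eigenvalues).
corr = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]          # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])   # retain two factors

# Communality of each variable: the sum of its squared loadings.
communality = (loadings ** 2).sum(axis=1)
print(np.round(communality, 2))
```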
The scree plot may be useful in determining how many factors to retain. From the component matrix table, we can see that all five of the test scores load onto the first factor, while none loads heavily on the second factor.
The purpose of rotating the factors is to get the variables to load either very high or very low on each factor.
In this example, because all of the variables loaded onto factor 1 and not on factor 2, the rotation did not aid in the interpretation. Instead, it made the results even more difficult to interpret.
About the hsb data file Most of the examples on this page use a data file called hsb2 (High School and Beyond).
One sample t-test A one sample t-test allows us to test whether a sample mean of a normally distributed interval variable significantly differs from a hypothesized value.
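A minimal Python illustration with `scipy.stats.ttest_1samp` (the scores and the hypothesized mean of 50 are invented, not taken from hsb2):

```python
import numpy as np
from scipy import stats

# Hypothetical scores; test whether the population mean differs from 50.
scores = np.array([52, 57, 49, 61, 54, 58, 50, 63, 55, 59])
t_stat, p_value = stats.ttest_1samp(scores, popmean=50)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```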
One sample median test A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value. Binomial test A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value. Chi-square goodness of fit A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions.
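The binomial test and the chi-square goodness-of-fit test can likewise be sketched in Python (the counts are invented; `binomtest` requires SciPy 1.7 or later):

```python
from scipy import stats

# Binomial test: do 36 successes in 50 trials differ from p = 0.5?
res = stats.binomtest(36, n=50, p=0.5)
print(f"binomial p = {res.pvalue:.4f}")

# Chi-square goodness of fit: do observed counts match equal proportions?
observed = [35, 50, 15]
chi2, p = stats.chisquare(observed)  # expected defaults to a uniform split
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```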
Two independent samples t-test An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups.
See also SPSS Learning Module: An overview of statistical tests in SPSS. Wilcoxon-Mann-Whitney test The Wilcoxon-Mann-Whitney test is a non-parametric analog to the independent samples t-test and can be used when you do not assume that the dependent variable is a normally distributed interval variable (you only assume that the variable is at least ordinal).
Chi-square test A chi-square test is used when you want to see if there is a relationship between two categorical variables. One-way ANOVA A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable, and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable.
The command for this test would be: oneway write by prog. Paired t-test A paired samples t-test is used when you have two related observations (i.e., two observations per subject) and you want to see if the means of the two variables differ from one another. Wilcoxon signed rank sum test The Wilcoxon signed rank sum test is the non-parametric version of a paired samples t-test. One-way repeated measures ANOVA You would perform a one-way repeated measures analysis of variance if you had one categorical independent variable and a normally distributed interval dependent variable that was repeated at least twice for each subject.
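In Python, the paired t-test and its nonparametric counterpart look like this (hypothetical before/after values for the same ten subjects):

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same 10 subjects.
before = np.array([3, 5, 4, 6, 2, 5, 4, 7, 3, 5])
after  = np.array([5, 7, 6, 8, 4, 6, 6, 9, 4, 8])

# Paired t-test on the within-subject differences.
t_stat, p_t = stats.ttest_rel(before, after)

# Wilcoxon signed rank test: the nonparametric analog.
w_stat, p_w = stats.wilcoxon(before, after)
print(f"paired t: p = {p_t:.4f}; Wilcoxon: W = {w_stat}, p = {p_w:.4f}")
```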
Ordered logistic regression Ordered logistic regression is used when the dependent variable is ordered, but not continuous. See also Annotated output for logistic regression. Correlation A correlation is useful when you want to see the relationship between two or more normally distributed interval variables. See also Missing Data in SPSS. Simple linear regression Simple linear regression allows us to look at the linear relationship between one normally distributed interval predictor and one normally distributed interval outcome variable.
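A Python sketch of correlation and simple linear regression on simulated data (the true slope is 2, so the fitted slope should land near it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(50, 10, size=100)
y = 2.0 * x + rng.normal(0, 5, size=100)   # linear relationship plus noise

# Pearson correlation and least-squares regression line.
r, p = stats.pearsonr(x, y)
fit = stats.linregress(x, y)
print(f"r = {r:.3f}, slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}")
```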
Simple logistic regression Logistic regression assumes that the outcome variable is binary (i.e., coded as 0 and 1). Multiple regression Multiple regression is very similar to simple regression, except that in multiple regression you have more than one predictor variable in the equation.
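To make the logistic model concrete, here is a minimal self-contained sketch that fits one by gradient ascent on simulated data (real statistical software uses iteratively reweighted least squares; the true coefficients below are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated data from a true logistic model: intercept -1, slope 2.
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x))))

# Fit by gradient ascent on the log-likelihood (a minimal sketch only).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (y - p_hat) / n

print(np.round(beta, 2))  # estimates should be near the true (-1, 2)
```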
Multiple logistic regression Multiple logistic regression is like simple logistic regression, except that there are two or more predictors. Canonical correlation Canonical correlation is a multivariate technique used to examine the relationship between two groups of variables.