What statistical analysis should I use?





















You also want to consider the nature of your dependent variable: whether it is an interval, ordinal, or categorical variable, and whether it is normally distributed (see What is the difference between categorical, ordinal and interval variables?).

The table then shows one or more statistical tests commonly used given these types of variables (but not necessarily the only type of test that could be used), with links showing how to do such tests using SAS, Stata and SPSS.

Statistical errors are the deviations of the observed values of the dependent variable from their true or expected values. These errors are unobservable, since we usually do not know the true values, but we can estimate them with residuals, the deviations of the observed values from the model-predicted values.

Additionally, many of these models produce estimates that are robust to violation of the assumption of normality, particularly in large samples. In deciding which test is appropriate to use, it is important to consider the type of variables that you have (i.e., whether your variables are categorical, ordinal or interval, and whether they are normally distributed). Most of the examples on this page will use a data file called hsb2 (high school and beyond).

This data file contains observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

You can get the hsb2 data file by clicking on hsb2. A one sample t-test allows us to test whether the sample mean of a normally distributed interval variable differs significantly from a hypothesized value. For example, using the hsb2 data file, say we wish to test whether the average writing score (write) differs significantly from a hypothesized population mean.
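The arithmetic behind a one sample t-test is short enough to sketch by hand. The snippet below is an illustration in pure Python with made-up scores (not the hsb2 data); a statistics package would also report the p-value for the resulting t statistic:

```python
# Illustrative sketch of a one-sample t-test computed by hand.
# The scores below are hypothetical, not the hsb2 data.
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean == mu0 (df = n - 1)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    se = sd / math.sqrt(n)                 # standard error of the mean
    return (mean - mu0) / se

writing_scores = [48, 52, 55, 50, 60]      # hypothetical writing scores
t_stat = one_sample_t(writing_scores, 50)  # test against a hypothesized mean of 50
```

Compare the t statistic against a t distribution with n - 1 degrees of freedom to obtain the p-value.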

If the sample mean of write for this group of students is significantly higher than the hypothesized value, we would conclude that this group of students has a significantly higher mean on the writing test than that value. A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value. We will use the same variable, write, as we did in the one sample t-test example above, but we do not need to assume that it is interval and normally distributed; we only need to assume that write is an ordinal variable.
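One common way to carry out a one sample median test is as an exact sign test: count how many observations fall above and below the hypothesized median and compare that split to a Binomial(n, 0.5) distribution. A minimal sketch, with hypothetical scores rather than the hsb2 data:

```python
# Illustrative one-sample median (sign) test via an exact binomial calculation.
import math

def sign_test(sample, hypothesized_median):
    """Two-sided exact sign test: are values evenly split around the median?"""
    above = sum(1 for x in sample if x > hypothesized_median)
    below = sum(1 for x in sample if x < hypothesized_median)
    n = above + below                       # ties with the median are dropped
    k = max(above, below)
    # P(X >= k) under Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

write = [54, 52, 61, 48, 57, 53, 59, 50, 56, 62]   # hypothetical scores
p_value = sign_test(write, 50)
```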

A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable differs significantly from a hypothesized value. A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. For example, we could test whether the observed proportions from our sample differ significantly from a set of hypothesized proportions.
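The chi-square goodness of fit statistic is the familiar sum of (observed - expected)^2 / expected over the categories. A short sketch with made-up counts (the category counts and proportions below are hypothetical):

```python
# Illustrative chi-square goodness-of-fit statistic, computed by hand.
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic for hypothesized proportions."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for a three-level categorical variable,
# tested against hypothesized proportions of 0.5, 0.3 and 0.2
observed = [60, 25, 15]
chi2 = chi_square_gof(observed, [0.5, 0.3, 0.2])   # df = 3 - 1 = 2
```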

An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, using the hsb2 data file, say we wish to test whether the mean of write is the same for males and females. Because the standard deviations for the two groups are similar, we can use the pooled (equal-variances) form of the test. In other words, females have a statistically significantly higher mean score on writing than males. The Wilcoxon-Mann-Whitney test is a non-parametric analog to the independent samples t-test and can be used when you do not assume that the dependent variable is a normally distributed interval variable; you only assume that the variable is at least ordinal.
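The pooled (equal-variances) t statistic mentioned above can be sketched directly. This is a pure-Python illustration with hypothetical scores, not the hsb2 data:

```python
# Illustrative pooled-variance independent samples t-test.
import math
import statistics

def independent_t(group1, group2):
    """Two-sample t statistic assuming equal variances (df = n1 + n2 - 2)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group2) - statistics.mean(group1)) / se

male_write = [50, 54, 48, 52]      # hypothetical writing scores
female_write = [56, 60, 58, 54]
t_stat = independent_t(male_write, female_write)
```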

We will use the same data file (the hsb2 data file) and the same variables in this example as we did in the independent t-test example above, and will not assume that write, our dependent variable, is normally distributed. A chi-square test is used when you want to see if there is a relationship between two categorical variables. In SPSS, the chisq option is used on the statistics subcommand of the crosstabs command to obtain the test statistic and its associated p-value.

Remember that the chi-square test assumes that the expected value for each cell is five or higher. This assumption is easily met in the examples below. The point of this example is that one or both variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high). Please see the results from the chi-square example above.
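The chi-square test of independence compares each observed cell count to the count expected under independence (row total times column total divided by the grand total). A sketch with a hypothetical 2-by-3 table of gender-by-ses counts (not the hsb2 data):

```python
# Illustrative chi-square test of independence for a two-way table.
def chi_square_independence(table):
    """Chi-square statistic for independence in a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows are male/female, columns are low/medium/high ses
table = [[10, 20, 20],
         [20, 20, 10]]
chi2 = chi_square_independence(table)   # df = (rows - 1) * (cols - 1) = 2
```

Note that the same expected-count calculation is what the five-or-higher assumption above refers to.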

A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable, and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable.

For example, using the hsb2 data file, say we wish to test whether the mean of write differs between the three program types (prog).
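The F statistic for a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. A minimal pure-Python sketch with made-up scores for three hypothetical program types (not the hsb2 data):

```python
# Illustrative one-way ANOVA F statistic, computed by hand.
import statistics

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical write scores for three program types
f_stat = one_way_anova_f([[50, 52, 54], [56, 58, 60], [46, 48, 50]])
```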

The mean of the dependent variable differs significantly among the levels of program type. However, we do not know if the difference is between only two of the levels or all three of the levels. The F test for the Model is the same as the F test for prog because prog was the only variable entered into the model.

If other variables had also been entered, the F test for the Model would have been different from the F test for prog. To see which program differs, we can look at the mean of write for each level of program type. From this we can see that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest. The Kruskal-Wallis test is used when you have one independent variable with two or more levels and an ordinal dependent variable. In other words, it is the non-parametric version of ANOVA and a generalized form of the Mann-Whitney test, since it permits two or more groups.

We will use the same data file as in the one-way ANOVA example above (the hsb2 data file) and the same variables, but we will not assume that write is a normally distributed interval variable. If some of the scores receive tied ranks, a correction factor is used, yielding a slightly different value of chi-squared. With or without ties, the results indicate that there is a statistically significant difference among the three types of programs.
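The Kruskal-Wallis H statistic ranks all observations together, sums the ranks within each group, and compares those sums. The sketch below uses made-up scores and omits the tie correction factor mentioned above (tied values simply receive their average rank):

```python
# Illustrative Kruskal-Wallis H statistic (tie correction factor omitted).
def kruskal_wallis_h(groups):
    """H statistic; compare against chi-square with (k - 1) df."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1                           # [i, j) is a block of tied values
        avg_rank = (i + j + 1) / 2           # average of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    h = (12 / (n_total * (n_total + 1))
         * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)))
    return h - 3 * (n_total + 1)

# Hypothetical write scores for three program types
h_stat = kruskal_wallis_h([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
```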

A paired samples t-test is used when you have two related observations (i.e., two observations per subject) and you want to see if the means on these two normally distributed interval variables differ from one another. For example, using the hsb2 data file we will test whether the mean of read is equal to the mean of write. The Wilcoxon signed rank sum test is the non-parametric version of a paired samples t-test. You use the Wilcoxon signed rank sum test when you do not wish to assume that the difference between the two variables is interval and normally distributed, but you do assume the difference is ordinal.
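The paired samples t-test described above is simply a one-sample t-test applied to the pairwise differences. A sketch with hypothetical paired scores (not the hsb2 read and write variables):

```python
# Illustrative paired-samples t-test: a one-sample t on the differences.
import math
import statistics

def paired_t(x, y):
    """Paired t statistic for H0: mean difference == 0 (df = n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se

read = [57, 60, 52, 48, 65]    # hypothetical paired reading scores
write = [54, 62, 50, 45, 60]   # hypothetical paired writing scores
t_stat = paired_t(read, write)
```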

We will use the same example as above, but we will not assume that the difference between read and write is interval and normally distributed.

The results suggest that there is not a statistically significant difference between read and write. If you believe the differences between read and write were not ordinal but could merely be classified as positive and negative, then you may want to consider a sign test in lieu of the sign rank test.

Again, we will use the same variables in this example and assume that this difference is not ordinal. You would perform McNemar's test if you were interested in the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable on matched pairs (like a case-control study) or two outcome variables from a single group. Continuing with the hsb2 dataset used in several above examples, let us create two binary outcomes in our dataset: himath and hiread.
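For paired binary outcomes like these, McNemar's test depends only on the two discordant cells of the 2x2 table. A minimal sketch (without the continuity correction, and with made-up counts rather than the actual himath/hiread tabulation):

```python
# Illustrative McNemar chi-square statistic (no continuity correction).
def mcnemar_chi2(table):
    """McNemar statistic for a 2x2 table of paired binary outcomes (df = 1)."""
    b = table[0][1]                # discordant: yes on first, no on second
    c = table[1][0]                # discordant: no on first, yes on second
    return (b - c) ** 2 / (b + c)

# Hypothetical 2x2 counts for two paired binary outcomes
table = [[30, 10],
         [25, 35]]
chi2 = mcnemar_chi2(table)
```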

These outcomes can be considered in a two-way contingency table. The null hypothesis is that the proportion of students in the himath group is the same as the proportion of students in the hiread group (i.e., that the marginal proportions are equal). You would perform a one-way repeated measures analysis of variance if you had one categorical independent variable and a normally distributed interval dependent variable that was repeated at least twice for each subject.

This is the equivalent of the paired samples t-test, but allows for two or more levels of the categorical variable. This tests whether the mean of the dependent variable differs by the categorical variable. In this data set, y is the dependent variable, a is the repeated measure and s is the variable that indicates the subject number.

You will notice that this output gives four different p-values. No matter which p-value you use, our results indicate that we have a statistically significant effect of a. If you have a binary outcome measured repeatedly for each subject and you wish to run a logistic regression that accounts for the effect of multiple measures from single subjects, you can perform a repeated measures logistic regression.

The exercise data file contains 3 pulse measurements from each of 30 people assigned to 2 different diet regimens and 3 different exercise regimens. A factorial ANOVA has two or more categorical independent variables (either with or without their interactions) and a single normally distributed interval dependent variable.

For example, using the hsb2 data file we will look at writing scores (write) as the dependent variable and gender (female) and socio-economic status (ses) as independent variables, and we will include an interaction of female by ses. Note that in SPSS, you do not need to have the interaction term(s) in your data set. You perform a Friedman test when you have one within-subjects independent variable with two or more levels and a dependent variable that is not interval and normally distributed but is at least ordinal.
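The Friedman statistic ranks each subject's scores across the k conditions, then compares the column rank sums. A sketch with hypothetical repeated measurements (four subjects, three conditions; tied values receive their average rank):

```python
# Illustrative Friedman chi-square statistic, computed by hand.
def friedman_chi2(data):
    """Friedman statistic; data is a list of rows, one per subject, each
    holding that subject's scores under the k conditions (df = k - 1)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, value in enumerate(row):
            smaller = sum(1 for v in row if v < value)
            ties = sum(1 for v in row if v == value)
            rank_sums[j] += smaller + (ties + 1) / 2   # average rank for ties
    stat = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums)
    return stat - 3 * n * (k + 1)

# Hypothetical scores for 4 subjects measured under 3 conditions
data = [[10, 20, 30],
        [15, 25, 35],
        [12, 22, 32],
        [20, 10, 30]]
chi2 = friedman_chi2(data)
```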


