Homework #2 Introductory Statistics

[Note: Look for things like "**1.)" below for the actual homework problems. There are 6 in all.]

Dr. Jones is studying a new tranquilizer. He has demonstrated in past experiments that tranquil mice will stand still longer when placed in the middle of a test box. To measure how well the new tranquilizer is working, he selects 20 mice and randomly assigns 10 to each of two groups. Call these Group A and Group B. He gives his new tranquilizer to the mice in Group B, and a placebo to the mice in Group A. He then places each mouse in the center of his test box and measures with a stopwatch how long the mouse stands still before starting to explore the box. The times (in seconds) before each mouse begins to explore serve as the dependent measure of how well the tranquilizer is working. Naturally, Jones hopes that the mice in Group B will stand still longer than those in Group A.

The data that Jones recorded are in the second data set as columns A and B (if you import the data into SPSS or Excel), or in df2$A and df2$B if you import the data into R with the command:

    df2 = read.csv(file="data2.csv")

**1.) State the null and alternative hypotheses for this experiment.

**2.) Use a t test to evaluate Jones's hypothesis. Which version of the test did you use and why? What conclusions can you draw?

**3.) Assume that Jones used the same 10 mice for each of the two conditions (no tranquilizer and tranquilizer). How does that change your analysis and conclusions?

**4.) Repeat the analyses for questions 2 and 3 using a non-parametric test (we recommend the Wilcoxon rank sum test (AKA Mann-Whitney U) and the Wilcoxon signed rank test, respectively). How do your results differ from the previous parametric analyses?

Chi Square tests and contingency tables.

There are many experimental situations where outcomes are categorical.
For example, the results of corrective surgery might be categorized in terms of (a) no improvement, (b) partial recovery, or (c) full recovery of function. These outcomes could be coded as 0, 1, and 2 respectively to give them a numerical value, but it is usually inappropriate to use parametric statistical analysis on such data. Nonetheless, it is often important to be able to test hypotheses related to categorical outcomes. Chi Square is often the appropriate statistic in such cases. Let us consider two cases.

In the first and simpler case, the Chi Square statistic can be used to assess goodness of fit to a specific distribution. Suppose that, on theoretical grounds, one expects a tissue culture to contain three cell types and that one type of cell, say Type 2, should be more common than the other two. In this case, the null hypothesis should be that all three cell types are equally common. Now, suppose this experiment was carried out with the following results:

    Cell Type 1:  4780
    Cell Type 2:  9502
    Cell Type 3:  4772
    Total:       19054

A Chi Square test of the null hypothesis that all three types are equally common would look like this in R:

    # First we put the three observed counts into a vector
    o = c(4780, 9502, 4772)
    # Next we ask for the Chi Square test of the data in 'o' with the
    # default hypothesis that all categories are equally likely
    chisq.test(o)

**5.) As part of your homework, please carry out this test in R, SPSS, or Excel. If you want to calculate this statistic in Excel, you will need some additional manual intervention to calculate Chi Square using the formula given in the class slides. If you are using SPSS or R, you can skip the following section.

============== E x c e l I n s t r u c t i o n s ==============

Make a column (say, column A) containing the three cell type counts (A1:A3). Then, using a formula, you can make a second column (say, column B) that calculates the cell-by-cell values that sum to Chi Square.
Click on cell B1, and enter the following formula:

    = (A1 - SUM($A$1:$A$3)/3)^2 / (SUM($A$1:$A$3)/3)

Copy this formula and paste it into cells B2 and B3. Then, in another cell, say B4, enter the formula:

    =SUM(B1:B3)

The quantity displayed in cell B4 is the value of Chi Square for N - 1 = 2 degrees of freedom, where N = 3 is the number of categories. You will then need to use a Chi Square table to look up the critical value of Chi Square for two degrees of freedom and p = 0.05. If the value of Chi Square is greater than this critical value, the test is significant at p < 0.05. By the way, a Google search finds several Chi Square tables available online. For instance, checking the table at http://www.richland.edu/james/lecture/m170/tbl-chi.html it appears that 5.991, 9.21, and 10.597 are the critical values of Chi Square with 2 degrees of freedom for p=.05, p=.01, and p=.005 respectively. Thus, if the value of Chi Square is > 10.597, you can say "p < .005"; if Chi Square is between 9.21 and 10.597, you can say "p < .01"; and so forth.

====================================================================

The results of the Chi Square test above allow us to reject the hypothesis of no difference in the numbers of cells of each type. After looking at the results of the cell counts, it is tempting to form a more specific hypothesis: that there are going to be twice as many cells of type 2 as cells of each of the other two types. Chi Square can also be used to test this hypothesis. In this case, we are asking if the observed counts are so close to the exact 2:1 ratio that it is very improbable that the sample was drawn from a population of cells that were not distributed *exactly* in 2:1 ratio. To test this, we first need to determine what the cell counts would be if the distribution was really 2:1. This would be the case if 50% of the total cells were of type 2 and the remaining 50% were distributed as 25% of the cells being type 1 and 25% type 3.
Since there were a total of 19054 cells, there should be 0.5 * 19054 = 9527 type 2 cells and 0.25 * 19054 = 4763.5 cells of each of types 1 and 3. These "expected" counts can then be compared to the "observed" counts to calculate a value for Chi Square. In this case, however, the larger the value of Chi Square, the greater the deviation between expected and observed cell counts. What we are actually looking for is a very small value of Chi Square, and a very *large* associated p value. To express the p value for this test in more usual terms, we will want to state (1.0 - p) as the probability that the observed cell counts were drawn from a population that was not distributed in exactly 2:1 ratio.

For homework, use SPSS, R, or Excel to calculate Chi Square both for the null hypothesis of an equal distribution of cell types and for the point hypothesis of a 2:1 ratio for cell type 2 versus either type 1 or type 3. Tell us what conclusions you draw from both tests.

In the above examples, Chi Square was used to test hypotheses regarding how observations are distributed over a set of categories. There was no specific order or structure to the categories; it was just a list. Another application of Chi Square is cross-tabulated tables where the rows and columns of the table relate to experimental factors or groups. Suppose we interviewed 90 people regarding their preferences for three types of music (Rock, Blues, Polka) and divided our interviewees into two groups, Spring Chickens (SC) and Old Fogies (OF), with 45 subjects in each group. The results could come out something like:

          Rock  Blues  Polka
    SC      20     15     10
    OF      10     15     20

What we would like to know is whether age group affects one's preference for the three types of music. Chi Square can be used to test this hypothesis. In this case, the null hypothesis is typically that the individual cell counts can be predicted by the row and column sums.
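To see what "predicted by the row and column sums" means in practice, here is a minimal sketch in R. The counts are made up purely for illustration (they are not the homework data), and the names obs and expected are our own. Under the null hypothesis, the expected count in each cell is (row sum * column sum) / grand total.

```r
# Hypothetical 2 x 2 table of counts (illustrative only, NOT the homework data)
obs <- matrix(c(30, 10,
                20, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("G1", "G2"),
                              choice = c("Yes", "No")))

# Expected count for each cell under the null hypothesis:
# (row sum * column sum) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
print(expected)
```

Each expected count depends only on the margins of the table, so any pattern in the observed cells beyond what the margins predict shows up as a difference between obs and expected.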
The Chi Square calculation is exactly as for the cell type example given previously; however, the number of degrees of freedom is now (Nrows - 1)(Ncols - 1), or 2 in this case, instead of Ncells - 1, or 5 in this case.

**6.) As the last homework problem, use SPSS, R, or Excel to calculate Chi Square for the above example.
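If you are working in R, one possible way to set up a cross-tabulated test is sketched below. The table is hypothetical (again, not the homework data) and the names tab and res are our own; the point is only to show that chisq.test accepts a matrix of counts and reports the (Nrows - 1)(Ncols - 1) degrees of freedom described above.

```r
# Hypothetical 2 x 3 table of counts (illustrative only, NOT the homework data)
tab <- matrix(c(12, 18, 30,
                28, 22, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("G1", "G2"),
                              answer = c("a", "b", "c")))

# Chi Square test on the cross-tabulated counts
res <- chisq.test(tab)

# Degrees of freedom reported by R, and the same value computed by hand
df_reported <- unname(res$parameter)
df_by_hand  <- (nrow(tab) - 1) * (ncol(tab) - 1)

print(res)
print(c(reported = df_reported, by_hand = df_by_hand))
```

The expected counts that R used are available in res$expected, which is handy for checking a hand or Excel calculation against the software.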