This is the students' measurement report. It is presented in three different ways, ordered by measure (Table 7.1.1), by fit (Table 7.1.2), and by element number (Table 7.1.3). The information in each is identical, but the three orderings make it easy to locate key information. `Table 7.1.2 Students Measurement Report (arranged by fN).`
`+---------------------------------------------------------------------------------------------------------------+`
`| Total Total Obsvd Fair-M| Model | Infit Outfit |Estim.| Correlation | |`
`| Score Count Average Avrage|Measure S.E. | MnSq ZStd MnSq ZStd|Discrm| PtMea PtExp | Num Students |`
`|-------------------------------+--------------+---------------------+------+-------------+---------------------|`
`| 376 139 2.71 2.81| 4.04 .20 | 1.52 3.1 1.22 .8| .66 | .41 .55 | 996 996 |`
`| 339 140 2.42 2.52| 2.79 .16 | 1.25 1.9 1.39 2.3| .68 | .47 .63 | 850 850 |`
`| 380 136 2.79 2.88| 4.57 .24 | 1.21 1.1 1.16 .5| .83 | .39 .52 | 823 823 |`
`| 391 140 2.79 2.86| 4.41 .23 | 1.19 1.1 1.01 .1| .87 | .40 .50 | 973 973 |`
`| 385 140 2.75 2.85| 4.31 .22 | 1.19 1.2 1.04 .2| .85 | .44 .54 | 991 991 |`
`| 391 140 2.79 2.89| 4.68 .23 | 1.19 1.1 .91 -.1| .90 | .40 .50 | 831 831 |`
`| 326 133 2.45 2.54| 2.86 .17 | 1.13 1.0 .99 .0| .92 | .62 .64 | 790 790 |`
`| 354 140 2.53 2.64| 3.24 .17 | .92 -.5 .91 -.4| 1.07 | .62 .60 | 837 837 |`
`| 311 139 2.24 2.30| 2.12 .15 | .97 -.2 .88 -.9| 1.11 | .73 .66 | 847 847 |`
`| 314 140 2.24 2.31| 2.15 .15 | .86 -1.1 .80 -1.7| 1.20 | .71 .65 | 949 949 |`
`| 349 139 2.51 2.60| 3.09 .17 | .72 -2.3 .66 -2.1| 1.30 | .72 .62 | 915 915 |`
`| 354 140 2.53 2.58| 3.01 .17 | .82 -1.4 .66 -2.2| 1.27 | .74 .61 | 975 975 |`
`| 367 140 2.62 2.72| 3.60 .19 | .86 -1.0 .64 -1.8| 1.19 | .65 .59 | 968 968 |`
`| 379 140 2.71 2.79| 3.95 .20 | .90 -.6 .60 -1.7| 1.17 | .67 .56 | 891 891 |`
`|-------------------------------+--------------+---------------------+------+-------------+---------------------|`
`| 358.3 139.0 2.58 2.66| 3.49 .19 | 1.05 .2 .92 -.5| | .57 | Mean (Count: 14) |`
`| 26.6 2.0 .19 .19| .83 .03 | .21 1.5 .23 1.3| | .13 | S.D. (Population) |`
`| 27.6 2.0 .19 .20| .86 .03 | .22 1.5 .24 1.3| | .14 | S.D. (Sample) |`
`+---------------------------------------------------------------------------------------------------------------+`
`Model, Populn: RMSE .19 Adj (True) S.D. .81 Separation 4.17 Strata 5.90 Reliability .95`
`Model, Sample: RMSE .19 Adj (True) S.D. .84 Separation 4.34 Strata 6.12 Reliability .95`
`Model, Fixed (all same) chi-square: 269.2 d.f.: 13 significance (probability): .00`
`Model, Random (normal) chi-square: 12.4 d.f.: 12 significance (probability): .41`
`-----------------------------------------------------------------------------------------------------------------`

The main body of the table presents columns of information about the individual students. The single most important column is "Measure", which gives each person's ability in logits. The conversion to logits produces an equal-interval measurement scale, so this column tells us how the students are spaced along the ability scale.

Next is the standard error (S.E.) for each person. We can see that the error is not constant across the measurement scale, which means we have more confidence in the estimates for some persons than for others. Plus or minus 2 × S.E. gives an approximate 95% confidence interval, so, in general, we need about 0.8 logits of separation between two persons to be confident that they really do differ in ability.

A reliability coefficient is reported in the notes at the bottom of the table. Although the reliability of .95 reported in this case is excellent, it is not obvious what this means in real-world terms, so a separation index is also reported. Here the separation is greater than 4, meaning that measurement error is small enough, relative to the range of ability, for the test to statistically separate the students into four bands of performance. We can be extremely confident that the students with high logit measures really are more proficient than those with low measures.

The first four columns summarize the raw responses. "Total Score" is the sum of all ratings for each person, "Total Count" is the number of ratings, and "Observed Average" is the mean rating. The "Fair-M Average" is slightly more complicated. Because this is peer-assessment data and students did not rate their own performances, each student was rated by a different set of raters.
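Before turning to the remaining columns, the confidence-interval and separation arithmetic described above can be checked with a few lines of Python. This is a sketch using the conventional Rasch formulas; the small discrepancies from the printed values (e.g. Separation 4.17) arise because the table notes round RMSE and Adj (True) S.D. to two decimals.

```python
# Confidence intervals and separation statistics for Table 7.1.2,
# using conventional Rasch formulas (values copied from the printout).

def conf_interval(measure, se, z=2.0):
    """Approximate 95% confidence interval: measure +/- 2 * S.E."""
    return (measure - z * se, measure + z * se)

# Student 996: measure 4.04 logits, S.E. 0.20 -> roughly (3.64, 4.44)
low, high = conf_interval(4.04, 0.20)

# Population summary from the table notes (two-decimal rounding)
rmse, adj_sd = 0.19, 0.81
separation = adj_sd / rmse                           # ~4.3 (printout: 4.17)
strata = (4 * separation + 1) / 3                    # ~6  (printout: 5.90)
reliability = separation**2 / (1 + separation**2)    # ~.95
```

Two students whose intervals do not overlap (roughly 0.8 logits apart, given these standard errors) can be regarded as measurably different in ability.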
Also, there were a few missed observations, so some students were rated on a slightly different set of items. Because some raters were stricter than others and some items were more difficult than others, the fair-measure average adjusts the raw ratings for rater severity and item difficulty. If we need to report raw scores, the fair-measure average is preferable to the observed average.

Fit statistics are one of the most confusing aspects of Rasch analysis, but they provide very valuable quality-control information about how well the responses for each person, rater, or item match the expectations of the model. The infit statistic is "information weighted", meaning that it emphasizes responses where person ability and item difficulty are well matched. This is the critical zone for measurement, so problems flagged by infit are an important indicator that measurement is distorted. The outfit statistic weights all responses equally, so it is more sensitive to outlying responses (for example, a low-ability person succeeding on a very difficult item, or a high-ability person failing on a very easy one).

The mean-square (MnSq) statistic shows the magnitude of the misfit, with an expected value of 1.00. We can think of this as representing 100% of the expected "noise" in the data. A value greater than 1.00 indicates more noise than modeled, while a value less than 1.00 means the responses are more consistent than expected. A common rule of thumb is to investigate mean-square values greater than 1.50. In this case, the mean outfit value is 0.92, which indicates that the data are more consistent than expected overall. Paradoxically, this is not necessarily a good thing, as it may indicate holistic rating, where raters respond to the overall performance rather than to the individual rubric items. Holistic ratings provide less information about the performances, and thus measurement is muted.
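To make the weighting difference concrete, here is a minimal sketch of the two mean-squares computed from standardized residuals, following the standard Rasch definitions (an illustration of the principle, not Facets' internal code; the residuals and weights below are invented):

```python
def outfit_mnsq(z):
    """Outfit: unweighted mean of squared standardized residuals,
    so a single extreme (outlying) residual can inflate it."""
    return sum(r * r for r in z) / len(z)

def infit_mnsq(z, w):
    """Infit: the same residuals weighted by their model variance w
    (the statistical information), emphasizing well-targeted responses."""
    return sum(wi * ri * ri for ri, wi in zip(z, w)) / sum(w)

# Residuals that match the model give MnSq close to 1.00. One large
# outlier moves outfit far more than infit when it carries little
# information (i.e. when person and item are poorly matched).
z = [0.5, -0.5, 1.0, -1.0, 3.0]   # last response is an outlier
w = [1.0, 1.0, 1.0, 1.0, 0.2]     # outlier is off-target, low information
```

With these hypothetical values, outfit is 2.30 while infit stays near 1.02, which is exactly the asymmetry described above.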
We can also see that the mean infit value is 1.05, indicating less consistency than expected among the inlying responses. These fit statistics are not what we expect, so we need to investigate more closely to find out what happened. Looking at individual students, Student 996 has an infit value of 1.52 and an outfit value of 1.22. The outfit value can be read as 22% more noise than expected, a level that does not threaten measurement, but the infit value shows 52% more noise than expected, a level that is of concern. Measurement is still occurring for this person, but some unexpected responses have occurred, so we might want to investigate using Table 4.1.
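The "percent noise" reading of the mean-squares, and the 1.50 screening rule applied to the first few rows of Table 7.1.2, can be sketched as follows (row values copied from the table; the helper name is our own):

```python
# Apply the 1.50 rule-of-thumb to the first three rows of Table 7.1.2.
rows = [  # (student number, infit MnSq, outfit MnSq)
    (996, 1.52, 1.22),
    (850, 1.25, 1.39),
    (823, 1.21, 1.16),
]

def excess_noise_pct(mnsq):
    """Percent more (positive) or less (negative) noise than modeled."""
    return round((mnsq - 1.0) * 100)

flagged = [num for num, infit, outfit in rows
           if infit > 1.50 or outfit > 1.50]
# Only Student 996 is flagged: infit 1.52 means 52% more noise than
# expected, while the outfit of 1.22 (22% extra) would not trigger
# the rule on its own.
```

A flagged student is then followed up in the unexpected-responses listing (Table 4.1) rather than discarded outright, since some extra noise is compatible with useful measurement.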