
Table 7.2

Table 7.2 gives the measurement report for raters; it provides exactly the same information about raters that Table 7.1 provided for students. Subtable 7.2.2 arranges the data by fit, and we can immediately see that Rater 991 might be the cause of the misfit shown by Student 996. This rater's infit and outfit mean-squares of 1.39 and 1.38 are somewhat misfitting, and the point-measure correlation is .39 against an expected value of .55, which suggests that this rater is interpreting the rubric somewhat differently from the average rater.

 

Table 7.2.2  Raters Measurement Report  (arranged by fN).
+-------------------------------------------------------------------------------------------------------------------+
|Total   Total   Obsvd  Fair-M|        Model | Infit      Outfit   |Estim.| Correlation | Exact Agree. |            |
|Score   Count  Average Avrage|Measure  S.E. | MnSq ZStd  MnSq ZStd|Discrm| PtMea PtExp | Obs %  Exp % | Num  Raters|
|-----------------------------+--------------+---------------------+------+-------------+--------------+------------|
| 303     139      2.18   2.21|   1.65   .15 | 1.50  3.7  1.44  3.4|  .46 |   .54   .61 |  44.4   45.0 |    1 1     |
| 334     129      2.59   2.67|    .11   .19 | 1.39  2.6  1.38  2.0|  .62 |   .39   .55 |  56.4   58.7 |  991 991   |
| 287     130      2.21   2.23|   1.60   .16 | 1.16  1.3  1.20  1.6|  .76 |   .47   .61 |  43.3   46.2 |  975 975   |
| 286     123      2.33   2.39|   1.11   .17 | 1.10   .8  1.10   .8|  .86 |   .42   .60 |  46.9   50.9 |  891 891   |
| 345     130      2.65   2.73|   -.15   .20 | 1.10   .7   .92  -.3| 1.01 |   .65   .54 |  61.5   60.6 |  968 968   |
| 349     130      2.68   2.75|   -.25   .20 | 1.08   .6   .91  -.3| 1.01 |   .60   .53 |  63.2   61.6 |  790 790   |
| 364     130      2.80   2.85|   -.88   .24 |  .91  -.4   .91  -.2| 1.04 |   .46   .47 |  63.7   63.8 |  949 949   |
| 354     130      2.72   2.81|   -.55   .21 |  .88  -.8  1.12   .5| 1.03 |   .48   .51 |  61.2   61.3 |  996 996   |
| 324     130      2.49   2.57|    .50   .17 |  .81 -1.6  1.01   .0| 1.15 |   .62   .57 |  57.3   55.9 |  823 823   |
| 341     130      2.62   2.69|    .04   .19 | 1.07   .5   .80 -1.1| 1.07 |   .70   .55 |  62.1   60.2 |  915 915   |
| 269     129      2.09   2.13|   1.87   .15 |  .75 -2.3   .77 -2.0| 1.24 |   .45   .61 |  35.5   41.3 |  973 973   |
| 350     127      2.76   2.81|   -.58   .22 |  .78 -1.4   .64 -1.5| 1.18 |   .58   .49 |  65.4   63.4 |  847 847   |
| 373     129      2.89   2.94|  -1.92   .30 |  .65 -1.6   .60  -.8| 1.18 |   .54   .39 |  63.9   61.6 |  831 831   |
| 371     130      2.85   2.91|  -1.45   .27 |  .71 -1.5   .50 -1.4| 1.19 |   .57   .43 |  64.8   63.2 |  837 837   |
| 366     130      2.82   2.88|  -1.08   .25 |  .64 -2.1   .46 -2.0| 1.28 |   .66   .46 |  67.1   63.5 |  850 850   |
|-----------------------------+--------------+---------------------+------+-------------+--------------+------------|
| 334.4   129.7    2.58   2.64|    .00   .20 |  .97  -.1   .92  -.1|      |   .54       |       Mean (Count: 15)    |
|  32.3     3.1     .25    .26|   1.12   .04 |  .25  1.7   .29  1.5|      |   .09       |       S.D. (Population)   |
|  33.4     3.2     .26    .27|   1.16   .05 |  .26  1.8   .30  1.6|      |   .09       |       S.D. (Sample)       |
+-------------------------------------------------------------------------------------------------------------------+
Model, Populn: RMSE .21  Adj (True) S.D. 1.10  Separation 5.26  Strata 7.34  Reliability (not inter-rater) .97
Model, Sample: RMSE .21  Adj (True) S.D. 1.14  Separation 5.45  Strata 7.60  Reliability (not inter-rater) .97
Model, Fixed (all same) chi-square: 475.2  d.f.: 14  significance (probability): .00
Model,  Random (normal) chi-square: 13.5  d.f.: 13  significance (probability): .41
Inter-Rater agreement opportunities: 12559  Exact agreements: 7168 = 57.1%  Expected: 7171.3 = 57.1%

 

However, in this case Rater 1 is the teacher. Rater 1 is reported as quite misfitting yet gave an average rating of 2.18 (quite strict), while four raters are quite overfitting, with mean-square values well below 1.00, and also very lenient, with average ratings greater than 2.5 out of a maximum of 3. The apparent consistency of these overfitting raters is very likely misleading: they appear to have assigned the maximum rating to anything except a very weak performance. Although they are consistent, they are consistently lenient, and so provide less information about the performances than the more severe raters do.
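The misfit and overfit labels used above come from the mean-square fit statistics, which summarize a rater's rating residuals. A minimal sketch of the standard computation, using made-up ratings rather than the book's data: outfit is the unweighted mean of squared standardized residuals, while infit weights each residual by its model variance.

```python
# Sketch of Rasch infit/outfit mean-square computation from rating
# residuals. The numbers below are illustrative, not the book's data.
import numpy as np

def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean-squares for one rater.

    observed -- ratings the rater actually gave
    expected -- model-expected ratings for the same responses
    variance -- model variance of each rating
    """
    sq_resid = (observed - expected) ** 2
    z2 = sq_resid / variance                 # squared standardized residuals
    outfit = z2.mean()                       # unweighted: sensitive to outliers
    infit = sq_resid.sum() / variance.sum()  # information-weighted
    return infit, outfit

# A rater whose ratings hug the expected values produces mean-squares
# well below 1.0 ("overfit": too predictable, less information).
obs = np.array([2.0, 3.0, 2.0, 3.0, 2.0])
exp = np.array([1.8, 2.6, 2.1, 2.7, 2.2])
var = np.array([0.6, 0.5, 0.6, 0.5, 0.6])
infit, outfit = fit_mean_squares(obs, exp, var)
```

Values near 1.00 indicate the amount of variability the model predicts; the overfitting raters in Table 7.2 sit well below that benchmark.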

 

A reliability coefficient is provided at the bottom of Table 7.2, but it can be confusing. It is not a report of how much the raters agreed; it tells us how confident we can be that the raters are of different severity. The reliability of .97 means we can be very confident that these raters differ in severity. In other words, raw scores from different raters are not directly comparable, so we should use adjusted (fair average) scores instead.
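The separation statistics beneath Table 7.2 can be checked by hand with the standard Rasch formulas: separation G is the adjusted (true) S.D. divided by the RMSE, strata is (4G + 1)/3, and reliability is G²/(1 + G²). Using the population values from the table (small discrepancies arise because the program works from unrounded values):

```python
# Reproduce the separation/strata/reliability statistics reported
# beneath Table 7.2 from the "Model, Populn" line.
rmse, adj_sd = 0.21, 1.10             # RMSE and Adj (True) S.D.

separation = adj_sd / rmse             # G: true spread in error units
strata = (4 * separation + 1) / 3      # statistically distinct severity levels
reliability = separation**2 / (1 + separation**2)

# Table reports 5.26, 7.34, .97 (computed from unrounded inputs)
print(round(separation, 2), round(strata, 2), round(reliability, 2))
# prints: 5.24 7.32 0.96
```

A separation above 5 means the raters are spread across many measurement-error widths of severity, which is exactly why the reliability of their severity differences is so high even though this says nothing about agreement.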

 

Inter-rater agreement is also reported at the bottom of Table 7.2. In this case the observed exact agreement (57.1%) precisely matched the value expected by the model (57.1%), which is consistent with the raters behaving as independent experts rather than duplicating one another's ratings.
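The expected-agreement figure is derived from the model itself: for each opportunity where two raters rate the same performance, the chance of an exact agreement is the sum, over rating categories, of the product of the two raters' model probabilities for that category. A minimal sketch with hypothetical category probabilities (not the book's data):

```python
# Model-expected exact agreement between two raters on one performance:
# the probability that two independent raters choose the same category.
def expected_agreement(p1, p2):
    """p1, p2: each rater's model probabilities over rating categories."""
    return sum(a * b for a, b in zip(p1, p2))

# Hypothetical probabilities for categories 0-3 (illustrative only)
rater_a = [0.05, 0.15, 0.50, 0.30]
rater_b = [0.10, 0.20, 0.45, 0.25]
p = expected_agreement(rater_a, rater_b)
# Averaged over all agreement opportunities, this kind of quantity
# yields the "Exp %" figure reported beneath Table 7.2.
```

Observed agreement far above the expected value would suggest the raters are not rating independently; here the match between observed and expected indicates no such problem.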