The Language Teacher • Feature Article | 3
Output tasks and vocabulary gains
Keywords
pushed output, vocabulary, motivation
Folse (2004) argued for the importance of vocabulary instruction and the effectiveness of list learning, while Laufer and Girsai (2008) found mechanical output tasks using contrastive analysis and translation effective for vocabulary learning, following Swain and Lapkin's (1995) advocacy of pushed output using creative tasks. In a quasi-experimental design, vocabulary gains over one semester were compared between a treatment group of 37 learners taught vocabulary using mechanical tasks and a control group of 67 learners assigned creative output tasks.
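Swain and Lapkin (1995) advocated pushed output using creative tasks, and Folse (2004) argued for the importance of vocabulary instruction and the effectiveness of list learning. Laufer and Girsai (2008), on the other hand, reported that tasks promoting mechanical output through contrastive analysis and translation are effective for vocabulary learning. This paper uses a quasi-experimental design to compare vocabulary acquisition over one semester between an experimental group of 37 learners, who received vocabulary instruction using mechanical tasks, and a control group of 67 learners, who received instruction with creative output tasks. Rasch analysis was used to equate the pre-test and post-test vocabulary scores. Both groups showed substantial vocabulary gains on the post-test. Contrary to prediction, however, the control group gained more than the experimental group. This result supports the claim that pushed output is an effective means of language acquisition, and it suggests that mechanical tasks alone are insufficient for long-term vocabulary acquisition.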
Trevor A. Holster
Darcy F. de Lint
Kyushu Sangyo University
Vocabulary is undoubtedly crucial to language, but “most vocabulary research in applied linguistics is based on a narrow linguistic agenda that was to a large extent defined by the concerns of the vocabulary control movement in the 1920s” (Meara, 2002, p. 393), an agenda Meara termed the “vocabulary manifesto”. Folse (2004), advocating this agenda, argued for the effectiveness of list learning despite its being “dull”, and claimed that:
Unfortunately, traditionally vocabulary has received less attention in second language (L2) pedagogy than any of these other aspects, particularly grammar. Arguably, vocabulary is perhaps the most important component in L2 ability (p. 22).
In contrast, Swain and Lapkin (1995), following Schmidt's (1990) argument for conscious “noticing”, argued that output tasks can lead learners to notice linguistic shortcomings, “pushing” them to modify their output. Laufer and Girsai (2008) compared contrastive analysis and translation (CAT) tasks with meaning-focused instruction (MFI) and … Rott, Williams, and Cameron (2002) found that, while …
THE LANGUAGE TEACHER: 36.2 • March / April 2012
… 183), highlighting the importance of longitudinal studies to investigate whether experimental treatments translate into improved …
Schumann and Wood (2004, p. 23) described Sustained Deep Learning (SDL) as underlying … each situation evaluated on novelty, pleasantness, goal relevance, coping ability, and self/social image compatibility.
Biological value thus underlies preferences and enables choices, with positive rewards affecting future preferences and choices, making positive assessment of learning experiences crucial for future motivation. This raises questions about the motivational effect and opportunity cost of dull mechanical vocabulary tasks relative to the other tasks that must be dropped to make time for vocabulary instruction.
These questions require … This is consistent with Hattie's (2009) review of educational …
Background and research hypothesis
In 2009, a private university in southwestern Japan introduced a vocabulary curriculum to improve scores on the reading section of the TOEIC Bridge test (ETS, 2008), after gains in previous years had been disappointingly modest. Students at this institution take two compulsory …
The Longman … (LEJ) (2006) was adopted as a mandatory supplementary text for all … to teachers in September of 2008 for instruction and testing in 2009. Because the LEJ provides bilingual example sentences, contrastive analysis of English and Japanese usage became possible without the need for bilingual teachers. This allowed the framing of a research hypothesis:
Mechanical output (MO) tasks based on bilingual example sentences provide greater gains than creative output (CO) tasks requiring creation of original meaning.
Task design
The vocabulary tasks, influenced by Laufer and
Girsai’s (2008) use of contrastive analysis, aimed
to draw attention to target word forms and meanings and involved the following steps:
• Copy the target words from the projector onto worksheets
• Compare a gapped English example sentence with its ungapped Japanese translation and choose the target word that completes the gap
• Take a …
• Take a …
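The gap-fill step in the list above can be sketched in a few lines. This is a hypothetical illustration only. The function names `make_gapped_item` and `check_answer` are ours. The English/Japanese sentence pair, built from a Figure 1 item, stands in for the LEJ example sentences actually projected in class, which are not reproduced here.

```python
import re

GAP = "_________"


def make_gapped_item(english: str, japanese: str, target: str) -> dict:
    """Blank out the first whole-word occurrence of `target` in the English
    sentence and pair it with its (ungapped) Japanese translation.
    Hypothetical helper for illustration, not the study's materials."""
    pattern = re.compile(rf"\b{re.escape(target)}\b", flags=re.IGNORECASE)
    gapped, n = pattern.subn(GAP, english, count=1)
    if n == 0:
        raise ValueError(f"target word {target!r} not found in sentence")
    return {"english_gapped": gapped, "japanese": japanese, "answer": target}


def check_answer(item: dict, response: str) -> bool:
    """Case-insensitive comparison of the learner's chosen word with the key."""
    return response.strip().lower() == item["answer"].lower()


item = make_gapped_item(
    "The teacher divided us into groups of five.",
    "先生は私たちを5人ずつのグループに分けた。",
    "groups",
)
print(item["english_gapped"])        # The teacher divided us into _________ of five.
print(check_answer(item, "Groups"))  # True
```

Matching is case-insensitive and replaces only the first whole-word occurrence. This keeps the sketch simple. An inflected target (e.g. *divided* rather than *divide*) would have to be supplied in its inflected form.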
Another …
The commonsensical expectation was that the MO group would show greater vocabulary gains, the question being whether these would be large enough to justify spending such a large proportion of class time on mechanical tasks.
Surprisingly, among the target group of … (Holster & DeLint, 2010), although the differences overall were not statistically significant (t(120) = …) … alone unclear. Therefore, in 2010 the vocabulary homework was discontinued, allowing comparison between the 2009 MO group and a 2010 CO group without exposure to the vocabulary materials. To provide a larger sample and greater generalizability, students taught by a second teacher were included in the current study. This teacher used the MO tasks in 2009 but not in 2010. In 2010 the teacher instead assigned short personalized compositions as homework, based on coursebook speaking practice activities. These compositions were later used in class for small-group presentations and transcription or …
Research instrument and methodology
As TOEIC Bridge …
| 1) What's your _________? | A) disassociate |
| 2) The country has serious _________ problems. | B) fresh |
| 3) The teacher divided us into _________ of five. | C) groups |
| 4) The red light _________ “stop.” | D) means |
| 5) We _________ about $100 a week on food. | E) name |
| | F) settles |
| | G) social |
| | H) spend |
Figure 1. Semester test example item cluster
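The cluster format in Figure 1 also explains the chance score of six cited in the analysis. Each cluster offers eight options, so each blindly guessed item is correct with probability 1/8, and 50 items yield an expected 50/8 = 6.25 correct. The sketch below assumes that every item is guessed uniformly at random. The Figure 1 answer key is our own reading of the items, since no key is printed, and the function names are hypothetical.

```python
import random

OPTIONS = "ABCDEFGH"   # eight options per cluster: five keys plus three distractors
# Our reading of the Figure 1 key (not printed in the source):
# 1 name, 2 social, 3 groups, 4 means, 5 spend.
FIGURE_1_KEY = {1: "E", 2: "G", 3: "C", 4: "D", 5: "H"}


def score_cluster(responses: dict, key: dict) -> int:
    """Count items whose response matches the key; missing responses score 0."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)


def expected_guessing_score(n_items: int = 50, n_options: int = 8) -> float:
    """Each blind guess is right with probability 1/n_options, so by linearity
    of expectation the expected total is n_items / n_options."""
    return n_items / n_options


def simulate_guessing(n_clusters: int = 10, items_per_cluster: int = 5,
                      trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo check of expected_guessing_score for the 10 x 5 format."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for _ in range(n_clusters):
            # Random key for this cluster: five distinct letters out of eight.
            key = dict(zip(range(1, items_per_cluster + 1),
                           rng.sample(OPTIONS, items_per_cluster)))
            guesses = {item: rng.choice(OPTIONS) for item in key}
            total += score_cluster(guesses, key)
    return total / trials


print(score_cluster({1: "E", 2: "G", 3: "C", 4: "D", 5: "F"}, FIGURE_1_KEY))  # 4
print(expected_guessing_score())        # 6.25
print(round(simulate_guessing(), 1))    # close to 6.25
```

Under this assumption, the cut-off of eight correct responses used to screen answer sheets lies a little above the expected chance score.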
The two test forms each comprised 50 items in 10 clusters of five items, with eight …
Analysis of the vocabulary … package for Rasch analysis (Linacre, 2010), which provides detailed analysis of test performance and the interval-level measurement required for statistical comparison of the results (Bond & Fox, 2007). Winsteps provides outputs in a probabilistic unit called the “logit”, or … unit, so outputs were specified on a scale of 1 logit = 10, with mean item difficulty specified as 50, providing a … measurement based on practical measures of effect sizes (Field, 2009, pp. …). … with ability of 50 would have a 50% expectation of success on an item of mean difficulty, increasing to 73% for an item of difficulty 40 and falling to 27% for an item of difficulty 60. Engelhard (2009) reports a threshold of .30 logits as commonly considered a substantively meaningful effect size, equal to 3.0 on the score scale used here.
Table 1 gives summary statistics from the anchoring analysis used to measure the difficulty of the items in order to anchor them at specified values. Anchoring the items in this way allows person ability to be directly compared between … gains in vocabulary knowledge. The separation index of 2.94 means that measurement error is small enough, relative to the range of person ability, for this test to separate the persons in the anchoring sample into at least two distinct bands. The persons in the anchoring analysis had a much larger range of ability than the research sample, so the reported separation index and person reliability of .90 must be considered an upper limit for this test. The separation index and person reliability are sample dependent (Bond & Fox, 2007). Limiting the research sample to … therefore constrains the range of person ability, leading to lower reported reliability and separation when this sample is analyzed in isolation.
The research sample was limited to students with TOEIC Bridge scores below 100, the target group for the MO tasks, giving a convenience sample of three classes from each year. Attendance and attrition are often problematic with these … especially so for the MO group, as shown in Table 2.
Of the 189 Japanese students assigned to the six classes, five students with fewer than eight correct responses were eliminated from the … the expected score from random guessing with this test format is six. Following pilot administrations, students were allowed 25 minutes to complete the test. However, some did not attempt to answer difficult items, while others spent large amounts of time on difficult questions, resulting in incomplete answer sheets. Missed responses were coded as incorrect, following assumed practice in TOEIC Bridge tests, but items marked with both a correct and an incorrect response were coded as missing data. With 25 items printed on each side of the question sheet, students who did not attempt the final 20 items were assumed to have been plodding or sleeping. This eliminated four students, all from the MO group. Of the 92 MO group students, 68 satisfactorily completed the … group students. However, only 47 MO group students completed the … with 72 CO group students. Ultimately, 37 MO students completed both tests, compared with
Table 1. Vocabulary test anchoring administration performance
| | Total Score | Count | Measure | Model Error | Infit MS | Infit ZSTD | Outfit MS | Outfit ZSTD |
|---|---|---|---|---|---|---|---|---|
| Mean | 26.2 | 49.6 | 51.80 | 3.61 | 1.00 | .0 | 1.08 | .1 |
| SD | 9.7 | 4.0 | 11.69 | .34 | .20 | 1.1 | .56 | 1.2 |
| Max. | 76.0 | 100.0 | 78.94 | 5.08 | 1.75 | 4.2 | 4.74 | 5.4 |
| Min. | 8.0 | 30.0 | 27.26 | 2.35 | .52 | | .28 | |
Real RMSE = 3.77, True SD = 11.06, Separation = 2.94, Person reliability = .90
Model RMSE = 3.62, True SD = 11.11, Separation = 3.07, Person reliability = .90
SE of person mean = .24
Note. n = 2325, Scale of 1 logit = 10.00, Mean item difficulty = 50.00, Person raw …
67 CO students, attrition rates of 60% and 31% respectively, leaving a sample of 104 of the 189 eligible students, an overall attrition rate of 45%.
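The reporting scale described in the analysis (1 logit = 10 points, mean item difficulty anchored at 50) and the Table 1 separation figures can be checked with a short sketch. The sketch assumes the standard dichotomous Rasch model and the usual Winsteps definitions of separation (true SD divided by RMSE) and reliability (G²/(1 + G²)). The helper names are ours. Winsteps works from unrounded values, so the rounded Table 1 inputs reproduce the reported figures only to about the second decimal.

```python
import math

POINTS_PER_LOGIT = 10.0       # reporting scale: 1 logit = 10 points
MEAN_ITEM_DIFFICULTY = 50.0   # items anchored so that mean difficulty = 50


def success_probability(ability: float, difficulty: float = MEAN_ITEM_DIFFICULTY) -> float:
    """Dichotomous Rasch model on the rescaled metric:
    P(correct) = 1 / (1 + exp(-(B - D) / 10)), with B and D in scaled points."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty) / POINTS_PER_LOGIT))


def separation_and_reliability(true_sd: float, rmse: float) -> tuple:
    """Separation G = true SD / RMSE; reliability = G**2 / (1 + G**2)."""
    g = true_sd / rmse
    return g, g ** 2 / (1.0 + g ** 2)


# Engelhard's (2009) .30-logit threshold expressed on this scale.
THRESHOLD_POINTS = 0.30 * POINTS_PER_LOGIT   # 3.0 points

# A person of ability 50 facing items of difficulty 50, 40 and 60.
for d in (50, 40, 60):
    print(f"difficulty {d}: P = {success_probability(50, d):.2f}")   # 0.50, 0.73, 0.27

# A 50% pre-test expectation after the MO and CO mean gains (3.53, 5.02 points).
for gain in (3.53, 5.02):
    print(f"gain {gain}: P = {success_probability(50 + gain):.2f}")  # 0.59, 0.62

# Table 1 rows; the rounded inputs give 2.93 (real) against the 2.94 reported.
for label, true_sd, rmse in (("Real", 11.06, 3.77), ("Model", 11.11, 3.62)):
    g, rel = separation_and_reliability(true_sd, rmse)
    print(f"{label}: separation = {g:.2f}, reliability = {rel:.2f}")
```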
Table 2. Summary statistics for vocabulary and output groups
| Test | Group | n | Mean | SD |
|---|---|---|---|---|
| Pre-test | Vocabulary | 37 | 41.42 | 6.84 |
| | Output | 67 | 42.92 | 8.15 |
| Post-test | Vocabulary | 37 | 44.94 | 7.33 |
| | Output | 67 | 47.94 | 7.09 |
| Gain | Vocabulary | 37 | 3.53 | 7.24 |
| | Output | 67 | 5.02 | 6.31 |
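The effect sizes reported in Table 4 can be approximated from the Table 2 summary statistics. The surviving text does not state which formulas were used, so the ones below are assumptions. The sketch takes d as the mean gain divided by the root mean square of the pre- and post-test SDs, and r as d / sqrt(d² + 4). It compares the groups with a pooled-variance independent-samples t-test on the gains. With these choices it reproduces the tabled values to two decimals: d = .50 and .66, r = .24 and .31, and r = .11 for the difference. It also reproduces the 22% pooled-SD lag and the fall from a 50% to a 46% expectation of success reported in the Results.

```python
import math

# Table 2 summary statistics in scaled points (1 logit = 10 points).
MO = {"n": 37, "pre_sd": 6.84, "post_sd": 7.33, "gain": 3.53, "gain_sd": 7.24}
CO = {"n": 67, "pre_sd": 8.15, "post_sd": 7.09, "gain": 5.02, "gain_sd": 6.31}


def cohens_d(gain: float, pre_sd: float, post_sd: float) -> float:
    """Assumed form: mean gain over the root mean square of the pre/post SDs."""
    return gain / math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)


def d_to_r(d: float) -> float:
    """Common conversion from d to r (equal-n form): r = d / sqrt(d**2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def t_to_r(t: float, df: int) -> float:
    """Effect size r from a t statistic: r = sqrt(t**2 / (t**2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


for name, g in (("MO", MO), ("CO", CO)):
    d = cohens_d(g["gain"], g["pre_sd"], g["post_sd"])
    print(f"{name}: d = {d:.2f}, r = {d_to_r(d):.2f}")   # MO .50/.24, CO .66/.31

# Between-group comparison of gains (MO minus CO).
sp = pooled_sd(MO["n"], MO["gain_sd"], CO["n"], CO["gain_sd"])
diff = MO["gain"] - CO["gain"]                      # about -1.49 points
df = MO["n"] + CO["n"] - 2                          # 102
t = diff / (sp * math.sqrt(1 / MO["n"] + 1 / CO["n"]))
lag = abs(diff) / sp                                # lag as a share of the pooled SD
p_lag = 1 / (1 + math.exp(-diff / 10))              # Rasch expectation after the lag
print(f"lag = {lag:.2f} SD, t({df}) = {t:.2f}, r = {t_to_r(t, df):.2f}, P = {p_lag:.2f}")
```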
Results
The … Logit gains greater than .30 can be considered substantively meaningful, while Hattie (2009, pp. … .09, d = .50) and 5.02 scaled points (.50 logits) for the CO group (d = .66), as shown in Table 4. Gains of these magnitudes mean that a person with a 50% expectation of success on an item in the pre-test would have a 59% (MO) or 62% (CO) expectation of success on an item of the same difficulty in the …
implying an expectation of success falling from 50% to 46%, or a lag of 22% of the pooled standard deviation. Thus, the MO tasks did not produce vocabulary gains that were substantively or statistically significantly greater than those of the CO tasks, justifying rejection of the research hypothesis.
Table 3. Summary statistics for MO and CO groups
| Test | Group | n | Mean | SD |
|---|---|---|---|---|
| Pre-test | MO | 37 | 41.42 | 6.84 |
| | CO | 67 | 42.92 | 8.15 |
| Post-test | MO | 37 | 44.94 | 7.33 |
| | CO | 67 | 47.94 | 7.09 |
| Gain | MO | 37 | 3.53 | 7.24 |
| | CO | 67 | 5.02 | 6.31 |
Table 4. Effect sizes of score gains
| Group | n | Logit | Odds Ratio | r | d |
|---|---|---|---|---|---|
| Combined | 104 | .45* | 61/50 | .29 | .60* |
| MO | 37 | .35* | 59/50 | .24 | .50* |
| CO | 67 | .50* | 62/50 | .31 | .66* |
| Difference | | | 46/50 | .11 | |
* Indicates substantively significant effect size
Discussion and conclusions
The hypothesis that mechanical output (MO) tasks provide greater gains than creative output (CO) tasks was not supported. Both the treatment (MO) and control (CO) groups showed substantively significant gains in vocabulary knowledge. Although the MO group showed smaller gains than the CO group, the difference between them was neither substantively nor statistically significant. However, preparing and administering the MO tasks placed a heavy workload on teachers, and both teachers had the impression that students found the tasks dull, consistent with Folse (2004). The attrition rate of 60% for the MO group, versus 31% for the CO group, was of great concern. It raises the possibility that the dull nature of the MO tasks led higher-aptitude learners to drop out of the MO group at a higher rate. However, consider what equal attrition would have required. For the MO group to then better the CO group's gains by a substantively significant .30 logits, an extra 29% of the MO students, with mean gains of approximately 1.25 logits, would have needed to be retained. An effect size of 1.25 logits means that an expectation of success of 50% on the … an implausibly large reversal. The evidence from this study thus justifies the conclusion that this treatment was not effective for students of this level at this institution.
However, a number of concerns would need to be addressed before wider generalization was warranted. These students had previous exposure to English at high school and took compulsory online vocabulary homework. They were probably taught vocabulary by JTEs, and they had incidental exposure to vocabulary from the coursebooks used by JTEs and NSTs. Any discussion of specific mechanisms of acquisition is therefore highly speculative. It is plausible that CO served as a mechanism to consolidate previously learned knowledge. However, no claim is justified that such incidental exposure is an efficient mechanism for learning previously unknown words. One important direction for future research will therefore be to compare CO and MO tasks for lower frequency vocabulary that students are less likely to encounter incidentally.
The causes of the high attrition rate could not be investigated for this report, so future studies should investigate them qualitatively. It is possible that the CO tasks led to the positive goal appraisals theorized to underlie sustained deep learning (Schumann & Wood, 2004), while MO tasks were perceived as dull … and high attrition. However, many other factors may have contributed to the differential attrition rate, including social effects through which a small number of individuals disproportionately affected the behavior of the group. If this did occur, which the authors consider plausible, the chance assignment of a few exceptionally motivated or unmotivated students who influenced others to drop out or continue attending class may have contributed to the differential attrition. Resolving such questions would require qualitative research far beyond the practical scope of this
investigation, but essential if the achievement or lack of achievement of …
This study also highlights important considerations for teachers seeking to develop classroom tasks based on experimental research findings. The first is awareness of publication bias, whereby studies with positive findings supporting the research hypothesis are emphasized over studies with null results. An intervention found successful in a small number of experimental studies may have failed on numerous other occasions that were not considered worthy of publication. Multiple replications are therefore needed before the relative effectiveness of interventions can be judged. Secondly, findings from experimental studies cannot automatically be assumed to generalize to classroom contexts, nor can classroom studies conducted in one context be assumed to generalize to other contexts. The results of the current investigation support the view that new interventions should be carefully piloted to gather quantitative and qualitative evidence of effectiveness under local conditions before …
References
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model (2nd ed.). London: Lawrence Erlbaum Associates.
Engelhard, G. (2009). Using item response theory and … Psychological Measurement, 69(4), … doi:10.1177/0013164408323240
ETS. (2008). TOEIC Bridge user guide. Retrieved from <ets.org/Media/Tests/TOEIC_Bridge/pdf/TOEIC_Bridge_User_Guide.pdf>.
Field, A. P. (2009). Discovering statistics with SPSS (3rd ed.). London: Sage.
Folse, K. S. (2004). Vocabulary myths. Ann Arbor: The University of Michigan Press.
Hattie, J. A. (2009). Visible learning: A synthesis of over 800 … New York: Routledge.
Holster, T. A., & DeLint, D. F. (2010). Pushed output and vocabulary gains. Paper presented at the JALT …
Laufer, B., & Girsai, N. (2008). …
Linacre, J. M. (2010). A user's guide to Winsteps 3.70.02. Retrieved from <winsteps.com/winman/index.htm?copyright.htm>.
Longman eiwajiten: …
Meara, P. M. (2002). The rediscovery of vocabulary. Second Language Research, 18(4), …
Rott, S., Williams, J., & Cameron, R. (2002). The effect of …
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), …
Schumann, J. H., & Wood, L. A. (2004). The neurobiology of motivation. In J. H. Schumann, S. E. Crowell, N. E. Jones, N. Lee, S. A. Schuchert, & L. A. Wood (Eds.), The neurobiology of learning (pp. …) … Associates.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A step towards second language learning. Applied Linguistics, 16(3), …
Thompson, B. (1999). Statistical significance tests, effect size reporting and the vain pursuit of …
West, M. P. (1953). A general service list of English words. London: Longman, Green & Co.
Trevor Holster has taught English in Japan for over 15 years. His research interests include vocabulary acquisition and peer assessment.
Darcy de Lint has been teaching English in Japan for 20 years. He has research interests in the areas of pushed output, communication strategies and peer assessment.