Grammar and vocabulary testing scores in L2 reading

The language testing literature suggests that L2 grammar and vocabulary scores correlate strongly and positively with L2 reading comprehension. In a large-scale meta-analysis, Jeon and Yamashita (2014) found strong correlations between each of these two variables and reading comprehension: r=.85 for grammar and r=.79 for vocabulary across different tasks. Following this paradigm, the current study examines the relationship between performance on reading skills and on language use (grammar and vocabulary) when both are integrated into a single testlet/paper. The hypothesis is tested by examining scores in reading skills and language use (which includes grammar and vocabulary items) extracted from a large authentic sample in Greece with Italian as the target language. Further, any correlation between the acquisition or learning of these sub-skills is assessed. The two variables were found to correlate significantly, although the correlation is not very strong. In that respect, this work confirms findings of earlier studies reported in the literature review. The analysis gives rise to several hypotheses concerning possible interpretations of the results. This study wishes to contribute to the discussion by adding results from a Greek context using Italian as the target L2.


Introduction
Traditional language proficiency tests have typically assessed the performance of test-takers on separate, fragmented language testlets, despite Alderson's (1984) arguments against segmentation, in which he questioned whether L2 reading abilities are purely reading-related or involve more general language skills. Many researchers have contributed to the discussion theoretically, empirically, or experimentally. In certain cases, they provided evidence that scores in the two variables (grammar and vocabulary) correlate strongly with each other and therefore with L2 reading comprehension (see below).
More specifically, Shiotsu & Weir (2007, p.99) used componential research approaches to identify explanatory skill factors or components involved in the reading process. Jung (2009) conducted a secondary review of the literature on L2 reading (focusing on the various factors involved, i.e., orthography, vocabulary, grammar, background knowledge, and metacognitive strategies) and recommended a list of topics which, in his view, needed further consideration in the investigation of the relationship between L2 grammatical knowledge and reading comprehension. In a recent study, Jeon and Yamashita (2014, p.162) examined the same issue through a component skills-based approach to reading. The researchers identified differences in areas of individual ability and further meta-analytically investigated "whether the individual ability difference in L2 reading comprehension is better explained by the variance observed in L2 language knowledge (e.g. vocabulary, grammar)." The relationship between these two subskills (among others) was investigated through scores on separate tests, as in most language test cases, and not in a single testlet (lexicogrammatical knowledge), as it is measured in this study. Ideally, two tests measuring the two constructs separately would need to be at the same level, requiring a pilot before the final test application, in which an item analysis would reveal the appropriate items for the job. The tests employed to measure the two constructs would also need to be of the same type; i.e., if a multiple-choice test were selected for the measurement of one construct, then the same test type would have to be used for the other construct as well. The distinction between the two tested variables (vocabulary and grammar) in language knowledge is discussed in more detail in the following chapter.
Jeon and Yamashita's (2014) meta-analysis confirmed the hypothesized strong correlations among these variables, despite earlier claims by Zhang (2012, p.559), citing Bernhardt (2005), who stated that "the interplay of vocabulary and grammatical knowledge in L2 textual comprehension remains unclear." In the present study, the relationship between scores in the "reading" and "grammar and vocabulary" sections is investigated within one examination through the same official test, the National Foreign Language Exam System in Greece. In support of Jeon and Yamashita's (2014) hypothesis, which was not language-specific, this study wishes to add to the discussion through findings from a) a single authentic and not purposefully constructed test, b) an authentic test environment/setup, c) an evaluation of the lexicogrammatical construct in a non-segmented form, d) a large sample, and e) one additional target language (Italian). A review of the literature is presented below and its findings are discussed. This is followed by a short presentation of several different tests evaluating grammar and vocabulary in Italian, including the one used for this study. The method of data collection is then presented, together with the analysis and major conclusions, and the contribution of the findings to the related literature is offered in the last chapter (Discussion and Conclusion).

Review of Literature
Vocabulary and Grammar in L2 Reading
Grammar and vocabulary knowledge are both multidimensional, composed of a variety of different elements and sub-skills to be acquired, as Zhang (2012, p.559) pointed out. These subcomponents have not been well recognized: word production, word reception, vocabulary depth or breadth, word-formation, meaning, and use are some of the vocabulary terms for the types of knowledge one can encounter when reviewing the related literature. Correspondingly, grammar consists of many different types of morphosyntactic knowledge, such as tense, aspect, subject-verb agreement, articles, and prepositions, in addition to syntactic knowledge and awareness. The relationship between the two independent variables (grammar and vocabulary) was investigated, and further claims as to their impact on L2 reading (the dependent variable) were examined. Three questions were pursued: a) whether scores in the two independent variables are related, b) whether performance in these two is further related to the dependent variable (L2 reading), and finally c) whether scores in vocabulary and grammar tests could be used as predictors of L2 reading skills.
Distinguishing between vocabulary and grammar knowledge is difficult, as Alderson (2005, p.172) suggested, pointing to word-formation and compound words, which are situated somewhere in between the two, as a morphological example. Celce-Murcia & Larsen-Freeman (1999, p.1), along the same lines, pointed out that the term 'lexicogrammar' would include both words and grammatical structures. In agreement with the above statements, Liao (2007, p.39) added that "lexico-grammatical knowledge involves lexical form and meaning as well as syntactic form" in correspondence to Purpura's (2004) model of grammatical knowledge, "where knowledge of words and structures involves two dimensions: form and meaning." Liao (ibid) went on to describe the term 'grammatical knowledge' as "another term often used interchangeably with lexico-grammatical knowledge." A correlation analysis in Shiotsu & Weir (2007, p.122) indeed showed a significant and strong correlation (Pearson r at .85) between scores in the syntax (representing grammar ability) and vocabulary sections. This could be interpreted as evidence for the initial claim that the two competencies are indivisible; however, in the same study, Shiotsu & Weir (2007, p.106), following Urquhart & Weir (1998), used "grammar" in the traditional sense to refer either to syntax or to syntax and grammatical morphology. The "lexicogrammatical factor" was proposed by Purpura as a predictor that could estimate and predict reading ability to a high degree. In his studies (1997; 1999), this "factor" was measured by word formation, sentence formation, vocabulary, and grammar tests (Purpura used the term lexico-grammatical "ability" rather than "knowledge").
Six years before the integrated lexicogrammatical approach to vocabulary and grammar was introduced, Alderson (1993) investigated one of the independent variables separately and claimed that the role of grammar in L2 reading was significant. This was also confirmed in a more recent study by Shiotsu & Weir (2007), who again investigated the two variables independently and found grammatical knowledge to be a stronger predictor of reading ability than vocabulary. Referring to grammatical competence, Jung (2009, p.33) stated that "most studies addressing the role of grammar in L2 reading explored the issue by measuring the correlation between learners' grammatical knowledge and their L2 reading comprehension ability." Notice that in most of the empirical or experimental studies mentioned above, the statistical test used was a correlation analysis. This test measures the relationship between two or more variables and how they vary together (in this case, scores in different subjects on tests, where the correlation coefficient r measures the strength and direction of the linear relationship between them). A positive correlation means that a high score on one test is associated with a high score on the other, while a negative correlation means the opposite. However, a correlation can only indicate the presence or absence of a relationship and not the nature of that relationship; thus it should not be interpreted as causation. In that respect, the score on one variable may not be understood as a predictor of the other unless a linear regression analysis is carried out.
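The distinction drawn above between a correlation coefficient and a regression analysis can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not taken from any of the studies cited; the sketch only shows that r is a symmetric measure of linear association, whereas a regression line yields a directional prediction equation.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def regression_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Hypothetical scores for six test-takers (illustration only).
grammar = [12, 15, 9, 18, 14, 11]
reading = [20, 24, 17, 27, 21, 19]

r = pearson_r(grammar, reading)
slope, intercept = regression_line(grammar, reading)
print(f"r = {r:.3f}")
print(f"predicted reading = {intercept:.2f} + {slope:.2f} * grammar")
```

Swapping the two arguments of `pearson_r` leaves r unchanged, while swapping them in `regression_line` gives a different equation, which is why a correlation alone does not establish which variable predicts the other.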
A large-scale study on this topic by Jeon and Yamashita (2014), who meta-analytically examined 4,419 studies, concentrated on 29 of those studies that investigated L2 reading and vocabulary and 16 studies that investigated reading and grammar. This meta-analysis allowed them to quantify how strong these relationships were on average (r=.85 for grammar and r=.79 for vocabulary). They concluded (p.187) that grammar and vocabulary knowledge are equally important correlates of L2 reading comprehension. They also found (p.192) a higher mean correlation between reading and an embedded items test (r=.92) than between reading and a test featuring discrete items (r=.71).
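For readers unfamiliar with how correlations from several studies can be averaged, the following is a minimal sketch of one common approach: transform each r with Fisher's z, weight by sample size, and transform back. This is an illustrative assumption and not necessarily the procedure Jeon and Yamashita (2014) followed; the function name and the study values are hypothetical.

```python
from math import atanh, tanh


def pooled_correlation(studies):
    """Sample-size-weighted mean correlation via Fisher's z transformation.

    studies: list of (r, n) pairs, one per study, with n > 3.
    """
    weights = [n - 3 for _, n in studies]  # inverse variance of Fisher's z
    z_mean = sum(w * atanh(r) for (r, _), w in zip(studies, weights)) / sum(weights)
    return tanh(z_mean)


# Hypothetical (r, n) values for three studies (illustration only).
example_studies = [(0.70, 120), (0.82, 300), (0.76, 85)]
print(f"pooled r = {pooled_correlation(example_studies):.3f}")
```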
It may be possible to conclude [in response to question (a) stated above, at the end of the first paragraph in this chapter], based on the findings in the respective studies, that scores in vocabulary and grammar are indeed statistically related to each other [Shiotsu & Weir (2007)]. Notice, however, that it is the scores in different subjects that are found to correlate strongly with each other. By no means does this correlation suggest that one of the two variables (grammar or vocabulary) is a predictor of the other, nor that these variables are one construct instead of two. As for question (b), it may be concluded that, despite the separate statistically significant correlations registered between each of the two independent variables and L2 reading ability (with slightly different r scores), any causal effect on the dependent variable is not yet supported by sufficient evidence. This issue requires further research. The results remain uncertain, owing to the scarcity of research findings and to the statistical tests selected for use (see above). This is noted by Jung (2009), who adds a quantity factor (weight) to the prediction factor and, in agreement with Shiotsu & Weir (2007), claimed that due to diverse findings, "the respective weight of vocabulary and grammar in L2 reading still remains inconclusive." More specifically, Shiotsu & Weir (2007) pointed out that "the literature on the relative contribution of the grammar and vocabulary knowledge to reading performance, is too limited to provide convincing evidence for supporting one or the other of the two predictors and would benefit significantly from further empirical research." This study wishes to contribute to the discussion by adding results from a Greek context with Italian as the target L2.

Grammar and vocabulary test items in Italian testing
Grammar and vocabulary knowledge in official certification systems are typically assessed either in a separate paper or as part of the Reading component. In proficiency tests/certification papers in a Greek context for Italian as an L2, the reading and language use/structure components are examined by four certification systems: CELI, CILS, PLIDA and the National Foreign Language Exam System in Greece (NFLES-Gr). A short description of these tests is offered in the following paragraphs.
Each CELI (Certificato di Conoscenza della Lingua Italiana, from the Università di Perugia) examination tests four macro-skills: reading, writing, listening and speaking. At Level 3 (B2), there is also a separate language structure component that assesses grammatical and lexical elements. CILS (Certificazione di Italiano come Lingua Straniera, from the Università di Siena) examinations test the skills of reading, writing, listening, and speaking. These examinations have a separate component/paper that assesses language structure (Test di analisi delle strutture di comunicazione). CILS comprises four basic levels, from CILS Uno (1) to CILS Quattro (4), corresponding to B1 to C2 of the CEFR. There are also CILS A1 and CILS A2 levels, which have been developed in four different versions: Modulo Bambini, Modulo Adolescenti, Modulo per l'Integrazione in Italia, and Modulo Adulti.
The Dante Alighieri Society is the organiser of the PLIDA tests, with academic approval from the Sapienza University of Rome. PLIDA (Progetto Lingua Italiana Dante Alighieri, from the Società Dante Alighieri) tests the skills of reading, writing, listening, and speaking in four different papers. There is no separate paper and there are no specific tasks that assess grammar or vocabulary knowledge as distinct components. PLIDA tests can be taken only by non-native speakers of Italian, at the six levels of the CEFR (A1-C2).
The NFLES-Gr tests the four skills of reading, listening, writing, and speaking; in addition, some language structure elements are integrated into the first paper. More specifically, reading comprehension and language use are tested in a common paper titled "reading comprehension and language awareness" (comprensione scritta e consapevolezza linguistica). Notice that the term "language awareness" here has a different meaning from the one typically assigned in the related literature: "the development in learners of an enhanced consciousness of and sensitivity to the forms and functions of language" (Carter, 2003). All items are presented in context. The vocabulary measurement methods used in the NFLES-Gr involve responses requiring word selection and/or word production in context. Usually this context is at the level of reading passages or sentences. Only one task maintains a more discrete-type character; this task, the crossword, is also of the constructed-response type. All items in this task focus on specific linguistic items in reading passages. Notice that all the above tests still proceed in a segmented manner.

Materials and Methods
This study examines the relationship between reading and language use (grammar and vocabulary) test scores, integrated in a single testlet/paper. Test scores are extracted from a large authentic sample in Greece with Italian as the target language.

Participants
Participants were 1,922 test-takers who completed three tests in official settings (400 test-takers at the A1-A2 levels, 1,294 at the B1-B2 levels, and 228 at the C1 level). The L1 of the test-takers was Greek.

Materials
In the reading comprehension and language use testlet of the NFLES-Gr, there are various items testing different sub-skills or language elements. Three testlets were examined, one at each level.
The A Level Paper consisted of 50 items in total (α = .85), of which 30 were reading comprehension items (α = .74) and 20 were language use items (α = .81). These Cronbach's alpha coefficients, which measure internal consistency, indicate very good to excellent reliability.
The B Level Paper consisted of 60 items in total (α = .91), of which 30 were reading comprehension items (α = .92) and 30 were language use items (α = .63). The alpha value for the reading comprehension items indicates excellent reliability, while for the language use items the α value lies at the margin of acceptable reliability. Notice that in Prova 8 and 9 the Cloze exercises were of the constructed-response type (participants were asked to provide their own answers rather than selecting from the options offered).
The C1 Level Paper consisted of 60 items in total (α = .75), of which 23 were reading comprehension items (α = .65) and 37 were language use items (α = .62). Internal consistency could be considered satisfactory, although both subscale values lie below the acceptable reliability level (α = .7). As in the B Level paper above, in the Cloze exercises students provided an answer rather than selecting one.
The SPSS statistical package and MS Excel were used for the test analyses.
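As a minimal sketch of how a Cronbach's alpha value of the kind reported above can be computed from an item-response matrix, consider the following. The function name and the response data are hypothetical; the values in this study were obtained with SPSS, not with this code.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix of item scores.

    responses: list of rows, one per test-taker; each row holds one score
    per item (1 = correct, 0 = incorrect for dichotomous items).
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of six test-takers to four items (illustration only).
example_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(f"alpha = {cronbach_alpha(example_responses):.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why a subscale whose items behave inconsistently yields a lower value than the whole paper.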

Design and Procedure
Rather than constructing test items for research purposes, source scores were collected from the NFLES-Gr for the Italian language, including the A, B and C1 levels of the Common European Framework of Reference for Languages (CEFR). The selection of these tests was deliberate, as they represent a good example of a widespread language proficiency test in Greece, and the nature of its mediation activities presupposes that the test-takers are native speakers or proficient users of Greek. The Reading Comprehension testlet also includes items dedicated to language use, in which grammar and vocabulary are tested under the title language awareness. The papers were completed in authentic conditions by the test-takers, and scores were analysed in terms of item difficulty and item discrimination indexes. All the items, whether testing reading skills or language use through grammar and vocabulary, were analysed.

Results
The appropriate test to investigate the stated hypothesis of a possible relationship between scores in the two variables (Reading Comprehension/RC on one side and vocabulary and grammar on the other, in a test titled Language Use/LU) is the Pearson r correlation coefficient. However, it seemed necessary to follow a two-level approach in this study. First, a preliminary analysis was conducted to evaluate the suitability of the instruments (the tests themselves) for examining the hypotheses. This analysis aimed to determine whether these tests were indeed the appropriate modus operandi to perform the task. This preliminary stage was executed in two steps. The first step was to conduct an item analysis and determine the item difficulty index, which is used to establish whether or not items are working well and contribute to the distinction between proficient and non-proficient students. For example, if an item is answered correctly or wrongly by all test-takers, it has no discriminating value. The overall goal is to investigate whether items discriminate usefully. The second step was to calculate the Pearson point-biserial correlation (r pbi), which measures the strength and direction of the association between one continuous variable (ratio or interval, which in this study is the score on the test) and one dichotomous/binary variable (e.g. yes/no, true/false, or correct/incorrect, the last of which is the case in this study).
Second, a correlation analysis was carried out using the Pearson r correlation coefficient, as suggested above, which tests relationships between two continuous variables (i.e. test scores). The results of both approaches are initially presented in three sections, each related to the language level of the tests.
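The two preliminary steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the response matrix are hypothetical, the difficulty index is taken to be the proportion of correct answers, and each item is correlated with the uncorrected total score (the item itself is included in the total).

```python
from math import sqrt
from statistics import mean, pstdev


def item_difficulty(item):
    """Proportion of test-takers who answered the item correctly."""
    return mean(item)


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total scores."""
    p = mean(item)
    s = pstdev(totals)
    if p in (0, 1) or s == 0:
        return 0.0  # answered the same way by everyone: no discrimination
    m1 = mean(t for i, t in zip(item, totals) if i == 1)  # mean total, correct
    m0 = mean(t for i, t in zip(item, totals) if i == 0)  # mean total, incorrect
    return (m1 - m0) / s * sqrt(p * (1 - p))


# Hypothetical responses of five test-takers to four items (illustration only).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]
totals = [sum(row) for row in responses]
for index, item in enumerate(zip(*responses), start=1):
    print(index, round(item_difficulty(item), 2), round(point_biserial(item, totals), 2))
```

In this toy data the first item is answered correctly by everyone, so its point-biserial value is zero, which mirrors the remark above that such an item has no discriminating value.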
A Level Scores: Preliminary analysis and correlations between the scores
The difficulty index was estimated at 0.86 for the RC items and 0.61 for the LU items, showing that the RC items were answered correctly by more test-takers than the LU items. The Pearson point-biserial correlation (r pbi, used here to confirm the discrimination index) in both tests was found to be above the .20 'good test' level (RC items, ...). A paired samples correlation test that followed showed that the scores in the two variables (RC and LU) correlate significantly (Pearson r at 0.540, p<0.01), although the strength is only moderate (the closer the Pearson r is to 1 or -1, the stronger the relationship). This finding suggests that, despite the surface differences between the two scores, the results are significantly associated.
At the B level, the difficulty index was estimated at 0.78 for the RC items and 0.47 for the LU items. The r pbi for both RC and LU was found to be 0.29. Correlating the scores of the RC and LU items, the Pearson r was estimated at 0.639 (p<0.01), showing that the relationship is statistically significant and strong (closer to 1 than in the previous analysis of A level scores).
At the C1 level, the difficulty index for the RC items was estimated at 0.70, while for the LU selected-response items it reached only 0.50. The r pbi was 0.29 for the former and 0.25 for the latter. A paired samples correlation showed that the scores in the two variables (RC and LU) correlate significantly (Pearson r at 0.493, p<0.01). The strength was found to be moderate.
Since this statistically significant correlation is repeated at all levels, it becomes evident that the two variables are indeed linearly related (when a score is high in one variable, it is also high in the other), even though this correlation is not very strong (below the 0.800 level). Notice that RC scores are higher than those in the LU tests (in which vocabulary and grammar items are included) at all levels. This may lead to the testing of further hypotheses, as stated in the next chapter.
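To give a sense of the precision of the three reported coefficients, approximate 95% confidence intervals can be sketched with Fisher's z transformation. This is an illustrative addition, not part of the original analysis, and it assumes that every participant at a level (400, 1,294 and 228, as listed under Participants) contributed to that level's correlation; the function name is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)                              # standard error of z
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)


# Reported r per level; the n values are an assumption (see lead-in).
reported = {"A": (0.540, 400), "B": (0.639, 1294), "C1": (0.493, 228)}
for level_name, (r, n) in reported.items():
    low, high = correlation_ci(r, n)
    print(f"{level_name}: r = {r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

None of the intervals computed this way comes close to zero, which is consistent with the significance reported at every level, while the smaller C1 sample yields the widest interval.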

Discussion and Conclusion
As in the studies reported in the literature review, this study's initial hypothesis was largely supported by the evidence (an authentic test measuring lexicogrammatical constructs in a single testlet and in Italian). Grammar and vocabulary scores in the LU tests prove to be significant correlates of RC, as in Jeon and Yamashita's (2014) work, although the correlations were not found to be as strong as in their study, which measured the two variables independently.
A few further hypotheses may need to be investigated: Are L2 RC skills always at a higher level in every learner than skills in LU (in which grammar and vocabulary are measured)? Do RC skills develop sooner, at an earlier stage, than LU skills? This study may suggest that the answers to the above questions are "yes," although it would be too soon to make such a general claim. Exploring test-takers' opinions on dealing with RC and with grammar and vocabulary items, and recording their answering strategies for these questions, would illuminate this issue. It should be pointed out that RC items require a more general and universal knowledge of language and skill transfer, in which the test-taker has the chance to infer the meaning of a vocabulary item, sentence, or paragraph from its linguistic environment if s/he is not equipped with the necessary knowledge. Grammar and vocabulary items, on the other hand, require specific prescriptive knowledge and accuracy. The development of these linguistic elements in L2 learning seems, on the surface, to overlap to a substantial extent, and thus one should expect scores in vocabulary and grammar knowledge tested in context to be closer to scores in reading skills, as was confirmed by the correlation indexes. However, Pearson et al. (2007), as cited by Jeon & Yamashita (2014, p.173), argue that lexical knowledge (as measured by discrete items on a test) is a distinct construct from inferencing or reasoning (as measured by embedded items on a test).
Finally, the impact of RC items may not be considered significant for the overall score and the decisions made in norm-referenced tests, as the test-takers' rankings remained the same (high Pearson correlations). However, in criterion-referenced situations "where there exists a predetermined criterion for the students to meet, low scores would hurt those at the borderline" (Farhady, 1996, p.222), and thus using more LU items than RC items would affect those with a marginal (pass/non-pass) score. A well-structured criterion-referenced language proficiency test could balance a certain proportion of LU items, on which registered performance seems to be lower, with another proportion of RC items, on which test-takers' performance is higher.