Faisal Mustafa, Heri Apriadi


Designing a reliable reading test is difficult, and following the standard test design steps is time-consuming; yet reliable tests matter greatly to both teachers and test centers. This research therefore proposes an alternative, time-saving method of designing a reading test to measure EFL students' proficiency. The method uses a standardized test as a template for both the texts and the question types. To evaluate its effectiveness, a reading test was designed with the method, its reliability was measured, and the result was compared with the reliability of the reading section of the paper-based TOEFL (PBT TOEFL). The results suggest that the test was highly reliable (85%), very close to the reliability of the ETS-designed PBT TOEFL (86%). In addition, the scores obtained correlated significantly with the scores achieved on an ETS-designed test (91%). Test makers are therefore advised to develop reading tests by using a standardized test such as the PBT TOEFL as a template, which allows both the reliability testing and the revision stages to be bypassed.
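The abstract reports a reliability level and a score correlation but does not say which coefficients were used. As an illustration only, the sketch below assumes dichotomously scored (right/wrong) multiple-choice items and uses the Kuder-Richardson 20 (KR-20) coefficient for internal-consistency reliability and Pearson's r for the correlation with another test. The function names and the toy data are hypothetical, not the authors' procedure or data.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores.

    Assumption: rows are test takers, columns are items, scored 1 (correct) or 0.
    """
    n = len(responses)        # number of test takers
    k = len(responses[0])     # number of items
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Proportion of test takers answering each item correctly (p); q = 1 - p.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores, consistent with the p*q terms.
    var_total = pvariance([sum(row) for row in responses])
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def pearson_r(x, y):
    """Pearson correlation between two lists of scores for the same test takers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical answers of five test takers to four items.
    answers = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    totals = [sum(row) for row in answers]
    # Hypothetical scores of the same test takers on a second, reference test.
    reference = [8, 6, 4, 2, 0]
    print(round(kr20(answers), 3), round(pearson_r(totals, reference), 3))
```

For this toy matrix the hand calculation gives p = 0.8, 0.6, 0.4, 0.2, a sum of pq of 0.8, and a total-score variance of 2, so KR-20 = (4/3)(1 - 0.8/2) = 0.80. A reliability of "85%" in the abstract would correspond to a coefficient of 0.85 on this scale.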


Keywords: Reading test; test design; language assessment; standardized test; TOEFL



