Basic Statistics Critical Thinking Test: Reliability and Validity Issues

This study examines the validity and reliability of the Statistics Critical Thinking Test (SCTT) for Institute of Teachers Education students, using the Content Validity Index (CVI). The assessment involved evaluation by three experts and a pilot test with 30 students selected through purposive sampling. The instrument has 30 items covering two main constructs: evaluation and interpretation. The analysis yielded a CVI of 0.99, and the instrument's reliability (Kuder-Richardson 20) is 0.71. These results indicate that the instrument has good validity and reliability. The SCTT therefore has great potential as a measurement instrument, and it is recommended for measuring the critical thinking level of other college students.


Introduction
In recent years, critical thinking has become popular in educational circles. For many reasons, educators, especially teachers and lecturers, have become very interested in teaching thinking skills of various kinds, as opposed to teaching information and content. Many educators say that they already try their best to teach "how to think", doing so implicitly in the course of teaching content. However, results from the Trends in International Mathematics and Science Study (TIMSS) show that our students' thinking skills are below the international average (Ministry of Education Malaysia [MOE], 2013).
Critical thinking is a way of thinking about any subject, content or issue in which the thinker improves the quality of his or her thinking by skilfully taking charge of the structures inherent in thinking and imposing intellectual standards upon them (Paul, Fisher & Nosich, 1993; Ghadi, Bakar, Alwi, & Talib, 2013; Fani, 2011). According to the MOE (1994) and Fani (2011), thinking skills are divided into two groups: critical thinking skills and creative thinking skills. These skills are exercised when people use the mind to define, generalize, categorize, analyze, predict, interpret, explore and solve problems. They help individuals voice opinions, generate ideas, criticize, form mental pictures, draw conclusions and relate the information they receive.
The importance of critical thinking has long been recognized, and the Malaysian government's revision of the curriculum reflects this. Educators therefore have to apply critical thinking skills during teaching and learning in schools and institutes. According to Jackson and Newberry (2012), nearly 90% of respondents claimed that critical thinking was a primary objective of their teaching, which shows how important critical thinking is in the eyes of educators.
One of the elements identified for development in students is critical thinking skills. The Ministry of Education (MOE) recognizes this by making these skills one of the student aspirations to be achieved under the Malaysia Education Blueprint 2013-2025 (MOE, 2013). For lecturers at a Teacher Training Institute, critical thinking skills are very important, so building and developing an instrument to measure them is also important for gauging the critical thinking level of future teachers. According to UNESCO (2000), measuring critical thinking skills is necessary to improve the quality of education. If a test is an instrument for measuring teaching and curriculum, then the best way to know the quality of teaching is to develop a good test (Yeh, 2001). To know the level of students' critical thinking skills in the classroom, the instrument must take into account the particular subject in which these skills are taught (Gelerstein, Rio, Nussbaum, & Chiuminatto, 2016).
There are many critical thinking tests, such as the Watson-Glaser Critical Thinking Appraisal, the Cornell Critical Thinking Test and the California Critical Thinking Skills Test (Ennis, 1993). However, there are few critical thinking test instruments in statistics. The researchers therefore built this test for the context of Basic Statistics; its format is suitable for college or university students. The test has 30 items. The components of critical thinking skills are inference, deduction, evaluation (assessment), interpretation, and identifying assumptions (Marlina & Shaharom, 2010; Renjith & James, 2015). Renjith and James (2015) define each component as follows:
a. Inference tests the extent to which conclusions are drawn from observable or proper facts.
b. Deduction tests students' ability to draw a conclusion from the statements given.
c. Interpretation evaluates whether proposed conclusions are logical, that is, whether they follow beyond reasonable doubt from the information given.
d. Evaluation assesses whether the arguments given are strong or weak.
e. Identifying assumptions means identifying the assumptions underlying the statements given.
This test consists of two sub-tests: interpretation and evaluation. These two constructs were chosen because the developed module applies these two aspects. Both elements are also seen as having a positive relationship with elements of leadership. In addition, Marlina and Shaharom (2010) found that interpretation has a meaningful relationship with students' academic achievement. The selection of both elements as sub-elements of critical thinking skills is therefore justified. The instrument also involves statistical elements.
Statistics is an important branch of mathematics whose use is widespread in fields such as the social sciences, physics, engineering, medicine, and business. It also pervades daily life (Johnson & Kuby, 2012; Nur'Azah & Mazlan Mohamad, 2000; Yeoh, Mohd Afifi, Ket, & Narmal, 2015). The subject is significant in all fields of education, at every level from pre-school to higher education, and undergraduate programs in all majors, in the country and abroad, include it (Chew & Dillon, 2014; Kalaian & Kasim, 2014). The selection of statistics as an element of this test is therefore justified. Accordingly, the instrument should be tested and analysed for validity and reliability.

Content Validity
The content validity of the critical thinking skills test (basic statistics) was assessed using a quantitative approach, with the content validity of each item calculated through the Content Validity Index (CVI). The CVI is an empirical way of quantifying experts' judgements to validate an instrument (Lynn, 1986; Lawshe, 1975; Polit & Beck, 2006). The method is easy to administer and implement, and it saves cost and time (Mohd Effendi Mohd Matore & Ahmad Zamri Khairani, 2015). Hence many researchers around the world use it to establish the content validity of their instruments. Two methods are commonly practiced, based on the work of two well-known researchers: Lawshe (1975) and Lynn (1986). Table 1 compares the procedures of the two methods. Performing this analysis is therefore important for researchers who build instruments, because it ensures that the content validity of an instrument is measured and verified using an accurate and correct procedure.
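To illustrate how the two families of indices are computed, the sketch below assumes Lynn's 4-point relevance scale (1 = not relevant to 4 = highly relevant), on which a rating of 3 or 4 counts as "relevant": the item-level CVI (I-CVI) is the share of experts giving such a rating, and the scale-level S-CVI/Ave is the mean of the I-CVIs. For comparison it also computes Lawshe's content validity ratio, CVR = (n_e - N/2)/(N/2), where n_e experts out of N rate an item "essential". The rating scale used in this study is not stated here, and the ratings below are hypothetical.

```python
from statistics import mean


def item_cvi(ratings):
    """I-CVI: share of experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_ave(matrix):
    """S-CVI/Ave: mean of the item-level CVIs across all items."""
    return mean(item_cvi(item) for item in matrix)


def lawshe_cvr(n_essential, n_experts):
    """Lawshe's content validity ratio, ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical ratings: one row per item, one column per expert (three experts).
ratings = [
    [4, 4, 3],
    [3, 4, 4],
    [4, 2, 4],  # one expert judges this item not relevant
]

for i, item in enumerate(ratings, start=1):
    print(f"item {i}: I-CVI = {item_cvi(item):.2f}")  # 1.00, 1.00, 0.67
print(f"S-CVI/Ave = {scale_cvi_ave(ratings):.2f}")  # 0.89
print(f"CVR, 2 of 3 experts rate 'essential' = {lawshe_cvr(2, 3):.2f}")  # 0.33
```

Lynn's approach works from relevance ratings and averages them across items, whereas Lawshe's CVR is computed per item from a count of "essential" judgements, so the two are not interchangeable.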

Face Validity
Face validity refers to a subjective assessment of the reasonableness of an instrument. It ensures that each item in the instrument is visible, clear and unambiguous (Oluwatayo, 2012). It can be judged by appointing specialists in the field (Lynn, 1986; Lawshe, 1975; DeVon et al., 2007); accordingly, the researcher appointed three evaluators. According to Oluwatayo (2012), the experts need to consider several aspects: the sentence structure of the items, the clarity of each item, spelling, spacing, the instrument instructions, the reasonableness of each item in relation to the objective it measures, readability, and format.
Face validity can also be established by administering the instrument to respondents whose characteristics are similar to those of the sample (Mohd Effendi Mohd Matore & Ahmad Zamri Khairani, 2015). This administration is known as a pilot test. The pilot test should be conducted like the real study, following the same steps and procedures used to collect the real data. During the pilot test, the researchers should note any items that respondents do not understand and any spelling mistakes. Respondents who do not understand the meaning of an item can also ask the researchers for clarification. These two aspects increase the face validity of the instrument.

Research Method
The study used a quantitative approach with a correlational design. This approach was chosen because all the data were quantitative, that is, measurable, so the design is appropriate. Data analysis consisted of calculating the validity and reliability of the critical thinking skills test, and the instrument was piloted to test its validity and reliability. The study procedure is summarized in Figure 1.
Figure 1. Research procedure: developing and adopting instrument items → expert validity → CVI analysis → pilot test.

The validity of the instrument is examined to ensure that the instrument really measures what it should measure (Cohen, Manion, & Morrison, 2007; Hair, Anderson, Tatham, & Black, 1998; Jackson, 2009). This can be done by appointing experts in the field. Two types of validity are addressed: face validity and content validity (Oluwatayo, 2012; Faezah Abd. Ghani & Mazlan Aris, 2012). In this study, the process began with a discussion between the researcher and the supervisor, which led to the decision to appoint three experts to check and evaluate every item in the instrument. This is in line with Lynn (1986), who suggested that at least three experts should be appointed to evaluate an instrument. Next, the experts' consent was sought: the researchers contacted the experts via email and phone, and a letter of appointment was then written for each expert.
The researchers then set up an appointment with each expert to carry out the instrument assessment, and this process was performed for all three experts. Each expert was given the critical thinking test and an assessment instrument (Appendix 1). The assessment instrument was developed to record the experts' agreement with the critical thinking skills test items. The experts expressed agreement with the items built for the critical thinking test. An analysis was then made using the CVI method pioneered by Lynn (1986). Each item was then checked by a language specialist, a Malay language specialist. Following both processes, the test was tested for reliability.
The reliability process began with a pilot study of the instrument, in which 31 students took part. The pilot sample size was based on Chua's (2006) suggestion that 30 samples are adequate for pilot testing. Chua (2006) also outlines several steps for the pilot study.
First, conduct the pilot study using the same method as the actual study. Second, note feedback on any confusion that arises. Chua's (2006) guide is in line with the views of Peat, Mellis, Williams, and Xuan (2002), for whom these two steps are mandatory during pilot studies to enhance the internal validity of the instrument. Peat and colleagues also stress that researchers should record the time the sample takes to answer the instrument. Other steps include discarding unnecessary, unclear or hard-to-understand questions. It may also be necessary to shorten and revise questions and pilot them again, and to review the responses, noting questions that were not answered as expected.
The pilot study began by gathering all the students in the classroom. The students were then briefed on the purpose of the data collection and reminded to ask the researcher about any questions or concerns regarding an item. Any questions raised were noted, and the items concerned were flagged for improvement. The test papers were then collected and reviewed. Data from the pilot study were analyzed to obtain the reliability index, while the notes taken were assessed and used to improve the instrument. The instrument was then revised before being used in the actual study.

Results and Discussion
All experts gave positive comments on the items in the instrument, as the Content Validity Index (CVI) analysis shows. The CVI was calculated manually following Lynn (1986). The analysis shows that the average CVI is 0.99, within 0.01 of the maximum value of 1. Polit and Beck (2006) suggested that the CVI of each item should not fall below the minimum value set by Lynn. Given the number of experts who evaluated the instrument, item 12 would have to be removed. Instead of removing it, however, the researchers discussed it again with expert A and then corrected and improved it. The expert's second rating met Lynn's requirement (DeVon et al., 2007). The instrument is therefore categorized as having good content validity.
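The reported average of 0.99 and the decision about item 12 can be checked with a little arithmetic. The sketch below assumes, since the individual ratings are not reported, that all three experts rated every item relevant except that one expert did not endorse item 12, and it applies the commonly cited rule that with three to five experts an item's I-CVI must be 1.00 (Lynn, 1986; Polit & Beck, 2006). Under that assumption the average I-CVI is 89/90, which rounds to the reported 0.99.

```python
from fractions import Fraction

N_ITEMS = 30
N_EXPERTS = 3

# Assumed counts (not reported in the study): number of experts, out of 3,
# rating each item as relevant. Only item 12 falls short, with 2 of 3.
relevant_counts = {item: N_EXPERTS for item in range(1, N_ITEMS + 1)}
relevant_counts[12] = 2


def i_cvi(count, n_experts=N_EXPERTS):
    """Item-level CVI as an exact fraction."""
    return Fraction(count, n_experts)


def minimum_i_cvi(n_experts):
    """With three to five experts, every expert must agree (I-CVI = 1.00)."""
    if 3 <= n_experts <= 5:
        return Fraction(1)
    raise ValueError("threshold for this panel size is not encoded in this sketch")


i_cvis = {item: i_cvi(count) for item, count in relevant_counts.items()}
s_cvi_ave = sum(i_cvis.values()) / N_ITEMS
flagged = [item for item, value in i_cvis.items() if value < minimum_i_cvi(N_EXPERTS)]

print(f"S-CVI/Ave = {float(s_cvi_ave):.3f}")  # 0.989, i.e. 0.99 to two decimals
print("items below the minimum:", flagged)    # [12]

# Second round: the revised item 12 is rated relevant by all three experts.
relevant_counts[12] = N_EXPERTS
print("item 12 I-CVI after revision:", float(i_cvi(relevant_counts[12])))  # 1.0
```

Exact fractions are used so that the comparison against the 1.00 threshold is not affected by floating-point rounding.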
Data for the Kuder-Richardson 20 (KR-20) analysis were collected by administering a pilot test. All pilot test procedures were followed so that the data analysed would be accurate and of good quality. The resulting reliability coefficient is 0.71.
The coefficient was computed using MS Excel. Chua (2006), Fraenkel et al. (2012) and Jackson (2009) agree that a value of this size indicates good internal consistency.
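The KR-20 calculation performed in the spreadsheet can also be sketched in a few lines, using the standard formula KR-20 = (k/(k-1)) * (1 - Σ p_i q_i / σ²), where k is the number of items, p_i is the proportion of examinees answering item i correctly, q_i = 1 - p_i, and σ² is the variance of the total scores. The sketch assumes the population variance for σ²; the text does not say which variance formula the Excel worksheet used, and the response data below are hypothetical, not the study's pilot data.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for items scored 1 (right) or 0 (wrong).

    responses: one row per examinee, one 0/1 score per item.
    The total-score variance uses the population formula (an assumption;
    some texts and spreadsheets use the sample variance instead).
    """
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    # Sum of p*q over items: p = proportion correct, q = 1 - p.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)

    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical pilot data: 5 examinees x 4 items (not the study's data).
pilot = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(pilot):.2f}")  # 0.80
```

For the actual instrument, `responses` would hold one row of 30 right/wrong scores per pilot student.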
This article discusses issues related to the validity and reliability of the Statistics critical thinking skills test. Every researcher must consider several things when establishing validity and reliability, so that the items in a constructed or adapted instrument measure what they should measure when the study is conducted.
Among the procedures researchers should implement is determining validity and reliability using the CVI and Kuder-Richardson 20. For validity, the researchers suggest appointing at least five assessors to evaluate an instrument. In the researchers' experience of the evaluation process, with only three assessors it is quite difficult to meet the criteria of Lynn and Lawshe (DeVon et al., 2007). Researchers who still want to use three evaluators are encouraged to use the Fuzzy Delphi method, so that the items they build truly have content validity. In this study, panel A evaluated the instrument in two rounds because panel A disagreed with item 12; the researchers therefore improved item 12 until the panel agreed with it. Item 12 initially read: "Pelajar ponteng sekolah untuk pergi kerja sampingan, maka kehadiran mereka bertambah baik" ("Students skip school to go to part-time jobs, so their attendance improves"). The panel changed it to "Jika mereka ponteng sekolah diberi kerja sampingan, kehadiran mereka ke sekolah bertambah baik" ("If those who skip school are given part-time work, their school attendance improves"), restructuring the sentence to make the mathematical reasoning clearer. The CVI in the second round was 1, which meets the criterion proposed by Polit and Beck (2006), Lynn (1986) and Lawshe (1975). For reliability, a pilot study should be conducted and the Kuder-Richardson 20 coefficient calculated. Overall, the instrument has very good validity and strong reliability. The researchers therefore recommend it to other researchers, teachers and lecturers, who can use it to assess the level of their students' critical thinking skills at institutes, colleges or universities.

Conclusion
This report describes the validity and reliability process for an instrument developed through item adaptation and design. The instrument aims to assess critical thinking skills in basic statistics through two main aspects: interpretation and evaluation. It is an alternative or supplement to other thinking skills instruments. However, more in-depth research could refine the instrument, particularly by covering other aspects of critical thinking skills. This would improve its usability in the future, meaning that it could be used with different respondents.
11. Di bandar itu, penguatkuasaan peraturan secara ketat terhadap kehadiran ke sekolah tidak mencegah 85 peratus daripada murid tidak hadir sekali-sekala semasa penggal sekolah. (In that town, strict enforcement of school attendance rules did not prevent 85 percent of pupils from being absent occasionally during the school term.)

Table 1. Comparison of the Lawshe (1975) and Lynn (1986) content validity procedures

Kuder-Richardson 20 Reliability
KR-20 reliability is interpreted in the same way as Cronbach's alpha, based on the size of the coefficient. According to Jackson (2009), reliability of 0.70 to 1.00 is strong, 0.30 to 0.69 is medium, and 0.29 or below is weak.
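The bands above translate directly into a small lookup. The sketch assumes that a value falling between the published bands (for example 0.695, between 0.69 and 0.70) belongs to the lower band; the source does not specify how such values are treated.

```python
def interpret_reliability(r):
    """Label a reliability coefficient with the bands attributed to Jackson (2009).

    Strong: 0.70-1.00, medium: 0.30-0.69, weak: 0.29 and below.
    Values falling between published bands (e.g. 0.695) go to the lower band;
    that boundary handling is an assumption of this sketch.
    """
    if r > 1.0:
        raise ValueError("a reliability coefficient cannot exceed 1.0")
    if r >= 0.70:
        return "strong"
    if r >= 0.30:
        return "medium"
    return "weak"


print(interpret_reliability(0.71))  # strong (the pilot KR-20 reported in this study)
```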
… (Fraenkel, Wallen, & Hyun, 2012) … measurement (Ariffin, 2003). The Kuder-Richardson 20 coefficient was computed to examine the reliability of this instrument. The Kuder-Richardson test is used for tests or instruments whose items are scored only as right or wrong (Fraenkel, Wallen, & Hyun, 2012).
b. Determine the reliability of the Basic Statistics critical thinking skills test.