Rumah Anthares: LANGUAGE TESTING II

written by:

YURIN PRILLIA (A1D208084)

FARLIA (A1D208048)

ASTRINA LATAEWA (A1D208012)

HERMIN PUJI HANTARI (A1D208056)

SITI HALMIANTI (A1D208030)

INTRODUCTION

This paper presents three main topics: how to make a good test, validity, and how to calculate data manually. We chose these topics because they play an important role in language testing, especially in planning, constructing, and scoring a test. We hope this paper adds to readers' knowledge of language testing.

First, this paper discusses how to make a good test. We put it at the beginning of the discussion because it is crucial to know before making a test. A good test must, in the end, fulfill its function according to the plan determined beforehand. For the test to fulfill its function, certain conditions must be met; otherwise the test will be poor because it does not serve its function. The requirements are that the test be valid, reliable, and practical. They are discussed further in the body of our paper.

Next, we present the kinds of validity. Validity is defined as the extent to which an instrument measures what it purports to measure. For example, a test that is used to screen applicants for a job is valid if its scores are directly related to future job performance. There are many different types of validity, so validity is a very complex concept. The word "validity" is very often joined with a modifier to express a specific meaning. For example, concurrent validity means something different from predictive validity. The types of validity receive our attention in this paper. We discuss in turn: (1) criterion validity; (2) content validity; (3) construct validity; (4) face validity.

Last, this paper presents how to calculate data manually. This is important because measurement is at the core of doing research. Measurement is the assignment of numbers to things. In almost all research, everything has to be reduced to numbers eventually. Precision and exactness in measurement are vitally important. The measures are what are actually used to test the hypotheses. A researcher needs good measures for both independent and dependent variables. Measurement consists of two basic processes called conceptualization and operationalization, then an advanced process called determining the levels of measurement, and then even more advanced methods of measuring reliability and validity. Measurement can be done manually or with software that runs automatically, SPSS for example. In this paper we focus on manual calculation.

Therefore, we hope that after reading our paper the reader will have a clear picture of how to make a good test, how to distinguish the kinds of validity, and how to calculate data manually. This paper may be far from perfect, so we welcome criticism and suggestions from readers who find shortcomings in it, because such criticism will be very useful for us.

What do we want to learn?

We want to write a paper about:

1. The requirements of a good test
2. Different types of test validity
3. How to calculate research data manually

Planning of Project

1. Dividing the work: we divided the task of looking for material on what we want to learn:

- Hermin Pujihantari and Halmianti look for the different types of test validity

- Farlia looks for the requirements of a good test

- Yurin Prillia and Astrina Lataewa look for how to calculate research data manually

2. Looking for resources: we looked for our material in different resources, namely:

- Internet

We found some of our material on the internet.

- Library

We looked for resources in the library of the Teacher Training and Educational Faculty, Perpustakaan Daerah, and Taman Baca.

- Asking friends/experts

Besides the internet and the library, we also asked a senior who has knowledge of our material.

3. We collected the material from the resources. Each member had to submit all of the material that she found.

4. We discussed it with our group members. After submitting the material, we shared what we had found with the group.

5. We made a summary or paper. After sharing, we drew conclusions from our discussion and then wrote a summary.

6. Share with another group.

7. Ask proofreaders. The first proofreading came from our group members. They suggested:

· I recommend that the topics be written in the order in which they occur, from planning to processing the results. In this case, how to make a good test comes first, then the kinds of validity, and last how to calculate data manually.
· The point of view should be first person plural (we) because this paper is written by a group, not an individual. (See Chapter 2, "What make a good test?": "Almost of every teacher find it difficult to make a good test for their students including me." (paragraph 1, line 3); "I am sure that it is going to be useful for everyone." (line 5).)

DISCUSSION

1. The Requirements of a Good Test

What makes a good test?

Everyone understands that teachers are expected to make good tests for various purposes. Almost every teacher, including me, finds it difficult to make a good test for their students. To make a good test, we should ask: "Is the test valid, reliable, and practical?" Once we know the ingredients of a good test, our responsibility is to make the effort to apply them in our classes. I am sure that this will be useful for everyone.

Choosing a Test Format:

Before you begin to write test questions, you need to determine which type of test format you are going to utilize. The most common test formats include multiple choice questions, true or false questions, fill in the blank questions and open-ended questions. Choose the format that best measures the student's cognitive ability in the given subject matter.

For example, if you want the student to compare and contrast an issue taught during a history lesson, open ended questions may be the best option to evaluate the student's understanding of the subject matter. If you are seeking to measure the student's reasoning skills, analysis skills or general comprehension of a subject matter, consider selecting primarily multiple choice test questions. Or, for a varied approach, utilize a combination of all available test question types so that you can appeal to the learning strengths of any student on an exam.

Valid:

If you are going to make a test, the first thing to consider is whether the test is valid. Make sure that the test covers what the students have already learned. If the test covers material they have not learned, the test is not valid. So, to be a good test maker, make sure that the test covers what the students have already learned.

Validity is a measure of a test's usefulness. Scores on the test should be related to some other behavior, reflective of personality, ability, or interest. For instance, a person who scores high on an IQ test would be expected to do well in school or on jobs requiring intelligence. A person who scores high on a scale of depression should be diagnosed as depressed by mental health professionals who assess him. A validity coefficient reflects the degree to which such relationships exist. Most tests have validity coefficients (correlations) of up to .30 with "real world" behavior. This is not a high correlation, and emphasizes the need to use tests in conjunction with other information. Relatively low correlations mean that some people may score high on a scale of schizophrenia without being schizophrenic and some people may score high on an IQ test and yet not do well in school. Correlations as high as .50 are seen between IQ and academic performance.

Reliable:

Reliability is also required for a good test. How can a test be reliable for students? Many teachers do not give the same marks for the same answer, which is why they cannot make a good test. So, if you are a teacher, make sure that you give the same marks for the same answer. Some teachers say that marking a multiple-choice test does not seem very difficult, but marking essays does. To give consistent marks for essays, I would recommend that everyone read books on teaching.

Reliability is a measure of the test's consistency. A useful test is consistent over time. As an analogy, think of a bathroom scale. If it gives you one weight the first time you step on it, and a different weight when you step on it a moment later, it is not reliable. Similarly, if an IQ test yields a score of 95 for an individual today and 130 next week, it is not reliable. Reliability also can be a measure of a test's internal consistency. All of the items (questions) on a test should be measuring the same thing -- from a statistical standpoint, the items should correlate with each other. Good tests have reliability coefficients which range from a low of .65 to above .90 (the theoretical maximum is 1.00).

Practical:

Practicality also plays an important part in making a good test. When you are considering how to make a test, it is very useful to check that the test is easy to administer and mark. Whoever marks the students' answer sheets, the students must get the same marks for the same answer. So, to be a good teacher, you need to make a practical test.

Standardization is the process of trying out the test on a group of people to see the scores which are typically obtained. In this way, any test taker can make sense of his or her score by comparing it to typical scores. This standardization provides a mean (average) and standard deviation (spread) relative to a certain group. When an individual takes the test, she can determine how far above or below the average her score is, relative to the normative group. When evaluating a test, it is very important to determine how the normative group was selected. For instance, if everyone in the normative group took the test by logging into a website, you are probably being compared to a group which is very different from the general population.
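As a minimal sketch of how a test taker can locate a score relative to the normative group, the example below converts a raw score into a standard (z) score. The function name `z_score` and the normative scores are hypothetical, and the use of the population standard deviation is our assumption, not something the source specifies.

```python
from statistics import mean, pstdev


def z_score(score, norm_scores):
    """Return how many standard deviations `score` lies above (+) or
    below (-) the mean of the normative group."""
    m = mean(norm_scores)
    sd = pstdev(norm_scores)  # assumption: population SD of the norm group
    if sd == 0:
        raise ValueError("normative scores have no spread")
    return (score - m) / sd


# Hypothetical normative group (illustrative numbers only)
norm_group = [60, 70, 80, 90, 100]
z = z_score(90, norm_group)
print(round(z, 2))
```

A positive result means the score is above the group average and a negative result means it is below; the comparison is only as meaningful as the normative group is representative.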

2. Different Types of Test Validity

Test validity is an indicator of how much meaning can be placed upon a set of test results. Validity refers to the accuracy of an assessment. Also, if a test is valid, it is almost always reliable. Test validity incorporates a number of different validity types, including criterion validity, content validity, construct validity, and face validity. If a research project scores highly in these areas, then the overall test validity is high. In order to have confidence that a test is valid (and therefore that the inferences we make based on the test scores are valid), all of these kinds of validity evidence should be considered.

Criterion Validity

Criterion validity assesses whether a test reflects a certain set of abilities. To measure the criterion validity of a test, researchers must calibrate it against a known standard or against itself. Comparing the test with an established measure is known as concurrent validity; testing it over a period of time is known as predictive validity. It is not necessary to use both of these methods, and one is regarded as sufficient if the experimental design is strong. One of the simplest ways to assess criterion-related validity is to compare the test with a known standard.

a. Concurrent validity is a measure of how well a particular test correlates with a previously validated measure. It is commonly used in social science, psychology and education.

Example:

Researchers give a group of students a new test, designed to measure mathematical aptitude. They then compare this with the test scores already held by the school, a recognized and reliable judge of mathematical ability. Cross referencing the scores for each student allows the researchers to check if there is a correlation, evaluate the accuracy of their test, and decide whether it measures what it is supposed to. The key element is that the two methods were compared at about the same time. If the researchers had measured the mathematical aptitude, implemented a new educational program, and then retested the students after six months, this would be predictive validity.

b. Predictive validity is a measure of how well a test predicts abilities, such as measuring whether a good grade point average at high school leads to good results at university. Predictive validity involves testing a group of subjects for a certain construct, and then comparing them with results obtained at some point in the future.

Example:

The most common use for predictive validity is inherent in the process of selecting students for university. Most universities use high-school grade point averages to decide which students to accept, in an attempt to find the brightest and most dedicated students. In this process, the basic assumption is that a high-school pupil with a high grade point average will achieve high grades at university. Quite literally, there have been hundreds of studies testing the predictive validity of this approach. To achieve this, a researcher takes the grades achieved after the first year of studies, and compares them with the high school grade point averages. A high correlation indicates that the selection procedure worked well; a low correlation signifies that there is something wrong with the approach.

Content Validity

This type of validity is important to make sure that the test or questionnaire that is prepared actually covers all aspects of the variable that is being studied. If the test is too narrow, then it will not predict what it claims. Content validity, sometimes called logical or rational validity, is the estimate of how much a measure represents every single element of a construct. Content validity establishes how well a test compares to the real world.

For example, a school test of ability should reflect what is actually taught in the classroom.

Construct Validity

Construct validity defines how well a test or experiment measures up to its claims. It refers to whether the operational definition of a variable actually reflects the true theoretical meaning of a concept. Construct validity is a type of statistical validity that ensures that the actual experimentation and data collection conform to the theory that is being studied. A questionnaire regarding public opinion must reflect construct validity to provide an accurate picture of what people really think about issues. There are essentially two types of construct validity:

a. Convergent validity – this validity ensures that if the required theory predicts that one measure be correlated with the other, then the statistics confirm this.

b. Divergent or discriminant validity – this validity ensures that if the required theory predicts that one variable doesn’t correlate with others, then the statistics confirm this.

Face Validity

This is related to content validity and is a quick starting estimate of whether the given experiment actually mimics the claims that are being verified. In other words, face validity measures whether or not the survey has the right questions in order to answer the research questions that it aims to answer. Face validity, as the name suggests, is a measure of how representative a research project is ‘at face value,’ and whether it appears to be a good project.

3. How to Calculate the Data Manually (without using SPSS)

a. Mean

c. Mode

To find the mode (Mo), the writer determines the score that appears most often.

d. Range

To calculate the range, the writer uses the formula:

Range (Rn) = Maximum score – minimum score

e. Standard Deviation

The standard deviation provides a reference of a group of scores to the normal curve; that is, it describes the variability in a group of scores.
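The descriptive statistics named above can be worked through by hand or checked with a short script. The sketch below is illustrative only: the function name `describe` and the score list are hypothetical, and the standard deviation uses the sample formula (dividing by n - 1), which is our assumption because the source does not show its formula.

```python
from collections import Counter
from math import sqrt


def describe(scores):
    """Compute the mean, mode(s), range, and standard deviation of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n

    # Mode (Mo): the score(s) that appear most often
    counts = Counter(scores)
    top = max(counts.values())
    modes = sorted(s for s, c in counts.items() if c == top)

    # Range (Rn) = maximum score - minimum score
    rng = max(scores) - min(scores)

    # Assumption: sample standard deviation (divide by n - 1)
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return {"mean": mean, "modes": modes, "range": rng, "sd": sd}


# Hypothetical scores (illustrative numbers only)
scores = [60, 70, 70, 80, 90]
stats = describe(scores)
print(stats)
```

The modes are returned as a list because a set of scores can have more than one most frequent value.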

Instrument Testing

a. Validity

Product moment

To interpret the coefficient, we can use the scale below:

0.80 < r ≤ 1.00 = very high validity

0.60 < r ≤ 0.79 = high validity

0.40 < r ≤ 0.59 = medium validity

0.20 < r ≤ 0.39 = low validity

0.00 < r ≤ 0.19 = very low validity

r ≤ 0.00 = invalid

(Suharsimi Arikunto, 2005: 263)
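The product moment coefficient can be computed by hand from sums of scores, squares, and cross-products. The sketch below uses the standard raw-score form of the Pearson product-moment formula; the function names and the data are hypothetical. The scale above leaves small gaps between bands (for example between 0.19 and 0.20), so the thresholds in `interpret_validity` are our assumption about how to handle such values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation, raw-score formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("one of the variables has no variance")
    return numerator / denominator


def interpret_validity(r):
    """Map r onto the scale above. Assumption: a value falling in a gap
    between two bands is assigned to the lower band."""
    if r > 0.80:
        return "very high"
    if r > 0.60:
        return "high"
    if r > 0.40:
        return "medium"
    if r > 0.20:
        return "low"
    if r > 0.00:
        return "very low"
    return "invalid"


# Hypothetical data: item scores (x) and total scores (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), interpret_validity(r))
```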

b. Reliability

· To identify the reliability of the test, the researcher can use KR-21

After that, we use the Spearman-Brown formula:

0.00-0.20 Bad

0.21-0.40 Fair

0.41-0.70 Good

0.71-1.00 Very good
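For readers who want to check the two reliability formulas named above, here is a minimal sketch using their standard textbook forms: KR-21 as (k / (k - 1)) * (1 - M(k - M) / (k * s^2)), and the Spearman-Brown correction for a split-half correlation as 2r / (1 + r). The function names and the data are hypothetical, and the use of the population variance in KR-21 is our assumption because the source does not show its formula.

```python
def kr21(total_scores, k):
    """Kuder-Richardson formula 21.

    total_scores: each test taker's total score on a test of k items,
    every item scored 0 or 1.
    """
    n = len(total_scores)
    if n < 2 or k < 2:
        raise ValueError("need at least two test takers and two items")
    m = sum(total_scores) / n
    # Assumption: population variance of the total scores (divide by n)
    variance = sum((s - m) ** 2 for s in total_scores) / n
    if variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * variance))


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return (2 * r_half) / (1 + r_half)


# Hypothetical data: five test takers on a 10-item test
totals = [2, 4, 6, 8, 10]
reliability = kr21(totals, k=10)
full_test = spearman_brown(0.6)
print(round(reliability, 4), round(full_test, 2))
```

The two functions are shown separately because the source names both formulas without stating how the result of one feeds into the other.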

Interpretation of level of difficulty

0.00-0.30 Difficult

0.31-0.70 Medium

0.71-1.00 Easy
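The source does not show how the difficulty value is obtained. A common definition, assumed here, is the proportion of test takers who answer an item correctly. The sketch below computes that proportion and maps it onto the scale above; the function names and the responses are hypothetical.

```python
def difficulty_index(responses):
    """Proportion of correct answers for one item (1 = correct, 0 = wrong)."""
    if not responses:
        raise ValueError("no responses given")
    return sum(responses) / len(responses)


def interpret_difficulty(p):
    """Map p onto the scale above. Assumption: a value falling between
    two bands (e.g. 0.305) is assigned to the higher band."""
    if p <= 0.30:
        return "difficult"
    if p <= 0.70:
        return "medium"
    return "easy"


# Hypothetical responses of ten students to one item
item = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p = difficulty_index(item)
print(p, interpret_difficulty(p))
```

Note that a higher index means an easier item, because more students answered it correctly.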

Hypothesis Testing

To test the hypothesis, the researcher used a one-tailed t-test with the following formula:

CONCLUSION

From our discussion we conclude that:

The requirements of a good test are:

First, before a good test is written, the test format should be chosen. Second, a good test must be valid. Validity tells us whether the test covers what the students have already learned. Third, a good test must be reliable. To make your test reliable, make sure that you give the same marks for the same answer. Fourth, a good test should be practical. Before making a test, we should consider whether it is easy to administer and mark.

There are four kinds of test validity: criterion validity, content validity, construct validity, and face validity.


Sunday, 11 December 2011