Which of the following allows us to compare one student's performance on a test to another student's?

Lesson 1.3: Frame of Reference for Interpreting Scores— NRT and CRT

  1. Lesson Objectives
  2. Frames of Reference for Interpreting Test Scores
  3. Norm-referenced Interpretation
  4. Criterion-referenced Interpretation
  5. Show What You Know: Application
Lesson Objectives
  • Define norm-referenced and criterion-referenced interpretations.
  • Classify frames of reference as norm-referenced or criterion-referenced.
  • Define different types of scores: standard scores, percentile rank, grade equivalents, and scaled scores.
  • Interpret student performance and progress using different types of test scores.

Without a frame of reference, a raw score by itself does not tell us how much a student knows or what he or she can or cannot do within a content area. A test score must be referenced or compared to something to give meaning to it. For example, a student receives a score of 88 on a test with no other information provided. Some questions that may come to the student’s mind are:

  • What is the maximum score?
  • How did others do?
  • Is it a percentage? A raw score?
  • Is it an improvement from a previous test?
Frames of Reference for Interpreting Test Scores

Four frames of reference for interpreting test scores are explained here: ability-, growth-, norm-, and criterion-referenced interpretations. The table below summarizes them. For this course, we will concentrate on two of them: norm-referenced (NRT) and criterion-referenced (CRT) interpretations, as these are the ones commonly used in school reports.

| Frames of Reference for Interpreting Test Scores | Interpretation provided by this reference | Condition that must be present for this reference to be useful |
|---|---|---|
| Ability-referenced | How are students performing relative to what they are capable of doing? | Requires good measures of what students are capable of doing (their maximum possible performance). |
| Growth-referenced | How much have students changed or improved relative to what they were doing earlier? | Requires pre- and post-measures of performance that are highly reliable. |
| Norm-referenced | How well are students doing with respect to what is typical or reasonable? | The group to whom students are being compared must be clearly understood. |
| Criterion-referenced | What can or cannot students do? | Content domain that was assessed must be well defined. |

Source: Oosterhof (2003), p. 12

Achievement tests, such as the FCAT, can provide two types of information: 1) norm-referenced, which shows the relative ranking of students among other students; and 2) criterion-referenced, which describes what students can do in reference to the Sunshine State Standards without referring to how other students perform.

Gronlund (2003) points out, “strictly speaking, the terms norm-referenced and criterion-referenced refer only to the method of interpreting results. Thus, both types of interpretation could be applied to the same assessment” (p. 27). An example would be: Jack surpassed 90% of the students (norm-referenced interpretation) by naming 13 of the 26 letters (criterion-referenced interpretation) on the DIBELS letter naming test.
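To make the point concrete, here is a minimal sketch of how one raw score can be read both ways. The function name `interpret_score` and the ten norm-group scores are invented for illustration; they are not DIBELS data or DIBELS scoring rules.

```python
def interpret_score(raw, max_points, norm_group_scores):
    """Return (percent_correct, percent_surpassed) for one raw score."""
    # Criterion-referenced reading: how much of the content did the student get right?
    percent_correct = 100 * raw / max_points
    # Norm-referenced reading: what share of the norm group scored lower?
    below = sum(1 for s in norm_group_scores if s < raw)
    percent_surpassed = 100 * below / len(norm_group_scores)
    return percent_correct, percent_surpassed


# Hypothetical norm group: letters named (out of 26) by ten other students.
norm_group = [2, 4, 5, 6, 7, 8, 9, 10, 12, 20]

pct_correct, pct_surpassed = interpret_score(13, 26, norm_group)
print(f"Criterion-referenced: named {pct_correct:.0f}% of the letters")
print(f"Norm-referenced: surpassed {pct_surpassed:.0f}% of the norm group")
```

With these made-up numbers, the same 13 letters read as "half of the alphabet" in one frame and "ahead of most peers" in the other, which is why the frame of reference has to be stated alongside the score.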

Criterion-referenced interpretations are particularly relevant for instructional purposes because we can best help students improve by targeting what tasks they cannot perform. Norm-referenced interpretations, on the other hand, are most useful when our concern is with ranking students for such purposes as screening, classification, and grouping.

Norm-referenced Interpretation

Norm-referenced interpretations compare a student’s score to a range of previously observed performances, usually the performance of other students. How much a student knows is determined by his or her relative ranking within the norm group (the reference group). Returning to the sample test score of 88, suppose that the student’s score was at the 88th percentile on the reading portion of the FCAT Norm-Referenced Test (NRT). A norm-referenced interpretation would mean that the student scored better than 88 percent of students in the norm group. To say, “A student is third in reading comprehension in a class of 50,” is also a norm-referenced interpretation. Percentile ranks are the most commonly used scores in norm-referenced interpretations.
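The two norm-referenced statements above (a percentile rank and a rank within a class) can be sketched as simple counts. This is an illustration under stated assumptions, not how any test publisher computes its scores: the helper names, the `inclusive` flag, and the score lists are all hypothetical. The flag is included because some reports define a percentile rank as the percent scoring at or below a score rather than strictly below it.

```python
def percentile_rank(score, norm_scores, inclusive=False):
    """Percent of the norm group scoring below (or at or below) `score`."""
    if inclusive:
        count = sum(1 for s in norm_scores if s <= score)
    else:
        count = sum(1 for s in norm_scores if s < score)
    return 100 * count / len(norm_scores)


def class_rank(score, class_scores):
    """Rank within a class, where rank 1 is the highest score."""
    return 1 + sum(1 for s in class_scores if s > score)


# Hypothetical norm group of 100 students with scores 1 through 100.
norm_scores = list(range(1, 101))
print(percentile_rank(89, norm_scores))                  # scored better than this percent
print(percentile_rank(89, norm_scores, inclusive=True))  # scored at or above this percent

# Hypothetical class of 50 students with scores 1 through 50.
class_scores = list(range(1, 51))
print(class_rank(48, class_scores))
```

Neither number says anything about what the student can actually do; both depend entirely on who is in the comparison group.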

Many of the tests you will use for screening, diagnosis, and outcome assessment are standardized tests whose scores are interpreted normatively. When making screening decisions about whether students are at risk of failing to make grade-level progress, you will most likely rely on norm-referenced interpretations. For example, suppose you discover that several sixth graders at your school scored at the 25th percentile on the norm-referenced version of the FCAT (FCAT NRT) at the end of their fifth-grade year. Your interpretation would be that they scored better than 25% of similar fifth graders in the norming sample. As such, they may need immediate intensive instruction in reading in order to meet sixth-grade academic requirements.

To ensure that scores are interpreted appropriately, test norms should be relevant, representative, comparable, and adequately described.

Example: The FCAT NRT is a test of reading comprehension and math problem solving for which norm-referenced scores are reported for Florida students. Students’ national percentile rank (NPR) indicates the percent of students who earned the same score or lower on a nationally normed sample. Students who score at the national average earn an NPR of 50. For a sample student report of the FCAT NRT, click the following link: http://www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf. To see a statewide comparison of students’ scores on the FCAT NRT, click this link: http://fcat.fldoe.org/pdf/fc_StatewideComparisonNRT2004.pdf

Criterion-referenced Interpretation

Criterion-referenced interpretations allow us to compare a student’s performance to a well-defined content domain (e.g., strands), rather than to rank students or compare them to a norm group. Students’ scores tell us their knowledge and performance levels within a particular content area. They provide an absolute interpretation (i.e., what a student can and cannot do) and not a comparative interpretation (i.e., how a student’s performance compares to others). To make an adequate criterion-referenced interpretation, the content area must be well defined so that you can describe a student’s achievement level from the content sub-scores (strands). Typically, percent scores and raw scores are used in determining the score needed for passing or mastery.

Example: The FCAT SSS (Sunshine State Standards) is a criterion-referenced test. It assesses student achievement on the knowledge and skills described in the state curriculum framework, the Sunshine State Standards (the content domain). One of the scores reported is the number of points a student earns in each content area (Words and Phrases in Context, Main Ideas, Plot and Purpose, Comparisons and Cause/Effect, and Reference and Research). See: http://www.firn.edu/doe/sas/fcat/pdf/fc_ufr2005.pdf
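As a rough sketch of a criterion-referenced reading, the snippet below turns points earned in each content area into a percent score and compares it to a mastery cutoff. The point totals and the 70% cutoff are assumptions made up for this illustration; they are not FCAT values, and `strand_report` is a hypothetical helper.

```python
def strand_report(points_earned, points_possible, mastery_cutoff=70.0):
    """Map each strand to (percent_score, met_cutoff)."""
    report = {}
    for strand, possible in points_possible.items():
        percent = 100 * points_earned[strand] / possible
        # No other student's score is consulted: the comparison is to the cutoff only.
        report[strand] = (percent, percent >= mastery_cutoff)
    return report


# Hypothetical points for one student in three content areas.
possible = {"Words and Phrases in Context": 10, "Main Ideas": 10, "Reference and Research": 10}
earned = {"Words and Phrases in Context": 7, "Main Ideas": 8, "Reference and Research": 5}

report = strand_report(earned, possible)
for strand, (percent, met) in report.items():
    status = "met cutoff" if met else "needs instruction"
    print(f"{strand}: {percent:.0f}% ({status})")
```

A report like this points to the content area where the student needs help, which is the instructional use of criterion-referenced interpretations described above.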

Major Differences between NRT and CRT Frames of Reference

| Differences | Norm-referenced | Criterion-referenced |
|---|---|---|
| Interpretation | How well students are doing with respect to other students. | What students can and cannot do with respect to a content domain. |
| Required Conditions | To whom students’ scores are being compared must be clearly understood, and the norm group must be well defined. | The content domain that is being referenced must be clearly defined. |

Most tests used in schools are either norm-referenced or criterion-referenced. As previously stated, norm-referenced interpretations compare students’ scores to a reference group (e.g., norming group, other 8th graders, other students in a class), and criterion-referenced interpretations specify what knowledge and skills students learned within a specified content domain.

As a reference, the definitions of scores contained in FCAT reports are provided on page 37 of the following website: http://www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf.

Harcourt provides a useful site of general measurement terms at: Glossary of Measurement Terms

In which type of test is a student's performance compared with that of other students?

What are norm-referenced tests? These tests measure a student's performance in comparison to other students. The test takers being compared are the same age and answer the same question paper. The tests measure whether a student has performed better or worse than other test takers.

How do you measure student performance?

Formal tests, quizzes, and exams are the traditional methods for assessing student achievement. Surveys are the traditional method to solicit feedback from students about the course and the instructor. Survey questions are not assigned a point value and Surveys are not graded.

What type of test might compare student performance to specific standards or benchmarks?

Criterion-referenced tests compare a person's knowledge or skills against a predetermined standard, learning goal, performance level, or other criterion. With criterion-referenced tests, each person's performance is compared directly to the standard, without considering how other students perform on the test.

Which test is designed to rank and compare students in relation to one another?

Norm-referenced tests are designed to rank test takers on a “bell curve,” or a distribution of scores that resembles, when graphed, the outline of a bell: a small percentage of students perform poorly, most perform at an average level, and a small percentage perform well.