Lesson 1.3: Frame of Reference for Interpreting Scores: NRT and CRT
- Lesson Objectives
- Frames of Reference for Interpreting Test Scores
- Norm-referenced Interpretation
- Criterion-referenced Interpretation
- Show What You Know: Application
Lesson Objectives
- Define norm-referenced and criterion-referenced interpretations.
- Classify frames of reference as norm-referenced or criterion-referenced.
- Define different types of scores: standard scores, percentile rank, grade equivalents, and scaled scores.
- Interpret student performance and progress using different types of test scores.
Without a frame of reference, a raw score by itself does not tell us how much a student knows or what he or she can or cannot do within a content area. A test score must be referenced or compared to something to give meaning to it. For example, a student receives a score of 88 on a test with no other information provided. Some questions that may come to the student’s mind are:
- What is the maximum score?
- How did others do?
- Is it a percentage? A raw score?
- Is it an improvement from a previous test?
Four frames of reference are used for interpreting test scores: ability-, growth-, norm-, and criterion-referenced interpretations. The table below summarizes them. For this course, we will concentrate on two of them: norm-referenced (NRT) and criterion-referenced (CRT) interpretations, as these are the ones commonly used in school reports.
Frames of Reference for Interpreting Test Scores
Frame of reference | Interpretation provided by this reference | Condition that must be present for this reference to be useful |
Ability-referenced | How are students performing relative to what they are capable of doing? | Requires good measures of what students are capable of doing; their maximum possible performance. |
Growth-referenced | How much have students changed or improved relative to what they were doing earlier? | Requires pre- and post-measures of performance that are highly reliable. |
Norm-referenced | How well are students doing with respect to what is typical or reasonable? | To whom students are being compared must be clearly understood. |
Criterion-referenced | What can or cannot students do? | Content domain that was assessed must be well defined. |
Oosterhof (2003), p. 12
Achievement tests, such as the FCAT, can provide two types of information: 1) norm-referenced, which shows the relative ranking of students among other students; and 2) criterion-referenced, which describes what students can do in reference to the Sunshine State Standards without referring to how other students perform.
Gronlund (2003) points out, “strictly speaking, the terms norm-referenced and criterion-referenced refer only to the method of interpreting results. Thus, both types of interpretation could be applied to the same assessment” (p. 27). An example would be: Jack surpassed 90% of the students (norm-referenced interpretation) by naming 13 of the 26 letters (criterion-referenced interpretation) on the DIBELS letter naming test.
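The two readings of Jack's score can be sketched in a few lines of Python. This is only an illustration: the norm-group scores are invented (they are not DIBELS norms), and `percent_correct` and `percent_surpassed` are hypothetical helper names.

```python
def percent_correct(raw_score, max_score):
    """Criterion-referenced view: share of the content domain answered correctly."""
    return 100 * raw_score / max_score


def percent_surpassed(raw_score, norm_scores):
    """Norm-referenced view: percent of the norm group scoring below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)


# Hypothetical norm group of 10 students (letters named out of 26).
norm_group = [2, 4, 5, 6, 7, 8, 9, 10, 12, 20]

jack = 13
print(percent_correct(jack, 26))            # 50.0
print(percent_surpassed(jack, norm_group))  # 90.0
```

The same raw score of 13 yields both statements: Jack named half of the letters, and he did better than 9 of the 10 students in this made-up comparison group.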
Criterion-referenced interpretations are particularly relevant for instructional purposes because we can best help students improve by targeting what tasks they cannot perform. Norm-referenced interpretations, on the other hand, are most useful when our concern is with ranking students for such purposes as screening, classification, and grouping.
Norm-referenced Interpretation
Norm-referenced interpretations compare a student’s score to a range of previously observed performances, usually the performance of other students. How much a student knows is determined by his or her relative ranking within the norm group (the group of reference). Returning to the sample test score of 88, suppose that the student’s score was at the 88th percentile on the reading portion of the FCAT Norm-Referenced Test (NRT). A norm-referenced interpretation would mean that the student scored better than 88 percent of students in the norm group. To say, “A student is third in reading comprehension in a class of 50,” is a norm-referenced interpretation. Percentile ranks are the most commonly used scores in norm-referenced interpretations.
Many of the tests you will use for screening, diagnosis, and outcome assessment are standardized tests whose scores are interpreted in a normative fashion. When making screening decisions about whether students are at risk of failing to make grade-level progress, you will most likely rely on norm-referenced interpretations. For example, suppose you discover that several sixth graders at your school scored at the 25th percentile on the norm-referenced version of the FCAT (FCAT NRT) at the end of their fifth-grade year. Your interpretation would be that they scored better than 25% of similar fifth graders in the norming sample. As such, they may need immediate intensive instruction in reading in order to meet sixth-grade academic requirements.
To ensure that scores are interpreted appropriately, test norms should be relevant, representative, comparable, and adequately described.
Example: The FCAT NRT is a test of reading comprehension and math problem solving for which norm-referenced scores are reported for Florida students. Students’ national percentile rank (NPR) indicates the percent of students who earned the same score or lower on a nationally normed sample. Students who score at the national average earn an NPR of 50. For a sample student report of the FCAT NRT, click the following link: //www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf. To see a statewide comparison of students’ scores on the FCAT NRT, click this link: //fcat.fldoe.org/pdf/fc_StatewideComparisonNRT2004.pdf
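Percentile rank is described in this lesson both as the percent of students a student "scored better than" and as the percent who "earned the same score or lower." The sketch below, using an invented norm sample and hypothetical function names, shows how the two wordings can differ when scores are tied. It is not the procedure used to build the FCAT NRT norms; published tests document their own conventions.

```python
def pr_below(score, norm_scores):
    """Percent of the norm group scoring strictly below `score`."""
    return 100 * sum(s < score for s in norm_scores) / len(norm_scores)


def pr_at_or_below(score, norm_scores):
    """Percent of the norm group scoring at or below `score`."""
    return 100 * sum(s <= score for s in norm_scores) / len(norm_scores)


# Hypothetical norm sample of 8 scores with a tie at 30.
norm_sample = [10, 20, 30, 30, 40, 50, 60, 70]

print(pr_below(30, norm_sample))        # 25.0
print(pr_at_or_below(30, norm_sample))  # 50.0
```

With large norm samples and few ties the two figures are usually close, but it is worth checking which definition a score report uses before comparing percentile ranks across tests.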
Criterion-referenced Interpretation
Criterion-referenced interpretations allow us to compare a student’s performance to a well-defined content domain (e.g., strands), rather than to rank students or compare them to a norm group. Students’ scores tell us their knowledge and performance levels within a particular content area. They provide an absolute interpretation (i.e., what a student can and cannot do) and not a comparative interpretation (i.e., how a student’s performance compares to others). To make an adequate criterion-referenced interpretation, the content area must be well defined so that you can describe a student’s achievement level from the content sub-scores (strands). Typically, percent scores and raw scores are used in determining the score needed for passing or mastery.
Example: The FCAT SSS (Sunshine State Standards) is a criterion-referenced test. It assesses student achievement on the knowledge and skills described in the state curriculum framework called the Sunshine State Standards (content domain). One of the scores reported is the number of points a student earns in each content area (Words and Phrases in Context, Main Ideas, Plot and Purpose, Comparisons and Cause/Effect, and Reference and Research). See //www.firn.edu/doe/sas/fcat/pdf/fc_ufr2005.pdf.
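As a rough illustration of a criterion-referenced reading of strand sub-scores, the sketch below converts points earned in each content area to a percent score and compares it to a cutoff. The point values and the 70 percent cutoff are assumptions made up for this example, not actual FCAT SSS figures, and `strand_report` is a hypothetical helper.

```python
# Hypothetical points earned and points possible per strand (invented numbers).
strand_scores = {
    "Words and Phrases in Context": (6, 8),
    "Main Ideas, Plot and Purpose": (15, 20),
    "Comparisons and Cause/Effect": (5, 10),
    "Reference and Research": (3, 4),
}

MASTERY_CUTOFF = 70  # assumed percent-correct cutoff, for illustration only


def strand_report(scores, cutoff):
    """Return {strand: (percent_correct, meets_cutoff)} for each strand."""
    report = {}
    for strand, (earned, possible) in scores.items():
        percent = 100 * earned / possible
        report[strand] = (percent, percent >= cutoff)
    return report


report = strand_report(strand_scores, MASTERY_CUTOFF)
for strand, (percent, mastered) in report.items():
    status = "meets cutoff" if mastered else "needs instruction"
    print(f"{strand}: {percent:.0f}% ({status})")
```

Note that no other student's score appears anywhere in the calculation: the result depends only on the student's own performance within each defined strand.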
Major Differences between NRT and CRT Frames of Reference
Differences | Norm-referenced | Criterion-referenced |
Interpretation | How well students are doing with respect to other students. | What students can and cannot do with respect to a content domain. |
Required Conditions | To whom students’ scores are being compared must be clearly understood and the norm group must be well defined. | The content domain that is being referenced must be clearly defined. |
Most tests used in schools are either norm-referenced or criterion-referenced. As previously stated, norm-referenced interpretations compare students’ scores to a reference group (e.g., norming group, other 8th graders, other students in a class), and criterion-referenced interpretations specify what knowledge and skills students learned within a specified content domain.
As a reference, the definitions of scores contained in FCAT reports are provided on page 37 of the following document: //www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf.
Harcourt provides a useful site of general measurement terms at: Glossary of Measurement Terms