Authentic Assessment Resources
from the ERIC Clearinghouse on Assessment and Evaluation

Recommendations for Teachers
Teachers who have begun to use alternative assessment in
their classrooms
are good sources for ideas and guidance. The following
recommendations were
made by teachers in Virginia after they spent six months
developing and
implementing alternative assessment activities in their
classrooms.
1. Start small. Follow someone else's example in the beginning,
or do one activity in combination with a traditional test.

2. Develop clear rubrics. Realize that developing an effective
rubric (a rating scale with several categories) for judging
student products and performances is harder than carrying out
the activity. Standards and expectations must be clear.
Benchmarks for levels of performance are essential.
Characteristics of typical student products and performances
may be used to generate performance assessment rubrics and
standards for the class.

3. Expect to use more time at first. Developing and evaluating
alternative assessments and their rubrics requires additional
time until you and your students become comfortable with the
method.

4. Adapt existing curriculum. Plan assessment as you plan
instruction, not as an afterthought.

5. Have a partner. Sharing ideas and experiences with a
colleague is beneficial to teachers and to students.

6. Make a collection. Look for examples of alternative
assessments or activities that could be modified for your
students, and keep a file readily accessible.

7. Assign a high value (grade) to the assessment. Students need
to see the experience as being important and worth their time.
Make expectations clear in advance.

8. Expect to learn by trial and error. Be willing to take risks
and learn from mistakes, just as we expect students to do. The
best assessments are developed over time and with repeated use.

9. Try peer assessment activities. Relieve yourself of some
grading responsibilities, and increase students' evaluation
skills and accountability, by involving them in administering
assessments.

10. Don't give up. If the first tries are not as successful as
you had hoped, remember that this is new to the students, too.
They can help you refine the process. Once you have tried an
alternative assessment, reflect on and evaluate the activities.
Ask yourself some questions: What worked? What needs
modification? What would I do differently? Would I use this
activity again? How did the students respond? Did the end
results justify the time spent? Did students learn from the
activity?
Virginia Education Association and the Appalachia Educational Laboratory (1992)
AN OVERVIEW OF ALTERNATIVE ASSESSMENT
Prepared by Lawrence Rudner, ERIC/AE and Carol Boston,
ACCESS ERIC
So, what's all the hoopla about? Federal commissions
have endorsed
performance assessment.
It's been discussed on C-SPAN and in a number of
books and articles.
Full issues of major education journals, including
Educational Leadership (April 1989 and May 1992) and Phi
Delta Kappan
(February 1993), have been devoted to performance
assessment. A
surprisingly large number of organizations are actively
involved in
developing components of a performance assessment
system. Chances are good
that one or more of your professional associations is in
the middle of
debating goals and standards right now.
Is this just the latest bandwagon? Another short-term fix? Probably not.
The performance assessment movement encompasses much
more than a technology
for testing students.
It requires examining the purposes of education,
identifying skills we want students to master, and empowering
teachers.
Even without an assessment component, these activities
can only be good for
education. You
can be certain they will have an impact on classrooms.
This article describes performance assessments, weighs
their advantages and
disadvantages as instructional tools and accountability
measures, and
offers suggestions to teachers and administrators who
want to use
performance assessments to improve teaching and
learning.
Key Features of Performance Assessment
The Office of Technology Assessment (OTA) of the U.S.
Congress (1992)
provides a simple, yet insightful, definition of
performance assessment:
testing
that requires a student to create an answer or a product
that
demonstrates his or her knowledge or skills.
A wide variety of assessment techniques fall within this
broad definition.
Several are described in Table 1. One key feature of all performance
assessments is that they require students to be active
participants.
Rather than choosing from presented options, as in
traditional multiple-
choice tests, students are responsible for creating or
constructing their
responses. These
may vary in complexity from writing short answers or
essays to designing and conducting experiments or
creating comprehensive
portfolios. It
is important to note that proponents of "authentic
assessment" make distinctions among the various
types of performance
assessments, preferring those that have meaning and
value in themselves to
those that are meaningful primarily in an academic
context. In a chemistry
class, for example, students might be asked to identify
the chemical
composition of a premixed solution by applying tests for
various
properties, or they might take samples from local lakes
and rivers and
identify pollutants.
Both assessments would be performance-based, but the
one involving the real-world problem would be considered
more authentic.
Testing has traditionally focused on whether students
get the right
answers; how they arrive at their answers has been
considered important
only during the test development. When students take a standardized
mathematics test, for example, there is no way to
distinguish among those
who select the correct answer because they truly
understand the problem,
those who understand the problem but make a careless
calculation mistake,
and those who have no idea how to do the work but simply
guess correctly.
Performance assessments, on the other hand, require
students to demonstrate
knowledge or skills; therefore, the process by which
they solve problems
becomes important.
To illustrate, if high school juniors are asked to
demonstrate their understanding of interest rates by
comparison shopping
for a used-car loan and identifying the best deal in a
report, a teacher
can easily see if they understand the concept of
interest, know how to
calculate it, and perform mathematical operations
accurately.
In performance assessment, items directly reflect
intended outcomes.
Whereas a traditional test might ask students about
grammar rules, a
performance assessment would have them demonstrate their
understanding of
English grammar by editing a poorly written
passage. A traditional auto
mechanics test might include questions about a front-end
alignment; a
performance assessment would have students do one.
Performance assessments can also measure skills that
have not traditionally
been measured in large groups of students: skills such as
integrating
knowledge across disciplines, contributing to the work
of a group, and
developing a plan of action when confronted with a novel
situation. Grant
Wiggins (1990) captures their potential nicely:
    Do we want to evaluate student problem-posing and
    problem-solving in mathematics? Experimental research in
    science? Speaking, listening, and facilitating a discussion?
    Doing document-based historical inquiry? Thoroughly revising
    a piece of imaginative writing until it "works" for the
    reader? Then let our assessment be built out of such
    exemplary intellectual challenges.
What's Wrong With the Way We've Been Doing It?
Many tests used in state and local assessments, as well
as the Scholastic
Aptitude Test and the National Assessment of Educational
Progress, have
been criticized for failing to provide the information
we need about
students and their ability to meet specific curricular
objectives. Critics
contend that these tests, as currently formulated, often
assess only a
narrow range of the curriculum; focus on aptitudes, not
specific curriculum
objectives; and emphasize minimum competencies, thus
creating little
incentive for students to excel. Further, they yield results that are
analyzed primarily on the national, state, and district
levels rather than
used to improve the performance of individual pupils or
schools.
The true measure of performance assessment must,
however, lie in its
ability to assess desired skills, not in the alleged
inability of other
forms of assessment.
Here We Go Again?
You might ask, "Is performance assessment really
new?" Good classroom
teachers have used projects and portfolios for years,
preparing numerous
activities requiring students to blend skills and
insights across
disciplines.
Performance assessment has been particularly common in
vocational education, the military, and business. ERIC has used
"performance tests" as a descriptor since
1966.
What is new is the widespread interest in the potential
of performance
assessment. Many
superintendents, state legislators, governors, and
Washington officials see high-stakes performance tests
as a means to
motivate students to learn and schools to teach concepts
and skills that
are more in line with today's expectations. This perspective will be
called the motivator viewpoint in this article. Many researchers,
curriculum specialists, and teachers, on the other hand,
see performance
assessment as empowering teachers by providing them with
better
instructional tools and a new emphasis on teaching more
relevant skills, a
perspective that will be referred to as the empowerment
viewpoint here.
Proponents of both viewpoints agree on the need to
change assessment
methods but differ in their views about how assessment
information should
be used.
On the Value of Performance Assessments
Advocates of the motivator and empowerment viewpoints concur
that
performance assessments can form a solid foundation for
improving schools
and increasing what students know and can do. However, the two groups
frame the advantages differently. Their positions are sketched here
briefly, then developed more fully in the sections that
follow.
The motivators emphasize that performance-based
assessments, if instituted
on a district, state, or national level, will allow us
to monitor the
effectiveness of schools and teachers and track
students' progress toward
achieving national educational goals (see
"Standards, Assessments, and the
National Education Goals" on pp. X X). According to the motivator
viewpoint, performance assessments will make the
educational system more
accountable for results. Proponents expect them to do the following:
* prompt schools to focus on important, performance-based outcomes;
* provide sound data on achievement, not just aptitude;
* allow valid comparisons among schools, districts, and states; and
* yield results for every important level of the education system,
  from individual children to the nation as a whole.
Those in the empowerment camp, on the other hand, tend
to focus on how
performance assessments will improve teaching and
learning at the classroom
level.
Instructional objectives in most subject areas are being redefined
to include more practical applications and more emphasis
on synthesis and
integration of content and skills. Performance assessments that are
closely tied to this new curriculum can give teachers
license to emphasize
important skills that traditionally have not been
measured. Performance
assessments can also provide teachers with diagnostic
information to help
guide instruction.
The outcomes-based education (OBE) movement supports
instructional activities closely tied to performance
assessment tasks.
Under
OBE, students who do not demonstrate the level of
accomplishment
their local communities and school districts have agreed
upon receive
additional instruction to bring them up to that level.
High-Stakes Performance Assessments as Motivators
One of the most significant events in American education
occurred in September
1989, when President George Bush held an education
summit in
Charlottesville, Virginia, with the nation's
governors. Together, the
participants hammered out six far-reaching national
education goals,
effectively acknowledging that education issues
transcend state and local
levels to affect the democratic and economic foundations
of the entire
country. In a
closing statement, participants announced,
    We unanimously agree that there is a need for the first
    time in this nation's history to have specific
    results-oriented goals. We recognize the need for ...
    accountability for outcome-related results.
Consensus is now building among state legislators,
governors, members of
Congress, Washington officials, and the general public
regarding the
desirability and feasibility of some sort of voluntary
national assessment
system linked with high national standards in such
subject areas as
mathematics, science, English, history, geography, and
the arts. A number
of professional organizations have received funding to
coordinate the
development of such standards (see sidebar on p.
X). The groundbreaking
work of the National Council of Teachers of Mathematics
(NCTM) serves as a
model for this process:
NCTM developed its Standards in 1989 and is
now developing curriculum frameworks and assessment
guidelines to match it
(see "From Standards to Assessment" on p. X).
The National Council on Education Standards and Testing
(NCEST), an
advisory group formed by Congress and the President in
response to national
and state interest in national standards and
assessments, describes the
motivational effect of a national system of assessments
in its 1992 report,
Raising Standards for American Education:
    National standards and a system of assessments are
    desirable and feasible mechanisms for raising expectations,
    revitalizing instruction, and rejuvenating educational
    reform efforts for all American schools and students (p. 8).
Envision, if you will, the enormous potential of an
assessment that
perfectly and equitably measures the right skills. NCEST believes that
developing standards and high-quality assessments has
"the potential to
raise learning expectations at all levels of education,
better target human
and fiscal resources for educational improvement, and
help meet the needs
of an increasingly mobile population". This is a shared vision. At least
a half-dozen groups have begun calling for a national
assessment system or
developing instrumentation during the past two years
(see "Calls for New Assessments").
According to NCEST, student standards must be
"world class" and include the
"specification of content what students should know
and be able to do and
the level of performance students are expected to
attain" (p. 3). NCEST
envisions standards that include substantive content
together with
complex problem-solving and higher-order thinking
skills. Such standards
would reflect "high expectations not expectations
of minimal competency"
(p. 13). NCEST
believes in the motivational potential of these world-class
standards, stating that they will "raise the
ceiling for students who are
currently above average" and "lift the floor
for those students who now
experience the least success in school" (p. 4).
Acknowledging that tests tend to influence curriculum,
NCEST suggests that
assessments should be developed to reflect the new high
standards. Such
assessments would not be immediately associated with
high stakes. However,
once issues of validity, reliability, and fairness have
been resolved,
these assessments "could be used for such
high-stakes purposes as high
school graduation, college admission, continuing
education, or
certification for employment. Assessments could also be used by states and
localities as the basis for system accountability"
(p. 27).
The U.S. already has one national assessment in place,
the National
Assessment of Educational Progress (NAEP). Since 1969, the U.S. Department
of Education-sponsored NAEP has been used to assess what
our nation's
children know in a variety of curriculum areas,
including mathematics,
reading, science, writing, U.S. history, and
geography. Historically, NAEP
has been a multiple-choice test administered to random
samples of fourth-,
eighth-, and twelfth-graders in order to report on the
educational progress
of our nation as a whole. As interest in accountability has grown, NAEP
has begun to conduct trial state-level assessments. NAEP is also
increasing the number of performance-based tasks to
better reflect what
students can do (see "Performance-Based Aspects of
the National Assessment
of Educational Progress" on p. x). The National Council on Education
Standards and Testing envisions that large-scale sample
assessments such as
NAEP will be one component of a national system of
assessments, to be
coupled with assessments that can provide results for
individual students.
Supporters argue that a system of national assessments
would improve
education by giving parents and students more accurate,
relevant, and
comparable data and encouraging students to strive for
world-class
standards of achievement. Critics of a national assessment system are
equally visible.
The National Education Association and other professional
associations have argued that high-stakes national
assessments will not
improve schooling and could easily be harmful. They are particularly
concerned that students with disabilities, students
whose native language
is not English, and students and teachers attending
schools with minimal
resources will be penalized under such a system. Fearing that a national
assessment system might not be a good model and could short-circuit
current
reform efforts, the National Center for Fair and Open
Testing, or FairTest,
testified that the federal dollars would be better spent
in support of
state efforts.
Performance Assessment for Teacher Empowerment
An enormous amount of activity is taking place in the
area of establishing
national standards and a system of assessments. The assessments are
expected to encompass performance-based tasks that call
on students to
demonstrate what they can do. They may well have strong accountability
features and be used eventually to make high-stakes
decisions. Should
building principals and classroom teachers get excited
about performance
assessment now?
Absolutely. Viewed in its larger
context, performance
assessment can play an important part in the school
reform/restructuring
movement:
    Performance assessment can be seen as a lever to promote
    the changes needed for the assessment to be maximally
    useful. Among these changes are a redefinition of learning
    and a different conception of the place of assessment in
    the education process (Mitchell, 1992).
In order to
implement performance assessment fully, administrators and
teachers must have a clear picture of the skills they
want students to
master and a coherent plan for how students are going to
master those
skills. They
need to consider how students learn and what instructional
strategies are most likely to be effective. Finally, they need to be
flexible in using assessment information for diagnostic
purposes to help
individual students achieve. This level of reflection is consistent with
the best practices in education. As Joan Herman, Pamela Aschbacher, and
Lynn Winters note in their important book, A Practical
Guide to Alternative
Assessment (1992),
    No longer is learning thought to be a one-way transmission
    from teacher to students, with the teacher as lecturer and
    the students as passive receptacles. Rather, meaningful
    instruction engages students actively in the learning
    process. Good teachers draw on and synthesize
    discipline-based knowledge, knowledge of student learning,
    and knowledge of child development. They use a variety of
    instructional strategies, from direct instruction to
    coaching, to involve their students in meaningful
    activities . . . and to achieve specific learning goals (p. 12).
Quality
performance assessment is a key part of this vision because "good
teachers constantly assess how their students are doing,
gather evidence of
problems and progress, and adjust their instructional
plans accordingly"
(p. 12).
Properly implemented, performance assessment offers an
opportunity to align
curriculum and teaching efforts with the important
skills we wish children
to master.
Cognitive learning theory, which emphasizes that knowledge is
constructed and that learners vary, provides some
insight into what an
aligned curriculum might look like (see Implications
from Learning Theory).
The Case for Authentic Assessment. ERIC Digest. ED328611 TM016142
Wiggins, Grant
American Institutes for Research, Washington, DC; ERIC Clearinghouse on Tests, Measurement, and Evaluation, Washington, DC. Dec 1990
Mr. Wiggins,
a researcher and consultant on school reform issues, is a
widely-known advocate of authentic assessment in
education. This digest is
based on materials that he prepared for the California
Assessment Program.
WHAT IS AUTHENTIC ASSESSMENT?
Assessment is
authentic when we directly examine student performance on
worthy intellectual tasks. Traditional assessment, by
contrast, relies on
indirect or proxy 'items'--efficient, simplistic
substitutes from which we
think valid inferences can be made about the student's
performance at those
valued challenges.
Do we want to
evaluate student problem-posing and problem-solving in
mathematics? experimental research in science? speaking,
listening, and
facilitating a discussion? doing document-based
historical inquiry?
thoroughly revising a piece of imaginative writing until
it "works" for the
reader? Then let our assessment be built out of such
exemplary intellectual
challenges.
Further
comparisons with traditional standardized tests will help to
clarify what "authenticity" means when
considering assessment design and
use:
* Authentic
assessments require students to be effective performers
with acquired knowledge. Traditional tests tend to
reveal only whether the
student can recognize, recall or "plug in"
what was learned out of context.
This may be as problematic as inferring driving or
teaching ability from
written tests alone. (Note, therefore, that the debate
is not "either-or":
there may well be virtue in an array of local and state
assessment
instruments as befits the purpose of the measurement.)
* Authentic
assessments present the student with the full array of
tasks that mirror the priorities and challenges found in
the best
instructional activities: conducting research; writing,
revising and
discussing papers; providing an engaging oral analysis
of a recent
political event; collaborating with others on a debate,
etc. Conventional
tests are usually limited to paper-and-pencil, one-
answer questions.
* Authentic
assessments attend to whether the student can craft
polished, thorough and justifiable answers, performances
or products.
Conventional tests typically only ask the student to
select or write
correct responses--irrespective of reasons. (There is
rarely an adequate
opportunity to plan, revise and substantiate responses
on typical tests,
even when there are open-ended questions.) As a result:
* Authentic
assessment achieves validity and reliability by emphasizing
and standardizing the appropriate criteria for scoring
such (varied)
products; traditional testing standardizes objective
"items" and, hence,
the (one) right answer for each.
* "Test
validity" should depend in part upon whether the test simulates
real-world "tests" of ability. Validity on
most multiple-choice tests is
determined merely by matching items to the curriculum
content (or through
sophisticated correlations with other test results).
* Authentic
tasks involve "ill-structured" challenges and roles that
help students rehearse for the complex ambiguities of
the "game" of adult
and professional life. Traditional tests are more like
drills, assessing
static and too-often arbitrarily discrete or simplistic
elements of those
activities.
Beyond these
technical considerations, the move to reform assessment is
based upon the premise that assessment should primarily
support the needs
of learners. Thus, secretive tests composed of proxy
items and scores that
have no obvious meaning or usefulness undermine
teachers' ability to
improve instruction and students' ability to improve
their performance. We
rehearse for and teach to authentic tests--think of
music and military
training--without compromising validity.
The best
tests always teach students and teachers alike the kind of
work that most matters; they are enabling and
forward-looking, not just
reflective of prior teaching. In many colleges and all
professional
settings the essential challenges are known in
advance--the upcoming
report, recital, Board presentation, legal case, book to
write, etc.
Traditional tests, by requiring complete secrecy for
their validity, make
it difficult for teachers and students to rehearse and
gain the confidence
that comes from knowing their performance obligations.
(A known challenge
also makes it possible to hold all students to higher
standards).
WHY DO WE NEED TO INVEST IN THESE LABOR-INTENSIVE FORMS
OF ASSESSMENT?
While
multiple-choice tests can be valid indicators or predictors of
academic performance, too often our tests mislead
students and teachers
about the kinds of work that should be mastered. Norms
are not standards;
items are not real problems; right answers are not
rationales.
What most
defenders of traditional tests fail to see is that it is the
form, not the content of the test that is harmful to
learning;
demonstrations of the technical validity of standardized
tests should not
be the issue in the assessment reform debate. Students
come to believe that
learning is cramming; teachers come to believe that
tests are
after-the-fact, imposed nuisances composed of contrived
questions--irrelevant to their intent and success. Both
parties are led to
believe that right answers matter more than habits of
mind and the
justification of one's approach and results.
A move toward
more authentic tasks and outcomes thus improves teaching
and learning: students have greater clarity about their
obligations (and
are asked to master more engaging tasks), and teachers
can come to believe
that assessment results are both meaningful and useful
for improving
instruction.
If our aim is
merely to monitor performance then conventional testing
is probably adequate. If our aim is to improve
performance across the board
then the tests must be composed of exemplary tasks,
criteria and standards.
WON'T AUTHENTIC ASSESSMENT BE TOO EXPENSIVE AND
TIME-CONSUMING?
The costs are
deceptive: while the scoring of judgment-based tasks
seems expensive when compared to multiple-choice tests
(about $2 per student vs. 1 cent), the gains to teacher professional
development, local
assessing, and student learning are many. As states like
California and New
York have found (with their writing and hands-on science
tests) significant
improvements occur locally in the teaching and assessing
of writing and
science when teachers become involved and invested in
the scoring process.
If costs
prove prohibitive, sampling may well be the appropriate
response--the strategy employed in California, Vermont
and Connecticut in
their new performance and portfolio assessment projects.
Whether through a
sampling of many writing genres, where each student gets
one prompt only;
or through sampling a small number of all student papers
and school-wide
portfolios; or through assessing only a small sample of
students, valuable
information is gained at a minimum cost.
And what have
we gained by failing to adequately assess all the
capacities and outcomes we profess to value simply
because it is
time-consuming, expensive, or labor-intensive? Most
other countries
routinely ask students to respond orally and in writing
on their major
tests--the same countries that outperform us on
international comparisons.
Money, time and training are routinely set aside to
ensure that assessment
is of high quality. They also correctly assume that high
standards depend
on the quality of day-to-day local assessment--further
offsetting the
apparent high cost of training teachers to score student
work in regional
or national assessments.
WILL THE PUBLIC HAVE ANY FAITH IN THE OBJECTIVITY AND
RELIABILITY OF
JUDGMENT-BASED SCORES?
We forget
that numerous state and national testing programs with a high
degree of credibility and integrity have for many years
operated using
human judges:
* the New York Regents exams, parts of which
have included essay
questions since their inception--and which are scored
locally (while
audited by the state);
* the
Advanced Placement program, which uses open-ended questions and
tasks, including not only essays on most tests but the
performance-based
tests in the Art Portfolio and Foreign Language exams;
* state-wide
writing assessments in two dozen states where model
papers, training of readers, papers read "blind," and
procedures to prevent bias and drift ensure adequate reliability;
* the
National Assessment of Educational Progress (NAEP), the
Congressionally mandated assessment, which uses numerous
open-ended test
questions and writing prompts (and successfully piloted
a hands-on test of
science performance);
*
newly-mandated performance-based and portfolio-based state-wide
testing in Arizona, California, Connecticut, Kentucky,
Maryland, and New
York.
Though the
scoring of standardized tests is not subject to significant
error, the procedure by which items are chosen and the
manner in which norms or cut-scores are established are
often quite subjective--and
typically immune from public scrutiny and oversight.
Genuine
accountability does not avoid human judgment. We monitor and
improve judgment through training sessions, model
performances used as
exemplars, audit and oversight policies as well as
through such basic
procedures as having disinterested judges review student
work "blind" to
the name or experience of the student--as occurs
routinely throughout the
professional, athletic and artistic worlds in the
judging of performance.
Authentic
assessment also has the advantage of providing parents and
community members with directly observable products and
understandable
evidence concerning their students' performance; the
quality of student
work is more discernible to laypersons than when we must
rely on
translations of talk about stanines and renorming.
Ultimately,
as the researcher Lauren Resnick has put it, "What you
assess is what you get; if you don't test it, you won't
get it." To improve
student performance we must recognize that essential
intellectual abilities
are falling through the cracks of conventional testing.
ADDITIONAL READING
Archbald, D.
& Newmann, F. (1989) "The Functions of Assessment and the
Nature of Authentic Academic Achievement," in
Berlak (ed.) Assessing
Achievement: Toward the development of a New Science of
Educational
Testing. Buffalo, NY: SUNY Press.
Frederiksen,
J. & Collins, A. (1989) "A Systems Approach to Educational
Testing," Educational Researcher, 18, 9 (December).
National
Commission on Testing and Public Policy (1990) From Gatekeeper
to Gateway: Transforming Testing in America. Chestnut
Hill, MA: NCTPP,
Boston College.
Wiggins, G.
(1989) "A True Test: Toward More Authentic and Equitable
Assessment," Phi Delta Kappan, 70, 9 (May).
Wolf, D.
(1989) "Portfolio Assessment: Sampling Student Work,"
Educational Leadership 46, 7, pp. 35-39 (April).
Alternatives to Standardized Educational Assessment.
ED312773
EA021431 ERIC Digest Series Number EA 40.
Bowers, Bruce C.
ERIC
Clearinghouse on Educational Management, Eugene, Oreg. 1989
An
American educator who was examining the British educational system
once asked a headmaster why so little standardized
testing took place in
British schools. "My dear fellow," came the
reply, "In Britain we are of
the belief that, when a child is hungry, he should be
fed, not weighed."
This anecdote suggests the complementary question:
"Why is it that we do so
much standardized testing in the United States?"
WHAT ARE THE MAIN USES OF STANDARDIZED TESTING IN
AMERICAN PUBLIC SCHOOLS?
Advocates of
standardized testing assert that it simply achieves more
efficiently and fairly many of the purposes for which
grading and other
traditional assessment procedures were designed. Even
critics of
standardized testing acknowledge that it has filled a
vacuum. As Grant
Wiggins (1989a) puts it, "Mass assessment resulted
from legitimate concern
about the failure of schools to set clear, justifiable,
and consistent
standards to which it would hold its graduates and
teachers accountable."
Standardized
testing is currently used to fulfill (1) the
administrative function of providing comparative scores
for individual
students so that placement decisions can be made; (2)
the guidance function
of indicating a student's strengths or weaknesses so
that he or she may
make appropriate decisions regarding a future course of
study; and, more
recently, (3) the accountability function of using
student scores to assess
the effectiveness of teachers, schools, and even entire
districts (Robinson
and Craver 1989).
WHAT PROBLEMS HAVE ARISEN AS A RESULT OF WIDESPREAD USE
OF STANDARDIZED
TESTING?
The phrase
"test-driven curriculum" (Livingston, Castle, and Nations
1989) captures the essence of the major controversy
surrounding
standardized testing. When test scores are used on a
comparative basis not
only to determine the educational fate of individual
students, but also to
assess the relative "quality" of teachers,
schools, and school districts,
it is no wonder that "teaching to the test" is
becoming a common practice
in our nation's schools. This would not necessarily be a
problem if
standardized tests provided a comprehensive, in-depth
assessment of the
knowledge and skills that indicate mastery of a given
subject matter.
However, the main purpose of standardized testing is to
sort large numbers
of students in as efficient a manner as possible. This
limited goal, quite
naturally, gives rise to short-answer, multiple-choice
questions. When
tests are constructed in this manner, active skills such as writing,
speaking, acting, drawing, constructing, and repairing, all of which
can and should be taught in schools, are automatically relegated to
second-class status.
WHAT ALTERNATIVES TO STANDARDIZED TESTING HAVE BEEN
SUGGESTED?
It is
reasonable to assume that the demand for test results that can be
compared across student populations will remain strong.
The critical
question is whether such results can be obtained from
tests that attempt a
more comprehensive assessment of student abilities than
the present
standardized tests are capable of providing. An
ancillary, but equally
critical, question is whether such tests are too costly
to be widely
administered.
Suggested
alternatives are based on the concept of a
"performance-based" assessment. Depending on
the subject matter being
tested, the performance may consist of demonstrating any
of the active
skills mentioned above. For example, in the area of
writing, drawing, or
any of the "artistic expression" skills, it
has been suggested that a
"portfolio assessment," involving the ongoing evaluation
of a cumulative
collection of creative works, is the best approach (Wolf
1989). For
subjects that require the organization of facts and
theories into an
integrated and persuasive whole (for example, sciences
and social
sciences), an assessment modeled after the oral defense
required of
doctoral candidates has been suggested (Wiggins 1989a).
A third
approach, which might be termed the "problem-solving model,"
can be adapted to almost any knowledge-based discipline.
It involves the
presentation of a problematic scenario that can be
resolved only through
the application of certain major principles (theories,
formulae) that are
central to the discipline under examination (Archbald
and Newmann 1988).
CAN PERFORMANCE-BASED ASSESSMENTS BE USED TO COMPARE
STUDENTS ACROSS
DIFFERENT SETTINGS?
Performance-based assessment is more easily scored using a
criterion-referenced rather than a norm-referenced
approach. Instead of
placing a student's score along a normal distribution of
scores from
students all taking the same test, a
criterion-referenced approach focuses
on whether a student's performance meets a criterion
level, normally
reflecting mastery of the skills being tested.
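The contrast between the two scoring approaches can be sketched in a few
lines of code. This is a minimal illustration only; the scores, the mastery
cutoff, and the function names are hypothetical and do not reflect any
actual assessment system.

```python
def norm_referenced(score, cohort):
    """Percentile rank: where does this score fall among all test takers?"""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)

def criterion_referenced(score, mastery_cutoff):
    """Mastery judgment: does the performance meet the criterion level?"""
    return score >= mastery_cutoff

# Hypothetical scores from one group of students.
cohort = [52, 61, 70, 70, 75, 83, 88, 91, 95, 98]

# Norm-referenced: a score of 75 is interpreted relative to peers.
print(norm_referenced(75, cohort))      # 40.0 (better than 40% of the cohort)

# Criterion-referenced: 75 either meets the mastery cutoff of 80 or it
# does not, regardless of how everyone else performed.
print(criterion_referenced(75, 80))     # False
```

Note that the norm-referenced result would change if the cohort changed,
while the criterion-referenced result depends only on the fixed standard.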
How can such
an assessment be reliably compared to similar assessments
made by other teachers in other settings? It has been
suggested that
American educators adopt the "exemplary
system" being called for in Great
Britain. In this system, teachers involved in scoring
meet regularly "to
compare and balance results on their own and national
tests" (Wiggins
1989b), thus increasing reliability across settings.
Clearly, however, such
an approach (similar to the approach currently in use
for the scoring of
Advanced Placement essay exams) could be prohibitively
expensive if carried
out on a large scale. A key question is whether the
costs associated with
this labor-intensive scoring system would be offset by
the presumed
instructional gains obtained from an assessment model
that rewarded a more
thorough and holistic approach to instruction.
HAVE THERE BEEN ANY STATEWIDE EFFORTS TO PROVIDE
ALTERNATIVES TO
STANDARDIZED TESTING?
California
has probably made the greatest effort in this direction,
beginning in 1987 with its statewide writing test and
continuing with its
current development of performance-based assessment in
science and history
(Massey 1989). The Connecticut Assessment of Educational
Progress Program
uses a variety of performance tasks in its assessment of
science, foreign
languages, and business education (Baron 1989).
(However, this assessment
includes only a sample of students at any given grade
level, and the subjects for which performance tasks are
required change from year to year.) Vermont education officials are
currently seeking
legislative approval for funds to pursue a portfolio
assessment approach in
addition to the current standardized testing (Massey
1989).
WHAT IS THE PROGNOSIS FOR A GENERAL SHIFT AWAY FROM
STANDARDIZED TESTING
AND TOWARD PERFORMANCE-BASED TESTING?
In
psychometric terms, the tradeoff in such a shift is to sacrifice
reliability for validity. That is, performance-based
tests do not lend
themselves to a cost- and time-efficient method of
scoring that, in
addition, provides reliable results. On the other hand,
they actually test
what the educational system is presumably responsible
for teaching, namely,
the skills prerequisite for performing in the real
world. The additional
costs involved in producing reliable results across
different settings for
performance-based tests are unknown.
The question
is whether a majority of educators will echo the
sentiments of George Madaus, director of the Center for
the Study of
Testing, Evaluation, and Educational Policy, who
believes that
performance-based testing "is not efficient; it's
expensive; it doesn't
lend itself to mass testing with quick turnaround
time--but it's the way to
go" (Brandt 1989).
RESOURCES
Archbald,
Doug A., and Fred M. Newmann. "Beyond Standardized Testing:
Assessing Authentic Academic Achievement in the
Secondary School." Reston,
VA: National Association of Secondary School Principals,
1988. 65 pages. ED
301 587.
Baron, Joan
B. "Performance Testing in Connecticut." EDUCATIONAL
LEADERSHIP 46, 7 (April 1989): 8. EJ 387 136.
Brandt, Ron.
"On Misuse of Testing: A Conversation with George Madaus."
EDUCATIONAL LEADERSHIP 46, 7 (April 1989): 26-29. EJ 387
140.
Livingston,
Carol, Sharon Castle, and Jimmy Nations. "Testing and
Curriculum Reform: One School's Experience."
EDUCATIONAL LEADERSHIP 46, 7
(April 1989): 23-25. EJ 387 139.
Massey, Mary.
"States Move to Improve Assessment Picture." ACSD UPDATE
31, 2 (March 1989): 7.
Ralph, John,
and M. Christine Dwyer. "Making the Case: Evidence of
Program Effectiveness in Schools and Classrooms."
Washington, D.C.: U.S.
Department of Education, Office of Educational Research
and Improvement,
November 1988. 54 pages.
Robinson,
Glen E., and James M. Craver. "Assessing and Grading Student
Achievement." Arlington, VA: Educational Research
Service, 1989. 198 pages.
Wiggins,
Grant. "A True Test: Toward More Authentic and Equitable
Assessment." PHI DELTA KAPPAN 70, 9 (May
1989a):703-13. EJ 388 723.
Wiggins,
Grant. "Teaching to the (Authentic) Test." EDUCATIONAL
LEADERSHIP 46, 7 (April 1989b): 41-47.
Wolf, Dennie
P. "Portfolio Assessment: Sampling Student Work."
EDUCATIONAL LEADERSHIP 46, 7 (April 1989): 35-39. EJ 387
143.