Authentic Assessment Resources


The ERIC Clearinghouse on Assessment and Evaluation




Recommendations for Teachers


Teachers who have begun to use alternative assessment in their classrooms

are good sources for ideas and guidance. The following recommendations were

made by teachers in Virginia after they spent six months developing and

implementing alternative assessment activities in their classrooms.


 1.       Start small. Follow someone else's example in the beginning, or

          do one activity in combination with a traditional test.


 2.       Develop clear rubrics. Realize that developing an effective

          rubric (rating scale with several categories) for judging student

          products and performances is harder than carrying out the

          activity. Standards and expectations must be clear. Benchmarks

          for levels of performance are essential. Characteristics of

          typical student products and performances may be used to generate

          performance assessment rubrics and standards for the class.


 3.       Expect to use more time at first. Developing and evaluating

          alternative assessments and their rubrics requires additional

          time until you and your students become comfortable with the

          process.



 4.       Adapt existing curriculum. Plan assessment as you plan

          instruction, not as an afterthought.


 5.       Have a partner. Sharing ideas and experiences with a colleague is

          beneficial to teachers and to students.


 6.       Make a collection. Look for examples of alternative assessments

          or activities that could be modified for your students and keep a

          file readily accessible.


 7.       Assign a high value (grade) to the assessment. Students need to

          see the experience as being important and worth their time. Make

          expectations clear in advance.


 8.       Expect to learn by trial and error. Be willing to take risks and

          learn from mistakes, just as we expect students to do. The best

          assessments are developed over time and with repeated use.


 9.       Try peer assessment activities. Relieve yourself of some grading

          responsibilities and increase student evaluation skills and

          accountability by involving students in administering assessments.


10.       Don't give up. If the first tries are not as successful as you

          had hoped, remember, this is new to the students, too. They can

          help you refine the process. Once you have tried an alternative

          assessment, reflect and evaluate the activities. Ask yourself

          some questions. What worked?  What needs modification?  What

          would I do differently?  Would I use this activity again?  How

          did the students respond?  Did the end results justify the time

          spent?  Did students learn from the activity?


                                                  Virginia Education Association and the

                                                  Appalachia Educational Laboratory (1992)
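Recommendation 2's rubric, a rating scale with benchmark levels applied across several criteria, can be made concrete in a short sketch. The criteria names, level labels, and scoring arithmetic below are illustrative assumptions only, not drawn from the Virginia teachers' materials.

```python
# A minimal sketch of a scoring rubric as data: named criteria, benchmark
# levels, and a scorer that totals one rating per criterion.
# All criteria and level names here are hypothetical examples.

RUBRIC = {
    "criteria": ["accuracy", "organization", "use of evidence"],
    "levels": {1: "beginning", 2: "developing", 3: "proficient", 4: "exemplary"},
}

def score_product(ratings):
    """ratings maps each criterion to a level (1-4); returns (total, percent)."""
    missing = [c for c in RUBRIC["criteria"] if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    total = sum(ratings[c] for c in RUBRIC["criteria"])
    best = max(RUBRIC["levels"]) * len(RUBRIC["criteria"])
    return total, round(100 * total / best)

total, pct = score_product({"accuracy": 3, "organization": 4, "use of evidence": 2})
print(total, pct)  # 9 of 12 points, i.e. 75 percent
```

Writing the benchmarks down as data, rather than keeping them in the rater's head, is one way to make "standards and expectations" explicit and shareable with a partner, as recommendations 2 and 5 suggest.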





Prepared by Lawrence Rudner, ERIC/AE and Carol Boston, ACCESS ERIC


So, what's all the hoopla about? Federal commissions have endorsed

performance assessment.  It's been discussed on C-SPAN and in a number of

books and articles.  Full issues of major education journals, including

Educational Leadership (April 1989 and May 1992) and Phi Delta Kappan

(February 1993), have been devoted to performance assessment.  A

surprisingly large number of organizations are actively involved in

developing components of a performance assessment system.  Chances are good

that one or more of your professional associations is in the middle of

debating goals and standards right now.


Is this just the latest bandwagon?  Another short-term fix?  Probably not.

The performance assessment movement encompasses much more than a technology

for testing students.  It requires examining the purposes of education,

identifying skills we want students to master, and empowering teachers.

Even without an assessment component, these activities can only be good for

education.  You can be certain they will have an impact on classrooms.


This article describes performance assessments, weighs their advantages and

disadvantages as instructional tools and accountability measures, and

offers suggestions to teachers and administrators who want to use

performance assessments to improve teaching and learning. 


Key Features of Performance Assessment


The Office of Technology Assessment (OTA) of the U.S. Congress (1992)

provides a simple, yet insightful, definition of performance assessment:


          testing that requires a student to create an answer or a product

          that demonstrates his or her knowledge or skills.


A wide variety of assessment techniques fall within this broad definition.

Several are described in Table 1.  One key feature of all performance

assessments is that they require students to be active participants.

Rather than choosing from presented options, as in traditional multiple-

choice tests, students are responsible for creating or constructing their

responses.  These may vary in complexity from writing short answers or

essays to designing and conducting experiments or creating comprehensive

portfolios.  It is important to note that proponents of "authentic

assessment" make distinctions among the various types of performance

assessments, preferring those that have meaning and value in themselves to

those that are meaningful primarily in an academic context.  In a chemistry

class, for example, students might be asked to identify the chemical

composition of a premixed solution by applying tests for various

properties, or they might take samples from local lakes and rivers and

identify pollutants.  Both assessments would be performance-based, but the

one involving the real-world problem would be considered more authentic.   



Testing has traditionally focused on whether students get the right

answers; how they arrive at their answers has been considered important

only during the test development.  When students take a standardized

mathematics test, for example, there is no way to distinguish among those

who select the correct answer because they truly understand the problem,

those who understand the problem but make a careless calculation mistake,

and those who have no idea how to do the work but simply guess correctly.

Performance assessments, on the other hand, require students to demonstrate

knowledge or skills; therefore, the process by which they solve problems

becomes important.  To illustrate, if high school juniors are asked to

demonstrate their understanding of interest rates by comparison shopping

for a used-car loan and identifying the best deal in a report, a teacher

can easily see if they understand the concept of interest, know how to

calculate it, and perform mathematical operations accurately.
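The comparison-shopping task above rests on ordinary interest arithmetic. The following sketch shows one way a student's work might be checked; the loan offers and the use of simple (rather than amortized) interest are assumptions for illustration.

```python
# Sketch of the arithmetic behind the used-car-loan task: compare total
# repayment across offers using simple interest. The offers are hypothetical.

def total_repayment(principal, annual_rate, years):
    """Simple interest: repay principal plus principal * rate * time."""
    return principal + principal * annual_rate * years

offers = {
    "Bank A": (8000, 0.09, 4),        # principal, annual rate, term in years
    "Bank B": (8000, 0.07, 5),
    "Credit union": (8000, 0.08, 4),
}

costs = {name: total_repayment(*terms) for name, terms in offers.items()}
best = min(costs, key=costs.get)
print(best, costs[best])
```

A student's report would show the same computation for each offer and justify the choice, which is exactly the reasoning the assessment is meant to surface.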


In performance assessment, items directly reflect intended outcomes.

Whereas a traditional test might ask students about grammar rules, a

performance assessment would have them demonstrate their understanding of

English grammar by editing a poorly written passage.  A traditional auto

mechanics test might include questions about a front-end alignment; a

performance assessment would have students do one.


Performance assessments can also measure skills that have not traditionally

been measured in large groups of students: skills such as integrating

knowledge across disciplines, contributing to the work of a group, and

developing a plan of action when confronted with a novel situation.  Grant

Wiggins (1990) captures their potential nicely:


          Do we want to evaluate student problem-posing and problem-solving

          in mathematics?  Experimental research in science?  Speaking,

          listening, and facilitating a discussion?  Doing document-based

          historical inquiry?  Thoroughly revising a piece of imaginative

          writing until it "works" for the reader?  Then let our assessment

          be built out of such exemplary intellectual challenges.


What's Wrong With the Way We've Been Doing It?


Many tests used in state and local assessments, as well as the Scholastic

Aptitude Test and the National Assessment of Educational Progress, have

been criticized for failing to provide the information we need about

students and their ability to meet specific curricular objectives.  Critics

contend that these tests, as currently formulated, often assess only a

narrow range of the curriculum; focus on aptitudes, not specific curriculum

objectives; and emphasize minimum competencies, thus creating little

incentive for students to excel.  Further, they yield results that are

analyzed primarily on the national, state, and district levels rather than

used to improve the performance of individual pupils or schools.

The true measure of performance assessment must, however, lie in its

ability to assess desired skills, not in the alleged inability of other

forms of assessment. 


Here We Go Again?


You might ask, "Is performance assessment really new?" Good classroom

teachers have used projects and portfolios for years, preparing numerous

activities requiring students to blend skills and insights across

disciplines.  Performance assessment has been particularly common in

vocational education, the military, and business.  ERIC has used

"performance tests" as a descriptor since 1966.


What is new is the widespread interest in the potential of performance

assessment.  Many superintendents, state legislators, governors, and

Washington officials see high-stakes performance tests as a means to

motivate students to learn and schools to teach concepts and skills that

are more in line with today's expectations.  This perspective will be

called the motivator viewpoint in this article.  Many researchers,

curriculum specialists, and teachers, on the other hand, see performance

assessment as empowering teachers by providing them with better

instructional tools and a new emphasis on teaching more relevant skills, a

perspective that will be referred to as the empowerment viewpoint here.

Proponents of both viewpoints agree on the need to change assessment

methods but differ in their views about how assessment information should

be used.


On the Value of Performance Assessments


Advocates of the motivator and empowerment viewpoints concur that

performance assessments can form a solid foundation for improving schools

and increasing what students know and can do.  However, the two groups

frame the advantages differently.  Their positions are sketched here

briefly, then developed more fully in the sections that follow.


The motivators emphasize that performance-based assessments, if instituted

on a district, state, or national level, will allow us to monitor the

effectiveness of schools and teachers and track students' progress toward

achieving national educational goals (see "Standards, Assessments, and the

National Education Goals" on pp. X X).  According to the motivator

viewpoint, performance assessments will make the educational system more

accountable for results.  Proponents expect them to do the following:


     *    prompt schools to focus on important, performance-based outcomes;

     *    provide sound data on achievement, not just aptitude;

     *    allow valid comparisons among schools, districts, and states; and

     *    yield results for every important level of the education system,

          from individual children to the nation as a whole.


Those in the empowerment camp, on the other hand, tend to focus on how

performance assessments will improve teaching and learning at the classroom

level.  Instructional objectives in most subject areas are being redefined

to include more practical applications and more emphasis on synthesis and

integration of content and skills.  Performance assessments that are

closely tied to this new curriculum can give teachers license to emphasize

important skills that traditionally have not been measured.  Performance

assessments can also provide teachers with diagnostic information to help

guide instruction.  The outcomes-based education (OBE) movement supports

instructional activities closely tied to performance assessment tasks.

Under OBE, students who do not demonstrate the level of accomplishment

their local communities and school districts have agreed upon receive

additional instruction to bring them up to the level.   


High-Stakes Performance Assessments as Motivators


A landmark event in American education occurred in September

1989, when President George Bush held an education summit in

Charlottesville, Virginia, with the nation's governors.  Together, the

participants hammered out six far-reaching national education goals,

effectively acknowledging that education issues transcend state and local

levels to affect the democratic and economic foundations of the entire

country.  In a closing statement, participants announced,


     We unanimously agree that there is a need for the first time in this

     nation's history to have specific results-oriented goals.  We

     recognize the need for ... accountability for outcome-related results.


Consensus is now building among state legislators, governors, members of

Congress, Washington officials, and the general public regarding the

desirability and feasibility of some sort of voluntary national assessment

system linked with high national standards in such subject areas as

mathematics, science, English, history, geography, and the arts.  A number

of professional organizations have received funding to coordinate the

development of such standards (see sidebar on p. X).  The groundbreaking

work of the National Council of Teachers of Mathematics (NCTM) serves as a

model for this process:  NCTM developed its Standards in CB: date and is

now developing curriculum frameworks and assessment guidelines to match it

(see "From Standards to Assessment" on p. X).


The National Council on Education Standards and Testing (NCEST), an

advisory group formed by Congress and the President in response to national

and state interest in national standards and assessments, describes the

motivational effect of a national system of assessments in its 1992 report,

Raising Standards for American Education:


     National standards and a system of assessments are

     desirable and feasible mechanisms for raising

     expectations, revitalizing instruction, and

     rejuvenating educational reform efforts for all

     American schools and students (p. 8).


Envision, if you will, the enormous potential of an assessment that

perfectly and equitably measures the right skills.  NCEST believes that

developing standards and high-quality assessments has "the potential to

raise learning expectations at all levels of education, better target human

and fiscal resources for educational improvement, and help meet the needs

of an increasingly mobile population".  This is a shared vision.  At least

a half-dozen groups have begun calling for a national assessment system or

developing instrumentation during the past two years (see "Calls for New
Assessments" on p. X).



According to NCEST, student standards must be "world class" and include the

"specification of content--what students should know and be able to do--and

the level of performance students are expected to attain" (p. 3).  NCEST

envisions standards that include substantive content together with

complex problem-solving and higher-order thinking skills.  Such standards

would reflect "high expectations--not expectations of minimal competency"

(p. 13).  NCEST believes in the motivation potential of these world-class

standards, stating that they will "raise the ceiling for students who are

currently above average" and "lift the floor for those students who now

experience the least success in school" (p. 4).


Acknowledging that tests tend to influence curriculum, NCEST suggests that

assessments should be developed to reflect the new high standards.  Such

assessments would not be immediately associated with high stakes.  However,

once issues of validity, reliability, and fairness have been resolved,

these assessments "could be used for such high-stakes purposes as high

school graduation, college admission, continuing education, or

certification for employment.  Assessments could also be used by states and

localities as the basis for system accountability" (p. 27).


The U.S. already has one national assessment in place, the National

Assessment of Educational Progress (NAEP).  Since 1969, the U.S. Department

of Education-sponsored NAEP has been used to assess what our nation's

children know in a variety of curriculum areas, including mathematics,

reading, science, writing, U.S. history, and geography.  Historically, NAEP

has been a multiple-choice test administered to random samples of fourth-,

eighth-, and twelfth-graders in order to report on the educational progress

of our nation as a whole.  As interest in accountability has grown, NAEP

has begun to conduct trial state-level assessments.  NAEP is also

increasing the number of performance-based tasks to better reflect what

students can do (see "Performance-Based Aspects of the National Assessment

of Educational Progress" on p. x).  The National Council on Education

Standards and Testing envisions that large-scale sample assessments such as

NAEP will be one component of a national system of assessments, to be

coupled with assessments that can provide results for individual students.


Supporters argue that a system of national assessments would improve

education by giving parents and students more accurate, relevant, and

comparable data and encouraging students to strive for world-class

standards of achievement.  Critics of a national assessment system are

equally visible.  The National Education Association and other professional

associations have argued that high-stakes national assessments will not

improve schooling and could easily be harmful.  They are particularly

concerned that students with disabilities, students whose native language

is not English, and students and teachers attending schools with minimal

resources will be penalized under such a system.  Fearing that a national

assessment system might not be a good model and could short-circuit current

reform efforts, the National Center for Fair and Open Testing, or FairTest,

testified that the federal dollars would be better spent in support of

state efforts. 


Performance Assessment for Teacher Empowerment


An enormous amount of activity is taking place in the area of establishing

national standards and a system of assessments.  The assessments are

expected to encompass performance-based tasks that call on students to

demonstrate what they can do.  They may well have strong accountability

features and be used eventually to make high-stakes decisions.  Should

building principals and classroom teachers get excited about performance

assessment now?  Absolutely.  Viewed in its larger context, performance

assessment can play an important part in the school reform/restructuring



     Performance assessment can be seen as a lever to

     promote the changes needed for the assessment to be

     maximally useful.  Among these changes are a

     redefinition of learning and a different conception of

     the place of assessment in the education process

     (Mitchell, 1992).


In order to implement performance assessment fully, administrators and

teachers must have a clear picture of the skills they want students to

master and a coherent plan for how students are going to master those

skills.  They need to consider how students learn and what instructional

strategies are most likely to be effective.  Finally, they need to be

flexible in using assessment information for diagnostic purposes to help

individual students achieve.  This level of reflection is consistent with

the best practices in education.  As Joan Herman, Pamela Aschbacher, and

Lynn Winters note in their important book, A Practical Guide to Alternative

Assessment (1992),  


     No longer is learning thought to be a one-way

     transmission from teacher to students, with the teacher

     as lecturer and the students as passive receptacles.

     Rather, meaningful instruction engages students

     actively in the learning process.  Good teachers draw

     on and synthesize discipline-based knowledge, knowledge

     of student learning, and knowledge of child

     development.  They use a variety of instructional

     strategies, from direct instruction to coaching, to

     involve their students in meaningful activities . . .

     and to achieve specific learning goals (p. 12).


Quality performance assessment is a key part of this vision because "good

teachers constantly assess how their students are doing, gather evidence of

problems and progress, and adjust their instructional plans accordingly"

(p. 12).


Properly implemented, performance assessment offers an opportunity to align

curriculum and teaching efforts with the important skills we wish children

to master.  Cognitive learning theory, which emphasizes that knowledge is

constructed and that learners vary, provides some insight into what an

aligned curriculum might look like (see Implications from Learning Theory).




The Case for Authentic Assessment.

ERIC Digest. ED328611  TM016142


  Wiggins, Grant

  American Institutes for Research, Washington, DC.; ERIC Clearinghouse on

Tests, Measurement, and Evaluation, Washington, DC.  Dec 1990


    Mr. Wiggins, a researcher and consultant on school reform issues, is a

widely-known advocate of authentic assessment in education. This digest is

based on materials that he prepared for the California Assessment Program.



    Assessment is authentic when we directly examine student performance on

worthy intellectual tasks. Traditional assessment, by contrast, relies on

indirect or proxy 'items'--efficient, simplistic substitutes from which we

think valid inferences can be made about the student's performance at those

valued challenges.

    Do we want to evaluate student problem-posing and problem-solving in

mathematics? experimental research in science? speaking, listening, and

facilitating a discussion? doing document-based historical inquiry?

thoroughly revising a piece of imaginative writing until it "works" for the

reader? Then let our assessment be built out of such exemplary intellectual
challenges.


    Further comparisons with traditional standardized tests will help to

clarify what "authenticity" means when considering assessment design and


    * Authentic assessments require students to be effective performers

with acquired knowledge. Traditional tests tend to reveal only whether the

student can recognize, recall or "plug in" what was learned out of context.

This may be as problematic as inferring driving or teaching ability from

written tests alone. (Note, therefore, that the debate is not "either-or":

there may well be virtue in an array of local and state assessment

instruments as befits the purpose of the measurement.)

    * Authentic assessments present the student with the full array of

tasks that mirror the priorities and challenges found in the best

instructional activities: conducting research; writing, revising and

discussing papers; providing an engaging oral analysis of a recent

political event; collaborating with others on a debate, etc. Conventional

tests are usually limited to paper-and-pencil, one-answer questions.

    * Authentic assessments attend to whether the student can craft

polished, thorough and justifiable answers, performances or products.

Conventional tests typically only ask the student to select or write

correct responses--irrespective of reasons. (There is rarely an adequate

opportunity to plan, revise and substantiate responses on typical tests,

even when there are open-ended questions).

    * Authentic assessment achieves validity and reliability by emphasizing

and standardizing the appropriate criteria for scoring such (varied)

products; traditional testing standardizes objective "items" and, hence,

the (one) right answer for each.

    * "Test validity" should depend in part upon whether the test simulates

real-world "tests" of ability. Validity on most multiple-choice tests is

determined merely by matching items to the curriculum content (or through

sophisticated correlations with other test results).

    * Authentic tasks involve "ill-structured" challenges and roles that

help students rehearse for the complex ambiguities of the "game" of adult

and professional life. Traditional tests are more like drills, assessing

static and too-often arbitrarily discrete or simplistic elements of those
challenges.


    Beyond these technical considerations, the move to reform assessment is

based upon the premise that assessment should primarily support the needs

of learners. Thus, secretive tests composed of proxy items and scores that

have no obvious meaning or usefulness undermine teachers' ability to

improve instruction and students' ability to improve their performance. We

rehearse for and teach to authentic tests--think of music and military

training--without compromising validity.

    The best tests always teach students and teachers alike the kind of

work that most matters; they are enabling and forward-looking, not just

reflective of prior teaching. In many colleges and all professional

settings the essential challenges are known in advance--the upcoming

report, recital, Board presentation, legal case, book to write, etc.

Traditional tests, by requiring complete secrecy for their validity, make

it difficult for teachers and students to rehearse and gain the confidence

that comes from knowing their performance obligations. (A known challenge

also makes it possible to hold all students to higher standards).



    While multiple-choice tests can be valid indicators or predictors of

academic performance, too often our tests mislead students and teachers

about the kinds of work that should be mastered. Norms are not standards;

items are not real problems; right answers are not rationales.

    What most defenders of traditional tests fail to see is that it is the

form, not the content of the test that is harmful to learning;

demonstrations of the technical validity of standardized tests should not

be the issue in the assessment reform debate. Students come to believe that

learning is cramming; teachers come to believe that tests are

after-the-fact, imposed nuisances composed of contrived

questions--irrelevant to their intent and success. Both parties are led to

believe that right answers matter more than habits of mind and the

justification of one's approach and results.

    A move toward more authentic tasks and outcomes thus improves teaching

and learning: students have greater clarity about their obligations (and

are asked to master more engaging tasks), and teachers can come to believe

that assessment results are both meaningful and useful for improving


    If our aim is merely to monitor performance then conventional testing

is probably adequate. If our aim is to improve performance across the board

then the tests must be composed of exemplary tasks, criteria and standards.



    The costs are deceptive: while the scoring of judgment-based tasks

seems expensive when compared to multiple-choice tests (about $2 per

student vs. 1 cent) the gains to teacher professional development, local

assessing, and student learning are many. As states like California and New

York have found (with their writing and hands-on science tests) significant

improvements occur locally in the teaching and assessing of writing and

science when teachers become involved and invested in the scoring process.

    If costs prove prohibitive, sampling may well be the appropriate

response--the strategy employed in California, Vermont and Connecticut in

their new performance and portfolio assessment projects. Whether through a

sampling of many writing genres, where each student gets one prompt only;

or through sampling a small number of all student papers and school-wide

portfolios; or through assessing only a small sample of students, valuable

information is gained at a minimum cost.
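The cost trade-off that motivates sampling can be put in rough numbers using the digest's own per-student figures (about $2 for a judgment-based task versus about 1 cent for machine scoring). The function below is a back-of-the-envelope sketch, not a real cost model; the 10,000-student district and 10% sample are hypothetical.

```python
# Back-of-the-envelope sketch of the scoring cost trade-off: judge a random
# fraction of students' work at the judgment-based rate while machine-scoring
# the remainder. Rates follow the digest's rough estimate ($2 vs. $0.01).

def scoring_cost(n_students, sample_fraction, judged_rate=2.00, machine_rate=0.01):
    """Cost of hand-scoring a sample while machine-scoring the rest."""
    sampled = round(n_students * sample_fraction)
    return sampled * judged_rate + (n_students - sampled) * machine_rate

print(scoring_cost(10000, 1.0))   # judge everyone: 20000.0
print(scoring_cost(10000, 0.1))   # judge a 10% sample: far less
```

Even this crude arithmetic shows why states such as California and Vermont sample prompts, papers, or students rather than hand-score every response, while the teacher training that accompanies scoring delivers much of the instructional benefit regardless of sample size.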

    And what have we gained by failing to adequately assess all the

capacities and outcomes we profess to value simply because it is

time-consuming, expensive, or labor-intensive? Most other countries

routinely ask students to respond orally and in writing on their major

tests--the same countries that outperform us on international comparisons.

Money, time and training are routinely set aside to ensure that assessment

is of high quality. They also correctly assume that high standards depend

on the quality of day-to-day local assessment--further offsetting the

apparent high cost of training teachers to score student work in regional

or national assessments.




    We forget that numerous state and national testing programs with a high

degree of credibility and integrity have for many years operated using

human judges:

    * the New York Regents exams, parts of which have included essay

questions since their inception--and which are scored locally (while

audited by the state);

    * the Advanced Placement program which uses open-ended questions and

tasks, including not only essays on most tests but the performance-based

tests in the Art Portfolio and Foreign Language exams;

    * state-wide writing assessments in two dozen states where model

papers, training of readers, papers read "blind" and procedures to prevent

bias and drift gain adequate reliability;

    * the National Assessment of Educational Progress (NAEP), the

Congressionally-mandated assessment, uses numerous open-ended test

questions and writing prompts (and successfully piloted a hands-on test of

science performance);

    * newly-mandated performance-based and portfolio-based state-wide

testing in Arizona, California, Connecticut, Kentucky, Maryland, and New
York.


    Though the scoring of standardized tests is not subject to significant

error, the procedure by which items are chosen, and the manner in which

norms or cut-scores are established, are often quite subjective--and

typically immune from public scrutiny and oversight.

    Genuine accountability does not avoid human judgment. We monitor and

improve judgment through training sessions, model performances used as

exemplars, audit and oversight policies as well as through such basic

procedures as having disinterested judges review student work "blind" to

the name or experience of the student--as occurs routinely throughout the

professional, athletic and artistic worlds in the judging of performance.

    Authentic assessment also has the advantage of providing parents and

community members with directly observable products and understandable

evidence concerning their students' performance; the quality of student

work is more discernible to laypersons than when we must rely on

translations of talk about stanines and renorming.

    Ultimately, as the researcher Lauren Resnick has put it, "What you

assess is what you get; if you don't test it, you won't get it." To improve

student performance we must recognize that essential intellectual abilities

are falling through the cracks of conventional testing.






Alternatives to Standardized Educational Assessment.

ED312773  EA021431 ERIC Digest Series Number EA 40.


  Bowers, Bruce C.

  ERIC Clearinghouse on Educational Management, Eugene, Oreg.  1989

       An American educator who was examining the British educational system

once asked a headmaster why so little standardized testing took place in

British schools. "My dear fellow," came the reply, "In Britain we are of

the belief that, when a child is hungry, he should be fed, not weighed." 

This anecdote suggests the complementary question: "Why is it that we do so

much standardized testing in the United States?"



    Advocates of standardized testing assert that it simply accomplishes

many of the purposes for which grading and other traditional assessment

procedures were designed, but more efficiently and fairly. Even critics of

standardized testing acknowledge that it has filled a vacuum. As Grant

Wiggins (1989a) puts it, "Mass assessment resulted from legitimate concern

about the failure of schools to set clear, justifiable, and consistent

standards to which it would hold its graduates and teachers accountable."

    Standardized testing is currently used to fulfill (1) the

administrative function of providing comparative scores for individual

students so that placement decisions can be made; (2) the guidance function

of indicating a student's strengths or weaknesses so that he or she may

make appropriate decisions regarding a future course of study; and, more

recently, (3) the accountability function of using student scores to assess

the effectiveness of teachers, schools, and even entire districts (Robinson

and Craver 1989).




    The phrase "test-driven curriculum" (Livingston, Castle, and Nations

1989) captures the essence of the major controversy surrounding

standardized testing. When test scores are used on a comparative basis not

only to determine the educational fate of individual students, but also to

assess the relative "quality" of teachers, schools, and school districts,

it is no wonder that "teaching to the test" is becoming a common practice

in our nation's schools. This would not necessarily be a problem if

standardized tests provided a comprehensive, in-depth assessment of the

knowledge and skills that indicate mastery of a given subject matter.

However, the main purpose of standardized testing is to sort large numbers

of students in as efficient a manner as possible. This limited goal, quite

naturally, gives rise to short-answer, multiple-choice questions. When

tests are constructed in this manner, active skills such as writing,

speaking, acting, drawing, constructing, and repairing, along with the many

other skills that can and should be taught in schools, are automatically

relegated to second-class status.



    It is reasonable to assume that the demand for test results that can be


compared across student populations will remain strong. The critical

question is whether such results can be obtained from tests that attempt a

more comprehensive assessment of student abilities than the present

standardized tests are capable of providing. An ancillary, but equally

critical, question is whether such tests are too costly to be widely used.


    Suggested alternatives are based on the concept of a

"performance-based" assessment. Depending on the subject matter being

tested, the performance may consist of demonstrating any of the active

skills mentioned above. For example, in the area of writing, drawing, or

any of the "artistic expression" skills, it has been suggested that a

"portfolio assessment," involving the ongoing evaluation of a cumulative

collection of creative works, is the best approach (Wolf 1989). For

subjects that require the organization of facts and theories into an

integrated and persuasive whole (for example, sciences and social

sciences), an assessment modeled after the oral defense required of

doctoral candidates has been suggested (Wiggins 1989a).

    A third approach, which might be termed the "problem solving model,"

can be adapted to almost any knowledge-based discipline. It involves the

presentation of a problematic scenario that can be resolved only through

the application of certain major principles (theories, formulae) that are

central to the discipline under examination (Archbald and Newmann 1988).




    Performance-based assessment is more easily scored using a

criterion-referenced, rather than a norm-referenced, approach. Instead of

placing a student's score along a normal distribution of scores from

students all taking the same test, a criterion-referenced approach focuses

on whether a student's performance meets a criterion level, typically one

reflecting mastery of the skills being tested.
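The distinction can be sketched in a few lines of Python. The scores, the mastery cutoff, and the function names below are invented for illustration; they are not drawn from any of the programs the digest describes.

```python
# Illustrative sketch: norm-referenced vs. criterion-referenced scoring.
# All data and thresholds here are hypothetical.

def norm_referenced(score, all_scores):
    """Percentile rank: where the score falls relative to the whole group."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def criterion_referenced(score, mastery_cutoff):
    """Mastery decision: whether the performance meets a fixed criterion."""
    return score >= mastery_cutoff

scores = [48, 55, 61, 70, 70, 74, 82, 90]
print(norm_referenced(74, scores))   # rank depends on how others performed
print(criterion_referenced(74, 80))  # judgment depends only on the standard
```

Note that the same performance can look strong under one approach and inadequate under the other: the norm-referenced rank shifts whenever the comparison group changes, while the criterion-referenced judgment is stable as long as the standard is.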

    How can such an assessment be reliably compared to similar assessments

made by other teachers in other settings? It has been suggested that

American educators adopt the "exemplary system" being called for in Great

Britain. In this system, teachers involved in scoring meet regularly "to

compare and balance results on their own and national tests" (Wiggins

1989b), thus increasing reliability across settings. Clearly, however, such

an approach (similar to the approach currently in use for the scoring of

Advanced Placement essay exams) could be prohibitively expensive if carried

out on a large scale. A key question is whether the costs associated with

this labor intensive scoring system would be offset by the presumed

instructional gains obtained from an assessment model that rewarded a more

thorough and holistic approach to instruction.
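One simple statistic such a scoring session might track to see whether raters are converging is the exact-agreement rate: the fraction of papers to which two raters assign the identical rubric score. The sketch below is a hypothetical illustration; the rater data are made up.

```python
# Illustrative sketch: exact-agreement rate between two raters scoring the
# same set of papers on a rubric. Data are hypothetical.

def exact_agreement(rater_a, rater_b):
    """Fraction of papers given the identical score by both raters."""
    assert len(rater_a) == len(rater_b)
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

rater_a = [4, 3, 2, 4, 1, 3]
rater_b = [4, 3, 3, 4, 1, 2]
print(exact_agreement(rater_a, rater_b))  # 4 of 6 papers scored identically
```

In practice, scoring programs track such agreement rates before and after moderation meetings; rising agreement is evidence that training, anchor papers, and shared exemplars are doing their job.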




    California has probably made the greatest effort in this direction,

beginning in 1987 with its statewide writing test and continuing with its

current development of performance-based assessment in science and history

(Massey 1989). The Connecticut Assessment of Educational Progress Program

uses a variety of performance tasks in its assessment of science, foreign

languages, and business education (Baron 1989). (However, this assessment

includes only a sample of students at each grade level, and the subjects

for which performance tasks are required change from year to year.)

Vermont education officials are currently seeking

legislative approval for funds to pursue a portfolio assessment approach in

addition to the current standardized testing (Massey 1989).




    In psychometric terms, the tradeoff in such a shift is to sacrifice

reliability for validity. That is, performance-based tests do not lend

themselves to a cost- and time-efficient method of scoring that, in

addition, provides reliable results. On the other hand, they actually test

what the educational system is presumably responsible for teaching, namely,

the skills prerequisite for performing in the real world. The additional

costs involved in producing reliable results across different settings for

performance-based tests are unknown.

    The question is whether a majority of educators will echo the

sentiments of George Madaus, director of the Center for the Study of

Testing, Evaluation, and Educational Policy, who believes that

performance-based testing "is not efficient; it's expensive; it doesn't

lend itself to mass testing with quick turnaround time--but it's the way to

go" (Brandt 1989).



    Archbald, Doug A., and Fred M. Newmann. "Beyond Standardized Testing:

Assessing Authentic Academic Achievement in the Secondary School." Reston,

VA: National Association of Secondary School Principals, 1988. 65 pages. ED

301 587.


    Baron, Joan B. "Performance Testing in Connecticut." EDUCATIONAL

LEADERSHIP 46, 7 (April 1989): 8. EJ 387 136.


    Brandt, Ron. "On Misuse of Testing: A Conversation with George Madaus."

EDUCATIONAL LEADERSHIP 46, 7 (April 1989): 26-29. EJ 387 140.


    Livingston, Carol, Sharon Castle, and Jimmy Nations. "Testing and

Curriculum Reform: One School's Experience." EDUCATIONAL LEADERSHIP 46, 7

(April 1989): 23-25. EJ 387 139.


    Massey, Mary. "States Move to Improve Assessment Picture." ASCD UPDATE

31, 2 (March 1989): 7.


    Ralph, John, and M. Christine Dwyer. "Making the Case: Evidence of

Program Effectiveness in Schools and Classrooms." Washington, D.C.: U.S.

Department of Education, Office of Educational Research and Improvement,

November 1988. 54 pages.


    Robinson, Glen E., and James M. Craver. "Assessing and Grading Student

Achievement." Arlington, VA: Educational Research Service, 1989. 198 pages.


    Wiggins, Grant. "A True Test: Toward More Authentic and Equitable

Assessment." PHI DELTA KAPPAN 70, 9 (May 1989a): 703-13. EJ 388 723.


    Wiggins, Grant. "Teaching to the (Authentic) Test." EDUCATIONAL

LEADERSHIP 46, 7 (April 1989b): 41-47.


    Wolf, Dennie P. "Portfolio Assessment: Sampling Student Work."

EDUCATIONAL LEADERSHIP 46, 7 (April 1989): 35-39. EJ 387 143.