CAN, AND SHOULD, TEACHERS BE EVALUATED
USING STUDENT LEARNING DATA?

Right now there is an energetic political press to judge teacher quality by measuring pupil performance. The technical and sociological problems with this change in professional practice are huge. Still, it is impossible to avoid the current preeminent question in teacher evaluation: Should we evaluate teachers based upon how well their students are learning? The answer is a qualified yes: school districts should begin to include the opportunity for pupil achievement data in the evaluation of some teachers, under six conditions:

1. The achievement information must be gains, increases, or changes in students during the influence of the teacher, adjusted according to the students' prior learning. The technical requirement is for gains adjusted for prior achievement.

2. Other data sources or information must be included in the evaluation, for example student surveys, peer review of materials, teacher test scores, administrator reports, or documentation of professional activity. The technical requirement is for multiple data sources.

3. Not all teachers in a system should be evaluated using student learning data. Individual teachers must select pupil achievement as a data source for their own individual evaluation. The technical requirement is for data sources that vary by teacher.

4. Interpretations and judgments about pupil achievement should be made by teacher-dominated panels, rather than individual building administrators or independent measurement experts. The technical requirement is for multiple judges who have the best objective data available, controls for bias, involvement of interested audiences, and an explicit logic of value.

5. The school district system should be installed over a 3-5 year period. This allows time for a focus on teacher quality and satisfaction, the necessary technical, sociological, and political changes, and avoidance of dangerous and unnecessary instructional disruption. The technical requirement is for time and other resources to transition to a new way of understanding teacher quality.

6. Pupil achievement data should not be tied to teacher pay. There are not enough data for all teachers, and the ensuing competition would be too disruptive. Such connections between performance and compensation have been tried before and failed, and they would detract from other developments to address personnel issues using pupil data.

Are These Complicated Conditions Really Necessary?

At this time, YES! The techniques of using student learning in teacher evaluation are under-researched and poorly developed. The task certainly will be more complicated than current practice, and it must be designed to accommodate many conceptual problems. The enormity of the job must not be underestimated, and much is at stake in a successful transition. However, the use of pupil achievement data in teacher evaluation is one of the most intriguing and important questions of value today.

Should a Claim Be Made That This Strategy Will Improve Student Learning?

The arguments for using pupil achievement data in teacher evaluation used so far in this description are political pressure, greater technical understanding of teacher performance, and opportunities for more teacher decisions. However, the argument that teacher evaluation based even partially on pupil achievement will improve student learning is a very weak one. There may be some cases where increased attention is positive, but there is no evidence that such a system in fact will improve teacher attitudes, performance, or pupil achievement. There remains a need for advocates of pupil achievement data to show in the refereed literature instances where such practice is causally tied to better results for students and teachers.

Further Description of the Conditions for Installation

Adjusted Student Learning Gains

"The worst possible use of test data for public reporting is the presentation of simple test averages" (Sanders, 2000, p. 5). It is absolutely wrong to use only post-instruction test scores, such as grade level tests or state benchmark test scores, for any kind of teacher evaluation.

Because of the wide variability of student responsiveness to teaching, it is essential to adjust observed achievement in students according to information about students prior to the teacher effects of interest. The most defensible methods of adjusting these gains are (a) pre-instruction testing, with regression adjustment of the gains shown on post-instruction testing, or (b) value-added assessments in which gain data are adjusted by regressions on individual student records over a 3-5 year history.
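To make method (a) concrete, the following is a minimal sketch of one way a regression adjustment could be computed. It is not the procedure of Sanders (2000) or of any particular district. It assumes a single pre-test and post-test per student, fits one pooled least-squares line across all students, and reports each teacher's mean residual (observed post-test score minus the score predicted from the pre-test). The function names and the sample records are hypothetical, and a multi-year value-added model as in method (b) would need considerably more than this.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjusted_gains(records):
    """Mean post-test residual per teacher after regressing post on pre.

    records: iterable of (teacher, pre_score, post_score) tuples.
    Assumption: one pooled regression line for all students.
    """
    records = list(records)
    pre = [r[1] for r in records]
    post = [r[2] for r in records]
    slope, intercept = fit_line(pre, post)

    residuals = defaultdict(list)
    for teacher, pre_score, post_score in records:
        expected = intercept + slope * pre_score
        residuals[teacher].append(post_score - expected)
    return {t: sum(v) / len(v) for t, v in residuals.items()}


def raw_post_means(records):
    """Simple post-test average per teacher, with no adjustment."""
    scores = defaultdict(list)
    for teacher, _pre, post_score in records:
        scores[teacher].append(post_score)
    return {t: sum(v) / len(v) for t, v in scores.items()}


# Hypothetical data: teacher B's students start lower but gain more.
records = [
    ("A", 70, 75), ("A", 80, 84), ("A", 90, 93),
    ("B", 40, 52), ("B", 50, 61), ("B", 60, 70),
]
raw = raw_post_means(records)
gains = adjusted_gains(records)
print("raw post-test means:", raw)
print("adjusted gains:", gains)
```

In this made-up data set, teacher A has the higher simple post-test average while teacher B has the higher adjusted gain, which illustrates why unadjusted averages can mislead. A pooled regression of this kind is only a starting point: when students are not assigned to teachers at random, the residuals can still carry effects that have nothing to do with the teacher.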

Multiple Data Sources

No single data source, even well-analyzed pupil achievement data, should be used by itself in the evaluation of teachers. As described by Sanders (2000), "there are too many other duties, dimensions, and responsibilities" (p. 10) of a teacher which must be measured by other techniques. We are concerned about more than just pupil achievement in judging teacher quality. There is not complete agreement on what should be counted as pupil achievement. There is variation within the ranks of teachers who show high pupil achievement: some teachers go considerably beyond it, with additional performance that we care about. Finally, we can obtain other important information about teacher quality, such as pupil reports, parent perceptions, and peer review of instructional materials.

Variable Data Sources

No single data source works for every teacher. Good teachers are good for different constellations of reasons. There are some excellent teachers for whom we just cannot obtain good, defensible gain data. Teachers teach in different contexts and settings. All of these are arguments to vary data by individual teacher.

Judgments/Interpretations by Teacher-Dominated Panels

Judgments obtained from building administrators are notorious for their inaccuracy, susceptibility to bias, and lack of discrimination (Peterson, 2000). Also, apparently "objective" data such as adjusted gains will not, by themselves, give satisfactorily discriminating judgments.

The best current evidence is that we will need to rely on panel decisions that interpret results, prioritize (rank) performances, and discuss and make assignments. For example, panels can make the final distinctions about the suitability (or even comparative superiority) of teachers to certain assignments (e.g., chronically underserved student groups). Effective panels can be constituted, for example, of 4 teachers, 2 administrators, 1 parent and 1 high school student (Peterson, 1988, 2000). It is necessary that these people do not have direct relationships with the teachers they judge. This practice replaces judgment/report by an individual building administrator or distant measurement experts, which show very low defensibility.

Panels represent judgment-based decision making (Popham, 1988) rather than numerical criterion-based (NCB) decision making. If it is argued that NCB is more "objective," it is more likely the case that the informed, expert subjectivity is moved from the decision itself to the process for setting up the decision. That is, decisions to recognize teachers for specific performances, on specific measures, for specific reasons are themselves informed, expert subjectivity and not a special kind of objective decision making.

Finally, student achievement data require expert interpretation. The situation is much like data from a blood test. It takes a licensed physician to tell us what the blood gas, platelet count, presence/absence of chemicals, hematocrit count, and hemoglobin level all mean for each individual patient.

Why Achievement Data SHOULD Be Used in Teacher Evaluation

Achievement data should be used in teacher evaluation because they are so important in the life and work of teachers. In addition, they are called for by important stakeholders. While the overall performance of teachers in this country is high (Berliner & Biddle, 1995), there can be instances where performance deviates importantly, and these should be taken into account. Some teachers see pupil achievement as a proper and deserved focus for their work, and can make the technical case that it can be assessed in their case. Educators interested in preservice and inservice education need to identify effective practitioners for emulation.

One important reason to take pupil achievement into account is the current divide of educational opportunity for rich and poor students. Arguments can be made that we at least should have the opportunity to discuss the redistribution of one kind of teacher effectiveness in order to make up for this injustice. This argument stands even for those who see the divide to be an economic inequity in which educational results are just one result, rather than the educational divide being an independent phenomenon.

A final reason to include pupil achievement data is one of plain realism. Mechanisms to include pupil progress should be installed to avoid much bad evaluation practice that is likely to be foisted on the profession. It simply is time to make progress before avoidable disruption and interruption occur.

Participation of Teachers in Change, and Decision to Change

This move to include pupil achievement data requires the assent and initiative of classroom teachers (Peterson & Chenoweth, 1992). The assumption that teachers don't care about pupil achievement or can't solve the problem of documenting pupil gain is faulty. What needs to be changed are the structure, arena, and system so that teacher care and competence CAN be translated into remedies for educational challenges. Examples of this systemic change are teacher-dominated panels as a replacement for administrator hegemony, and creation of teacher/administrator Evaluation Units as an absolutely necessary tool for teachers.

Berliner, D.C., & Biddle, B.J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's schools. NY: Addison-Wesley.
Peterson, K.D. (1988). Reliability of panel judgments for promotion in a school teacher career ladder system. Journal of Research and Development in Education, 21 (4), 95-99.
Peterson, K.D. (2000). Teacher evaluation: A comprehensive guide to new directions and practices (2nd ed.). Thousand Oaks, CA: Corwin Press.
Peterson, K.D., & Chenoweth, T. (1992). School teachers' control and involvement in their own evaluation. Journal of Personnel Evaluation in Education, 6, 177-189.
Popham, W.J. (1987). The shortcomings of champagne teacher evaluations. Journal of Personnel Evaluation in Education, 1, 25-28.
Sanders, W.L. (2000). Value-added assessment from student achievement data: Opportunities and hurdles. Jason Millman Award Speech. CREATE National Evaluation Institute, San Jose, CA. July 21.