
Performance Assessment

See also: teaching, assessment, assessment-of-process

While there are numerous definitions of performance assessment, Eisner (1999, p. 659) suggests that performance assessment is

aimed at moving away from testing practices that require students to select the single correct answer from an array of four or five distracters to a practice that requires students to create evidence through performance that will enable assessors to make valid judgments about “what they know and can do” in situations that matter.

Brady and Kennedy (2009, p. 61) suggest that regardless of the definition used, performance assessment is most consistent with a constructivist perspective of learning. Performance assessment is an example of authentic assessment, and as a result the characteristics of authentic assessment described in Table 3 can be applied to performance assessment. Eisner (1999, p. 659) suggests that performance assessment “is the most important development in evaluation since the invention of the short-answer test and its extensive use during World War 1”.

Performance assessment can also be seen as:

  • Assessing process and product.

    Rather than simply examining the quality of an end product, performance assessment can also be used to evaluate student engagement during the process of creation.

  • Ill-suited to, or even incapable of, fulfilling the comparative or accountability function required by many assessment stakeholders.

    Performance assessment tends to focus on the assessment of the individual and consequently may not provide information about a class or school that is useful for comparison against other cohorts (Eisner, 1999). The increase in validity offered by authentic assessment comes at the cost of decreased reliability (Montgomery, 2002, p. 36). Performance-based tests do not generalise well across tasks or situations (Swanson et al., 1995, p. 11).

  • Creating uncertainty and difficulty.

    A simple example of the uncertainty and difficulty created is the trouble authors (e.g. Brady & Kennedy, 2009; Eisner, 1999) have in trying to define performance assessment. Gaining acceptance of performance assessment from all stakeholders in the assessment activity – especially those interested in assessment for accountability and comparison purposes – is made more difficult when there is no widely accepted definition. Lesh and Lamon (1994, p. 3) offer another perspective when they describe how, for mathematics educators, it is easier to define “what we want to move away from” (test-based assessment) than to answer questions such as what is meant by real-life situations, authentic mathematics, or performance activities.

  • Difficult and potentially increasing teacher workload.

    The use of authentic assessment tasks (perhaps requiring a unique task for each student) and the effective observation of student performance of those tasks are likely to have significant workload implications for teachers, especially when teachers are new to the concept and practice of performance assessment. Difficulty and workload issues may be one explanation for the observation by Cumming and Maxwell (1999, p. 188) that attempts at authentic tasks amount to little more than a gloss over the top of existing assessment practices.

  • Increasing workload and possibly creating inequity for students.

    Montgomery (2002, p. 36) suggests that two of the concerns around authentic assessment are the limited experience and skills some students have with authentic assessment, and the possibility that the increased linguistic complexity of some authentic assessment examples may result in equity issues.

Swanson, Norman and Linn (1995) present eight lessons learned from a long history of using performance-based assessment in medicine. These lessons are:

  1. The fact that examinees are tested in realistic performance situations does not make test design and domain sampling simple and straightforward. Sampling must consider both context (situation/task) and construct (knowledge/skill) dimensions, and complex interactions are present between these dimensions.
  2. No matter how realistic a performance-based assessment is, it is still a simulation, and examinees do not behave in the same way they would in real life.
  3. While high-fidelity performance-based assessment methods often yield rich and interesting examinee behaviour, scoring that rich and interesting behaviour can be problematic.
  4. Regardless of the assessment method used, performance in one context does not predict performance in other contexts very well.
  5. Correlational studies of the relationships between performance-based test scores and other assessment methods targeting different skills typically produce variable and uninterpretable results.
  6. Because performance-based assessment methods are often complex to administer, multiple test forms and test administrators are required to test large numbers of examinees.
  7. All high-stakes assessments, regardless of the method used, have an unpredictable impact on teaching and learning.
  8. Neither traditional testing nor performance-based assessment methods are a panacea. Selection of assessment methods should depend on the skills to be assessed, and, generally, use of a blend of methods is desirable.