Teaching evaluations are treated with suspicion by profs and teachers, but are loved by university bureaucrats and administrators. Students may not realize it, but these evaluations, specifically the mean scores that come out of some basic number crunching, often have a huge impact on the careers and success of academics. A rather small difference, say between a 1.9 and a 2.5, can determine whether somebody gets promoted, is hired or gets a bonus. In extreme cases, a teacher may lose a contract. So how good are these numbers as evidence of teacher effectiveness? And are the teachers who get higher ratings actually the ones who help students succeed in other courses?
A number of recent studies cast serious doubt on the usefulness of these criteria. Let's start with a fun example.
Quality of teaching or just easy and 'hot'?
James Felton, a Professor of Finance at Central Michigan University, and colleagues examined the evaluations submitted to http://www.ratemyprofessors.com/. Students can rate their professors on a few different criteria. The two core areas are 'helpfulness' (how helpful and approachable a teacher is) and 'clarity' (how organized, clear and effective a teacher is). These two are averaged to get a rating of overall teaching quality. There are two more ratings, though. The first is 'easiness', meaning how easy or difficult the classes are and how much work is needed to get an A. The second is 'hotness', a simple rating of whether a student thinks a teacher is hot or not. Obviously, we want teachers who are effective and helpful, but these perceptions should not be driven by how easy a course is or how attractive a teacher is. So what did the ratings for 6,852 profs from 369 institutions show? The hotter you are and the easier your course, the better your evaluations. The correlation between easiness and quality is a whopping .62, and hotness and quality correlate at .64.
They offer this explanation:
We see Quality as a function of Easiness, but it could be argued that Easiness is a function of Quality, where professors who are skilled in the classroom take difficult material and make it seem easy. We wish that were the case, but we see Quality as a function of Easiness the majority of the time for two reasons. First, as stated previously, Ratemyprofessors.com (2004) defines Easiness as the ability to get a high grade without having to work very hard. Second, professors with high Easiness scores usually have student comments regarding a light work load and high grades. Similarly, we see Quality as a function of Hotness, but it could be argued that Hotness is a function of Quality, where a brilliant professor, regardless of physical appearance, is considered sexy by his or her students. Again we wish that were the case, but most student comments point toward Quality as a function of Hotness when they focus on physical characteristics of their professors that could be captured in photographs.
The lesson that might be learned from this correlational study is that it does not hurt to dumb down your lecture content and hit the gym (well, the latter would be good regardless).
Lecture fluency or welcome back, Dr Fox...
Now on to study number 2. In a recently published study, Shana Carpenter and colleagues from Iowa State University showed students a short video of the same teacher presenting the same material. The major difference was that in one video the prof presented in what the authors called a fluent way: upright, confident, with eye contact and speaking fluently without notes. In the other condition, the prof presented disfluently: slumped, looking away, speaking haltingly and relying on notes. Across two experiments, students rated the prof and were tested on how much they had actually learned. The results very clearly showed that the fluent prof was rated much better (surprise, surprise), but also that students thought they had learned more and would remember more from the fluent prof than from the disfluent prof. However, when later tested, there were no differences between the two groups. This means instructor fluency increases perceptions of learning but not actual learning! There were also some curious smaller findings. For example, in the disfluent group, when students were given the opportunity to read the transcript of the lecture, those who spent more time rehearsing it had higher test scores. This was not the case for the fluent group. It is an ambiguous finding, but it could indicate that fluent lectures reduce the attention students pay to the material when preparing for an exam. I am not sure that is desirable.
This really sounds like the famous Dr Fox effect: talk nonsense, as long as you are dynamic, engage the audience and make jokes...
Some disturbing findings when using random assignment of students to profs
The most concerning study, though, used controlled random assignment of students to professors, which overcomes many of the shortcomings of previous studies (including students' self-selection into courses and professors). Scott Carrell and James West studied student achievement and course feedback as students moved through mandatory classes in maths, science and engineering. What makes their study unique is that professors rotated across sections of the course, assessment was not done by the professors themselves, and students were randomly allocated to professors (but all studied the same content). A first finding of practical importance is that less academically qualified instructors got students more (erroneously?) interested in the topics, which resulted in better immediate performance but lower scores in related follow-on courses. In contrast, students of more experienced and qualified professors did less well in the introductory classes, but then excelled later on. They were able to build on what they had learned during the initial courses.
Even more important, professors who were rated positively by students got better results in the initial courses. However, the students' rating of a professor's effectiveness did not predict later performance! In fact, in a number of cases the correlation flipped: students who studied with the more highly rated professors did worse in half the courses than those who studied with a less highly rated prof. (Note: only one of these correlations was significant, but the point remains the same: ratings of teacher effectiveness do not predict long-term student achievement.) As Carrell and West argue:
'Since many U.S. colleges and universities use student evaluations as a measurement of teaching quality for academic promotion and tenure decisions, this finding draws into question the value and accuracy of this practice.'
Are there alternatives? Yes!
Relying on student evaluations of courses and teachers is problematic if these evaluations are not considered in the larger context of what a course achieves. The business world realized this long ago, and teaching is effectively training. In the organizational world, Donald Kirkpatrick developed a famous four-level model of training evaluation. Its four criteria for evaluating the effectiveness of training are:
- Student reactions: essentially the equivalent of student evaluations, i.e. students' assessments of what they thought they had learned and how they felt about the course and the teaching
- Learning: the increase in knowledge or capability after the course. Performance on a test or exam could be a good measure of this (of course, only if the assessment is independent of the teacher; see the problem with course easiness above)
- Behaviour change: changes in behaviour outside the teaching environment that result from the teaching, including applying what has been learned to new situations outside the teaching/training context
- Results: the effect of the teaching on the business or the wider environment that follows from the performance and behaviour changes induced by the teaching/training
The 3rd and 4th points are what universities (and society) should be concerned about. The value of tertiary education has been questioned a lot recently (see for example here, here and here). These criteria can help readjust both the focus of universities and the criteria used to evaluate professors.
Students and society deserve better, not just the profs ;)
Comments are welcome as usual :)