The Teach classroom observation tool, launched by the World Bank last week, is a great initiative. Let’s collaborate to test and improve it.

The World Bank launched the Teach classroom observation tool on January 31, 2019. (See image: Domains assessed by the Teach tool.) The materials supporting the tool are impressively comprehensive: a background paper, reliability and validity data, an instruction manual in four languages, training resources, videos, a guide to adapting the tool to different contexts, and code for cleaning and analysis. In his presentation of the tool, Ezequiel Molina said one example of poor teaching was explaining something, asking the class for questions and then hoping they keep quiet. In that spirit, I am a class member piping up with questions and with my hopes and fears for the use of this tool. Here are what I see as the challenges it has to overcome, in descending order of importance.

1. Reliable observations aren't easy

The challenge of making reliable classroom observations is often underestimated. It involves understanding what you are looking for and knowing when you see it. It’s not until you try to get two people to agree on what they are seeing that you understand how challenging even simple behaviours can be to record accurately. The Teach tool has a few behaviours that could be complex to observe. For example, whether or not “the teacher adjusts teaching to the level of the student” is a relatively subjective judgement. My experience is that reliability is tough to achieve in low-capacity contexts on anything but the most concrete of constructs. An example from our work in Kenya was that observers could reliably record whether a literacy teacher was talking about a letter, a word or a sentence, but could not reliably determine whether the goal of instruction was comprehension or decoding, even though those two terms are well defined. A helpful article from RTI colleagues reviews different approaches to classroom observations and discusses the problems of subjectivity in "high-inference" versus "low-inference" observation tools and how they can be tackled.

It is critical that every project using this tool conducts inter-rater reliability assessments and uses data only for the indicators for which reliability is good. If two people don’t agree on what an item measures, it probably doesn’t measure anything useful. The Teach tool has impressive reliability statistics from a field test in Pakistan, but this is only one context and one set of assessors. Challenges may be different in different settings, particularly in a lower-income, lower-capacity country. The danger is that users will see the World Bank logo (and the impressive team behind the tool) as a stamp of quality and bypass their own reliability checks. If there is one piece of advice to underscore from the Teach developers, it is: test the reliability of each item before you use the tool. Fortunately, this task is made relatively easy by the resources available, including Stata and Excel tools to analyze reliability. And reliability checks cost a small fraction of the main assessment exercise.
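
To make the idea of an item-by-item reliability check concrete, here is a minimal sketch in Python. It is not the Stata or Excel code that ships with Teach. It assumes two observers have scored the same lesson segments, and it uses Cohen's kappa, one common agreement statistic, which may differ from the statistic the Teach materials report. The item names, the scores and the 0.6 cut-off are all made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of segments")
    n = len(rater_a)
    # Share of segments on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical score throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical scores (L = low, M = medium, H = high) from two observers
# watching the same eight lesson segments. Item names are illustrative only.
scores = {
    "clear_explanation": (
        ["H", "H", "M", "L", "H", "M", "L", "H"],
        ["H", "H", "M", "L", "H", "M", "L", "H"],
    ),
    "adjusts_to_level": (
        ["H", "M", "L", "M", "H", "L", "M", "H"],
        ["M", "M", "H", "L", "H", "M", "L", "H"],
    ),
}

THRESHOLD = 0.6  # assumed cut-off for "good" agreement; pick your own

kappas = {item: cohen_kappa(a, b) for item, (a, b) in scores.items()}
# Keep only items that clear the cut-off (undefined kappas are dropped too).
reliable_items = [item for item, k in kappas.items() if k >= THRESHOLD]

for item, k in kappas.items():
    print(f"{item}: kappa = {k:.2f}")
print("Items to keep:", reliable_items)
```

In this made-up data the observers agree on 3 of 8 segments for the more subjective item, which is barely better than chance, so only the concrete item would be kept. For ordered scales, a weighted kappa or an intraclass correlation may be a better choice, and a real check would need far more than eight segments.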

2. It may not be possible to thin-slice behaviour

“Thin-slicing” describes the ability to find patterns in behaviour based on a narrow window of experience. This is what the Teach tool (along with most observation tools) proposes to do by observing a class for two 15-minute periods. If that is the only observation of that teacher, it will cover only a tiny fraction of all the things she does in the classroom. It’s quite possible that the observation periods take place at the wrong time: when teachers are doing an activity that doesn’t allow them to demonstrate their skills, or maybe just on a bad day. This challenge of thin-slicing behaviour, coupled with the reliability issues discussed above, explains why classroom observations have such a poor record of predicting learning outcomes in general. (My impression from talking to people in the field is that most classroom observations explain only a small proportion of variance in learning outcomes, if any at all.) There may be no easy solution to this one. We either need to take thin-slicing seriously (do the research to work out which behaviours observed in one lesson are indeed indicative of an overall pattern) or we need to observe teachers more often and for longer.

3. Are the most important classroom processes universal to all subjects?

The Teach tool aims to assess universal aspects of classroom processes, those that are common to lessons in literacy, numeracy, civics and history. But it’s possible that the key indicators of quality in a classroom are specific to the subject being taught. We found that the HALI literacy intervention in Kenya had its impact through improvements in student interaction with text and through teachers breaking down words into letters and sounds. These changes in practice were predictive of improvements in student learning (as described in this blog) but would not be captured by the Teach tool. Observations of literacy instruction practices are key to Tangerine Tutor, which has been used in a number of successful RTI projects such as Tusome in Kenya. My hope is that researchers explore subject-specific extensions to the tool and see Teach as capturing the foundations but not the entirety of good teaching practice.

4. Are the most important classroom processes universal to all contexts?

As the Teach tool is used in more contexts, it will be important to assess how widely applicable the framework is. It’s promising to see that the Teach manual advocates the use of locally made videos to demonstrate how, for example, “the teacher treats all students respectfully” should be interpreted in each context. A deeper level of contextual adaptation would question whether the same teacher behaviours are equally important across all contexts. “The teacher’s explanation of content is clear” is surely important everywhere, but it’s easy to imagine that “the teacher promotes students’ interpersonal skills” may be more important in some contexts and at some stages of education than others.

Which is all to say that we need to continually test Teach as it is used, both its reliability and its predictive validity (its relationship with learning outcomes in particular). The great thing is that the open access nature of the tool and the extensive documentation will encourage its widespread use and hopefully efforts to test and improve it. My view is that Teach is an improvement on any observation tools that have been widely used in low- and middle-income countries to date. But the achievement of the Teach team is not that they have delivered the ultimate in classroom observation tools, but that they’ve given us a fantastic platform to work together towards that goal.


I'm grateful for recent conversations on this topic with Luis Crouch, Ben Piper and Sarah Pouezevara at RTI and Jack Rossiter at CGD among others.

