Is an Ofsted judgement a lagging or leading indicator of school performance?

The school inspectorate, Ofsted, was created in an era when detailed pupil background and attainment data were not collected. The only way to judge whether a school was doing a good job was to visit it.

Of course, Ofsted aims to do more than replicate exam performance monitoring. It can report on other dimensions of school life, such as the safety and welfare of pupils and their extra-curricular provision (Iftikhar Hussain has conducted research on this). But can inspectors help us overcome the greatest problem with exam performance monitoring, which is that it necessarily tells you what the school was like in the past, rather than the current quality of teaching and learning?

Here we look at 804 secondary schools that were visited by Ofsted in 2011/12 and for which we have sufficient performance data.

We explore whether there is any evidence that Ofsted inspectors are leading indicators of future changes in exam performance. In other words, through observing leadership and teaching of year groups still in the school during their visit, can they correctly identify schools on the cusp of a change in fortunes, for better or worse?

It would be unrealistic to expect Ofsted inspectors to identify all future changes in exam performance, not least because many occur by chance. So, we pit Ofsted against our own exam-inspector who makes judgements solely using the past two years of GCSE performance data.

Our exam-inspector rates the effectiveness of schools using a bundle of contextual value added (CVA) measures from 2010 and 2011. A CVA measure reports how well a school performs, given its pupil intake attainment and demographics. We use CVA measures in maths, English, 5 A*-C rate, best 8 GCSEs or equivalents and overall GCSE score.

Having extracted a single (principal components) factor to describe overall past school performance, we assign the inspected schools a 1-4 rating using the same proportions as Ofsted assigned that year. The exam-inspector doesn’t aim to mirror what the Ofsted rating was in that year. Instead it makes a judgement based solely on past GCSE outcomes. This means the two sets of ratings are not particularly closely correlated.
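The two steps just described (one overall factor, then a 1-4 rating in fixed proportions) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the analysis: the first principal component is found here by simple power iteration on the correlation matrix of the standardised measures, the function names are hypothetical, and the grade proportions passed in are whatever the reader supplies (the actual Ofsted proportions for that year are not reproduced here).

```python
from statistics import mean, pstdev


def standardise(columns):
    """Z-score each measure. `columns` holds one list per CVA measure,
    each with one value per school. Assumes every measure varies."""
    out = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        out.append([(x - m) / s for x in col])
    return out


def first_component_scores(columns, iterations=200):
    """Score each school on the first principal component of the measures.
    Power iteration is an assumed stand-in for a full PCA routine."""
    z = standardise(columns)
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardised measures.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    # Orient the component so that a higher score means better performance.
    if sum(w) < 0:
        w = [-x for x in w]
    return [sum(w[i] * z[i][s] for i in range(k)) for s in range(n)]


def assign_ratings(scores, proportions):
    """Rank schools by score and hand out grades 1 (best) to 4 (worst)
    so that the share of each grade matches `proportions`."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    cuts, running = [], 0.0
    for p in proportions:
        running += p
        cuts.append(round(running * n))  # cumulative cut points by rank
    ratings = [0] * n
    grade = 0
    for rank, idx in enumerate(order):
        while rank >= cuts[grade] and grade < len(cuts) - 1:
            grade += 1
        ratings[idx] = grade + 1
    return ratings
```

Because the grades are assigned by rank alone, the exam-inspector reproduces the distribution of Ofsted grades without ever seeing which school received which Ofsted grade.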

The question for us is, when Ofsted makes a judgement that is more optimistic than our exam-inspector (green in the table), are these schools more likely to be on a positive future trajectory? And when Ofsted is more pessimistic than our exam-inspector (in pink), does the exam performance of the school indeed fall in following years?

Ofsted judgements are not leading indicators for future exam performance

For each inspected school, we plot the three-year average exam performance that inspectors would have had to hand at their visit against its subsequent change in exam performance (average 2012-14 pass rate minus average 2009-11 pass rate).
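For readers who want to reproduce the two quantities behind this plot, here is a minimal sketch. It assumes pass rates are held as a dictionary keyed by exam year and that both inspectors grade on the 1 (best) to 4 (worst) scale; the function names and labels are hypothetical.

```python
from statistics import mean


def pass_rate_change(pass_rates):
    """Return (baseline, change) for one school.

    `pass_rates` maps exam year to pass rate in percent. The baseline is
    the 2009-11 average; the change is the 2012-14 average minus it."""
    before = mean(pass_rates[y] for y in (2009, 2010, 2011))
    after = mean(pass_rates[y] for y in (2012, 2013, 2014))
    return before, after - before


def relative_judgement(ofsted_grade, exam_grade):
    """Compare the two grades (1 is best, 4 is worst)."""
    if ofsted_grade < exam_grade:
        return "ofsted_more_positive"  # shown in green
    if ofsted_grade > exam_grade:
        return "ofsted_more_negative"  # shown in pink
    return "agree"
```

If Ofsted were a leading indicator, schools labelled "ofsted_more_positive" would tend to have positive changes and those labelled "ofsted_more_negative" would tend to have negative ones.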

The green markers show schools where Ofsted was more positive in its judgement than our exam-inspector is. Many of these schools saw very large deteriorations in their pass rate after Ofsted visits. The pink markers show schools where Ofsted was more negative in its judgement than our exam-inspector is. Yet, many of these schools were on the cusp of huge improvements in their pass rate, suggesting the quality of teaching of existing pupils in the school was relatively high.

Overall, there is certainly no evidence here that Ofsted judgements reflect schools on the cusp of change.

Of course, Ofsted might argue that it is the inspection judgement itself that caused schools receiving relatively negative judgements to improve so much over the next few years. But we see the same pattern if we restrict our post-inspection analysis to the year 2011/12 when it is highly unlikely that Ofsted would influence exam results since they would have been sat within months of the visit.

The chart below groups schools by the difference between their 2012 pass rate and their average 2009-11 pass rate, from those with deteriorating exam performance on the left to the greatest improvers on the right. It shows that Ofsted is not more positive than the exam-inspector in circumstances where a school is about to produce significantly improved results shortly after inspectors leave.

Looking at schools on the cusp of change

We look further for evidence that Ofsted is a leading indicator by isolating schools that are experiencing significant changes in performance that are unlikely to be due to cohort effects or chance events. To do this we use a contextual value added measure of best 8 GCSE results.

57 of our 804 inspected schools appear to be on the cusp of a deterioration in performance: they have at least one positive, statistically significant CVA in the years 2009-11 and at least one negative, statistically significant CVA in the years 2012-14. In the table below we see that, although the exam-inspector and Ofsted inspectors do make quite different judgements on these 57 schools, neither seems more optimistic or pessimistic about the prospects of these schools.

Similarly, there are 80 inspected schools on the cusp of an improvement in their performance, with at least one negative, statistically significant CVA in the years 2009-11 and at least one positive, statistically significant CVA in the years 2012-14. Once again, the exam-inspector and Ofsted make different judgements on these schools, and neither is a better leading indicator of these improvements.
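The two "cusp" definitions above amount to a simple rule that can be sketched in a few lines. The sketch assumes each yearly CVA score is expressed relative to zero and comes with a confidence interval, and that "statistically significant" means the interval excludes zero; that reading, along with the function names, is an assumption for illustration rather than a description of the exact test used.

```python
def significant_sign(lower, upper):
    """Return +1 or -1 if the interval lies wholly above or below zero, else 0."""
    if lower > 0:
        return 1
    if upper < 0:
        return -1
    return 0


def cusp_status(intervals):
    """Classify one school from its yearly CVA confidence intervals.

    `intervals` maps exam year to a (lower, upper) tuple. Returns a set of
    labels, since the two definitions are not mutually exclusive."""
    early = {significant_sign(*intervals[y])
             for y in (2009, 2010, 2011) if y in intervals}
    late = {significant_sign(*intervals[y])
            for y in (2012, 2013, 2014) if y in intervals}
    labels = set()
    if 1 in early and -1 in late:
        labels.add("deteriorating")  # significantly above, then below, expectation
    if -1 in early and 1 in late:
        labels.add("improving")      # significantly below, then above, expectation
    return labels
```

Requiring a significant score on both sides of the inspection is what filters out schools whose results merely drifted because of cohort effects or chance.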

What does an Ofsted judgement reflect?

We choose to operate an expensive, high-stakes inspection system in England. Given we now have other clear accountability mechanisms that use pupil test data, it is only right that we reflect on whether we know enough about the reliability, validity and efficacy of Ofsted to justify its cost.

Ofsted inspectors do, of course, observe many interesting activities taking place in schools. But where the quality of teaching and learning would appear to be better in the school than it was in the recent past (and so exam results are about to rise), Ofsted does not appear to spot this. Equally, where schools produce worse exam results shortly after inspectors leave, the Ofsted judgement is not likely to reflect this imminent deterioration in performance.

If Ofsted judgements cannot be shown to be a consistent indicator of past exam performance or a good leading indicator of changes in performance, then this does not necessarily mean that inspection judgements are highly subjective. However, it is important that Ofsted is clear about exactly what it intends to measure so that external researchers can evaluate whether it actually meets its remit.

We first published this analysis in our report for secondary schools called Floors, tables and coasters: shifting the education furniture in England’s secondary schools.