In the wake of the publication of secondary school performance tables last month, there was a brief flurry of activity about ranking schools based on Progress 8 scores. A tweet from Tom Sherrington was a particular favourite.

In this blogpost, I’m going to use far more words and a few charts to say the same thing. I’ll also say why I don’t think we need a more precise measure.

By how much do schools differ?

Let’s start with Attainment 8.

In the chart below, I’ve plotted schools ranked by A8 score. I’ve altered the scale so that the axis is centred on the national average for state-funded mainstream schools (47.4), and A8 scores have been divided by 10 to give an average grade per slot. This is so we can use the same axes as Progress 8 later.

Almost 1,600 out of the total of 3,165 schools have A8 average grades per entry between -0.5 and +0.5 (A8 scores between 42.4 and 52.4). In other words, they differ from the national average by at most half a grade per subject.
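To make that rescaling concrete, here’s a minimal sketch in Python (the 47.4 average and the divide-by-ten come from the description above; the function name is purely illustrative):

```python
# Rescale an Attainment 8 score to an average grade per slot, centred on
# the national average for state-funded mainstream schools (47.4).
NATIONAL_A8 = 47.4

def average_grade_per_slot(a8_score):
    return (a8_score - NATIONAL_A8) / 10

print(average_grade_per_slot(42.4))  # -0.5 grades per slot
print(average_grade_per_slot(52.4))  # +0.5 grades per slot
```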

Now let’s do the same thing for P8 – see the chart below.

The key thing to note here is that the distribution is flatter. By adjusting for differences in prior attainment, we reduce the amount of variation in attainment between schools. The number of schools with scores between -0.5 and +0.5 increases to 2,300.

It can also be seen that the range of P8 scores covered by the top and bottom 10% of schools (particularly the top/bottom 5%) is greater than that covered by the middle 80% of the distribution. In other words, there is a small number of schools with particularly high (or low) scores.

Put another way, the P8 scores of the top 300 schools span 1.3 grades, while those of the middle 300 schools span just 0.09. It is far harder to move up the rankings at the top or bottom end of the distribution – doing so requires a much greater increase in P8 score.
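A rough calculation using those figures shows the difference (a sketch only, treating rank places as evenly spread within each group of 300 schools):

```python
# Roughly how many rank places does a 0.1 improvement in P8 buy,
# assuming places are spread evenly within each group of 300 schools?
span_top, span_middle, group_size = 1.3, 0.09, 300

print(0.1 * group_size / span_top)     # ~23 places at the top
print(0.1 * group_size / span_middle)  # ~333 places in the middle
```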

What is a meaningful difference in P8 scores?

One reason why some convert value added scores into ranks is that the scores lack real-world meaning. With the exception of the floor standard (-0.5), we don’t have any real sense of whether a score is educationally important or not.

There were over 300 schools with P8 scores between -0.05 and +0.05 – a difference of over 300 rank places (10% of schools) between the highest and lowest scoring of them. But what do these numbers mean?

Let’s say the score for School A was +0.05 and the score for School B was -0.05. Taking the numbers at face value, one interpretation is that if you picked a pupil from each school with the same KS2 attainment, they would have the same grades in seven of the subjects included in Attainment 8, but the pupil from School A would be one grade higher than the pupil from School B in one (and only one) subject.
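As a quick sanity check on that interpretation, here’s the arithmetic as a sketch (Attainment 8 is totalled over ten slots, so one extra grade in a single-weighted subject adds one point to the total):

```python
# One grade higher in one single-weighted subject adds one point to the
# Attainment 8 total, which is averaged over ten slots.
extra_points = 1
slots = 10
print(extra_points / slots)  # 0.1 – the gap between +0.05 and -0.05
```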

Is this an educationally important difference?

Perhaps, but there are plenty of reasons why it might arise beyond the quality of education offered by a school, such as differences in qualifications offered and demographic characteristics.

Measurement error and chance also play a part. The attainment of pupils varies, even when taking account of prior attainment.

Here’s a simple demonstration of how chance could affect schools’ P8 scores.

Let’s imagine we draw a random sample of 160 pupils from the national population and calculate their average Progress 8 score.

What would it be? Zero? Or higher? Or lower?

The answer is that it depends on the sample drawn. We could draw a typical sample and so the score would be close to zero. Or one that was atypical, with a score that was much higher (or much lower).

Let’s imagine we draw repeated random samples of 160 pupils from the national population and make a note of the average P8 score for each. In the chart below I’ve drawn 10,000 samples and plotted the results. The scores are normally distributed and clustered around zero.[1]
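For anyone wanting to reproduce it, here’s a minimal sketch of that simulation (it assumes, for simplicity, normally distributed pupil scores with the standard deviation given in note 1; the variable names are illustrative):

```python
import numpy as np

# Draw 10,000 random "schools" of 160 pupils each from a national
# population with mean P8 of 0 and standard deviation 1.26 (note 1),
# then record each school's average P8 score.
rng = np.random.default_rng(0)
PUPIL_SD = 1.26
COHORT = 160
N_SAMPLES = 10_000

scores = rng.normal(loc=0.0, scale=PUPIL_SD, size=(N_SAMPLES, COHORT))
sample_means = scores.mean(axis=1)

low, high = np.percentile(sample_means, [5, 95])
print(low, high)  # roughly -0.16 and +0.16
```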

A total of 90% of the samples had scores between -0.16 and +0.16. If these were real schools, those at either end of that range would be separated by 920 places in the rankings.

It is also worth reiterating that schools with smaller cohorts will tend to have more extreme scores (either higher or lower) – in other words, their scores are more variable.
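The same point can be made analytically: the chance-driven spread of a school’s average score shrinks with the square root of cohort size. A sketch, using the same standard deviation as above and purely illustrative cohort sizes:

```python
import numpy as np

# Half-width of the 90% range of a purely chance-driven school average,
# i.e. 1.645 * sd / sqrt(n), for a few illustrative cohort sizes.
PUPIL_SD = 1.26
for cohort in (60, 160, 300):
    half_width = 1.645 * PUPIL_SD / np.sqrt(cohort)
    print(cohort, round(half_width, 2))  # ~0.27, ~0.16, ~0.12
```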

Making better use of data

In summary, for a value added measure like Progress 8, which is aggregated from a number of subjects, there is not a great deal of difference between most schools. Only around 14% of the variation in pupil P8 scores is between schools. Even less when studio schools, university technical colleges and further education colleges are removed.
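For anyone wanting to check a figure like that against pupil-level data, here is a simplified sketch of the calculation (the column names are hypothetical, and in practice a multilevel model would usually be used rather than this simple decomposition):

```python
import pandas as pd

def between_school_share(pupils: pd.DataFrame,
                         school_col: str = "school_id",
                         score_col: str = "p8_score") -> float:
    # Pupil-weighted variance of school means over total pupil variance.
    school_means = pupils.groupby(school_col)[score_col].transform("mean")
    return school_means.var() / pupils[score_col].var()
```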

The result of this is that schools’ P8 scores will appear to jump around from year to year. This doesn’t really matter because the differences between the vast majority of them are tiny in comparison to the range of scores between pupils.

Plenty can be done to improve P8, such as taking context into account. But this will make differences between the majority of schools smaller still.

That said, the data would be good enough to do two things. Firstly, to identify a small group of schools where support may well be needed. Secondly, to examine how much variation there is within the system.

Given comparable outcomes, not all schools can improve. And not all schools can score above zero on a value added measure. But we can try to reduce variation in the system, whether it is between schools or between groups of pupils. This would be a better use of data than ranking schools from best to worst.


Notes

  1. This uses the 2018 standard deviation in pupils’ P8 scores of 1.26.