Another attempt at a qualification-neutral Progress 8 measure

Progress 8 is the value added measure by which secondary schools are being judged, starting from the last academic year.

At Datalab we’re broadly supportive of it as a measure. But no measure of school performance is perfect, and P8 is no exception.

It doesn’t account for pupil background, and so favours schools with high percentages of pupils with English as an additional language (EAL) – who tend to make more progress than average between the ages of 11 and 16 – and disadvantages those with large percentages of non-EAL disadvantaged pupils – who conversely tend to make less progress.

It also doesn’t account for measurement error in prior attainment.

Ultimately, then, it’s just a useful descriptive statistic, not a good measure of school effectiveness.

Can P8 be improved upon?

Under P8, pupils’ grades in the various qualifications counted in performance tables are converted into points.

The scoring system tacitly assumes that all passes at the same grade represent the same level of challenge. However, Ofqual’s own reports on inter-subject comparability in GCSEs show that this doesn’t appear to be the case at present (and may never have been the case).

On top of that, the equivalence of non-GCSE qualifications to GCSEs has to be established.

The net result is that some qualifications appear to be graded more (or less) severely than others.

Over the last few months we have been examining how schools’ qualification choices influence their P8 scores. We want to remove any incentive for schools to pursue qualifications simply because they are perceived to be less severely graded – or, more pejoratively, thought to be “easier”.

In a previous blogpost we looked at a different way of calculating P8, by first calculating value added in every individual subject. We concluded, however, that the approach was less optimal than unadulterated P8, although we think it could be improved with a bit more work.

In this blogpost, we take a different approach. Whereas grade C passes in all subjects in GCSEs scored five points in 2015/16 in the Department for Education (DfE) Progress 8 calculation, we are going to award different points to different subjects.

A different approach to comparability

Research has shown that GCSEs in modern foreign languages tend to be more severely graded than other subjects, so we will score a grade C in French more highly than a grade C in a less severely graded subject.

Instinctively this might feel wrong, but go with it for now. (Besides, there is worse to come.)

We stress that we are not implying that any subject is ‘easier’ (either to learn or to teach); we are just attempting to make the points scores used statistically comparable between subjects.

One of the main difficulties in establishing comparability between qualifications is that pupils (and schools) make different choices from the wide range available.

While there is an expectation that everyone takes GCSE English and maths, subsets of the national population of pupils enter other qualifications.

Some qualifications, like GCSE physics, will tend to be entered by higher attaining pupils. Conversely, the entry profile for GCSE core science tends to be lower attaining.

In the methodology for our new ‘qualification neutral Progress 8’ measure, the points scores we assign to different qualifications are chained to the distribution of entrants’ outcomes in GCSE English and maths, the two qualifications entered by almost all pupils. In other words, we make all subjects statistically comparable with English and maths.

The full methodology is explained here.

In short, though:

points scores are allocated to each grade for every qualification by comparing the results of pupils who achieved each grade in each subject to the maths and English results of these pupils;
the rest of the methodology is the same as the DfE’s calculation of A8 and P8.
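As a rough illustration of the first step, the sketch below scores each grade in a subject as the average English-and-maths attainment of the pupils who achieved it. This is a deliberate simplification, not the full methodology referred to above: the pupil records, the `anchor` field (assumed here to be a pupil's mean points across GCSE English and maths) and the function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (pupil_id, subject, grade, anchor), where `anchor`
# is assumed to be the pupil's mean points across GCSE English and maths.
# All values are invented for illustration.
entries = [
    ("p1", "French", "C", 6.0),
    ("p2", "French", "C", 5.5),
    ("p3", "French", "U", 3.0),
    ("p4", "Core science", "C", 4.5),
    ("p5", "Core science", "C", 4.0),
    ("p6", "Core science", "U", 1.0),
]


def rescaled_scores(entries):
    """Score each (subject, grade) as the mean anchor of pupils achieving it."""
    groups = defaultdict(list)
    for _pupil, subject, grade, anchor in entries:
        groups[(subject, grade)].append(anchor)
    return {key: mean(values) for key, values in groups.items()}


scores = rescaled_scores(entries)
```

With these made-up numbers, a grade C in French scores 5.75 against 4.25 in core science, and an ungraded French entry still scores 3.0, because the score reflects who achieves each grade rather than the grade label itself.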

What do the points scores awarded under this approach look like?

The table below shows the scores awarded to a selection of subjects under our new qualification neutral P8 approach.

Rescaled scores for a selection of subjects^[1]

The average score for each subject reflects its entry profile. As would be expected, the entry profile for maths is exactly average (2.75). Entrants in French and physics tend to be higher attaining, and those in core science and ECDL lower attaining.

Perhaps controversially, points are awarded for being entered but not achieving a grade.

Pupils who fail French, for example, are awarded 1.07 points – more than the 0.79 on offer for a grade F pass in English language. However, this merely reflects the fact that pupils who fail French tend to be higher attaining than pupils who achieve grade F in English language.

Comparing P8 and our qualification neutral P8

The scatterplot shows Progress 8 for each school under a) the current DfE method and b) the rescaled score method proposed here^[2].

The floor standard under our qualification neutral approach is set at -0.3 as it captures the same number of schools as the DfE floor standard of -0.5.
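Matching the floor in this way amounts to counting the schools below -0.5 on the DfE scale and then finding a cut-off on the new scale that captures the same number. A minimal sketch, assuming invented school scores, a hypothetical helper name and no tied scores at the boundary:

```python
def matched_floor(dfe_scores, new_scores, dfe_floor=-0.5):
    """Return a cut-off on the new scale that captures as many schools
    as dfe_floor does on the DfE scale (ties at the boundary are ignored).

    A school is below a floor when its score is strictly less than it.
    """
    n_below = sum(score < dfe_floor for score in dfe_scores)
    ranked = sorted(new_scores)
    if n_below == 0 or n_below >= len(ranked):
        raise ValueError("no cut-off separates the schools")
    # Exactly n_below schools score strictly below this value.
    return ranked[n_below]


# Invented scores for five schools under each measure.
dfe = [-0.8, -0.6, -0.2, 0.1, 0.4]
new = [-0.5, -0.35, -0.1, 0.05, 0.2]

floor = matched_floor(dfe, new)
captured = sum(score < floor for score in new)
```

Any value in the gap between the last captured school and the next one would capture the same number, so in practice a convenient rounded figure can be chosen.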

The two sets of scores are highly correlated (r=0.98). The rank positions of schools change, but not by very much. Based on provisional KS4 data for 2015/16, our calculations suggest 262 schools would fall below the floor under both measures.
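Both figures are straightforward to compute once each school has a score under each measure. The sketch below uses invented scores for five schools and hypothetical function names; it is not the calculation run on the KS4 data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def below_both(dfe_scores, new_scores, dfe_floor=-0.5, new_floor=-0.3):
    """Count schools that fall below the floor under both measures."""
    return sum(
        d < dfe_floor and n < new_floor
        for d, n in zip(dfe_scores, new_scores)
    )


# Invented scores: position i in each list is the same school.
dfe = [-0.8, -0.6, -0.2, 0.1, 0.4]
new = [-0.5, -0.25, -0.1, 0.05, 0.2]

r = pearson_r(dfe, new)
n_both = below_both(dfe, new)
```

In this toy data the second school is below the DfE floor but not the rescaled one, so only one school counts as below both.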

Which qualifications are counted in P8?

We also look at the subjects that are counted among the set of eight results used in the calculation of each pupil’s P8 score.

There is a certain amount of arbitrariness about the selection under the DfE methodology. If a pupil has five grade C passes that could be counted in the three open slots of P8, it does not matter which are counted, so three are arbitrarily selected, both in the DfE approach and in ours.
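To make the tie-break concrete, the sketch below fills three open slots by taking the highest-scoring results and letting input order settle any ties. The function name, subjects and rescaled points are invented for illustration, and the slot-filling rules are simplified to this single step.

```python
def fill_open_slots(results, slots=3):
    """Pick the highest-scoring results; ties keep their input order."""
    # sorted() is stable, even with reverse=True, so results with equal
    # points stay in input order. That ordering is the arbitrary tie-break.
    ranked = sorted(results, key=lambda result: result[1], reverse=True)
    return ranked[:slots]


# Five grade C passes at 5 points each, as in 2015/16.
dfe_points = [("Art", 5.0), ("Drama", 5.0), ("Music", 5.0),
              ("PE", 5.0), ("RE", 5.0)]

# Hypothetical rescaled points for the same five grade C passes.
rescaled_points = [("Art", 2.6), ("Drama", 2.7), ("Music", 2.9),
                   ("PE", 2.5), ("RE", 2.8)]

dfe_counted = [subject for subject, _ in fill_open_slots(dfe_points)]
rescaled_counted = [subject for subject, _ in fill_open_slots(rescaled_points)]
```

Under equal points the first three listed are counted simply because they come first. Where the points differ by subject, as in the invented rescaled values, the scores themselves decide which results are counted.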

That notwithstanding, the qualification neutral methodology generally leads to more entries in EBacc subjects and fewer entries in non-GCSEs being counted than under unadulterated P8.

The percentage of entries in French and Spanish counted increases, for example.

By contrast, whereas almost all ECDL entries are counted under the DfE methodology, just 60% are counted under the qualification neutral alternative, where each ECDL grade is worth less relative to other subjects than is the case under the DfE methodology.

% of entries counted, most popular GCSEs

% of entries counted, most popular non-GCSEs

In conclusion

The rescaled score methodology presented here potentially offers a practical solution to the problem of comparability of grading between qualifications.

Awarding different points for the same grade in different subjects will be unpalatable to some, but ultimately we either have to live without perfect comparability or find a method of dealing with it.

The introduction of 1-9 grades presented an opportunity to achieve better comparability between different subjects at GCSE. Instead, Ofqual chose to give precedence to ensuring comparability with legacy GCSEs. This is a sensible decision: there is no guarantee that comparability between subjects would be maintained, and the problem of the equivalence of non-GCSEs remains.

Using pupils’ results in GCSE English and maths to create a standardised scale would also remove the need to use the proposed interim points for legacy GCSEs in 2017 and 2018. It would, however, place greater strain on the awarding process in those subjects.

The methodology has a few wrinkles that need ironing out, though. Some might feel that it is undesirable to award points to pupils who failed. Additional work is required to score fairly those qualifications with small numbers of entrants (or used by small numbers of schools), including AS-Levels.

The most significant change compared to the current Progress 8 methodology is that the scores for different qualifications would not be known until the Key Stage 4 data has been gathered and processed.

On the one hand, this reduces the transparency of the measure: schools would not know the rules of the game until they have played it.

But on the other, perhaps it will encourage them just to play the teaching and learning game without the incentive to offer certain qualifications as performance-enhancing supplements.


Notes

1. ECDL – European Computer Driving Licence – grades are Pass, Merit, Distinction and Starred Distinction.
2. It’s worth noting here that the two scores are not directly comparable. A one-unit change in the DfE scale notionally represents a grade at GCSE, whereas a one-unit change in the standardised scale represents one standard deviation in a latent measure of pupil ability.

One Comment

MAshley Mash 10 February, 2017 at 7:00 pm

Well, Progress 8 in 2016 has serious weaknesses. For my school we have been penalised for:

Not using the IGCSE in English, we used AQA. No comparison whatsoever….. 196,000 entries for the IGCSE, which has skewed the data in the English bucket. The IGCSE has 40% exam compared to 60% for AQA. Now, our AQA results by question show we were above both national and similar schools, yet our P8 was -0.3

We have never used the ECDL…..

We only entered 60 of 210 students for two sciences for strategic reasons…

We have our own behaviour centre on site that counts on our figures…. 12 students between -0.5 and -4.0 I have concluded that I should have permanently excluded the most vulnerable…

56% of students are PP, and predominately White working class….

HOWEVER, our contextual VA by department is above the NA and our Progress 8 is now -0.25 by CVA. So, I am now a coasting school… yet everything about my school is not coasting. We fight for every student. So I ask is it worth the fight in your opinion?