This is part of a series of posts from Datalab on how the 11-plus works in practice in Kent. Find the other posts in the series here.

All tests are unreliable to some extent, so a person’s score is partly a matter of chance. This means that for some, the decision to offer a grammar school place or not will be something of a lottery.

‘Evidence on the effects of selective educational systems’, Robert Coe and others for the Sutton Trust, 2008 [PDF]

The idea of a school entrance test is to sort children according to how academically capable they are, so that individual schools do not need to cater for children with widely differing educational needs.

With less than two hours of testing time, there will always be academically capable children who fail and less capable children who pass. In fact, no 11-plus test will ever sort children perfectly, even if we were to ask 10-year-olds to sit a test every day for a whole month.

Classification accuracy

For a moment, let us imagine one small tweak to how the 11-plus system currently works.

Imagine if, alongside your letter stating whether your child had passed the 11-plus, the assessment companies gave you an additional piece of information – the probability that your child had been misclassified by the test.

One parent might be told their child had passed, and yet the probability she should, in fact, have failed was 39%. Another would be told their child had failed, but the probability he should have passed was 47%.

This is related to something known in the education literature as classification accuracy – essentially, the degree to which pass/fail allocations agree with those that would be based on examinees’ true scores (the thing that, imperfectly, we are trying to assess).

For any test, classification accuracy at the pass mark tends towards 50%. That might seem counterintuitive at first glance. But a test only ever covers a fraction of the material that could be covered, so every score contains an element of chance. For a child whose performance sits exactly at the pass mark, a slightly different set of questions could just as easily have placed them one mark below the pass mark as one mark above it – a 50% chance of either.
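We can make this concrete with a simple simulation. The sketch below (in Python) is purely illustrative: the pass mark of 106, the spread of ability and the size of the test-day error are assumptions we have made up, not published figures for the Kent Test.

```python
import numpy as np

rng = np.random.default_rng(0)

PASS_MARK = 106  # illustrative per-paper pass mark
SEM = 3.0        # assumed standard error of measurement, in marks

# True ability of a large cohort, plus random test-day error
true_scores = rng.normal(100, 15, size=1_000_000)
observed = np.rint(true_scores + rng.normal(0, SEM, size=true_scores.size))

# A child is misclassified when their observed result disagrees with
# the result their true score would have produced
misclassified = (observed >= PASS_MARK) != (true_scores >= PASS_MARK)

# Probability of misclassification, by observed score: close to 50%
# at the pass mark, falling away as scores move above or below it
for score in range(PASS_MARK - 4, PASS_MARK + 5):
    at_score = observed == score
    print(score, f"{misclassified[at_score].mean():.0%}")
```

Running this shows misclassification probabilities near 50% for children scoring at the pass mark, dropping steadily for scores a few marks either side of it.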

The big unknown

In the case of the 11-plus, for the group of children whose performance sits exactly at the pass mark, this 50-50 chance shapes their identity and lives forever. And the 11-plus in Kent has a quirk that means a large proportion of test-takers have a high risk of being misclassified.

Rather than relying on a single overall pass mark, the Kent Test requires students to gain at least 320 overall and at least 106 on each of the reasoning, maths and English papers.

So there are four elements on which a child is assessed as being either above or below the required standard.

In 2015, for example, 144 Kent Test-takers achieved exactly the overall pass mark of 320, but a total of 400 children would have failed had they dropped just one mark on one of the three papers – that is 8% of those who passed the 11-plus. And for each of these children, on the element of the test where they sat exactly at the pass mark, there was only a 50% chance that they should actually have achieved it.
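The four-threshold rule is simple to state in code. The sketch below is ours, not the assessment company’s – only the 320 and 106 thresholds come from the published pass rules; the function names and example scores are invented for illustration.

```python
PAPER_MIN = 106    # minimum on each of reasoning, maths and English
OVERALL_MIN = 320  # minimum across the three papers combined


def passes_kent_test(reasoning: int, maths: int, english: int) -> bool:
    """A child must clear four thresholds: each paper, plus the total."""
    papers = (reasoning, maths, english)
    return all(p >= PAPER_MIN for p in papers) and sum(papers) >= OVERALL_MIN


def one_mark_from_failing(reasoning: int, maths: int, english: int) -> bool:
    """A passer who would have failed had they dropped one mark anywhere."""
    papers = (reasoning, maths, english)
    return passes_kent_test(*papers) and (
        sum(papers) == OVERALL_MIN or min(papers) == PAPER_MIN
    )


print(passes_kent_test(110, 106, 115))       # True: all thresholds met
print(one_mark_from_failing(110, 106, 115))  # True: exactly 106 on one paper
print(passes_kent_test(130, 105, 130))       # False: one paper below 106
```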

Assessing classification accuracy

While we know the classification accuracy for those just one mark away from passing or failing, we don’t know how quickly the accuracy improves as we look at children achieving slightly higher or lower scores. Without candidates’ marks on individual test questions, we cannot estimate this.
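If we did have item-level marks, one standard way to estimate this would be to bootstrap over the test questions: resample the items, re-score every candidate, and see how often each child’s classification flips. The sketch below runs the idea on entirely synthetic data – the 2,000 candidates, 50 items, logistic response model and pass mark of 25 are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the item-level data we lack: one row per
# candidate, one column per question, True = correct
n_candidates, n_items = 2_000, 50
ability = rng.normal(0, 1, n_candidates)
difficulty = rng.normal(0, 1, n_items)
p_correct = 1 / (1 + np.exp(difficulty[None, :] - ability[:, None]))
responses = rng.random((n_candidates, n_items)) < p_correct

PASS_MARK = 25
passed = responses.sum(axis=1) >= PASS_MARK

# Bootstrap over items: draw an alternative set of 50 questions,
# re-score, re-classify
n_boot = 500
same = np.zeros(n_candidates)
for _ in range(n_boot):
    items = rng.integers(0, n_items, size=n_items)
    boot_pass = responses[:, items].sum(axis=1) >= PASS_MARK
    same += boot_pass == passed

# How stable is each child's pass/fail decision across resampled tests?
stable = same / n_boot
print(f"children classified the same way in over 95% of resamples: "
      f"{(stable > 0.95).mean():.0%}")
```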

And while the commercial assessment companies that run 11-plus tests will routinely check the classification accuracy of their tests, as well as the classification consistency (the probability that a candidate would be classified the same way over successive administrations of the test), they do not publish data on either.
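Classification consistency can be pictured as the same cohort sitting two parallel versions of the test, with an independent dose of test-day error on each occasion. A minimal sketch, reusing the same made-up figures as the earlier simulation:

```python
import numpy as np

rng = np.random.default_rng(2)

PASS_MARK, SEM = 106, 3.0  # same illustrative assumptions as above

true_scores = rng.normal(100, 15, size=1_000_000)
# Two sittings: the same child, two independent draws of test-day error
sitting_1 = true_scores + rng.normal(0, SEM, size=true_scores.size)
sitting_2 = true_scores + rng.normal(0, SEM, size=true_scores.size)

agree = (sitting_1 >= PASS_MARK) == (sitting_2 >= PASS_MARK)
print(f"classified the same on both sittings: {agree.mean():.1%}")
```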

There are some clues that the classification accuracy of the Kent Test could be quite low.

One way of looking at this is to compare performance in the 11-plus to performance in SATs taken just eight months later.

For example, the English element of the Kent Test shows a correlation of 0.62 with reading and 0.60 with grammar, punctuation and spelling (GPS) at KS2; both the maths and reasoning elements of the Kent Test are correlated at 0.68 with KS2 maths.

So, we can say that the Kent Test has low predictive validity for an academic test taken shortly afterwards.

To be clear, we are not suggesting that SATs tests are any more reliable – simply that we have two tests that claim to measure performance in similar domains (i.e. how good a child is at maths and English), and yet they frequently disagree.
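To get a feel for what correlations of around 0.6-0.7 mean in practice, we can simulate two test scores with that relationship and see how often they disagree about who sits above a selective cut-off. The bivariate normal model, the 0.65 correlation and the top-20% cut are all our own assumptions, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

r, n = 0.65, 1_000_000  # roughly the Kent Test / KS2 correlations above
cov = [[1.0, r], [r, 1.0]]
test_a, test_b = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Classify the same children by each test: top 20% pass
pass_a = test_a >= np.quantile(test_a, 0.80)
pass_b = test_b >= np.quantile(test_b, 0.80)

print(f"children the two tests disagree on: {(pass_a != pass_b).mean():.1%}")
print(f"passers on test A who fail on test B: "
      f"{(pass_a & ~pass_b).sum() / pass_a.sum():.1%}")
```

Under these assumptions, a substantial minority of those passed by one test are failed by the other – which is what "frequently disagree" looks like at the level of individual children.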

The myth that the 11-plus effectively separates children into those who can and cannot benefit from a grammar school education is only sustained because we are not transparent about the extent to which this relatively short test must be misclassifying some children.

Society needs to be confident that the most academically capable children will reliably pass the 11-plus, regardless of the particular test questions that are set on the day.

Publishing classification accuracy statistics would likely reveal how unreliably the 11-plus distinguishes the ablest children from the others in the top half of the distribution.

If each child learnt how confident we were that they had been correctly classified, then I suspect the 11-plus process would feel quite unjust.

Conclusion

No test classifies those who take it with perfect accuracy. But the data that we would need to be able to reach an informed conclusion on how accurately the Kent Test classifies children as either passing or failing is not currently made available by the assessment company responsible.

This is part of a series of posts from Datalab on how the 11-plus works in practice in Kent. Now read the next post in the series.