What is our prediction based on?

The prediction score is a number between 0 and 1 that is assigned to each student in the Dashboard. The closer the score is to 1, the more likely it is that there is something worth investigating. It is calculated by Turnitin’s prediction algorithm, which uses Natural Language Processing (NLP) methods.

NLP is a subfield of Artificial Intelligence focused on enabling computers to understand and process human language, bringing them closer to a human-level understanding of it.

So how does it work?

The prediction algorithm is a machine-learning algorithm trained on a very large labeled dataset of student work, containing both work by the same author and work by different authors. Across hundreds of linguistic features, it learned which characteristics signify authorship and non-authorship. We then tested the trained algorithm on a separate large test/validation dataset to ensure that the model was not overfitted to the training data.
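
To make the idea of labeled training data and a separate validation set concrete, here is a minimal sketch in Python. It is not Turnitin's pipeline. It assumes, for illustration only, that the labeled data takes the form of document pairs (label 0 for the same author, label 1 for different authors) and that a simple random holdout is used. The names build_labeled_pairs and holdout_split are hypothetical.

```python
import random
from itertools import combinations


def build_labeled_pairs(docs_by_author):
    """Return (doc_a, doc_b, label) tuples.

    Assumption for illustration: label 0 = same author, 1 = different authors.
    """
    pairs = []
    authors = sorted(docs_by_author)
    # Same-author pairs: every pair of documents written by one author.
    for author in authors:
        for doc_a, doc_b in combinations(docs_by_author[author], 2):
            pairs.append((doc_a, doc_b, 0))
    # Different-author pairs: one document from each of two distinct authors.
    for first, second in combinations(authors, 2):
        for doc_a in docs_by_author[first]:
            for doc_b in docs_by_author[second]:
                pairs.append((doc_a, doc_b, 1))
    return pairs


def holdout_split(pairs, validation_fraction=0.2, seed=0):
    """Shuffle the pairs and set aside a fraction that is never used for training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


docs = {"author_a": ["a1", "a2"], "author_b": ["b1", "b2"]}
pairs = build_labeled_pairs(docs)
train, validation = holdout_split(pairs, validation_fraction=0.5)
```

A model that scores well on the training portion but noticeably worse on the held-out portion has probably memorized the training data rather than learned general characteristics of authorship.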

These linguistic features are often too complex to be useful when presented individually. For this reason, we combine them into a single, easily digestible score that we can attribute to a student.
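
As a rough illustration of how many feature values can be collapsed into one score between 0 and 1, the sketch below uses a weighted sum passed through a logistic function. This is an assumed technique chosen for simplicity, not a description of Turnitin's actual model. The feature names, weights, and bias are all hypothetical.

```python
import math

# Hypothetical features and weights, for illustration only.
# Each value measures how much a submission differs from the student's other work.
WEIGHTS = {
    "avg_sentence_length_diff": 0.8,
    "vocabulary_richness_diff": 1.5,
    "punctuation_rate_diff": 1.1,
}
BIAS = -2.0  # hypothetical offset so that small differences give a low score


def prediction_score(feature_diffs):
    """Combine feature differences into one score strictly between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feature_diffs.items())
    # The logistic function maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


similar = prediction_score({"avg_sentence_length_diff": 0.1,
                            "vocabulary_richness_diff": 0.2,
                            "punctuation_rate_diff": 0.1})
different = prediction_score({"avg_sentence_length_diff": 2.0,
                              "vocabulary_richness_diff": 1.8,
                              "punctuation_rate_diff": 2.5})
```

In this sketch, larger differences push the score toward 1, which matches the reading of the score described above: the closer to 1, the more there is to investigate.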

Is it accurate?

Our accuracy targets were based on research conducted by Deakin University on how well markers identify contract cheating when they are told to look for it. In that study, markers achieved a sensitivity of 62% in identifying contract cheating. Our algorithm is tuned to reach the same sensitivity (detection rate) in identifying different authors, as measured in our prediction algorithm validation.
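
Sensitivity is the share of genuine different-author cases that get flagged. The sketch below shows one way a score threshold could be chosen so that sensitivity on a validation set meets a target such as 62%. The actual tuning procedure is not described here, so this is an assumption for illustration. The function names and the sample scores are hypothetical.

```python
import math


def sensitivity(scores, labels, threshold):
    """Fraction of true different-author cases (label 1) scored at or above threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one different-author case")
    flagged = sum(1 for s in positives if s >= threshold)
    return flagged / len(positives)


def threshold_for_sensitivity(scores, labels, target=0.62):
    """Highest threshold whose sensitivity on this data is at least the target."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one different-author case")
    # Number of positive cases that must be flagged to reach the target.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]


# Made-up validation scores: ten different-author cases, then two same-author cases.
scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.4, 0.3, 0.2, 0.1, 0.5]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

threshold = threshold_for_sensitivity(scores, labels, target=0.62)
achieved = sensitivity(scores, labels, threshold)
```

With only ten positive cases, the achieved sensitivity can only move in steps of 10%, so the function picks the closest step that still meets the target. A larger validation set allows finer tuning.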

The more you and other institutions use Authorship for Investigators, the better the prediction model will become.

We never claim that a student has contract cheated; we simply recommend further investigation. It is up to the investigator to determine whether there is enough evidence to make a contract cheating allegation.