AI did my homework

On Valentine’s Day, a non-profit research company called OpenAI gifted us a paper and blog post that rocked my world as an educator. OpenAI opens their blog post, Better Language Models and Their Implications, by stating:

We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.

In other words, they had demonstrated how a language processing AI could learn, from millions of webpages, how to undertake written tasks (some of which are of reasonably high quality in terms of sense and coherence) without specifically being trained to do this via a supervised learning process. To understand the implications of the research, it is worth trying to get to grips with what supervised and unsupervised machine learning are (for someone like me this is a steep learning curve!).

One of my favourite (super challenging) publications on AI, Machine Learning For Humans, explains supervised learning like this:

In supervised learning problems, we start with a data set containing training examples with associated correct labels. For example, when learning to classify handwritten digits, a supervised learning algorithm takes thousands of pictures of handwritten digits along with labels containing the correct number each image represents. The algorithm will then learn the relationship between the images and their associated numbers, and apply that learned relationship to classify completely new images (without labels) that the machine hasn’t seen before. This is how you’re able to deposit a check by taking a picture with your phone!
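A minimal sketch may help make this concrete. It is a toy, not how real digit recognition works: the 3x3 "images", their labels, and the function names `distance` and `predict` are all invented for illustration, and the technique (pick the label of the most similar training example) is one of the simplest supervised methods there is.

```python
# Toy supervised learning: labelled examples in, predictions for unseen inputs out.
# The 3x3 "images" and their labels are invented for illustration only.

def distance(a, b):
    """Count the pixels where two images differ."""
    return sum(x != y for x, y in zip(a, b))

def predict(training_examples, image):
    """Return the label of the training image most similar to `image`."""
    _, nearest_label = min(
        training_examples, key=lambda example: distance(example[0], image)
    )
    return nearest_label

# Each training example pairs an image (9 pixels, row by row) with its correct label.
TRAINING_EXAMPLES = [
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), "1"),
    ((1, 1, 1,
      1, 0, 1,
      1, 1, 1), "0"),
    ((1, 1, 1,
      0, 0, 1,
      0, 0, 1), "7"),
]

# A new, unlabelled image the program has not seen: a "1" with one stray pixel.
new_image = (0, 1, 0,
             0, 1, 0,
             0, 1, 1)

print(predict(TRAINING_EXAMPLES, new_image))
```

The key point is that a human had to supply the correct label for every training image before the program could classify anything new.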

The thing about the OpenAI language model is that it is a leading-edge example of unsupervised machine learning, which is:

a branch of machine learning that learns from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data

Or as Brownlee eloquently explains:

Unsupervised learning is where you only have input data…and no corresponding output variables. The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data. These are called unsupervised learning because unlike supervised learning above there is no correct answers and there is no teacher (algorithm). Algorithms are left to their own devises to discover and present the interesting structure in the data.
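By way of contrast with the supervised case, here is a small sketch of an algorithm finding structure with no labels at all. It is a very simple clustering routine and bears no resemblance to the scale or method of the OpenAI model; the essay word counts and the function name `two_means` are made up for illustration.

```python
# Toy unsupervised learning: no labels are given; the algorithm groups the data itself.
# The essay word counts below are invented for illustration only.

def two_means(values, rounds=10):
    """Split numbers into two clusters using a simple one-dimensional k-means."""
    centres = [min(values), max(values)]  # start from the two extremes
    clusters = [[], []]
    for _ in range(rounds):
        clusters = [[], []]
        for value in values:
            # Assign each value to whichever centre is closer.
            nearest = 0 if abs(value - centres[0]) <= abs(value - centres[1]) else 1
            clusters[nearest].append(value)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

word_counts = [120, 135, 150, 480, 510, 495, 140]
group_a, group_b = two_means(word_counts)
print(group_a)
print(group_b)
```

Nobody told the program which essays were "short" or "long"; it discovered the two groups from the numbers alone, and it is up to a human to decide what those groups mean.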

Sure, getting your head around machine learning is tricky and I’m still getting there. The main thing about the OpenAI research on language models is that it used unsupervised learning. As the researchers state, it represents a ‘move towards more general (AI) systems which can perform many tasks – eventually without the need to manually create a training data set for each one.’ It is an initial step towards developing AI that can learn by itself in what the researchers call “the messiness of ‘language in the wild’”.

If you want to see examples of how the model fared at writing text in mini-essay style, read the blog post and paper. While the AI wasn’t perfect by any means, it did produce fairly intelligible and logical text – perhaps akin to something a middle school student would write. I especially liked the more ‘creative’ task where the machine finished off a story about the discovery of unicorns – it was reasonably coherent and certainly entertaining.

The researchers suggest that the future of such generalist language models lies in AI writing assistants, better dialogue agents, speech recognition and unsupervised language translation. I know it’s terrible, but as an educator my mind went straight to more insidious uses such as undetectable plagiarism.

Imagine a future where an AI can respond to an assessment task by producing an original short answer, mini essay or short creative writing piece which is fairly coherent with a few writing and/or logic flaws – say at a passable level. No two responses would be the same because the model would be able to learn to check against what it and other AI had already produced. Much energy and money has been put into plagiarism (academic integrity) education and tools in universities and, increasingly, secondary schools. As educators we actively teach students about the need to produce original work which demonstrates deep and synthesised knowledge, mastery and application of ideas, and a good grasp of academic literacy conventions. But what do originality and good writing mean when an AI can do a passable imitation of them?

No doubt when such generalist language models advance to commercial release, the whole idea of learning to elegantly express the knowledge and ideas you have gained as a learner will be radically challenged. So much of traditional formative and summative assessment, for good or bad, relies on students producing written work (on computing devices). What happens when the machine is the author and you, as a teacher, cannot detect this? Will it matter?

Lest we return to an over-reliance on that great ‘sorting hat’, the handwritten exam, we will need to engage more cleverly and more critically with how we use machine-generated text. Now for teacher blue-sky-dreaming time.

Could we have students generate machine texts to critique and improve them, use them as the basis of a drafting process or as a starting point for research and creative work? In other words, could we use the machine’s text to extend students into the analysing, evaluating and creating realms of Bloom’s revised taxonomy? Do we need to look much more carefully at how students can use verbal modes of communication to express their learning, or actually give them the thinking room and time in the curriculum to create complex multi-media texts to represent or teach us about their learning?

This will involve cultivating a pedagogical disposition where the products of the machine are leveraged for advancing original human thinking. It will have pedagogical and curriculum implications beyond catch-cries of promoting 21st century learning or the converse reaction of removing devices from classrooms.

Of course students, of all ages, will want powerful reasons why they need to do higher order thinking in order to produce original texts. As educators, how convinced are we that writing ‘beyond the machine’ is necessary and can we convince students of this using compelling evidence and arguments?

Leaving aside the technical WOW factor of the research, perhaps the most telling part of it is in the section of the blog post that explains why the researchers are not releasing the entire code or the model. They explain that, in the era of deep fakes and other malicious uses for AI (automated phishing, trolling and fake news), they have decided to be ethical in their code release approach:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of (the model) along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights. Nearly a year ago we wrote in the OpenAI Charter: “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.

This of course leads educators to begin to understand the immense challenges AI will bring in discerning what is real or true from what is not in an age where machines can be automated to produce deep (very realistic) fakes. What does critical thinking look like when you cannot discern if something on the internet is real? What does this mean for teaching digital literacy? Must we rely on machine learning to tell us that something has been produced by a machine? Will humans no longer be up to the task of researching and critically reflecting on the truth of something in this new machine age? One thing is for certain: this AI world is ripe for teaching about ethics and ethical decision-making, a vital part of school and tertiary education.
