
GPT-3 for diplomacy?

Katharina Höne
Published on September 24 2020
The text discusses the potential of GPT-3 in diplomacy, highlighting its capabilities in augmenting diplomatic tasks for efficiency but falling short in understanding complex diplomatic contexts due to its lack of contextual comprehension. While GPT-3 can mimic styles and genres well, it may not fully meet the nuanced requirements of diplomatic writing. The future may involve more specialized systems in diplomacy, like the Speech Generator developed by DiploFoundation, which aims to imitate human speech-writing processes using specific algorithms for tasks, providing a potential direction for the integration of AI in diplomatic practices.

The artificial intelligence (AI) Generative Pre-trained Transformer 3 (GPT-3) can write texts on any topic. OpenAI, the organisation that developed and released it as a beta version in June 2020, describes it as a general-purpose application for creating text, ‘allowing users to try it on virtually any English language task’. GPT-3 is a scaling up, by two orders of magnitude, of the previous model released by OpenAI, making it ‘the most powerful natural language processing (NLP) application available today’. 

The promises are greater accuracy and an improved ability to transfer things learned in one context to a different context. Overall, GPT-3 can mimic a variety of styles and genres and, in doing so, return texts that look very much as if they had been written by a human. The Guardian recently used it to write an article. So, what does this mean for diplomats whose daily work is steeped in the art and craft of language?

Automated diplomacy?

When thinking through the use of AI for specific tasks and within specific professions, it is useful to distinguish between augmentation and automation. Augmentation describes a situation where parts of a task are taken over by a machine. Automation means that the whole process is taken over by a machine with minimal, if any, human intervention. What can GPT-3 deliver in terms of augmented and automated diplomacy?

Augmentation: Efficiency tools

OpenAI’s website includes a number of use cases that are also applicable to the work of diplomats. First, the company CaseText uses GPT-3 to search through legal documents and to facilitate litigation and presentations by lawyers. Similar applications in the area of international law are not hard to imagine, and have indeed already been suggested and tested (the Cognitive Trade Advisor is an example). Second, productivity tools that lead to better decisions could also be applied in the field of diplomatic practice. Third, ‘comprehension tools’ that provide quick summaries of long texts might also eventually aid the work of diplomats. As these tools become more widely available and used, it is not far-fetched to suggest that diplomats will use them in their daily work, either as off-the-shelf productivity tools or as custom-built systems that take the specifics of the work of diplomats into account. With GPT-3 becoming available beyond the beta version, developing custom applications should move within easy reach. It’s also worth pointing out that the tools described here are nothing new. The difference is that GPT-3 is the latest and most powerful NLP tool available today.

The promise associated with use cases like these is greater efficiency and productivity. While this resonates well in a business context, it resonates less when it comes to diplomatic practice. To be clear, ministries of foreign affairs are under budgetary constraints and have an obligation to use public money responsibly. It can also be an advantage to be faster and more efficient when doing research in preparation for a negotiation. However, success in finding an agreement or negotiating a text cannot be measured by these efficiency metrics. While greater efficiency can be an advantage for negotiators and can level the playing field for small and developing states, it does not win you the overall ‘battle’.

Automation: Diplomatic writing tasks

GPT-3 delivers some interesting results on the basis of an initial short piece of text submitted to the system. It matches the tone and style and returns a text that is, more often than not, understandable and reasonable. More importantly, it is hard, if not impossible, to distinguish that text from a piece written by a human being. 

Therefore, we can assume that the system will be able to match the tone and style of a typical diplomatic speech, for example, those delivered at the opening of the UN General Assembly each year. It is also feasible that it will match certain positions and interests based on the initial short text submitted to it. If you give the system a speech by Prime Minister of New Zealand Jacinda Ardern, it will very likely return a text that believably sounds like a speech by her. If you give the system a speech by US President Donald Trump, it will very likely return a text that believably sounds like a speech by him. 

While such a text might be interesting as an initial suggestion or a general template, it will need a lot of editing and rewriting. Although we were not able to test GPT-3 ourselves, we assume that the text, while passable as having been written by a human being, will still miss the mark in the context of diplomatic practice. The following aspects are very likely missing: overall coherence; references to specific examples that are most useful in this context; references to historic moments important for an occasion; and an understanding of the relations between countries and how they should be reflected, often implicitly, in specific parts of the speech.

The explanation for these doubts and potential shortcomings is simple: GPT-3 operates by mapping relationships between words without having an understanding of the meaning of the words. It’s great at predicting the next word in a sentence, but lacks understanding of the overall context. This explains the statement from OpenAI that GPT-3’s ‘success generally varies depending on how complex the task is’. For these more complex tasks, human editors and writers are needed. It’s worth noting, for example, that according to the editor’s note accompanying the Guardian article mentioned above, the article was a piece of augmented, not automated, journalism. Journalists selected and rearranged passages, and the article went through the usual editing process. An opinion piece also published in the Guardian suggested that 90% of the text generated by GPT-3 was discarded before editing.
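To make the idea of ‘predicting the next word’ concrete, here is a minimal sketch in Python. It is a toy bigram counter, not GPT-3: GPT-3 is a very large neural network, and nothing below reflects its actual architecture. The function names and the tiny example corpus are our own assumptions, chosen for illustration only. The sketch shows how a system can pick a plausible next word purely from how often words follow each other, without any notion of what the words mean.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the word seen most often after `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical mini-corpus in the style of a diplomatic statement.
corpus = (
    "we call for peace . we call for dialogue . "
    "we call for peace and cooperation ."
)
model = train_bigrams(corpus)

print(predict_next(model, "for"))      # most frequent follower of "for"
print(predict_next(model, "treaty"))   # never seen, so no prediction
```

The prediction after ‘for’ comes only from frequency counts. The model has no idea what peace is, which occasion the speech is for, or which countries are in the room, and that is the gap human editors still have to fill.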

This is not to take away from the fact that GPT-3 is a huge accomplishment and a big step for these types of language processing AIs. It might serve as a way of making speech-writing quicker by already providing templates and useful suggestions. In this sense, it could work much like the autocomplete function in e-mail services and word processors. This brings us back to the automation-vs-augmentation question, and the, perhaps, reassuring knowledge that neither diplomats nor human speech-writers are likely to be replaced anytime soon. 

The way forward?

Without having tested GPT-3 ourselves, we cannot be sure, but the hunch is that more specialised systems are needed in the area of diplomacy. In a paper released by the mothers and fathers of GPT-3, it is suggested that relying on a more-text-more-computing-power approach will eventually come up against limits. With such an approach, the system becomes better and better at predicting the word most likely to appear next in a sentence. It does not, however, become better at keeping the next sentence or the text as a whole ‘in mind’ (for a detailed discussion of this point, see this article on GPT-3). For that, a different approach is needed. 

At DiploFoundation, as part of our AI humAInism project, we have experimented with what this different approach could look like in the field of diplomacy. Our own Speech Generator is meant as an illustration of what can be done and how it can be done. Diplomats working in the field of digital policy and cybersecurity will find it particularly interesting to experiment with. The Speech Generator allows the user to select an opinion on various key topics, on the basis of which a speech is generated.

In contrast to applications like GPT-3, we tried to mimic the human process of writing a speech by using smaller algorithms trained for specific tasks, such as an algorithm for finding keywords and phrases (‘underlining’), an algorithm for recommending paragraphs on a specific topic, an algorithm for summarising paragraphs, etc. As our developer Jovan Njegic would say, ‘in this way, we try to form a system of interconnected algorithms, which imitate not the results of the writing process, but the human process of reasoning during speech-writing’. This also means that if a result is not appropriate, the user can go back and tweak the process. Our Speech Generator is an illustration, not a fully fledged application for diplomats, but it might just point us in the right future direction.
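The idea of chaining small, task-specific steps can be sketched in a few lines of Python. This is not the code of the Speech Generator, whose algorithms are trained models. The function names, the stop-word list, the word-overlap scoring, and the sample paragraphs are all hypothetical stand-ins that we made up for illustration. What the sketch does show is the design choice: each step returns its own intermediate result, so a user can inspect it and adjust that step without redoing everything.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "must"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def find_keywords(text, top_n=3):
    """'Underlining' step: the most frequent non-stop-words in a text."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def recommend_paragraphs(paragraphs, topic):
    """Recommendation step: keep paragraphs that share words with the topic."""
    topic_words = set(tokenize(topic)) - STOPWORDS
    scored = [(len(topic_words & set(tokenize(p))), p) for p in paragraphs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for score, p in ranked if score > 0]


def summarise(paragraph):
    """Summarising step (stand-in): keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())[0]


def generate_speech(paragraphs, topic):
    """Run the steps in order and keep every intermediate result."""
    recommended = recommend_paragraphs(paragraphs, topic)
    summaries = [summarise(p) for p in recommended]
    return {
        "recommended": recommended,
        "keywords": [find_keywords(p) for p in recommended],
        "summaries": summaries,
        "speech": " ".join(summaries),
    }


# Hypothetical source paragraphs.
paragraphs = [
    "Cybersecurity is a shared responsibility. States must cooperate on cybersecurity norms.",
    "Trade barriers hurt small economies. Open markets help growth.",
    "Digital policy should protect privacy. Cybersecurity and privacy go together.",
]
result = generate_speech(paragraphs, "cybersecurity")
print(result["speech"])
```

Because `generate_speech` returns the recommended paragraphs, keywords, and summaries alongside the final text, a user who dislikes the outcome can see which step produced it and change that step alone.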

