On 21 September, DiploFoundation launched the humAInism Speech Generator as part of its humAInism project. By combining artificial intelligence (AI) algorithms with the expertise of Diplo's cybersecurity team, the tool is meant to help diplomats and practitioners write speeches on the topic of cybersecurity. Given the research nature of the project, the main aim of the generator was to explore various new AI technologies and examine their usability in the field of diplomacy. To this end, we built the generator around several state-of-the-art algorithms, which serve three main purposes: semantic similarity search, generation of long-form answers, and text generation.

1. Semantic similarity search

We use the DistilBERT language representation model to encode sentences into 512-dimensional vectors. The approximate nearest neighbor search algorithm is then used to compare the vectors and calculate a similarity score according to their angular distance. For this purpose, we implemented the technologies listed below.

1.1. DistilBERT model

DistilBERT is a transformers model, smaller and faster than BERT (Bidirectional Encoder Representations from Transformers), which was pretrained on the same corpora in a self-supervised fashion using the BERT base model as a teacher. This means it was pretrained on raw texts only, with no human labelling (which is why it can use large amounts of publicly available data), with inputs and labels generated automatically from those texts by the BERT base model. More precisely, it was pretrained with three objectives: distillation loss, masked language modelling (MLM), and cosine embedding loss. In this way, the model learns the same inner representation of the English language as its teacher model, while being faster for inference and downstream tasks.

Reference: Sanh V et al. (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv, 1 March. Available at https://arxiv.org/abs/1910.01108 [accessed 20 September 2020].
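To make the encoding step concrete, the following is a minimal sketch of how sentences can be mapped to fixed-size vectors with a DistilBERT-based sentence encoder. It assumes the sentence-transformers library and the distiluse-base-multilingual-cased-v1 checkpoint (chosen here only because it produces 512-dimensional vectors); the exact model used in the generator may differ.

```python
# Minimal sketch (not Diplo's exact pipeline): encode sentences into fixed-size
# vectors with a DistilBERT-based sentence encoder from the sentence-transformers
# library. The checkpoint name is an illustrative assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

sentences = [
    "Cybersecurity is a shared responsibility of states and the private sector.",
    "Capacity development is essential for a free, open, and secure internet.",
]

# encode() returns one vector per sentence as a NumPy array
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 512) for this particular checkpoint
```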
1.2. Approximate Nearest Neighbors Oh Yeah (Annoy) search algorithm

Annoy is a C++ library with Python bindings that searches for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mapped into memory, so that many processes may share the same data.

Reference: The Annoy Python module on GitHub. Available at https://github.com/spotify/annoy [accessed 20 September 2020].
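The snippet below is a minimal sketch of the Annoy workflow described above: vectors are added to an index, the index is built and saved as a memory-mapped file, and the nearest neighbours of a query vector are retrieved by angular distance. The number of items, the number of trees, and the file name are illustrative choices.

```python
# Minimal Annoy sketch: build an index of 512-dimensional vectors and query it
# by angular distance. Random vectors keep the example self-contained; in the
# real pipeline these would be DistilBERT sentence embeddings.
from annoy import AnnoyIndex
import random

dim = 512
index = AnnoyIndex(dim, "angular")  # angular distance, related to cosine similarity

for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)               # 10 trees: more trees -> higher accuracy, larger index
index.save("sentences.ann")   # read-only, memory-mapped file shared across processes

query = [random.gauss(0, 1) for _ in range(dim)]
ids, distances = index.get_nns_by_vector(query, 5, include_distances=True)
print(ids, distances)         # the 5 most similar indexed sentences
```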
2. Generation of long-form answers

For this task, we used models pretrained on the Wikipedia (Wiki-40B) and Explain Like I'm Five (ELI5) question datasets, and applied them to our custom Diplo dataset, consisting of Diplo books and Internet Governance Forum (IGF) transcripts. The process of generating answers is done in two stages: relevant passages are first retrieved from the dataset, and a long-form answer is then generated from them. The applied algorithms are listed below.

2.1. BERT

The language representation model BERT is short for 'Bidirectional Encoder Representations from Transformers'. Unlike recent language representation models, BERT is designed to pretrain deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers. As a result, the pretrained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering (QA) and language inference, without substantial task-specific architecture modifications.

Reference: Devlin J et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv, 11 October. Available at https://arxiv.org/abs/1810.04805 [accessed 20 September 2020].

2.2. Faiss search algorithm

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to those that possibly do not fit into random-access memory (RAM). It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++, with complete wrappers for Python/NumPy, and some of its most useful algorithms are implemented on the graphics processing unit (GPU). Faiss was developed by Facebook Artificial Intelligence Research (FAIR).

Reference: Faiss on GitHub. Available at https://github.com/facebookresearch/faiss/wiki [accessed 20 September 2020].
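As an illustration of the retrieval stage, the sketch below indexes passage vectors with Faiss and retrieves the passages closest to a question vector. The vectors here are random placeholders; in the actual pipeline they would be BERT-based embeddings of the Diplo books and IGF transcripts, and the dimensions shown are assumptions.

```python
# Minimal Faiss sketch: index passage embeddings and retrieve the passages most
# similar to a question embedding. Random float32 vectors stand in for real data.
import faiss
import numpy as np

dim = 768                                   # e.g. BERT-base hidden size (assumed)
passages = np.random.random((10_000, dim)).astype("float32")
question = np.random.random((1, dim)).astype("float32")

faiss.normalize_L2(passages)                # normalise so inner product = cosine
faiss.normalize_L2(question)

index = faiss.IndexFlatIP(dim)              # exact inner-product search
index.add(passages)

scores, ids = index.search(question, 5)     # top-5 most similar passages
print(ids[0], scores[0])
```

For larger collections, Faiss also offers approximate index types (for example IVF- or HNSW-based indexes) that trade a little accuracy for much faster search.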
2.3. BART

BART is a denoising autoencoder for pretraining sequence-to-sequence models. It is trained by (1) corrupting text with an arbitrary noising function, and (2) teaching a model to reconstruct the original text. It uses a standard transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalising BERT (due to the bidirectional encoder), GPT (with its left-to-right decoder), and many other more recent pretraining schemes. BART is particularly effective when fine-tuned for text generation, but it also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE (General Language Understanding Evaluation) and SQuAD (Stanford Question Answering Dataset), and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarisation tasks, with gains of up to 6 ROUGE (Recall-Oriented Understudy for Gisting Evaluation) points. BART also provides a 1.1 BLEU (bilingual evaluation understudy) point increase over a back-translation system for machine translation, with only target-language pretraining.

Reference: Lewis M et al. (2019) BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv, 29 October. Available at https://arxiv.org/abs/1910.13461 [accessed 20 September 2020].
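The following is a minimal sketch of conditional text generation with a pretrained BART checkpoint from the Hugging Face transformers library. The facebook/bart-large-cnn model and the input text are illustrative stand-ins, not the fine-tuned model used for generating long-form answers in the generator.

```python
# Minimal BART sketch: condition a pretrained sequence-to-sequence checkpoint on
# an input passage and generate text from it with beam search.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"   # public checkpoint, used only as an example
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

context = (
    "Cyber norms are voluntary, non-binding rules of responsible state behaviour "
    "in cyberspace, developed through processes such as the UN GGE and the OEWG."
)

inputs = tokenizer(context, return_tensors="pt", truncation=True)
output_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    max_length=60,
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```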
3. Text generation

For the task of generating introductory sentences, we used a pretrained GPT-2 model, which we fine-tuned on a dataset built from the first three sentences of each speech in the UN General Debates dataset. The applied algorithm is described below.

3.1. GPT-2

'GPT-2 is a large transformer-based language model trained using the simple task of predicting the next word in 40GB of high-quality text from the internet. This simple objective proves sufficient to train the model to learn a variety of tasks due to the diversity of the dataset. In addition to its incredible language generation capabilities, it is also capable of performing tasks like question answering, reading comprehension, summarisation, and translation. While GPT-2 does not beat the state-of-the-art in these tasks, its performance is impressive nonetheless considering that the model learns these tasks from raw text only.' (Rajapakse, 2020)

References:
Radford A et al. (2019) Language Models are Unsupervised Multitask Learners. OpenAI. Available at https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf [accessed 20 September 2020].
Rajapakse T (2020) Learning to Write: Language Generation With GPT-2. Medium, 27 April. Available at https://medium.com/swlh/learning-to-write-language-generation-with-gpt-2-2a13fa249024 [accessed 20 September 2020].
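To close, here is a minimal sketch of generating an opening sentence with GPT-2 via the Hugging Face transformers library. The public gpt2 checkpoint, the prompt, and the sampling parameters are assumptions for illustration; in the generator itself, a checkpoint fine-tuned on UN General Debate openings would be loaded instead.

```python
# Minimal GPT-2 sketch: sample a continuation of a speech-opening prompt from a
# pretrained language model. A fine-tuned checkpoint path would replace "gpt2".
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Mr President, distinguished delegates,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,                     # sampling rather than greedy decoding
    top_k=50,
    top_p=0.95,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```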