By Alicia Shepherd-Vega

About 8 million Ethiopians use the world’s most popular social media platform, Facebook, daily. Its use, of course, is confined to the parameters of their specific speech communities. In Ethiopia, some 86 languages are spoken by a population of 120.3 million, but two of them, Amharic and Oromo, are spoken by two-thirds of the population, with Amharic the second most widely spoken.

As in most countries across the globe, the use of social media in Ethiopia is ubiquitous. What sets Ethiopia apart, though, as with many countries in the Global South, are the issues that arise when developments designed by the Global North for a Global North context are deployed elsewhere. This becomes apparent when one views social media usage from the angle of linguistics.

Content moderation and at-risk countries (ARCs)

Increased social media usage has recently engendered a proliferation of policy responses, particularly concerning content moderation. The situation is no different in Ethiopia. Increasingly, Ethiopians blame Meta and other tech giants for the rate and range at which conflict spreads across the country. For instance, Meta faces a lawsuit filed by the son of Mareg Amare, an Ethiopian academic who was assassinated in November 2021. The lawsuit claims that Meta failed to delete life-threatening posts, categorised as hate speech, that targeted his father. Meta had earlier assured the global public that a wide variety of context-sensitive strategies, tactics, and tools were used to moderate content on its platform. The strategies behind this and other such promises were never published until the leak of the so-called Facebook files brought to the fore the results of key studies conducted by Meta, such as those on the harmful effects experienced by users of Meta’s platforms, Facebook and Instagram.

Content moderators working on Meta’s platforms have also complained of human rights violations, including overexposure to traumatic content (abuse, human trafficking, ethnic violence, organ selling, and pornography) without a safety net of mental health benefits. Earlier this year, workers at Sama, Meta’s content moderation contractor in Kenya, won a ruling from a local court ordering Meta to reinstate them after they had been dismissed over complaints about these working conditions and attempts to unionise. The court later ruled that the company is also responsible for their mental health, given their overexposure to violent content on the job.

The disparity in the application of content moderation strategies, tactics, and tools used by the tech giant is also a matter of concern. Cross-check, or XCheck, a quality-control measure used by Facebook for high-profile accounts, for example, shields millions of VIPs, such as government officials, from the enforcement of established content moderation rules; on the flip side, inadequate safeguards on the platform have coincided with attacks on political dissidents. Hate speech is said to increase by some 300% amidst bloody riots. This is no surprise, given Facebook’s permissiveness towards the sharing and recycling of fake news and plagiarised and radical content.

[Image: Flag of Ethiopia]

In the case of Ethiopia, the platform has catalysed conflict. In October 2021, Dejene Assefa, a political activist with over 120,000 followers, called on supporters to take up arms against the Tigrayan ethnic group. The post was shared about 900 times and received 2,000 reactions before it was taken down. During this period, it was reported that the federal army had also waged war against the Tigrayans because of an attack on its forces. Calls for attacks against the group proliferated on the platform, many of which were linked to violent incidents. According to a former Google data scientist, the situation was reminiscent of what occurred in Rwanda in 1994. In another case, the deaths of 150 people and the arrest of 2,000 others coincided with the protests that followed the assassination of activist Hachalu Hundessa, who had campaigned on Facebook for better treatment of the Oromo ethnic group. The incident led to a further increase in hate speech on the platform, including from several diasporic groups. Consequently, Facebook translated its community standards into Amharic and Oromo for the first time.

In light of ongoing conflicts in Ethiopia, Facebook labelled the country a first-tier ‘at-risk country’, alongside others like the USA, India, and Brazil. ARCs are countries where platform discourse risks inciting offline violence. As a safeguard, war rooms are usually set up to monitor network activity in these countries; for developing countries like Ethiopia, however, Facebook has not extended such privileges. In fact, although the Facebook platform supports 110 languages, it can only review content in 70. At the end of 2021, Ethiopia had no misinformation or hate speech classifiers and had the lowest completion rate for user reports on the platform. User reports help Meta identify problematic content; the problem here was that the interfaces used for such reports lacked local language support.

Languages are only added when a situation becomes openly and obviously untenable, as was the case in Ethiopia. It usually takes Facebook at least one year to introduce the most basic automated tools. By 2022, amidst the outcry for better moderation in Ethiopia, Facebook had partnered with the fact-checking organisations PesaCheck and AFP Fact Check and begun moderating content in the two languages; however, only five people were deployed to scan content posted by the 7 million Ethiopian users. Facebook principally uses automation for analysing content in Ethiopia.

AI and low-resource languages

AI tools are the principal means of automated content moderation. The company claims Generative AI in the form of Large Language Models (LLMs) is the most scalable approach and the best suited for network-based systems like Facebook. These LLMs are built using natural language processing (NLP), which allows the models to read and write text much as humans do. According to Meta, its models, whether trained on one language or many (such as XLM-R and Few-Shot Learner), are used to moderate over 90% of the content on its platform, including content in languages on which they have not been trained.

These LLMs train on enormous amounts of data from one or more languages. They identify patterns in higher-resourced languages and, in a process termed cross-lingual transfer, apply those patterns to lower-resourced languages (those lacking high-quality digitised data on which to train models) in order to identify and process harmful content. However, one challenge with both monolingual and multilingual models is that they have consistently missed the mark when analysing violent content, even in English. The situation has been worse for other languages, particularly low-resource languages like Amharic and other Ethiopian languages.
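
To make the idea of cross-lingual transfer concrete, the sketch below is illustrative only: it assumes the open-source Hugging Face transformers library and the public xlm-roberta-base checkpoint rather than Meta’s actual moderation stack, and a hypothetical two-label ‘harmful/not harmful’ head that would normally be fine-tuned on English data.

```python
# Illustrative sketch of cross-lingual transfer, not Meta's production system.
# Assumptions: the Hugging Face "transformers" and "torch" libraries, the public
# "xlm-roberta-base" checkpoint, and a hypothetical two-label classification head
# that would normally be fine-tuned on labelled English moderation data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "xlm-roberta-base"  # pretrained on text from roughly 100 languages

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

texts = [
    "This is an ordinary post.",  # English: the kind of text labels usually exist for
    "ሰላም ለሁሉም",                  # Amharic ("peace to all"): no labelled data assumed
]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)

# The Amharic row is scored purely through patterns shared across languages in
# pretraining (cross-lingual transfer); in practice this is where accuracy drops.
print(probs)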
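```

In a real pipeline the classification head would first be fine-tuned on labelled examples in a high-resource language; the point of the sketch is simply that nothing in the pipeline changes when the input switches language, which is both the appeal and the weakness of the approach.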


AI models and network-based systems have the following limitations:

  1. They rely on machine-translated texts, which sometimes contain errors and lack nuance. 
  2. Network effects are complex for developers, so it is sometimes difficult to identify, diagnose, or fix the problem when models fail. 
  3. They cannot produce the same quality of work in all languages. One size does not fit all.
  4. They fail to account for the psycho-social context of local-language speakers, especially in high-risk situations.
  5. They cannot parse the peculiarities of a lingua franca and apply them to specific dialects.
  6. Machine learning (ML) models depend on previously-seen features, which makes them easy to evade, as humans can couch meaning in various forms (see the sketch after this list).
  7. NLP tools require clear, consistent definitions of the type of speech to be identified. This is difficult to ascertain from policy debates around content moderation and social media mining. 
  8. ML models reflect the bias in their training data.
  9. The highest-performing models accessible today only achieve accuracy rates of between 70% and 75%, meaning roughly one in every four posts is likely to be treated inaccurately. Accuracy in ML is also subjective, as the measurement varies from developer to developer.
  10. ML tools used to make subjective predictions, like whether someone might become radicalised, can be impossible to validate.
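
Point 6 is easy to demonstrate. The toy filter below is a deliberately naive sketch with a hypothetical blocklist, not any platform’s real classifier: it flags only surface forms it has seen before, so trivial obfuscation or coded language slips through.

```python
# Toy illustration of limitation 6: matching only previously-seen surface forms.
# The blocklist terms are hypothetical examples, not a real moderation lexicon.
BLOCKLIST = {"filth", "vermin"}

def naive_flag(post: str) -> bool:
    """Flag a post if any blocklisted term appears as a whole word."""
    return any(word in BLOCKLIST for word in post.lower().split())

print(naive_flag("they are foreign filth"))           # True: exact match
print(naive_flag("they are foreign f1lth"))           # False: simple obfuscation
print(naive_flag("rid your forest of these thorns"))  # False: coded language
```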

According to Natasha Duarte and Emma Llansó of the Center for Democracy and Technology,

Today’s tools for automating social media content analysis have limited ability to parse the nuanced meaning of human communication, or to detect the intent or motivation of the speaker… without proper safeguards these tools can facilitate overbroad censorship and a biased enforcement of laws and of platforms’ terms of service.

In essence, given that existing LLMs have proven ineffective at analysing human language on Facebook, allowing tech giants like Facebook to enforce platform policies built around these models for content moderation risks stymying free speech, as well as the leakage of such ill-informed policies into national and international legal frameworks. According to Duarte and Llansó, this may lead to violations of human rights and liberties.

Human languages and hate speech detection

The use and spread of hate speech are taken seriously by UN member states, as evidenced by General Assembly resolution A/RES/59/309. Effective analysis of human language requires that the fundamental tenets responsible for language formation and use be considered. Except for some African languages not yet thoroughly studied, most human languages are categorised into six main families. Indo-European gives us European languages like English and Spanish, as well as languages spoken across the Americas and parts of Asia; the other families are Sino-Tibetan, Niger-Congo, Afro-Asiatic, Austronesian and Trans-New Guinea. The Ethiopian languages Oromo, Somali, and Afar fall within the Cushitic branch of the Afro-Asiatic family (other Ethiopian languages belong to its Omotic branch), whereas Amharic falls within the Semitic branch of that family.

This primary level of linguistic distinction is crucial to understanding the differences in language patterns, be they phonemic, phonetic, morphological, syntactic or semantic. These variations, however, are minimal when compared with the variations brought about by social context, mood, tone, audience, demographics, and environmental factors, to name a few. Analysing human language in an online setting like Facebook becomes particularly complex, given its mainly text-based nature and the moderator’s inability to observe non-linguistic cues. 

Variations in language are even more complex in the case of hate speech, given the role played by factors like intense emotion. Davidson et al. (2017) describe hate speech as ‘speech that targets disadvantaged social groups in a manner that is potentially harmful to them, … and in a way that can promote violence or social disorder’. It is intended to be derogatory, to humiliate, or to insult. To add to the complexity, hate speech and extremism are often difficult to distinguish from other types of speech, such as political activism and news reporting. Hate speech can also be mistaken for merely offensive language, and offensive words can be used in non-offensive contexts such as music lyrics, taunting or gaming. Other factors such as gender, audience, ethnicity and race also play a vital role in deciphering the meaning behind language.

At the level of dialect and parlance, slang can function as offensive language or as hate speech, depending partly on whether or not it is directed at someone. For instance, ‘life’s a bi*ch’ is treated as offensive language by some models, but it can be considered hate speech when directed at a person. Yet hate speech does not always contain offensive words. Consider the words of Dejene Assefa in the case mentioned above: ‘the war is with those you grew up with, your neighbour… If you can rid your forest of these thorns… victory will be yours’. Slurs, too, whether overtly offensive or not, can convey hate: ‘They are foreign filth’ (non-offensive wording used as hate speech) and ‘White people need those weapons to defend themselves from the subhuman trash these spicks unleash on us’ provide examples. Overall, judgements about hate speech reflect our subjective biases. For instance, people tend to label racist and homophobic language as hate speech but sexist language as merely offensive. This, too, has implications for analysing language accurately. Who is the analyst? And, in the case of models, whose data was the model trained on?
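
A minimal sketch of that distinction, with purely illustrative labels and a hypothetical one-word lexicon rather than a real annotation scheme, might look like this: the same phrase moves between ‘offensive’ and ‘hate speech’ depending on whether it targets a person or group.

```python
# Purely illustrative: the same wording is labelled differently depending on target.
OFFENSIVE_LEXICON = {"bi*ch"}  # hypothetical single-entry lexicon

def label_utterance(text: str, directed_at_person_or_group: bool) -> str:
    if any(term in text.lower() for term in OFFENSIVE_LEXICON):
        return "hate speech" if directed_at_person_or_group else "offensive"
    return "neither"

print(label_utterance("life's a bi*ch", directed_at_person_or_group=False))   # offensive
print(label_utterance("you are a bi*ch", directed_at_person_or_group=True))   # hate speech
```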

The complexities mentioned above are further compounded when translating or interpreting between languages. The probability of transliteration (rendering words by their sounds rather than their meaning) increases with machine-enabled translation services such as Google Translate. With translation, misunderstanding grows across language families, particularly when one language does not contain the vocabulary, characters, concepts, or cultural references associated with the other, an occurrence referred to by machine-learning engineers as the UNK problem.
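
The sketch below gives a rough sense of where the UNK problem starts. It is illustrative only, assuming the Hugging Face transformers library and two public checkpoints rather than any translation product’s internals: a vocabulary built mainly from high-resource languages tends to drop or shatter words in scripts it has barely seen, before translation proper even begins.

```python
# Rough illustration of the UNK problem: vocabulary coverage differs by model.
from transformers import AutoTokenizer

english_centric = AutoTokenizer.from_pretrained("bert-base-uncased")  # largely English vocabulary
multilingual = AutoTokenizer.from_pretrained("xlm-roberta-base")      # trained on ~100 languages

word = "ሰላም"  # Amharic for "peace" / "hello"

print(english_centric.tokenize(word))  # typically collapses to ['[UNK]']: the word is lost
print(multilingual.tokenize(word))     # kept as meaningful subword pieces
```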

Yet, from all indications, Facebook and other tech giants will invariably continue to experiment with using one LLM to moderate all languages on their platforms. For instance, this year, Google announced that its new speech model will encompass the world’s 1000 most spoken languages. Innovators are also trying to develop models to bridge the gap between human language and LLMs. Lesan, a Berlin-based startup, built the first general machine translation service for Tigrinya. It partners with Tigrinya-speaking communities to scan texts and build custom character recognition tools, which can turn the texts into machine-readable forms. The company also partnered with the Distributed AI Research Institute (DAIR) to develop an open-source tool for identifying languages spoken in Ethiopia and detecting harmful speech in them.

Conclusion

In cases like that of Ethiopia, it is best first to understand the broader system and paradigm at play. The situation reflects the push and pull typical of a globalised world, in which changes in the developed world wittingly or unwittingly exert a pull on the rest of the world, drawing countries into spaces where they subsequently realise they do not fit. It is from the consequent discomfort that the push emerges. What is now evident is that the developers of the technology, and the powers that sanctioned its global use, did not anticipate the peculiarities of this use case. Unfortunately, this is not atypical of an industry that embraces agility as a modus operandi.

It is, therefore, more critical now than ever that international mechanisms and frameworks, including a multistakeholder, cross-disciplinary approach to decision-making, be embedded in public and private sector technological innovation at the local level, particularly in the case of rapidly scalable solutions emerging from the Global North. It is also essential that tech giants be held responsible for the equitable distribution, within and across countries, of the resources needed for the optimal implementation of safety protocols concerning content moderation. To this end, it would serve Facebook and other tech giants well to partner with startups like Lesan.

It is imperative that a sufficient number of qualified people, with on-the-job mental health benefits, be engaged to deal with the specific issue of analysing human languages, which still involve innumerable unknowns and unknown unknowns. The use of AI and network-based systems can only be as effective as the humans behind the technologies and processes. Moreover, Facebook users will continue to adapt their use of language. It is reckless to assume that these models will be able to adjust to or predict all human adaptive strategies. And even if these models eventually can do so, the present and interim impact, as seen in Ethiopia and other countries, is far too costly in human rights and lives.

Finally, linguistics, like all disciplines and languages, is still evolving. It is irresponsible, therefore, to pin any language, let alone all of them, down to one model without anticipating the dire consequences.
