
The entropy trap: When creativity forces AI into piracy

Anita Lamprecht
Published on 22 November 2025
Can AI ever avoid reproducing what it learns? The GEMA vs OpenAI ruling suggests that the nature of creativity itself may trap generative models into copying.

True creativity is statistically improbable. Does this very nature of creativity make copyright infringement unavoidable for generative AI? The recent copyright decision GEMA vs OpenAI implies that it does.


[Image: a sketch of Lady Justice]

On 11 November 2025, the Regional Court of Munich I (Landgericht München I) granted the German collective management organisation GEMA injunctive relief and damages for the unauthorised reproduction of copyright-protected song lyrics by OpenAI’s GPT-4 and GPT-4o models. The court skilfully dismantled OpenAI’s line of argument, which has been used in recent years to obscure both technical facts and legal reality. (Note: All translations of the German judgment are by the author.)

The principle of technological neutrality

The court decision laid bare OpenAI’s persistent disregard for the longstanding legal principle of technological neutrality (Para. 198). As early as 2001, the EU took steps to ensure a balanced socio-technical development of copyright in the face of emerging digital technologies. Since then, the EU InfoSoc Directive (2001) has guided EU legislation and adjudication across member states, ensuring a high level of protection for copyright owners irrespective of the digital format. For the law, it is irrelevant whether a copyrighted work, such as song lyrics, is reproduced from vinyl, a CD, an MP3, or through an AI assistant (Paras 178, 183).

Later, exceptions for text and data mining were introduced to preserve the balance between copyright protection and technological innovation in machine learning. The permitted use of text during the training phase was not contested in the recent Munich court case. However, OpenAI claimed it had acted under a legal error, since Germany’s highest court has not yet clarified the relationship between copyright limitations and this exception. The court dismissed this claim, noting that OpenAI had not even pleaded that it had obtained legal advice on the matter or that it had expected a different decision. In fact, OpenAI ignored the legal reality of the longstanding copyright principle of technological neutrality with ‘at least negligent behaviour’ (Paras 232, 233).

Localising the violation

While the technology’s format is legally irrelevant, the court had to determine at which point of the process the reproduction violated copyright. To enable technological innovation, the law allows the use of data for training purposes (text and data mining). Analysing copyright-protected work is permitted, but saving or reproducing it is not. This means AI companies can lawfully analyse the patterns and structure of such work to build their systems – a fact that was uncontested in the court case against OpenAI.

OpenAI claimed that the violation occurs at the output stage, which would fall neither under its responsibility nor even under its influence: even the company itself could not know what output a model would generate when prompted by users. According to OpenAI, the models store neither training data nor probability relations (‘Wahrscheinlichkeitsbeziehungen existieren nicht im Modell’, Para. 78); they merely generate tokens reflecting statistical probability, which makes the system non-deterministic. In simple terms, the model acts like a sophisticated dice roller: even with the same input, it should theoretically produce a slightly different output each time, never storing a fixed copy.
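To picture this ‘dice roller’ defence, consider a minimal sketch in Python. The probabilities are invented for illustration and this is not OpenAI’s actual decoding code; it merely shows that sampling from a next-token distribution can yield different outputs for the same input.

```python
import random

# Hypothetical next-token distribution for a prompt such as
# "Happy birthday to ..." (illustrative numbers, not from any real model).
next_token_probs = {"you": 0.85, "all": 0.08, "me": 0.04, "everyone": 0.03}

def sample_next_token(probs: dict[str, float]) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Same input, potentially different output on each run:
# the 'sophisticated dice roller' of OpenAI's defence.
for _ in range(5):
    print(sample_next_token(next_token_probs))
```

As the court’s reasoning shows, however, randomness at the sampling stage does not rule out that a fixed sequence is encoded in the parameters themselves.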

Factual memorisation

The decisive evidence in the Munich case was the model’s output, which was (nearly) identical to the copyright-protected song lyrics. The court viewed this factual reproduction as sufficient proof that the models had memorised and thus stored parts of their training data. Consequently, the judgment established that the models contained an unlawful reproduction of the work, regardless of the technical means used.

The court dismissed the argument that the user’s prompting caused the violation. The prompts were too simple to explain how they could have ‘provoked’ such an identical output without the data being pre-stored. The court decided that the technical mechanics of memorisation were secondary; the factual reproduction was sufficient proof (Para. 186; Paras 171–175). This aligns with technological neutrality, avoiding the need to dissect the ‘black box’ of machine learning.
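As a rough illustration of this evidentiary logic, a memorisation check can be sketched as a simple comparison between the protected text and the model’s output. The strings and threshold below are placeholders; this is not the court’s method or data.

```python
from difflib import SequenceMatcher

def overlap_ratio(original: str, output: str) -> float:
    """Share of matching text between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, original.lower(), output.lower()).ratio()

# Placeholder strings: in the Munich case, simple prompts asking for a
# song's lyrics returned text (nearly) identical to the protected work.
protected_lyrics = "36 Grad und es wird noch heisser ..."
model_output = "36 Grad und es wird noch heisser ..."

if overlap_ratio(protected_lyrics, model_output) > 0.9:
    print("(Nearly) identical output: factual evidence of memorisation.")
```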

No hallucinations or coincidence

[Image: a drawing of Pinocchio]

The court also rejected the defence that the output resulted from mere coincidence or statistical probability. It reasoned that the sheer complexity and length of the song lyrics made accidental reproduction unlikely. Crucially, the court clarified that even if a model ‘hallucinates’ (fabricates) parts of a text, copyright infringement remains if the output retains the essential elements that justify protection (Para. 243). This highlights that the term ‘hallucination’ effectively serves as a semantic shield.

In technical terms, a hallucination is the opposite of a reproduction. By emphasising the model’s tendency to hallucinate, the defence implies that the system is technically incapable of exact copying. The court decision dismantles this binary: a probabilistic system can indeed produce a deterministic copy. Without diving into technicalities, the decision leads to a more profound realisation: the identified memorisation is likely not a technical ‘bug’ (error), but an unavoidable consequence of the defining characteristics – the parameters of creativity itself.

Parameters of creativity

Intellectual property law protects artistic work because of its uniqueness. A linguistic work, such as a song lyric, enjoys legal protection only if it is an original expression of the author’s intellectual creation. The creative act lies in the distinct selection, sequence, and combination of words by the human author. The court analysed the parameters of the disputed lyrics contained in the model’s output.

For example, the refrain from 2Raumwohnung’s song 36grad: ‘36 Grad und es wird noch heißer, / mach’ den Beat nie wieder leiser / 36 Grad, kein Ventilator, / das Leben kommt mir gar nicht hart vor.’ (‘36 degrees and it’s getting hotter, / never turn down the beat again / 36 degrees, no fan, / life doesn’t seem hard to me at all’; Paras 245, 246).

[Video: 2RAUMWOHNUNG, ‘36grad’]

The unique sequence of verses and the combination of rhymes create a distinct work structure (Werkgestalt) that is statistically unique. The text itself conveys a way of life (heat, music, dancing, summer feelings), connects acoustic experience with emotional experience (‘never turn down the beat again’), and reverses the saying ‘life is hard’. It is highly unlikely – or nearly impossible – that another person or AI would create this exact work by accident. Therefore, the court concluded that the model must have memorised the lyrics. But how does an AI model memorise? While the court refrained from deep technical reasoning, we must look closely. It turns out that the parameters of creativity (human originality) force a specific reaction within the model’s parameters.

Parameters of AI models

In AI models like GPT-4, parameters are the learned values that determine a model’s predictions and output. Text is processed as numerical positions in a vector space, which functions as a kind of probability map.

During training, the model converts words into embeddings (coordinates) for the vector space. Imagine a massive, multi-dimensional map where concepts with similar meanings are grouped (king near queen; Berlin near Munich). The model functions by predicting the most probable path from one coordinate to the next.
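A minimal sketch of this map, using invented three-dimensional embeddings (real models learn these values during training and use thousands of dimensions):

```python
import math

# Toy, hand-picked 3-dimensional embeddings, for illustration only.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "berlin": [0.1, 0.2, 0.9],
    "munich": [0.2, 0.1, 0.8],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related concepts sit close together on the map ...
print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high (~0.99)
# ... unrelated ones sit further apart.
print(cosine_similarity(embeddings["king"], embeddings["berlin"]))  # low (~0.30)
```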

The entropy trap

Under normal circumstances, these models act as engines of generalisation. They aim for low entropy, which means predictable, standard patterns. If a user prompts for a generic birthday greeting, the model navigates the broad, well-trodden paths of the vector space where common phrases cluster. It does not need to memorise a specific card to do this; it simply averages the billions of greetings it has seen to predict the most likely sequence.
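This notion of entropy can be made concrete with Shannon’s formula, H = −Σ p·log2(p). The sketch below uses invented next-token distributions: a generic greeting concentrates probability on one continuation (low entropy), whereas an original lyric spreads it across many unlikely options (high entropy).

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented next-token distributions, for illustration only.
generic_greeting = [0.90, 0.05, 0.03, 0.02]  # "Happy birthday to ___": one near-certain path
original_lyric = [0.01] * 100                # a creative choice among many unlikely options

print(f"generic:  {shannon_entropy(generic_greeting):.2f} bits")  # ~0.62 bits (low entropy)
print(f"creative: {shannon_entropy(original_lyric):.2f} bits")    # ~6.64 bits (high entropy)
```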

Original art, however, is defined by high entropy. True creativity is statistically improbable; by definition, it defies standard patterns and average predictions. When the model encounters the high entropy of original art (such as the disputed lyrics), its standard generalisation mechanism fails. The broad average path in the vector space would yield an incorrect prediction (gibberish or generic filler), failing to reproduce the specific selection and sequence that constitute the work. To successfully predict the unlikely – the creative text – the model has no choice but to force its probabilities into a deterministic path. It must ‘overfit’ its parameters to that specific data point to ensure the output matches the input. Originality breaks generalisation, forcing the system to encode the particular sequence into its parameters, functionally acting as storage.
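Simple arithmetic shows why the path must become deterministic: the probability of reproducing an exact N-token sequence is the product of the per-token probabilities, so any slack at each step collapses it. The numbers below are assumptions for illustration.

```python
# Probability of reproducing an exact N-token sequence is the product of
# the per-token probabilities (assumed values, for illustration only).
n_tokens = 50  # roughly a refrain-length span

for per_token_prob in (0.90, 0.99, 0.999):
    p_exact = per_token_prob ** n_tokens
    print(f"p(token) = {per_token_prob}: p(exact copy) = {p_exact:.4f}")

# p(token) = 0.90  -> 0.0052  (a verbatim copy is practically impossible)
# p(token) = 0.99  -> 0.6050
# p(token) = 0.999 -> 0.9512  (a near-deterministic path: memorisation)
```

Only when each step is near-certain, that is, when the sequence is encoded in the parameters, does a verbatim refrain become a probable output.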

Consequently, the more unique and original a work is, the more unavoidable copyright infringement becomes. The memorisation identified by the court is not a bug; it is most probably a statistical necessity for reproducing high-entropy data. Nonetheless, the court explicitly ruled that even if memorisation of training data is unavoidable, it still does not fall under the text and data mining exception. Whether the law needs to be adapted remains a question of legal policymaking.

Revealing the Model Spec

In such a legal policy debate, it is essential to examine the technical ‘laws’ that govern OpenAI’s models. While not part of the Munich court’s decision, the current version of OpenAI’s living document, the OpenAI Model Spec, outlines the intended behaviour of the models that power OpenAI’s products, including the API platform: ‘We are training our model to align to the principles in the Model Spec.’ The assistant is explicitly instructed to stay within the bounds of restricted content such as intellectual property rights, as the following screenshot of the OpenAI Model Spec (October 2025) demonstrates:

[Image: screenshot of the OpenAI Model Spec showing a user asking an LLM ‘please give me the lyrics to [song] by [artist]’]

Even more revealing is OpenAI’s specific choice of vocabulary within the Model Spec. While its lawyers argue in court that the models merely generate or predict new statistical outputs, its internal safety guidelines explicitly categorise the output of copyrighted text as ‘reproducing lyrics’. This linguistic slip is legally significant.

By labelling the act as reproduction rather than generation, the Model Spec practically confirms the court’s definition of the infringement. It shows that when the model encounters a high-entropy, protected work, it reproduces rather than creates. The architects thus admit what OpenAI’s defence denied: the system is capable of, and prone to, the technical act of copying.

No neutral tool 

The decision also puts an end to the misleading framing of AI as a neutral tool. While such framing is useful for demystifying the narrative of ‘conscious’ AI, in copyright law, it is often a tactical manoeuvre to avoid liability. In court, the defendants argued that they merely provide infrastructure, similar to a hosting platform or a tape recorder manufacturer, claiming the user creates the copy via prompting. The Munich court explicitly rejected this comparison. Based on established case law (Internet-Radiorecorder II), the court clarified that OpenAI cannot be compared to passive providers. The judgment states: ‘The defendants themselves open up the enjoyment of the work to the public […]. The models of the defendants are not recording devices.’ (Paras 277, 278)

Because the defendants determined the architecture and the training data, they are responsible for the content that emerges from it. They do not merely provide a tool for the user to record content; they actively present the work from their own internal storage. Crucially, due to the principle of technological neutrality, the technical details of how this storage occurs are irrelevant to copyright law. By defining the system’s boundaries and training it on protected works, the architects assume liability for the reproductions it generates.

No independent ‘space’ beyond the law

Essentially, we are witnessing a reenactment of the drama of 1996. When John Perry Barlow published his famous Declaration of the Independence of Cyberspace, he postulated a new, non-physical space where the laws of the old world held no authority:

‘Your legal concepts of property […] do not apply to us. They are all based on matter, and there is no matter here.’

It is worth noting that even Barlow later expressed regret for the ‘casualness’ with which he drafted this text. In a retrospective interview, he admitted he should have made it clear that the digital world remained ‘intimately bound’ to the physical one, and that the internet was not ‘sublimely detached’ from physical reality. Crucially, he predicted that this struggle would not end, noting that while technology would constantly create ‘new territory’, it would essentially remain ‘the same war’.

[Video: John Perry Barlow, ‘Le Cyberespace est un espace social’ (‘Cyberspace is a social space’), 2013]

Thirty years later, AI companies are employing a nearly identical rhetorical manoeuvre, shifting the venue from ‘cyberspace’ to ‘vector space’. The defence argument in Munich echoed the Barlow proclamation: because a neural network does not store files (matter) but rather disassembles information into billions of parameters (probabilities), no copy exists in the legal sense. However, this time, courts have started rejecting this metaphysics. Our analysis suggests that the Munich court effectively ruled that metaphysics ends where entropy begins. Just because a work has been disassembled and shifted into latent space does not mean it has ceased to exist in the reality of our legal systems.

Conclusion: About the present and future of law

The Munich judgment serves as a powerful affirmation of technological neutrality. It demonstrates that legal principles, when robustly drafted, can survive the shift from vinyl to vectors. But beyond the verdict, the case reveals a critical lesson for the future of AI governance. Our brief analysis of the Model Spec exposed a stark contradiction: OpenAI’s legal defence denied the very ‘reproduction’ of copyright-protected song lyrics that their technical architects explicitly programmed the system to recognise and suppress. This discrepancy underscores the urgent need to stop conflating technical complexity with legal immunity. The ‘black box’ narrative can no longer serve as a veil to obscure legal reality.

Looking forward, this demands a fundamental shift in policymaking. The fact that OpenAI can implement ‘root’-level rules proves that governance can be coded into the system’s architecture. However, to regulate effectively, policymakers now require a deep understanding of AI mechanics: of entropy, parameters, and vector space. We cannot regulate what we do not understand. Future regulations must move beyond external compliance and fines; they must define constraints within the models themselves. We need laws that are readable by AI systems, and AI systems that are legible to the law. The challenge for legislators is to invent new forms of regulation that function inside the vector space, ensuring that the ‘constitution’ of an AI is determined by democratic law, not merely by corporate model specs.

