In a rapidly evolving digital landscape, artificial intelligence (AI) is influencing most, if not all, industries. For businesses developing, offering and using AI solutions, navigating copyright issues should be a critical concern. AI model developers are hungry for more data (Data Foundation report), while creative businesses are concerned about how their work is being used in AI systems, as highlighted in the reporting surrounding the UK Government's AI and Copyright consultation.
The position on copyright ownership and permissions surrounding both the input data (works used to train models) and outputs (the response received from the AI model) remains uncertain across jurisdictions as laws struggle to keep pace with innovation.
The UK government has recently concluded a consultation on this topic. The consultation aimed to tackle the legal uncertainties surrounding AI-generated content and copyright, and to balance the interests of AI developers against those of the creative industries.
This article examines the consultation's proposed position and compares the UK's approach with those of the EU, US and Japan in the areas of text and data mining and transparency obligations. In a follow-up article, we will tackle the linked question of ownership of AI-generated works, and ways to mitigate risk from the perspective of each stakeholder.
When considering copyright implications, it is essential to understand how AI systems actually process and use training data. Without careful international alignment, overly strict local rules may stifle AI innovation while failing to prevent the use of protected content abroad.
Generative AI systems are inherently opaque: billions of decision points (neurons) can be involved in generating an output from a given input, meaning even their developers cannot fully explain why specific inputs yield particular outputs. The systems involve highly intricate, non-linear interactions across many layers, making it currently impossible to trace individual decision-making steps.
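To make this concrete, the toy sketch below (in Python, with random weights standing in for a trained model; it reflects no real system's architecture) shows how even a three-layer network blends every input through non-linear layers, which is why individual contributions cannot be traced back:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: three layers of weights that, in a real model, would have been
# shaped by training data. Real generative models have billions of parameters.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(1, 8))

def forward(x):
    h1 = np.tanh(W1 @ x)   # non-linear layer 1
    h2 = np.tanh(W2 @ h1)  # non-linear layer 2: mixes every value from layer 1
    return W3 @ h2         # output blends all upstream activations

x = rng.normal(size=4)
print(forward(x))
# Every weight was influenced by every training example, so the output cannot
# be attributed to any individual work in the training set.
```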
It is therefore usually impossible to determine from outputs alone whether copyright works have been used in training. The only way to be sure is if the training data is openly published, which is very unusual (Pythia, a suite of models developed by EleutherAI for research purposes, is one example). This has led to transparency obligations being contemplated by each of the UK, EU and US, to differing extents. Each seeks to balance the priorities of building trust, promoting innovation, and protecting rights holders' interests.
In the UK, copyright arises automatically when an individual creates an original work and records it in a tangible form. Subject to exceptions, the UK's copyright law grants rights holders control over how their works are used, including stopping others from copying the whole or a substantial part of the work and prohibiting others from making adaptations without permission. EU and US laws hold a similar position, requiring the "author's own intellectual creation" and an "original work of authorship", respectively, for copyright to arise. US copyright law differs from the UK and EU in that registration is required before certain protections, such as the ability to sue for infringement and claim statutory damages, are available to the author.*
Below, we compare and contrast some of the key issues in respect of AI and the current and prospective regulatory approaches.
*A word of warning: the position in the US is complicated by Executive Orders issued by President Trump, which have the effect of unravelling policies put in place by the Biden administration and require an AI Action Plan to be in place by 22 July 2025 (six months from the Order), with a view to cementing America's position as a world leader in AI.
The current law in the UK allows text and data mining (TDM) for non-commercial research only.
The UK consultation introduces a proposed exception allowing AI developers to use works to train their models unless rights holders explicitly reserve their rights: in other words, an opt-out system. This rights-reservation model appears heavily inspired by the EU's system but, subject to the outcomes of the consultation, may require greater transparency measures than the EU has imposed.
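The consultation does not prescribe how a rights reservation would be expressed, but machine-readable signals such as robots.txt are often discussed in this context. As a purely illustrative sketch (the crawler name and URLs are hypothetical), an AI crawler honouring an opt-out might check:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; neither the consultation nor the EU regime
# mandates robots.txt specifically, but it is one machine-readable option.
CRAWLER_USER_AGENT = "ExampleAIDataBot"

def may_mine(page_url: str, robots_url: str) -> bool:
    """Return False if the site has reserved rights against this crawler."""
    parser = RobotFileParser(robots_url)
    parser.read()  # fetches and parses the site's robots.txt
    return parser.can_fetch(CRAWLER_USER_AGENT, page_url)

if __name__ == "__main__":
    allowed = may_mine("https://example.com/article",
                       "https://example.com/robots.txt")
    print("TDM permitted under opt-out model:", allowed)
```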
An interesting response to the UK consultation from Day One argues that a restrictive UK copyright regime would not protect UK content creators from having their content used for training in other jurisdictions. Instead, it argues that the practical effect would only be to prevent UK-based companies from training AI models domestically (as has already happened with Stability AI). The same training would happen in other, less restrictive jurisdictions, and so there is a risk of hindering UK innovation without actually protecting creators' content from being used in AI training.
The US takes a different approach. The fair use doctrine is a key part of US copyright law that lets people use limited portions of a work subject to copyright protection without needing permission from the copyright owner. The effect is that the fair use exception must be assessed on a case-by-case basis to determine whether it applies. This flexible approach may allow for AI training but reduces legal certainty, as seen in ongoing cases such as The New York Times v. OpenAI, in which the New York Times alleges that the use of its content to train AI models has infringed its copyright. Even once these cases are decided, a residual risk of litigation is likely to remain.
Japan has maintained a clear statutory exception for data analysis since 2009, broadened in 2018 so that Article 30-4 of the Japanese Copyright Act now covers both commercial and non-commercial data analysis and machine learning uses, including TDM. This framework effectively distinguishes between the training process and generated outputs, providing legal certainty while maintaining output-based copyright protections. The Day One report argues that the UK should follow this model.
The Japanese system differentiates between using works subject to copyright protection in the 'development and training stage' and in the 'generation and utilisation stage'. While training on publicly accessible data is permitted, Japan still applies copyright infringement standards to AI-generated outputs: the Article 30-4 exception does not apply where use is aimed at 'enjoyment' of the original works or would 'unreasonably prejudice' the interests of the copyright owner. Like the UK, Japan has thriving creative and AI sectors, and this is how it has chosen to strike a balance, while also maintaining its EU data adequacy status.
| Feature | UK | EU | US | Japan |
| --- | --- | --- | --- | --- |
| Current framework | Permits TDM for non-commercial research. | TDM for research and commercial use is allowed under the Digital Single Market Directive, with an opt-out for rights holders. | Broad flexibility under the fair use doctrine, subject to case-by-case judicial interpretation. | Allows broad rights to use works subject to copyright protection for information analysis, including AI training. |
| Proposed changes | Commercial use to be permitted under a rights-reservation mechanism. | The EU AI Act will bring about changes in a stepped fashion over a published timeline. | AI Action Plan to be published in July 2025. | None at present, unless the non-binding JCO Paper on the subject is converted into guidelines or regulations. |
As noted above, it is not possible to trace the steps by which an AI model turns an input into an output, which creates the need for transparency about the datasets used to train models.
The Day One report raises significant practical concerns about enforcing transparency obligations under an opt-out model. Currently, there is no effective, at-scale solution for monitoring compliance beyond imposing restrictive database monitoring rules.
Enforcement would be extremely challenging, especially if the infringing AI developer is based outside of the UK. This could create a system that penalises law-abiding UK-based companies while doing little to prevent AI development in other jurisdictions using the same material, unless the UK government were willing to take the extremely hard line of blocking market entry for developers that cannot fully evidence compliance.
AI developers are likely to resist extensive transparency requirements for several reasons. First, disclosing detailed information about training datasets could expose proprietary business information and potentially compromise competitive advantages. Second, for models already trained on vast datasets, retrospective transparency may be technically impossible. Finally, complying with different transparency regimes across jurisdictions creates significant operational complexity and cost.
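Neither the UK consultation nor the EU AI Act prescribes a disclosure format, only a "sufficiently detailed" summary in the EU's case. As a purely hypothetical illustration (the model name, sources and fields below are invented), a machine-readable training data summary might look like:

```python
import json

# Hypothetical disclosure format: no regulator has prescribed a schema, so
# every field here is an assumption about what a summary could contain.
training_data_summary = {
    "model": "example-model-v1",
    "training_cutoff_date": "2025-01-31",
    "sources": [
        {"name": "Public web crawl", "licence": "mixed/unknown",
         "rights_reservation_honoured": True},
        {"name": "Licensed news archive", "licence": "commercial licence",
         "rights_reservation_honoured": None},  # licensed, so opt-out not applicable
    ],
}

print(json.dumps(training_data_summary, indent=2))
```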
| Feature | UK | EU | US | Japan |
| --- | --- | --- | --- | --- |
| Measures | The consultation proposes requiring AI developers to disclose datasets used for training, to build trust and enable rights holders to enforce their rights. | The EU AI Act mandates "sufficiently detailed" summaries of training datasets. | California has enacted Assembly Bill 2013, requiring high-level dataset disclosures. However, this has an effective date of 1 January 2026, so it may be affected by the Trump administration's AI Action Plan. | No transparency requirements at this time, although the JCO Paper does discuss future rules and guidelines concerning the use of copyright works. |
The UK consultation attempts to strike a balance between AI innovation and creator rights by proposing an EU-style opt-out model, but the global nature of AI raises questions about its effectiveness. As Japan’s long-standing TDM exception demonstrates, a more permissive approach - where training data is broadly accessible but usage is controlled - may be more practical, enforceable, and ultimately pro-growth.
For many creators, this may seem unfair. The instinctive response to AI’s rapid rise is to demand stronger protections, given that these models pose an existential threat to many content-based industries. However, the hard truth is that restricting AI training altogether will not stop AI models from being developed elsewhere - only UK businesses and researchers will be disadvantaged if the framework is too restrictive.
Instead, a three-pronged approach may offer a more workable middle ground:
1. Free use of publicly available data for AI training, but with clear restrictions on how AI-generated outputs may be used. A bold approach from the UK government could be a rebuttable presumption that publicly available data forms part of AI developers' datasets; developers could rebut it by disclosing their datasets where it does not, removing the need for compulsory disclosure.
2. Compulsory licensing and remuneration for AI training on private datasets, ensuring that publishers and content creators are paid when their works are specifically targeted.
3. Strict enforcement of copyright infringement in AI-generated outputs, ensuring that models do not simply regurgitate existing works or enable mass-scale copyright violations.
The downside of this approach is that the risk of litigation is pushed onto the user of the AI model, who could unknowingly infringe copyright. However, this risk could be mitigated by performing checks on created works (a simple form of such a check is sketched below), negotiating protections with AI developers, and insurance. If executed well, this approach could mitigate legal uncertainty, reduce litigation, and ensure the UK remains a competitive hub for AI development. However, it would also represent an undeniable shift in value from the creative industries to the AI sector.
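As a purely illustrative sketch of what an output check could look like (the texts and the 5% threshold are invented for demonstration, and verbatim overlap is at best a rough proxy for the legal test of substantial copying), a simple verbatim n-gram screen might be:

```python
def word_ngrams(text: str, n: int = 5) -> set[str]:
    """Sliding word n-grams; long verbatim overlaps suggest regurgitation."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(generated: str, reference: str, n: int = 5) -> float:
    """Share of the generated text's n-grams appearing verbatim in a reference work."""
    gen = word_ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & word_ngrams(reference, n)) / len(gen)

if __name__ == "__main__":
    reference_work = ("the quick brown fox jumps over the lazy dog "
                      "near the quiet riverbank at dawn")
    model_output = ("our story begins as the quick brown fox jumps "
                    "over the lazy dog near the river")
    ratio = overlap_ratio(model_output, reference_work)
    # The threshold is illustrative only; infringement is a legal question.
    print(f"verbatim 5-gram overlap: {ratio:.0%}",
          "-> flag for review" if ratio > 0.05 else "-> ok")
```

In practice, developers and insurers would rely on far more sophisticated screening, but even a crude check like this illustrates the broader point: enforcement at the output stage is technically feasible in a way that tracing individual works through a trained model is not.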