
    Who Owns AI-Generated Content? Copyright, IP, and Legal Risks in the Age of Generative AI

    Summary: In this article, Tetiana Kontariova explores how businesses and tech projects leveraging AI technologies can navigate key challenges — from the ownership of AI-generated works and the use of user-generated prompts and inputs, to the training of AI models and potential copyright infringements. She also offers practical strategies for responsible AI deployment that promote legal compliance and build user trust.

    Authors: Tetiana Kontariova

    As generative AI tools like ChatGPT, Gemini, and open-source LLMs reshape how we create and consume content, they also raise urgent questions around copyright, data ownership, and intellectual property (IP). For AI startups, tech builders, and Web2 and Web3 businesses alike that leverage generative AI, navigating these issues isn’t optional - it’s critical. In this article, Tetiana Kontariova unpacks the key legal risks of using and deploying AI-generated content, from the ownership of outputs and use of user-generated prompts, to the legality of training datasets. She also offers clear, actionable strategies to help technology, Web3 and AI-driven businesses stay compliant while building trust with users and regulators alike.

    How Generative AI Is Transforming Tech & Web3 - and Raising Legal Questions

    In a world driven by rapid technological progress, artificial intelligence (AI) and Large Language Models (LLMs) are evolving at an unprecedented pace. Their influence is spreading across every major industry, and Web3 is no exception. As AI becomes more advanced, it’s beginning to intersect with Web3 and the broader technology world, opening the door to smarter software, next-level protocols, more intuitive user experiences, and entirely new digital possibilities.

    For instance, AI can now assist in creating complex graphics and designs, generating personalised NFT art, writing code, and even acting as a digital auditor by detecting software errors and vulnerabilities. On the analytical front, AI helps interpret the vast streams of data flowing through decentralized networks - enabling users to make better-informed decisions and fine-tune their on-chain strategies. As Sergey Ostrovskiy explores in his article, even more fascinating is the emergence of autonomous AI agents: bots that can generate thoughts, make independent decisions, and interact with others using their own blockchain wallets and social media accounts.

    Thus, AI and related technologies, especially in Web3, are already transforming a wide range of industries - from finance to content creation and beyond. However, the ways in which these technologies acquire information and manage large datasets continue to raise important ethical and copyright concerns. Questions around data ownership, consent, and intellectual property remain at the forefront of the conversation, prompting the need for clear frameworks and responsible innovation as these technologies evolve.

    Who Owns AI-Generated Content? Unpacking IP Rights in the Age of LLMs

    Who really owns content created by AI? It’s a tricky question, sparking plenty of debate. AI-generated content often involves many contributors, such as developers, model trainers, end users, and of course, the AI systems themselves. But when it comes to the final result, it’s still unclear who - if anyone - actually owns it. If you’re working on building or maintaining generative AI tools, this question could directly affect your business model and its exposure to risks. In this article, we aim to look at the copyright question through the lens of a business owner or founder. In a field where innovation moves faster than regulation, knowing where you stand can make all the difference.

    Basically, businesses integrating large language models (LLMs) and other AI tools into their products should keep two key considerations in mind:

    • No Universal Rule on Ownership: There is no universally accepted legal position on who owns AI-generated content. The issue remains legally ambiguous and varies across jurisdictions. In most legal systems, copyright protection hinges on two key requirements: human authorship and a sufficient degree of originality. When content is produced entirely by AI, these criteria are often not satisfied. As a result, such outputs generally fall outside the scope of copyright protection and may be treated as part of the public domain, open for anyone to use. However, relying solely on this assumption is a weak legal strategy, especially as the legal landscape continues to evolve. There may be cases where human-authored elements embedded within AI-generated content meet the threshold of originality and qualify for copyright protection.
    • Take a Proactive Approach: To mitigate uncertainty and reduce risk, especially when the law remains silent or offers no clear guidance, a more robust approach is to clearly address ownership issues through the documentation that governs the use of AI models, such as terms of use, licence agreements, usage policies, etc. Naturally, such provisions should be carefully worded and made subject to the applicable laws to ensure enforceability and compliance. This ensures transparency, aligns user expectations, and strengthens the legal foundation for responsible AI deployment.

    Based on our observations and experience, most businesses leveraging LLMs and AI tools in their products, including industry leaders such as OpenAI and Google with its Gemini models, do not claim ownership over the outputs generated by their models. However, while ownership of outputs may not be a priority, businesses typically seek to retain the right to collect and use user-generated prompts and resulting outputs for the purposes of training and refining their models. At this stage, the question of intellectual property (IP) ownership becomes especially significant - not only from a legal standpoint, but also in the broader context of building user trust and establishing ethical data practices in the development of AI technologies.

    Let’s focus on two primary categories of data commonly used for training AI models:

    • User-Generated Prompts and Inputs: This category includes content provided by users when interacting with an AI tool. In most cases, AI projects rely on user warranties or confirmations, assuming that the user either owns the content or is authorized to use it. However, this reliance comes with risks, which we explore further below. To lawfully use user inputs for training purposes, users must explicitly grant you the respective rights - in legal terms, this means either granting a licence or assigning ownership of that content. To be enforceable, these licence and assignment terms must be clearly drafted, legally sound, and compliant with applicable laws. Involving legal counsel at this stage is highly recommended.
    • AI-Generated Outputs: First, it is essential to determine who owns the output. If ownership is assigned to the user, then the same principle applies - the user must grant the respective licence for the use of any outputs in training. If, subject to applicable laws, the AI project retains ownership of the outputs, they may be freely used for any purpose. That said, this must be handled case by case, particularly where user inputs materially contribute to the generated output. To the extent that any user-generated input forms part of the AI-generated output, and the rights to that input were not, or cannot be, assigned, the use of that part of the output for training purposes may still require the user’s authorisation.
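    The two-part check described above - a licence covering the user’s input, plus cleared rights over the resulting output - can be sketched as a simple data filter. This is a minimal illustrative sketch only, not legal advice or any real product’s schema: the `Interaction` record and its `training_consent` and `rights_assigned` fields are hypothetical names chosen for clarity.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Interaction:
        """One user interaction captured by a hypothetical AI product."""
        prompt: str
        output: str
        training_consent: bool  # user explicitly licensed their input for training
        rights_assigned: bool   # rights over the output are cleared for training

    def select_training_records(interactions):
        """Keep only records where both the input licence and the output
        rights have been explicitly granted; everything else is excluded."""
        eligible = []
        for item in interactions:
            # Prompts are usable only with an explicit licence from the user.
            if not item.training_consent:
                continue
            # If the user retains ownership of the output, their licence must
            # also cover that output before it can enter the training set.
            if not item.rights_assigned:
                continue
            eligible.append(item)
        return eligible
    ```

    In practice these flags would be backed by clearly drafted terms of use rather than a checkbox alone; the point of the sketch is that exclusion, not inclusion, should be the default when either grant is missing.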

    AI Training Data and Copyright: What You Can (and Can’t) Use

    In most cases, copyright infringement in the AI context stems from using copyrighted materials in training datasets without proper authorisation, which can result in the generation of infringing outputs. To perform effectively and unlock their full potential, AI systems, especially large language models (LLMs), depend on access to vast and diverse datasets. This creates a tension between the need for rich training data and the legal boundaries of copyright protection.

    First and foremost, it’s crucial to consider where your AI model sources its training data. Typically, AI-driven platforms use at least two primary data sources:

    • Publicly Available Content: It’s a common misconception that anything publicly accessible online is free to use for any purpose. Notably, tech giants such as OpenAI, Google, and Meta have leveraged vast amounts of publicly available data to train their models. If the training data includes content freely available on the internet, this can raise serious legal issues when that content is protected by copyright and not licensed for such use. For instance, the New York Times sued Microsoft and OpenAI, alleging copyright infringement for using its content without permission to train AI models. While some content is shared under open source or other permissive licences, these come with specific terms - and not all of them allow for use in AI training. To avoid infringement risks, it’s essential to verify the licensing terms and, when in doubt, consult legal counsel to ensure the content can be lawfully used for training purposes.
    • User-generated Content: Another common source of training data is user-generated inputs, which presents a frequently overlooked legal risk in the development and deployment of AI technologies. Many businesses attempt to shift liability for potential copyright infringements onto users. This is typically done by requiring users to warrant and confirm that they have the lawful right to submit content for specific usage, including AI model training. Yet, this method remains inherently unreliable. In practice, verifying the origin and authorship of user-submitted content is extremely challenging, if not infeasible. In other words, even when the user “grants” you the rights, you cannot verify whether they had the authority to grant those rights in the first place. As such, relying exclusively on user warranties without additional safeguards exposes businesses to potential liability for copyright infringement or misuse of copyrighted content.

    Does “Fair Use” Protect Your Model from Infringement?

    It’s worth noting that in some cases, the law permits the unlicensed use of copyrighted content under specific conditions. One notable example is the doctrine of “fair use” under U.S. copyright law. In the lawsuit brought by The New York Times, OpenAI defended itself by arguing that training AI models on publicly available internet content qualifies as “fair use”. However, the “fair use” doctrine is not absolute, and relying on it requires a careful, case-by-case legal analysis. There are four key factors used to determine whether “fair use” applies:

    • Purpose and Character of the Use: Courts assess whether the use is commercial or for nonprofit educational purposes. Transformative uses that add new expression or meaning are more likely to qualify as fair use.
    • Nature of the Copyrighted Work: The distinction between factual and highly creative content matters. Fair use is more likely to apply to factual works than to expressive, creative ones.
    • Amount and Substantiality: How much of the original work is used, and how important is that portion? Even a small excerpt can weigh against fair use if it represents the core of the work.
    • Effect on the Market: Does the use harm the market value or potential licensing market for the original work? If a copyright owner can demonstrate that the use of their content for AI training undermines its commercial value or diminishes market demand, this will weigh heavily against a finding of fair use.

    In short, while the doctrine of fair use may provide legal protection in certain cases, AI-driven projects should approach it with caution and seek legal advice before relying on it for any purpose.

    To sum up, it is crucial to recognise that the risk of copyright infringement becomes significantly higher when infringing content is repeatedly processed during training or generation, especially when there is no effective mechanism to trace, identify, or remove such content, even after a potential violation has been detected. Therefore, AI-driven projects should ensure that their training databases and practices comply with applicable copyright laws.

    Practical Legal Strategies for Deploying AI Tools Responsibly

    Launching an AI-driven product involves more than just technical innovation - it requires careful legal, ethical, and operational planning. Below are some essential considerations to keep in mind to ensure your product is not only effective, but also compliant and sustainable:

    • Legal Strategy. Given the absence of a universal legal framework to clearly address key AI-related issues, such as ownership, privacy matters, AI-generated outputs usage for training purposes, etc., the guidance of legal experts is essential when structuring and launching AI-driven projects. There’s no simple answer to questions of ownership, data use, or copyright risk. Instead, what’s needed is a comprehensive, context-aware strategy that integrates legal safeguards into technical design, product development, and user policies from the outset.
    • User Documentation. Unfortunately, many businesses underestimate the risks associated with the absence of robust user-facing documentation. Well-drafted legal disclaimers, indemnity clauses, and clearly defined terms of use and usage policies are not just formalities - they provide a critical layer of legal protection and help allocate risk appropriately. Investing in comprehensive legal frameworks from the outset is essential to safeguard the business and ensure long-term compliance in the rapidly evolving AI landscape.
    • Ownership of User-Generated and AI-Generated Content. It is advisable to clearly define ownership rights over user-generated inputs and AI-generated outputs in the legal documentation from the outset, taking into account the project’s objectives, intended use cases, and the requirements of applicable laws. This ensures transparency, aligns user expectations, and strengthens the legal foundation for responsible AI deployment.
    • Technical Safeguards and Other Measures. Where technically feasible, it is advisable to implement robust technical safeguards and other measures that enable the verification of the origin of user-submitted content, facilitate effective and user-friendly mechanisms for reporting copyright infringements, and allow for the prompt removal of infringing material from training datasets upon request.
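    The last bullet - a user-friendly reporting mechanism paired with prompt removal of infringing material from training data - can be sketched in a few lines of Python. This is a minimal sketch under strong simplifying assumptions: content is matched by exact hash, whereas a production system would also need fuzzy or perceptual matching, provenance records, and retraining or unlearning procedures for models already trained on the flagged material. The `TakedownRegistry` class is a hypothetical name for illustration.

    ```python
    import hashlib

    def fingerprint(text: str) -> str:
        """Stable fingerprint for a piece of submitted content."""
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    class TakedownRegistry:
        """Tracks content flagged through copyright-infringement reports
        and excludes it from future training batches."""

        def __init__(self):
            self._blocked = set()

        def report_infringement(self, content: str) -> None:
            # Record the fingerprint so the material can be excluded promptly.
            self._blocked.add(fingerprint(content))

        def filter_batch(self, batch):
            # Drop any training example whose content has been reported.
            return [item for item in batch if fingerprint(item) not in self._blocked]
    ```

    The design choice worth noting is that the registry stores only fingerprints, not the infringing content itself, so honouring a removal request does not require retaining a copy of the disputed material.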

    Final Thoughts and Key Takeaways

    As the pace of AI innovation continues to accelerate, so do the legal, ethical, and compliance risks associated with how AI systems are trained, deployed, and commercialised. The blurred lines between human authorship, AI-generated content, and user-contributed data create unprecedented complexity — especially for Web3 projects, technology companies leveraging generative AI, and AI-native startups aiming to stay ahead of regulation while maintaining user trust.

    Key takeaways for AI-driven projects include:

    • There’s no one-size-fits-all legal rule for AI content ownership - clarity must be built through carefully drafted terms and usage policies.
    • Using publicly available or user-generated data for training AI models involves real copyright risks, and relying solely on user warranties is not enough.
    • “Fair use” protections are limited and fact-specific - not a reliable fallback for large-scale AI training.
    • Proactive legal structuring, including clear licensing, documentation, and IP frameworks, is essential to building defensible, scalable AI products.

    For founders, builders, and business leaders working with AI and Web3 technologies, legal compliance isn’t just a box to check - it’s part of a much broader strategic equation. Getting this right can protect not just your business - but the trust of your users, the integrity of your model, and the long-term viability of your innovation.

    At Aurum, we’re here to support you at every stage - from strategic structuring to drafting the full suite of legal documentation. Our goal is to help you confidently navigate the complex and evolving legal landscape surrounding AI, ensuring your project is both innovative and compliant.
