Summary: In this article, Tetiana Kontariova explores how businesses and tech projects leveraging AI technologies can navigate key challenges — from the ownership of AI-generated works and the use of user-generated prompts and inputs, to the training of AI models and potential copyright infringements. She also offers practical strategies for responsible AI deployment that promote legal compliance and build user trust.
As generative AI tools like ChatGPT, Gemini, and open-source LLMs reshape how we create and consume content, they also raise urgent questions around copyright, data ownership, and intellectual property (IP). For AI startups, tech builders, and Web2 and Web3 businesses alike that leverage generative AI, navigating these issues isn’t optional - it’s critical. In this article, Tetiana Kontariova unpacks the key legal risks of using and deploying AI-generated content, from the ownership of outputs and the use of user-generated prompts to the legality of training datasets. She also offers clear, actionable strategies to help technology, Web3, and AI-driven businesses stay compliant while building trust with users and regulators alike.
In a world driven by rapid technological progress, artificial intelligence (AI) and Large Language Models (LLMs) are evolving at an unprecedented pace. Their influence is spreading across every major industry, and Web3 is no exception. As AI becomes more advanced, it’s beginning to intersect with Web3 and the broader technology world, opening the door to smarter software, next-level protocols, more intuitive user experiences, and entirely new digital possibilities.
For instance, AI can now assist in creating complex graphics and designs, generating personalised NFT art, writing code, and even acting as a digital auditor by detecting software errors and vulnerabilities. On the analytical front, AI helps interpret the vast streams of data flowing through decentralised networks - enabling users to make better-informed decisions and fine-tune their on-chain strategies. As Sergey Ostrovskiy explores in his article, even more fascinating is the emergence of autonomous AI agents: bots that can generate thoughts, make independent decisions, and interact with others using their own blockchain wallets and social media accounts.
AI - and especially its convergence with Web3 - is thus already transforming a wide range of industries, from finance to content creation and beyond. However, the ways in which these technologies acquire information and manage large datasets continue to raise important ethical and copyright concerns. Questions around data ownership, consent, and intellectual property remain at the forefront of the conversation, prompting the need for clear frameworks and responsible innovation as these technologies evolve.
Who really owns content created by AI? It’s a tricky question, sparking plenty of debate. AI-generated content often involves many contributors - developers, model trainers, end users, and of course, the AI systems themselves. But when it comes to the final result, it’s still unclear who - if anyone - actually owns it. If you’re working on building or maintaining generative AI tools, this question could directly affect your business model and its exposure to risk. In this article, we aim to look at the copyright question through the lens of a business owner or founder. In a field where innovation moves faster than regulation, knowing where you stand can make all the difference.
Businesses integrating LLMs and other AI tools into their products should keep two key considerations in mind:
- whether they claim ownership of the outputs their models generate; and
- whether, and on what terms, they may collect and use user-generated prompts and resulting outputs to train and refine their models.
Based on our observations and experience, most businesses leveraging LLMs and AI tools in their products, including industry leaders like OpenAI and Google’s Gemini, do not claim ownership over the outputs generated by their models. However, while ownership of outputs may not be a priority, they typically seek to retain the right to collect and use user-generated prompts and resulting outputs to train and refine their models. At this stage, the question of intellectual property (IP) ownership becomes especially significant - not only from a legal standpoint, but also in the broader context of building user trust and establishing ethical data practices in the development of AI technologies.
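To make this concrete, here is a minimal sketch of how a product team might gate the reuse of prompts and outputs for training on explicit user consent. It is purely illustrative: the record structure, the consent flag, and the function names are our own assumptions, not a description of any particular provider’s pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class InteractionRecord:
    """A single user interaction captured by a hypothetical AI product."""
    user_id: str
    prompt: str
    output: str
    training_consent: bool  # did the user opt in to reuse for training?
    captured_at: datetime

def collect_training_examples(records: list[InteractionRecord]) -> list[dict]:
    """Keep only interactions the user explicitly allowed to be reused
    for model training; everything else is excluded at the source."""
    return [
        {"prompt": r.prompt, "completion": r.output}
        for r in records
        if r.training_consent
    ]

# Example: only the consenting user's interaction reaches the training set.
records = [
    InteractionRecord("u1", "Draft an NFT licence", "...", True,
                      datetime.now(timezone.utc)),
    InteractionRecord("u2", "Summarise my contract", "...", False,
                      datetime.now(timezone.utc)),
]
print(len(collect_training_examples(records)))  # -> 1
```

The design point is simple: consent is recorded alongside each interaction and enforced at the moment training data is assembled, rather than reconstructed after the fact.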
Let’s now focus on the data commonly used for training AI models - and the copyright risks it raises.
In most cases, copyright infringement in the AI context stems from using copyrighted materials in training datasets without proper authorisation, which can result in the generation of infringing outputs. To perform effectively and unlock their full potential, AI systems, especially LLMs, depend on access to vast and diverse datasets. This creates a tension between the need for rich training data and the legal boundaries of copyright protection.
First and foremost, it’s crucial to consider where your AI model sources its training data. Typically, AI-driven platforms rely on at least two primary data sources:
- publicly available content collected from the internet, which may include copyrighted works; and
- data supplied by the platform’s own users, such as prompts and the resulting outputs.
It’s worth noting that, in some cases, the law permits the unlicensed use of copyrighted content under specific conditions. One notable example is the doctrine of “fair use” under U.S. copyright law. In the lawsuit brought by The New York Times, OpenAI defended itself by arguing that training AI models on publicly available internet content qualifies as “fair use”. However, the “fair use” doctrine is not absolute, and relying on it requires a careful, case-by-case legal analysis. There are four key factors used to determine whether “fair use” applies:
- the purpose and character of the use, including whether it is commercial or transformative in nature;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for, or value of, the copyrighted work.
In short, while the doctrine of fair use may provide legal protection in certain cases, AI-driven projects should approach it with caution and seek legal advice before relying on it for any purpose.
To sum up, it is crucial to recognise that the risk of copyright infringement becomes significantly higher when infringing content is repeatedly processed during training or generation, especially when there is no effective mechanism to trace, identify, or remove such content, even after a potential violation has been detected. Therefore, AI-driven projects should ensure that their training databases and practices comply with applicable copyright laws.
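One practical way to support such traceability is to attach provenance metadata - source and licence - to every training record, so that material from a disputed source can be located and removed later. The sketch below is a hypothetical illustration of that idea under our own assumed field and function names; it is not an established industry standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingRecord:
    """A training example tagged with its origin and licence,
    so it can be traced and removed if a claim arises."""
    text: str
    source_url: str
    licence: str  # e.g. "CC-BY-4.0", "licensed", "user-consented"

def remove_disputed_sources(dataset: list[TrainingRecord],
                            disputed_domains: set[str]) -> list[TrainingRecord]:
    """Drop every record whose source falls under a disputed domain,
    e.g. after a takedown notice or an infringement claim."""
    return [
        record for record in dataset
        if not any(domain in record.source_url for domain in disputed_domains)
    ]

# Example: purging records sourced from a domain named in a claim.
dataset = [
    TrainingRecord("An open-licensed essay...", "https://example.org/essay", "CC-BY-4.0"),
    TrainingRecord("A news article...", "https://news.example.com/story", "unknown"),
]
cleaned = remove_disputed_sources(dataset, {"news.example.com"})
print(len(cleaned))  # -> 1
```

Keeping this metadata at ingestion time is far cheaper than attempting to reconstruct the origin of training data once a dispute has already arisen.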
Launching an AI-driven product involves more than just technical innovation - it requires careful legal, ethical, and operational planning. Below are some essential considerations to keep in mind to ensure your product is not only effective, but also compliant and sustainable:
- Audit your training data sources and confirm that you hold the rights, licences, or user consents needed to use them.
- Address ownership of AI-generated outputs, and the terms on which user prompts may be collected and reused, clearly in your terms of service and privacy documentation.
- Implement mechanisms to trace the provenance of training data and to identify and remove infringing content if a claim arises.
- Do not rely on doctrines such as fair use without a careful, case-by-case legal analysis.
- Monitor the evolving regulatory landscape and adapt your practices as new rules emerge.
As the pace of AI innovation continues to accelerate, so do the legal, ethical, and compliance risks associated with how AI systems are trained, deployed, and commercialised. The blurred lines between human authorship, AI-generated content, and user-contributed data create unprecedented complexity — especially for Web3 projects, technology companies leveraging generative AI, and AI-native startups aiming to stay ahead of regulation while maintaining user trust.
Key takeaways for AI-driven projects include:
- Most providers do not claim ownership of model outputs, but they typically reserve the right to reuse user prompts and outputs for training - terms of service should state this clearly and secure the necessary consents.
- Using copyrighted materials in training datasets without proper authorisation remains the most common source of infringement risk.
- Fair use may offer a defence in certain cases, but it is fact-specific and should not be relied upon without legal analysis.
- Traceability matters: knowing what data entered your training set, from where, and under what terms is essential for responding to potential claims.
For founders, builders, and business leaders working with AI and Web3 technologies, legal compliance isn’t just a box to check - it’s part of a much broader strategic equation. Getting this right can protect not just your business - but the trust of your users, the integrity of your model, and the long-term viability of your innovation.
At Aurum, we’re here to support you at every stage - from strategic structuring to drafting the full suite of legal documentation. Our goal is to help you confidently navigate the complex and evolving legal landscape surrounding AI, ensuring your project is both innovative and compliant.