    GDPR Compliance for AI: Privacy Strategies for Traditional Business and Web3 Projects

    Summary: In this article, Sofiia Shmyhol explores how AI businesses and Web3 projects leveraging AI can align with applicable data protection laws, including GDPR requirements that are widely regarded as a benchmark in the global regulatory landscape. We discuss challenges around personal data use, transparency, and data minimisation, and offer practical strategies for responsible AI development that builds trust and meets evolving regulatory standards.

    Authors:

    Sofiia Shmyhol

    Junior Associate

Artificial Intelligence (AI) is no longer confined to the realm of futuristic fantasy — it has become an integral part of our digital lives, from voice assistants like Apple’s Siri and Amazon’s Alexa to recommendation algorithms on Netflix and Spotify. At the same time, beyond traditional business, the Web3 ecosystem is also increasingly integrating AI-driven technologies and solutions to enhance user experiences, ranging from personalised content feeds and identity verification to governance automation, making GDPR compliance for AI critical.

These systems rely heavily on data. In particular, machine learning (ML) and deep learning models require large datasets to identify patterns, make predictions, and drive decision-making. But this raises a critical question: what kind of data fuels these intelligent systems? More often than not, it’s personal data — information relating to identified or identifiable individuals. Personal data in AI encompasses both direct identifiers (such as names, email addresses, and biometric data) and indirect identifiers (such as IP addresses, browsing behaviour, or geolocation).

While AI can deliver immense value, its dependence on personal data has far-reaching implications for privacy, ethics, and regulation. The Stanford Institute for Human-Centered AI highlights that AI models increasingly infer traits beyond the scope of originally collected data. These inferences may be invisible to users and often lack transparency or user consent, heightening the risk of non-compliance with data protection laws. For projects targeting the European market, this particularly includes compliance with the EU’s General Data Protection Regulation (GDPR), which imposes strict standards for the handling and protection of personal data.

    GDPR Compliance for AI and the Data-Hungry Nature of AI

    AI thrives on large, rich, and diverse datasets. Companies like Google and Meta gather data from user interactions, clicks, searches, and conversations. These data points feed into AI models that personalise content, ads, and services. AI-powered advertising tools can deduce political preferences or emotional states by analysing user interaction patterns, often without individuals realising how their data is being used.

Web3 projects add new layers of complexity. Although blockchain technology promotes transparency and decentralisation, it also introduces pseudonymity — where users interact through wallet addresses rather than real names. However, AI can still pierce this layer of pseudonymity by analysing blockchain behaviour. For example:

• Cross-Chain Activity Analysis: AI models trained on multi-chain datasets can correlate wallet addresses across blockchains. By examining token swaps, bridge activity, or staking behaviour, these models may deduce a user’s investment strategy, net worth range, or affiliations with particular communities.
• Blockchain Gaming: In decentralised virtual environments, AI systems monitor in-game behaviour, asset transactions, and social interactions. This allows for the creation of detailed user profiles that reflect personal interests, risk tolerance, or even psychological traits based on gameplay patterns.

    These cases show that AI can undermine the very anonymity that Web3 promises. Simply by analysing patterns, AI models can de-pseudonymise users, revealing private attributes and compromising privacy expectations.

    What is GDPR compliance for AI?

    GDPR compliance for AI refers to ensuring that AI systems handle personal data in ways that align with the European Union’s General Data Protection Regulation, including principles like data minimisation, lawful processing, and transparency.

    GDPR Principles and User Rights for AI-Driven Systems

    While privacy and data protection laws across jurisdictions generally share core principles, such as lawfulness, fairness, transparency, and accountability, GDPR provides one of the most detailed and influential frameworks. It is widely regarded as a benchmark in the global regulatory landscape, which is why we focus on its standards and requirements in this article. Though it predates many modern AI breakthroughs, GDPR principles remain highly relevant in regulating how personal data is collected, processed, and stored. AI-driven initiatives must ensure compliance not only with fundamental privacy principles, but also with the specific rights afforded to individuals under the applicable data protection laws. Integrating these standards into AI development and operations is essential to foster user trust, uphold privacy, and build long-term resilience in an evolving regulatory landscape.

In general, several lawful bases may justify the processing of personal data — consent, contractual necessity, and legitimate interests are among those most commonly relied upon by Web3 and AI projects. Choosing the right lawful basis under the GDPR is crucial for any AI project, especially as the ways AI systems use data may evolve over time. Projects can generally choose among these, depending on the context:

    • Consent, when users explicitly agree to the processing. While clear and user-oriented, consent can be withdrawn at any time. This poses serious challenges for AI systems that integrate personal data into training models, especially if that data cannot be fully removed. Continued use after withdrawal may violate the applicable data protection laws and result in serious legal and financial consequences, including regulatory penalties.
• Contractual necessity, when processing is needed to perform certain undertakings or deliver a service users have signed up for. This can be a more appropriate basis for data processing than consent, but it applies only when data use is strictly necessary for contract performance, such as providing users with specific functionality or services. If personal data is later reused for model training or analytics beyond the agreed undertakings or service delivery, this basis can no longer be relied upon.
• Legitimate interest, when processing is genuinely necessary to support business operations, achieve specific objectives, or protect stakeholders’ rights. This basis is often used for purposes like improving and training AI models, analysing services and system performance, enhancing the user interface, etc. However, users have the right to object to processing based on a legitimate interest. In this case, unless the project can demonstrate a strong justification for further data processing that clearly outweighs the individual’s rights, the use of their data must stop. The evaluation of whether compelling grounds justify further processing must be conducted on a case-by-case basis, considering the nature and sensitivity of the personal data involved. Justifications can be particularly difficult to establish when AI models process sensitive personal data. In such cases, reasons like maintaining system security or avoiding functional disruptions may be insufficient. Instead, a more substantial and broadly justified rationale is typically required — one that goes beyond protecting business interests and demonstrates a clear, overriding benefit in the public interest or the rights of other stakeholders.

    Balancing AI Efficiency with Data Minimisation and Purpose Limitation

These principles are foundational to the GDPR and are frequently reflected in the data protection laws of other jurisdictions, yet they sit in tension with the data-intensive nature of AI. The data minimisation principle mandates collecting only the data necessary for a specific purpose, while purpose limitation requires that data only be used for that defined reason.

    Yet, AI systems thrive on broad, diverse datasets. For instance, an AI designed to personalise content might request not just your browsing history but also social connections, transaction data, and sentiment analysis. This can result in expanded use of data beyond its original purpose, where information is repurposed without proper justification.

    For example, a DeFi protocol could start analysing wallet interactions to detect fraud. Later, the same data might be used to assign credit scores or gate access to liquidity pools. That’s why it is essential to clearly inform individuals whose data is being processed about all intended purposes at the time of collection. Moreover, any changes to the defined purposes must be communicated promptly. Doing so enables individuals to make informed decisions about the use of their data and to exercise their rights appropriately.

    The Challenge of Data Erasure (Right to be Forgotten)

    Individuals can request the deletion of their personal data under certain conditions (e.g., if the data is no longer necessary for the purpose it was collected). Once personal data is used to train an AI model, it may be embedded in the model parameters, making it challenging to isolate and delete.

    Moreover, Web3’s premise of immutability complicates the usual privacy measures. Personal data, once recorded on a blockchain, cannot be trivially altered or deleted. If the data becomes part of an AI training dataset, data minimisation and user-initiated erasure can become particularly difficult to implement.

We previously explored potential strategies for addressing privacy challenges in a blockchain context in the article “Blockchain VS Privacy: Are We on the Cusp of a Harmonious Coexistence or Still Facing an Impasse?”, which may also be applicable to AI-driven solutions. To address these challenges, several approaches are being explored and developed:

• Machine unlearning: Research is ongoing into techniques that allow selective data removal from trained models. A recent survey categorises unlearning algorithms into two main types: exact unlearning, which aims to completely remove the influence of specific data points (e.g., data-partitioning approaches such as SISA training, or MCMC unlearning), and approximate unlearning, which seeks to efficiently approximate this effect without full retraining. A minimal sketch of the sharded approach follows this list.
    • Off-chain data storage: Sensitive information is stored off-chain with on-chain pointers or hashes.
    • Zero-knowledge proofs: Allow computations to occur without revealing underlying data, enhancing privacy while retaining functionality.
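
To make the first of these approaches more concrete, below is a minimal, illustrative Python sketch of SISA-style sharded training, a data-partitioning technique for exact unlearning. It is a simplified example under stated assumptions, not a production unlearning framework: it assumes binary 0/1 labels, that each shard contains examples of both classes, and that a scikit-learn logistic regression stands in for the real model. Erasing a record then requires retraining only the shard that held it.

# Minimal SISA-style unlearning sketch (illustrative assumptions noted above).
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shards(X, y, n_shards=4):
    # Split the training data into disjoint shards and train one model per shard.
    shard_indices = np.array_split(np.arange(len(X)), n_shards)
    models = [LogisticRegression(max_iter=1000).fit(X[idx], y[idx]) for idx in shard_indices]
    return models, shard_indices

def predict(models, X):
    # Aggregate the shard models by majority vote (binary 0/1 labels assumed).
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def unlearn(models, shard_indices, X, y, record_id):
    # Erase one record: drop it from its shard and retrain only that shard.
    for shard_no, idx in enumerate(shard_indices):
        if record_id in idx:
            kept = idx[idx != record_id]
            shard_indices[shard_no] = kept
            models[shard_no] = LogisticRegression(max_iter=1000).fit(X[kept], y[kept])
            break
    return models, shard_indices

Because no shard other than the affected one is touched, the cost of honouring an erasure request stays proportional to the shard size rather than to the full dataset.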

    Projects should retain personal data only for as long as it is necessary for the intended purpose. Once an AI model has been trained, it may no longer be essential to maintain the full training dataset in its original, identifiable form. Implementing secure archival or anonymisation measures can significantly reduce both privacy risks and legal liabilities. In addition, role-based access controls should be enforced, ensuring that only authorised personnel with a specific, legitimate purpose can access personal data. These practices are critical in mitigating the risk of internal misuse and safeguarding against external data breaches.
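
As one practical illustration of such measures, the short Python sketch below pseudonymises direct identifiers with a keyed hash before a training dataset is archived. This is a hedged example under assumptions: the function and key are hypothetical, a keyed hash is assumed to be an acceptable pseudonymisation technique in the given context, and the key itself must be stored separately under strict, role-based access controls; it is not a substitute for a full anonymisation assessment.

# Illustrative pseudonymisation of direct identifiers prior to archiving.
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    # Replace a direct identifier with a keyed hash so the archived dataset
    # no longer contains it in clear text; the key is kept outside the archive.
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()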

    Making AI Transparent: GDPR Requirements and Best Practices

AI systems are often described as “black boxes” because their inner workings can be complex and difficult to understand. Under the GDPR, however, transparency is not optional. Individuals have the right to know what personal data is being collected, the purpose behind its use, and how automated decisions are made. This includes access to meaningful information about the logic, significance, and potential consequences of any automated processing.

    AI projects should proactively document and disclose these details, ensuring that users can easily understand how their personal data is handled. Providing this transparency not only supports compliance but also builds greater trust and confidence in the project.

    Using Personal Data to Train AI: Legal Risks and Best Practices

    AI systems are predominantly trained to optimise predictive performance, personalisation, and operational efficiency. However, these training purposes can create significant tensions with data protection laws. The expansive nature of AI training, requiring large and diverse datasets, often risks conflicting with core privacy principles such as data minimisation and purpose limitation as discussed above.

    For instance, the GDPR requires that personal data be collected and used only for specified, explicit, and legitimate purposes. However, AI models may embed personal data into their internal parameters during training, complicating efforts to ensure that subsequent uses of the model remain aligned with those original purposes — potentially leading to purpose creep and non-compliance with data protection laws.

    Sometimes, businesses overlook the importance of disclosing their use of personal data for training AI models — a critical mistake. Using personal data for training purposes without explicitly informing users can undermine trust and violate core privacy principles, such as transparency, purpose limitation, and fairness. Therefore, projects using personal data for AI training should clearly communicate this purpose to individuals at the time of data collection, along with the applicable lawful basis — such as legitimate interest, when the aim is to improve and enhance the performance of AI models. In addition, emerging technical solutions — such as machine unlearning (as discussed above) and differential privacy, a mathematical framework that protects privacy by adding controlled noise to data analysis — offer promising pathways for enhancing compliance and reducing the exposure of personal data within AI systems.
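
As a simple illustration of the differential privacy idea mentioned above, the Python sketch below applies the Laplace mechanism to a count query: noise is calibrated to the query’s sensitivity (1 for a count) and to a chosen privacy budget epsilon. It is an illustrative sketch under those assumptions, not a complete privacy-engineering solution, and the function and parameter names are hypothetical.

# Illustrative Laplace mechanism for a differentially private count query.
import numpy as np

def private_count(records, epsilon=1.0):
    # A count query has sensitivity 1: adding or removing one person changes
    # the result by at most 1, so the noise scale is 1 / epsilon.
    true_count = len(records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

Smaller values of epsilon add more noise and give stronger privacy guarantees at the cost of accuracy.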

    Building Secure and Resilient AI Systems Under GDPR

    As AI systems grow more sophisticated, so do the threats they face. The OECD highlights the importance of ensuring that AI systems are resilient to both current and emerging risks. AI-specific vulnerabilities, such as adversarial attacks — where small, carefully crafted changes to input data can mislead models into making incorrect predictions — underscore the need for strong cybersecurity measures. These protections must continuously evolve alongside AI technologies to safeguard both the integrity of the models and the privacy of individuals.
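
To illustrate what such an adversarial perturbation can look like, the Python sketch below applies a gradient-sign (FGSM-style) perturbation to the input of a simple logistic-regression classifier. The linear model, binary labels, and parameter names are simplifying assumptions for illustration only; real attacks and defences operate on far more complex models.

# Illustrative FGSM-style adversarial perturbation against a logistic model.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon=0.1):
    # Gradient of the logistic loss with respect to the input x; nudging x in
    # the sign of this gradient increases the loss and can flip the prediction.
    grad_x = (sigmoid(w @ x + b) - y) * w
    return x + epsilon * np.sign(grad_x)

Defences such as adversarial training, input validation, and anomaly monitoring are intended to detect or blunt exactly this kind of manipulation.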

    Global AI Regulation and GDPR Interoperability

    As AI technologies continue to advance worldwide, the need for interoperable and AI-specific regulations is becoming increasingly urgent. International organisations such as the OECD and the European Commission are actively developing common principles and frameworks to guide the responsible use of AI across jurisdictions. International cooperation is gaining momentum, with efforts underway to:

    • Develop AI-specific legislation;
    • Encourage cross-border regulatory collaboration;
    • Promote privacy-enhancing technologies (PETs);
    • Standardise algorithmic impact assessments.

    For Web3 projects, the regulatory landscape is even more complex. Many protocols are governed by decentralised autonomous organisations (DAOs) composed of token holders dispersed across multiple jurisdictions. This raises significant questions about which legal standards apply and how compliance can be ensured in a decentralised, transnational context. These challenges further highlight the urgency of global coordination and harmonised AI governance to enable innovation while safeguarding individual rights and societal values.

    Final Thoughts: How to Build GDPR-Compliant, Trustworthy AI

    Artificial Intelligence has the power to improve decisions, personalise user experiences, and make processes more efficient. In both Web2 and Web3 spaces, AI helps power smart search tools, automated support, decentralised voting, and financial predictions. But with this power comes responsibility, especially when it comes to protecting users' personal data.

Instead of seeing data protection laws as obstacles or burdensome requirements, projects can use them as valuable guides for building responsible and trustworthy services and products. Adhering to core principles such as transparency, data minimisation, and user control not only ensures legal compliance — it also strengthens user trust, enhances credibility, and contributes to more robust and resilient projects. Therefore, when designing and deploying AI-driven solutions, it is advisable to consider the following general recommendations:

• Create Multi-Disciplinary Teams: Foster close collaboration between legal, technical, and operational teams to ensure compliance and accountability throughout the AI system’s lifecycle. Legal experts should be involved from the outset to draft compliant policies and contracts, assess regulatory risks, and provide ongoing guidance. At the same time, machine learning engineers and data scientists must understand the technical aspects of model training and be capable of implementing privacy-by-design solutions. Integrating these perspectives early and continuously helps bridge the gap between legal requirements and technical implementation.
• Ongoing Monitoring, Auditing, and Oversight: AI systems are not static; they often evolve with new data, model updates, or shifting business objectives. Periodic reviews and audits — technical, legal, and ethical — can identify areas of non-compliance or emergent risks. Structured governance frameworks that maintain version histories, document changes in model architectures, and track data usage can ensure projects remain prepared for regulatory oversight.
    • External Standards and Guidance: Many international organisations, including the OECD, the European Commission, and national data protection authorities, regularly update their guidance on AI and data protection. By proactively engaging with these standards, projects can stay abreast of evolving interpretations of privacy obligations, best practices for AI transparency, and the latest privacy-enhancing technologies.

As AI continues to grow, success will come to those who combine technical innovation with care for human values. By building privacy into their systems from the start and working across different teams, projects can create AI that is not just smart but also fair, safe, and trusted: GDPR-compliant, privacy-preserving systems for both traditional business and Web3.
