How to Evaluate AI Vendors for GDPR Compliance: A Procurement Checklist
---
By Wingston Sharon | December 2024
Every week I see EU organizations adopt AI tools with less due diligence than they'd apply to a new payroll provider. There's something about "it's just an API" that causes procurement teams to skip steps they would never skip for a traditional SaaS vendor.
The GDPR obligations don't change because the vendor calls their product an AI tool. If personal data touches it, Article 28 applies. And the state of most AI vendors' DPAs in late 2024 ranges from inadequate to genuinely dangerous for EU organizations that need to demonstrate compliance.
This is the checklist I work through when evaluating AI vendors for EU clients. It won't make a non-compliant vendor compliant, but it will tell you what you're actually dealing with before you sign the contract.
Before You Start: Clarify What "Personal Data" Enters the System
This sounds obvious, but many organizations haven't thought it through carefully. Personal data doesn't just mean names and emails. Under GDPR Article 4, personal data is any information relating to an identified or identifiable natural person.
In the context of AI tools, personal data can enter through:
- Direct input – users pasting customer emails, HR records, or support tickets into an AI assistant
- Integrated workflows – AI tools connected to CRMs, ticketing systems, or HR platforms via API
- Document processing – AI that processes contracts, invoices, or correspondence
- Conversational context – support chatbots that discuss specific customer issues
- Code analysis tools – if your code includes customer identifiers, email addresses in tests, or PII in configuration
If you can confirm that absolutely no personal data will ever reach the vendor's infrastructure, the checklist below becomes shorter. In practice, that guarantee is very difficult to maintain operationally over time, so I recommend completing the full checklist regardless.
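As a first-line control, some teams screen text for obvious identifiers before it reaches a vendor API. A minimal sketch in Python follows; the patterns are illustrative only, since regex-based screening misses names, addresses, and most other identifier types, so treat this as a coarse filter rather than a PII solution:

```python
import re

# Illustrative patterns only -- real PII detection needs a dedicated tool;
# regexes will miss names, street addresses, national IDs, and much else.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace likely personal data with placeholders before the text
    leaves your infrastructure; return redacted text and hit counts."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label.upper()}]", text)
        if n:
            counts[label] = n
    return text, counts

clean, hits = redact("Contact jane.doe@example.com or +49 30 1234567.")
```

The hit counts are useful beyond redaction: logging them per workflow tells you which integrations are actually leaking personal data into prompts.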
The Checklist
1. Data Processing Agreement
What to ask: Does the vendor offer a standard DPA? Is it available before you sign the main contract?
What to look for:
- A DPA that explicitly covers AI inference – i.e., the processing that happens when you send a prompt and receive a response. Many legacy DPAs were written before AI APIs existed and cover only data storage, not inference.
- The DPA should identify the vendor as a data processor (not a data controller) for the personal data you submit. Some AI vendors attempt to claim joint-controller status, which has significant implications for your obligations.
- Article 28 requires the DPA to specify the subject-matter, duration, nature, and purpose of the processing, the type of personal data, and the categories of data subjects. Check whether the vendor's DPA actually fills in these fields for AI inference use cases, or leaves them blank with placeholders.
Red flag: DPAs that only exist as clickwrap addenda with no negotiation possible and that don't mention AI inference specifically.
2. Data Residency
What to ask: Where does inference happen? Where is input data processed and temporarily stored?
What to look for:
- Physical location of inference infrastructure – EU/EEA data centres are preferred for minimizing cross-border transfer risk.
- Whether the vendor offers an EU-only processing guarantee (not just "we have EU data centres" but "your data never leaves the EU region even for logging, monitoring, or support").
- EU-based alternatives exist: Mistral AI processes in France, several smaller model serving companies operate entirely within the EU.
Honest note: "EU data centre" is not the same as "EU-sovereign." If the company is US-domiciled, the CLOUD Act can reach data held in EU data centres. Residency and sovereignty are distinct questions.
3. Model Training on Your Data
What to ask: Is the data I submit used to train or fine-tune models? By default? Can I opt out?
What to look for:
- Explicit opt-out (or opt-in) controls for training data usage
- Whether the default setting for API customers differs from consumer product users – it often does, but don't assume
- How "training" is defined – some vendors distinguish between base model training, RLHF, and fine-tuning, with different policies for each
The GDPR dimension: If personal data is used to train a model, you need a lawful basis for that processing. Legitimate interests or consent are the typical candidates, but neither is straightforward. Several EU data protection authorities have issued guidance on this – the Italian DPA's 2023 investigation of ChatGPT and the Irish DPC's OpenAI investigation are instructive. This is an active area of enforcement, not settled law.
Red flag: Vendors who say training data usage is covered by their terms of service without a specific lawful basis analysis for GDPR purposes.
4. Subprocessor Chain
What to ask: Who are the vendor's subprocessors? Are they listed? How are you notified of changes?
What to look for:
- A published subprocessor list (ideally a URL that's contractually incorporated into the DPA)
- Notification period for new subprocessors – GDPR Article 28(2) requires your prior specific or general authorization for subprocessors. General authorization is fine, but you must be notified of changes with enough time to object.
- Whether the AI model itself is a subprocessor. If you are using, say, a document processing SaaS that internally calls the OpenAI API, OpenAI is a subprocessor. Is OpenAI listed? What is the transfer mechanism for that subprocessing?
Practical reality: In 2024, the majority of AI-powered SaaS products use OpenAI, Anthropic, or Google as their underlying model provider. This means those companies are subprocessors of your vendor, and your data flows to them under a transfer mechanism you need to verify.
5. Inference Data Retention
What to ask: How long is my inference data – prompts and completions – retained? For what purposes?
What to look for:
- Clear retention periods for inference logs
- Distinction between retention for operational purposes (debugging, abuse detection) vs. retention for product improvement
- Whether you can request deletion of inference data
- Whether retention differs between free tiers and paid API access
Why this matters: If inference data containing personal data is retained for 30 days, and your data subjects exercise the right to erasure under GDPR Article 17, you need to be able to operationalize that erasure request through to the vendor's retained inference logs. Can the vendor support this? Most cannot today.
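A precondition for operationalizing erasure is knowing which vendor requests contained which data subjects' data in the first place. A minimal sketch of a local index follows; the file format and function names are my assumptions for illustration, and a real system would use a database plus the vendor's actual deletion interface, if one exists:

```python
import json
import time
from pathlib import Path

# Append-only local index of outbound inference calls (illustrative format).
LOG = Path("inference_index.jsonl")

def record_call(vendor_request_id: str, subject_ids: list[str]) -> None:
    """Log one outbound inference call, noting which data subjects'
    personal data the prompt contained."""
    entry = {"ts": time.time(), "request_id": vendor_request_id,
             "subjects": subject_ids}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def requests_for_subject(subject_id: str) -> list[str]:
    """Return the vendor request IDs to cite in a deletion request
    when this subject exercises Article 17."""
    if not LOG.exists():
        return []
    return [
        entry["request_id"]
        for entry in map(json.loads, LOG.read_text().splitlines())
        if subject_id in entry["subjects"]
    ]
```

Without something like this index, an Article 17 request cannot even be translated into a concrete ask to the vendor, regardless of what the vendor supports.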
6. The Right to Erasure Problem
This deserves its own section because it is genuinely unresolved.
If personal data enters an LLM – either through inference (the model processes it during a request) or through fine-tuning (the model is trained on it) – how do you exercise erasure rights?
For inference-only processing: Erasure means deleting retained logs. This is technically possible if the vendor maintains per-customer logs and supports deletion. Many do not.
For training data: This is much harder. Once a model is trained on data, the parameters of the model reflect that training. There is no clean technical mechanism to remove a specific individual's data from a trained model's weights. The field of "machine unlearning" is an active research area but is not production-ready.
What to document: If you use an AI vendor for use cases that might involve personal data, document your erasure limitation in your ROPA (Records of Processing Activities) and in your DPIA if required. Do not promise data subjects erasure you cannot deliver.
7. Cross-Border Transfer Mechanism
What to ask: If data is processed outside the EU/EEA, what is the legal transfer mechanism?
What to look for:
- Standard Contractual Clauses (SCCs) – the 2021 EU SCCs are the current standard
- Transfer Impact Assessments (TIAs) – since Schrems II, SCCs alone are insufficient without a TIA showing that the destination country's law does not undermine the protections
- For US vendors: the EU-US Data Privacy Framework (DPF), which took effect in 2023, provides a mechanism but faces ongoing legal challenges. Treat DPF as a current valid mechanism with the caveat that it may be invalidated (as Safe Harbor and Privacy Shield were before it)
Red flag: Vendors who cite "Privacy Shield" – that was invalidated in 2020. If a vendor's DPA still references Privacy Shield as a transfer mechanism, their legal team has not kept up with the last four years of data protection law.
Key Questions to Send to Vendors
When you contact a vendor's procurement team, these are the questions that will separate vendors who take compliance seriously from those who will send you a generic response:
- "Please provide your current DPA and subprocessor list."
- "Does your DPA specifically cover AI inference processing of personal data?"
- "By default, is API inference data used for model training? If yes, how do I opt out?"
- "Where does inference processing physically occur? Can you guarantee EU-only processing?"
- "What is your retention period for inference logs, and can these be deleted on request?"
- "What is your transfer mechanism for any processing outside the EU/EEA?"
- "How do you operationalize a GDPR Article 17 erasure request for data that appeared in inference logs?"
The quality of the responses to questions 6 and 7 in particular will tell you a lot. A vendor with a mature compliance program will have clear answers. A vendor who is not ready for EU enterprise procurement will be vague or route you back to their generic terms.
The Honest Reality
Most AI vendor DPAs I reviewed in 2024 are inadequate for strict GDPR Article 28 compliance. This does not necessarily mean you cannot use these vendors – it means you need to:
- Document the compliance gaps you accepted and why
- Implement compensating controls where possible (data minimization – don't send more personal data than necessary; pseudonymization before sending where feasible)
- Include the known gaps in your ROPA and any applicable DPIA
- Monitor the vendor's compliance posture – this is an area of active development, and vendors who were inadequate in early 2024 may have improved by mid-2025
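Pseudonymization before sending can be sketched as a reversible local mapping: identifiers are swapped for random tokens before the prompt leaves your infrastructure, and responses are re-identified locally. Because you retain the mapping, the data generally remains personal data under GDPR Article 4(5) – this reduces exposure to the vendor, it does not remove the processing from scope. A minimal illustration (the class and token format are my assumptions, and naive string replacement will not catch identifier variants):

```python
import secrets

class Pseudonymizer:
    """Swap direct identifiers for random tokens before a prompt is sent
    to an external AI API. The mapping stays on your infrastructure, so
    responses can be re-identified locally."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # identifier -> token
        self._reverse: dict[str, str] = {}  # token -> identifier

    def pseudonymize(self, text: str, identifiers: list[str]) -> str:
        for ident in identifiers:
            token = self._forward.get(ident)
            if token is None:
                token = f"PERSON_{secrets.token_hex(4)}"
                self._forward[ident] = token
                self._reverse[token] = ident
            text = text.replace(ident, token)
        return text

    def reidentify(self, text: str) -> str:
        for token, ident in self._reverse.items():
            text = text.replace(token, ident)
        return text
```

Reusing the same token for the same identifier keeps the vendor-side text internally consistent (the model can still reason about "PERSON_a1b2c3d4" across a conversation) without revealing who that person is.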
If you are in a regulated industry โ financial services, healthcare, critical infrastructure โ the bar for acceptable residual risk is higher, and the checklist above is a floor, not a ceiling.
If you are working through an AI procurement evaluation and want to talk through what you're finding, reach out at hello@agentosaurus.com.
Build This Infrastructure?
We help AI teams build sovereign GPU clouds and autonomous systems. Free 30-minute consultation. Fixed-price projects from €5K.
Schedule Free Consultation