
When AI Meets the DSAR: Opportunity, Risk, and the Uncomfortable Middle Ground

  • Apr 11

Updated: Apr 14


Asking your employer for a copy of "all my personal data" sounds simple enough. In practice, it's the compliance equivalent of asking someone to empty every drawer, folder, inbox, and backup tape in the building, then hand you a neatly organised package within a calendar month. For HR teams and Data Protection Officers, the Data Subject Access Request (DSAR) has become one of the most resource-intensive obligations under GDPR.


And the numbers are getting worse.


A DataGrail report noted a 246% increase in privacy requests between 2021 and 2023, while an EY Law survey found 60% of employers saw DSAR volumes rise in the past year alone. Our own research at Guardum (DPO Survey, 2023) puts the average cost of processing a single SAR for UK organisations at around £4,800. If you have a calculator handy, that means a mid-sized firm fielding 50 DSARs a year is looking at £240,000 in staff time alone, before you factor in the legal reviews, the IT support tickets, and the inevitable Friday afternoon where someone realises they forgot to check the archived Teams channels.


Consider what actually goes into it: employee data scattered across HR databases, payroll systems, email archives, Slack and Teams messages, performance review files, recruitment portals, and sometimes CCTV footage or building access logs. Structured data sitting alongside unstructured text. Multiple departments coordinating under a tight deadline.


It is, to borrow a phrase, like being asked to reassemble a shredded newspaper from six different recycling bins while someone stands behind you with a stopwatch.


So when agentic AI enters the conversation, the instinct is obvious. If an autonomous AI agent could log into these systems, run federated searches, retrieve relevant files, classify personal data, flag duplicates, suggest redactions, and compile everything into a dashboard, the weeks of manual effort might collapse into hours.


According to Deloitte, by 2027 half of companies using generative AI are expected to have pilot projects for agentic AI systems. The appeal is clear. The question is whether the appeal survives contact with reality.


What AI could actually do

I think there are several areas where AI, particularly LLM-driven agents, could genuinely transform DSAR processing.


The most immediate is data discovery and collection. An AI agent authorised to connect with enterprise systems could query the HR database, scan email archives, pull chat transcripts, and cross-reference project management tools, all from a single request. Modern LLMs have strong natural language search capabilities, and an agentic system can perform iterative searches: if initial results reference a project code, it searches that code across other systems to ensure nothing is missed. This alone could turn days of coordination into minutes.
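
To make the iterative search idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SYSTEMS` dictionary stands in for real HR, email, and chat connectors, and the `PRJ-1234` project-code format is invented. It shows only the loop itself: search for the requester, harvest any project codes in the hits, and search again until nothing new turns up.

```python
import re
from collections import deque

# Hypothetical in-memory stand-ins for real systems (HR database, email, chat).
SYSTEMS = {
    "hr": ["Jane Doe joined project PRJ-1042 in 2021"],
    "email": ["PRJ-1042 kickoff notes", "Lunch menu for Friday"],
    "chat": ["PRJ-1042 retro: action items moved to PRJ-2001",
             "PRJ-2001 budget sign-off"],
}

PROJECT_CODE = re.compile(r"\bPRJ-\d{4}\b")  # assumed project-code format


def iterative_search(seed_term, systems, max_rounds=5):
    """Search every system for the seed term, then re-search for any
    project codes found in the hits, until no new terms appear."""
    seen_terms = {seed_term}
    queue = deque([(seed_term, 0)])
    hits = set()
    while queue:
        term, depth = queue.popleft()
        for system_name, records in systems.items():
            for record in records:
                if term.lower() not in record.lower():
                    continue
                hits.add((system_name, record))
                if depth >= max_rounds:
                    continue  # stop expanding, but keep the hit
                for code in PROJECT_CODE.findall(record):
                    if code not in seen_terms:
                        seen_terms.add(code)
                        queue.append((code, depth + 1))
    return hits, seen_terms


hits, terms = iterative_search("Jane Doe", SYSTEMS)
```

Note that widening the search this way improves recall at the cost of precision. Records reached only through a project code may not contain the requester's personal data at all, so they are candidates for human review, not automatic inclusions.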


Beyond discovery, there's classification and deduplication. DSAR responses often involve hundreds of files, many redundant or tangentially related. AI can help determine which documents genuinely contain the requester's personal data versus those where they're merely CC'd with no relevant information. It can spot duplicates stored across multiple systems and reduce the volume a human reviewer needs to examine.
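
The deduplication half of this is the easier one to illustrate. The sketch below is a simple exact-match approach using a content hash, not a description of how any particular product works; the document structure (`source` and `text` keys) is assumed for the example. Near-duplicates, such as a forwarded email with an added signature, would need fuzzier techniques.

```python
import hashlib


def content_fingerprint(text):
    """Hash the whitespace-normalised, lower-cased text so trivially
    different copies of the same document collapse to one fingerprint."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first copy of each document and record every system
    where a copy was found, so the audit trail stays complete."""
    unique = {}
    for doc in documents:
        key = content_fingerprint(doc["text"])
        if key not in unique:
            unique[key] = {"text": doc["text"], "sources": []}
        unique[key]["sources"].append(doc["source"])
    return list(unique.values())


docs = [
    {"source": "email", "text": "Performance review Q3: meets expectations."},
    {"source": "sharepoint", "text": "Performance  review Q3: meets expectations.\n"},
    {"source": "hr_db", "text": "Exit interview notes."},
]
result = deduplicate(docs)
```

Keeping the list of sources matters: the reviewer reads the document once, but the record of where copies live is still needed to show what was searched.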


Then there's redaction and anonymisation. AI-augmented tools, especially those designed to handle non-conforming patterns (as we're building at Contextul), could significantly improve the search and discovery of commercially sensitive information that traditional software struggles to interpret. This parallels how e-discovery tools already use AI to handle privileged information in legal cases.

And perhaps most ambitiously, an agentic AI system could handle workflow orchestration: upon receiving a DSAR, the agent plans the steps, verifies the requester's identity, notifies data owners to gather offline records, compiles digital records, and drafts the response letter. A well-designed agent could ensure no step is forgotten, and forgetting steps is precisely what humans are prone to in complex, multi-system processes.


Early case studies from vendors show DSAR response times dropping from weeks to hours after adopting AI-powered discovery tools. The grunt work gets automated. Human experts focus on oversight and judgment calls rather than inbox searches and PDF exports.


It sounds transformational. And in my view, it may well be. But here's where it gets uncomfortable.


Where it goes wrong

AI isn't a compliance officer. It's a pattern-recognition engine with the confidence of a barrister and the moral compass of a spreadsheet. That distinction matters enormously when you're handling personal data under regulatory obligation.


Misclassification is the first and most obvious risk. An AI might erroneously label a piece of data as "not personal" and exclude it, delivering an incomplete DSAR response. Or it pulls in data about someone with a similar name. Either way, the organisation is exposed: an incomplete response is a compliance violation, and sending someone else's data is a breach.
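
One common guard against the similar-name problem is to match on a unique identifier and never on the name alone. The sketch below assumes each record carries an `employee_id` field that may be missing; the `Record` type and the `triage` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    employee_id: Optional[str]  # None when the source system has no ID
    name: str
    text: str


def triage(records, subject_id, subject_name):
    """Split records into confirmed matches (same ID) and name-only
    matches with no ID, which a human must review before release.
    Records carrying a different ID are excluded even if the name matches."""
    confirmed, needs_review = [], []
    for record in records:
        if record.employee_id == subject_id:
            confirmed.append(record)
        elif (record.employee_id is None
              and record.name.casefold() == subject_name.casefold()):
            needs_review.append(record)
    return confirmed, needs_review


records = [
    Record("E100", "John Smith", "Annual review 2022"),
    Record("E200", "John Smith", "Disciplinary note"),   # a different John Smith
    Record(None, "john smith", "Visitor log entry"),     # ambiguous, no ID
    Record("E300", "Alice Jones", "Payroll change"),
]
confirmed, needs_review = triage(records, "E100", "John Smith")
```

The ambiguous bucket is the point of the design: it turns a silent misclassification into an explicit question for a person.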


Hallucinations are the second, and in my honest opinion the most dangerous. LLMs sometimes generate text that is fluent, plausible, and entirely fabricated. In a DSAR context, imagine an AI summarising an employee's records and stating, "You were subject to a disciplinary action on March 12, 2022," when no such action exists. The employee sees this, panics, lodges a complaint. The organisation then has to explain that their AI invented a disciplinary record out of thin air. I will leave you to imagine how that conversation goes.


The EDPB-supported report on AI privacy risks (March 2025) notes that LLM-generated outputs may include inaccurate or sensitive information that leads to harm or misinformation. In compliance contexts, this is a live wire.


Data leakage is the third. If the AI tool uses cloud-based processing, personal data may be sent to a third-party provider and stored or tracked without appropriate safeguards. The EDPB report warns that retrieval or AI processing via external APIs could mean user queries and data are transmitted to third parties without knowledge or consent. In an HR setting, that could mean an employee's private information leaving the company's secure environment via an API call. Even within internal systems, an AI might present data to an unauthorised user if permissions aren't carefully configured.


And then there's the question of transparency. GDPR requires organisations to demonstrate how they searched for and compiled personal data. If a black-box AI says "here's what I found" without an audit trail, that accountability collapses. A data subject who challenges the completeness of their DSAR response is entitled to know what was searched, where, and how.


The deeper problems ahead

Beyond the immediate operational risks, I believe there are structural challenges that HR leaders and DPOs need to start thinking about now, before these systems become embedded.


Model memory and retention is one that worries me. LLMs and advanced agent systems often maintain conversation context, and some have forms of long-term memory or caching. The privacy risk here is worth pausing on: the AI stores sensitive personal data in its memory or logs beyond the necessary period. If an HR chatbot remembers details from a DSAR inquiry and that memory isn't wiped, another user interacting with the bot later could potentially access those details. The EDPB report explicitly flags that long-term storage of user data increases the risk of unauthorised access. Now, storage limitation is one of GDPR's core principles, and most privacy professionals could recite it in their sleep. But it was written for databases, not for systems that quietly remember things across conversations. AI's helpful "memory" feature can turn into a liability remarkably quickly if not tightly controlled.
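
As a rough illustration of what "tightly controlled" could mean, here is a sketch of a session-scoped memory with a time-to-live. The `SessionMemory` class is hypothetical and far simpler than any real agent framework's memory layer; it only demonstrates two properties: one session can never read another's items, and items disappear once the retention window has passed.

```python
import time


class SessionMemory:
    """Hypothetical per-session store that forgets items after a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # session_id -> list of (stored_at, item)

    def remember(self, session_id, item):
        self._items.setdefault(session_id, []).append((self._clock(), item))

    def recall(self, session_id):
        """Return only this session's unexpired items."""
        self.purge_expired()
        return [item for _, item in self._items.get(session_id, [])]

    def purge_expired(self):
        cutoff = self._clock() - self._ttl
        for session_id in list(self._items):
            kept = [(t, i) for t, i in self._items[session_id] if t > cutoff]
            if kept:
                self._items[session_id] = kept
            else:
                del self._items[session_id]

    def end_session(self, session_id):
        """Wipe a session explicitly, without waiting for the TTL."""
        self._items.pop(session_id, None)


# A fake clock makes the expiry behaviour easy to demonstrate.
fake_now = [0.0]
memory = SessionMemory(ttl_seconds=3600, clock=lambda: fake_now[0])
memory.remember("dsar-001", "Jane Doe, payroll query")
same_session = memory.recall("dsar-001")
other_session = memory.recall("hr-chat-77")
fake_now[0] = 3601.0
after_expiry = memory.recall("dsar-001")
```

This covers only the application's own memory. Provider-side logs, caches, and backups are separate retention questions that code like this cannot answer.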


Retrieval-Augmented Generation (RAG) risks are another. Many enterprise AI setups ground their answers by pulling information from internal knowledge bases. If that knowledge base contains sensitive personal data (which a DSAR system would, by definition), using RAG means that data is being accessed and potentially cached by the AI system. A poorly implemented retrieval component might grab more data than necessary, or pull records for a different person with similar keywords. The EDPB report cautions that using knowledge bases with personal data "without proper safeguards" is risky, and every integration point must be scrutinised for privacy compliance.


Security concentration is the one that should keep people up at night. An AI agent with access to all employee data for DSAR purposes is, in practice, a privileged user with root-level visibility across the entire estate. If that agent is compromised, manipulated, or misused, it becomes an insider threat capable of funnelling out vast quantities of personal information. Traditional systems at least required multiple steps and multiple people to get at everything. Giving a single AI agent the keys to every filing cabinet in the building and then hoping nobody picks its pocket is, I'd argue, a risk model that needs serious scrutiny before deployment.


Three scenarios worth considering

To ground all of this, consider three plausible scenarios.


Scenario 1: The Force Multiplier. ACME Corp receives a DSAR from a former employee. Their AI DSAR assistant crawls all internal systems in minutes: Workday profile, emails, chat logs, network drive files, and Jira entries. It presents HR with a categorised dashboard, duplicates removed, with a summary: "Found 2,340 emails, 120 documents, and 45 HR records. Common themes include performance evaluations, compensation, exit interview." The team reviews the findings, approves the AI's suggested redactions with minor tweaks, and delivers the response well within deadline. The DPO documented the process and kept the AI's activity logs. Work that used to take days was done in hours.


Scenario 2: The Cautionary Tale. DataFine LLC uses a generative AI tool to summarise performance feedback for a DSAR response. The prompt is too broad. The model mixes in fabricated feedback statements that sound plausible but were never actually written by anyone at DataFine. Worse, it confuses two employees with similar names and includes lines from the wrong person's review. The requesting employee receives the summary, spots comments they've never heard before ("Who called me 'lacking leadership skills'?! This is nowhere in my file."), and lodges a complaint. DataFine has inadvertently breached another employee's privacy and provided false information to the requester. Nobody was hacked. No policy was violated at the point of setup. The system did exactly what it was designed to do. Which is sort of the point.


Scenario 3: The Balanced Approach. TechSolutions Inc. uses an AI platform with a human-in-the-loop design. The AI agent collects data from various systems but operates in read-only mode and cannot send data externally or directly to the requester. An HR privacy specialist reviews the AI-collected dataset, applies contextual judgment the AI might miss, and double-checks every suggested redaction. The AI drafts a cover letter using a legal-approved template, which the specialist edits for tone and accuracy. The DPO approves the initial search parameters and reviews the final packet before release. The AI keeps a complete activity log documenting which systems were queried, when, and by whom. The DSAR is met with time to spare, and the process is controlled and documented throughout.


I think most organisations should be aiming squarely at Scenario 3. This is, coincidentally, what DiscoveryManager™ will end up looking like.


What to do about it

So, assuming you'd rather avoid becoming the subject of your own case study, here's where I'd start.


Conduct a DPIA before deployment. Treat any AI implementation in DSAR processing as a potentially high-risk processing activity under GDPR Article 35. Evaluate how the AI will use personal data, what could go wrong, and document how you'll address those risks. Involve your DPO early.


Minimise data exposure. Only feed the AI what's necessary. Pseudonymise or anonymise where possible. Configure the platform to not store prompts or outputs beyond the session. If the AI uses a knowledge base, segregate and encrypt it.
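
For the pseudonymisation step, one widely used technique is keyed hashing: replace each known identifier with an HMAC-derived token before the text reaches the model. The sketch below is an assumption-laden example, not a complete solution. The function names are invented, the key is a placeholder, and it only replaces identifiers you already know about. Bear in mind that pseudonymised data is still personal data under GDPR, because whoever holds the key can re-link it.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Derive a stable token from an identifier using a keyed hash.
    The same value and key always give the same token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "subj_" + digest[:16]


def redact_identifiers(text, identifiers, secret_key):
    """Replace each known identifier with its token. Longest first, so an
    identifier that contains another is handled before its substring."""
    for ident in sorted(identifiers, key=len, reverse=True):
        text = text.replace(ident, pseudonymise(ident, secret_key))
    return text


# Placeholder key for the example; a real key belongs in a secrets manager.
KEY = b"demo-key-not-for-production"
original = "Jane Doe (jane.doe@example.com) raised a grievance."
safe = redact_identifiers(original, ["Jane Doe", "jane.doe@example.com"], KEY)
```

Because the tokens are stable, the AI can still group records belonging to the same person, and the mapping back to real names stays with whoever controls the key.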


Vet your vendors thoroughly. Look for on-premise or EU-based hosting. Read the fine print on whether the provider uses your data to improve their models. Include contractual clauses binding the AI provider to GDPR standards as a data processor. Regularly audit their compliance.

Implement access controls and audit trails. The AI should operate on least privilege with read-only access where feasible. Log every action. Maintain a risk register for the deployment. If regulators ask, you need to demonstrate you have a handle on the AI's operations and associated risks.
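
A minimal sketch of what "read-only and logged" could look like at the connector level follows. The `AuditedReadOnlySource` class is hypothetical: it exposes a search method and nothing else, and every call appends an entry recording which system was queried, when, and by whom. A production audit log would also need to be tamper-evident and stored outside the agent's reach, which this example does not attempt.

```python
import json
from datetime import datetime, timezone


class AuditedReadOnlySource:
    """Hypothetical wrapper around one data source. It offers search only
    (no write, delete, or export) and logs every query."""

    def __init__(self, name, records, audit_log):
        self._name = name
        self._records = tuple(records)  # immutable copy of the source data
        self._audit_log = audit_log

    def search(self, term, actor):
        matches = [r for r in self._records if term.lower() in r.lower()]
        self._audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "system": self._name,
            "action": "search",
            "term": term,  # note: the search term may itself be personal data
            "match_count": len(matches),
        })
        return matches


audit_log = []
email = AuditedReadOnlySource(
    "email_archive", ["Review for Jane Doe", "Canteen menu"], audit_log
)
found = email.search("jane doe", actor="dsar-agent")
log_line = json.dumps(audit_log[0])  # entries serialise cleanly for storage
```

The log is itself a store of personal data, since it records search terms, so it needs its own access controls and retention period.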


Keep humans in the loop. This is the big one. The final review of any DSAR package should be done by a person. Treat AI outputs as suggestions. Spot-check results against expectations. If the AI says it found 100 files and you expected 110, ask where the other ten went. For anything involving performance feedback, disciplinary records, or sensitive categories of data, verify every line. One can only marvel at how often this advice needs repeating.
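
The "100 found, 110 expected" check is simple enough to automate as a prompt for the human reviewer. The sketch below assumes you can obtain an independent expected count per system, for example from a direct query run by the system owner; the `reconcile` function and the figures are illustrative only.

```python
def reconcile(expected_counts, reported_counts):
    """Compare per-system counts and return only the discrepancies.
    Negative means the AI reported fewer items than expected."""
    gaps = {}
    for system in sorted(set(expected_counts) | set(reported_counts)):
        expected = expected_counts.get(system, 0)
        reported = reported_counts.get(system, 0)
        if reported != expected:
            gaps[system] = reported - expected
    return gaps


expected = {"email": 110, "hr": 45}
reported = {"email": 100, "hr": 45, "chat": 3}
gaps = reconcile(expected, reported)
```

A non-empty result does not prove an error in either direction. It is a list of questions for a person to answer before the package goes out.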


Plan for errors. Develop an incident response plan specific to AI handling of personal data. Set thresholds for when to escalate. Treat incidents as opportunities to update your risk assessment. The EDPB report encourages continuous monitoring and review of risk controls, and I believe this iterative approach is the only sustainable one.


The uncomfortable bit

The role of HR and DPOs is going to change. Rather than manually executing every step of DSAR fulfilment, they'll increasingly design, supervise, and audit AI-driven processes. In my view, this is a good thing. The worry is when someone confuses "automated" with "sorted" and wanders off for a coffee while the AI marks its own homework.


The responsibility to uphold data privacy cannot be delegated entirely to machines. An AI agent that can autonomously gather and summarise personal data can just as easily misstep and cause a breach if not properly governed. The same power that makes it useful is precisely what makes it dangerous.


Organisations that get this right will invest in privacy-preserving AI design, build internal knowledge to configure these tools safely, and treat regulatory compliance as a continuous discipline rather than a one-off checkbox. Regulators are increasingly savvy about AI. Showing that your company adopted privacy by design in its AI deployments will matter in an audit or investigation.


The future of DSAR handling is a blend of AI-driven efficiency and human-driven integrity. Organisations that strike this balance will meet regulatory requirements and could set a benchmark for trust in the workplace. Those that don't... well, Scenario 2 is always waiting.

If your organisation is exploring how to set up a safe, AI-enabled environment that puts privacy first, we'd welcome the conversation. We offer bespoke consultancy helping SMEs and enterprise firms handle data safely without compromising on the power and business benefits of agentic AI.


You can reach us at robert@contextul.io. Or visit privacymgr.com for a demo.

