Prompt Injection: Are Legal-tech Investigations Safe During The AI Boom?

Article Insights

Linda Sheehan’s articles from ENS are most popular:

within Technology topic(s)
in United States
with readers working within the Healthcare, Media & Information and Law Firm industries

ENS are most popular:

within Technology, Accounting and Audit and Insurance topic(s)
with Senior Company Executives and HR

On 18 March 2026, Oasis Security published 'Claudy Day'. A vulnerability chain in Anthropic's Claude AI. Unlike attacks on proprietary in-house tools requiring custom exploits, Claudy Day used only Claude's built-in capabilities to silently exfiltrate user conversation history. They noted basic sessions were vulnerable and enterprise configurations faced higher risk.

In another recent incident, Codewall, an artificial intelligence ("AI") security startup that deploys autonomous AI agents to stress-test corporate infrastructure, reportedly breached a global consulting firm’s internal AI platform in under two hours.

In both cases, the prompt injection vulnerabilities were responsibly disclosed and patched before public announcement. These incidents emphasise a sobering truth: even sophisticated organisations with substantial security investment remain vulnerable to AI-expedited attacks.

AI is now embedded in legal operations, compliance and investigations. Large Language Models (“LLMs”) process high volumes of sensitive content, including emails, documents and legal summaries. Their strengths also create vulnerabilities. Security researchers consistently rank prompt injection among the top ten risks for modern AI deployments, with the Open Worldwide Application Security Project ("OWASP") listing it as the number one threat in its Top 10 for LLM Applications.

What is prompt injection?

Prompt injection is a social-engineering attack on an LLM in which an attacker causes the model to ignore its original instructions and execute malicious commands instead. The goal is often to extract private enterprise data or manipulate outputs. Common forms include direct injection (overriding live system prompts), indirect injection (embedding malicious instructions in training data), and stored injection (planting persistent instructions in application memory).

Why this matters for legal operations, compliance and investigation teams

When using AI, legal teams must manage risks such as hallucinations, bias, prompt manipulation and model-driven data exfiltration. In South Africa, AI-related breaches may trigger obligations under the Protection of Personal Information Act (“POPIA”) and could constitute offences under the Cybercrimes Act. Courts, regulators and other third parties increasingly address the challenges of AI-assisted workflows handling sensitive and privileged material in day-to-day legal work.

Prompt injection poses a concrete threat to the sensitive material that legal teams handle daily. The Codewall incident underscores the following key points:

Escalating risk of deploying AI without securing critical endpoints: Codewall's AI security research revealed basic security vulnerabilities that had persisted for over two years. Codewall's offensive AI agent autonomously selected the consultancy firm as a target and exploited Application Programming Interface ("API") vulnerabilities and Structured Query Language ("SQL") injection flaws.
Access to intellectual property, personal data and more: The breach allegedly exposed 46.5 million internal chat messages, more than 728,000 files (including PDFs, Excel and PowerPoint documents), 57,000 user accounts and 95 system prompts, all of which were read and writable.
System prompts as "Crown Jewel" assets: Codewall's agent gained access to the AI tool's system prompts - the instructions that defined how it answered questions, what guardrails it followed, how it cited sources, and what it refused to do.

A successful prompt injection can directly undermine the security, accuracy and defensibility of in-house legal workflows. The consequences extend beyond data breach triggers: a tainted review could compromise litigation, trigger regulatory sanction or waive privilege protection.

Once a poisoned prompt is injected, it could instruct the model to:

Misclassify documents by excluding key evidence, fabricating timelines, or ignoring bad actor behaviour, resulting in flawed case assessments and missed liability exposures.
Taint privilege reviews by failing to identify legally privileged communications, inadvertently disclosing attorney-client advice to opposing counsel or regulators and potentially waiving privilege over entire subject matters.
Extract confidential information from integrated systems (client data, deal terms, settlement figures, litigation positions) for exfiltration to third parties or use in insider trading and competitive intelligence.
Manipulate communications sent externally or silently delete critical files and audit trails, potentially constituting obstruction of justice or professional conduct violations.

Real-world vulnerability

Early LLM users and cybersecurity researchers have attempted to manipulate ChatGPT in numerous ways to leak confidential and proprietary information. In one widely reported incident, a user convinced ChatGPT to reveal Windows activation keys through "emotional" manipulation, claiming his grandmother used to recite them as lullabies. While the example may seem whimsical, it illustrates a serious vulnerability: LLMs can be socially engineered to bypass their safeguards.

In December 2025, OpenAI acknowledged that prompt injection attacks against its ChatGPT Atlas (search feature) browser may never be fully solved. This stark acknowledgment from one of the world's leading AI developers confirms that prompt injection must be treated as a persistent operational risk requiring layered defences, not a problem with a single technical fix.

Practical guidance for legal teams

The speed of AI adoption in legal work creates a paradox: the same integrations that boost productivity also expand the attack surface. As AI tools gain access to more data sources (emails, documents, matter management systems, regulatory databases) each new connection becomes a potential entry point for malicious instructions. The solution is not to abandon AI, but to deploy it with appropriate safeguards.

Below are practical steps that legal teams can take to protect their workflows.
When selecting or deploying AI-enabled legal technology, ensure the platform incorporates robust defensive features. Key technical safeguards to look for include:

Input Validation – The use of prompt templates, delimiters and sanitation allow for input control, separation of user input from system instructions, isolation of data and stripping input data to remove executable code and hidden text.
Advanced Filtering – Technologies and techniques such as prompt shields (real-time input detection of hidden instructions), spotlighting (distinguishing between trusted and untrusted instructions), and context-aware filters allow for further protection against prompt injections and execution of malicious instructions.

Beyond vendor selection, legal teams must establish operational protocols that reduce exposure to prompt injection risks.

Operational safeguards for legal workflows

Privilege-safe configurations are foundational. When deploying AI tools, apply role-based access controls and data minimisation to carefully define which data sets the models can access. Scope access to what is strictly necessary for the matter at hand, limiting damage if an injection succeeds. Maintain immutable logs of prompts, tool calls and outputs to support audit, anomaly detection and incident response. These logs may prove critical when a court or regulator requires demonstrable defensibility.
People remain the critical line of defence. Train staff to verify sources, treat unfamiliar files with caution, and simulate attacks to surface weaknesses proactively. Everyone using AI tools should know how to recognise suspicious model behaviour, when to escalate, and how to preserve relevant logs for investigation. Regular tabletop exercises help ensure these protocols become second nature.
Constrain what AI tools can do by establishing clear usage policies. Define permissible actions for each workflow and prohibit blanket capabilities such as unrestricted file exports or bulk email forwarding. Ensure that users understand these boundaries. A single malicious instruction could escalate into a catastrophic, machine-executed outcome that unfolds before anyone notices.

Technical controls and compliance frameworks

Given the stochastic nature of generative AI, no single control can fully prevent prompt injection. OWASP and other authorities recommend a layered, defence in depth architecture combining input/output controls, access restrictions, monitoring, and system hardening:

Input and output controls. Validation and sanitisation to filter adversarial patterns, structured prompts separating system instructions from user data, and output monitoring to detect prompt injection artefacts such as system-prompt leakage or policy violations.
Access controls and human oversight. Least privilege access ensuring LLM applications operate with minimum permissions, human-in-the-loop controls for high-risk actions such as executing code or accessing sensitive datasets, and segregation of external content so RAG pipelines cannot override trusted instructions.
Monitoring and testing (before Codewall does it for you). Adversarial testing and red team exercises against known attack patterns, behavioural and anomaly monitoring to detect unusual instruction structures or API call patterns that standard filters may miss.
System hardening. Containment focused design with sandboxed execution environments and validated outputs, system prompt protection using read-only databases, comprehensive audit logging, and regular security posture evaluation prior to AI deployment.
Compliance frameworks such as NIST AI RMF 1.0, ISO/IEC 42001:2023 and GDPR Article 32 increasingly require organisations to address prompt injection risks through formal policies, risk assessments and appropriate technical safeguards. In South Africa, POPIA section 19 requires "appropriate, reasonable technical and organisational measures" for AI systems processing personal information, whilst the

Cybercrimes Act criminalises unlawful data interception (section 5) and cyber extortion (section 10).
Legal teams should work with information security and compliance functions to ensure these requirements are reflected in vendor contracts, internal policies and audit programmes.

Ethical deployment and governance considerations

Organisations should be transparent about AI limitations, ensure human oversight of sensitive workflows and conduct proportionality assessments. The responsible use of AI in legal contexts demands not only technical safeguards but also governance frameworks that preserve professional accountability and employee and public trust. Board-level accountability for IT governance remains critical. Deploying AI without appropriate oversight creates legal, reputational, commercial and financial risk.

Conclusion: Security maturity must keep pace with AI adoption

AI adoption in legal operations should not be halted. The efficiency gains are significant and the competitive pressures real. But security maturity must accelerate to match the pace of deployment.

With disciplined architecture, least-privilege access, strict behavioural boundaries and vigilant monitoring, organisations can harness the power of LLMs while keeping attackers at bay.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

[View Source]