<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>David S. Kemp</title>
  <subtitle>Lawyers are learning to work with artificial intelligence. Artificial intelligence is learning to work with law. This blog explores how — through pedagogy, practice, policy, and the ethical questions that connect them.</subtitle>
  <link href="https://davidkemp.ai/blog/feed.xml" rel="self" />
  <link href="https://davidkemp.ai/blog/" />
  <updated>2026-03-29T00:00:00.000Z</updated>
  <id>https://davidkemp.ai/blog/</id>
  <author>
    <name>David S. Kemp</name>
  </author>
  <entry>
    <title>What Your AI Forgets Mid-Sentence — And What to Do About It</title>
    <link href="https://davidkemp.ai/blog/what-your-ai-forgets-midsentence/" />
    <updated>2026-03-29T00:00:00.000Z</updated>
    <id>https://davidkemp.ai/blog/what-your-ai-forgets-midsentence/</id>
    <content type="html">&lt;p&gt;Syntheia published a &lt;a href=&quot;https://syntheia.io/blog/silent-but-deadly-context-rot-problems-in-legal&quot;&gt;useful piece&lt;/a&gt; this week on what they call &amp;quot;context rot&amp;quot; — the family of failures that occur when a large language model processes more text than it can reliably attend to. Their diagnosis is sharp: LLMs degrade silently on long documents, and the law firm&#39;s traditional quality-assurance architecture is not calibrated to catch the resulting errors. I agree with most of their analysis, but I want to take it further and offer solutions.&lt;/p&gt;
&lt;p&gt;In this post, I explain the mechanics of context windows in terms aimed at the practicing lawyer, and then I propose concrete strategies to work within those constraints.&lt;/p&gt;
&lt;h2&gt;The context window, explained without jargon&lt;/h2&gt;
&lt;p&gt;Every LLM has a context window — the total amount of text it can hold in working memory for a single exchange. That window includes everything: the system instructions that tell the model how to behave, whatever documents you have uploaded or pasted in, the full history of your conversation, and the model&#39;s own response. All of it competes for the same finite space.&lt;/p&gt;
&lt;p&gt;Context windows are measured in tokens; in English, a token corresponds to roughly three-quarters of a word. A &amp;quot;200,000-token context window&amp;quot; therefore means roughly 150,000 words across all inputs combined, in a single conversation turn. That sounds enormous until you consider that a single commercial loan agreement can run 80,000 words and a due diligence data room can contain millions. For reference, the Claude system instruction alone — which is necessarily part of every conversation with Claude — can easily run to tens of thousands of tokens.&lt;/p&gt;
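&lt;p&gt;For readers who want to run the arithmetic themselves, the short Python sketch below converts a word count to an approximate token count and applies the effective-window discount. The 0.75 words-per-token ratio and the 50 percent discount are the rough heuristics discussed above, not exact figures for any particular model.&lt;/p&gt;

```python
# Back-of-envelope context budgeting. The ratios here are rough heuristics,
# not exact values for any particular model or tokenizer.
def estimate_tokens(word_count, words_per_token=0.75):
    """Convert an English word count to an approximate token count."""
    return int(word_count / words_per_token)

def effective_window(advertised_tokens, discount=0.5):
    """Apply a conservative RULER-style discount to an advertised window."""
    return int(advertised_tokens * discount)

# An 80,000-word loan agreement against a 200,000-token advertised window:
doc_tokens = estimate_tokens(80_000)   # roughly 107,000 tokens
budget = effective_window(200_000)     # roughly 100,000 reliable tokens
print(doc_tokens, budget)
```

&lt;p&gt;On those assumptions, the 80,000-word loan agreement alone consumes roughly 107,000 tokens, already past the conservative 100,000-token budget before you have typed a single question.&lt;/p&gt;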
&lt;p&gt;The critical point, and the one that most marketing materials omit, is that the &lt;em&gt;advertised&lt;/em&gt; context window and the &lt;em&gt;effective&lt;/em&gt; context window are not the same thing. NVIDIA&#39;s &lt;a href=&quot;https://github.com/NVIDIA/RULER&quot;&gt;RULER benchmark&lt;/a&gt; tested models on the kind of complex reasoning tasks that legal work demands, and found that effective performance sits at roughly &lt;a href=&quot;https://arxiv.org/abs/2404.06654&quot;&gt;50 to 65 percent&lt;/a&gt; of the advertised token limit. A model with a 200,000-token window performs reliably on about 100,000 to 130,000 tokens of actual input. The number on the box is not the number that governs your work.&lt;/p&gt;
&lt;h2&gt;How the degradation works&lt;/h2&gt;
&lt;p&gt;The research literature identifies several distinct failure modes. They are worth understanding individually, because each one suggests a different mitigation strategy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Positional bias.&lt;/strong&gt; The &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;Stanford &amp;quot;Lost in the Middle&amp;quot; research&lt;/a&gt; (Liu et al., &lt;em&gt;TACL&lt;/em&gt; 2024) demonstrated that LLMs attend most strongly to text at the beginning and end of their input. In multi-document question answering, accuracy dropped by roughly &lt;a href=&quot;https://www.morphllm.com/lost-in-the-middle-llm&quot;&gt;30 percentage points&lt;/a&gt; — from approximately 75% to approximately 45% — when relevant information moved from the first position to the middle of the context. In a 200-page agreement, the provisions that matter most are rarely on page one or page 200.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Volume-dependent reasoning decay.&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/2510.05381&quot;&gt;Du et al. (2025)&lt;/a&gt; isolated an even more troubling finding: reasoning accuracy degrades as context length increases &lt;em&gt;even when the model has perfect access to all relevant information&lt;/em&gt;. They tested this by padding relevant text with whitespace (minimally distracting filler that should not confuse the model) and observed performance drops of up to 85 percent. The sheer volume of input makes the model a worse reasoner, independent of whether the right answer is present.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conversation history displacement.&lt;/strong&gt; When a conversation exceeds the context window, something has to go. In most current implementations, including &lt;a href=&quot;https://platform.claude.com/docs/en/build-with-claude/context-windows&quot;&gt;Anthropic&#39;s Claude&lt;/a&gt; and &lt;a href=&quot;https://www.datastudios.org/post/chatgpt-token-limits-and-context-windows-updated-for-all-models-in-2025&quot;&gt;OpenAI&#39;s ChatGPT&lt;/a&gt;, the system preserves the system prompt and truncates the oldest &lt;em&gt;conversation turns&lt;/em&gt; first. Some platforms &lt;a href=&quot;https://anthropic.com/news/context-management&quot;&gt;summarize rather than drop&lt;/a&gt; the earlier exchanges, though that introduces its own fidelity problems. The practical result is the same: the model loses track of what you discussed earlier in the session. The analytical framework you established, the specific issues you flagged, the constraints you set three exchanges ago, all of it becomes inaccessible. In custom or middleware implementations, the system prompt itself may also be at risk, though the major providers now treat it as pinned content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compression artifacts.&lt;/strong&gt; Summarizing a document before feeding it to the model, a common workaround for length limitations, introduces its own errors. Compression algorithms often strip language that appears formulaic or repetitive, but legal documents are dense with formulaic language that carries substantive weight. &amp;quot;Subject to,&amp;quot; &amp;quot;notwithstanding the foregoing,&amp;quot; &amp;quot;except as provided in Section K&amp;quot;: these phrases distinguish an absolute obligation from a qualified one. &lt;a href=&quot;https://aclanthology.org/2021.naacl-main.383/&quot;&gt;Pagnoni et al. (&lt;em&gt;NAACL&lt;/em&gt; 2021)&lt;/a&gt; found that over 80 percent of summaries produced by the neural models evaluated contained factual errors, concentrated precisely in conditional and qualifying language. Current models perform better on standard summarization benchmarks, but the specific vulnerability to legal qualifying language persists because it is structural. Compression algorithms are designed to remove redundancy, and legal qualifiers are designed to look redundant while doing essential work.&lt;/p&gt;
&lt;p&gt;These failure modes share a symptom: the output looks complete. It is well-formatted, internally coherent, and confident. Nothing about it signals that a substantial portion of the source material was functionally ignored. That is what distinguishes context rot from the more familiar hallucination problem, and what makes it harder to catch in review.&lt;/p&gt;
&lt;h2&gt;What to do about it&lt;/h2&gt;
&lt;p&gt;What follows are concrete approaches, ordered from simplest to most involved, that any lawyer can implement today.&lt;/p&gt;
&lt;h3&gt;1. One task, one conversation&lt;/h3&gt;
&lt;p&gt;This is probably the single highest-value habit change available to a non-technical user. Every AI conversation accumulates context: your prior messages, the model&#39;s prior responses, uploaded documents, session instructions. As the conversation grows, the model&#39;s effective reasoning capacity shrinks. Old instructions interfere with current tasks. Prior assumptions bleed into new analysis. The context fills with material that was useful ten exchanges ago and is now dead weight, what researchers call &lt;a href=&quot;https://understandingdata.com/posts/context-pollution-recovery/&quot;&gt;context pollution&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The fix is simple: start a new conversation for each discrete task. Do not use the same session to summarize a lease, then draft a demand letter, then review an indemnification clause. Each of those deserves a clean context window, and starting a new conversation is free, while the accuracy cost of a polluted one is invisible until something goes wrong.&lt;/p&gt;
&lt;p&gt;I call this the OTOC rule — one task, one conversation. That&#39;s not to discourage iterative prompting. Iterative refinement of a single work product is still a single task and is an effective use of an LLM. Revising a draft and then pivoting to an unrelated analysis in the same session is two tasks crammed into one window — increasing the risk of context rot.&lt;/p&gt;
&lt;h3&gt;2. Write a durable task specification&lt;/h3&gt;
&lt;p&gt;The OTOC rule creates a practical problem: if every task gets a fresh conversation, you lose the background context the model needs to do good work. The overarching objectives, the governing law, the deal structure, the specific issues you care about — all of that vanishes when you close the session.&lt;/p&gt;
&lt;p&gt;The solution is to write a reusable task specification: a short document (a few hundred words is usually sufficient) that captures the stable context for a project. Think of it as a briefing memo for the model. It should include the matter description, the governing jurisdiction, the relevant parties, the specific analytical framework you want applied, and any constraints or preferences that should carry across sessions.&lt;/p&gt;
&lt;p&gt;You paste this specification at the top of each new conversation, or, even better, preserve it as its own file to attach as input. The model reads it fresh every time, without the accumulated noise of prior exchanges. This is the complement to the OTOC rule: it lets you start clean without starting ignorant. Some tools (Anthropic&#39;s Claude Projects feature, for instance) let you attach persistent instructions to a project workspace that automatically prepopulate every conversation. If your platform supports it, use it.&lt;/p&gt;
&lt;h3&gt;3. Chunk your documents before the model reads them&lt;/h3&gt;
&lt;p&gt;If positional bias causes the model to lose track of middle-document content, and if volume alone degrades reasoning quality, then the logical response is to feed the model smaller, task-relevant segments rather than entire documents.&lt;/p&gt;
&lt;p&gt;For a 200-page credit agreement, do not upload the entire file and ask the model to &amp;quot;review it.&amp;quot; Instead, consider breaking the document into its component sections (representations and warranties, covenants, events of default, definitions, schedules) and submit each section in a separate conversation (applying the OTOC rule) with a targeted question. &amp;quot;Identify all financial covenants in the following section and flag any that use a trailing-twelve-month measurement period&amp;quot; will produce dramatically better results than &amp;quot;review this agreement and summarize the key terms.&amp;quot;&lt;/p&gt;
&lt;p&gt;One important caveat: legal documents are dense with internal cross-references (defined terms, conditions qualified by other sections, carve-outs incorporated by reference). When you chunk, you sever those links. The model analyzing the covenants will not know that a defined term in Article I changes the meaning of a financial ratio test, or that a carve-out in Schedule 3 qualifies an obligation in Section 12. The practical mitigation is to always include the definitions section (or at minimum the relevant defined terms) alongside whatever substantive section you are analyzing.&lt;/p&gt;
&lt;p&gt;Manual chunking is labor-intensive, but the labor is front-loaded and predictable. It converts one unreliable pass over an entire document into multiple reliable passes over bounded sections. The lawyer stitches the analysis back together, which is the level at which human judgment should operate regardless of whether AI is involved. For high-stakes tasks, the benefit of minimizing AI errors through manual chunking far outweighs the burden.&lt;/p&gt;
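&lt;p&gt;For the mechanically inclined, the chunking-plus-definitions workflow can be sketched in a few lines of Python. Everything here is an illustrative assumption: the heading pattern, the prompt layout, and the function names are mine, not any vendor&#39;s API, and a real agreement would need a more robust splitter.&lt;/p&gt;

```python
# Illustrative sketch: split an agreement on top-level ARTICLE headings and
# pair each substantive section with the definitions article, so defined
# terms travel with the chunk being analyzed.
import re

def chunk_agreement(text):
    """Split on 'ARTICLE' headings; returns a list of (heading, body) pairs."""
    parts = re.split(r"(?m)^(ARTICLE [IVXLC]+\.?.*)$", text)
    sections = []
    # re.split with a capture group yields [preamble, head1, body1, head2, ...]
    for i in range(1, len(parts) - 1, 2):
        sections.append((parts[i].strip(), parts[i + 1].strip()))
    return sections

def build_prompts(text, question):
    """One bounded prompt per section, each carrying the definitions along."""
    sections = chunk_agreement(text)
    definitions = next(
        (body for head, body in sections if "DEFINITION" in head.upper()), ""
    )
    prompts = []
    for heading, body in sections:
        if "DEFINITION" in heading.upper():
            continue  # definitions ride along with every other chunk instead
        prompts.append(
            f"DEFINITIONS:\n{definitions}\n\nSECTION ({heading}):\n{body}\n\n{question}"
        )
    return prompts
```

&lt;p&gt;Each resulting prompt is a bounded, self-contained unit suitable for its own conversation under the OTOC rule.&lt;/p&gt;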
&lt;h3&gt;4. Use chain-of-thought prompting to structure the model&#39;s reasoning&lt;/h3&gt;
&lt;p&gt;Chain-of-thought prompting means explicitly instructing the model to reason through intermediate steps before reaching a conclusion. Instead of asking &amp;quot;Does Section 7.2 conflict with Schedule B?&amp;quot;, you ask: &amp;quot;First, extract the operative language of Section 7.2 and state its requirements. Then extract the relevant provisions of Schedule B. Then identify any inconsistencies between them. Then state your conclusion.&amp;quot;&lt;/p&gt;
&lt;p&gt;This matters for context management because it forces the model to surface the textual evidence it is relying on before it reasons over that evidence. If the model skips a provision, you will see the gap in the intermediate step, before it gets papered over by a confident-sounding conclusion. Du et al. (2025) found that a simple version of this approach, prompting the model to &lt;a href=&quot;https://arxiv.org/abs/2510.05381&quot;&gt;recite the retrieved evidence before solving the problem&lt;/a&gt;, mitigated much of the performance loss caused by long contexts. The technique works because it forces the model to move relevant information into a high-attention position (the most recent output) before it reasons about it.&lt;/p&gt;
&lt;p&gt;For legal work, chain-of-thought prompting also functions as a transparency mechanism. A model that shows its intermediate reasoning produces work product that a supervising lawyer can actually verify, because the intermediate steps expose the gaps that a polished final conclusion would conceal.&lt;/p&gt;
&lt;h3&gt;5. Place critical information strategically&lt;/h3&gt;
&lt;p&gt;The &amp;quot;Lost in the Middle&amp;quot; research has a direct practical corollary: put the most important content where the model pays the most attention. That means the beginning and end of your input, not the middle.&lt;/p&gt;
&lt;p&gt;If you are asking the model to analyze a specific clause in the context of a larger document section, place the target clause at the top of your prompt, followed by the surrounding context, and then restate the analytical question at the end. If you are using a task specification (Strategy 2), put it at the top. If you have specific instructions about format or analytical framework, repeat them at the bottom. The worst arrangement, and the one most people default to, is pasting a large document and then typing the question at the bottom, burying the analytical instructions in a low-attention position.&lt;/p&gt;
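&lt;p&gt;The placement rule reduces to a fixed template, which a small sketch makes concrete. The labels and function name below are illustrative assumptions, not a feature of any platform: stable specification first, target clause near the top, bulk context in the middle, and the question restated at the end.&lt;/p&gt;

```python
# Illustrative prompt layout per the "Lost in the Middle" findings:
# high-attention slots (top and bottom) get the spec, the target clause,
# and the restated question; the bulky context sits in the middle.
def assemble_prompt(task_spec, target_clause, context, question):
    return "\n\n".join([
        task_spec,                            # briefing memo (Strategy 2): top
        "TARGET CLAUSE:\n" + target_clause,   # most important text: near top
        "CONTEXT:\n" + context,               # bulk material: middle
        "QUESTION (restated):\n" + question,  # instructions: end
    ])
```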
&lt;h3&gt;6. Verify in a separate conversation, not the one that produced the work&lt;/h3&gt;
&lt;p&gt;This follows directly from the OTOC rule. Generation and verification are different tasks, and they belong in different conversations.&lt;/p&gt;
&lt;p&gt;When you ask the model to check its own work in the same session, the entire prior exchange sits in the context window: the assumptions, the omissions, the analytical choices the model made on its first pass. All of it exerts influence on the verification. A model reviewing its own conclusions is structurally biased toward confirming them, the equivalent of asking the same reviewer to read the same draft a second time and expecting fresh insight.&lt;/p&gt;
&lt;p&gt;A de novo review in a fresh conversation eliminates that problem. Paste or upload the relevant source text and the model&#39;s output into a clean session. Ask: &amp;quot;Does this analysis accurately and completely reflect the source material? Identify every section of the source you relied on and quote the language supporting each conclusion.&amp;quot; The new session has no prior commitments pulling it toward agreement. It is structurally analogous to a mid-level associate reviewing a junior&#39;s draft — fresh eyes on the same source.&lt;/p&gt;
&lt;p&gt;A necessary warning: the model can fabricate quotations even in a clean session. It may generate text that looks like a verbatim extract but is actually a paraphrase, a conflation of multiple provisions, or &lt;a href=&quot;https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive&quot;&gt;an outright invention&lt;/a&gt;. The verification step itself requires verification — you must check the model&#39;s quoted language against the source document. That is additional work, but it is targeted work: instead of re-reading 200 pages looking for problems you do not know to expect, you are checking specific passages the model claims to have relied on. The de novo framing does not eliminate the need for human verification, but it gives you a structurally honest starting point for it.&lt;/p&gt;
&lt;h2&gt;The underlying principle&lt;/h2&gt;
&lt;p&gt;Every strategy above is a variation on a single idea: &lt;em&gt;give the model less to think about, and tell it more precisely what to think about it.&lt;/em&gt; That runs against the grain of how most people use these tools. The natural instinct is to dump everything into the conversation and let the AI sort it out, and the marketing encourages exactly that — &amp;quot;upload your entire contract,&amp;quot; &amp;quot;ask anything about your documents.&amp;quot; The context window numbers are designed to suggest the model can handle it all.&lt;/p&gt;
&lt;p&gt;It can, in the sense that it will produce output. What it cannot do — reliably, on long documents, under token pressure — is produce output accurate enough to stake a client&#39;s interests on. The strategies in this post are all ways of closing that gap: structuring the input so the model&#39;s actual capabilities match the demands of the task. The work is unglamorous — writing briefing documents for a machine, manually splitting PDFs, running the same analysis twice in separate sessions. But it maps directly onto skills lawyers already have. Scoping a task, preparing materials for review, verifying work product against source documents — these are not new professional obligations. They are existing ones, applied to a new tool.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This post draws on Liu et al., &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/a&gt; (TACL 2024); Du et al., &lt;a href=&quot;https://arxiv.org/abs/2510.05381&quot;&gt;Context Length Alone Hurts LLM Performance Despite Perfect Retrieval&lt;/a&gt; (EMNLP 2025); NVIDIA&#39;s &lt;a href=&quot;https://github.com/NVIDIA/RULER&quot;&gt;RULER benchmark&lt;/a&gt; (2024); and Pagnoni et al., &lt;a href=&quot;https://aclanthology.org/2021.naacl-main.383/&quot;&gt;Understanding Factuality in Abstractive Summarization with FRANK&lt;/a&gt; (NAACL 2021). Anthropic&#39;s &lt;a href=&quot;https://platform.claude.com/docs/en/build-with-claude/context-windows&quot;&gt;context window documentation&lt;/a&gt; and &lt;a href=&quot;https://anthropic.com/news/context-management&quot;&gt;context management guidance&lt;/a&gt; informed the discussion of conversation history displacement. For context on the data-handling and compliance dimensions of AI tool selection, see prior entries in this series on &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;consumer-versus-commercial data handling&lt;/a&gt;, &lt;a href=&quot;/2026-03-23-the-api-is-not-a-compliance-strategy/&quot;&gt;API compliance architecture&lt;/a&gt;, and &lt;a href=&quot;/2026-03-27-the-duty-to-inform/&quot;&gt;the duty to counsel clients about AI privilege risks&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>You Probably Have a Duty to Warn Your Clients About ChatGPT</title>
    <link href="https://davidkemp.ai/blog/the-duty-to-inform/" />
    <updated>2026-03-27T00:00:00.000Z</updated>
    <id>https://davidkemp.ai/blog/the-duty-to-inform/</id>
    <content type="html">&lt;p&gt;I have written previously about what &lt;a href=&quot;https://www.courtlistener.com/docket/71872024/27/united-states-v-heppner/&quot;&gt;&lt;em&gt;United States v. Heppner&lt;/em&gt;&lt;/a&gt; &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;held and what it got wrong&lt;/a&gt;, and about why &lt;a href=&quot;/2026-03-23-the-api-is-not-a-compliance-strategy/&quot;&gt;moving to an API&lt;/a&gt; does not, by itself, constitute a compliance strategy. This post turns to a different audience: not organizations choosing AI tools, but practicing lawyers whose clients are already using them.&lt;/p&gt;
&lt;p&gt;The core question is straightforward. &lt;em&gt;Heppner&lt;/em&gt; established — on reasoning I have &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;criticized&lt;/a&gt; but that is now on the books — that a client who feeds privileged materials into a consumer AI platform may forfeit the privilege over those materials. That is now a known hazard. And when a known hazard exists that threatens the integrity of the attorney-client relationship, existing rules of professional conduct impose obligations on the lawyer — not just the client.&lt;/p&gt;
&lt;p&gt;No ethics rule says &amp;quot;warn your client about ChatGPT.&amp;quot; But the obligation to do something very close to that is already embedded in the structure of Model Rules 1.1, 1.4, and 1.6, and their state counterparts. &lt;em&gt;Heppner&lt;/em&gt; did not create that duty, but it did make the duty impossible to ignore.&lt;/p&gt;
&lt;h2&gt;A brief recap of what &lt;em&gt;Heppner&lt;/em&gt; did&lt;/h2&gt;
&lt;p&gt;I covered the decision in detail in &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;this prior post&lt;/a&gt;, so I will keep this short. Bradley Heppner, a criminal defendant, used the consumer version of Claude to analyze his legal exposure and develop defense theories after receiving a grand jury subpoena and learning he was a target of a federal investigation. He did this on his own, without his lawyers&#39; knowledge or direction. Judge Rakoff of the S.D.N.Y. held that the resulting documents were protected by neither the attorney-client privilege nor the work product doctrine — because Claude is not a lawyer, because Anthropic&#39;s consumer terms did not support a reasonable expectation of confidentiality, and because counsel had not directed the AI use.&lt;/p&gt;
&lt;p&gt;Two things from the opinion matter for this post. First, Judge Rakoff observed that had counsel &lt;em&gt;directed&lt;/em&gt; Heppner to use Claude, the tool &amp;quot;might arguably be said to have functioned in a manner akin to a highly trained professional who may act as a lawyer&#39;s agent within the protection of the attorney-client privilege&amp;quot; — a reference to the &lt;a href=&quot;https://www.courtlistener.com/opinion/265578/united-states-v-kovel/&quot;&gt;&lt;em&gt;Kovel&lt;/em&gt;&lt;/a&gt; doctrine. That dictum rewards attorney supervision and penalizes its absence. Second, the privilege was lost in part because Heppner&#39;s lawyers never told him — one way or the other — anything about using AI tools in connection with his case.&lt;/p&gt;
&lt;p&gt;The NYSBA&#39;s &lt;a href=&quot;https://nysba.org/loose-ai-prompts-sink-ships-how-heppner-shook-the-legal-community/&quot;&gt;post-&lt;em&gt;Heppner&lt;/em&gt; commentary&lt;/a&gt; drew the practical conclusion quickly: attorneys should &amp;quot;include robust disclaimers and warnings in engagement letters and email signatures alerting clients to the risks of using AI platforms in connection with their legal matters.&amp;quot; That is a reasonable starting point. But I think the duty runs deeper than engagement-letter boilerplate, and that existing ethics rules already require it.&lt;/p&gt;
&lt;h2&gt;The rules that get you there&lt;/h2&gt;
&lt;p&gt;Three Model Rules, read together, create an affirmative obligation to advise clients about AI-related privilege risks — even though none of them mentions AI by name.&lt;/p&gt;
&lt;h3&gt;Competence: Rule 1.1&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/rule_1_1_competence/&quot;&gt;Model Rule 1.1&lt;/a&gt; requires lawyers to provide competent representation, defined as &amp;quot;the legal knowledge, skill, thoroughness and preparation reasonably necessary for the representation.&amp;quot; Since 2012, &lt;a href=&quot;https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/rule_1_1_competence/comment_on_rule_1_1/&quot;&gt;Comment 8&lt;/a&gt; has specified that competence includes keeping &amp;quot;abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology.&amp;quot; Forty states have now &lt;a href=&quot;https://www.americanbar.org/groups/law_practice/publications/techreport/2024/duty-of-tech-competence/&quot;&gt;adopted this language&lt;/a&gt; or its equivalent.&lt;/p&gt;
&lt;p&gt;After &lt;em&gt;Heppner&lt;/em&gt;, the &amp;quot;relevant technology&amp;quot; a competent lawyer must understand includes consumer AI tools — not how to use them, but how they handle data and what the legal consequences of client use might be. A lawyer who does not know that consumer chatbot terms permit the provider to retain, train on, and disclose user inputs is missing knowledge that is now directly relevant to protecting the privilege. The duty of competence is not limited to a lawyer&#39;s own work product. It encompasses the &amp;quot;thoroughness and preparation&amp;quot; needed to protect the attorney-client relationship from erosion by foreseeable client conduct.&lt;/p&gt;
&lt;h3&gt;Communication: Rule 1.4&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/rule_1_4_communications/&quot;&gt;Model Rule 1.4(b)&lt;/a&gt; requires that a lawyer &amp;quot;explain a matter to the extent reasonably necessary to permit the client to make informed decisions regarding the representation.&amp;quot; This is generally understood to encompass not just the substance of legal advice but the conditions under which the privilege protecting it might be forfeited. A client who does not know that pasting counsel&#39;s memorandum into ChatGPT may destroy the privilege over that memorandum has not been equipped to make an informed decision about managing privileged information.&lt;/p&gt;
&lt;p&gt;The critical feature of Rule 1.4 is that it operates &lt;em&gt;prospectively&lt;/em&gt;. The duty to communicate is a duty to give clients the information they need before they act — not a post-hoc damage-control obligation. After &lt;em&gt;Heppner&lt;/em&gt;, the relevant information includes the fact that consumer AI use can waive the privilege.&lt;/p&gt;
&lt;h3&gt;Confidentiality: Rule 1.6&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/rule_1_6_confidentiality_of_information/&quot;&gt;Model Rule 1.6(c)&lt;/a&gt; provides that a lawyer &amp;quot;shall make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client.&amp;quot; The operative word is &amp;quot;reasonable,&amp;quot; and what counts as reasonable changes as risks become known.&lt;/p&gt;
&lt;p&gt;State bars have interpreted this provision to require affirmative steps — not just reactive ones — when digital communications create confidentiality risks. The principle is not new; all that is new is the specific threat: a client&#39;s use of a consumer AI platform is precisely the kind of inadvertent disclosure that Rule 1.6(c) was designed to address.&lt;/p&gt;
&lt;h3&gt;The state-level picture&lt;/h3&gt;
&lt;p&gt;The ABA&#39;s &lt;a href=&quot;https://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/ethics-opinions/aba-formal-opinion-512.pdf&quot;&gt;Formal Opinion 512&lt;/a&gt;, issued in July 2024, was the first comprehensive ABA guidance on generative AI in legal practice. It addressed competence, confidentiality, communication, candor, supervisory duties, and fees — all through the lens of existing Model Rules applied to AI. Formal Opinion 512 focused primarily on a lawyer&#39;s &lt;em&gt;own&lt;/em&gt; use of AI tools, but its analysis of the confidentiality obligations under Rules 1.6 and 1.4 applies with equal force when the risk comes from the client&#39;s conduct rather than the lawyer&#39;s.&lt;/p&gt;
&lt;p&gt;The New York City Bar&#39;s &lt;a href=&quot;https://www.nycbar.org/reports/formal-opinion-2024-5-generative-ai-in-the-practice-of-law/&quot;&gt;Formal Opinion 2024-5&lt;/a&gt; addressed generative AI in legal practice directly, and &lt;a href=&quot;https://www.nycbar.org/reports/formal-opinion-2025-6-ethical-issues-affecting-use-of-ai-to-record-transcribe-and-summarize-conversations-with-clients/&quot;&gt;Formal Opinion 2025-6&lt;/a&gt; extended the analysis to AI tools used to record and transcribe client conversations — a context in which the duty to counsel clients about confidentiality implications is made explicit. California&#39;s State Bar has published &lt;a href=&quot;https://www.calbar.ca.gov/Portals/0/documents/ethics/Generative-AI-Practical-Guidance.pdf&quot;&gt;practical guidance on generative AI&lt;/a&gt; grounded in the same competence and confidentiality obligations.&lt;/p&gt;
&lt;p&gt;None of these authorities squarely addresses the specific scenario &lt;em&gt;Heppner&lt;/em&gt; presented: a client, acting on his own, feeding privileged materials into a consumer chatbot. But they establish the framework within which that scenario falls. If a lawyer has a duty of technological competence that includes understanding AI data handling, a duty to communicate information necessary for informed decisions about the representation, and a duty to take reasonable steps to prevent inadvertent disclosure — then the obligation to warn a client about the privilege risks of consumer AI use follows from the conjunction of all three.&lt;/p&gt;
&lt;h2&gt;What &amp;quot;reasonable&amp;quot; looks like&lt;/h2&gt;
&lt;p&gt;Not every representation carries the same risk. The obligation to advise clients about AI-related privilege risks should be calibrated — as professional duties always are — to the circumstances.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The nature of the matter.&lt;/strong&gt; A client facing a federal investigation, complex litigation, or a regulatory proceeding is more likely to receive extensive privileged communications and more acutely harmed by their disclosure. In high-stakes representations, the duty to counsel clients about AI risks should be treated as near-mandatory and documented. Routine advisory work still carries the obligation, but its urgency is proportional to the exposure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The sophistication of the client.&lt;/strong&gt; Sophisticated institutional clients with in-house counsel may understand the risk without detailed instruction. Individual clients, small business owners, and people facing their first serious legal proceeding probably do not. &lt;em&gt;Heppner&lt;/em&gt; illustrates the gap precisely: the defendant was fluent enough to use Claude effectively but apparently had no appreciation of the legal consequences. Technological fluency and legal sophistication are not the same thing, and lawyers should resist treating them as interchangeable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The attorney&#39;s reasonable belief about client conduct.&lt;/strong&gt; A lawyer who knows or should know that a client is likely to use AI tools in connection with the matter — because the client has mentioned doing so, because the client works in a tech-forward industry, or simply because generative AI has become most people&#39;s first tool for understanding complex documents — bears a heightened responsibility to address the risk explicitly. This is not speculative. Consumer AI adoption has reached the point where assuming a client &lt;em&gt;will not&lt;/em&gt; use these tools requires more justification than assuming they will.&lt;/p&gt;
&lt;p&gt;These factors interact. A sophisticated client in a high-stakes criminal matter presents a different risk profile than a sophisticated client in a routine transaction. An unsophisticated client in any matter of consequence probably requires explicit, plain-language AI counseling as a baseline.&lt;/p&gt;
&lt;h2&gt;The structural remedy worth considering&lt;/h2&gt;
&lt;p&gt;Warning clients not to use consumer AI to understand their legal matters is, as a practical matter, unlikely to be fully effective. The impulse that drove Heppner to Claude is deeply human: complex legal advice is hard to understand, and AI tools offer an immediately accessible way to work through it. Telling clients not to do something genuinely useful — without offering an alternative — is an instruction destined to be ignored.&lt;/p&gt;
&lt;p&gt;The more constructive path is to give clients a safe way to do what they are going to do anyway. Enterprise-grade AI deployments — tools operating under commercial terms that &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;contractually prohibit&lt;/a&gt; the provider from retaining or training on user inputs — can be configured within a firm-controlled environment with appropriate confidentiality protections. A client who uses a firm-provided, privilege-preserving AI tool to work through counsel&#39;s advice is in a fundamentally different position than a client who pastes that advice into a consumer chatbot governed by terms that reserve broad data-use rights.&lt;/p&gt;
&lt;p&gt;Judge Rakoff&#39;s &lt;em&gt;Kovel&lt;/em&gt; dictum points in this direction. The court distinguished between unsupervised client use of a public AI platform and a hypothetical in which counsel directed the AI use. A firm-provided, counsel-supervised AI environment — deployed under commercial terms, subject to confidentiality agreements, and offered as part of the representation — positions the tool more like the &lt;em&gt;Kovel&lt;/em&gt; professional the court described than the public chatbot it rejected. The privilege analysis is not guaranteed, but the structural argument is considerably stronger.&lt;/p&gt;
&lt;p&gt;This is not a small undertaking, and I do not suggest it is costless. But the alternative — relying on engagement-letter warnings while clients continue to use consumer AI tools unsupervised — is a posture that grows harder to defend as the risk becomes more widely known.&lt;/p&gt;
&lt;h2&gt;Where this leaves practicing lawyers&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Heppner&lt;/em&gt; did not create a new professional obligation. What it did was train a spotlight on one that already existed. The duty of competence requires understanding how consumer AI tools handle data. The duty of communication requires informing clients about risks to the privilege before those risks materialize. The duty of confidentiality requires reasonable efforts to prevent inadvertent disclosure. Together, these rules establish an obligation — variable in its intensity, sensitive to context, but real — to advise clients about the privilege risks of consumer AI use.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This post draws on the &lt;a href=&quot;https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/&quot;&gt;ABA Model Rules of Professional Conduct&lt;/a&gt;, &lt;a href=&quot;https://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/ethics-opinions/aba-formal-opinion-512.pdf&quot;&gt;ABA Formal Opinion 512&lt;/a&gt;, the New York City Bar&#39;s Formal Opinions &lt;a href=&quot;https://www.nycbar.org/reports/formal-opinion-2024-5-generative-ai-in-the-practice-of-law/&quot;&gt;2024-5&lt;/a&gt; and &lt;a href=&quot;https://www.nycbar.org/reports/formal-opinion-2025-6-ethical-issues-affecting-use-of-ai-to-record-transcribe-and-summarize-conversations-with-clients/&quot;&gt;2025-6&lt;/a&gt;, the NYSBA&#39;s &lt;a href=&quot;https://nysba.org/loose-ai-prompts-sink-ships-how-heppner-shook-the-legal-community/&quot;&gt;post-Heppner commentary&lt;/a&gt;, and Judge Rakoff&#39;s &lt;a href=&quot;https://www.courtlistener.com/docket/71872024/27/united-states-v-heppner/&quot;&gt;written opinion&lt;/a&gt; in United States v. Heppner. The California State Bar&#39;s &lt;a href=&quot;https://www.calbar.ca.gov/Portals/0/documents/ethics/Generative-AI-Practical-Guidance.pdf&quot;&gt;Generative AI Practical Guidance&lt;/a&gt; provides additional state-level context. The consumer-versus-commercial data-handling comparison referenced throughout is detailed in a &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;prior post&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The API Is Not a Compliance Strategy</title>
    <link href="https://davidkemp.ai/blog/the-api-is-not-a-compliance-strategy/" />
    <updated>2026-03-23T00:00:00.000Z</updated>
    <id>https://davidkemp.ai/blog/the-api-is-not-a-compliance-strategy/</id>
    <content type="html">&lt;p&gt;In my &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;last post&lt;/a&gt;, I walked through the consumer-versus-commercial divide in how major LLM providers handle data — and why that divide carries real legal consequences after the Southern District of New York&#39;s decision in &lt;a href=&quot;https://www.courtlistener.com/docket/71872024/united-states-v-heppner/&quot;&gt;&lt;em&gt;United States v. Heppner&lt;/em&gt;&lt;/a&gt;. The takeaway was that consumer AI products operate under terms that were not designed with legal privilege, confidentiality, or regulatory compliance in mind.&lt;/p&gt;
&lt;p&gt;A reasonable follow-up question is: &lt;em&gt;What about the API?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If the consumer chatbot is the problem, the thinking goes, then switching to API access should be the solution. And there is something to that. API tiers offered by OpenAI, Anthropic, and Google operate under fundamentally different data-handling regimes than their consumer counterparts — regimes that are, by almost every measure, more protective of user data. But &amp;quot;more protective&amp;quot; is not the same thing as &amp;quot;compliant,&amp;quot; and the distinction matters more than many organizations seem to realize.&lt;/p&gt;
&lt;h2&gt;What the API actually changes&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;previous post&lt;/a&gt; compared consumer and commercial tiers in detail for Anthropic&#39;s Claude. The same structural divide exists across providers, and the API sits squarely on the commercial side. Here is what that means in practice.&lt;/p&gt;
&lt;p&gt;Anthropic&#39;s commercial API retains input and output logs for seven days — far shorter than the consumer tier&#39;s retention windows — and does not use customer content for model training. Enterprise accounts can negotiate Zero Data Retention, under which inputs and outputs are processed in real time and not stored at all. OpenAI&#39;s API retains data for 30 days for abuse monitoring but does not use it for model training, and offers Zero Data Retention for eligible endpoints. Google&#39;s Vertex AI operates under a Cloud Data Processing Addendum with contractually defined retention and no training use. In each case, the API provider acts as a data processor rather than a data controller, meaning the customer — not the provider — determines the purposes and means of processing.&lt;/p&gt;
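&lt;p&gt;To make the comparison concrete, the commercial-tier defaults described above can be restated as a small lookup of the kind a vendor due-diligence checklist might start from. This is a sketch of my own, not any provider&#39;s documentation: the values simply restate this post&#39;s summary and should be re-verified against each provider&#39;s current terms before anyone relies on them.&lt;/p&gt;

```python
# Commercial-tier defaults as summarized in this post (a sketch, not
# vendor documentation; verify against each provider's current terms).
COMMERCIAL_TIER_DEFAULTS = {
    "anthropic_api": {"retention_days": 7, "trains_on_content": False},
    "openai_api": {"retention_days": 30, "trains_on_content": False},
    # Vertex AI retention is contractually defined under the Cloud Data
    # Processing Addendum, so no fixed default number is listed here.
    "google_vertex": {"retention_days": None, "trains_on_content": False},
}

def needs_contract_review(policy):
    """Flag any posture that trains on customer content or lacks a
    fixed retention window, so counsel reviews the governing agreement."""
    return policy["trains_on_content"] or policy["retention_days"] is None

flagged = sorted(name for name, policy in COMMERCIAL_TIER_DEFAULTS.items()
                 if needs_contract_review(policy))
print(flagged)  # ['google_vertex']
```

&lt;p&gt;Nothing in that table is legally operative. The point is narrower: provider defaults are facts to be checked and documented, not a compliance conclusion.&lt;/p&gt;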
&lt;p&gt;These are meaningful differences. A consumer chatbot conversation may be retained for months or years, used to train future models, and governed by a privacy policy the user never read. An API call, properly configured, may leave no trace on the provider&#39;s systems at all. For anyone whose data-handling concerns begin and end with &amp;quot;I don&#39;t want my inputs in someone else&#39;s training set,&amp;quot; the API is a substantial improvement.&lt;/p&gt;
&lt;p&gt;But regulatory compliance does not begin and end there.&lt;/p&gt;
&lt;h2&gt;Why the API is not enough&lt;/h2&gt;
&lt;p&gt;Every major regulatory framework governing sensitive data — FERPA, HIPAA, state student-privacy laws, professional-conduct rules — imposes obligations that go well beyond what the API&#39;s data-handling defaults can address. The API solves one problem (provider-side data retention and training) while leaving most of the compliance architecture untouched.&lt;/p&gt;
&lt;p&gt;Consider what a framework like HIPAA actually requires. A covered entity processing protected health information through an API must execute a Business Associate Agreement with the provider. That BAA must specify permissible uses and disclosures, require the provider to implement administrative, physical, and technical safeguards, and establish breach-notification obligations. The API&#39;s zero-retention default is a helpful technical control, but it does not substitute for the BAA itself. And the BAA, once signed, typically imposes configuration requirements — specific endpoints, disabled features, audit logging — that the organization must affirmatively implement and maintain.&lt;/p&gt;
&lt;p&gt;FERPA presents a parallel structure. An educational institution using an API to process student education records must establish that the provider qualifies under the &amp;quot;school official&amp;quot; exception, which requires a written agreement specifying the provider&#39;s function, its relationship to the institution&#39;s use of the data, and the institution&#39;s direct control over the data&#39;s use. The API&#39;s default against training on customer data is necessary but not sufficient — the institution still needs the agreement, the access controls, and the governance to ensure that student records do not flow into the API in ways the agreement does not contemplate.&lt;/p&gt;
&lt;p&gt;The pattern repeats across regulatory contexts. State biometric-privacy statutes require informed consent and retention schedules that no API default can satisfy. Professional-conduct rules governing lawyer confidentiality — sharpened considerably by &lt;em&gt;Heppner&lt;/em&gt; — demand not just favorable vendor terms but documented due diligence, competence in evaluating the technology, and ongoing supervisory obligations. An API key does not discharge any of those duties.&lt;/p&gt;
&lt;h2&gt;The architectural gap&lt;/h2&gt;
&lt;p&gt;There is a subtler problem that the &amp;quot;just use the API&amp;quot; approach tends to obscure. When an organization integrates an LLM through an API, the API handles the model-inference layer: data goes in, a response comes back, and the provider&#39;s data-handling policies govern what happens on their end. But most real-world deployments involve considerably more than a single API call.&lt;/p&gt;
&lt;p&gt;Data passes through preprocessing pipelines, prompt templates, logging systems, vector databases, retrieval-augmented generation stores, and output caches — all of which sit on the customer&#39;s side of the line. The API provider&#39;s zero-retention commitment says nothing about what happens in those layers. An organization can use a zero-retention API and still retain every input and output indefinitely in its own infrastructure, expose sensitive data through poorly secured retrieval stores, or inadvertently log protected information in application-level monitoring.&lt;/p&gt;
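&lt;p&gt;The gap is easy to reproduce in a few lines. The sketch below is hypothetical (the model call is a stub, and the function names are my own), but it shows how an application layer can quietly retain a raw prompt in its own logs even when the provider retains nothing, and the kind of redaction discipline that narrows that gap.&lt;/p&gt;

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

# Hypothetical identifier pattern; a real deployment needs far broader
# redaction (names, account numbers, document text, and so on).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    """Mask obvious identifiers before anything reaches a log line."""
    return SSN_RE.sub("[REDACTED]", text)

def call_model(prompt):
    """Stub for a provider API call; assume zero retention upstream."""
    return "response of %d chars" % len(prompt)

def handle_request_naive(prompt):
    # The raw prompt lands in application logs, which the provider's
    # zero-retention commitment does not reach.
    log.info("prompt=%s", prompt)
    return call_model(prompt)

def handle_request_careful(prompt):
    # Log only redacted text plus metadata, never the raw input.
    log.info("prompt=%s len=%d", redact(prompt), len(prompt))
    return call_model(prompt)
```

&lt;p&gt;The naive handler is the architectural gap in miniature: the provider layer is clean while the customer layer retains everything.&lt;/p&gt;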
&lt;p&gt;This is the architectural gap that a provider-side compliance posture cannot close. The API governs data handling at the model layer. Regulatory compliance governs data handling end to end.&lt;/p&gt;
&lt;h2&gt;What &amp;quot;more protective&amp;quot; actually means&lt;/h2&gt;
&lt;p&gt;None of this is an argument against using the API. The data-handling improvements are real, and for many use cases they represent the minimum viable starting point for responsible deployment. An organization that uses the consumer chatbot for work involving sensitive data has a serious problem. An organization that uses the API has a less serious problem — but it still has a problem if the API is the beginning and end of its compliance strategy.&lt;/p&gt;
&lt;p&gt;The useful framing is not &amp;quot;consumer versus API&amp;quot; as a binary compliance decision. It is &amp;quot;API as a necessary but insufficient component of a compliance architecture.&amp;quot; The API provides a defensible data-handling posture at the provider layer. Everything else — the agreements, the access controls, the internal data governance, the training, the monitoring, the documentation — remains the organization&#39;s responsibility.&lt;/p&gt;
&lt;p&gt;For institutions and professionals operating under regulatory constraints, the practical question is not whether to use the API. It is whether you have built the rest of the compliance architecture around it — and whether you can demonstrate that you have if someone asks.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Provider-specific data-handling policies referenced in this post draw on the same sources cited in the &lt;a href=&quot;/2026-03-20-your-ai-conversations-are-not-confidential/&quot;&gt;previous post&lt;/a&gt;, supplemented by Anthropic&#39;s &lt;a href=&quot;https://privacy.claude.com/en/articles/10458704-how-does-anthropic-protect-the-personal-data-of-claude-users&quot;&gt;Privacy Center&lt;/a&gt;, OpenAI&#39;s &lt;a href=&quot;https://developers.openai.com/api/docs/guides/your-data/&quot;&gt;API data usage documentation&lt;/a&gt;, and Google&#39;s &lt;a href=&quot;https://docs.cloud.google.com/gemini/docs/discover/data-governance&quot;&gt;Vertex AI data governance documentation&lt;/a&gt;. Compliance obligations vary by jurisdiction, regulatory framework, and organizational context. Consult qualified counsel for guidance specific to your situation.&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Your AI Conversations Are Not Confidential — And a Federal Court Just Said So</title>
    <link href="https://davidkemp.ai/blog/your-ai-conversations-are-not-confidential/" />
    <updated>2026-03-20T00:00:00.000Z</updated>
    <id>https://davidkemp.ai/blog/your-ai-conversations-are-not-confidential/</id>
    <content type="html">&lt;p&gt;On February 10, 2026, Judge Jed Rakoff of the Southern District of New York ruled from the bench in &lt;a href=&quot;https://www.courtlistener.com/docket/71872024/united-states-v-heppner/&quot;&gt;&lt;em&gt;United States v. Heppner&lt;/em&gt;&lt;/a&gt; that documents a criminal defendant generated using the consumer version of Anthropic&#39;s Claude were protected by neither the attorney-client privilege nor the work product doctrine. A week later, he issued a &lt;a href=&quot;https://www.courtlistener.com/docket/71872024/27/united-states-v-heppner/&quot;&gt;written opinion&lt;/a&gt; calling it a matter of &amp;quot;nationwide&amp;quot; first impression.&lt;/p&gt;
&lt;p&gt;I think parts of the court&#39;s reasoning are wrong — or at least underdeveloped — in ways that matter. But the opinion landed on a real problem. Lawyers, clients, and judges are making consequential decisions about AI tools without fully understanding how those tools handle data. &lt;em&gt;Heppner&lt;/em&gt; is worth examining less for the doctrine it announces than for the knowledge gap it reveals.&lt;/p&gt;
&lt;p&gt;This post lays out what happened in &lt;em&gt;Heppner&lt;/em&gt;, explains what I think the opinion gets right and wrong, and then walks through what Anthropic&#39;s data-handling policies actually say across Claude&#39;s consumer and commercial tiers — the very policies the court relied on but did not examine closely. The same structural divide exists across every major LLM provider, and the legal implications extend well beyond this one case.&lt;/p&gt;
&lt;h2&gt;What &lt;em&gt;Heppner&lt;/em&gt; held&lt;/h2&gt;
&lt;p&gt;Bradley Heppner, the founder and former CEO of Beneficient, a financial services company, faces a five-count federal indictment for securities fraud, wire fraud, conspiracy, making false statements to auditors, and falsification of records — charges arising from an alleged scheme to defraud investors in the publicly traded company GWG Holdings through self-dealing transactions involving Beneficient. After receiving a grand jury subpoena and learning he was a target of the investigation, but before his November 2025 arrest, Heppner used the consumer version of Claude to analyze his legal exposure and develop defense theories. When federal agents executed a search warrant at his home, they seized numerous documents and electronic devices. Defense counsel later identified approximately thirty-one of the seized materials as AI-generated documents. The government moved for a ruling that the documents were not privileged; Heppner resisted, invoking attorney-client privilege and the work product doctrine.&lt;/p&gt;
&lt;p&gt;Judge Rakoff rejected both claims on multiple grounds. On privilege, the court articulated three independent reasons for denial:&lt;/p&gt;
&lt;p&gt;First, Claude is not an attorney. It has no law license, owes no fiduciary duties, and cannot form an attorney-client relationship. Privilege requires a &amp;quot;trusting human relationship&amp;quot; with &amp;quot;a licensed professional&amp;quot; — and an AI tool is not one.&lt;/p&gt;
&lt;p&gt;Second, Heppner had no reasonable expectation of confidentiality. The court pointed to Anthropic&#39;s privacy policy, which disclosed that user inputs and outputs could be used for model training and disclosed to third parties, including government authorities.&lt;/p&gt;
&lt;p&gt;Third — which the court acknowledged &amp;quot;perhaps presents a closer call&amp;quot; — Heppner did not communicate with Claude for the purpose of obtaining legal advice from an attorney. Claude&#39;s terms of service disclaim providing legal advice, and Heppner&#39;s lawyers neither directed nor supervised his use of the tool. The court noted that had counsel directed Heppner to use Claude, it might have &amp;quot;functioned in a manner akin to a highly trained professional&amp;quot; who could act within the privilege under the &lt;a href=&quot;https://law.justia.com/cases/federal/appellate-courts/F2/296/918/131265/&quot;&gt;&lt;em&gt;Kovel&lt;/em&gt;&lt;/a&gt; doctrine — but because Heppner acted on his own, the question was whether he intended to obtain legal advice &lt;em&gt;from Claude&lt;/em&gt;, and Claude disclaims providing it.&lt;/p&gt;
&lt;p&gt;On work product, defense counsel conceded that Heppner created the documents &amp;quot;of his own volition&amp;quot; and that the legal team &amp;quot;did not direct&amp;quot; him to use Claude. The court held that materials not prepared by or at the behest of counsel do not qualify as work product — expressly disagreeing with &lt;a href=&quot;https://www.courtlistener.com/opinion/74536637/shih-v-petal-card-inc/&quot;&gt;&lt;em&gt;Shih v. Petal Card, Inc.&lt;/em&gt;&lt;/a&gt;, 565 F. Supp. 3d 557 (S.D.N.Y. 2021), which recognized work product protection for a party&#39;s own litigation-preparation materials regardless of attorney direction.&lt;/p&gt;
&lt;h2&gt;Where I think the reasoning falters&lt;/h2&gt;
&lt;p&gt;The first and third grounds — no attorney-client relationship, no communication for the purpose of obtaining legal advice from an attorney — are each independently sufficient to defeat the privilege claim. An AI tool is not a lawyer, and Heppner was not seeking legal advice from an attorney when he typed queries into Claude. Full stop.&lt;/p&gt;
&lt;p&gt;The work product holding is correct on these facts — defense counsel conceded that Heppner acted without direction — but the court&#39;s reasoning adopted a narrower view of the doctrine than the weight of authority supports. The traditional Second Circuit formulation protects &amp;quot;materials prepared by or at the behest of counsel in anticipation of litigation or for trial,&amp;quot; but the civil analog, Fed. R. Civ. P. 26(b)(3)(A), protects materials prepared &amp;quot;by or for another party or its representative&amp;quot; — language broad enough to cover a party acting on its own initiative. The court&#39;s express rejection of &lt;em&gt;Shih&lt;/em&gt; on this point signals that the question remains open, and future courts should not treat &lt;em&gt;Heppner&lt;/em&gt;&#39;s narrow formulation as settled.&lt;/p&gt;
&lt;p&gt;The confidentiality analysis in the second ground is where things get shaky, and it is the part of the opinion that has generated the most commentary — and the most anxiety.&lt;/p&gt;
&lt;p&gt;Judge Rakoff treated Anthropic&#39;s consumer privacy policy as establishing that Heppner could have &amp;quot;no reasonable expectation of confidentiality&amp;quot; in his AI conversations. But the court&#39;s analysis has significant gaps. The opinion cited an archived version of Anthropic&#39;s privacy policy dated February 2025 — a version that predated the August 2025 consumer terms update giving users the ability to control model training. Because Heppner used Claude in 2025 before his November arrest, his conversations may have been governed by either the old or the new terms depending on when they occurred. The court never asked what version of the terms governed Heppner&#39;s use, whether he had opted out of training, or what his actual settings were. It treated the broadest possible reading of the consumer terms as conclusive without examining what the user actually agreed to or configured.&lt;/p&gt;
&lt;p&gt;This matters because the confidentiality holding — which was not necessary to the result — is the part of the opinion most likely to be cited broadly. And it rests on an incomplete factual record. As the policy comparison below demonstrates, Anthropic&#39;s consumer terms create meaningfully different data-handling regimes depending on whether a user has opted in or out of model training. The court did not grapple with that distinction.&lt;/p&gt;
&lt;p&gt;There is also a subtler problem. The opinion conflates a platform&#39;s contractual &lt;em&gt;permission&lt;/em&gt; to use data with the practical &lt;em&gt;likelihood&lt;/em&gt; that any human will ever see it. Consumer AI privacy policies reserve broad rights, but the actual probability of a specific conversation being reviewed by a person — absent a safety flag or legal process — is vanishingly low. Whether that distinction should matter for privilege purposes is a genuinely hard question. &lt;em&gt;Heppner&lt;/em&gt; does not engage with it.&lt;/p&gt;
&lt;p&gt;None of this means the opinion is unimportant. It is the first federal decision to address AI and privilege head-on, and it will shape how courts and litigants think about these issues going forward. But its broadest holding — that consumer AI use necessarily destroys confidentiality — rests on reasoning that future courts should scrutinize carefully.&lt;/p&gt;
&lt;h2&gt;What the case gets right: a knowledge problem&lt;/h2&gt;
&lt;p&gt;Where &lt;em&gt;Heppner&lt;/em&gt; is most valuable is as a signal. Whatever one thinks of the doctrinal analysis, the case exposes a widespread failure to understand how consumer AI tools handle data. Heppner apparently did not know — or did not care — that his AI conversations were governed by terms that reserved broad data-use rights for the platform provider. His lawyers did not anticipate that their client&#39;s independent AI use would create a discovery problem. And the court itself did not dig into the specific settings or tier the defendant used.&lt;/p&gt;
&lt;p&gt;This is not an isolated failure. Most lawyers I talk to cannot articulate the difference between a consumer and enterprise AI deployment. Most clients do not read privacy policies. And most courts have not yet had to think carefully about how AI data handling intersects with privilege doctrine.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Heppner&lt;/em&gt; should change that — not because its reasoning is airtight, but because it demonstrates what happens when no one in the room understands the technology well enough to ask the right questions.&lt;/p&gt;
&lt;h2&gt;What Anthropic&#39;s policies actually say&lt;/h2&gt;
&lt;p&gt;Since &lt;em&gt;Heppner&lt;/em&gt; turned on Anthropic&#39;s terms, this is the right place to start. I went through Anthropic&#39;s published policies — the &lt;a href=&quot;https://www.anthropic.com/terms&quot;&gt;Consumer Terms of Service&lt;/a&gt;, the &lt;a href=&quot;https://www.anthropic.com/news/expanded-legal-protections-api-improvements&quot;&gt;Commercial Terms of Service&lt;/a&gt;, the &lt;a href=&quot;https://www.anthropic.com/news/updates-to-our-consumer-terms&quot;&gt;Privacy Policy&lt;/a&gt;, and the &lt;a href=&quot;https://privacy.claude.com/en/articles/10458704-how-does-anthropic-protect-the-personal-data-of-claude-users&quot;&gt;Privacy Center&lt;/a&gt; — to compare what Claude&#39;s consumer and commercial tiers actually promise. What follows is a synthesis of that research.&lt;/p&gt;
&lt;h3&gt;The core divide: consumer terms vs. commercial terms&lt;/h3&gt;
&lt;p&gt;Anthropic&#39;s policies split along two fundamental lines: &lt;strong&gt;Consumer Terms&lt;/strong&gt; (Free, Pro, Max) and &lt;strong&gt;Commercial Terms&lt;/strong&gt; (Team, Enterprise, API, Education, Government). This distinction — not the price paid — determines virtually every data right the user holds. The Commercial Terms state explicitly: &amp;quot;Services under these Terms are not for consumer use. Our consumer offerings (e.g., Claude.ai) are governed by our Consumer Terms of Service instead.&amp;quot;&lt;/p&gt;
&lt;p&gt;This means a Pro or Max subscriber paying $20 or $100 per month operates under the same legal framework as a free user. Paying more buys additional model access and features, but it does not change how Anthropic treats your data.&lt;/p&gt;
&lt;h3&gt;Model training: the sharpest divide&lt;/h3&gt;
&lt;p&gt;For &lt;strong&gt;Free, Pro, and Max&lt;/strong&gt; users, Anthropic may use conversations to train its models. In &lt;a href=&quot;https://www.anthropic.com/news/updates-to-our-consumer-terms&quot;&gt;August 2025&lt;/a&gt;, Anthropic updated its consumer terms to give users the ability to control whether their data would be used for model training. Existing users had until October 8, 2025, to accept the new terms and select their preference. The operative contractual language states that Anthropic may use user materials for model training &amp;quot;unless users opt out&amp;quot; — placing the default in Anthropic&#39;s favor — though Anthropic&#39;s own blog post announcing the change described it as &amp;quot;allowing users on Claude Free, Pro, and Max plans to opt-in for data usage,&amp;quot; framing the default in the opposite direction. The tension between the legal text and the public announcement underscores the difficulty of determining any individual user&#39;s training status based on the terms alone. Opting out remains available through Claude&#39;s settings.&lt;/p&gt;
&lt;p&gt;For &lt;strong&gt;Team, Enterprise, API, and Education/Government&lt;/strong&gt; users, Anthropic contractually prohibits itself from training on customer content. The Commercial Terms are unambiguous: &amp;quot;Anthropic may not train models on Customer Content from Services&amp;quot; — with no exceptions and no reliance on user-level toggles.&lt;/p&gt;
&lt;h3&gt;Data retention: a 60× gap&lt;/h3&gt;
&lt;p&gt;Retention periods are directly tied to training status for consumer plans, creating a striking disparity:&lt;/p&gt;
&lt;p&gt;Consumer users who have &lt;strong&gt;opted in&lt;/strong&gt; to training (or failed to opt out) face retention of up to &lt;strong&gt;five years&lt;/strong&gt; for de-identified conversation data. Consumer users who have &lt;strong&gt;opted out&lt;/strong&gt; see their conversations retained for &lt;strong&gt;30 days&lt;/strong&gt; before deletion. In either case, content flagged for safety or policy violations can be retained for up to &lt;strong&gt;seven years&lt;/strong&gt;, regardless of the user&#39;s training preference.&lt;/p&gt;
&lt;p&gt;On the commercial side, &lt;strong&gt;API&lt;/strong&gt; input and output logs are retained for &lt;strong&gt;seven days&lt;/strong&gt;. &lt;strong&gt;Enterprise&lt;/strong&gt; accounts default to &lt;strong&gt;30 days&lt;/strong&gt;, with the option to negotiate &lt;a href=&quot;https://privacy.claude.com/en/articles/8956058-i-have-a-zero-data-retention-agreement-with-anthropic-what-products-does-it-apply-to&quot;&gt;Zero Data Retention&lt;/a&gt; — under which inputs and outputs are processed in real time and not stored at all. No consumer plan, regardless of price, offers true zero retention.&lt;/p&gt;
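&lt;p&gt;The 60× figure in the heading above is nothing more than arithmetic on the two consumer windows:&lt;/p&gt;

```python
# Back-of-the-envelope check of the consumer retention gap described
# above, ignoring leap days.
opted_in_days = 5 * 365   # up to five years of de-identified retention
opted_out_days = 30       # deletion after 30 days
print(round(opted_in_days / opted_out_days))  # 61, roughly the 60x gap
```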
&lt;h3&gt;Data ownership and IP&lt;/h3&gt;
&lt;p&gt;The Commercial Terms contain an unusually strong ownership clause absent from the consumer terms. They provide that the customer &amp;quot;retains all rights to its Inputs, and owns its Outputs,&amp;quot; that &amp;quot;Anthropic disclaims any rights it receives to the Customer Content under these Terms,&amp;quot; and that Anthropic &amp;quot;hereby assigns to Customer its right, title and interest (if any) in and to Outputs.&amp;quot;&lt;/p&gt;
&lt;p&gt;Consumer users have no equivalent contractual assignment. Under the consumer framework, Anthropic holds a license to use inputs and outputs for model improvement unless the user opts out.&lt;/p&gt;
&lt;h3&gt;Data controller vs. data processor&lt;/h3&gt;
&lt;p&gt;This distinction carries significant weight under GDPR and analogous privacy regimes. For &lt;strong&gt;consumer plans&lt;/strong&gt;, Anthropic acts as the &lt;strong&gt;data controller&lt;/strong&gt; — it determines the purposes and means of processing user data. For &lt;strong&gt;Enterprise and API&lt;/strong&gt; accounts, Anthropic functions as a &lt;strong&gt;data processor&lt;/strong&gt; operating under a Data Processing Addendum, with the commercial customer serving as the controller.&lt;/p&gt;
&lt;p&gt;The practical consequence: a consumer user&#39;s data is governed by Anthropic&#39;s privacy choices. An enterprise customer&#39;s data is governed by the customer&#39;s own policies, with Anthropic acting under instruction.&lt;/p&gt;
&lt;h3&gt;Employee access and confidentiality&lt;/h3&gt;
&lt;p&gt;For consumer plans, Anthropic employees may access conversations only if the user explicitly consents via feedback, or if access is required for Usage Policy enforcement — in which case only the Trust &amp;amp; Safety team may view content on a need-to-know basis.&lt;/p&gt;
&lt;p&gt;For commercial plans, customer content is contractually designated as &lt;strong&gt;Confidential Information&lt;/strong&gt; under the Commercial Terms. Anthropic may use it only to exercise its rights under the contract and must protect it with at least the same care it applies to its own confidential information.&lt;/p&gt;
&lt;p&gt;Two further protections — Zero Data Retention and HIPAA Business Associate Agreements — are available exclusively on commercial tiers. Under ZDR, inputs and outputs are not stored; the sole exception is User Safety classifier results retained for Usage Policy enforcement. A BAA imposes specific configuration requirements and excludes certain features (web search, for instance, falls outside BAA coverage). Neither protection is available on any consumer plan at any price point.&lt;/p&gt;
&lt;p&gt;The comparison distills to a structural reality: consumer Claude users — whether free or paying $100 per month — operate under terms that allow Anthropic to train on their data by default, retain it for up to five years, and act as the data controller with broad discretion. Commercial Claude users operate under a contractual regime that prohibits model training, treats their content as confidential information, assigns them ownership of outputs, and offers zero-retention options.&lt;/p&gt;
&lt;h2&gt;The pattern holds across providers&lt;/h2&gt;
&lt;p&gt;Anthropic&#39;s tiered structure is not an outlier. OpenAI&#39;s ChatGPT follows the same pattern. On Free and Plus plans, OpenAI&#39;s &lt;a href=&quot;https://help.openai.com/en/articles/7039943-data-usage-for-consumer-services-faq&quot;&gt;Data Usage for Consumer Services FAQ&lt;/a&gt; states that it &amp;quot;may use&amp;quot; consumer content to improve its models unless the user disables training — while retaining the right to log interactions for safety and abuse monitoring regardless. On &lt;a href=&quot;https://help.openai.com/en/articles/9377311-chatgpt-edu-at-openai&quot;&gt;Edu and Enterprise&lt;/a&gt; plans, OpenAI commits not to train on business data, provides admin-controlled retention windows, and offers &lt;a href=&quot;https://developers.openai.com/api/docs/guides/your-data/&quot;&gt;Zero Data Retention and configurable data residency&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The structural divide is the same: consumer terms grant the provider broad data-use rights with an opt-out toggle; commercial terms prohibit model training by contract and give the customer control over retention, residency, and access. Google&#39;s Gemini, Meta&#39;s Llama-based offerings, and other major LLM providers follow similar patterns. The consumer-versus-commercial distinction is an industry-wide architectural choice, not a quirk of any single provider.&lt;/p&gt;
&lt;p&gt;This matters for the &lt;em&gt;Heppner&lt;/em&gt; analysis because the court&#39;s reasoning — resting on the provider&#39;s privacy policy and terms of service — would apply with equal force to any consumer LLM deployment, not just Claude.&lt;/p&gt;
&lt;h2&gt;What this means going forward&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Heppner&lt;/em&gt; will be cited for the proposition that consumer AI conversations are not confidential. That proposition is probably too broad as stated — it ignores user training preferences, conflates contractual permission with practical disclosure risk, and was not necessary to the holding. But it captures something real: consumer AI platforms operate under terms that were not designed with legal privilege in mind, and users who rely on those platforms for sensitive work are taking risks they may not understand.&lt;/p&gt;
&lt;p&gt;The practical response is not to avoid AI tools. It is to understand what you are agreeing to when you use them — and to recognize that paying for a subscription does not, by itself, change the legal framework governing your data. For lawyers, that means learning the difference between consumer and commercial deployments and advising clients accordingly. For organizations, it means treating AI procurement as a legal risk question, not just an IT question. And for courts, it means doing the factual work that &lt;em&gt;Heppner&lt;/em&gt; did not: examining the specific terms, settings, and tier a user actually employed before concluding that confidentiality has been waived.&lt;/p&gt;
&lt;p&gt;The gap between consumer and commercial AI products is wide, it is well-documented, and it is consistent across every major provider. The problem is not that the information is unavailable. The problem is that almost nobody — lawyers, clients, and judges included — reads it.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;The Anthropic policy comparison in this post draws on Anthropic&#39;s &lt;a href=&quot;https://www.anthropic.com/terms&quot;&gt;Consumer Terms of Service&lt;/a&gt;, &lt;a href=&quot;https://www.anthropic.com/news/expanded-legal-protections-api-improvements&quot;&gt;Commercial Terms announcement&lt;/a&gt;, &lt;a href=&quot;https://www.anthropic.com/news/updates-to-our-consumer-terms&quot;&gt;consumer terms and privacy policy update&lt;/a&gt;, and &lt;a href=&quot;https://privacy.claude.com/en/articles/10458704-how-does-anthropic-protect-the-personal-data-of-claude-users&quot;&gt;Privacy Center&lt;/a&gt;. OpenAI policy references draw on the &lt;a href=&quot;https://help.openai.com/en/articles/7039943-data-usage-for-consumer-services-faq&quot;&gt;Data Usage FAQ&lt;/a&gt;, &lt;a href=&quot;https://developers.openai.com/api/docs/guides/your-data/&quot;&gt;platform documentation&lt;/a&gt;, and &lt;a href=&quot;https://openai.com/policies/row-privacy-policy/&quot;&gt;privacy policy&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
</feed>
