№ 002 · 6 min read

When Your Training Data Becomes Your Liability

The privacy debt hiding inside every fine-tuned model

The pitch is always the same. We have proprietary data. We fine-tune a model on it. The model becomes better than anything our competitors can buy. Competitive moat, locked in.

The pitch is not wrong. Fine-tuning on domain-specific data does produce better models for domain-specific tasks. I have seen it work. I have helped build systems where it worked.

What the pitch leaves out is the liability.

What a model actually remembers

Large language models do not store data the way a database does. They do not have a table called customer_records that you can query or delete. What they have is a set of weights — billions of floating-point numbers — that encode statistical patterns learned from training data.

The problem is that these patterns are not perfectly abstract. Under the right conditions, a model can be prompted to reproduce specific training examples. This is called memorisation, and it is not a bug. It is a feature of how these models learn.

The research on this is consistent. Carlini et al. demonstrated in 2021 that GPT-2 could be prompted to reproduce verbatim text from its training set, including personally identifiable information. Subsequent work has shown that larger models memorise more, not less, and that text repeated in the training set is memorised far more readily. Fine-tuned models are particularly exposed: fine-tuning sets are small, and the process usually passes over them several times, which reinforces exactly the patterns they contain.
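The simplest probe for verbatim memorisation is to give the model the start of a known training record and check whether it continues with the rest. What follows is a minimal sketch, not a full extraction attack. The `generate` callable is a hypothetical stand-in for whatever model call you use, and the character thresholds are illustrative assumptions.

```python
from typing import Callable


def reproduces_suffix(
    generate: Callable[[str], str],
    record: str,
    prefix_chars: int = 50,
    min_match_chars: int = 30,
) -> bool:
    """Return True if the model continues a training-record prefix verbatim.

    `generate` is a hypothetical stand-in for your model call: it takes a
    prompt string and returns the completion string.
    """
    prefix, suffix = record[:prefix_chars], record[prefix_chars:]
    if len(suffix) < min_match_chars:
        return False  # record too short to give a meaningful signal
    completion = generate(prefix)
    # Compare against the first `min_match_chars` characters of the true
    # suffix, ignoring leading whitespace on both sides.
    expected = suffix.lstrip()[:min_match_chars]
    return completion.lstrip().startswith(expected)
```

Run it over a sample of training records and count the hits. A non-zero hit rate on records containing personal data is the finding you want before deployment, not after.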

The three failure modes I see most often

Failure mode 1: Fine-tuning on raw customer data without a privacy review.

This is the most common. An organisation has a corpus of customer support tickets, internal documents, or transaction records. Someone proposes fine-tuning a model on it to improve performance. The proposal goes through a technical review. It does not go through a privacy review. The model ships.

Six months later, a red team exercise — or a curious user — discovers that the model will reproduce customer names, account numbers, or medical information when prompted in specific ways.

Failure mode 2: Treating the model as an anonymised artefact.

The reasoning goes: We trained on customer data, but the model itself is just weights. Weights are not personal data. Therefore, we can share the model freely.

This reasoning is wrong. GDPR Article 4(1) defines personal data as any information relating to a person who can be identified, directly or indirectly, and the Article 29 Working Party (replaced by the EDPB in 2018) consistently read that definition broadly. If a model can be prompted to reproduce personal data, the model is, at minimum, a privacy risk that requires assessment.

Failure mode 3: No data lineage for training sets.

You cannot delete what you cannot find. If your training pipeline does not record which records went into which model version, you cannot respond to a right-to-erasure request in any meaningful way. You cannot demonstrate compliance. You cannot audit.
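Recording which records went into which model version does not need heavy infrastructure to start with. The sketch below builds a per-model manifest of record IDs and checksums. It assumes records are JSON-serialisable dicts with a `record_id` field; that field name, and the manifest layout, are my assumptions for illustration.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Stable SHA-256 over a record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(
    model_version: str, records: list[dict], id_field: str = "record_id"
) -> dict:
    """Map every training record ID to its checksum for one model version."""
    entries = {str(r[id_field]): record_fingerprint(r) for r in records}
    # A digest over the sorted entries identifies the training set as a whole,
    # independent of the order the records were read in.
    dataset_digest = hashlib.sha256(
        json.dumps(entries, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        "model_version": model_version,
        "dataset_digest": dataset_digest,
        "records": entries,
    }
```

Store the manifest alongside the model artefact. The manifest holds IDs and hashes only, so it can outlive the raw data without becoming a second copy of it.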

What privacy-aware ML actually requires

This is not a theoretical framework. These are the controls I implement when building ML pipelines for clients who operate under GDPR.

Before training:

  • Data minimisation audit: does the model actually need this field? Phone numbers, email addresses, and names are almost never necessary for model performance.
  • Consent and legal basis review: under what legal basis was this data collected? Does that basis extend to use for model training?
  • Differential privacy assessment: for high-sensitivity data, consider training with differential privacy guarantees. The performance cost is real but often acceptable.
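The minimisation step can be sketched as an allow-list plus masking of free text. The field names and the two regular expressions are illustrative assumptions, and the patterns are deliberately crude: they will miss names, addresses, and many phone formats. Treat this as a first pass, not as anonymisation.

```python
import re

# Deliberately simple patterns; real pipelines need proper PII detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def minimise(record: dict, allowed_fields: set[str], text_fields: set[str]) -> dict:
    """Keep only allow-listed fields and mask emails and phone numbers in free text."""
    out = {}
    for key, value in record.items():
        if key not in allowed_fields:
            continue  # drop anything the model does not demonstrably need
        if key in text_fields and isinstance(value, str):
            value = EMAIL_RE.sub("[EMAIL]", value)
            value = PHONE_RE.sub("[PHONE]", value)
        out[key] = value
    return out
```

An allow-list is the safer default. A deny-list silently passes every new field someone adds upstream.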

During training:

  • Training data versioning: every training run should record exactly which records were used, with checksums. This is the foundation of any right-to-erasure response.
  • Canary tokens: inject synthetic records with unique identifiers into the training set. If the model reproduces them, you have a memorisation problem.
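  • Per-example gradient clipping: caps the influence any single training example can have on an update, which limits memorisation. Ordinary batch-level clipping does not give this bound; the per-example version, combined with calibrated noise, is the core of DP-SGD.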

After training:

  • Membership inference testing: use membership inference attacks to estimate how reliably an attacker can tell whether a given record was in the training set, which is a practical proxy for memorisation. Tools like ML Privacy Meter make this practical.
  • Red team prompting: systematically attempt to extract training data through adversarial prompts before deployment.
  • Model card with data provenance: document what data was used, under what legal basis, and what privacy controls were applied.
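For a sense of what a membership inference test measures, here is the simplest baseline: a loss-threshold attack, scored by AUC. It assumes you have already computed per-example losses for records you know were in the training set and for held-out records that were not. This is a sketch of the idea, not any particular tool's method.

```python
def membership_auc(member_losses: list[float], nonmember_losses: list[float]) -> float:
    """AUC of a loss-threshold attack.

    Equals the probability that a random training record has lower loss than
    a random held-out record (ties count half). 0.5 means the attacker cannot
    tell members from non-members; values near 1.0 mean heavy leakage.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))
```

The pairwise loop is quadratic, which is fine for a few thousand sampled records. Track the number across model versions: a jump after a fine-tuning run is a signal worth investigating.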
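
# Example: canary token injection
import logging
import secrets

logger = logging.getLogger("canary")

def inject_canary(record: dict, canary_id: str) -> dict:
    """
    Return a copy of a training record with a unique canary token added.
    The mapping is logged so you can test for memorisation later.
    """
    canary_value = f"CANARY-{canary_id}-{secrets.token_hex(8)}"
    tagged = {**record, "_canary": canary_value}
    # Log: canary_id → canary_value → record_id ("record_id" is an assumed key name)
    logger.info("canary %s -> %s -> record %s", canary_id, canary_value, record.get("record_id"))
    return tagged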
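Injection is only half of the control. The other half is checking model outputs against the logged mapping. A minimal sketch, assuming the log has been loaded into a dict of canary ID to canary value and that `outputs` is a list of generated strings from your red team or evaluation runs:

```python
def find_leaked_canaries(outputs: list[str], canary_log: dict[str, str]) -> dict[str, list[int]]:
    """Return {canary_id: [indices of outputs that contain its canary value]}."""
    leaks: dict[str, list[int]] = {}
    for canary_id, canary_value in canary_log.items():
        hits = [i for i, text in enumerate(outputs) if canary_value in text]
        if hits:
            leaks[canary_id] = hits
    return leaks
```

An empty result does not prove the model is clean. It only means these prompts did not surface these canaries.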

The right to erasure problem

This is the one that keeps privacy engineers up at night.

GDPR Article 17 gives data subjects the right, in defined circumstances, to have their personal data erased. The standard response to an erasure request is: delete the record from the database, purge it from backups within the retention period, done.

What do you do when the data is encoded in a model’s weights?

The honest answer is that there is no perfect solution. The practical options are:

  1. Retrain from scratch without the subject’s data. Expensive. Sometimes the only defensible option.
  2. Machine unlearning — techniques that attempt to remove the influence of specific training examples without full retraining. The research is promising; the production-ready tooling is not there yet.
  3. Model versioning with retention policies — maintain model versions tied to training data versions, retire models when the underlying data is subject to erasure requests.

Option 3 is what most organisations end up doing. It requires the data lineage infrastructure I described above. Without it, you cannot even know which model versions are affected by a given erasure request.
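The lookup at the heart of option 3 is small once the lineage exists. This sketch assumes each model version maps to the set of record IDs it was trained on; the in-memory dict and the names are hypothetical, and in practice the mapping would come from wherever your training manifests are stored.

```python
def affected_models(
    lineage: dict[str, set[str]], erased_record_ids: set[str]
) -> dict[str, set[str]]:
    """Return {model_version: erased IDs it was trained on} for every affected version."""
    affected = {}
    for model_version, record_ids in lineage.items():
        overlap = record_ids & erased_record_ids
        if overlap:
            affected[model_version] = overlap
    return affected
```

The output is the retirement list: every version it names needs to be retrained or withdrawn, and the result itself is evidence you can show an auditor.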

The competitive moat argument, revisited

Fine-tuning on proprietary data is still a competitive advantage. I am not arguing against it. I am arguing that the advantage needs to be priced against the liability.

The liability is not just regulatory. It is reputational. It is operational. It is the cost of the red team exercise that finds the problem before your customers do, versus the cost of the incident response when they find it first.

Privacy-aware ML is rarely much slower or less effective than privacy-unaware ML. It is more disciplined. The organisations that build the discipline now will not be the ones explaining to a supervisory authority in 2027 why their model was reproducing customer data.

The ones who skip it will.