PME and PAE: How to protect sensitive data in LLMs

Artificial Intelligence

23 September 2025

What if your email address was embedded deep within an AI model's memory? Or if your personal information wasn't just stored in the cloud, but woven into an AI's neural network, available for extraction?

While not yet commonplace, this is a reality we can no longer ignore. Large Language Models (LLMs) like GPT can unintentionally memorize personally identifiable information (PII) during their training. This includes sensitive data such as email addresses, phone numbers, and even login URLs.

In fact, many studies have shown that attacks can extract specific training data from a model by using carefully crafted prompts.

However, this doesn't mean we should fear or abandon AI. Just as early privacy concerns about technologies such as smartphones were addressed with more robust security features, AI too can be managed responsibly.

Fortunately, AI can be taught not to release sensitive information. The question then becomes: once an LLM learns something private, how do we make it forget - without the need for complete retraining?

In this blog, we'll take a closer look at two potential solutions, PAE (Private Association Editing) and PME (Private Memorization Editing), and explore how they could solve this privacy dilemma.

What is memorization in LLMs?

In AI, memorization refers to a model's ability not just to identify patterns, but to retain specific sequences of data verbatim.

This goes beyond learning general trends - it means the model can recall exact details, like email addresses, phone numbers, or even URLs, that may have appeared in the data it was trained on. While this isn't an intentional feature of LLMs, it happens as a byproduct of training on large, complex datasets.

In addition, if a model processes a detailed user query or a series of interactions, it may inadvertently "memorize" certain private information, leaving it vulnerable to future retrieval.
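
To make this concrete, here is a minimal sketch of how one might probe a model for verbatim memorization, in the spirit of the extraction studies mentioned above. It assumes a Hugging Face causal language model; the gpt2 checkpoint and the example strings are placeholders, not real leaked data or Almawave code.

```python
# Minimal training-data-extraction style check: prompt with a prefix and see
# whether the model reproduces a suspected memorized string verbatim.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical record: a prefix seen in training followed by private data.
prefix = "For support, contact Jane Doe at "
suspected_secret = "jane.doe@example.com"

inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,  # greedy decoding: memorized text tends to surface here
)
completion = tokenizer.decode(outputs[0], skip_special_tokens=True)

# If the model reproduces the secret verbatim, it has memorized the record.
print("Memorized!" if suspected_secret in completion else "Not reproduced.")
```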

While memorization of sensitive data presents a privacy risk, it is also a challenge that can be addressed through proper safeguards and techniques.

The aim isn't to eliminate the benefits of AI but to ensure that these models can be used effectively without compromising user privacy.

This raises the question: which techniques can be applied to address this?

Existing approaches

One approach to erasing sensitive data from an AI model is model unlearning, which involves retraining the model to forget specific data.

However, this process is expensive and can be highly disruptive.

Not only does it require significant computational resources, but it can also lead to a degradation in model performance, as fine-tuning might inadvertently alter the model's general capabilities or introduce new biases.

Beyond cost and performance issues, model unlearning can leave residual traces of the sensitive data behind and is impractical at scale. It also often requires access to the original training data, which may itself contain further sensitive information, and it carries a high environmental cost.
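
For illustration, here is a minimal sketch of one common unlearning recipe, gradient ascent on a small "forget set". It shows why the procedure touches every parameter and can degrade the model; the checkpoint name and the record below are placeholders, and this is not a production unlearning pipeline.

```python
# Gradient-ascent style unlearning sketch on a tiny "forget set".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["Contact Jane Doe at jane.doe@example.com"]  # hypothetical record

model.train()
for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    (-loss).backward()        # ascend the loss on the data to be forgotten
    optimizer.step()
    optimizer.zero_grad()

# Every parameter can move during these updates, so general capabilities may
# drift: the degradation and cost described above.
```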

PAE: a streamlined solution for data associations

PAE (Private Association Editing) is a data refinement technique designed to modify associations between data points.

What makes PAE stand out is its precision: a single edit can have far-reaching effects, potentially protecting multiple individuals by severing unwanted or unnecessary associations.

This capability is especially useful in situations where quick, focused changes are needed, and it works best when there is a direct, explicit association between data points (that is, individual pieces of information).
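
As a toy illustration of the general idea behind association editing, and not Almawave's PAE implementation, the sketch below rewrites a single key-to-value association stored in a linear layer with a rank-one update, so that the key now maps to a neutral placeholder while unrelated directions are barely touched.

```python
import numpy as np

# Toy "associative memory": a linear layer W that maps a key vector k
# (e.g. a subject representation) to the value currently associated with it.
rng = np.random.default_rng(0)
d = 64
W = rng.normal(size=(d, d))

k = rng.normal(size=d)          # key for the association we want to sever
v_private = W @ k               # value the layer currently retrieves for k
v_neutral = rng.normal(size=d)  # placeholder value to associate instead

# Rank-one update that remaps k -> v_neutral:
#   W' = W + (v_neutral - v_private) k^T / (k^T k)
W_edited = W + np.outer(v_neutral - v_private, k) / (k @ k)

print(np.allclose(W_edited @ k, v_neutral))  # True: the association is rewritten

# A random unrelated key sees only a small relative perturbation from the edit.
k_other = rng.normal(size=d)
print(np.linalg.norm((W_edited - W) @ k_other) / np.linalg.norm(W @ k_other))
```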

However, PAE does have its limitations.

For example, it can't always target deeply ingrained sequences or patterns of data that are "memorized" by systems over time.

Moreover, PAE may not catch data leaks, particularly when the relationship between data points isn't explicitly defined.

But what if we tackled the memory itself, and not just the associations?

While PAE offers a cleaner way to 'unlink' associations, it doesn't address what happens when sensitive information is encoded verbatim into the model's weights. That's where PME comes in.

Introducing PME: Private Memorization Editing

What Is PME?

Building on PAE, Private Memorization Editing (PME) takes privacy protection to the next level. While PAE edits explicit associations between data points, PME directly targets the internal memory of a model, surgically removing sensitive information such as:

  • Email addresses
  • Phone numbers
  • API keys
  • Personally identifiable information (PII)

PME surgically removes these sensitive sequences from the model's internal memory while preserving its overall performance. It's a precise tool that ensures privacy without compromising the model's capabilities.
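
One way to check that such an edit has worked, sketched below under the assumption of a Hugging Face causal LM (gpt2 and the example strings are placeholders, not Almawave's evaluation code), is to compare the log-likelihood the model assigns to the suspected memorized continuation before and after editing, alongside neutral text to confirm that general abilities are intact.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def continuation_logprob(model, tokenizer, prefix, continuation):
    # Sum of log-probabilities assigned to `continuation` given `prefix`
    # (tokenizer boundary effects are glossed over in this sketch).
    enc = tokenizer(prefix + continuation, return_tensors="pt")
    prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(**enc).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = enc.input_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.shape[0]), targets]
    return token_lp[prefix_len - 1:].sum().item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# A sharp drop in the first score after editing suggests the verbatim sequence
# is no longer readily recalled; an unchanged second score suggests general
# knowledge is preserved.
print(continuation_logprob(model, tokenizer, "Contact me at ", "jane.doe@example.com"))
print(continuation_logprob(model, tokenizer, "The capital of France is ", "Paris"))
```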

Why is PME so effective?

PME's effectiveness comes from its precision and efficiency. Here are the main benefits:

  • Precision: PME edits only the parts of the model responsible for memorizing private data, leaving the rest of the model untouched.
  • Generalizable: PME can target private data even when there's no clear association between data points.
  • Robust: PME is resistant to Training Data Extraction (TDE) attacks, even when faced with long or complex prompts.
  • Efficient: PME doesn't require full retraining, making it a fast, scalable solution that keeps the model's general abilities intact.

Use cases for PAE and PME

Both algorithms are primarily used to prevent the model from generating sensitive information.

The key difference between the two lies in the type of data being provided.

  • PAE is typically applied when there is a direct relationship between the user's data and the data to be removed.
  • PME can mask sensitive information in longer, more complex prompts without the need for explicit associations.

For instance, when a user notices personal data in a model's output, they can request its removal, and PAE or PME will intervene to prevent further exposure.
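
Purely as an illustration of this routing logic, the sketch below shows how such a removal request might be dispatched to one technique or the other; apply_pae and apply_pme are hypothetical placeholders, not a public API.

```python
# Hypothetical dispatch sketch: route a removal request to association editing
# when an explicit subject is known, and to memorization editing otherwise.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RemovalRequest:
    sensitive_text: str                 # e.g. an email address seen in an output
    associated_subject: Optional[str]   # e.g. the person's name, if known

def apply_pae(model, subject: str, sensitive_text: str) -> None:
    """Placeholder: sever the explicit subject -> sensitive_text association."""
    print(f"[PAE] unlinking '{subject}' from '{sensitive_text}'")

def apply_pme(model, sensitive_text: str) -> None:
    """Placeholder: remove the verbatim memorized sequence from model memory."""
    print(f"[PME] erasing memorized sequence '{sensitive_text}'")

def handle_request(model, req: RemovalRequest) -> None:
    if req.associated_subject:
        apply_pae(model, req.associated_subject, req.sensitive_text)
    else:
        apply_pme(model, req.sensitive_text)

handle_request(None, RemovalRequest("jane.doe@example.com", "Jane Doe"))
handle_request(None, RemovalRequest("IT60X0542811101000000123456", None))  # e.g. an IBAN
```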

Does PME replace PAE?

PME does not replace PAE; rather, it complements it. These are two distinct approaches used in different scenarios:

  • PAE is best used when there's a clear association between a user's data and the information to be removed. It helps prevent the generation of sensitive data for a specific person.
  • PME, on the other hand, is designed for situations where no explicit association exists between the data and the prompt, such as during the inference phase when sensitive information like IBAN codes or email text may be generated.

While PME could potentially replace PAE in the future, there is not yet sufficient evidence to support this.

Ensuring transparency and safety: Velvet's ethical approach

Velvet is Almawave's family of multilingual Large Language Models (LLMs), designed to address privacy concerns while maintaining high performance across diverse applications. The PME approach has been tested on Velvet 2B and Velvet 14B in custom solutions for our customers.

Velvet goes beyond typical model development by prioritizing privacy and ethical responsibility.

To ensure compliance with global standards, Velvet's development is monitored by two independent entities, verifying its ethical adherence to OECD and WHO guidelines.

This supervision guarantees that Velvet operates with principles of transparency, fairness, and safety, upholding the highest standards of privacy protection. The model integrates advanced memory-editing techniques, ensuring sensitive information is safeguarded without compromising its functionality or overall performance.

Learn more about Almawave's ethical AI.

Discover Velvet