How to defend against Prompt Leaking: strategies and tips for securing generative artificial intelligence models.


In the vast world of artificial intelligence, challenges and threats arise that require our attention and protection. Today, we explore a delicate and crucial aspect: ‘prompt leaking’, a hacking technique that threatens the security of information stored in generative artificial intelligence models. Through this investigation, we will discover how to defend ourselves against this subtle threat that creeps in between the folds of prompts, opening unwanted doors.

The Journey into the World of Prompts:

Imagine the prompt as a magic key that opens the door to a world of words and knowledge. However, as in any fairy tale, this key can fall into the wrong hands, turning into a source of violence. In our year together, we have shared experiences and discoveries, but now we enter into territory where caution is our guide.

Protect the Prompt Frame:

To preserve the safety of our journey, it is essential to put in place robust defensive measures. Prompt-based defences, careful monitoring of output and the use of control models prove crucial. However, we must be aware that no defensive measure offers 100 per cent security in the current generation of chatbots and artificial intelligence models.

The Thin Line between Prompt Injection and Prompt Leaking:

The distinction between ‘prompt injection’ and ‘prompt leaking’ is crucial to fully understanding the threats we face. While the former seeks to influence the response of the model, the latter aims to extract sensitive and confidential secrets from the model itself, including the prompts used.

Exploring Differences:

Through concrete examples, let us dive into the differences between ‘prompt injection’ and ‘prompt leaking’. The former could be compared to an action in which an attacker inserts malicious instructions to obtain specific answers. The second represents the extraction of sensitive information directly from prompts, compromising the confidentiality of vital data.

Treasury Protection:

We reveal together what we jealously guard: sensitive prompts, customised training data, unique algorithms and architectures. Protecting this information is crucial for the security and competitiveness of companies that rely on artificial intelligence.


Thus we conclude our exploration of ‘prompt leaking’, a journey through the finer folds of artificial intelligence security. Like any journey, it requires awareness, preparation and a commitment to protect the hidden treasure behind each prompt. On this anniversary, we strengthen our determination to navigate wisely through the challenges ahead.


Remember that the secret information vital to a model that must be defended against attacks includes:

  • Sensitive and Proprietary Prompts: The specific prompts used by a company to obtain desired results from artificial intelligence models. These prompts may contain proprietary know-how and confidential information that must be protected to preserve the company’s competitiveness.
  • Customised training data: If a model has been trained on customised or proprietary data, it is vital to protect such data to prevent the leakage of sensitive or confidential information.
  • Algorithms and custom architectures: Customised algorithms and architectures developed by a company to improve model performance must be defended to preserve innovation and competitive advantage.
  • Sensitive information generated by the model: Sensitive or confidential information that could be generated as output from the model and must be protected against unauthorised disclosure.

Protecting this information is crucial to preserving the security and competitiveness of a company.

Two concrete examples

Here are two examples illustrating the differences between ‘prompt injection’ and ‘prompt leaking’:

  • Prompt Injection: An example of a prompt injection could be an attacker entering a prompt asking an artificial intelligence model to generate details of an illegal operation, bypassing the model’s rules and restrictions. For example, the attacker might ask the model to describe in detail how to perform a theft or malicious action, causing it to generate an inappropriate response.
  • Prompt Leaking: In the case of prompt leaking, an example would be the extraction of sensitive or confidential prompts from the answers of a model. For instance, if a company uses a specific prompt as part of its proprietary know-how, prompt leaking could compromise the confidentiality of that information, as an attacker could extract the prompt itself from the model’s answers.

These examples show how prompt injection aims to influence the response of the model, while prompt leaking attempts to extract sensitive or confidential information from the model itself, including the prompts used.

Related topics


digital humanism franco bagaglia
digital humanism franco bagaglia

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top