Prompt Injection: A Simple Explanation

Anshul TiwariAnshul Tiwari
1 min read

This is security vulnerability that targets AI and machine learning system.Here malicious prompt manipulates model's behavior.Its aims to get sensitive information or executing unauthorized instructions.

Types of Prompt Injections:

  1. Direct Prompt Injection: Here the attacker directly provides the malicious prompt to the AI model.

  2. Indirect Prompt Injection: The attacker hides the malicious prompt by embedding it in web pages, posts or images.

  3. Prompt Leaking: The attackers crafts a prompt to override original system instructions.

  4. Jail-breaking: The attacker writes a prompt that convinces th AI model to disregard its built-in safeguards and restrictions.

Consequences:

  • Leak confidential information

  • Generate malware

  • Manipulate AI to perform unauthorized actions

  • Spreading misinformation

Mitigation strategies:

  • Input filtering

  • Output filtering

  • Instructing model to adhere to its original rules and restrictions.

  • Role based Control: Limit what a model can do on the basis of user using the model.

  • Layered Security: Having multi-layered security that includes monitoring, logging and anomaly detection at early stages.

0
Subscribe to my newsletter

Read articles from Anshul Tiwari directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Anshul Tiwari
Anshul Tiwari