Prompt Injection: A Simple Explanation
This is security vulnerability that targets AI and machine learning system.Here malicious prompt manipulates model's behavior.Its aims to get sensitive information or executing unauthorized instructions.
Types of Prompt Injections:
Direct Prompt Injection: Here the attacker directly provides the malicious prompt to the AI model.
Indirect Prompt Injection: The attacker hides the malicious prompt by embedding it in web pages, posts or images.
Prompt Leaking: The attackers crafts a prompt to override original system instructions.
Jail-breaking: The attacker writes a prompt that convinces th AI model to disregard its built-in safeguards and restrictions.
Consequences:
Leak confidential information
Generate malware
Manipulate AI to perform unauthorized actions
Spreading misinformation
Mitigation strategies:
Input filtering
Output filtering
Instructing model to adhere to its original rules and restrictions.
Role based Control: Limit what a model can do on the basis of user using the model.
Layered Security: Having multi-layered security that includes monitoring, logging and anomaly detection at early stages.
Subscribe to my newsletter
Read articles from Anshul Tiwari directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by