We used this script to collect a dataset of 5k+ LLM jailbreaks

Alexander Mia
3 min read

In our latest project, we set out to gather a dataset of over 5,000 large language model (LLM) jailbreaks. This dataset was crucial for our research into how LLMs can be manipulated or bypassed. Collecting data at this scale presents unique challenges, especially when the data is scattered across platforms like Twitter and Reddit. Here, we share our experience and the Python-based solution we devised to overcome these obstacles.

The Challenge of Collecting Data from Social Media

Social media platforms are treasure troves of information, with users frequently sharing valuable data as text and images. However, our initial data collection revealed a significant hurdle: approximately 40% of the prompts we aimed to collect were shared as images. This posed a problem because our automated tools were designed to parse text. Extracting text from images required a different approach: Optical Character Recognition (OCR).

Our Python-based Solution

To tackle this challenge, we turned to Python, a language known for its powerful libraries for data scraping and image processing. Our solution involved the following steps:

  1. Image Downloading: We used the requests library to fetch images containing prompts from their URLs.

  2. Optical Character Recognition (OCR): With the images downloaded, we employed pytesseract, a Python wrapper for Google's Tesseract-OCR Engine, to extract text from the images.

  3. Data Cleaning and Processing: OCR output is rarely flawless. We had to correct recognition errors, fix typos, and split the extracted text into sentences or chunks for further processing.

The core of our script looked something like this:

import requests
from PIL import Image
from io import BytesIO
import pytesseract

# URL of an image that contains a prompt (a tweet attachment)
image_url = "https://pbs.twimg.com/media/GJonki_asAALYGT?format=png&name=medium"

# Download the image from the URL
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # fail early on a broken or rate-limited link
image = Image.open(BytesIO(response.content))

# Use pytesseract to do OCR on the image
text = pytesseract.image_to_string(image)
print(text)

Overcoming Challenges

Despite the efficiency of our script, we faced several challenges that required creative solutions:

  • Fixing Errors and Typos: OCR is not perfect and often introduces recognition errors or garbled characters into the extracted text. We added steps to our pipeline to parse the OCR output, correct common errors, and validate the text against spelling and grammar rules (see the first sketch after this list).

  • Handling Partial Text: Often, the text in one image was part of a longer message spread across multiple images. We developed a method to merge texts by identifying overlapping or common segments between them (see the second sketch after this list).

  • Detecting Duplicates: A dataset with duplicates is less valuable and can skew research results, so we incorporated deduplication to ensure every entry is unique (see the third sketch after this list).
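
For the error-fixing step, here is a minimal sketch of the kind of cleanup pass involved. The substitution table and the sentence-splitting regex below are illustrative assumptions, not the exact rules from our pipeline:

import re

# Characters Tesseract commonly confuses; this table is illustrative only.
OCR_FIXES = {
    "|": "I",        # vertical bar recognized instead of a capital I
    "\u2018": "'",   # curly quotes normalized to straight quotes
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
}

def clean_ocr_text(raw: str) -> list[str]:
    """Normalize OCR output and split it into sentence-sized chunks."""
    text = raw
    for bad, good in OCR_FIXES.items():
        text = text.replace(bad, good)
    # Collapse stray line breaks and runs of whitespace inside sentences.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split; chunks that still look wrong get flagged for manual review.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]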
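
For merging partial text, a simplified version of the idea is to look for the longest overlap between the end of one OCR'd fragment and the start of the next. The function name and minimum-overlap threshold here are assumptions for illustration:

def merge_overlapping(first: str, second: str, min_overlap: int = 20) -> str:
    """Join two fragments if the end of `first` overlaps the start of `second`."""
    longest = min(len(first), len(second))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(longest, min_overlap - 1, -1):
        if first[-size:] == second[:size]:
            return first + second[size:]
    # No sufficiently long overlap found: keep the fragments separate.
    return first + "\n" + second

Exact matching is brittle when the OCR output itself contains errors, so a fuzzier comparison over the candidate overlap (for example with difflib.SequenceMatcher) is more forgiving in practice.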
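
For deduplication, one straightforward approach is to normalize each prompt and drop anything too similar to an entry already kept. The normalization rules and the 0.9 similarity threshold are assumptions, not the exact values we used:

import re
from difflib import SequenceMatcher

def normalize(prompt: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before comparing."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9\s]", "", prompt.lower())).strip()

def deduplicate(prompts: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a prompt only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for prompt in prompts:
        candidate = normalize(prompt)
        if any(SequenceMatcher(None, candidate, normalize(seen)).ratio() >= threshold
               for seen in kept):
            continue  # near-duplicate of an existing entry
        kept.append(prompt)
    return kept

The pairwise comparison is quadratic, which is fine at a few thousand prompts; an exact hash on the normalized text makes a cheap first pass before the fuzzy check.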

Conclusion

Building a dataset of over 5,000 LLM jailbreaks was no small feat. The challenges we encountered, particularly with data collection from image-based sources, taught us valuable lessons in data processing and automation. Our Python script, leveraging the power of OCR and careful data cleaning, proved to be a crucial asset in our project. This experience not only enriched our technical skills but also set a foundation for future research in the fascinating world of language model jailbreaks. Through perseverance and innovation, we were able to overcome the hurdles and achieve our goal, demonstrating the importance of adaptive problem-solving in the realm of data science.

https://github.com/lastlayer/last_layer
