Run 100B+ Language Models at Home with Petals
Large language models like GPT-3 are huge and require a large number of GPUs to run. But what if we could all run 100B+ language models right on our own computers?
Petals is here to help! With its BitTorrent-style approach, you can now run language models with over 100 billion parameters at home. Petals allows you to generate text with distributed BLOOM and fine-tune it for your own tasks. In addition, Petals makes fine-tuning and inference up to 10 times faster than offloading.
Table of Contents
- How Petals Works
- Increase Petals Capacity with Your Own GPU
- More Examples and Tutorials
- Privacy and Security
How Petals Works
Petals works by running large language models like BLOOM-176B collaboratively. You only need to load a small part of the model, and then team up with others to run inference or fine-tuning. Inference runs at approximately 1 second per step, making it perfect for chatbots and other interactive apps. You can also employ any fine-tuning and sampling methods by executing custom paths through the model or accessing its hidden states.
Here’s an example of how to use Petals to generate text and fine-tune a model:
```python
import torch
from torch.nn.functional import cross_entropy
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-petals")
model = DistributedBloomForCausalLM.from_pretrained(
    "bigscience/bloom-petals", tuning_mode="ptune", pre_seq_len=16
)
# Embeddings & prompts are on your device, BLOOM blocks are distributed across the Internet

# Generate text
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...

# Fine-tuning (updates only prompts or adapters hosted locally);
# data_loader is assumed to yield (input_ids, labels) batches for your task
optimizer = torch.optim.AdamW(model.parameters())
for input_ids, labels in data_loader:
    outputs = model.forward(input_ids)
    loss = cross_entropy(outputs.logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
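Petals also lets you work with the model's hidden states directly, as mentioned above. Since the model behaves like a regular PyTorch module, one way to do that is to run a forward pass and read the final hidden states, e.g. to train your own classifier head on top. Here is a minimal sketch, assuming the `DistributedBloomModel` class from the same package and the same checkpoint name:

```python
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomModel

tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-petals")
# DistributedBloomModel returns hidden states rather than next-token logits
model = DistributedBloomModel.from_pretrained("bigscience/bloom-petals")

input_ids = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
with torch.no_grad():
    outputs = model(input_ids)  # forward pass through remotely hosted BLOOM blocks

hidden = outputs.last_hidden_state  # (batch, seq_len, hidden_size) tensor on your device
```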
Increase Petals Capacity with Your Own GPU
To increase Petals’ capacity, you can easily connect your own GPU. Simply install PyTorch and Petals, and then run the server. Or, use our GPU-enabled Docker image to run the server.
```bash
# In an Anaconda env
conda install pytorch cudatoolkit=11.3 -c pytorch
pip install -U petals
python -m petals.cli.run_server bigscience/bloom-petals
```

Or, using our GPU-enabled Docker image:

```bash
sudo docker run --net host --ipc host --gpus all --volume petals-cache:/cache --rm \
  learningathome/petals:main python -m petals.cli.run_server bigscience/bloom-petals
```
More Examples and Tutorials
Petals has plenty of examples and tutorials available, including a chatbot web app and a tutorial on launching your own swarm.
- Chatbot web app: link, source code
- Training a personified chatbot: notebook
- Fine-tuning BLOOM for text semantic classification: notebook
- Launching your own swarm: tutorial
- Running a custom foundation model: tutorial
Privacy and Security
Please note that the Petals public swarm is designed for research and academic use only. Do not use the public swarm to process sensitive data, as it is technically possible for peers serving model layers to recover input data and model outputs or modify them in a malicious way. Instead, you can set up a private Petals swarm with trusted individuals and organizations.
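If you do want a private swarm, the general recipe is to start your own bootstrap peer and point every server and client at it instead of the public swarm. The sketch below is an assumption based on hivemind/Petals conventions (the `--initial_peers` flag and the multiaddress are illustrative placeholders), so follow the swarm tutorial linked above for the exact steps:

```bash
# Hypothetical sketch: join YOUR bootstrap peer rather than the public swarm.
# The multiaddress is a placeholder for your own bootstrap peer's address.
python -m petals.cli.run_server bigscience/bloom-petals \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmYourBootstrapPeerID
```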
Additionally, be sure to check out the model’s terms of use, risks, and limitations before building an application that runs a language model with Petals.