Prompt Comparison App


One of my favorite ways to learn how large language models behave is to drop the same prompt into two different models and watch the discrepancies unfold. A few weeks ago I kept switching tabs between ChatGPT and local models (LM Studio), but the process felt clunky. So I built a micro-tool that does the comparison for me—and it runs entirely on Hugging Face.
It only compares open-source models, not the latest proprietary foundation models, but it is still useful for spotting differences between the open-source options—at least their lightweight versions.
Why I Built It
Rapid prompt engineering
Seeing two answers side-by-side helps me decide which wording and styling works best.
Qualitative benchmarking
Formal evaluation metrics are great, but a quick visual gut-check on tone and factuality often saves me time. I like to get a ‘feel’ for each model.
How It Works under the Hood
Gradio front-end
A single Textbox for the prompt and two response panes keep the UI friction-free.
huggingface-hub’s InferenceClient
Instead of loading giant weights locally, the app makes chat_completion calls to
mistralai/Mistral-7B-Instruct-v0.2
meta-llama/Llama-2-7b-chat-hf
from huggingface_hub import InferenceClient
import os

# Read a Hugging Face access token from the environment
HF_TOKEN = os.environ["HF_TOKEN"]

client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2", token=HF_TOKEN)
resp = client.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=200,
)
answer = resp.choices[0].message.content
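Querying both models with the same payload is just a loop over model IDs. A sketch of how I'd structure it, assuming `huggingface_hub` is installed and an `HF_TOKEN` environment variable holds a valid token (the `build_messages` helper is my own naming, not part of the library):

```python
import os
from huggingface_hub import InferenceClient

MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Llama-2-7b-chat-hf",
]

def build_messages(prompt: str) -> list[dict]:
    # Single-turn chat payload shared by both models.
    return [{"role": "user", "content": prompt}]

def compare(prompt: str) -> dict[str, str]:
    # Query each model in turn; needs a valid HF_TOKEN and network access.
    answers = {}
    for model_id in MODELS:
        client = InferenceClient(model_id, token=os.environ["HF_TOKEN"])
        resp = client.chat_completion(
            messages=build_messages(prompt), max_tokens=200
        )
        answers[model_id] = resp.choices[0].message.content
    return answers
```

Because only the model ID changes, adding a third model to the comparison is a one-line edit to `MODELS`.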
Using the Tool
Paste a prompt – e.g. “Explain the second law of thermodynamics in plain English.”
Hit “Submit.”
Compare the Mistral answer (usually concise and instructional) with LLaMA-2’s (often more conversational).
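For eyeballing the two answers outside the browser, a small stdlib-only helper (my own sketch, not part of the app) can print them in columns:

```python
import textwrap

def side_by_side(left: str, right: str, width: int = 38) -> str:
    # Wrap each answer to a fixed column width, pad the shorter column,
    # and join line-by-line for a terminal-friendly comparison.
    lcol = textwrap.wrap(left, width) or [""]
    rcol = textwrap.wrap(right, width) or [""]
    rows = max(len(lcol), len(rcol))
    lcol += [""] * (rows - len(lcol))
    rcol += [""] * (rows - len(rcol))
    return "\n".join(f"{l:<{width}} | {r}" for l, r in zip(lcol, rcol))

print(side_by_side(
    "Entropy in a closed system never decreases.",
    "Well, to put it simply, things tend toward disorder over time.",
))
```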
Observations
My personal observations from experimenting with the two models:
Stylistic flavor – Mistral tends to jump straight into bullet points; LLaMA-2 sprinkles more transition words.
Information density – Mistral packs more information into each answer than LLaMA-2.
Check it out
➡️ Prompt Comparison Tool on Hugging Face
Feel free to ping me with improvements or bug reports. Happy prompting!
Written by

Donald Tucker
I am an industrial engineer using the power of Python to gain deeper insights from data. I am currently learning deep learning with TensorFlow.