One of my favorite ways to learn how large language models behave is to drop the same prompt into two different models and watch the discrepancies unfold. A few weeks ago I kept switching tabs between ChatGPT and local models (LM Studio), but the process felt clunky. So I built a micro-tool that does the comparison for me, and it runs entirely on Hugging Face.

It only compares open-source models, not the latest foundation models, but I still find it useful for spotting differences between open-source models, at least in their lightweight versions.

Why I Built It

Rapid prompt engineering

Seeing two answers side-by-side helps me decide which wording and styling to use.
Qualitative benchmarking

Formal evaluation metrics are great, but a quick visual gut-check on tone and factuality often saves me time. I like to get a ‘feel’ for each model.

How It Works under the Hood

Gradio front-end

A single Textbox for the prompt and two response panes keep the UI friction-free.
huggingface-hub’s InferenceClient

Instead of loading giant weights locally, the app makes chat_completion calls to
- mistralai/Mistral-7B-Instruct-v0.2
- meta-llama/Llama-2-7b-chat-hf

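import os
from huggingface_hub import InferenceClient

HF_TOKEN = os.environ.get("HF_TOKEN")  # assumption: token supplied via an environment variable

client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2", token=HF_TOKEN)
# `prompt` is the text entered in the Gradio Textbox
resp = client.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=200,
).choices[0].message.content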
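
For anyone who wants to try the idea, here is a minimal sketch of how the pieces above could fit together. It is not the app's actual source: the `query_model`, `compare`, and `build_ui` names, the `HF_TOKEN` environment variable, the thread pool, and the error handling are all assumptions made for illustration. Only the two model IDs, the `chat_completion` call, and the "one Textbox, two panes" layout come from the description above.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Model IDs taken from the post; everything else here is illustrative.
MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Llama-2-7b-chat-hf",
]


def query_model(model_id: str, prompt: str, max_tokens: int = 200) -> str:
    """Send one chat_completion request and return the reply text."""
    # Imported lazily so the comparison logic loads without the package.
    from huggingface_hub import InferenceClient

    # Assumption: the access token lives in the HF_TOKEN environment variable.
    client = InferenceClient(model_id, token=os.environ.get("HF_TOKEN"))
    response = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def compare(prompt: str, query=query_model) -> tuple[str, str]:
    """Run the same prompt against both models in parallel."""

    def safe_query(model_id: str) -> str:
        try:
            return query(model_id, prompt)
        except Exception as exc:
            # Surface the failure in that model's pane instead of crashing.
            return f"[{model_id} failed: {exc}]"

    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        # pool.map preserves the order of MODELS.
        first, second = pool.map(safe_query, MODELS)
    return first, second


def build_ui():
    """One prompt Textbox in, two response panes out."""
    import gradio as gr

    return gr.Interface(
        fn=lambda prompt: compare(prompt),
        inputs=gr.Textbox(label="Prompt", lines=4),
        outputs=[
            gr.Textbox(label="Mistral-7B-Instruct-v0.2"),
            gr.Textbox(label="Llama-2-7b-chat-hf"),
        ],
        title="Prompt Comparison",
    )
```

Calling `build_ui().launch()` would start the interface locally. Passing the query function as a parameter to `compare` is a design choice of this sketch: it lets you swap in a stub and check the side-by-side logic without hitting the inference API.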

Using the Tool

Paste a prompt – e.g. “Explain the second law of thermodynamics in plain English.”
Hit “Submit.”

Compare the Mistral answer (usually concise and instructional) with LLaMA-2’s (often more conversational).

Observations

My personal observations from experimenting with the two models:

Stylistic flavor – Mistral tends to jump straight into bullet points; LLaMA-2 sprinkles more transition words.
Output – For the same prompt, Mistral's answers usually pack in more information than LLaMA-2's.
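
These are gut-check impressions, but they are easy to put rough numbers on. The helper below is a small sketch, not part of the tool: `style_stats`, the bullet regex, and the short `TRANSITION_WORDS` list are hypothetical choices you would want to tune. It counts words, bullet-style lines, and transition words in a response so two answers can be compared at a glance.

```python
import re

# Hypothetical, non-exhaustive list of transition words; adjust to taste.
TRANSITION_WORDS = {
    "however", "additionally", "furthermore",
    "moreover", "therefore", "overall",
}

# Lines starting with "-", "*", "•", or a number like "1." / "1)".
BULLET_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def style_stats(text: str) -> dict:
    """Return rough style counts for one model response."""
    lines = [line for line in text.splitlines() if line.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "words": len(words),
        "bullet_lines": sum(1 for line in lines if BULLET_PATTERN.match(line)),
        "transition_words": sum(1 for word in words if word in TRANSITION_WORDS),
    }
```

Running `style_stats` on each pane's text gives two small dictionaries to compare. Counts like these say nothing about factuality, so they complement the visual check rather than replace it.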

Check it out

➡️ Prompt Comparison Tool on HuggingFace

Feel free to ping me with improvements or bug reports. Happy prompting!

