Introduction

In this article we will explore performing inference on GGUF models with llama.cpp using the LLamaSharp NuGet package. It sounds more involved than it actually is.

GGUF models are probably among the easiest models to work with, in both Python and C#. Whether your goal is to integrate a local model into an existing C# project or to build something from the ground up, there is no need to search too far: LLamaSharp has all the basic functionality you require.

To keep things simple, we'll just use a new console application in Visual Studio.

Step 1: Install LLamaSharp

First, you need to install the LLamaSharp library. LLamaSharp is a cross-platform library that allows you to run LLaMA/LLaVA models (and others) on your local device. You can install it through the NuGet package manager in Visual Studio, or with the command:

```
dotnet add package LLamaSharp
```

Step 2: Install LLamaSharp.Backend.Cpu

Next, install the LLamaSharp.Backend.Cpu package, which provides the backend needed to run LLamaSharp using only the CPU. Again, you can install it either through the NuGet package manager or with the command:

```
dotnet add package LLamaSharp.Backend.Cpu
```

Step 3: Select a GGUF Model

You can use almost any GGUF model with LLamaSharp, but here are some models that I have tested and found to be efficient in terms of speed and accuracy:

There are plenty more I have used, but these offered the best balance of speed and accuracy. Don't expect miracles either: you won't be running GPT-4o on a CPU anytime soon. But this is nifty for certain use cases.
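
Whichever model you pick, it can help to sanity-check the downloaded file before handing it to LLamaSharp. The following is a minimal sketch, relying only on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The `LooksLikeGguf` helper and the `model.gguf` fallback path are hypothetical names used for illustration; they are not part of LLamaSharp.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of LLamaSharp): returns true when the file
// exists and starts with the 4-byte ASCII magic "GGUF".
static bool LooksLikeGguf(string path)
{
    if (!File.Exists(path)) return false;

    using var stream = File.OpenRead(path);
    var magic = new byte[4];
    int read = stream.Read(magic, 0, magic.Length);
    return read == 4 && Encoding.ASCII.GetString(magic) == "GGUF";
}

// Placeholder path: pass your own model path as the first argument.
string candidatePath = args.Length > 0 ? args[0] : "model.gguf";
Console.WriteLine(LooksLikeGguf(candidatePath)
    ? $"{candidatePath} looks like a GGUF file."
    : $"{candidatePath} is missing or is not a GGUF file.");
```

This only inspects the header, so it catches a wrong path or an interrupted download of the wrong file type, not a model that is corrupt further in.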

Step 4: Implement the code

Now that you have installed the necessary packages and selected your GGUF model, it's time to implement the code. Below is an example of how to set up and run a model using LLamaSharp and the CPU backend.

Create a new C# project (if you haven't already):

```
dotnet new console -n LLaMaSharpExample
cd LLaMaSharpExample
```

Implement the code in your Program.cs file:

```
using LLama.Common;
using LLama;

string modelPath = @"<Your Model Path>"; // Change this to your own model path.

var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024, // Maximum context length in tokens, i.e. how much of the chat is kept as memory.
    GpuLayerCount = 5 // Layers to offload to the GPU; only relevant with a GPU backend. Adjust to your GPU memory.
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Seed the chat history with a prompt that tells the AI how to act.
var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");

ChatSession session = new(executor, chatHistory);

InferenceParams inferenceParams = new InferenceParams()
{
    MaxTokens = 256, // Cap each answer at 256 tokens. Remove this if the antiprompt is enough to control length.
    AntiPrompts = new List<string> { "User:" } // Stop generation once an antiprompt appears.
};

Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write("The chat session has started.\nUser: ");
Console.ForegroundColor = ConsoleColor.Green;
string userInput = Console.ReadLine() ?? "";

// Keep chatting until the user types "exit".
while (userInput != "exit")
{
    await foreach ( // Stream the response as it is generated.
        var text
        in session.ChatAsync(
            new ChatHistory.Message(AuthorRole.User, userInput),
            inferenceParams))
    {
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write(text);
    }
    Console.ForegroundColor = ConsoleColor.Green;
    userInput = Console.ReadLine() ?? "";
}
```
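
One caveat about the loop above: `Console.ReadLine() ?? ""` turns end-of-input (for example, when input is piped into the program) into an empty string, so the loop never sees `exit` and keeps sending empty messages. A minimal sketch of a more forgiving exit check follows; `ShouldExit` is a hypothetical helper name, and the `Console.WriteLine` call merely stands in for the `session.ChatAsync` call from the example.

```csharp
using System;

// Hypothetical helper: true when the loop should stop, i.e. the input stream
// has ended (ReadLine returned null) or the user typed "exit" in any casing.
static bool ShouldExit(string? input) =>
    input is null ||
    input.Trim().Equals("exit", StringComparison.OrdinalIgnoreCase);

string? line = Console.ReadLine();
while (!ShouldExit(line))
{
    // In the chat program, this is where session.ChatAsync would be called.
    Console.WriteLine($"(would send to the model: {line})");
    line = Console.ReadLine();
}
```

Keeping the check in one small function makes it easy to add further exit words later without touching the chat loop itself.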

Step 5: AI away

Congratulations! You've successfully set up your console application. Now, it's time to put your model to the test. Run your application, ask some questions, and explore the power of AI. Let your creativity flow and see what amazing insights and ideas your model can generate.
