Docker recently revealed a new feature of Docker Desktop: the Docker Model Runner. Initially it was only available on Macs and on Windows machines with Nvidia GPUs, but with the upcoming 4.42 release of Docker Desktop, Windows laptops with Qualcomm / ARM GPUs are supported as well. As I have such a laptop and got the chance to test a pre-release version of 4.42, I could finally join in on the fun.

The TL;DR

If you have a compatible laptop and Docker Desktop 4.42, here’s an example of what you can do:

Note that you have to enable the Model Runner through Settings -> Beta Features, where you also have to enable GPU-backed inference. Enabling the Model Runner itself is also possible from the CLI with docker desktop enable model-runner, but enabling the GPU support only works from the UI for now. Also note that only models with a *-Q4_0 tag actually make use of the GPU, something I learned only afterwards :)
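For reference, the CLI route could look like this; docker model status, from the same CLI plugin, is a quick way to check whether the runner is actually up (the GPU toggle still has to be flipped in the UI):

docker desktop enable model-runner
docker model status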

The details: interacting with a model

As you can see, I used a SmolLM2 model with 135 million parameters, so it's a very lightweight model. This is visible in the docker command: I executed docker model run ai/smollm2:135M-Q2_K, where smollm2 indicates the model's name and 135M shows that it is the 135M-parameter variant. With a simple docker model run, you get an interactive chat session with the model. If you want to learn more about that, I recommend the official Docker docs.
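If you want to try it yourself, the basic flow looks roughly like this; as far as I know, docker model run pulls the model automatically if it is missing, and accepts an optional prompt for a one-off answer instead of the interactive session:

docker model pull ai/smollm2:135M-Q2_K
docker model run ai/smollm2:135M-Q2_K
docker model run ai/smollm2:135M-Q2_K "Why is the sky blue?"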

To see which models are available, you can go to the AI section of Docker Hub. There, you will find an ever-expanding list of models, each with a description, its different variations, and some performance benchmarks to help you decide which model to use.

The details: using it in a .NET Blazor application

While that is a nice demo case, we probably want to use it for application development. Therefore, I have also created a small example of how it works from a .NET application using the OpenAI package. The most relevant part is probably the endpoint where the Model Runner is reachable; at least that is what took me the longest to figure out and put correctly into my code. It is available from within a container at http://model-runner.docker.internal and from the host at http://localhost:11434. If you want a different port, you can change that using docker desktop enable model-runner --tcp <port> or through the settings UI. For example, to query the available models, you could make an HTTP call like this:

GET http://localhost:11434/engines/llama.cpp/v1/models

{
  "object": "list",
  "data": [
    {
      "id": "ai/smollm2:135M-Q2_K",
      "object": "model",
      "created": 1745936469,
      "owned_by": "docker"
    },
    {
      "id": "ai/deepseek-r1-distill-llama",
      "object": "model",
      "created": 1742905580,
      "owned_by": "docker"
    }
  ]
}
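The same call is easy to make from C# as well; here is a minimal sketch (not part of the Blazor app below) that prints just the model ids:

using System.Net.Http.Json;
using System.Text.Json;

// From the host; from inside a container, use http://model-runner.docker.internal instead.
using var http = new HttpClient();
var json = await http.GetFromJsonAsync<JsonElement>(
    "http://localhost:11434/engines/llama.cpp/v1/models");

// The response has the OpenAI-style shape shown above: { "object": "list", "data": [...] }.
foreach (var model in json.GetProperty("data").EnumerateArray())
    Console.WriteLine(model.GetProperty("id").GetString());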

To integrate this into a simple Blazor application that allows us to chat with the models, we can configure the OpenAI client to use that endpoint. I am (of course ;)) developing this in a devcontainer, so I can use the internal URL, e.g. like this:

builder.Services.AddSingleton(sp =>
{
    var options = new OpenAIClientOptions
    {
        // The devcontainer reaches the Model Runner through Docker's internal DNS name.
        Endpoint = new Uri("http://model-runner.docker.internal/engines/llama.cpp/v1"),
    };
    // The Model Runner does not check the API key, but the client requires one.
    return new OpenAIClient(new ApiKeyCredential("unused"), options);
});
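In a component, the registered singleton then gets injected the usual Blazor way; in my chat page that boils down to a directive like this (the resulting OpenAIClient property is what the next snippet uses):

@inject OpenAIClient OpenAIClient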

When we want to have a chat interaction with the model, we use something like this:

// GetChatClient binds the injected OpenAIClient to the model selected in the UI.
var chatClient = OpenAIClient.GetChatClient(selectedModel);
cancellationTokenSource = new CancellationTokenSource();

var messages = new[] { ChatMessage.CreateUserMessage(userInput) };
status = "Generating response...";
// Stream the answer chunk by chunk and re-render the component as it grows.
await foreach (var message in chatClient.CompleteChatStreamingAsync(messages, null, cancellationTokenSource.Token))
{
    foreach (var update in message.ContentUpdate)
    {
        response += update.Text;
    }
    StateHasChanged();
}
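If you don't need the token-by-token streaming effect, the non-streaming counterpart is a quick sketch away; same client, same messages:

// Wait for the complete answer instead of streaming it.
var completion = await chatClient.CompleteChatAsync(messages, cancellationToken: cancellationTokenSource.Token);
response = completion.Value.Content[0].Text;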

You can check the full result in this GitHub repo and run it in a devcontainer. It should look like this, depending on which models you have pulled:

I hope this gives you an idea of why I’m so excited about this new Docker Desktop capability! Give it a try once it is officially released and let me know how you like it.