Ollama on Raspberry Pi
As part of exploring small language models (SLMs) for small AI devices, we are going to look at inference speed on the Raspberry Pi. Six models are of main interest:
- tinyllama
- tinydolphin
- phi3
- smallthinker
- granite3.1-moe
- llama3.2:1b
Each model has a small number of parameters, to make sure we get usable speeds for inference.
We will measure the speed of a simple inference run for each of them.
In a previous post, we installed Arch Linux on the Raspberry Pi (have a look at Running Arch on Pi).
Today, we are going to install Ollama to run LLMs, and compare the speed of several small models to see which are fast enough for local inference.
Let’s start by installing Ollama on the Pi with the provided script:
curl -fsSL https://ollama.com/install.sh | sh
Testing the models
Pulling the models is quite time-consuming on such a small device, especially with slow SD card write speeds, but pretty straightforward.
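Since every model is fetched with the same `ollama pull <name>` command, the pulls can be scripted and left running unattended. Here is a minimal Python sketch, assuming the `ollama` binary is on the PATH and its server is running; the helper names `pull_commands` and `pull_all` are our own and not part of Ollama:

```python
import subprocess

# Models named in the introduction of this post.
MODELS = [
    "tinyllama",
    "tinydolphin",
    "phi3",
    "smallthinker",
    "granite3.1-moe",
    "llama3.2:1b",
]


def pull_commands(models):
    """Build one `ollama pull <model>` argument list per model."""
    return [["ollama", "pull", name] for name in models]


def pull_all(models):
    """Run the pulls one after another and stop at the first failure.

    Assumes the `ollama` CLI is installed and its server is running.
    """
    for command in pull_commands(models):
        subprocess.run(command, check=True)
```

Calling `pull_all(MODELS)` downloads the six models in sequence. Sequential pulls seem a reasonable choice here, since parallel writes are unlikely to help on a slow SD card.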
Now let’s see how each of these behaves.
Tinyllama
$ ollama pull tinyllama
$ ollama show tinyllama
Model
architecture llama
parameters 1.1B
context length 2048
embedding length 2048
quantization Q4_0
Parameters
stop "<|system|>"
stop "<|user|>"
stop "<|assistant|>"
stop "</s>"
System
You are a helpful AI assistant.
$ ollama run tinyllama "who is super mario?" --verbose
Super Mario (Japanese: スーパーマリオ, Used in English as “Super Mario”, pronounced “Supa Mairyo”) is a series of platform games created by Nintendo and developed and published by Nintendo for the NES. The series began with Super Mario Bros., which was released in 1985 in Japan and 1986 in North America. Since then, there have been countless iterations of the Super Mario franchise, including various spin-offs, remakes, and collaborations with other games or companies, as well as multiple ports to different consoles. Super Mario Bros. (NES) | Wii | Nintendo 3DS | Nintendo Switch | Super Smash Bros. Melee (GameCube) | Super Smash Bros. Brawl (Wii U/Switch) | Super Smash Bros. Ultimate (Switch) | The Legend of Zelda: Ocarina of Time (SNES) | Super Smash Bros. Melee (N64) | The Legend of Zelda: A Link to the Past (SNES) | Super Smash Bros. 64 (GB) | Super Smash Bros. Ultimate (Switch) | Splatoon (Wii U/Switch) | Pokémon Let’s Go Pikachu! (Nintendo Switch) | Pokémon Let’s Go Eevee! (Nintendo Switch)
total duration: 53.459051422s
load duration: 27.865441224s
prompt eval count: 40 token(s)
prompt eval duration: 2.281s
prompt eval rate: 17.54 tokens/s
eval count: 317 token(s)
eval duration: 23.311s
eval rate: 13.60 tokens/s
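The rates reported by `--verbose` are simply token counts divided by durations, which is easy to verify with the figures above. A small Python check follows; the function name `tokens_per_second` is our own:

```python
def tokens_per_second(token_count, seconds):
    """Rate as reported by `--verbose`: tokens divided by elapsed seconds."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return token_count / seconds


# Figures copied from the tinyllama run above.
prompt_rate = tokens_per_second(40, 2.281)
eval_rate = tokens_per_second(317, 23.311)
load_share = 27.865441224 / 53.459051422  # fraction of total time spent loading

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate: {eval_rate:.2f} tokens/s")
print(f"load share: {load_share:.0%}")
```

The recomputed rates match the reported ones, and the load share shows that roughly half of the total duration of this first run went into loading the model rather than generating tokens.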
Tinydolphin
$ ollama pull tinydolphin
$ ollama show tinydolphin
Model
architecture llama
parameters 1.1B
context length 4096
embedding length 2048
quantization Q4_0
Parameters
stop "<|im_start|>"
stop "<|im_end|>"
System
You are Dolphin, a helpful AI assistant.
License
Apache License
Version 2.0, January 2004
Then, the same Mario query.
$ ollama run tinydolphin "who is super mario?" --verbose
Super Mario is an iconic video game character created by Shigeru Miyamoto for the NES console in 1993. He was designed to be the protagonist of The Legend of Zelda series and later became one of the most popular characters in gaming history. Super Mario has been a global phenomenon since its release, inspiring numerous sequels, remakes, and spin-offs, as well as numerous fan creations and adaptations across different platforms.
total duration: 38.553242359s
load duration: 29.361005446s
prompt eval count: 35 token(s)
prompt eval duration: 1.987s
prompt eval rate: 17.61 tokens/s
eval count: 101 token(s)
eval duration: 7.203s
eval rate: 14.02 tokens/s
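The same counters are also exposed by Ollama's local HTTP API, which is handy for scripting the benchmark instead of running each query by hand. The sketch below assumes the server listens on its default address `http://localhost:11434` and that `/api/generate`, called with `"stream": false`, returns `eval_count` and `eval_duration` (in nanoseconds); the helper names `generate` and `eval_rate` are ours.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def generate(model, prompt, url=OLLAMA_URL):
    """Send one non-streaming request and return the decoded JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def eval_rate(reply):
    """Tokens per second from a reply; durations are given in nanoseconds."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)
```

For example, `eval_rate(generate("tinydolphin", "who is super mario?"))` should land close to the rate printed by `--verbose`, though the exact value varies from run to run.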
Phi3
$ ollama pull phi3
$ ollama show phi3
Model
architecture phi3
parameters 3.8B
context length 131072
embedding length 3072
quantization Q4_0
Parameters
stop "<|end|>"
stop "<|user|>"
stop "<|assistant|>"
License
Microsoft.
Copyright (c) Microsoft Corporation.
The model gives a substantially better answer, at the cost of time.
$ ollama run phi3 "who is super mario?" --verbose
Super Mario refers to the character “Mario,” who has become an iconic figure in video games, particularly within the Nintendo franchise. Super Mario encompasses multiple iterations and characters that have evolved over time but always maintains his core identity as a plumber-like hero who fights against villains like Bowser to rescue Princess Peach from captivity and save the Mushroom Kingdom in various adventures, starting with “Super Mario Bros.” for the Nintendo Entertainment System (NES).
Mario has become one of gaming’ enduring figures due to his relatable character design, playable nature that encourages player participation through direct control over a heroine journeying into dungeons and other fantastical realms. His evolution from mere platformer in the original “Super Mario Bros.” series reflects how characters can transcend their initial forms—Mario’s persona has expanded to include different gameplay mechanics, alternate universes (like Super Smash Bros.), movies, merchandise and even theme parks.
Despite the evolution of his character across games like “Super Mario 3D World,” which features a mix-and-match system allowing players’ Mii avatars to represent different iterations of Mario (such as Bowser or Yoshi), Super Mario remains fundamentally recognizable—the plumber on a mission, armed with power-ups and often accompanied by his brother Luigi.
Super Mario is not just about the character himself but also represents an enduring legacy in gaming culture that has inspired countless players of all ages across generations to explore virtual worlds together as teamwork and perseverance are core elements throughout Super Mario’s adventures.
total duration: 3m23.698619251s
load duration: 1m40.8436407s
prompt eval count: 15 token(s)
prompt eval duration: 3.205s
prompt eval rate: 4.68 tokens/s
eval count: 366 token(s)
eval duration: 1m39.648s
eval rate: 3.67 tokens/s
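Longer runs print durations in a compound form such as `1m39.648s`, so they need converting to seconds before any rate can be recomputed. Here is a minimal Python sketch of such a parser; the name `parse_duration` is hypothetical, and only the unit suffixes listed in the code are handled:

```python
import re

# Seconds per unit suffix; only these suffixes are supported by this sketch.
_UNIT_SECONDS = {
    "h": 3600.0,
    "m": 60.0,
    "s": 1.0,
    "ms": 1e-3,
    "us": 1e-6,
    "µs": 1e-6,
    "ns": 1e-9,
}
# Two-letter suffixes come first so that "ms" is not read as "m".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")


def parse_duration(text):
    """Convert a duration such as '1m39.648s' or '2.281s' to seconds."""
    text = text.strip()
    parts = _PART.findall(text)
    # Reject input that contains anything besides number+unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


# Figures copied from the phi3 run above.
eval_seconds = parse_duration("1m39.648s")
print(f"{366 / eval_seconds:.2f} tokens/s")
```

With the phi3 figures, 366 tokens over 99.648 seconds reproduces the reported eval rate of 3.67 tokens/s.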
Table Summary
Ordered by eval rate, fastest first.
LLM | arch | size | params | eval rate |
---|---|---|---|---|
smollm:135m | llama | 459 MB | 134.52M | 60.57 tokens/s |
smollm2:135m | llama | 643 MB | 134.52M | 30.92 tokens/s |
granite3.1-moe | qwen2 | 2.9 GB | 3.3B | 14.34 tokens/s |
tinydolphin | llama | 636 MB | 1.1B | 14.02 tokens/s |
tinyllama | llama | 637 MB | 1.1B | 13.60 tokens/s |
deepseek-r1:1.5b | qwen2 | 1.6 GB | 1.8B | 9.21 tokens/s |
llama3.2:1b | llama | 2.2 GB | 1.2B | 7.64 tokens/s |
falcon3:1b | llama | 1.8 GB | 1.7B | 7.06 tokens/s |
gemma2:2b | gemma2 | 3.1 GB | 2.6B | 5.08 tokens/s |
phi3 | phi3 | 2.2 GB | 3.8B | 3.67 tokens/s |
smallthinker | qwen2 | 3.6 GB | 3.4B | 3.17 tokens/s |
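To make these rates more tangible, they can be turned into the time needed to generate an answer of a given length. The sketch below uses a handful of eval rates from the table and an answer length of 100 tokens, which is an arbitrary value chosen for illustration; it ignores model load and prompt evaluation time.

```python
# Eval rates in tokens/s, copied from the table above.
EVAL_RATES = {
    "smollm:135m": 60.57,
    "tinydolphin": 14.02,
    "llama3.2:1b": 7.64,
    "phi3": 3.67,
    "smallthinker": 3.17,
}


def generation_seconds(rate, tokens=100):
    """Seconds needed to generate `tokens` tokens at `rate` tokens per second."""
    return tokens / rate


# Print the models from fastest to slowest.
for name, rate in sorted(EVAL_RATES.items(), key=lambda item: -item[1]):
    print(f"{name:<14} {generation_seconds(rate):6.1f} s per 100 tokens")
```

At these rates, smollm:135m needs under 2 seconds for 100 tokens, while phi3 needs close to half a minute.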
Conclusion
The two llama-based models, smollm and smollm2, give pretty good speeds on the Raspberry Pi and could be of great use for local inference on edge devices.