Ollama on Raspberry Pi

As part of exploring small language models (SLMs) for small AI devices, we are going to look at inference speed on the Raspberry Pi. Six models are of main interest:

  • tinyllama
  • tinydolphin
  • phi3
  • smallthinker
  • granite3.1-moe
  • llama3.2:1b

Each model has a small parameter count, so that inference runs at a usable speed.

We will measure the speed of a simple inference for each of them.

In a previous post, we installed Arch Linux on the Raspberry Pi (have a look at Running Arch on Pi).

Today, we are going to install Ollama to run LLMs, and compare the speed of several small models suited to local inference.

Let’s start by installing Ollama on the Pi with the provided install script:


curl -fsSL https://ollama.com/install.sh | sh

Testing the models

Pulling the models is straightforward but quite time-consuming on such a small device, especially with the slow write speed of an SD card.
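Since the pulls take a while, they can be queued in one go. The following is a minimal sketch, not part of the original setup: the `MODELS` variable, the `pull_models` function and the `OLLAMA_BIN` override are hypothetical names introduced here for illustration, and only `ollama pull` itself comes from Ollama.

```shell
# Models listed at the top of this post.
MODELS="tinyllama tinydolphin phi3 smallthinker granite3.1-moe llama3.2:1b"

# Pull every model in MODELS, one after the other.
# OLLAMA_BIN is a hypothetical override, handy for a dry run (OLLAMA_BIN=echo).
pull_models() {
    for model in $MODELS; do
        "${OLLAMA_BIN:-ollama}" pull "$model" || echo "failed to pull $model" >&2
    done
}
```

Calling `pull_models` starts the downloads; `OLLAMA_BIN=echo pull_models` only prints the commands that would be run.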

Now let’s see how each of these behaves.

Tinyllama

$ ollama pull tinyllama

$ ollama show tinyllama
  Model
    architecture        llama    
    parameters          1.1B     
    context length      2048     
    embedding length    2048     
    quantization        Q4_0     

  Parameters
    stop    "<|system|>"       
    stop    "<|user|>"         
    stop    "<|assistant|>"    
    stop    "</s>"             

  System
    You are a helpful AI assistant.    
$ ollama run tinyllama "who is super mario?" --verbose 

Super Mario (Japanese: スーパーマリオ, Used in English as “Super Mario”, pronounced “Supa Mairyo”) is a series of platform games created by Nintendo and developed and published by Nintendo for the NES. The series began with Super Mario Bros., which was released in 1985 in Japan and 1986 in North America. Since then, there have been countless iterations of the Super Mario franchise, including various spin-offs, remakes, and collaborations with other games or companies, as well as multiple ports to different consoles. Super Mario Bros. (NES) | Wii | Nintendo 3DS | Nintendo Switch | Super Smash Bros. Melee (GameCube) | Super Smash Bros. Brawl (Wii U/Switch) | Super Smash Bros. Ultimate (Switch) | The Legend of Zelda: Ocarina of Time (SNES) | Super Smash Bros. Melee (N64) | The Legend of Zelda: A Link to the Past (SNES) | Super Smash Bros. 64 (GB) | Super Smash Bros. Ultimate (Switch) | Splatoon (Wii U/Switch) | Pokémon Let’s Go Pikachu! (Nintendo Switch) | Pokémon Let’s Go Eevee! (Nintendo Switch)

total duration:       53.459051422s
load duration:        27.865441224s
prompt eval count:    40 token(s)
prompt eval duration: 2.281s
prompt eval rate:     17.54 tokens/s
eval count:           317 token(s)
eval duration:        23.311s
eval rate:            13.60 tokens/s
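The rates in this report are simply token counts divided by durations, which can be checked by hand. Here is a small sketch using `awk` for the floating-point division; `tokens_per_second` is a helper name made up for this post, not an Ollama command.

```shell
# tokens_per_second COUNT SECONDS: print COUNT / SECONDS with two decimals.
tokens_per_second() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

tokens_per_second 317 23.311   # eval count / eval duration               -> 13.60
tokens_per_second 40 2.281     # prompt eval count / prompt eval duration -> 17.54
```

Both results match the reported rates. Note also that the load duration (about 27.9 s) accounts for roughly half of the 53.5 s total, so the eval rate is a better basis for comparing models than the total duration.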

Tinydolphin

$ ollama pull tinydolphin

$ ollama show tinydolphin
  Model
    architecture        llama    
    parameters          1.1B     
    context length      4096     
    embedding length    2048     
    quantization        Q4_0     

  Parameters
    stop    "<|im_start|>"    
    stop    "<|im_end|>"      

  System
    You are Dolphin, a helpful AI assistant.    

  License
    Apache License               
    Version 2.0, January 2004 

Then, the same Mario query:

$ ollama run tinydolphin "who is super mario?" --verbose 

Super Mario is an iconic video game character created by Shigeru Miyamoto for the NES console in 1993. He was designed to be the protagonist of The Legend of Zelda series and later became one of the most popular characters in gaming history. Super Mario has been a global phenomenon since its release, inspiring numerous sequels, remakes, and spin-offs, as well as numerous fan creations and adaptations across different platforms.

total duration:       38.553242359s
load duration:        29.361005446s
prompt eval count:    35 token(s)
prompt eval duration: 1.987s
prompt eval rate:     17.61 tokens/s
eval count:           101 token(s)
eval duration:        7.203s
eval rate:            14.02 tokens/s

Phi3

$ ollama pull phi3

$ ollama show phi3
  Model
    architecture        phi3      
    parameters          3.8B      
    context length      131072    
    embedding length    3072      
    quantization        Q4_0      

  Parameters
    stop    "<|end|>"          
    stop    "<|user|>"         
    stop    "<|assistant|>"    

  License
    Microsoft.                              
    Copyright (c) Microsoft Corporation.    

The model gives a substantially better answer, at the cost of time.

$ ollama run phi3 "who is super mario?" --verbose 

Super Mario refers to the character “Mario,” who has become an iconic figure in video games, particularly within the Nintendo franchise. Super Mario encompasses multiple iterations and characters that have evolved over time but always maintains his core identity as a plumber-like hero who fights against villains like Bowser to rescue Princess Peach from captivity and save the Mushroom Kingdom in various adventures, starting with “Super Mario Bros.” for the Nintendo Entertainment System (NES).

Mario has become one of gaming’ enduring figures due to his relatable character design, playable nature that encourages player participation through direct control over a heroine journeying into dungeons and other fantastical realms. His evolution from mere platformer in the original “Super Mario Bros.” series reflects how characters can transcend their initial forms—Mario’s persona has expanded to include different gameplay mechanics, alternate universes (like Super Smash Bros.), movies, merchandise and even theme parks.

Despite the evolution of his character across games like “Super Mario 3D World,” which features a mix-and-match system allowing players’ Mii avatars to represent different iterations of Mario (such as Bowser or Yoshi), Super Mario remains fundamentally recognizable—the plumber on a mission, armed with power-ups and often accompanied by his brother Luigi.

Super Mario is not just about the character himself but also represents an enduring legacy in gaming culture that has inspired countless players of all ages across generations to explore virtual worlds together as teamwork and perseverance are core elements throughout Super Mario’s adventures.

total duration:       3m23.698619251s
load duration:        1m40.8436407s
prompt eval count:    15 token(s)
prompt eval duration: 3.205s
prompt eval rate:     4.68 tokens/s
eval count:           366 token(s)
eval duration:        1m39.648s
eval rate:            3.67 tokens/s
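For slower models the durations switch to a minutes-and-seconds form such as `1m39.648s`, which has to be converted before doing any arithmetic. The sketch below is one way to do it; `to_seconds` is a hypothetical helper that handles only the forms seen in this post plus plain milliseconds (hours and microseconds are not handled).

```shell
# to_seconds DURATION: convert "3.205s", "1m39.648s" or "85ms" to seconds.
to_seconds() {
    echo "$1" | awk '{
        d = $0
        # Milliseconds first, so that "85ms" is not mistaken for 85 minutes.
        if (d ~ /ms$/) { sub(/ms$/, "", d); printf "%.3f\n", d / 1000; next }
        total = 0
        if (match(d, /^[0-9]+m/)) {              # optional minutes part
            total = substr(d, 1, RLENGTH - 1) * 60
            d = substr(d, RLENGTH + 1)
        }
        sub(/s$/, "", d)                         # strip the trailing "s"
        printf "%.3f\n", total + d
    }'
}

# phi3: eval count / eval duration
awk -v n=366 -v s="$(to_seconds 1m39.648s)" 'BEGIN { printf "%.2f\n", n / s }'
```

The last line divides 366 tokens by 99.648 seconds and prints 3.67, the eval rate reported above.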

Summary table

Ordered by eval rate, fastest first. The table includes several models not detailed above.

LLM               arch    size    params   eval rate
smollm:135m       llama   459 MB  134.52M  60.57 tokens/s
smollm2:135m      llama   643 MB  134.52M  30.92 tokens/s
granite3.1-moe    qwen2   2.9 GB  3.3B     14.34 tokens/s
tinydolphin       llama   636 MB  1.1B     14.02 tokens/s
tinyllama         llama   637 MB  1.1B     13.60 tokens/s
deepseek-r1:1.5b  qwen2   1.6 GB  1.8B      9.21 tokens/s
llama3.2:1b       llama   2.2 GB  1.2B      7.64 tokens/s
falcon3:1b        llama   1.8 GB  1.7B      7.06 tokens/s
gemma2:2b         gemma2  3.1 GB  2.6B      5.08 tokens/s
phi3              phi3    2.2 GB  3.8B      3.67 tokens/s
smallthinker      qwen2   3.6 GB  3.4B      3.17 tokens/s
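A ranking like this can be collected with a short script instead of copying numbers by hand. The sketch below is only one possible approach: `eval_rate` and `bench` are names invented for this post, and it assumes that `ollama run --verbose` writes its statistics to stderr, which is worth checking on your own install.

```shell
# eval_rate: read the --verbose statistics on stdin and print the eval rate.
# The pattern is anchored, so the "prompt eval rate:" line is ignored.
eval_rate() {
    awk '/^eval rate:/ { print $3 }'
}

# bench MODEL: run the test prompt and print "MODEL RATE".
# Assumption: statistics go to stderr, so stderr is piped and the answer discarded.
bench() {
    rate=$(ollama run "$1" "who is super mario?" --verbose 2>&1 >/dev/null | eval_rate)
    echo "$1 $rate"
}
```

A loop such as `for m in tinyllama tinydolphin phi3; do bench "$m"; done | sort -k2,2 -rn` then lists the models from fastest to slowest. Each run includes the model load, so expect it to take several minutes on the Pi.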

Conclusion

The two llama-based models, smollm and smollm2, give pretty good speed on the Raspberry Pi (60.57 and 30.92 tokens/s respectively) and could be of great use for local inference on edge devices.