Transparently use multiple Ollama servers

Just a code summary of a crazy week.

At this point, Pyjama can transparently load-balance requests across multiple Ollama servers via pyjama.parallel/generate.
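Under the hood, distributing requests across a list of URLs can be as simple as round-robin selection. Here is a minimal sketch of the idea (a hypothetical helper, not Pyjama's actual implementation):

```clojure
;; Hypothetical round-robin URL selector -- not Pyjama's actual code,
;; just an illustration of how transparent dispatch can work.
(defn round-robin
  "Returns a zero-arg fn that cycles through urls on each call."
  [urls]
  (let [i (atom -1)]
    (fn [] (nth urls (mod (swap! i inc) (count urls))))))

(def next-url
  (round-robin ["http://localhost:11432" "http://localhost:11434"]))
;; (next-url) => "http://localhost:11432"
;; (next-url) => "http://localhost:11434"
```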

Basically, that gives:

(->>
    {:url "http://localhost:11432,http://localhost:11434"
     :models  ["llama3.1"]
     :format  {:type "integer"}
     :pre     "This is a potential answer %s02 for this question: %s01. Give a score to the answer on a scale 1 to 100, based on how accurate it is.
       - Do not give an answer yourself.
       - No comment.
       - No explanation.
       - No extra text. "
     :prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
               ["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
               ["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
               ["Why is the sky blue" "Because it is Christmas. "]
               ]}
    (pyjama.parallel/generate)
    (map (juxt :prompt :result :url))
    (clojure.pprint/pprint))
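The %s01 and %s02 placeholders in :pre are filled positionally from each entry in :prompts. A sketch of how such substitution could work (a hypothetical helper, not Pyjama's actual implementation):

```clojure
(require '[clojure.string :as str])

;; Hypothetical positional substitution: %s01 takes the first element,
;; %s02 the second, and so on. Not Pyjama's actual code.
(defn fill-template [template args]
  (reduce (fn [s [i arg]]
            (str/replace s (format "%%s%02d" (inc i)) arg))
          template
          (map-indexed vector args)))

(fill-template "Answer %s02 for question: %s01"
               ["Why is the sky blue" "Rayleigh scattering."])
;; => "Answer Rayleigh scattering. for question: Why is the sky blue"
```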

And yes, the prompts are dispatched to the different Ollama servers, as can be seen from the URLs in the answers, which show where each request was effectively executed. The set of URLs can also be set at runtime via the OLLAMA_URL environment variable.
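For example, assuming the variable takes the same comma-separated list of servers as the :url key:

```shell
# Point Pyjama at two Ollama servers (comma-separated, as in :url)
export OLLAMA_URL="http://localhost:11432,http://localhost:11434"
```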

Note how the model properly shows a low score for the Christmas answer.

| prompt | result | duration-ms |
|---|---|---|
| ["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."] | 95 | 238.981375 |
| ["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most."] | 80 | 416.514791 |
| ["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."] | 90 | 242.423916 |
| ["Why is the sky blue" "Because it is Christmas."] | 10 | 1409.653166 |

On the models setting line:

:models  ["llama3.1"]

Note that this is a vector: you can add more models and compare their answers.

(->>
    {:url "http://localhost:11432,http://localhost:11434"
     :models  ["llama3.1" "tinyllama"]
     :format  {:type "integer"}
     :pre     "This is a potential answer %s02 for this question: %s01. Give a score to the answer on a scale 1 to 100, based on how accurate it is.
       - Do not give an answer yourself.
       - No comment.
       - No explanation.
       - No extra text. "
     :prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
               ["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
               ["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
               ["Why is the sky blue" "Because it is Christmas. "]
               ]}
    (pyjama.parallel/pgen)
    (sort-by :model) 
    (pyjama.io.print/print-table [:model :url :result :duration-ms]))

Here we can see how bad the results from tinyllama are: it returns out-of-range scores even though the query asked for a score between 1 and 100.

We can also see that the format is forced on the model, so the result column is short and convenient to handle for further processing.

| model | url | result | duration-ms |
|---|---|---|---|
| llama3.1 | http://localhost:11432 | 90 | 221.19325 |
| llama3.1 | http://localhost:11434 | 80 | 2240.858209 |
| llama3.1 | http://localhost:11432 | 20 | 178.245167 |
| llama3.1 | http://localhost:11432 | 96 | 223.569917 |
| tinyllama | http://localhost:11432 | -100 | 180.978458 |
| tinyllama | http://localhost:11434 | 100 | 463.189667 |
| tinyllama | http://localhost:11434 | 355902 | 217.185542 |
| tinyllama | http://localhost:11434 | 30 | 107.712584 |
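Since the results are bare integers, post-processing stays trivial, for example averaging scores per model or flagging out-of-range answers. A sketch, using rows shaped like the output above (a subset of the values, hand-copied for illustration):

```clojure
;; Rows shaped like pyjama.parallel/pgen output (a subset of the table above).
(def rows
  [{:model "llama3.1"  :result 90}
   {:model "llama3.1"  :result 80}
   {:model "tinyllama" :result -100}
   {:model "tinyllama" :result 100}])

;; Average score per model.
(defn avg-by-model [rows]
  (into {}
        (map (fn [[m rs]] [m (/ (reduce + (map :result rs)) (count rs))]))
        (group-by :model rows)))

;; Flag answers outside the requested 1-100 range.
(defn out-of-range [rows]
  (remove #(<= 1 (:result %) 100) rows))

;; (avg-by-model rows) => {"llama3.1" 85, "tinyllama" 0}
;; (out-of-range rows) => ({:model "tinyllama", :result -100})
```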

Practical

When you have a bunch of questions for a specific text, pyjama.parallel/generate is a very good option.

Here we review a resume for a given IT position in Japan.

(let [resume (pyjama.io.readers/extract-text "..path_to_pdf.pdf")
        job-position "financial IT engineer in Japan"]
    (->>
      {:url     "http://localhost:11432"
       :models  ["llama3.1"]
       :pre     ["This is a potential resume %s01 for this job position: %s02.
                  In this context answer the question: %s03"
                 resume
                 job-position]
       :system  "always answer in a very short less than 50 characters sentence, and in one single line"
       :options {:num_context 15000}
       :prompts ["is the resume relevant"
                 "what's the average previous job length"
                 "what is the most obvious flaw"
                 "what are the 3 worst points of the resume"]}
      (pyjama.parallel/generate)
      (sort-by :model)
      (pyjama.io.print/print-table [:prompt :result])))

Context is set to 15000, up from the default of 4096, so the whole resume fits inside the model context.
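A quick way to sanity-check whether a document fits a given context window is the rough rule of thumb of about four characters per token. This is only an approximation, not the model's actual tokenizer:

```clojure
;; Rough estimate: ~4 characters per token. A rule of thumb only,
;; not the model's actual tokenizer.
(defn fits-context? [text num-ctx]
  (< (quot (count text) 4) num-ctx))

;; A ~40k-character resume needs roughly 10k tokens, so 15000 is enough:
(fits-context? (apply str (repeat 40000 "a")) 15000)
;; => true
```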

| prompt | result |
|---|---|
| what are the 3 worst points of the resume | You're not experienced in Japanese culture or language, so it's unlikely you'll get hired as a financial IT engineer in Japan. |
| is the resume relevant | You lack relevant experience and certifications for a Financial IT Engineer role in Japan. |
| what is the most obvious flaw | I'm interested in exploring opportunities in Japan! |
| what's the average previous job length | You don't meet the location requirement. |

See how you could, somewhat ominously, use this code to triage the set of applicants wanting to join your company.