Transparently use multiple Ollama servers
Just a code summary of a crazy week.
At this point, Pyjama can transparently load-balance requests across multiple Ollama servers via pyjama.parallel/generate.
In practice, that looks like this:
(->>
{:url "http://localhost:11432,http://localhost:11434"
:models ["llama3.1"]
:format {:type "integer"}
:pre "This is a potential answer %s02 for this question: %s01. Give the answer a score on a scale of 1 to 100 based on how accurate it is.
- Do not give an answer yourself.
- No comment.
- No explanation.
- No extra text. "
:prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
["Why is the sky blue" "Because it is Christmas. "]
]}
(pyjama.parallel/generate)
(map (juxt :prompt :result :url))
(clojure.pprint/pprint))
And yes, the prompts are dispatched across the different Ollama servers, as can be seen in the :url field of each answer, which shows where the request was actually executed. The set of URLs can also be set at runtime via the OLLAMA_URL environment variable.
Note how the model properly shows a low score for the Christmas answer.
prompt | result | duration-ms |
---|---|---|
[“Why is the sky blue” “The sky appears blue because of a process called Rayleigh scattering.”] | 95 | 238.981375 |
[“Why is the sky blue” “During the day the sky looks blue because it’s the blue light that gets scattered the most. “] | 80 | 416.514791 |
[“Why is the sky blue” “Blue is scattered more than other colors because it travels as shorter, smaller waves.”] | 90 | 242.423916 |
[“Why is the sky blue” “Because it is Christmas. “] | 10 | 1409.653166 |
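To illustrate the comma-separated format of OLLAMA_URL, here is a minimal sketch of how such a value could be split into a list of server URLs. The helper name is hypothetical and this is not pyjama's actual code; it only shows the expected shape of the variable.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: resolve the server list from the OLLAMA_URL
;; environment variable, falling back to a single local default.
(defn ollama-urls []
  (-> (or (System/getenv "OLLAMA_URL")
          "http://localhost:11434")
      (str/split #",")))

;; With OLLAMA_URL=http://localhost:11432,http://localhost:11434
;; this would return ["http://localhost:11432" "http://localhost:11434"].
```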
On the models setting line:
:models ["llama3.1"]
Note that this is a vector: you can add more models and compare their answers.
(->>
{:url "http://localhost:11432,http://localhost:11434"
:models ["llama3.1" "tinyllama"]
:format {:type "integer"}
:pre "This is a potential answer %s02 for this question: %s01. Give the answer a score on a scale of 1 to 100 based on how accurate it is.
- Do not give an answer yourself.
- No comment.
- No explanation.
- No extra text. "
:prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
["Why is the sky blue" "Because it is Christmas. "]
]}
(pyjama.parallel/pgen)
(sort-by :model)
(pyjama.io.print/print-table [:model :url :result :duration-ms]))
Here we can see how poor tinyllama's results are: it returns out-of-range scores even though the query asked for a score between 1 and 100.
We can also see that the format is enforced on the model, so the result column stays short and convenient for further processing.
model | url | result | duration-ms |
---|---|---|---|
llama3.1 | http://localhost:11432 | 90 | 221.19325 |
llama3.1 | http://localhost:11434 | 80 | 2240.858209 |
llama3.1 | http://localhost:11432 | 20 | 178.245167 |
llama3.1 | http://localhost:11432 | 96 | 223.569917 |
tinyllama | http://localhost:11432 | -100 | 180.978458 |
tinyllama | http://localhost:11434 | 100 | 463.189667 |
tinyllama | http://localhost:11434 | 355902 | 217.185542 |
tinyllama | http://localhost:11434 | 30 | 107.712584 |
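Because :format {:type "integer"} guarantees numeric results, cleaning up out-of-range answers like tinyllama's -100 or 355902 is a one-liner. A small sketch (the function name is mine, not pyjama's):

```clojure
;; Keep only results whose score falls in the requested 1-100 range.
(defn valid-scores [results]
  (filter #(<= 1 (:result %) 100) results))

(valid-scores [{:model "tinyllama" :result -100}
               {:model "llama3.1"  :result 90}
               {:model "tinyllama" :result 355902}])
;; → ({:model "llama3.1", :result 90})
```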
Practical
When you have a bunch of questions about a specific text, pyjama.parallel/generate is a very good option.
Here we review a resume for a given IT position in Japan.
(let [resume (pyjama.io.readers/extract-text "..path_to_pdf.pdf")
job-position "financial IT engineer in Japan"]
(->>
{:url "http://localhost:11432"
:models ["llama3.1"]
:pre ["This is a potential resume %s01 for this job position: %s02.
In this context answer the question: %s03"
resume
job-position]
:system "always answer in a very short less than 50 characters sentence, and in one single line"
:options {:num_context 15000}
:prompts ["is the resume relevant"
"whats the average previous job length"
"what is the most obvious flaw"
"what are the 3 worst points of the resume"]}
(pyjama.parallel/generate)
(sort-by :model)
(pyjama.io.print/print-table [:prompt :result])))
The context window is set to 15000 (up from the default of 4096) so the whole resume fits inside the model context.
prompt | result |
---|---|
what are the 3 worst points of the resume | You’re not experienced in Japanese culture or language, so it’s unlikely you’ll get hired as a financial IT engineer in Japan. |
is the resume relevant | You lack relevant experience and certifications for a Financial IT Engineer role in Japan. |
what is the most obvious flaw | I’m interested in exploring opportunities in Japan! |
whats the average previous job length | You don’t meet the location requirement. |
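To pick a context size large enough for a document, a rough character-count heuristic can help. This is my own back-of-the-envelope sketch (the ~4 characters per token ratio is an assumption that holds loosely for English text, not something pyjama provides):

```clojure
;; Rough heuristic: estimate token count as characters divided by 4,
;; to help choose a context value that fits the whole document.
(defn approx-tokens [text]
  (quot (count text) 4))

;; A 30000-character resume would need roughly 7500 tokens of context,
;; comfortably under the 15000 used above.
(approx-tokens (apply str (repeat 6000 "word ")))
;; → 7500
```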
See how you could, somewhat ominously, use this code to screen the set of applicants to your company.