Transparently use multiple Ollama servers
Just a code summary of a crazy week.
At this point, Pyjama can transparently load-balance requests across multiple Ollama servers via pyjama.parallel/generate.
In practice, that looks like this:
(->>
{:url "http://localhost:11432,http://localhost:11434"
:models ["llama3.1"]
:format {:type "integer"}
:pre "This is a potential answer %s02 for this question: %s01. Give the answer a score on a scale of 1 to 100 based on how accurate it is.
- Do not give an answer yourself.
- No comment.
- No explanation.
- No extra text. "
:prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
["Why is the sky blue" "Because it is Christmas. "]
]}
(pyjama.parallel/generate)
(map (juxt :prompt :result :url))
(clojure.pprint/pprint))
And yes, the prompts are dispatched across the different Ollama servers, as can be seen in the :url field of each answer, which shows where the request was actually executed. The set of URLs can also be set at runtime via the OLLAMA_URL environment variable.
Note how the model properly shows a low score for the Christmas answer.
prompt | result | duration-ms |
---|---|---|
[“Why is the sky blue” “The sky appears blue because of a process called Rayleigh scattering.”] | 95 | 238.981375 |
[“Why is the sky blue” “During the day the sky looks blue because it’s the blue light that gets scattered the most. “] | 80 | 416.514791 |
[“Why is the sky blue” “Blue is scattered more than other colors because it travels as shorter, smaller waves.”] | 90 | 242.423916 |
[“Why is the sky blue” “Because it is Christmas. “] | 10 | 1409.653166 |
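To illustrate the comma-separated format of OLLAMA_URL, here is a minimal sketch of how such a value could be split into a list of server URLs. The helper name is hypothetical and this is not pyjama's actual code; it only shows the expected shape of the variable.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: resolve the server list from the OLLAMA_URL
;; environment variable, falling back to a single local default.
(defn ollama-urls []
  (-> (or (System/getenv "OLLAMA_URL")
          "http://localhost:11434")
      (str/split #",")))

;; With OLLAMA_URL=http://localhost:11432,http://localhost:11434
;; this would return ["http://localhost:11432" "http://localhost:11434"].
```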
On the models setting line:
:models ["llama3.1"]
Note that this is a vector: you can add more models and compare their answers.
(->>
{:url "http://localhost:11432,http://localhost:11434"
:models ["llama3.1" "tinyllama"]
:format {:type "integer"}
:pre "This is a potential answer %s02 for this question: %s01. Give the answer a score on a scale of 1 to 100 based on how accurate it is.
- Do not give an answer yourself.
- No comment.
- No explanation.
- No extra text. "
:prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
["Why is the sky blue" "Because it is Christmas. "]
]}
(pyjama.parallel/pgen)
(sort-by :model)
(pyjama.io.print/print-table [:model :url :result :duration-ms]))
Here we can see how poor tinyllama's results are: it returns out-of-range scores even though the query asked for a score between 1 and 100.
We can also see that the format is enforced on the model, so the result column stays short and convenient for further processing.
model | url | result | duration-ms |
---|---|---|---|
llama3.1 | http://localhost:11432 | 90 | 221.19325 |
llama3.1 | http://localhost:11434 | 80 | 2240.858209 |
llama3.1 | http://localhost:11432 | 20 | 178.245167 |
llama3.1 | http://localhost:11432 | 96 | 223.569917 |
tinyllama | http://localhost:11432 | -100 | 180.978458 |
tinyllama | http://localhost:11434 | 100 | 463.189667 |
tinyllama | http://localhost:11434 | 355902 | 217.185542 |
tinyllama | http://localhost:11434 | 30 | 107.712584 |
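Because :format {:type "integer"} guarantees numeric results, cleaning up out-of-range answers like tinyllama's -100 or 355902 is a one-liner. A small sketch (the function name is mine, not pyjama's):

```clojure
;; Keep only results whose score falls in the requested 1-100 range.
(defn valid-scores [results]
  (filter #(<= 1 (:result %) 100) results))

(valid-scores [{:model "tinyllama" :result -100}
               {:model "llama3.1"  :result 90}
               {:model "tinyllama" :result 355902}])
;; → ({:model "llama3.1", :result 90})
```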
Practical
When you have a bunch of questions about a specific text, pyjama.parallel/generate is a very good option.
Here we review a resume for a given IT position in Japan.
(let [resume (pyjama.io.readers/extract-text "..path_to_pdf.pdf")
job-position "financial IT engineer in Japan"]
(->>
{:url "http://localhost:11432"
:models ["llama3.1"]
:pre ["This is a potential resume %s01 for this job position: %s02.
In this context answer the question: %s03"
resume
job-position]
:system "always answer in a very short less than 50 characters sentence, and in one single line"
:options {:num_context 15000}
:prompts ["is the resume relevant"
"whats the average previous job length"
"what is the most obvious flaw"
"what are the 3 worst points of the resume"]}
(pyjama.parallel/generate)
(sort-by :model)
(pyjama.io.print/print-table [:prompt :result])))
The context window is set to 15000 (up from the default of 4096) so the whole resume fits inside the model context.
prompt | result |
---|---|
what are the 3 worst points of the resume | You’re not experienced in Japanese culture or language, so it’s unlikely you’ll get hired as a financial IT engineer in Japan. |
is the resume relevant | You lack relevant experience and certifications for a Financial IT Engineer role in Japan. |
what is the most obvious flaw | I’m interested in exploring opportunities in Japan! |
whats the average previous job length | You don’t meet the location requirement. |
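To pick a context size large enough for a document, a rough character-count heuristic can help. This is my own back-of-the-envelope sketch (the ~4 characters per token ratio is an assumption that holds loosely for English text, not something pyjama provides):

```clojure
;; Rough heuristic: estimate token count as characters divided by 4,
;; to help choose a context value that fits the whole document.
(defn approx-tokens [text]
  (quot (count text) 4))

;; A 30000-character resume would need roughly 7500 tokens of context,
;; comfortably under the 15000 used above.
(approx-tokens (apply str (repeat 6000 "word ")))
;; → 7500
```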
See how you could, somewhat ominously, use this code to screen the set of applicants to your company.