Clojure for Machine Learning


Modrzyk, N. (2019). Clojure. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham.

From the same community that gives us tablecloth comes a full-blown machine learning library, scicloj.ml.

Explaining the details of the whole library, or of machine learning in general, is beyond the scope of this article; instead we will show how the reader can put a KNN (K-Nearest Neighbors) algorithm into action to predict quotes retrieved from the previously presented JQuants API. What is KNN? K-Nearest Neighbors is a simple algorithm that stores all the available cases and classifies new data based on a similarity measure: a data point is classified according to how its neighbors are classified.
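To make the idea concrete, here is a minimal, dependency-free sketch of one-dimensional KNN classification in plain Clojure. The `knn-classify` helper and its sample cases are made up for illustration; the actual model in this article is built with scicloj.ml below.

```clojure
;; Minimal 1-D KNN sketch: classify a point by the majority label
;; among its k nearest stored cases.
(defn knn-classify [k cases x]
  ;; cases is a seq of [value label] pairs; distance is |value - x|
  (->> cases
       (sort-by (fn [[v _]] (Math/abs (double (- v x)))))
       (take k)
       (map second)
       frequencies
       (apply max-key val)
       key))

;; two clusters of stored cases, around 0 and around 10
(knn-classify 3 [[0 :low] [1 :low] [2 :low] [9 :high] [10 :high] [11 :high]] 8.5)
;; => :high  (the three nearest cases are 9, 10 and 11, all :high)
```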

As an exercise, we will try to predict Sony's Close quote for the next day: supposing today is 2022/09/30, we want to predict the quote for 2022/10/01, even though, October 1st being a Saturday, the market is closed …

We update the deps.edn of our learning project with scicloj.ml and tablecloth, plus jna to help with the library's native calls.

org.clojure/clojure {:mvn/version "1.11.1"}
scicloj/scicloj.ml {:mvn/version "0.2.0"}
scicloj/tablecloth {:mvn/version "6.094.1"}
net.java.dev.jna/jna {:mvn/version "5.12.1"}
net.clojars.hellonico/jquants-api-jvm {:mvn/version "0.2.9"}

In our Clojure REPL, or namespace, we start by setting up the namespaces and retrieving the quotes since October 2020.

(require
  '[scicloj.ml.core :as ml]
  '[scicloj.ml.dataset :as ds]
  '[hellonico.jquants-api :as api]
  '[scicloj.ml.metamorph :as mm])

(def companyNameEnglish "Sony")

(def quotes
  (api/daily-fuzzy {:CompanyNameEnglish companyNameEnglish :from 20201002 :to 20220930}))

As a reminder, the quotes are returned by the jquants API in edn format, where each day entry looks like the one below:
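For illustration, a daily entry has roughly the following shape. The values below are made up, and only the :Date and :Close keys are relied on by the code that follows:

```clojure
;; illustrative entry only — values are made up
{:Date   "20220930"
 :Open   9551.0
 :High   9685.0
 :Low    9511.0
 :Close  9621.0
 :Volume 2528400}
```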

As presented in the previous section, we can create a dataset that will be used to train the KNN model.

(def df
  (ds/dataset
    {:x (map #(Integer/parseInt (% :Date)) quotes)
     :y (map :Close quotes)}))

Now, the only input parameter for our model is the date, and what we want to get out of it is a value representing the Close quote.

So, we create an ml pipeline where the inference target is the value :y, and we specify a parameter k of 20 for the KNN classification, so the model will look at 20 neighbors to generate the :y value.

(def pipe-fn
  (ml/pipeline
    (mm/set-inference-target :y)
    (mm/categorical->number [:y])
    (mm/model
      {:model-type :smile.classification/knn
       :k 20})))

Now we train the KNN model in :fit mode, calling the pipe-fn function on the dataset of quotes per date that we defined earlier:

(def trained-ctx
  (pipe-fn {:metamorph/data df :metamorph/mode :fit}))

The training is very fast on this small dataset, and we are ready to use the trained context with a guess function:

(defn guess [date]
  (-> (pipe-fn
        (merge trained-ctx
               {:metamorph/data (ds/dataset {:x [date] :y [nil]})
                :metamorph/mode :transform}))
      :metamorph/data
      (ds/column-values->categorical :y)
      first))

Let’s try it with:

(guess 20220720)
; 4032

You can of course compare with the real quote:

(defn one-quote [date]
  (:Close
    (first
      (api/daily-fuzzy {:CompanyNameEnglish companyNameEnglish
                        :date date}))))

(one-quote 20220719)
; 3944

The value is slightly different from the actual quote on that day, but there were big jumps in the :Close value around those days, and the KNN model does a very good job of staying in a valid range.

Now, it would be nice to see what happens when we train our model over multiple companies or over a longer period, and we leave that as an exercise to the reader.

The code for both the data frame section and the machine learning section is available in the JQuants samples.