Raspberry Pi, OpenCV, Deep Neural Networks and of course a bit of Clojure

I recently had to write a simple IoT prototype that counts the number of people in a queue in real time. Of course, I could have hired someone to do that and just keep counting people, or … I could write a program in Clojure that runs on a Raspberry Pi and detects the number of heads in a video stream.

We learned recently that with inlein we can easily write scripts in Clojure, dependencies included, and run them just about anywhere at quite decent speed.

Now the other good news is that origami just released a version of its DNN sibling, origami-dnn, with all you need to get started using Deep Neural Networks in Clojure.

Clojure-based Origami DNN, which is a wrapper around OpenCV’s DNN features, does three things rather nicely:

it makes a set of known trained networks immediately available for use
it integrates wonderfully with Origami, so you can immediately use the results returned by a network with your images, videos read from file, or video streams from webcams.
it also focuses on simply extending Origami’s usage of the Clojure core threading macro ->, and plugs itself in in an intuitive way.

Of course, as with its sibling, there is no need to install OpenCV or compile anything. This works on Raspberry Pi, OSX, Windows, Linux etc … the binaries are pre-compiled and bundled, ready to be used transparently.

To set expectations, this article does not talk about training a network yet, only about how to use a pre-trained network on a Raspberry Pi.

Getting ready

We mostly need a JDK, and here again I like the ARM-ready JDKs packaged by Bellsoft. IMHO there is virtually no noticeable speed difference between recent JDKs: JDK8 through JDK13 mostly run at the same performance level.

The other thing we need is the tool that runs the Clojure scripts, the daemon that makes us ready to run them, and this is where inlein comes in.

installing inlein (again)

Just to be sure, and because apart from a Java install this is the only thing you need, download the inlein script and put it somewhere on your path, ready for use. On a Raspberry Pi, /usr/local/bin is a good place.

wget https://github.com/hypirion/inlein/releases/download/0.2.0/inlein
chmod +x inlein
mv inlein /usr/local/bin
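
To check that the install works, a minimal script can look like the sketch below. An inlein script is a regular Clojure file that starts with a quoted map listing its dependencies; the file name hello.clj, the greeting function and the Clojure version picked here are only illustrations. Save it, then run it with inlein hello.clj.

```clojure
#!/usr/bin/env inlein

;; The quoted map is how inlein learns which dependencies to fetch.
;; The Clojure version is an arbitrary choice for this sketch.
'{:dependencies [[org.clojure/clojure "1.10.1"]]}

(defn greeting
  "Builds the message printed by this script."
  [who]
  (str "Hello from " who))

(println (greeting "inlein"))
```

The first run takes a while since the dependencies have to be downloaded; later runs should start much faster.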

We’re mostly there, let’s get some Clojure scripts from github …

getting the scripting samples (again)

Some Origami and Origami DNN scripting samples are located on github, and you can clone them with:

git clone https://github.com/hellonico/origami-scripting.git

Notably, the first example we will be looking at is this one.

Running a DNN network on an image.

Yolo v3 may not be the fastest network to perform object detection, but it’s still one of my favorites. Our first goal is to run a Yolo pre-trained network, the one provided if you do a local yolo install, to recognize and classify a cat.

Sounds like your usual Neural Network exercise, and yes, we just want to make sure things are kept simple.

The core of the example is in the few lines below, a simple threading usage on an input image. Let’s drop this here, and analyse what it does, and how it does it.

(-> input
        (imread)
        (yolo/find-objects net)
        (blue-boxes! labels)
        (imwrite output))

input is a path to an image, which you then feed to imread, or to u/mat-from-url of the Origami library if the image is not local.

I think eventually those two functions will be merged in a single one at some stage so it does not matter whether the path to the image is local or not.

We then thread OpenCV’s mat object through yolo/find-objects. That function from origami-dnn internally converts the origami/opencv mat to a blob image, in the format (number of channels, order of the channels, size of the picture etc…) expected by the Yolo network.

A network is always trained on preformatted and prepared images. Here, when using Yolo, the images are 416x416, and this is indeed the size of the input matrix fed into the network.

(d/blue-boxes! labels) or (blue-boxes! labels) gets the results from the find-objects call, which are:

the image itself, so it can be piped through a threading macro for further usage
a sequence of maps with keys: {:label :box :confidence}, one map per detected object.
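
Since the result is plain Clojure data, counting what was detected takes only a few lines. The sketch below runs on made-up sample data shaped like the result described above: a keyword stands in for the image, and the boxes are plain vectors instead of OpenCV rectangles. The names label-counts, sample-labels and sample-result are mine, not part of origami-dnn, and I am assuming that :label is an index into the labels list, which is how the blue-boxes! function uses it.

```clojure
;; Hypothetical label list; a real one comes with the loaded network.
(def sample-labels ["person" "cat" "dog"])

;; Hypothetical result: [image detections].
(def sample-result
  [:image-placeholder                        ; a real result holds an OpenCV mat here
   [{:label 0 :box [10 10 50 80] :confidence 91}
    {:label 0 :box [70 12 48 82] :confidence 77}
    {:label 1 :box [5 90 40 30]  :confidence 64}]])

(defn label-counts
  "Returns a map of label name to the number of detections carrying that label."
  [[_img detections] labels]
  (frequencies (map #(nth labels (:label %)) detections)))

(label-counts sample-result sample-labels)
;; => {"person" 2, "cat" 1}
```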

The labels themselves are retrieved when loading the network. Oh yes, that’s right, we haven’t seen that part yet. The network is loaded from a remote repository using read-net-from-repo. This downloads the network files and puts them on the local device/machine for usage or re-usage. You only need to download the network files once; later runs will re-use the locally cached version.

(let [[net opts labels] (read-net-from-repo "networks.yolo:yolov2-tiny:1.0.0")]
  ...)

When using read-net-from-repo, you retrieve three values:

net, which is an OpenCV object ready to be acted upon
opts, which are read from a custom edn file, and can be used to tweak runs of the network
labels, the labels the network was trained with. I still can’t believe it is sometimes so hard to find which labels were used when training this or that network, so sometimes the output does not even make sense.

Ok, so we loaded the network, and then threaded the result of the network run onto blue-boxes!. You can write your own blue-boxes!: the function simply loops (doseq in Clojure) over the results, drawing a blue rectangle and adding the label and the confidence, as is usually done. You can of course set a color of your choice depending on the label quite easily, with a map or something.

(defn blue-boxes! [result labels]
  ;; result is [image detections]; each detection is a {:label :box :confidence} map
  (let [img (first result) detected (second result)]
    (doseq [{confidence :confidence label :label box :box} detected]
      ;; :label is an index into labels; (new-scalar 255 0 0) is blue in OpenCV's BGR order
      (put-text! img (str (nth labels label) "[" confidence " %]")
                 (new-point (double (.-x box)) (double (.-y box)))
                 FONT_HERSHEY_PLAIN 2
                 (new-scalar 255 0 0))
      (rectangle img box (new-scalar 255 0 0) 2))
    img))
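
The color-per-label idea mentioned above can be sketched with a plain map and a default. The label names, the colors and the color-for helper are all made up for the illustration; colors are written as [blue green red] triples to match the (new-scalar 255 0 0) blue used in blue-boxes!.

```clojure
;; Hypothetical mapping from label name to a [blue green red] color triple.
(def label-colors
  {"person" [0 255 0]     ; green
   "cat"    [0 0 255]})   ; red

;; Anything not in the map keeps the blue of blue-boxes!.
(def default-color [255 0 0])

(defn color-for
  "Looks up the color for a detection's label index, falling back to blue."
  [labels label-index]
  (get label-colors (nth labels label-index) default-color))

(color-for ["person" "cat" "dog"] 0)
;; => [0 255 0]
(color-for ["person" "cat" "dog"] 2)
;; => [255 0 0]
```

Inside the doseq, something like (apply new-scalar (color-for labels label)) could then replace the hard-coded scalar, assuming new-scalar keeps taking the three channel values as separate arguments as it does above.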

Running a DNN network on a webcam

If you have been using origami to do stream processing before, you will realize it is quite easy to plug in the object detection done on an image above, directly on the mat read from the video stream.

Cutting out the imports, the snippet below is enough to do object detections on a video stream:

(let [[net _ labels] (origami-dnn/read-net-from-repo "networks.yolo:yolov3-tiny:1.0.0")]
  (u/simple-cam-window
    {:frame {:fps true} :video {:device 0}}
    (fn [buffer]
      (-> buffer
          (yolo/find-objects net)
          (d/blue-boxes! labels)))))

Note that we just plugged in the u/simple-cam-window to retrieve the stream and have access to a Mat object, here named buffer. We then apply the same find-objects and blue-boxes! functions.
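
For the queue-counting prototype, one more step can sit in that pipeline: a function that records how many people are in the current frame and passes the result along untouched. This is only a sketch on plain data. count-people! and people-in-frame are names I made up, the sample frame is fake, and I am assuming the network’s label for a person is the string "person".

```clojure
;; Holds the latest head count, so that other code (a logger, a web endpoint)
;; can read it while the video loop keeps running.
(def people-in-frame (atom 0))

(defn count-people!
  "Stores the number of detections labelled person-label in people-in-frame,
   then returns result unchanged so it can sit inside a -> pipeline."
  [[_img detections :as result] labels person-label]
  (reset! people-in-frame
          (count (filter #(= person-label (nth labels (:label %))) detections)))
  result)

;; Fake frame: a placeholder image and three detections, two of them people.
(-> [:frame-placeholder [{:label 0 :box nil :confidence 88}
                         {:label 2 :box nil :confidence 70}
                         {:label 0 :box nil :confidence 55}]]
    (count-people! ["person" "bicycle" "car"] "person"))

@people-in-frame
;; => 2
```

In the webcam snippet, (count-people! labels "person") would go between the find-objects and blue-boxes! calls.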

Other packaged networks

Some of the other ready-to-use networks are listed below. You can just cut and replace the one used in the examples above …

Network Descriptor	Comments
networks.yolo:yolov3-tiny:1.0.0	Yolo v3 tiny
networks.yolo:yolov3:1.0.0	Yolo v3
networks.yolo:yolov2:1.0.0	Yolo v2
networks.yolo:yolov2-tiny:1.0.0	Yolo v2 tiny
networks.caffe:places365:1.0.0	Recognize places
networks.caffe:convnet-gender:1.0.0	Recognize gender
networks.caffe:convnet-age:1.0.0	Recognize age

places365, convnet’s gender and convnet’s age need slightly different functions to use the return values of the network, so you can have a look. In general, more demos are available should you want to try and plug in your own. Looking forward to hearing your feedback and what you detected with origami-dnn.