How to use AWS S3 for persistence in Kubernetes
We recently deployed a Kubernetes cluster based on KubeSpray at a client. Since there is no out-of-the-box cloud storage, one of the first tasks to clear for the cluster deployment is data persistence.
We gave sshfs a try, and then figured we could also manage the distributed persistence using Goofys. Goofys allows you to simply mount an AWS S3 bucket as a file system; in effect, the same S3 data is visible from all the nodes of the cluster.
This post shows the few steps and configuration files needed to deploy a running icCube pod using Goofys for the pod's persistence.
First, let's check the nodes of the Kubernetes cluster:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
akita Ready <none> 9d v1.25.4
aomori Ready control-plane 29d v1.25.4
hokkaido Ready control-plane 29d v1.25.4
iwate Ready <none> 29d v1.25.4
miyagi Ready <none> 29d v1.25.4
Let's make sure we have the S3 bucket, named s3cubebucket for this article, that will hold our files:
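If the bucket does not exist yet, it can be created and listed with the AWS CLI (assuming credentials with S3 access are configured locally):
aws s3 mb s3://s3cubebucket
aws s3 ls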
And let's run this Ansible task list to deploy the AWS credentials, download the goofys binary, and mount the goofys filesystem directly on the non-control-plane nodes.
---
- name: create .aws directory
  file:
    path: '/root/.aws'
    state: directory

- name: copy aws credentials
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: /root/.aws
    mode: 0600
    owner: root
  loop:
    - config
    - credentials

- name: download goofys
  get_url:
    url: https://github.com/kahing/goofys/releases/latest/download/goofys
    dest: /usr/local/bin
    mode: 0755
    owner: me

- name: umount
  command: sudo umount /kube/iccube
  ignore_errors: yes

- name: mount
  command: sudo /usr/local/bin/goofys -o allow_other --file-mode=0777 --dir-mode=0777 s3cubebucket /kube/iccube
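Once the playbook has run, a quick check directly on one of the worker nodes should show the mounted filesystem:
df -h /kube/iccube
mount | grep goofys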
Speed testing
With the Goofys filesystem mounted, we can go to a node directly and check the read/write speeds. We are going to simply use dd to create a 1 GB file and then read it back.
The steps used for the speed testing are mostly taken and adapted from the following article.
WRITE
$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.405 s, 103 MB/s
READ biased with recent FS caching
$ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.301232 s, 3.6 GB/s
READ after cleaning cache
$ sudo /sbin/sysctl -w vm.drop_caches=3
vm.drop_caches = 3
$ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.4693 s, 103 MB/s
Cleanup
Let’s not forget to remove the temporary file created during the speed test.
rm tempfile
hdparm did not work for S3 mounts
I briefly tried hdparm to measure the speed, hoping to compare with the dd results, but without success; it seems this is not supported by FUSE.
$ sudo hdparm -Tt /kube/iccube
/kube/iccube:
read() failed: Is a directory
BLKGETSIZE failed: Inappropriate ioctl for device
BLKFLSBUF failed: Inappropriate ioctl for device
Use Goofys with Kubernetes to deploy icCube
A quick overview of the target design is shown in the diagram below:
The binding between all these pieces happens once the pod has been scheduled on the cluster; in other words, we need to create a pod so that the PVC/PV get bound properly.
So, remember that our goal here is to deploy icCube pods on our Kubernetes cluster using the S3 bucket as persistence.
At first, it is not obvious which YAML files to prepare and how. Let's list all 8 files first:
- namespace.yml
- local-storage.yml
- pv.yml and pv2.yml
- pvc.yml and pvc2.yml
- deployment.yml
- service.yml
and then go over the usage of each file.
namespace.yml
For the binding to work between the container and the PV/PVC, the whole set needs to be within the same namespace. We will put everything in the iccube namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: iccube
local-storage.yml
This is the storage class that will be used in the PV/PVC definitions. There is close to no room for variation here, apart from the mount options.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - debug
pv.yml
This is the Persistent Volume definition. Two things to note here:
- it maps the path to the goofys mount (or to a directory within the goofys mount), and
- we need to specify a node affinity restricted to the nodes that have the goofys mount set up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-volume
  labels:
    type: local
spec:
  storageClassName: my-local-storage
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/kube/iccube/data"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - iwate
                - miyagi
pvc.yml
Here again you need to define a storage size, but the check only verifies that the requested PVC size is not larger than the PV size; it does not affect the actual goofys mount. Note two other things here:
- we need ReadWriteMany as the accessMode to make this accessible to other pods
- the namespace needs to match the namespace of the pod/deployment that comes after, so here iccube
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iccube-pvc
  namespace: iccube
spec:
  storageClassName: my-local-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
deployment.yml
Finally, this is our deployment file to deploy 2 icCube pods. We skipped the content of iccube-pvc2 (along with pv2) above; a minimal sketch of both is given right after the deployment definition below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iccube
  namespace: iccube
spec:
  selector:
    matchLabels:
      run: iccube
  replicas: 2
  template:
    metadata:
      labels:
        run: iccube
    spec:
      volumes:
        - name: iccube-volume
          persistentVolumeClaim:
            claimName: iccube-pvc
        - name: iccube-volume2
          persistentVolumeClaim:
            claimName: iccube-pvc2
      containers:
        - name: iccube
          image: ic3software/iccube:8.2.2-chromium
          ports:
            - containerPort: 8282
          volumeMounts:
            - mountPath: "/home/ic3"
              mountPropagation: HostToContainer
              name: iccube-volume
            - mountPath: "/opt/icCube/bin"
              mountPropagation: HostToContainer
              name: iccube-volume2
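For reference, here is a minimal sketch of what pv2.yml and pvc2.yml could look like (shown together here, separated by ---). They mirror pv.yml and pvc.yml above; the host path /kube/iccube/bin is an assumption, to be adapted to your own layout inside the goofys mount.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-volume2
  labels:
    type: local
spec:
  storageClassName: my-local-storage
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/kube/iccube/bin"   # assumed directory inside the goofys mount
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - iwate
                - miyagi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iccube-pvc2
  namespace: iccube
spec:
  storageClassName: my-local-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi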
service.yml
We also need a service definition to expose the pods to the outside world. Here is a simple one using NodePort to expose the HTTP port of icCube.
apiVersion: v1
kind: Service
metadata:
  name: iccube
  namespace: iccube
spec:
  type: NodePort
  sessionAffinity: ClientIP
  selector:
    run: iccube
  ports:
    - port: 8282
      targetPort: 8282
      nodePort: 30000
      protocol: TCP
Check that the icCube pods are running
Deploy the files one by one in the following order:
kubectl apply -f namespace.yml
kubectl apply -f local-storage.yml
kubectl apply -f pv.yml
kubectl apply -f pv2.yml
kubectl apply -f pvc.yml
kubectl apply -f pvc2.yml
kubectl apply -f deployment.yml
kubectl apply -f service.yml
We can check whether the PVCs are bound properly:
kubectl get pvc -n iccube -o wide
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
iccube-pvc Bound my-volume 50Gi RWX my-local-storage 41h Filesystem
iccube-pvc2 Bound my-volume2 50Gi RWX my-local-storage 41h Filesystem
And if they are, the pods should be running:
kubectl get pods -n iccube -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iccube-59ffff668f-g68h6 1/1 Running 0 40h 10.233.121.70 miyagi <none> <none>
iccube-59ffff668f-rmsbv 1/1 Running 0 41h 10.233.76.247 iwate <none> <none>
Note that the pods can only be scheduled on the cluster nodes specified in the PV's nodeAffinity section.
The S3 bucket contains the iccube files:
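This can also be checked from the command line with the AWS CLI:
aws s3 ls s3://s3cubebucket --recursive | head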
And if you are on the same network, you can now access icCube via port 30000, as specified by the NodePort.
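A quick way to test it, replacing <node-ip> with the IP of any cluster node:
kubectl get svc -n iccube
curl -I http://<node-ip>:30000/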
Going further
There are a few more things you can check. Deleting the pods (or creating new ones) does not delete the data in the S3 bucket, so the data is properly persisted and propagated.
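For instance, a quick sketch (assuming the labels and bucket used above):
# delete the running pods; the deployment recreates them
kubectl delete pod -n iccube -l run=iccube
kubectl get pods -n iccube
# the data is still in the bucket
aws s3 ls s3://s3cubebucket --recursive | head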
The read/write speeds are also very decent.