Persistent storage

Whether you are deploying a database or an image gallery, sooner or later you will end up having to persist data across pods.

The usual way to add persistent storage to an application in Kubernetes is through a Persistent Volume. To get access to a Persistent Volume, you need to create a PersistentVolumeClaim. This can easily be done with a manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
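
To create the claim, apply the manifest to the cluster (the filename here is just an example; use whatever you saved the manifest as):

> kubectl apply -f myapp-pvc.yaml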

By creating this PersistentVolumeClaim, you implicitly also create a Persistent Volume that pods can use. Once deployed, run the following command to check whether your claim has been bound successfully:

> kubectl get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myapp-pvc   Bound    pvc-4e17de08-34a5-4177-9c96-90f799f0181e   500Mi      RWO            longhorn       10s
tip

pvc is short for persistentvolumeclaims. There are many more of these shorthands, like po, svc or deploy, which come in handy when punching away kubectl commands.
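
If you ever forget one, kubectl can print every resource together with its short name:

> kubectl api-resources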

Once deployed, Kubernetes will try to allocate space wherever it is available. This could be a dedicated storage cluster via Ceph, a managed cloud service like AWS EBS, or simply a local directory on the current node. Where the data ultimately resides is determined by the StorageClass of the volume claim.

In our case, the only storage class available is Longhorn, so you don't need to explicitly add the storageClassName field to the manifest of your PersistentVolumeClaim. Longhorn is a storage engine that maintains multiple replicas of your data across the cluster. This means that if the node holding your data were to go down, there is at least one, and most likely two, copies of that data on other nodes. Longhorn will automatically fail your volume over to one of those replicas to ensure that your data stays available and safe.
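
If your cluster ever offers more than one storage class, you can pin a claim to a specific one via the storageClassName field. A minimal sketch of the same claim, assuming Longhorn is the class you want:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-pvc
spec:
  storageClassName: longhorn # explicitly select the Longhorn storage class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi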

Let's look at an example of how you might use a persistent volume in a deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.1
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: db0
            - name: POSTGRES_USER
              value: admin
            - name: POSTGRES_PASSWORD
              value: admin123
          volumeMounts:
            - mountPath: "/var/lib/postgresql/data"
              subPath: pgdata # The data will be placed at "/pgdata" on the volume
              name: myapp-pgdata
      volumes:
        - name: myapp-pgdata
          persistentVolumeClaim:
            claimName: myapp-pvc
danger

In this example, we store the credentials to the database in the manifest itself for the sake of simplicity. In a production environment, this should be avoided at all costs! Take a look at the "Secrets" page (to be written...) to learn how to securely store sensitive data in a cluster.
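
As a teaser, the usual approach is to reference a Secret from the container's environment instead of hard-coding the value. A minimal sketch, assuming a Secret named postgres-credentials with a password key has been created beforehand:

env:
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials # hypothetical Secret, created separately
        key: password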

Here, we're creating a deployment of PostgreSQL. We first link the volume to the deployment using the volumes directive, and then mount that volume to a path inside a container. Postgres stores its data under /var/lib/postgresql/data by default, so that's where we want to mount our volume.

After applying the resources, you should see that a new pod has been created by the deployment:

> kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
postgres-5c7dfc7f67-qrns4   1/1     Running   0          83s
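
To convince yourself that the data lives on the volume and not in the pod, you can delete the pod and let the deployment replace it; the new pod mounts the same claim, so the database files survive. The pod name below is the one from the output above and will differ on your cluster:

> kubectl delete pod postgres-5c7dfc7f67-qrns4
> kubectl get pod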