On your provisioning machine
During the installation process, a cluster config directory is created
on your provisioning machine, located in the top-level sub-directory
clusters in your clone of the stackspin git repository. Although
these files are not essential for your Stackspin cluster to continue
functioning, you may want to back this folder up because it allows easy
access to your cluster.
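One way to back up this folder is to archive it regularly. The following is a minimal sketch, not part of Stackspin itself: the function name backup_cluster_config and both directory arguments are hypothetical, and it assumes tar and date are available on the provisioning machine.

```shell
# Sketch: archive the "clusters" directory of a local Stackspin clone.
# Both arguments are hypothetical locations; adjust them to your setup.
backup_cluster_config() {
    local repo_dir="$1"
    local dest_dir="$2"
    local archive
    if [ ! -d "$repo_dir/clusters" ]; then
        echo "no clusters directory found in $repo_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/stackspin-clusters-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the repository root.
    tar -czf "$archive" -C "$repo_dir" clusters || return 1
    # The folder gives access to the cluster, so restrict the archive to its owner.
    chmod 600 "$archive"
    echo "$archive"
}
```

It could be called as backup_cluster_config ~/stackspin ~/backups, where both paths are placeholders; the function prints the path of the archive it created.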
On your cluster
Stackspin supports using the program Velero to make backups of your Stackspin instance to external storage via the S3 API. See Backups with Velero (Optional) in the installation instructions for setup details.
For the maintenance operations described below, in particular restoring
backups, you need the velero client program installed, typically on your
provisioning machine, although you can also run it on the VPS if preferred.
You can download it from Velero’s GitHub release page.
By default Velero will make nightly backups of the entire cluster (minus Prometheus data). To make a manual backup, run
cluster$ velero create backup BACKUP_NAME --exclude-namespaces velero --wait
from your VPS. See
velero --help for other commands, and Velero’s
documentation for more information.
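If you make manual backups often, a small wrapper can generate unique backup names for you. This is only a sketch: the function name and the "manual-" prefix are hypothetical conventions, while the velero flags are the ones from the manual backup command above.

```shell
# Sketch: create a manual Velero backup with a timestamped name.
# The "manual-" prefix is a hypothetical naming convention, not a Stackspin one.
make_manual_backup() {
    local name
    name="manual-$(date +%Y%m%d-%H%M%S)"
    # Same flags as the manual backup command shown above; velero's own
    # output goes to stderr so that stdout carries only the backup name.
    velero create backup "$name" --exclude-namespaces velero --wait >&2 || return 1
    # Print the name so it can be reused later, for example in a restore.
    echo "$name"
}
```

Capturing the output, as in backup=$(make_manual_backup), keeps the generated name around for later commands.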
Note: in case you want to make an (additional) backup of application
data via alternate means, all persistent volume data of the cluster are
stored in directories under
For now, restoring from backups has to be done via the command line. We intend to make this possible from the Stackspin dashboard in the future.
These instructions explain how to restore the persistent data of an individual app (such as Nextcloud or Zulip) to a previous point in time, from a backup to S3-compatible storage made using Velero, on a Stackspin cluster that is in a healthy state. Backups can also be used to recover from more severe problems, like a broken or completely destroyed Stackspin cluster, by reinstalling the cluster from scratch and restoring individual app data on top of that. However, that procedure is less streamlined and is not documented here. If you are in that situation, please reach out to us for advice or assistance.
To show a list of available backups, perform the following command on your VPS:
$ kubectl get backup -A
Once you have chosen a backup to restore from, record its name as written in
Please be aware that for technical reasons the restore operation will restore not only the persistent data from this backup, but also the app’s software version that was running at that time. Although the auto-update mechanism should in turn update the app to a recent version, and the recent app version should be able to automatically perform any necessary data format migrations on the old data, this operation has not been tested for older backups, so please proceed carefully. As an example of what could go wrong, Nextcloud requires upgrades to be done in a serial fashion, never skipping a major version upgrade, so if your backup is from two or more major Nextcloud versions ago, some manual intervention is required. If you have any doubts, please reach out to us.
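Before restoring, it can also help to confirm that the chosen backup actually completed. The sketch below reads the phase from the Velero Backup resource; the function name is hypothetical, and it assumes Velero is installed in the velero namespace, so adjust the namespace if your setup differs.

```shell
# Sketch: succeed only if the named Velero backup reports phase "Completed".
# Assumes Velero runs in the "velero" namespace.
backup_is_completed() {
    local backup="$1"
    local phase
    phase=$(kubectl get backups.velero.io -n velero "$backup" \
        -o jsonpath='{.status.phase}') || return 1
    echo "backup $backup is in phase: ${phase:-unknown}" >&2
    [ "$phase" = "Completed" ]
}
```

A backup in another phase, such as PartiallyFailed, may still contain the data you need, so treat a failure of this check as a reason to inspect the backup rather than as a final verdict.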
Restore app data
Please note that restoring data is a destructive operation! It will replace the app’s data as they are now. There is no way to undo a restore operation, unless you have a copy of the current app data, in the form of a current Stackspin backup or an app-specific data export. For that reason, we recommend making another backup right before beginning a restore operation.
To restore the data of app $app (for restoring the dashboard, see the note
at the end of this subsection) from the backup named $backup, perform the
following commands:
$ flux suspend kustomization $app
$ flux suspend helmrelease -n stackspin-apps $app
$ kubectl delete all -n stackspin-apps -l stackspin.net/backupSet=$app
$ kubectl delete secret -n stackspin-apps -l stackspin.net/backupSet=$app
$ kubectl delete configmap -n stackspin-apps -l stackspin.net/backupSet=$app
$ kubectl delete pvc -n stackspin-apps -l stackspin.net/backupSet=$app
$ velero restore create arbitrary-name-of-restore-operation --from-backup=$backup -l stackspin.net/backupSet=$app
At this point, wait for the restore operation to finish before continuing; see the text below.
$ flux resume helmrelease -n stackspin-apps $app
$ flux resume kustomization $app
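The restore steps above can be sketched as a single function that also waits for the restore to finish before resuming flux. This is an illustration under assumptions, not an official Stackspin tool: the function name, the restore naming scheme, and the POLL_INTERVAL and MAX_POLLS variables are hypothetical, and polling the Restore resource assumes Velero runs in the velero namespace. It does not cover the dashboard, which uses different kustomization and helmrelease names.

```shell
# Sketch: the restore steps above wrapped in one function.
# Usage (hypothetical): restore_app APP BACKUP
restore_app() {
    local app="$1" backup="$2"
    local selector="stackspin.net/backupSet=$app"
    local restore kind phase tries
    restore="restore-$app-$(date +%Y%m%d%H%M%S)"

    flux suspend kustomization "$app" || return 1
    flux suspend helmrelease -n stackspin-apps "$app" || return 1
    for kind in all secret configmap pvc; do
        kubectl delete "$kind" -n stackspin-apps -l "$selector" || return 1
    done
    velero restore create "$restore" --from-backup="$backup" -l "$selector" || return 1

    # Poll the Restore resource until it reaches a final phase or we give up.
    tries=0
    while [ "$tries" -lt "${MAX_POLLS:-360}" ]; do
        phase=$(kubectl get restores.velero.io -n velero "$restore" \
            -o jsonpath='{.status.phase}')
        case "$phase" in
            Completed|PartiallyFailed|Failed|FailedValidation) break ;;
        esac
        tries=$((tries + 1))
        sleep "${POLL_INTERVAL:-10}"
    done
    echo "restore $restore ended in phase: ${phase:-unknown}"
    # On anything but "Completed", leave flux suspended for manual inspection.
    [ "$phase" = "Completed" ] || return 1

    flux resume helmrelease -n stackspin-apps "$app" || return 1
    flux resume kustomization "$app"
}
```

Because restoring is destructive, consider making a fresh backup first and running the individual commands by hand at least once before relying on a wrapper like this.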
Specifically for Nextcloud, the kubectl delete pvc ... command might hang due
to a Kubernetes job that references that PVC. To solve that, look for such jobs
with kubectl get job -n stackspin-apps and delete any finished ones using
kubectl delete job .... That should let the kubectl delete pvc ... command
finish; if it was already terminated, run it again.
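Cleaning up finished jobs can be scripted. The sketch below deletes every job in the stackspin-apps namespace that reports at least one successful completion, which is broader than just the jobs referencing the PVC, so review the job list first. The function name is hypothetical.

```shell
# Sketch: delete jobs in stackspin-apps that have completed successfully.
delete_finished_jobs() {
    local job
    # Print "name succeeded-count" per job; jobs without a count show "<none>",
    # which awk treats as 0 in the numeric comparison.
    kubectl get job -n stackspin-apps --no-headers \
        -o custom-columns=NAME:.metadata.name,SUCCEEDED:.status.succeeded |
        awk '$2 + 0 >= 1 { print $1 }' |
        while read -r job; do
            kubectl delete job -n stackspin-apps "$job" </dev/null
        done
}
```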
The velero restore create ... command initiates the restore operation, but
it doesn’t wait until the operation is complete. You may use the commands
suggested in the terminal output to check on the status of the operation.
Additionally, once the restore operation is finished, it may take some more
time for the various app components to be fully started and for the app to be
available again.
To restore the “dashboard” data, which contains among other things the set
of Stackspin users, follow the instructions above, using
$app, except that the kustomization to suspend and resume is the
single-sign-on one, and the helmrelease to suspend and resume is the
single-sign-on-database one in the
Change the IP of your cluster
In case your cluster needs to migrate to another IP, make sure to update
the IP address in
/etc/rancher/k3s/k3s.yaml and, if applicable, your
local kube config and inventory.yml in the cluster directory
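Updating the address in several files by hand is error-prone, so a small helper may be useful. This is a sketch only: the function name is hypothetical and the addresses in the usage note are placeholders from the documentation ranges. It keeps a .bak copy of each file it changes.

```shell
# Sketch: replace one IPv4 address with another in a config file,
# keeping a copy of the original next to it with a ".bak" suffix.
replace_ip() {
    local old="$1" new="$2" file="$3"
    local old_re expr
    # Escape the dots so they match literally instead of any character.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Require a non-address character (or a line boundary) on both sides, so
    # that 192.0.2.10 does not also match inside 192.0.2.100.
    expr="s/(^|[^0-9.])${old_re}([^0-9.]|\$)/\\1${new}\\2/g"
    cp -p "$file" "$file.bak" || return 1
    # Applied twice so that two occurrences separated by a single character
    # are both replaced.
    sed -E -e "$expr" -e "$expr" "$file.bak" > "$file"
}
```

For example, replace_ip 192.0.2.10 203.0.113.7 /etc/rancher/k3s/k3s.yaml would be run on the VPS, typically with root privileges, and the same helper can be applied to your local files.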
Delete evicted pods
If your cluster disk is full, Kubernetes taints the node with
DiskPressure. It then tries to evict pods, which is pointless in a
single-node setup but can still happen. We have seen hundreds of pods in
evicted state that still showed up after DiskPressure had recovered. See
also the out of resource handling with kubelet documentation.
You can delete all evicted pods with this command:
$ kubectl get pods --all-namespaces -ojson | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' | xargs -n2 -l bash -c 'kubectl delete pods $0 --namespace=$1'
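If you would rather preview what is going to be deleted, or do not have jq installed, the same cleanup can be sketched with awk. The function names are hypothetical, and the sketch relies on the STATUS column of kubectl get pods showing Evicted for such pods.

```shell
# Sketch: list evicted pods as "namespace name" pairs, relying on the STATUS
# column (4th field) of "kubectl get pods" showing "Evicted" for such pods.
list_evicted_pods() {
    kubectl get pods --all-namespaces --no-headers |
        awk '$4 == "Evicted" { print $1, $2 }'
}

# Delete every pod reported by list_evicted_pods.
delete_evicted_pods() {
    local namespace name
    list_evicted_pods | while read -r namespace name; do
        kubectl delete pod "$name" --namespace="$namespace" </dev/null
    done
}
```

Running list_evicted_pods first shows the affected pods without changing anything; delete_evicted_pods then removes them one by one.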
Nextcloud includes a CLI tool called
occ (“OwnCloud Console”).
This tool can be used for all kinds of tasks
you might want to do as a system administrator.
To use the tool, you need to enter Nextcloud’s “pod” and change to the correct user. The following commands achieve that:
kubectl exec opens a root shell inside the pod:
$ kubectl -n stackspin-apps exec deploy/nc-nextcloud -it -- bash
Change to the www-data user:
$ su -s /bin/bash www-data
$ php occ list