The volume group snapshot feature, introduced as Alpha in Kubernetes 1.27, has reached Beta in Kubernetes 1.32. It enables crash-consistent snapshots of multiple volumes through the group snapshot extension APIs, with Kubernetes using a label selector to group PersistentVolumeClaims (PVCs) for snapshotting. The primary goal is to support workload recovery: a collection of snapshots taken at one crash-consistent recovery point can be restored to new volumes.
This feature is supported exclusively for CSI volume drivers.
Understanding Volume Group Snapshots
Certain storage systems can create crash-consistent snapshots of multiple volumes simultaneously. These “group snapshots” ensure that all volumes are captured at the same point in time. Group snapshots can either populate new volumes with the snapshot data or restore existing volumes to a previous state.
Reasons for Implementing Volume Group Snapshots
The ability to take consistent group snapshots benefits applications that span multiple volumes, because all components are captured at the same point in time. Applications can be quiesced manually before taking individual snapshots, but this can be time-consuming or impractical in some scenarios. Users may therefore combine the two approaches: less frequent backups taken with the application quiesced, and more frequent backups that rely on crash-consistent group snapshots without quiescing.
Kubernetes APIs for Volume Group Snapshots
Kubernetes manages volume group snapshots using three API resources:
- VolumeGroupSnapshot: A user-defined object that requests a snapshot for multiple PVCs. It includes metadata like the creation timestamp and readiness status.
- VolumeGroupSnapshotContent: Created automatically by the snapshot controller for a dynamically provisioned group snapshot, storing details such as the group snapshot ID. Each instance binds one-to-one to its VolumeGroupSnapshot.
- VolumeGroupSnapshotClass: Defined by administrators to specify how group snapshots should be created, including driver information and deletion policies.
These APIs are defined as CustomResourceDefinitions (CRDs), which must be installed in the Kubernetes cluster for a CSI driver to support volume group snapshots.
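To make the VolumeGroupSnapshotClass fields concrete, the following Python sketch builds one as a plain dictionary and prints it as JSON, which kubectl apply -f also accepts. It assumes the Beta API version groupsnapshot.storage.k8s.io/v1beta1 with top-level driver and deletionPolicy fields; the class and driver names are hypothetical placeholders, so check the CRDs installed in your cluster before relying on it.

```python
import json

# Assumption: Beta API group/version of the group snapshot CRDs in Kubernetes 1.32.
GROUP_SNAPSHOT_API = "groupsnapshot.storage.k8s.io/v1beta1"


def group_snapshot_class(name: str, driver: str, deletion_policy: str = "Delete") -> dict:
    """Build a VolumeGroupSnapshotClass manifest as a plain dict (field names assumed)."""
    if deletion_policy not in ("Delete", "Retain"):
        raise ValueError(f"unsupported deletionPolicy: {deletion_policy!r}")
    return {
        "apiVersion": GROUP_SNAPSHOT_API,
        "kind": "VolumeGroupSnapshotClass",
        "metadata": {"name": name},
        # As with VolumeSnapshotClass, driver and deletionPolicy sit at the top level (no spec).
        "driver": driver,
        # Delete: removing the group snapshot also removes its content and the storage-side
        # snapshot; Retain keeps them.
        "deletionPolicy": deletion_policy,
    }


if __name__ == "__main__":
    # Hypothetical class and driver names; use your CSI driver's registered name.
    print(json.dumps(group_snapshot_class("csi-groupsnapclass", "example.csi.k8s.io"), indent=2))
```

For example, saving this as make_class.py and running python make_class.py | kubectl apply -f - would create the class in a cluster where the CRDs are installed.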
Components Supporting Volume Group Snapshots
The implementation of volume group snapshots, part of the external-snapshotter repository, involved updates to multiple components:
- New CRDs for VolumeGroupSnapshot and related APIs.
- Snapshot controller logic enhancements.
- CSI call logic integrated into the snapshotter sidecar controller.
The volume snapshot controller and CRDs are deployed once per cluster, while the snapshotter sidecar is deployed alongside each CSI driver. Kubernetes encourages distributors to ship the snapshot controller and CRDs as a default add-on in their cluster management process.
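Since the controller and CRDs are cluster-level pieces, a simple pre-flight check is to confirm that the CRDs exist. The sketch below takes the JSON printed by kubectl get crd -o json and reports which snapshot-related CRDs are missing. The group snapshot CRD names are assumed from the Beta API group; the regular VolumeSnapshot CRDs are included because a group snapshot also produces individual volume snapshots.

```python
import json

# CRD object names follow <plural>.<group>.
REQUIRED_CRDS = {
    # Group snapshot CRDs (group name assumed from the Beta API).
    "volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io",
    "volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io",
    "volumegroupsnapshots.groupsnapshot.storage.k8s.io",
    # Regular snapshot CRDs: a group snapshot also produces individual VolumeSnapshot objects.
    "volumesnapshotclasses.snapshot.storage.k8s.io",
    "volumesnapshotcontents.snapshot.storage.k8s.io",
    "volumesnapshots.snapshot.storage.k8s.io",
}


def missing_snapshot_crds(crd_list_json: str) -> list[str]:
    """Return the required CRDs absent from `kubectl get crd -o json` output, sorted."""
    data = json.loads(crd_list_json)
    installed = {item["metadata"]["name"] for item in data.get("items", [])}
    return sorted(REQUIRED_CRDS - installed)


if __name__ == "__main__":
    # Demo input shaped like a trimmed `kubectl get crd -o json` List.
    sample = json.dumps({"kind": "List", "items": [
        {"metadata": {"name": "volumesnapshots.snapshot.storage.k8s.io"}},
    ]})
    for name in missing_snapshot_crds(sample):
        print("missing CRD:", name)
```

In a real check, the input could come from subprocess.run(["kubectl", "get", "crd", "-o", "json"], capture_output=True, text=True, check=True).stdout.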
Beta-Stage Improvements
- CSI Specification Update: VolumeGroupSnapshot support reached General Availability (GA) in CSI spec v1.11.0.
- Validation Webhook Removal: The snapshot validation webhook, deprecated in external-snapshotter v8.0.0, has been removed. Most of its rules were moved into the CRDs as validation rules, which require Kubernetes v1.25 or newer. Some rules, such as preventing multiple default snapshot classes for the same driver, could not be moved into the CRDs; violating them still triggers an error when a snapshot is dynamically provisioned.
- Feature Gate Introduction: The --enable-volumegroup-snapshot flag was replaced with a feature gate, enabled by passing --feature-gates=CSIVolumeGroupSnapshot=true to the snapshot-controller and the CSI snapshotter sidecar. The feature is disabled by default.
- RBAC Rule Updates: Responsibility for creating the individual snapshot objects of a dynamically provisioned group snapshot moved from the CSI snapshotter sidecar to the common snapshot controller, and the required RBAC rules were updated accordingly.
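Because the rule against multiple default classes per driver is no longer enforced when classes are created, an operator may want to detect the condition before provisioning fails. This hypothetical check scans VolumeGroupSnapshotClass objects, such as the items returned by kubectl get volumegroupsnapshotclasses -o json. The default-class annotation key is an assumption modeled on the VolumeSnapshotClass convention; confirm it for the external-snapshotter release you run.

```python
from collections import Counter

# Assumption: annotation marking a default VolumeGroupSnapshotClass, modeled on
# snapshot.storage.kubernetes.io/is-default-class; verify for your external-snapshotter release.
DEFAULT_CLASS_ANNOTATION = "groupsnapshot.storage.kubernetes.io/is-default-class"


def is_default(cls: dict) -> bool:
    """True if the class carries the (assumed) default-class annotation set to "true"."""
    annotations = cls.get("metadata", {}).get("annotations") or {}
    return annotations.get(DEFAULT_CLASS_ANNOTATION) == "true"


def drivers_with_multiple_defaults(classes: list[dict]) -> list[str]:
    """Return CSI drivers that have more than one default group snapshot class."""
    counts = Counter(cls["driver"] for cls in classes if is_default(cls))
    return sorted(driver for driver, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical classes: two defaults for the same driver are flagged.
    annotated = {"annotations": {DEFAULT_CLASS_ANNOTATION: "true"}}
    classes = [
        {"metadata": {"name": "gold", **annotated}, "driver": "example.csi.k8s.io"},
        {"metadata": {"name": "silver", **annotated}, "driver": "example.csi.k8s.io"},
        {"metadata": {"name": "other"}, "driver": "other.csi.k8s.io"},
    ]
    print(drivers_with_multiple_defaults(classes))  # ['example.csi.k8s.io']
```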
Using Kubernetes Volume Group Snapshots
Creating a New Group Snapshot
To create a group snapshot:
- Define a VolumeGroupSnapshotClass specifying the CSI driver and provisioning rules.
- Create a VolumeGroupSnapshot, which either dynamically provisions the group snapshot or references a pre-existing VolumeGroupSnapshotContent.
For dynamic provisioning, use a label selector to choose the PVCs in the group. Creating the VolumeGroupSnapshot also generates an individual volume snapshot for each selected PVC and a VolumeGroupSnapshotContent that references the group snapshot on the underlying storage.
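The selector-based flow can be sketched in Python as follows. selects mirrors standard Kubernetes matchLabels semantics (every listed key/value pair must appear on the PVC), and group_snapshot builds the VolumeGroupSnapshot manifest. The v1beta1 field names (volumeGroupSnapshotClassName, source.selector) are assumptions based on the Beta API, and the PVC, label, and class names are hypothetical.

```python
import json

GROUP_SNAPSHOT_API = "groupsnapshot.storage.k8s.io/v1beta1"  # assumed Beta API version


def selects(match_labels: dict[str, str], pvc_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every key/value pair must be present on the PVC."""
    return all(pvc_labels.get(key) == value for key, value in match_labels.items())


def group_snapshot(name: str, namespace: str, class_name: str,
                   match_labels: dict[str, str]) -> dict:
    """Build a dynamically provisioned VolumeGroupSnapshot that selects PVCs by label."""
    return {
        "apiVersion": GROUP_SNAPSHOT_API,
        "kind": "VolumeGroupSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeGroupSnapshotClassName": class_name,
            # Copy the labels so later changes to the caller's dict do not leak in.
            "source": {"selector": {"matchLabels": dict(match_labels)}},
        },
    }


if __name__ == "__main__":
    # Hypothetical PVCs: only those labeled app=my-db join the group.
    pvcs = {"data": {"app": "my-db"}, "wal": {"app": "my-db", "role": "wal"}, "cache": {"app": "web"}}
    labels = {"app": "my-db"}
    print([name for name, pvc_labels in pvcs.items() if selects(labels, pvc_labels)])  # ['data', 'wal']
    print(json.dumps(group_snapshot("db-group-snap", "demo", "csi-groupsnapclass", labels), indent=2))
```

Labeling every PVC of an application with a shared key before creating the VolumeGroupSnapshot is what lets a single selector capture the whole application.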
Importing an Existing Group Snapshot
To import an existing group snapshot, manually create the following:
- VolumeSnapshotContent objects for each individual snapshot.
- A VolumeGroupSnapshotContent that references the group snapshot handle and the individual snapshot handles.
- A VolumeGroupSnapshot that references the VolumeGroupSnapshotContent.
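The three import steps can be sketched as a Python function that emits all required objects as one kubectl-compatible v1 List. The v1beta1 field names for the group objects (groupSnapshotHandles, volumeGroupSnapshotHandle, volumeSnapshotHandles, volumeGroupSnapshotRef, volumeGroupSnapshotContentName) are assumptions to verify against your installed CRDs, and every handle and object name is hypothetical. Depending on the external-snapshotter version, you may also need to create the VolumeSnapshot objects that the individual contents reference.

```python
import json

SNAPSHOT_API = "snapshot.storage.k8s.io/v1"
GROUP_SNAPSHOT_API = "groupsnapshot.storage.k8s.io/v1beta1"  # assumed Beta API version


def import_manifests(driver: str, namespace: str, group_name: str, group_handle: str,
                     snapshot_handles: list[str], deletion_policy: str = "Retain") -> list[dict]:
    """Build the objects that import a pre-existing group snapshot (field names assumed)."""
    content_name = f"{group_name}-content"
    objects = []
    # 1. One VolumeSnapshotContent per individual snapshot handle on the storage system.
    for i, handle in enumerate(snapshot_handles):
        objects.append({
            "apiVersion": SNAPSHOT_API,
            "kind": "VolumeSnapshotContent",
            "metadata": {"name": f"{group_name}-{i}-content"},
            "spec": {
                "deletionPolicy": deletion_policy,
                "driver": driver,
                "source": {"snapshotHandle": handle},
                # Hypothetical VolumeSnapshot name this content is expected to bind to.
                "volumeSnapshotRef": {"name": f"{group_name}-{i}", "namespace": namespace},
            },
        })
    # 2. A VolumeGroupSnapshotContent listing the group handle and the individual handles.
    objects.append({
        "apiVersion": GROUP_SNAPSHOT_API,
        "kind": "VolumeGroupSnapshotContent",
        "metadata": {"name": content_name},
        "spec": {
            "deletionPolicy": deletion_policy,
            "driver": driver,
            "source": {
                "groupSnapshotHandles": {
                    "volumeGroupSnapshotHandle": group_handle,
                    "volumeSnapshotHandles": list(snapshot_handles),
                }
            },
            "volumeGroupSnapshotRef": {"name": group_name, "namespace": namespace},
        },
    })
    # 3. The VolumeGroupSnapshot that binds to the pre-created content by name.
    objects.append({
        "apiVersion": GROUP_SNAPSHOT_API,
        "kind": "VolumeGroupSnapshot",
        "metadata": {"name": group_name, "namespace": namespace},
        "spec": {"source": {"volumeGroupSnapshotContentName": content_name}},
    })
    return objects


if __name__ == "__main__":
    # Hypothetical handles; real values come from the storage system.
    objs = import_manifests("example.csi.k8s.io", "demo", "imported-group",
                            "group-handle-123", ["snap-handle-a", "snap-handle-b"])
    # kubectl apply -f - accepts a v1 List wrapping several objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objs}, indent=2))
```

The sketch defaults to the Retain deletion policy so that deleting the imported Kubernetes objects does not delete snapshots that were created outside Kubernetes; switch to Delete if Kubernetes should own their lifecycle.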
Restoring from a Group Snapshot
To restore, create a new PVC from each individual volume snapshot in the group, using the VolumeSnapshot as the PVC's data source. Repeat this for every snapshot in the group to restore the full application state.
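A restore can be scripted by generating one PVC per individual VolumeSnapshot. The sketch below uses the standard PVC dataSource field with kind VolumeSnapshot and apiGroup snapshot.storage.k8s.io, creating each PVC in the same namespace as its snapshot. Snapshot names, sizes, and the storage class are hypothetical inputs, and each requested size should be at least the snapshot's restore size.

```python
import json


def restore_pvcs(namespace: str, snapshot_sizes: dict[str, str], storage_class: str,
                 access_mode: str = "ReadWriteOnce") -> list[dict]:
    """Build one PVC per individual VolumeSnapshot, each using the snapshot as its data source."""
    pvcs = []
    for snapshot_name, size in snapshot_sizes.items():
        pvcs.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            # Same namespace as the VolumeSnapshot, which dataSource requires.
            "metadata": {"name": f"{snapshot_name}-restored", "namespace": namespace},
            "spec": {
                "storageClassName": storage_class,
                "accessModes": [access_mode],
                # Request at least the snapshot's restore size.
                "resources": {"requests": {"storage": size}},
                "dataSource": {
                    "apiGroup": "snapshot.storage.k8s.io",
                    "kind": "VolumeSnapshot",
                    "name": snapshot_name,
                },
            },
        })
    return pvcs


if __name__ == "__main__":
    # Hypothetical snapshot names and sizes from one group snapshot.
    pvcs = restore_pvcs("demo", {"db-data-snap": "10Gi", "db-wal-snap": "2Gi"}, "standard")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": pvcs}, indent=2))
```

Taking sizes per snapshot, rather than one size for the whole group, accommodates applications whose volumes differ in capacity.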
Supporting Volume Group Snapshots in a CSI Driver
To implement support, CSI drivers must:
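- Introduce a new group controller service.
- Implement RPCs for creating, deleting, and retrieving group snapshots (CreateVolumeGroupSnapshot, DeleteVolumeGroupSnapshot, and GetVolumeGroupSnapshot).
- Report the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability through GroupControllerGetCapabilities.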
The snapshot controller and CRDs are bundled with the cluster independently of any CSI driver, while each driver is deployed with the csi-snapshotter sidecar from the external-snapshotter repository. The sidecar watches the Kubernetes API server and triggers the CSI group snapshot operations.
Limitations and Future Plans
Current limitations include:
- No support for reverting existing PVCs to earlier states (only creating new volumes).
- No application consistency guarantees beyond those the storage system provides (e.g., crash consistency).
The project plans to gather feedback and drive adoption in upcoming releases, with the goal of moving the feature to General Availability (GA).