Build with Kubernetes Operators¶
Use this guide when your product is already managed by a Kubernetes Operator and you want Omnistrate to turn it into a customer-facing SaaS Product.
Operators are a good fit when your application lifecycle is already expressed as Kubernetes custom resources: database clusters, message queues, streaming platforms, storage systems, AI platforms, or any product where an operator reconciles the desired state. Omnistrate keeps that operator model intact and adds the control plane around it: cloud account onboarding, tenant management, APIs, Customer Portal, subscriptions, deployment cells, networking, backups, restores, upgrades, observability, and fleet operations.
For a complete working example, use the operator spec template. It builds a CloudNativePG-based PostgreSQL service plan with create, modify, start, stop, scale, backup, restore, and delete-backup system workflows.
What You Define¶
An operator-backed service plan usually has four layers:
| Layer | Where it is defined | Purpose |
|---|---|---|
| Customer inputs | apiParameters | Values your customers provide, such as instance type, replica count, storage size, database name, or backup bucket. |
| Operator installation | operatorCRDConfiguration.helmChartDependencies or deployment-cell amenities | Installs the operator and required CRDs before tenant resources are reconciled. |
| Runtime lifecycle | systemWorkflows | Uses Argo Workflow-style DAGs to create, update, delete, start, stop, scale, back up, restore, and delete backups. |
| Provider operations | Optional customWorkflows | Exposes provider-defined actions that are not platform lifecycle APIs, such as compact, repair, diagnostics, or product-specific administrative tasks. |
Note
operatorCRDConfiguration.template, operatorCRDConfiguration.supplementalFiles, and operatorCRDConfiguration.readinessConditions are deprecated for operator lifecycle management. Prefer systemWorkflows because they let you model multi-step lifecycle operations, capture outputs, run backups and restores, and use Kubernetes resource success and failure conditions with the same workflow engine. Keep operatorCRDConfiguration.helmChartDependencies when the service plan should install the operator and CRDs.
Minimal Service Plan Shape¶
The operator spec template starts with a normal service plan:
name: Postgres Operator
deployment:
  hostedDeployment:
    awsAccountId: "<AWS_ACCOUNT_ID>"
    awsBootstrapRoleAccountArn: "arn:aws:iam::<AWS_ACCOUNT_ID>:role/omnistrate-bootstrap-role"
tenancyType: CUSTOM_TENANCY
services:
  - name: CNPG
    compute:
      instanceTypes:
        - apiParam: instanceType
          cloudProvider: aws
    apiParameters:
      - key: instanceType
        name: Instance Type
        type: String
        modifiable: true
        defaultValue: "t3.medium"
      - key: numberOfInstances
        name: Total Number of Instances
        type: Float64
        modifiable: true
        defaultValue: "1"
      - key: storageSize
        name: Storage Size
        type: String
        modifiable: true
        defaultValue: "20Gi"
Add endpoint configuration for customer-facing connection details:
endpointConfiguration:
  writer:
    host: "$sys.network.externalClusterEndpoint"
    ports:
      - 5432
    primary: true
    networkingType: PUBLIC
  reader:
    host: "reader-{{ $sys.network.externalClusterEndpoint }}"
    ports:
      - 5432
    primary: false
    networkingType: PUBLIC
Install the operator as a Helm dependency when the operator version should be tied to the service plan version:
operatorCRDConfiguration:
  helmChartDependencies:
    - chartName: cloudnative-pg
      chartVersion: 0.28.2
      chartRepoName: cnpg
      chartRepoURL: https://cloudnative-pg.github.io/charts
    - chartName: plugin-barman-cloud
      chartVersion: 0.6.0
      chartRepoName: cnpg
      chartRepoURL: https://cloudnative-pg.github.io/charts
Note
The current operator template intentionally omits the deprecated operatorCRDConfiguration.template, supplementalFiles, and readinessConditions fields. Lifecycle resources, readiness, and outputs are modeled in systemWorkflows instead.
If the operator is shared by many tenant instances in the same deployment cell, install it as a deployment-cell amenity instead. That lets you upgrade the operator once per cluster instead of coupling it to every tenant instance upgrade.
Backup Capability¶
Backup policy belongs to the resource capability, not the workflow definition:
capabilities:
  backupConfiguration:
    backupRetentionInDays: 1
    backupPeriodInHours: 1
    snapshotBeforeDeletion: true
Omnistrate uses this configuration to schedule automated backups, set expiration for automated backups, run backup-before-delete when enabled, and trigger delete-backup cleanup for expired snapshots. Manual snapshots are user-controlled and are not expired by this schedule.
System Workflows¶
systemWorkflows are lifecycle hooks invoked by Omnistrate's existing platform APIs. They use an Argo Workflow-style structure: entrypoint, arguments.parameters, templates, DAG tasks, and Kubernetes resource templates.
The operator template uses the following system workflows:
| Workflow | Trigger | Example behavior |
|---|---|---|
| create | Instance provisioning | Create secrets, create backup object-store configuration, apply the CNPG Cluster, and wait for ready instances. |
| modify | Instance update or upgrade | Reapply the Cluster with updated inputs such as storage or replica count. |
| delete | Instance delete | Delete the Cluster, secrets, and backup object-store resources. |
| start | Start API | Patch the CNPG hibernation annotation to off. |
| stop | Stop API | Patch the CNPG hibernation annotation to on. |
| backup | Manual backup, periodic backup, backup-before-delete | Rehydrate the cluster if needed, create a CNPG Backup CR, and persist backup metadata. |
| restore | Restore API | Create a new target cluster from selected snapshot metadata. |
| deleteBackup | Manual snapshot delete and retention cleanup | Delete the operator backup CR or external backup marker. |
At minimum, operator-backed service plans should define create, modify, and delete so Omnistrate can provision, update, and remove the managed resource through standard lifecycle APIs. Add the other system workflows only for the operations your service plan supports.
Create workflow example¶
This simplified example shows the shape of a create workflow. The full template includes secrets, backup object stores, load balancer annotations, affinity, and output parameters.
systemWorkflows:
  create:
    outputParameters:
      postgresContainerImage: "$tasks.applycluster.resource.status.image"
      status: "$tasks.applycluster.resource.status.phase"
      topology: "$tasks.applycluster.resource.status.topology"
    workflow:
      entrypoint: create
      arguments:
        parameters:
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
          - name: numberOfInstances
            value: "{{ $var.numberOfInstances }}"
          - name: storageSize
            value: "{{ $var.storageSize }}"
      templates:
        - name: create
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
              - name: numberOfInstances
              - name: storageSize
          dag:
            tasks:
              - name: applycluster
                template: apply-cluster
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
                    - name: numberOfInstances
                      value: "{{inputs.parameters.numberOfInstances}}"
                    - name: storageSize
                      value: "{{inputs.parameters.storageSize}}"
        - name: apply-cluster
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
              - name: numberOfInstances
              - name: storageSize
          resource:
            action: apply
            successCondition: status.instances == {{inputs.parameters.numberOfInstances}} && status.readyInstances == {{inputs.parameters.numberOfInstances}}
            failureCondition: status.phase == failed
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Cluster
              metadata:
                name: "{{inputs.parameters.instanceId}}"
                namespace: "{{inputs.parameters.namespace}}"
              spec:
                instances: {{inputs.parameters.numberOfInstances}}
                storage:
                  size: "{{inputs.parameters.storageSize}}"
                  storageClass: gp3
The resource.action can be apply, patch, or delete. successCondition and failureCondition are evaluated against the live Kubernetes resource status. Output parameters can reference completed task resources through $tasks.<taskName>.resource.*.
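The start and stop workflows described in the table above patch an annotation instead of reapplying the whole Cluster. The following is a minimal sketch of a stop workflow, not the template's actual definition. It assumes that the workflow engine honors Argo's mergeStrategy field for patch actions and that the Cluster is named after the instance ID, as in the create example. The task and template names hibernatecluster and hibernate-cluster are hypothetical; cnpg.io/hibernation is CloudNativePG's declarative hibernation annotation.

```yaml
systemWorkflows:
  stop:
    workflow:
      entrypoint: stop
      arguments:
        parameters:
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
      templates:
        - name: stop
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          dag:
            tasks:
              # Hypothetical task name; a single task is enough for a one-step patch.
              - name: hibernatecluster
                template: hibernate-cluster
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
        - name: hibernate-cluster
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          resource:
            action: patch
            # Assumption: mergeStrategy is supported as in Argo. Custom resources
            # need a merge (or json) patch; strategic merge does not apply to them.
            mergeStrategy: merge
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Cluster
              metadata:
                name: "{{inputs.parameters.instanceId}}"
                namespace: "{{inputs.parameters.namespace}}"
                annotations:
                  cnpg.io/hibernation: "on"
```

A start workflow would have the same shape with the annotation set to "off". A delete workflow would use action: delete with a manifest that identifies the resource to remove.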
Backup and Restore Workflow Context¶
Backup workflows receive Omnistrate snapshot context:
systemWorkflows:
  backup:
    outputParameters:
      backupId: "$tasks.createBackupCR.resource.status.backupId"
      backupName: "$tasks.createBackupCR.resource.status.backupName"
    workflow:
      arguments:
        parameters:
          - name: snapshotId
            value: "{{ $sys.snapshot.id }}"
          - name: snapshotTime
            value: "{{ $sys.snapshot.time }}"
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
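The createBackupCR task referenced by outputParameters could be backed by a resource template along these lines. This is a sketch under stated assumptions rather than the template's real definition: the template name create-backup-cr is hypothetical, using the snapshot ID as the Backup object name assumes the ID is a valid Kubernetes object name, and the method and pluginConfiguration fields assume backups go through the Barman Cloud plugin installed as a Helm dependency.

```yaml
# Fragment of systemWorkflows.backup.workflow.templates (other templates omitted)
- name: create-backup-cr
  inputs:
    parameters:
      - name: namespace
      - name: instanceId
      - name: snapshotId
  resource:
    action: apply
    # CNPG sets the Backup phase to "completed" or "failed" when the backup ends.
    successCondition: status.phase == completed
    failureCondition: status.phase == failed
    manifest: |
      apiVersion: postgresql.cnpg.io/v1
      kind: Backup
      metadata:
        name: "{{inputs.parameters.snapshotId}}"
        namespace: "{{inputs.parameters.namespace}}"
      spec:
        cluster:
          name: "{{inputs.parameters.instanceId}}"
        method: plugin
        pluginConfiguration:
          name: barman-cloud.cloudnative-pg.io
```

Once the task succeeds, status.backupId and status.backupName on the Backup object are available to outputParameters through $tasks.createBackupCR.resource.status.*.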
Values rendered from backup.outputParameters are stored as snapshot metadata. Restore workflows can use that metadata:
systemWorkflows:
  restore:
    workflow:
      arguments:
        parameters:
          - name: restoreSnapshotId
            value: "{{ $sys.restore.snapshotId }}"
          - name: restoreSnapshotTime
            value: "{{ $sys.restore.snapshotTime }}"
          - name: backupId
            value: "{{ $sys.restore.metadata.backupId }}"
          - name: backupName
            value: "{{ $sys.restore.metadata.backupName }}"
          - name: sourceInstanceId
            value: "{{ $sys.sourceInstanceId }}"
          - name: newClusterName
            value: "{{ $sys.targetInstanceId }}"
This keeps provider backup identifiers out of the customer restore form. Customers select an Omnistrate snapshot; the restore workflow receives the metadata captured during the original backup.
Custom Workflows¶
Use customWorkflows for provider-defined operations that are not platform lifecycle APIs. The operator template does not include a custom workflow by default; it focuses on the standard lifecycle operations implemented as systemWorkflows.
Add custom workflows only when your product needs an additional operation beyond create, modify, delete, start, stop, scale, backup, restore, or delete-backup. Custom workflows appear in supportedOperations for the instance and can be invoked from the UI, API, or omnistrate-ctl operation commands.
Argo Workflow Syntax¶
The workflow body follows the Argo Workflow model for DAGs and Kubernetes resource templates. The most important concepts are:
- entrypoint selects the template to run first.
- arguments.parameters binds Omnistrate variables to workflow inputs.
- templates[].dag.tasks declares the task graph and dependencies.
- templates[].resource applies, patches, or deletes Kubernetes resources.
- successCondition and failureCondition define how Omnistrate decides whether a resource task completed.
For Argo syntax details, see the Argo Workflows documentation. Omnistrate renders $sys.*, $var.*, $secret.*, and $func.* expressions before executing the workflow against the tenant instance context.
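As a sketch of how dag.tasks expresses ordering, the fragment below runs a secret-creation task before applying the Cluster. The task and template names createsecret and create-secret are illustrative and not taken from the template, and the template definitions themselves are omitted. The dependencies field is standard Argo DAG syntax; treating it as supported by Omnistrate's workflow engine is an assumption to verify against the template.

```yaml
# Fragment of a workflow's templates list (template definitions omitted)
templates:
  - name: create
    dag:
      tasks:
        # Hypothetical first step: create credentials the Cluster needs.
        - name: createsecret
          template: create-secret
        # Runs only after createsecret has completed successfully.
        - name: applycluster
          template: apply-cluster
          dependencies: [createsecret]
```

Tasks without a dependencies entry are eligible to run as soon as the DAG starts, so independent steps can proceed in parallel.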
Optional Terraform Dependencies¶
Operators often need cloud resources that are not Kubernetes resources, such as object-storage buckets, IAM roles, encryption keys, private networking, or DNS zones. Model those with a Terraform resource in the same service plan, mark it internal, and make the operator resource depend on it.
The Terraform resource can export outputs, and the operator workflow can consume those outputs through normal Omnistrate parameters. This lets Terraform manage cloud primitives while the operator manages application lifecycle in Kubernetes.
Build and Test¶
Clone the template and build it with omnistrate-ctl:
git clone https://github.com/omnistrate-community/operator-spec-template
cd operator-spec-template
omnistrate-ctl build -f spec.yaml --name "Postgres Operator" --release-as-preferred --spec-type ServicePlanSpec
Then create a test instance from the Customer Portal or CLI. Validate:
- The operator Helm dependencies are installed.
- The tenant namespace is created.
- The create workflow applies the operator custom resources.
- Endpoints are visible after the operator reports readiness.
- Manual backup creates a snapshot and stores provider metadata.
- Restore creates a new target instance from the selected snapshot.
- Start, stop, add capacity, and remove capacity drive the operator through its CRD.
Use Deployment Cell Access and Manage Workflows to debug Kubernetes resources and workflow events.