Before You Start
This page covers the prerequisites, terminology, and concepts you should understand before deploying DocumentDB on Kubernetes.
Prerequisites checklist
Before installing the DocumentDB Kubernetes Operator, ensure you have the following:
Required components
| Component | Minimum Version | Purpose | Installation Guide |
|---|---|---|---|
| Kubernetes cluster | 1.35+ | Container orchestration platform | See Cluster Options |
| kubectl | 1.35+ | Kubernetes command-line tool | Install kubectl |
| Helm | 3.x | Package manager for Kubernetes | Install Helm |
| cert-manager | 1.19+ | TLS certificate management | Install cert-manager |
Kubernetes 1.35+ Required
The operator requires Kubernetes 1.35 or later because it uses the ImageVolume feature (GA in Kubernetes 1.35) to mount the DocumentDB extension into PostgreSQL pods.
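The version requirement can be checked before installation. The following is a minimal sketch, assuming you already have the server version as a string (for example, copied from the output of `kubectl version`). The helper names `parse_kubernetes_version` and `meets_minimum` are illustrative and are not part of the operator or of kubectl.

```python
import re

MIN_KUBERNETES = (1, 35)  # minimum version from the required components table


def parse_kubernetes_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.35.0' or '1.36.1-gke.100'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple[int, int] = MIN_KUBERNETES) -> bool:
    """Return True when the reported version is at or above the minimum."""
    return parse_kubernetes_version(version) >= minimum


if __name__ == "__main__":
    # Sample version strings for illustration only.
    for reported in ("v1.35.0", "v1.34.3", "1.36.1-gke.100"):
        status = "ok" if meets_minimum(reported) else "too old"
        print(f"{reported}: {status}")
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"1.9"` sorts after `"1.35"`.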
Optional components
| Component | Purpose | When Needed |
|---|---|---|
| mongosh | MongoDB shell for connecting to DocumentDB | Testing and administration |
| Azure CLI | Azure resource management | Deploying on AKS |
| AWS CLI + eksctl | AWS resource management | Deploying on EKS |
| gcloud CLI | Google Cloud resource management | Deploying on GKE |
Kubernetes cluster options
The operator runs on any conformant Kubernetes distribution (1.35+). Choose based on your environment:
For local development and testing, a local distribution that can run Kubernetes 1.35+ (for example, kind or minikube) is sufficient.
For production deployments:
- Azure Kubernetes Service (AKS) - Microsoft Azure
- Amazon EKS - Amazon Web Services
- Google Kubernetes Engine (GKE) - Google Cloud Platform
Resource requirements
Minimum resources for a basic DocumentDB deployment:
| Resource | Minimum | Recommended (Production) |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| Memory | 4 GB | 8+ GB |
| Storage | 10 GB | 50+ GB (SSD recommended) |
| Nodes | 1 | 3+ (for high availability) |
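As a rough illustration, the table above can be encoded and compared against what a cluster offers. This sketch assumes the figures are cluster-wide totals, which the table does not state explicitly. The `Capacity` class and `shortfalls` function are hypothetical names introduced here, not operator APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    cpu_cores: float
    memory_gb: float
    storage_gb: float
    nodes: int


# Values copied from the resource requirements table.
MINIMUM = Capacity(cpu_cores=2, memory_gb=4, storage_gb=10, nodes=1)
RECOMMENDED = Capacity(cpu_cores=4, memory_gb=8, storage_gb=50, nodes=3)


def shortfalls(available: Capacity, required: Capacity) -> list[str]:
    """List every resource in `available` that falls below `required`."""
    missing = []
    for field in ("cpu_cores", "memory_gb", "storage_gb", "nodes"):
        have, need = getattr(available, field), getattr(required, field)
        if have < need:
            missing.append(f"{field}: have {have}, need {need}")
    return missing


if __name__ == "__main__":
    # A hypothetical single-node development machine.
    laptop = Capacity(cpu_cores=4, memory_gb=8, storage_gb=20, nodes=1)
    print("minimum:", shortfalls(laptop, MINIMUM) or "ok")
    print("production:", shortfalls(laptop, RECOMMENDED) or "ok")
```

An empty list means the environment satisfies that tier; a non-empty list names each resource to increase.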
Terminology
Kubernetes concepts
Understanding these Kubernetes concepts helps you work effectively with the DocumentDB operator:
| Term | Definition |
|---|---|
| Pod | The smallest deployable unit in Kubernetes. DocumentDB runs as pods containing the database and gateway containers. |
| Service | An abstraction that exposes pods to network traffic. DocumentDB uses Services to enable client connections. |
| PersistentVolumeClaim (PVC) | A request for storage. DocumentDB stores data on PVCs backed by your cluster's storage class. |
| StorageClass | Defines storage characteristics (performance tier, provisioner). Choose based on your cloud provider. |
| Namespace | A logical partition within a cluster. Deploy DocumentDB in its own namespace for isolation. |
| Custom Resource (CR) | An extension of the Kubernetes API. DocumentDB, Backup, and ScheduledBackup are custom resources. |
| Custom Resource Definition (CRD) | Defines the schema for a custom resource. The operator installs CRDs for its resources. |
| Operator | A controller that manages custom resources. The DocumentDB operator manages the lifecycle of DocumentDB clusters. |
| ConfigMap | Stores non-sensitive configuration data. The operator uses ConfigMaps for certain settings. |
| Secret | Stores sensitive data like passwords. DocumentDB credentials are stored in Secrets. |
DocumentDB concepts
| Term | Definition |
|---|---|
| DocumentDB Cluster | A managed deployment of DocumentDB on Kubernetes, represented by the DocumentDB custom resource. |
| Instance | A single PostgreSQL + Gateway pod. A cluster can have 1 to 3 instances; use 3 for high availability. |
| Primary | The instance that accepts write operations. There is always exactly one primary per cluster. |
| Replica | A read-only instance that replicates data from the primary. Replicas can be promoted during failover. |
| Gateway | A sidecar container that provides a MongoDB-compatible API on top of PostgreSQL. Clients connect to the gateway. |
| Node | In DocumentDB terms, a logical grouping of instances. Currently limited to 1 node per cluster. |
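The limits in this table (one node per cluster, one to three instances, exactly one primary) can be expressed as a small validation routine. This is a sketch only: `validate_topology` and its parameter names are hypothetical, and the operator performs its own validation of the custom resource.

```python
def validate_topology(node_count: int, instances_per_node: int) -> str:
    """Check a proposed topology against the limits described in the table."""
    if node_count != 1:
        raise ValueError("only 1 node per cluster is currently supported")
    if not 1 <= instances_per_node <= 3:
        raise ValueError("a cluster can have 1 to 3 instances")
    # Exactly one instance is the primary; the rest are read-only replicas.
    replicas = instances_per_node - 1
    return f"1 primary, {replicas} replica(s)"


if __name__ == "__main__":
    print(validate_topology(1, 1))  # single instance, no failover target
    print(validate_topology(1, 3))  # primary plus two replicas
```

A single-instance cluster is valid but has no replica to promote, so it cannot fail over.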
Cloud and infrastructure concepts
| Term | Definition |
|---|---|
| Region | A geographic location where cloud resources are deployed (for example, us-west-2, westus2). |
| Availability Zone (AZ) | An isolated location within a region. Distribute instances across zones for resilience. |
| Load Balancer | Distributes traffic across instances. Use the LoadBalancer Service type for external access. |
| Storage Class | Cloud-specific storage configuration. Examples: managed-csi (AKS), gp3 (EKS). |
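To make "distribute instances across zones" concrete, the sketch below assigns instances to zones round-robin. It only illustrates the idea. In a real cluster the Kubernetes scheduler decides pod placement, and the instance and zone names used here are hypothetical.

```python
def spread_across_zones(instances: list[str], zones: list[str]) -> dict[str, str]:
    """Assign each instance to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    return {name: zones[i % len(zones)] for i, name in enumerate(instances)}


if __name__ == "__main__":
    # Hypothetical instance and zone names.
    placement = spread_across_zones(
        ["instance-1", "instance-2", "instance-3"],
        ["us-west-2a", "us-west-2b", "us-west-2c"],
    )
    for instance, zone in placement.items():
        print(f"{instance} -> {zone}")
```

With three instances and three zones, each zone hosts one instance, so losing a single zone removes at most one instance.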
High availability concepts
| Term | Definition |
|---|---|
| High Availability (HA) | Running multiple instances to survive failures. Set instancesPerNode: 3 for HA. |
| Failover | Automatic promotion of a replica to primary when the primary fails. |
| RTO (Recovery Time Objective) | Maximum acceptable downtime after a failure. Local HA typically achieves an RTO under 30 seconds. |
| RPO (Recovery Point Objective) | Maximum acceptable data loss. With streaming replication, RPO is near-zero (milliseconds of lag). |
| Replication Lag | The delay between writes on the primary and their application on replicas. |
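The relationship between failover time, replication lag, and the two objectives can be sketched as follows. The function names and the sample numbers are hypothetical, and the sketch approximates potential data loss by the largest observed replication lag, which is a simplification.

```python
from datetime import datetime, timedelta


def downtime(failed_at: datetime, promoted_at: datetime) -> timedelta:
    """Time between the primary failing and a replica being promoted."""
    return promoted_at - failed_at


def meets_objectives(
    failed_at: datetime,
    promoted_at: datetime,
    lag_samples: list[timedelta],
    rto: timedelta,
    rpo: timedelta,
) -> dict[str, bool]:
    """Compare one failover drill against RTO and RPO targets."""
    # Writes not yet applied on the promoted replica are the ones at risk,
    # so the largest observed lag is used as the data-loss estimate.
    worst_lag = max(lag_samples, default=timedelta(0))
    return {
        "rto_met": downtime(failed_at, promoted_at) <= rto,
        "rpo_met": worst_lag <= rpo,
    }


if __name__ == "__main__":
    # Hypothetical drill: promotion after 18 s, lag samples of 4, 9 and 35 ms.
    failed = datetime(2025, 1, 1, 12, 0, 0)
    result = meets_objectives(
        failed_at=failed,
        promoted_at=failed + timedelta(seconds=18),
        lag_samples=[timedelta(milliseconds=ms) for ms in (4, 9, 35)],
        rto=timedelta(seconds=30),
        rpo=timedelta(milliseconds=100),
    )
    print(result)
```

RTO is measured against downtime, while RPO is measured against how far the replicas trailed the primary at the moment of failure.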
Architecture overview
For a detailed explanation of how the operator works, see Architecture Overview.
```mermaid
flowchart TB
    API[Kubernetes API Server]
    subgraph DocDBNS[documentdb-operator namespace]
        Operator[DocumentDB Operator]
    end
    subgraph CNPGNS[cnpg-system namespace]
        CNPG[CloudNative-PG Operator]
    end
    subgraph AppNS[Application Namespace]
        CR[DocumentDB CR]
        CLUSTER_CR[CNPG Cluster CR]
        LB[External LoadBalancer]
        P1[Primary - Postgres + Gateway]
        P2[Replica 1 - Postgres + Gateway]
        P3[Replica 2 - Postgres + Gateway]
        PVC1[PVC 1]
        PVC2[PVC 2]
        PVC3[PVC 3]
    end
    CLIENT[MongoDB Client]
    API --> Operator
    API --> CNPG
    CLIENT --> LB
    LB --> P1
    Operator --> CR
    Operator --> CLUSTER_CR
    Operator --> LB
    CNPG --> CLUSTER_CR
    CNPG --> P1
    CNPG --> P2
    CNPG --> P3
    P1 --> PVC1
    P2 --> PVC2
    P3 --> PVC3
    P1 --> P2
    P1 --> P3
```
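One way to read the diagram is as a directed graph. The sketch below copies its edges into a dictionary and uses a breadth-first search to trace how a client request reaches storage. The diagram does not label its arrows, so reading `P1 --> P2` and `P1 --> P3` as replication from the primary is an assumption, and `find_path` is an illustrative helper rather than anything the operator provides.

```python
from collections import deque
from typing import Optional

# Edges copied from the flowchart, keyed by source node.
EDGES = {
    "API": ["Operator", "CNPG"],
    "CLIENT": ["LB"],
    "LB": ["P1"],
    "Operator": ["CR", "CLUSTER_CR", "LB"],
    "CNPG": ["CLUSTER_CR", "P1", "P2", "P3"],
    "P1": ["PVC1", "P2", "P3"],
    "P2": ["PVC2"],
    "P3": ["PVC3"],
}


def find_path(start: str, goal: str) -> Optional[list[str]]:
    """Return the shortest chain of edges from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        node = route[-1]
        if node == goal:
            return route
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None


if __name__ == "__main__":
    print(find_path("CLIENT", "PVC1"))  # client traffic to the primary's storage
    print(find_path("CLIENT", "PVC3"))  # reaches a replica's storage via the primary
    print(find_path("CLIENT", "CR"))    # clients never touch the custom resource
```

The traversal shows that client traffic enters only through the load balancer and the primary, while the custom resources are reachable only from the operators.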
Next steps
- Quickstart - Deploy your first DocumentDB cluster
- Deploy on AKS - Production deployment on Azure
- Deploy on EKS - Production deployment on AWS