Tags: coder/coder-k8s
# 🤖 feat: implement Helm parity phases 1-2 for CoderControlPlane (#77)

## Summary

This PR implements Phase 1 and Phase 2 Helm-chart parity for `CoderControlPlane`, including production-hardening controls, workspace RBAC/ServiceAccount reconciliation, TLS/probe/scheduling passthroughs, and optional external exposure via either Ingress or Gateway API.

## Background

The operator previously reconciled only a basic Deployment/Service/token flow. The plan for this branch adds the higher-leverage chart capabilities needed for production readiness and operability while preserving fail-fast behavior and backward compatibility.

## Implementation

- Extended `CoderControlPlaneSpec` with parity fields for:
  - ServiceAccount and workspace RBAC rules/namespaces
  - Resources and container/pod security contexts
  - TLS secret mounts + probe configs
  - EnvFrom/volumes/volumeMounts/cert mounts
  - Scheduling controls (nodeSelector/tolerations/affinity/topology spread)
  - Exposure API: `spec.expose.ingress` or `spec.expose.gateway`
- Added the Gateway API dependency and scheme registration (`sigs.k8s.io/gateway-api/apis/v1`).
- Implemented controller reconciliation for:
  - ServiceAccount creation/attachment
  - Workspace Role/RoleBinding management
  - Deployment alignment (port 8080, HA env defaults, optional access URL injection, TLS wiring, probes, pass-throughs)
  - Service HTTPS port when TLS is enabled
  - Ingress reconciliation and cleanup
  - HTTPRoute reconciliation and cleanup, with graceful `NoMatch` handling when Gateway CRDs are absent
- Regenerated CRD/RBAC manifests and API reference docs.
- Added controller tests covering the new reconciliation behavior end-to-end.

## Validation

- `make verify-vendor`
- `make build`
- `make test`
- `make manifests`
- `make codegen`
- `make docs-reference`

## Risks

- This change touches a broad reconciliation surface (Deployment, RBAC, and exposure resources).
  Reconciliation paths are covered by targeted tests, but cluster-specific integrations (Gateway-controller behavior, ingress-class semantics) still depend on downstream environment configuration.

---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: Phase 1–2 parity with `coder/coder` Helm chart (+ optional Gateway API)

## Context / Why

We want the `CoderControlPlane` controller in this repo to reach closer feature parity with the upstream `coder/coder` Helm chart. Today, our operator mainly reconciles:

- a `Deployment` running the Coder control plane,
- a fronting `Service`, and
- an operator-token `Secret` (plus license/entitlements logic).

The Helm chart additionally provides production-hardening and operability knobs: ServiceAccount + namespace RBAC for workspaces, resource limits, security contexts, TLS enablement, probes, HA env injection, Ingress exposure, scheduling controls, and volume/envFrom passthroughs.

This plan implements **Phase 1 (production readiness)** and **Phase 2 (operability + HA)** items, and adds an **opt-in exposure API** that lets operators choose **Ingress** or **Gateway API**.

## Goals (Phases 1 & 2)

1. **Make Coder pods runnable in production clusters** with Pod Security constraints by exposing/setting security context, resources, probes, and TLS.
2. **Provide first-class RBAC + ServiceAccount management** for workspace provisioning (pods/PVCs/deployments) across multiple namespaces.
3. **Support HA-relevant defaults** (pod IP env + DERP relay URL + default access URL behavior).
4. **Expose the control plane externally** via:
   - `networking.k8s.io/v1` **Ingress**, or
   - `gateway.networking.k8s.io/v1` **Gateway API** (HTTPRoute), without requiring Gateway API CRDs to exist unless configured.

## Non-goals (explicitly deferred)

- Full parity with every Helm chart knob (e.g., HPA, PDB, NetworkPolicy, workspace-proxy mode, provisioner daemon deployment).
- Replacing the existing operator-access / licensing workflows.
## Evidence / Sources consulted

- Upstream Helm chart container/env/TLS/probe behaviors:
  - `./tmpfork/coder/helm/coder/templates/_coder.tpl`
  - `./tmpfork/coder/helm/libcoder/templates/_helpers.tpl`
- Upstream Helm chart RBAC rules & multi-namespace behavior:
  - `./tmpfork/coder/helm/libcoder/templates/_rbac.yaml`
- Our API surface today:
  - `api/v1alpha1/codercontrolplane_types.go`
  - `api/v1alpha1/types_shared.go`
- Our controller reconciliation + constants + SetupWithManager:
  - `internal/controller/codercontrolplane_controller.go`
- Scheme construction (for adding Gateway API types):
  - `internal/app/sharedscheme/sharedscheme.go`

---

## Implementation plan

### 0) Create a parity tracking document (optional but recommended)

Add a short markdown doc (e.g., `docs/design/helm-parity.md`) listing each Helm chart knob and which `CoderControlPlaneSpec` field covers it. This keeps future parity work honest.

---

## Phase 1 — Production readiness

### 1) Extend the CRD: `CoderControlPlaneSpec` (API additions)

**Files**

- `api/v1alpha1/codercontrolplane_types.go`
- `api/v1alpha1/types_shared.go`

**Add spec fields (Phase 1 scope)**

1. **Pod identity / permissions**
   - `spec.serviceAccount` (new struct)
   - `spec.rbac` (new struct)
2. **Hardening & resource controls**
   - `spec.resources` (`*corev1.ResourceRequirements`)
   - `spec.securityContext` (`*corev1.SecurityContext`)
   - `spec.podSecurityContext` (`*corev1.PodSecurityContext`)
3. **TLS enablement (Coder built-in TLS)**
   - `spec.tls.secretNames` (`[]string`) — enables internal TLS when non-empty
4. **Probes**
   - `spec.readinessProbe` and `spec.livenessProbe` (chart-style config with `enabled` + timing knobs)
5. **(Parity) Default access URL behavior**
   - `spec.envUseClusterAccessURL` (`*bool`, default `true`) — if enabled and the user didn't provide `CODER_ACCESS_URL` explicitly via `extraEnv`, the operator injects a default in-cluster URL.
**Proposed Go shapes (illustrative)**

```go
// api/v1alpha1/types_shared.go

// ServiceAccountSpec configures the ServiceAccount used by the Coder pod.
type ServiceAccountSpec struct {
	// DisableCreate skips ServiceAccount creation (use an existing SA).
	// +kubebuilder:default=false
	DisableCreate bool `json:"disableCreate,omitempty"`

	// Name is the ServiceAccount name. If empty, default to the CoderControlPlane name.
	Name string `json:"name,omitempty"`

	Annotations map[string]string `json:"annotations,omitempty"`
	Labels      map[string]string `json:"labels,omitempty"`
}

// RBACSpec configures namespace-scoped RBAC for workspace provisioning.
type RBACSpec struct {
	// WorkspacePerms enables Role/RoleBinding creation.
	// +kubebuilder:default=true
	WorkspacePerms bool `json:"workspacePerms,omitempty"`

	// EnableDeployments grants apps/deployments permissions (only if WorkspacePerms).
	// +kubebuilder:default=true
	EnableDeployments bool `json:"enableDeployments,omitempty"`

	// ExtraRules are appended to the Role rules (only if WorkspacePerms).
	ExtraRules []rbacv1.PolicyRule `json:"extraRules,omitempty"`

	// WorkspaceNamespaces are additional namespaces to create the Role/RoleBinding in.
	WorkspaceNamespaces []string `json:"workspaceNamespaces,omitempty"`
}

type TLSSpec struct {
	SecretNames []string `json:"secretNames,omitempty"`
}

type ProbeSpec struct {
	// +kubebuilder:default=true
	Enabled bool `json:"enabled,omitempty"`

	// +kubebuilder:default=0
	InitialDelaySeconds int32  `json:"initialDelaySeconds,omitempty"`
	PeriodSeconds       *int32 `json:"periodSeconds,omitempty"`
	TimeoutSeconds      *int32 `json:"timeoutSeconds,omitempty"`
	SuccessThreshold    *int32 `json:"successThreshold,omitempty"`
	FailureThreshold    *int32 `json:"failureThreshold,omitempty"`
}
```

```go
// api/v1alpha1/codercontrolplane_types.go

type CoderControlPlaneSpec struct {
	...
	ServiceAccount ServiceAccountSpec `json:"serviceAccount,omitempty"`
	RBAC           RBACSpec           `json:"rbac,omitempty"`

	Resources          *corev1.ResourceRequirements `json:"resources,omitempty"`
	SecurityContext    *corev1.SecurityContext      `json:"securityContext,omitempty"`
	PodSecurityContext *corev1.PodSecurityContext   `json:"podSecurityContext,omitempty"`

	TLS TLSSpec `json:"tls,omitempty"`

	// +kubebuilder:default={enabled:true,initialDelaySeconds:0}
	ReadinessProbe ProbeSpec `json:"readinessProbe,omitempty"`
	// +kubebuilder:default={enabled:false,initialDelaySeconds:0}
	LivenessProbe ProbeSpec `json:"livenessProbe,omitempty"`

	// +kubebuilder:default=true
	EnvUseClusterAccessURL *bool `json:"envUseClusterAccessURL,omitempty"`
}
```

**Notes**

- Keep fields optional and backward compatible.
- Prefer `types_shared.go` for structs that may be reused by future CRDs.

---

### 2) Reconcile ServiceAccount + namespace RBAC for workspaces

**Files**

- `internal/controller/codercontrolplane_controller.go`

**Where**

- Add a new reconciliation step in `Reconcile()` **before** `reconcileDeployment()` so the Deployment can reference the SA.

**What to add**

1. `reconcileServiceAccount(ctx, cp)`
   - If `spec.serviceAccount.disableCreate=true`: ensure any previously-owned SA is deleted (cleanup).
   - Otherwise, create/update a `corev1.ServiceAccount` named `spec.serviceAccount.name` (defaulting to `cp.Name`).
   - Apply labels `controlPlaneLabels(cp.Name)` plus user-provided SA labels/annotations.
2. `reconcileWorkspaceRBAC(ctx, cp)`
   - If `spec.rbac.workspacePerms=false`: delete any previously-owned Roles/RoleBindings (cleanup).
   - Otherwise, create/update a `rbacv1.Role` and `rbacv1.RoleBinding` in:
     - `cp.Namespace`, and
     - each namespace in `spec.rbac.workspaceNamespaces`.

**Match Helm chart semantics** (from `libcoder.rbac.rules.basic` / `deployments`):

- Basic rules (pods + PVCs) when `workspacePerms=true`.
- Deployments rules only when `workspacePerms=true && enableDeployments=true`.
- Append `extraRules` only when `workspacePerms=true`.
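The rule-assembly semantics above can be sketched as a pure helper. This is an illustrative stand-alone version, not the actual controller code: `PolicyRule` here is a local stand-in for `rbacv1.PolicyRule` so the sketch compiles without vendored deps, and the exact verb list is an assumption about what the chart grants.

```go
package main

import "fmt"

// PolicyRule is a local stand-in for rbacv1.PolicyRule; the real
// controller would use the rbac/v1 type directly.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// workspaceRules mirrors the intended chart semantics: no rules at all
// when workspacePerms is off; basic pod/PVC rules when it is on;
// deployments rules only when enableDeployments is also on; and user
// extraRules appended last.
func workspaceRules(workspacePerms, enableDeployments bool, extraRules []PolicyRule) []PolicyRule {
	if !workspacePerms {
		return nil
	}
	// Illustrative verb set; the chart's actual verbs are authoritative.
	allVerbs := []string{"create", "delete", "get", "list", "patch", "update", "watch"}
	rules := []PolicyRule{
		{APIGroups: []string{""}, Resources: []string{"pods", "persistentvolumeclaims"}, Verbs: allVerbs},
	}
	if enableDeployments {
		rules = append(rules, PolicyRule{APIGroups: []string{"apps"}, Resources: []string{"deployments"}, Verbs: allVerbs})
	}
	return append(rules, extraRules...)
}

func main() {
	fmt.Println(len(workspaceRules(true, true, nil)))  // prints 2: basic + deployments
	fmt.Println(len(workspaceRules(true, false, nil))) // prints 1: basic only
	fmt.Println(workspaceRules(false, true, nil) == nil)
}
```

Keeping rule assembly in a pure function like this makes the gating logic (`workspacePerms` before everything, `enableDeployments` as a sub-switch) trivially unit-testable without envtest.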
**Role/RoleBinding naming**

- Role: `<serviceAccountName>-workspace-perms`
- RoleBinding: `<serviceAccountName>` (matches the chart)

**Cleanup strategy**

- Use label selectors + `OwnerReference` checks to delete only operator-owned RBAC objects.
- Mirror the pattern used by `cleanupDisabledOperatorAccess`.

---

### 3) Align the Deployment with Helm defaults (ports, probes, env)

**Files**

- `internal/controller/codercontrolplane_controller.go`

**Changes**

1. **Port alignment**
   - Change `controlPlaneTargetPort` from `3000` → `8080`.
   - Change the default arg from `--http-address=0.0.0.0:3000` → `--http-address=0.0.0.0:8080`.
2. **Inject HA env defaults** (as the Helm chart does)
   - Always include:
     - `KUBE_POD_IP` from `fieldRef: status.podIP`
     - `CODER_DERP_SERVER_RELAY_URL=http://$(KUBE_POD_IP):8080`
3. **Default `CODER_ACCESS_URL` injection**
   - If `spec.envUseClusterAccessURL` is true and `extraEnv` does not set `CODER_ACCESS_URL`, inject:
     - `http://<service>.<namespace>.svc.cluster.local` when internal TLS is disabled
     - `https://<service>.<namespace>.svc.cluster.local` when internal TLS is enabled
4. **Readiness/liveness probes**
   - If `spec.readinessProbe.enabled`: set a readiness probe `GET /healthz` on the named port `http`.
   - If `spec.livenessProbe.enabled`: set a liveness probe similarly.
   - Map the timing knobs to `corev1.Probe` fields.
5. **Security & resources**
   - Apply `spec.resources` → `container.resources`.
   - Apply `spec.securityContext` → `container.securityContext`.
   - Apply `spec.podSecurityContext` → `pod.securityContext`.
6. **ServiceAccount usage**
   - Set `pod.spec.serviceAccountName` to the resolved SA name.
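The env-defaulting rules in items 2–3 above can be sketched as a pure helper. This is a hypothetical, self-contained illustration: `EnvVar` is a value-only stand-in for `corev1.EnvVar` (the real `KUBE_POD_IP` entry uses a `fieldRef`, which is elided here), and `defaultEnv` is not an actual function in the controller.

```go
package main

import "fmt"

// EnvVar is a minimal value-only stand-in for corev1.EnvVar.
type EnvVar struct{ Name, Value string }

// defaultEnv sketches the planned defaulting: the DERP relay URL is always
// injected, and CODER_ACCESS_URL is defaulted to the in-cluster Service URL
// only when envUseClusterAccessURL is on and extraEnv does not already set it.
// User-provided extraEnv entries are appended last, untouched.
func defaultEnv(name, namespace string, tlsEnabled, useClusterAccessURL bool, extraEnv []EnvVar) []EnvVar {
	env := []EnvVar{{Name: "CODER_DERP_SERVER_RELAY_URL", Value: "http://$(KUBE_POD_IP):8080"}}

	userSetAccessURL := false
	for _, e := range extraEnv {
		if e.Name == "CODER_ACCESS_URL" {
			userSetAccessURL = true
		}
	}
	if useClusterAccessURL && !userSetAccessURL {
		scheme := "http"
		if tlsEnabled {
			scheme = "https" // internal TLS flips the default scheme
		}
		env = append(env, EnvVar{
			Name:  "CODER_ACCESS_URL",
			Value: fmt.Sprintf("%s://%s.%s.svc.cluster.local", scheme, name, namespace),
		})
	}
	return append(env, extraEnv...)
}

func main() {
	for _, e := range defaultEnv("coder", "coder", false, true, nil) {
		fmt.Println(e.Name, "=", e.Value)
	}
}
```

The "user wins" check is the important part: parity with the chart means the operator must never clobber an explicitly configured `CODER_ACCESS_URL`.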
---

### 4) Implement internal TLS (Coder built-in TLS) like Helm

**Files**

- `api/v1alpha1/*` (spec field already added)
- `internal/controller/codercontrolplane_controller.go`

**Behavior (match the Helm chart's `coder.tlsEnv` + mounts)**

If `spec.tls.secretNames` is non-empty:

- Add env vars:
  - `CODER_TLS_ENABLE=true`
  - `CODER_TLS_ADDRESS=0.0.0.0:8443`
  - `CODER_TLS_CERT_FILE` = comma-separated list of `/etc/ssl/certs/coder/<secret>/tls.crt`
  - `CODER_TLS_KEY_FILE` = comma-separated list of `/etc/ssl/certs/coder/<secret>/tls.key`
- Add pod volumes (one per TLS secret)
- Add volume mounts at `/etc/ssl/certs/coder/<secret>` (read-only)
- Add container port `https:8443`

**Service impact**

- Add an additional `ServicePort` named `https` at 443 → targetPort 8443 when TLS is enabled.

**Status impact**

- Update the `desiredStatus().URL` scheme to `https` when TLS is enabled.

---

## Phase 2 — Operability + HA

### 5) Add pass-through config knobs: envFrom, volumes, cert bundles, scheduling

**API changes**

Add these optional fields to `CoderControlPlaneSpec`:

- `envFrom []corev1.EnvFromSource`
- `volumes []corev1.Volume`
- `volumeMounts []corev1.VolumeMount`
- `certs.secrets []SecretKeySelector` (name+key) to mount CA certs at `/etc/ssl/certs/<name>.crt` with `subPath: key`
- Scheduling fields:
  - `nodeSelector map[string]string`
  - `tolerations []corev1.Toleration`
  - `affinity *corev1.Affinity`
  - `topologySpreadConstraints []corev1.TopologySpreadConstraint`

**Controller changes**

- Append `envFrom` to the container.
- Append `volumes` and `volumeMounts` to the pod.
- For each CA cert secret selector, add a volume + volumeMount matching the Helm behavior.
- Apply the scheduling fields on the pod spec.
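The TLS env wiring from step 4 above can be sketched as a pure helper that derives the comma-separated cert/key lists from the secret names. The function name `tlsEnv` and the map return shape are illustrative, not the controller's actual signature; only the env-var names, paths, and comma-joining follow the plan.

```go
package main

import (
	"fmt"
	"strings"
)

// tlsEnv sketches the planned CODER_TLS_* derivation: one mount directory
// per secret under /etc/ssl/certs/coder/<secret>, with the cert and key
// paths joined into comma-separated lists. Returns nil when TLS is not
// configured (empty secretNames), matching the "enabled when non-empty" rule.
func tlsEnv(secretNames []string) map[string]string {
	if len(secretNames) == 0 {
		return nil
	}
	var certs, keys []string
	for _, s := range secretNames {
		certs = append(certs, "/etc/ssl/certs/coder/"+s+"/tls.crt")
		keys = append(keys, "/etc/ssl/certs/coder/"+s+"/tls.key")
	}
	return map[string]string{
		"CODER_TLS_ENABLE":    "true",
		"CODER_TLS_ADDRESS":   "0.0.0.0:8443",
		"CODER_TLS_CERT_FILE": strings.Join(certs, ","),
		"CODER_TLS_KEY_FILE":  strings.Join(keys, ","),
	}
}

func main() {
	env := tlsEnv([]string{"coder-tls", "coder-tls-extra"})
	fmt.Println(env["CODER_TLS_CERT_FILE"])
	// prints /etc/ssl/certs/coder/coder-tls/tls.crt,/etc/ssl/certs/coder/coder-tls-extra/tls.crt
}
```

The nil return doubles as the switch the Service and status logic can key off: non-nil means add the `https` ServicePort and flip the status URL scheme.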
---

### 6) Exposure API: choose between Ingress or Gateway API

**Goal**: Let operators choose **one** of:

- `networking.k8s.io/v1` Ingress, or
- `gateway.networking.k8s.io/v1` HTTPRoute (Gateway API)

#### 6.1 CRD changes: add `spec.expose`

Add a new `ExposeSpec` with mutually exclusive `ingress` vs `gateway` config.

```go
// types_shared.go

type ExposeSpec struct {
	// +optional
	Ingress *IngressExposeSpec `json:"ingress,omitempty"`
	// +optional
	Gateway *GatewayExposeSpec `json:"gateway,omitempty"`

	// NOTE: add a kubebuilder XValidation to ensure at most one is set.
	// Example intent:
	// +kubebuilder:validation:XValidation:rule="!(has(self.ingress) && has(self.gateway))",message="only one of ingress or gateway may be set"
}

type IngressExposeSpec struct {
	ClassName    *string           `json:"className,omitempty"`
	Host         string            `json:"host"`
	WildcardHost string            `json:"wildcardHost,omitempty"`
	Annotations  map[string]string `json:"annotations,omitempty"`

	// Optional TLS termination at the Ingress.
	TLS *IngressTLSExposeSpec `json:"tls,omitempty"`
}

type IngressTLSExposeSpec struct {
	SecretName         string `json:"secretName,omitempty"`
	WildcardSecretName string `json:"wildcardSecretName,omitempty"`
}

type GatewayExposeSpec struct {
	Host         string `json:"host"`
	WildcardHost string `json:"wildcardHost,omitempty"`

	// ParentRefs are Gateways that this HTTPRoute attaches to.
	ParentRefs []GatewayParentRef `json:"parentRefs,omitempty"`
}

type GatewayParentRef struct {
	Name        string  `json:"name"`
	Namespace   *string `json:"namespace,omitempty"`
	SectionName *string `json:"sectionName,omitempty"`
}
```

#### 6.2 Controller changes: reconcile + cleanup

**Files**

- `internal/controller/codercontrolplane_controller.go`

**Where**

- In `Reconcile()`, reconcile exposure resources **after** `reconcileService()`.

**Ingress reconciliation**

- Create/update a `networkingv1.Ingress` named `cp.Name` (or `cp.Name + "-ingress"` if name collisions are a concern).
- Rules:
  - one rule for `host` (required)
  - an optional rule for `wildcardHost`
- Backend:
  - Service: `cp.Name`
  - Port: `spec.service.port`
- Apply annotations/className.
- TLS:
  - If `tls.secretName` is set: add `IngressTLS{SecretName, Hosts: [host]}`
  - If `tls.wildcardSecretName` is set: add `IngressTLS{SecretName, Hosts: [wildcardHost]}`

**Gateway API reconciliation (minimal viable)**

- Reconcile a `gatewayv1.HTTPRoute` named `cp.Name`.
- `spec.parentRefs`: from `spec.expose.gateway.parentRefs`.
- `spec.hostnames`: include `host` and `wildcardHost` when set.
- One rule routing `/` to backend service `cp.Name` at port `spec.service.port`.

**Critical compatibility requirement**

- Gateway API CRDs may not exist in the cluster. Ensure the operator:
  - does **not** add `Owns(&gatewayv1.HTTPRoute{})` watches in `SetupWithManager`, and
  - gracefully handles `meta.IsNoMatchError(err)` (or equivalent) during reconcile:
    - record a Condition or Event (recommended), and
    - do not crash or block other reconciliation.

#### 6.3 Scheme & deps

- Add `sigs.k8s.io/gateway-api` to `go.mod` and `vendor/`.
- Register Gateway API types into the scheme in `internal/app/sharedscheme/sharedscheme.go` (e.g., `gatewayv1.AddToScheme(scheme)`).
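Two small pieces of the exposure logic above lend themselves to pure helpers: the mutual-exclusion rule (the Go-side counterpart of the CEL `XValidation` in 6.1) and the HTTPRoute hostname assembly. Both function names here are hypothetical sketches, not controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// validateExpose mirrors the intended CEL rule in Go form: at most one of
// ingress/gateway may be set (a webhook or defensive reconcile-time check
// could reuse this even though the CRD validation is authoritative).
func validateExpose(hasIngress, hasGateway bool) error {
	if hasIngress && hasGateway {
		return errors.New("only one of ingress or gateway may be set")
	}
	return nil
}

// routeHostnames collects the HTTPRoute spec.hostnames: host is required,
// wildcardHost is included only when set.
func routeHostnames(host, wildcardHost string) []string {
	hostnames := []string{host}
	if wildcardHost != "" {
		hostnames = append(hostnames, wildcardHost)
	}
	return hostnames
}

func main() {
	fmt.Println(validateExpose(true, true) != nil) // prints true: both set is rejected
	fmt.Println(routeHostnames("coder.example.com", "*.coder.example.com"))
}
```

Note that `hasIngress && hasGateway` is exactly `has(self.ingress) && has(self.gateway)` from the CEL rule, so the two stay easy to keep in sync.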
---

## Cross-cutting work

### 7) Update operator RBAC markers and generated manifests

**Files**

- `internal/controller/codercontrolplane_controller.go` kubebuilder RBAC comments

Add operator permissions to manage the new resources:

- `serviceaccounts`
- `roles`, `rolebindings`
- `ingresses`
- (optional) `httproutes`, `gateways` (Gateway API)

Then regenerate:

- `make manifests`

---

### 8) Testing plan

**Unit/envtest**

- Extend the existing controller tests (likely `internal/controller/codercontrolplane_controller_test.go`):
  - ServiceAccount created and referenced by the Deployment
  - Role/RoleBinding created in `cp.Namespace` and extra namespaces
  - TLS secretNames → volumes/mounts/env + Service https port
  - Probes enabled/disabled behavior
  - Ingress created when `spec.expose.ingress` is set; deleted when unset
  - Gateway API: when configured but CRDs are missing, reconcile should not hard-fail (assert on the condition/event or logged behavior)

**Integration / make targets**

- `make test`
- `make test-integration` (if it exercises controller-runtime manager behavior)

---

### 9) Generated artifacts & docs

- After API changes:
  - `make codegen`
  - `make manifests`
- Update the examples in `config/samples/` to include:
  - a minimal cluster-internal install
  - an Ingress-exposed install
  - a Gateway API HTTPRoute-exposed install
- If this repo maintains API reference docs, regenerate them (per project conventions).

---

## Validation checklist (when implementing)

1. `make test`
2. `make test-integration`
3. `make build`
4. `make lint`
5. Confirm generated manifests (`config/`) are updated and committed.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$8.55`_

<!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=8.55 -->