Tags: coder/coder-k8s

v0.1.0

🤖 feat: implement Helm parity phases 1-2 for CoderControlPlane (#77)

## Summary
This PR implements Phase 1 and Phase 2 Helm-chart parity for
`CoderControlPlane`, including production-hardening controls, workspace
RBAC/ServiceAccount reconciliation, TLS/probe/scheduling passthroughs,
and optional external exposure via either Ingress or Gateway API.

## Background
The operator previously reconciled only a basic Deployment/Service/token
flow. The plan for this branch adds the higher-leverage chart
capabilities needed for production readiness and operability while
preserving fail-fast behavior and backward compatibility.

## Implementation
- Extended `CoderControlPlaneSpec` with parity fields for:
  - ServiceAccount, workspace RBAC rules/namespaces
  - Resources, container/pod security context
  - TLS secret mounts + probe configs
  - EnvFrom/volumes/volumeMounts/cert mounts
  - Scheduling controls (nodeSelector/tolerations/affinity/topology spread)
  - Exposure API: `spec.expose.ingress` or `spec.expose.gateway`
- Added Gateway API dependency and scheme registration
(`sigs.k8s.io/gateway-api/apis/v1`).
- Implemented controller reconciliation for:
  - ServiceAccount creation/attachment
  - Workspace Role/RoleBinding management
  - Deployment alignment (port 8080, HA env defaults, optional access URL injection, TLS wiring, probes, pass-throughs)
  - Service HTTPS port when TLS is enabled
  - Ingress reconciliation and cleanup
  - HTTPRoute reconciliation and cleanup with graceful `NoMatch` handling when Gateway CRDs are absent
- Regenerated CRD/RBAC manifests and API reference docs.
- Added controller tests covering the new reconciliation behavior
end-to-end.

## Validation
- `make verify-vendor`
- `make build`
- `make test`
- `make manifests`
- `make codegen`
- `make docs-reference`

## Risks
- This change touches a broad reconciliation surface (Deployment, RBAC,
and exposure resources). Reconciliation paths are covered by targeted
tests, but cluster-specific integrations (Gateway controller behavior,
ingress-class semantics) still depend on downstream environment
configuration.

---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: Phase 1–2 parity with `coder/coder` Helm chart (+ optional Gateway API)

## Context / Why
We want the `CoderControlPlane` controller in this repo to move closer to
feature parity with the upstream `coder/coder` Helm chart. Today, our
operator mainly reconciles:
- a `Deployment` running the Coder control plane,
- a fronting `Service`, and
- an operator-token `Secret` (plus license/entitlements logic).

The Helm chart additionally provides production-hardening and
operability knobs: ServiceAccount + namespace RBAC for workspaces,
resource limits, security contexts, TLS enablement, probes, HA env
injection, Ingress exposure, scheduling controls, and volume/envFrom
passthroughs.

This plan implements **Phase 1 (production readiness)** and **Phase 2
(operability + HA)** items, and adds an **opt-in exposure API** allowing
operators to choose **Ingress** or **Gateway API**.

## Goals (Phases 1 & 2)
1. **Make Coder pods runnable in production clusters** with Pod Security
constraints by exposing/setting security context, resources, probes, and
TLS.
2. **Provide first-class RBAC + ServiceAccount management** for
workspace provisioning (pods/PVCs/deployments) across multiple
namespaces.
3. **Support HA-relevant defaults** (pod IP env + DERP relay URL +
default access URL behavior).
4. **Expose the control plane externally** via:
   - `networking.k8s.io/v1` **Ingress**, OR
   - `gateway.networking.k8s.io/v1` **Gateway API** (HTTPRoute), without requiring Gateway API CRDs to exist unless configured.

## Non-goals (explicitly deferred)
- Full parity with every Helm chart knob (e.g., HPA, PDB, NetworkPolicy,
workspace-proxy mode, provisioner daemon deployment).
- Replacing the existing operator-access / licensing workflows.

## Evidence / Sources consulted
- Upstream Helm chart container/env/TLS/probe behaviors:
  - `./tmpfork/coder/helm/coder/templates/_coder.tpl`
  - `./tmpfork/coder/helm/libcoder/templates/_helpers.tpl`
- Upstream Helm chart RBAC rules & multi-namespace behavior:
  - `./tmpfork/coder/helm/libcoder/templates/_rbac.yaml`
- Our API surface today:
  - `api/v1alpha1/codercontrolplane_types.go`
  - `api/v1alpha1/types_shared.go`
- Our controller reconciliation + constants + SetupWithManager:
  - `internal/controller/codercontrolplane_controller.go`
- Scheme construction (for adding Gateway API types):
  - `internal/app/sharedscheme/sharedscheme.go`

---

## Implementation plan

### 0) Create a parity tracking document (optional but recommended)
Add a short markdown doc (e.g., `docs/design/helm-parity.md`) listing
each Helm chart knob and which `CoderControlPlaneSpec` field covers it.
This keeps future parity work honest.

---

## Phase 1 — Production readiness

### 1) Extend the CRD: `CoderControlPlaneSpec` (API additions)
**Files**
- `api/v1alpha1/codercontrolplane_types.go`
- `api/v1alpha1/types_shared.go`

**Add spec fields (Phase 1 scope)**

1) **Pod identity / permissions**
- `spec.serviceAccount` (new struct)
- `spec.rbac` (new struct)

2) **Hardening & resource controls**
- `spec.resources` (`*corev1.ResourceRequirements`)
- `spec.securityContext` (`*corev1.SecurityContext`)
- `spec.podSecurityContext` (`*corev1.PodSecurityContext`)

3) **TLS enablement (Coder built-in TLS)**
- `spec.tls.secretNames` (`[]string`) — enable internal TLS when
non-empty

4) **Probes**
- `spec.readinessProbe` and `spec.livenessProbe` (chart-style config
with `enabled` + timing knobs)

5) **(Parity) Default access URL behavior**
- `spec.envUseClusterAccessURL` (`*bool`, default `true`) — if enabled
and user didn’t provide `CODER_ACCESS_URL` explicitly via `extraEnv`,
the operator injects a default in-cluster URL.

**Proposed Go shapes (illustrative)**
```go
// api/v1alpha1/types_shared.go

// ServiceAccountSpec configures the ServiceAccount used by the Coder pod.
type ServiceAccountSpec struct {
	// DisableCreate skips ServiceAccount creation (use an existing SA).
	// +kubebuilder:default=false
	DisableCreate bool `json:"disableCreate,omitempty"`

	// Name is the ServiceAccount name. If empty, default to the CoderControlPlane name.
	Name string `json:"name,omitempty"`

	Annotations map[string]string `json:"annotations,omitempty"`
	Labels      map[string]string `json:"labels,omitempty"`
}

// RBACSpec configures namespace-scoped RBAC for workspace provisioning.
type RBACSpec struct {
	// WorkspacePerms enables Role/RoleBinding creation.
	// +kubebuilder:default=true
	WorkspacePerms bool `json:"workspacePerms,omitempty"`

	// EnableDeployments grants apps/deployments permissions (only if WorkspacePerms).
	// +kubebuilder:default=true
	EnableDeployments bool `json:"enableDeployments,omitempty"`

	// ExtraRules are appended to the Role rules (only if WorkspacePerms).
	ExtraRules []rbacv1.PolicyRule `json:"extraRules,omitempty"`

	// WorkspaceNamespaces are additional namespaces to create the Role/RoleBinding in.
	WorkspaceNamespaces []string `json:"workspaceNamespaces,omitempty"`
}

type TLSSpec struct {
	SecretNames []string `json:"secretNames,omitempty"`
}

type ProbeSpec struct {
	// +kubebuilder:default=true
	Enabled bool `json:"enabled,omitempty"`
	// +kubebuilder:default=0
	InitialDelaySeconds int32  `json:"initialDelaySeconds,omitempty"`
	PeriodSeconds       *int32 `json:"periodSeconds,omitempty"`
	TimeoutSeconds      *int32 `json:"timeoutSeconds,omitempty"`
	SuccessThreshold    *int32 `json:"successThreshold,omitempty"`
	FailureThreshold    *int32 `json:"failureThreshold,omitempty"`
}
```

```go
// api/v1alpha1/codercontrolplane_types.go

type CoderControlPlaneSpec struct {
	...
	ServiceAccount ServiceAccountSpec `json:"serviceAccount,omitempty"`
	RBAC           RBACSpec           `json:"rbac,omitempty"`

	Resources          *corev1.ResourceRequirements `json:"resources,omitempty"`
	SecurityContext    *corev1.SecurityContext      `json:"securityContext,omitempty"`
	PodSecurityContext *corev1.PodSecurityContext   `json:"podSecurityContext,omitempty"`

	TLS TLSSpec `json:"tls,omitempty"`

	// +kubebuilder:default={enabled:true,initialDelaySeconds:0}
	ReadinessProbe ProbeSpec `json:"readinessProbe,omitempty"`
	// +kubebuilder:default={enabled:false,initialDelaySeconds:0}
	LivenessProbe  ProbeSpec `json:"livenessProbe,omitempty"`

	// +kubebuilder:default=true
	EnvUseClusterAccessURL *bool `json:"envUseClusterAccessURL,omitempty"`
}
```

**Notes**
- Keep fields optional and backward compatible.
- Prefer `types_shared.go` for structs that may be reused by future
CRDs.

---

### 2) Reconcile ServiceAccount + namespace RBAC for workspaces
**Files**
- `internal/controller/codercontrolplane_controller.go`

**Where**
- Add a new reconciliation step in `Reconcile()` **before**
`reconcileDeployment()` so the Deployment can reference the SA.

**What to add**
1) `reconcileServiceAccount(ctx, cp)`
- If `spec.serviceAccount.disableCreate=true`: ensure previously-owned
SA is deleted (cleanup).
- Else create/update a `corev1.ServiceAccount` named
`spec.serviceAccount.name` (default to `cp.Name`).
- Apply labels `controlPlaneLabels(cp.Name)` plus user-provided SA
labels/annotations.
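
The name-defaulting rule above can be sketched as follows; `resolveServiceAccountName` is a hypothetical helper, not an existing function in the controller:

```go
package main

import "fmt"

// resolveServiceAccountName illustrates the defaulting rule: use
// spec.serviceAccount.name when set, otherwise fall back to the
// CoderControlPlane object's name.
func resolveServiceAccountName(specName, cpName string) string {
	if specName != "" {
		return specName
	}
	return cpName
}

func main() {
	fmt.Println(resolveServiceAccountName("", "my-coder"))          // falls back to cp.Name
	fmt.Println(resolveServiceAccountName("custom-sa", "my-coder")) // explicit name wins
}
```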

2) `reconcileWorkspaceRBAC(ctx, cp)`
- If `spec.rbac.workspacePerms=false`: delete previously-owned
Roles/RoleBindings (cleanup).
- Else create/update a `rbacv1.Role` and `rbacv1.RoleBinding` in:
  - `cp.Namespace`, and
  - each namespace in `spec.rbac.workspaceNamespaces`.

**Match Helm chart semantics** (from `libcoder.rbac.rules.basic` /
`deployments`):
- Basic rules (pods + PVCs) when `workspacePerms=true`.
- Deployments rules only when `workspacePerms=true &&
enableDeployments=true`.
- Append `extraRules` only when `workspacePerms=true`.
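
A minimal sketch of that gating logic, using a hypothetical `policyRule` stand-in for `rbacv1.PolicyRule` (the exact verb list is illustrative, not copied from the chart):

```go
package main

import "fmt"

// policyRule is a hypothetical stand-in for rbacv1.PolicyRule, used to
// sketch rule assembly without pulling in k8s.io/api.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// workspaceRules applies the gating described above: basic pod/PVC rules
// only when workspacePerms is on, deployments rules only when both flags
// are on, and extraRules appended last.
func workspaceRules(workspacePerms, enableDeployments bool, extra []policyRule) []policyRule {
	if !workspacePerms {
		return nil
	}
	verbs := []string{"create", "delete", "get", "list", "patch", "update", "watch"} // illustrative
	rules := []policyRule{
		{APIGroups: []string{""}, Resources: []string{"pods", "persistentvolumeclaims"}, Verbs: verbs},
	}
	if enableDeployments {
		rules = append(rules, policyRule{APIGroups: []string{"apps"}, Resources: []string{"deployments"}, Verbs: verbs})
	}
	return append(rules, extra...)
}

func main() {
	fmt.Println(len(workspaceRules(false, true, nil))) // 0
	fmt.Println(len(workspaceRules(true, true, nil)))  // 2
}
```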

**Role/RoleBinding naming**
- Role: `<serviceAccountName>-workspace-perms`
- RoleBinding: `<serviceAccountName>` (matches chart)
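
The naming convention reduces to two trivial helpers (hypothetical names, shown for precision):

```go
package main

import "fmt"

// workspaceRoleName and workspaceRoleBindingName mirror the chart's naming:
// the Role is "<sa>-workspace-perms" and the RoleBinding reuses the
// ServiceAccount name.
func workspaceRoleName(saName string) string        { return saName + "-workspace-perms" }
func workspaceRoleBindingName(saName string) string { return saName }

func main() {
	fmt.Println(workspaceRoleName("coder"))        // coder-workspace-perms
	fmt.Println(workspaceRoleBindingName("coder")) // coder
}
```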

**Cleanup strategy**
- Use label selectors + `OwnerReference` checks to delete only
operator-owned RBAC objects.
- Mirror the pattern used by `cleanupDisabledOperatorAccess`.

---

### 3) Align the Deployment with Helm defaults (ports, probes, env)
**Files**
- `internal/controller/codercontrolplane_controller.go`

**Changes**
1) **Port alignment**
- Change `controlPlaneTargetPort` from `3000` → `8080`.
- Change default arg from `--http-address=0.0.0.0:3000` →
`--http-address=0.0.0.0:8080`.

2) **Inject HA env defaults** (as Helm chart does)
- Always include:
  - `KUBE_POD_IP` from `fieldRef: status.podIP`
  - `CODER_DERP_SERVER_RELAY_URL=http://$(KUBE_POD_IP):8080`

3) **Default `CODER_ACCESS_URL` injection**
- If `spec.envUseClusterAccessURL` is true and `extraEnv` does not set
`CODER_ACCESS_URL`, inject:
- `http://<service>.<namespace>.svc.cluster.local` when internal TLS
disabled
- `https://<service>.<namespace>.svc.cluster.local` when internal TLS
enabled
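
The default-URL rule can be sketched as a pure function; `clusterAccessURL` is a hypothetical helper whose scheme flips to `https` when internal TLS is enabled:

```go
package main

import "fmt"

// clusterAccessURL sketches the default CODER_ACCESS_URL: the in-cluster
// Service DNS name, with the scheme determined by internal TLS.
func clusterAccessURL(service, namespace string, tlsEnabled bool) string {
	scheme := "http"
	if tlsEnabled {
		scheme = "https"
	}
	return fmt.Sprintf("%s://%s.%s.svc.cluster.local", scheme, service, namespace)
}

func main() {
	fmt.Println(clusterAccessURL("coder", "coder", false)) // http://coder.coder.svc.cluster.local
	fmt.Println(clusterAccessURL("coder", "coder", true))  // https://coder.coder.svc.cluster.local
}
```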

4) **Readiness/Liveness probes**
- If `spec.readinessProbe.enabled`: set readiness probe `GET /healthz`
on named port `http`.
- If `spec.livenessProbe.enabled`: set liveness probe similarly.
- Map timing knobs to `corev1.Probe` fields.

5) **Security & resources**
- Apply `spec.resources` → `container.resources`.
- Apply `spec.securityContext` → `container.securityContext`.
- Apply `spec.podSecurityContext` → `pod.securityContext`.

6) **ServiceAccount usage**
- Set `pod.spec.serviceAccountName` to the resolved SA name.

---

### 4) Implement internal TLS (Coder built-in TLS) like Helm
**Files**
- `api/v1alpha1/*` (spec field already added)
- `internal/controller/codercontrolplane_controller.go`

**Behavior (match Helm chart’s `coder.tlsEnv` + mounts)**
If `spec.tls.secretNames` is non-empty:
- Add env vars:
  - `CODER_TLS_ENABLE=true`
  - `CODER_TLS_ADDRESS=0.0.0.0:8443`
  - `CODER_TLS_CERT_FILE` = comma-separated list of `/etc/ssl/certs/coder/<secret>/tls.crt`
  - `CODER_TLS_KEY_FILE` = comma-separated list of `/etc/ssl/certs/coder/<secret>/tls.key`
- Add pod volumes (one per TLS secret)
- Add volume mounts at `/etc/ssl/certs/coder/<secret>` (read-only)
- Add container port `https:8443`
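
The comma-separated env values follow directly from the secret names; `tlsCertEnvValues` is a hypothetical helper showing the derivation, one mount directory per secret:

```go
package main

import (
	"fmt"
	"strings"
)

// tlsCertEnvValues derives the CODER_TLS_CERT_FILE / CODER_TLS_KEY_FILE
// values from spec.tls.secretNames.
func tlsCertEnvValues(secretNames []string) (certFiles, keyFiles string) {
	certs := make([]string, 0, len(secretNames))
	keys := make([]string, 0, len(secretNames))
	for _, s := range secretNames {
		certs = append(certs, fmt.Sprintf("/etc/ssl/certs/coder/%s/tls.crt", s))
		keys = append(keys, fmt.Sprintf("/etc/ssl/certs/coder/%s/tls.key", s))
	}
	return strings.Join(certs, ","), strings.Join(keys, ",")
}

func main() {
	certs, keys := tlsCertEnvValues([]string{"coder-tls", "extra-tls"})
	fmt.Println(certs) // /etc/ssl/certs/coder/coder-tls/tls.crt,/etc/ssl/certs/coder/extra-tls/tls.crt
	fmt.Println(keys)  // /etc/ssl/certs/coder/coder-tls/tls.key,/etc/ssl/certs/coder/extra-tls/tls.key
}
```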

**Service impact**
- Add an additional `ServicePort` named `https` at 443 → targetPort 8443
when TLS is enabled.

**Status impact**
- Update `desiredStatus().URL` scheme to `https` when TLS is enabled.

---

## Phase 2 — Operability + HA

### 5) Add pass-through config knobs: envFrom, volumes, cert bundles, scheduling
**API changes**
Add these optional fields to `CoderControlPlaneSpec`:
- `envFrom []corev1.EnvFromSource`
- `volumes []corev1.Volume`
- `volumeMounts []corev1.VolumeMount`
- `certs.secrets []SecretKeySelector` (name+key) to mount CA certs at
`/etc/ssl/certs/<name>.crt` with `subPath: key`
- scheduling fields:
  - `nodeSelector map[string]string`
  - `tolerations []corev1.Toleration`
  - `affinity *corev1.Affinity`
  - `topologySpreadConstraints []corev1.TopologySpreadConstraint`

**Controller changes**
- Append `envFrom` to container.
- Append `volumes` and `volumeMounts` to the pod.
- For each CA cert secret selector:
  - add volume + volumeMount matching Helm behavior.
- Apply scheduling fields on the pod spec.
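
For the CA cert mounts, the path convention is mechanical; `caCertMountPath` is a hypothetical helper, shown only to pin down the convention:

```go
package main

import "fmt"

// caCertMountPath computes where a CA cert selected by certs.secrets is
// mounted: /etc/ssl/certs/<name>.crt (with the selector's key as subPath).
func caCertMountPath(secretName string) string {
	return fmt.Sprintf("/etc/ssl/certs/%s.crt", secretName)
}

func main() {
	fmt.Println(caCertMountPath("corp-ca")) // /etc/ssl/certs/corp-ca.crt
}
```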

---

### 6) Exposure API: choose between Ingress or Gateway API
**Goal**: Let operators choose **one** of:
- `networking.k8s.io/v1 Ingress`, or
- `gateway.networking.k8s.io/v1 HTTPRoute` (Gateway API)

#### 6.1 CRD changes: add `spec.expose`
Add a new `ExposeSpec` with mutually exclusive `ingress` vs `gateway`
config.

```go
// types_shared.go

type ExposeSpec struct {
	// +optional
	Ingress *IngressExposeSpec `json:"ingress,omitempty"`
	// +optional
	Gateway *GatewayExposeSpec `json:"gateway,omitempty"`

	// NOTE: add kubebuilder XValidation to ensure at most one is set.
	// Example intent:
	// +kubebuilder:validation:XValidation:rule="!(has(self.ingress) && has(self.gateway))",message="only one of ingress or gateway may be set"
}

type IngressExposeSpec struct {
	ClassName    *string           `json:"className,omitempty"`
	Host         string            `json:"host"`
	WildcardHost string            `json:"wildcardHost,omitempty"`
	Annotations  map[string]string `json:"annotations,omitempty"`

	// Optional TLS termination at the Ingress.
	TLS *IngressTLSExposeSpec `json:"tls,omitempty"`
}

type IngressTLSExposeSpec struct {
	SecretName         string `json:"secretName,omitempty"`
	WildcardSecretName string `json:"wildcardSecretName,omitempty"`
}

type GatewayExposeSpec struct {
	Host         string `json:"host"`
	WildcardHost string `json:"wildcardHost,omitempty"`

	// ParentRefs are Gateways that this HTTPRoute attaches to.
	ParentRefs []GatewayParentRef `json:"parentRefs,omitempty"`
}

type GatewayParentRef struct {
	Name        string  `json:"name"`
	Namespace   *string `json:"namespace,omitempty"`
	SectionName *string `json:"sectionName,omitempty"`
}
```

#### 6.2 Controller changes: reconcile + cleanup
**Files**
- `internal/controller/codercontrolplane_controller.go`

**Where**
- In `Reconcile()`, reconcile exposure resources **after**
`reconcileService()`.

**Ingress reconciliation**
- Create/update `networkingv1.Ingress` named `cp.Name` (or `cp.Name +
"-ingress"` if name collisions are a concern).
- Rules:
  - one rule for `host` (required)
  - optional rule for `wildcardHost`
- Backend:
  - Service: `cp.Name`
  - Port: `spec.service.port`
- Apply annotations/className.
- TLS:
  - If `tls.secretName` set: add `IngressTLS{SecretName, Hosts:[host]}`
  - If `tls.wildcardSecretName` set: add `IngressTLS{SecretName, Hosts:[wildcardHost]}`

**Gateway API reconciliation (minimal viable)**
- Reconcile a `gatewayv1.HTTPRoute` named `cp.Name`.
- `spec.parentRefs`: from `spec.expose.gateway.parentRefs`.
- `spec.hostnames`: include `host` and `wildcardHost` when set.
- One rule routing `/` to backend service `cp.Name` at port
`spec.service.port`.
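
The hostname list for the HTTPRoute follows the same host/wildcard rule; `routeHostnames` is a hypothetical helper:

```go
package main

import "fmt"

// routeHostnames builds spec.hostnames: always include host, append
// wildcardHost only when set.
func routeHostnames(host, wildcardHost string) []string {
	hostnames := []string{host}
	if wildcardHost != "" {
		hostnames = append(hostnames, wildcardHost)
	}
	return hostnames
}

func main() {
	fmt.Println(routeHostnames("coder.example.com", "*.coder.example.com"))
	fmt.Println(routeHostnames("coder.example.com", ""))
}
```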

**Critical compatibility requirement**
- Gateway API CRDs may not exist in the cluster. Ensure the operator:
  - does **not** add `Owns(&gatewayv1.HTTPRoute{})` watches in `SetupWithManager`, and
  - gracefully handles `meta.IsNoMatchError(err)` (or equivalent) during reconcile:
    - record a Condition or Event (recommended), and
    - do not crash or block other reconciliation.
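
The graceful-degradation control flow can be sketched with a hypothetical `errNoMatch` standing in for the missing-CRD error that `meta.IsNoMatchError` detects (only the shape of the logic is the point, not the real types):

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch stands in for the "no matching CRD" error; hypothetical.
var errNoMatch = errors.New("no matches for kind HTTPRoute")

// reconcileHTTPRoute sketches the rule above: a missing Gateway API CRD is
// reported via an event but does not fail the overall reconcile; any other
// error propagates.
func reconcileHTTPRoute(apply func() error, recordEvent func(string)) error {
	if err := apply(); err != nil {
		if errors.Is(err, errNoMatch) {
			recordEvent("GatewayAPIUnavailable: HTTPRoute skipped, CRDs not installed")
			return nil // keep reconciling other resources
		}
		return err
	}
	return nil
}

func main() {
	var events []string
	err := reconcileHTTPRoute(
		func() error { return errNoMatch },
		func(msg string) { events = append(events, msg) },
	)
	fmt.Println(err == nil, len(events)) // true 1
}
```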

#### 6.3 Scheme & deps
- Add `sigs.k8s.io/gateway-api` to `go.mod` and `vendor/`.
- Register Gateway API types into the scheme in
`internal/app/sharedscheme/sharedscheme.go` (e.g.,
`gatewayv1.AddToScheme(scheme)`).

---

## Cross-cutting work

### 7) Update operator RBAC markers and generated manifests
**Files**
- `internal/controller/codercontrolplane_controller.go` kubebuilder RBAC
comments

Add operator permissions to manage new resources:
- `serviceaccounts`
- `roles`, `rolebindings`
- `ingresses`
- (optional) `httproutes`, `gateways` (Gateway API)

Then regenerate:
- `make manifests`

---

### 8) Testing plan
**Unit/envtest**
- Extend existing controller tests (likely
`internal/controller/codercontrolplane_controller_test.go`):
  - ServiceAccount created and referenced by Deployment
  - Role/RoleBinding created in `cp.Namespace` and extra namespaces
  - TLS secretNames → volumes/mounts/env + service https port
  - Probes enabled/disabled behavior
  - Ingress created when `spec.expose.ingress` is set; deleted when unset
  - Gateway API: when configured but CRDs missing, reconcile should not hard-fail (assert on condition/event or logged behavior)

**Integration / make targets**
- `make test`
- `make test-integration` (if it exercises controller-runtime manager
behavior)

---

### 9) Generated artifacts & docs
- After API changes:
  - `make codegen`
  - `make manifests`
- Update examples in `config/samples/` to include:
  - a minimal cluster-internal install
  - an Ingress-exposed install
  - a Gateway API HTTPRoute-exposed install
- If this repo maintains API reference docs, regenerate them (per
project conventions).

---

## Validation checklist (when implementing)
1. `make test`
2. `make test-integration`
3. `make build`
4. `make lint`
5. Confirm generated manifests (`config/`) updated and committed.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking:
`xhigh` • Cost: `$8.55`_
