What's New

This page summarises the notable changes in each Kubernaut release. Kubernaut does not support in-place upgrades — each release is a fresh install. Review the changes below to understand what differs from the version you are currently running.

v1.5.1

Kubernaut Console

Kubernaut v1.5.1 introduces the Kubernaut Console, a web UI for interactive investigation and remediation. Operators can chat with the Kubernaut Agent in real time, view live RCA progress, approve remediation actions, and inspect audit trails from a single pane of glass.

Key capabilities:

A2A chat interface — interactive investigation via POST /a2a/invoke with real-time SSE streaming of agent reasoning, tool calls, and investigation events
Thinking panel — live visualization of the agent's reasoning with collapsible sections for reasoning, tool_call, and investigation events
RCA cards — structured root cause analysis display with causal chain, confidence score, severity, and tool call count
Workflow selection — recommended remediation workflows with countdown confirmation and alignment verdicts
Approval gate — approve or decline a RemediationApprovalRequest via kubernaut_approve on the MCP bridge
Escalation input — inline escalation with reason capture via kubernaut_complete_no_action with escalation_reason
Verification timer — live stabilization window countdown tracking stabilization_elapsed, spec_hash_computed, alert_check, and health_check steps
Phase indicator — real-time lifecycle banner (Investigating, Decision, Remediation, Verifying, Complete) with elapsed timer
Real-time status streaming — separate SSE subscription to POST /a2a/status for RR phase changes with automatic reconnection

The Console deploys as a single pod with two containers: an oauth2-proxy sidecar (OIDC authentication, port 4180) and an nginx container (port 8080) that serves the SPA and proxies API calls to the API Frontend. On OpenShift, a TLS-terminated Route is created automatically.

Deployment

Prerequisites: Kubernetes 1.28+ or OpenShift 4.14+, a deployed Kubernaut API Frontend, and an OIDC provider (for example, Keycloak or Dex).

```shell
# 1. Create the OIDC secret
# oauth2-proxy needs a cookie secret that decodes to 16, 24, or 32 bytes.
# Converting to URL-safe base64 (no "+" or "/") lets the 32-byte value decode.
kubectl create secret generic kubernaut-console-oidc \
  --namespace kubernaut-system \
  --from-literal=client-id=kubernaut-console \
  --from-literal=client-secret='<YOUR_CLIENT_SECRET>' \
  --from-literal=cookie-secret="$(openssl rand -base64 32 | tr -- '+/' '-_')"

# 2. Install the chart
helm install kubernaut-console ./chart \
  --namespace kubernaut-system \
  --set auth.issuerUrl=https://your-keycloak/realms/kubernaut \
  --set auth.clientId=kubernaut-console \
  --set apiFrontend.url=http://apifrontend-service.kubernaut-system.svc:8443
```

When deploying via the Kubernaut Operator, use spec.console instead (see ConsoleSpec).

Helm values

Value	Default	Description
`image.repository`	`ghcr.io/jordigilh/kubernaut-console`	Container image
`image.tag`	`latest`	Image version (pin by digest for production)
`apiFrontend.url`	`http://apifrontend.kubernaut-system.svc:8443`	API Frontend service URL
`auth.provider`	`oidc`	OAuth2 Proxy provider
`auth.issuerUrl`	—	OIDC issuer URL
`auth.clientId`	`kubernaut-console`	OIDC client ID
`auth.existingSecret`	`kubernaut-console-oidc`	Secret with keys: `client-id`, `client-secret`, `cookie-secret`
`auth.skipTlsVerify`	`true`	Skip TLS certificate verification (development only; must be `false` in production)
`auth.redirectUrl`	—	OAuth2 callback URL
`service.type`	`ClusterIP`	Service type
`service.port`	`4180`	Service port (OAuth2 Proxy)
`route.enabled`	`true`	Create OpenShift Route
`route.host`	auto-derived	Custom route hostname
`route.tls.termination`	`edge`	TLS termination mode

Nginx proxy routes

Location	Target	Timeout	Notes
`/a2a/`	API Frontend	3600s	SSE streaming, buffering disabled
`/mcp`	API Frontend	30s	JSON-RPC tool calls
`/.well-known/`	API Frontend	default	Agent card discovery
`/healthz`	local 200	—	Liveness/readiness probe
`/`	static files	—	SPA fallback to `index.html`

Troubleshooting

Symptom	Cause	Fix
502 on `/a2a/`	AF not reachable	Check AF service DNS and port
OIDC redirect loop	Incorrect redirect URI	Verify Keycloak/Dex client config matches `auth.redirectUrl`
SSE disconnects	Proxy timeout too low	Ensure 3600s timeouts on SSE route
Stale UI after deploy	Image pull policy `IfNotPresent`	Use `Always` or pin by digest

See the Kubernaut Console repository for full architecture, Kind demo deployment, and development setup.

Interactive GitOps remediation demo

A video demonstrating GitOps drift remediation in interactive mode has been added to the Kubernaut README. The demo shows the full journey: a bad commit breaks a ConfigMap in a GitOps-managed production namespace, Kubernaut traces the pod crash to the ConfigMap root cause, selects git revert over kubectl rollback because the environment is GitOps-managed, pauses for human approval (production namespace), and executes the fix via ArgoCD sync.

Per-phase LLM routing (`phaseModels`)

Configure different LLM models for each phase of the investigation pipeline via the phaseModels map in the kubernaut-agent-llm-runtime ConfigMap:

Phase key	Description
`rca`	Root-cause analysis loop (K8s + Prometheus tools)
`workflow_discovery`	Workflow selection and discovery
`validation`	Post-selection validation

Override fields per phase: provider, endpoint, model, apiKey, plus cloud-specific fields (azureApiVersion, vertexProject, vertexLocation, bedrockRegion). Non-empty fields override the base config; temperature, maxRetries, and timeoutSeconds are always inherited. Hot-reloadable via FileWatcher — no pod restart needed.

Configuration paths: operator CR (spec.kubernautAgent.llm.phaseModels) or direct ConfigMap patch. The Helm chart does not yet expose phaseModels as a value key.

See Kubernaut Agent Config: phaseModels for the full reference.
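The precedence rule above can be sketched in shell. This is a hypothetical illustration, not Kubernaut code. The variable names and model values are placeholders. Only the rule itself comes from this page: a non-empty phase value wins, and temperature, maxRetries, and timeoutSeconds are always inherited.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-phase override precedence (not Kubernaut source).

# resolve_field BASE OVERRIDE
# Prints OVERRIDE when it is non-empty, otherwise falls back to BASE.
resolve_field() {
  local base="$1" override="$2"
  printf '%s\n' "${override:-$base}"
}

# Placeholder base config
base_provider="vertex_ai"
base_model="base-model"
base_temperature="0.2"

# Placeholder "rca" phase entry: only the model is overridden
rca_provider=""
rca_model="rca-model"

echo "rca provider:    $(resolve_field "$base_provider" "$rca_provider")"
echo "rca model:       $(resolve_field "$base_model" "$rca_model")"
# temperature, maxRetries and timeoutSeconds are never taken from the phase entry
echo "rca temperature: $base_temperature"
```

Running it prints `vertex_ai` for the provider (inherited), `rca-model` for the model (overridden), and `0.2` for the temperature (always inherited).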

Severity triage LLM configuration

The severity triage pipeline can now use a dedicated LLM instead of sharing the agent's LLM. Configure via a ConfigMap overlay under severityTriage.llm — supports provider (vertex_ai, gemini, anthropic), model, endpoint, apiKeyFile, timeoutSeconds, oauth2, circuitBreaker, and customHeaders.

The Helm chart exposes only two severity triage values: cacheTTLSeconds (default 30) and llmConfidence (default 0.7). The full LLMConfig block requires a ConfigMap overlay. The operator auto-derives severity triage enablement from spec.monitoring.enabled.

See Configuration: Severity Triage for the full reference.

Multi-provider JWT authentication

Both the Kubernaut Agent and API Frontend now support multiple JWT providers via a jwtProviders[] array, enabling dual-provider configurations (e.g., Keycloak + SPIRE). The two services use intentionally different schemas:

Field	Kubernaut Agent	API Frontend
Issuer	`issuer` (string)	`issuerURL` (string)
Audience	`audience` (singular string)	`audiences` (string array)
JWKS URL	`jwksURL` (required)	`jwksURL` (optional, falls back to issuerURL)
Claim mappings	Simple claim names, dot-notation	CEL expressions or claim paths

ClaimMappingsSpec supports username and groups fields. Legacy single-provider fields (issuerURL + audience) remain supported for backward compatibility.

See Configuration: JWT Providers for the full reference.

MCP tool surface: 21 to 23

Two tools join the public MCP bridge:

Tool	Condition	Purpose
`kubernaut_list_alerts`	Registered when `severityTriage.enabled: true` and Prometheus is configured	Query firing alerts with `namespace`, `severity`, `state` filters
`kubernaut_complete_no_action`	Always registered	Complete an investigation with no remediation — dismiss or escalate to operator

When interactive.enabled: false, 11 session-dependent tools are hidden (up from 10), leaving 11 stateless tools on MCP (12 when kubernaut_list_alerts is registered because Prometheus is configured).

See MCP Tool Reference for the updated tool list.
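The sketch below prints the JSON-RPC 2.0 body an MCP client sends to call kubernaut_list_alerts with all three filters set. It uses the standard MCP `tools/call` request shape. The filter values are placeholders. In practice an MCP client library performs the `initialize` handshake and session handling before any `tools/call` request.

```shell
#!/usr/bin/env bash
# list_alerts_request NAMESPACE SEVERITY STATE
# Prints an MCP tools/call request body for kubernaut_list_alerts.
# Values are not JSON-escaped; pass plain identifiers only.
list_alerts_request() {
  local namespace="$1" severity="$2" state="$3"
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"kubernaut_list_alerts","arguments":{"namespace":"%s","severity":"%s","state":"%s"}}}\n' \
    "$namespace" "$severity" "$state"
}

# Placeholder filter values
list_alerts_request payments critical firing
```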

Breaking: `kubernaut_approve` removed from A2A agent (DD-AF-006)

kubernaut_approve is structurally absent from the A2A agent's buildToolList(). It remains available on the MCP bridge for the Kubernaut Console's Approve/Reject buttons. This prevents an LLM from autonomously approving RemediationApprovalRequests via prompt injection, preserving the human consent gate that RARs exist to enforce.

Defense-in-depth: (1) tool absent from buildToolList(), (2) explicit prompt instruction, (3) SAR RBAC on the MCP path, (4) audit trail attributing every approval to the human user.

`POST /a2a/status` SSE endpoint (DD-AF-008)

New endpoint for real-time remediation status streaming. Clients subscribe to phase transitions for a specific RemediationRequest.

Request: JSON-RPC 2.0 body with method status/subscribe and params.rr_id
Events: status/update (phase, timestamp, final, metadata) and status/closing (reason, reconnect)
Heartbeat: 15-second keepalive
Auth: same OIDC chain as /mcp and /a2a/invoke

See API Frontend API: Status SSE for the full reference.
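Below is a minimal client sketch in shell. It makes three assumptions: the API Frontend accepts the OIDC access token as a Bearer token, `rr_id` takes the RemediationRequest name, and the request id value does not matter. The function names are illustrative.

```shell
#!/usr/bin/env bash
# status_subscribe_body RR_ID
# Prints the JSON-RPC 2.0 request body for status/subscribe.
# RR_ID is not JSON-escaped; this assumes it is a Kubernetes object name.
status_subscribe_body() {
  local rr_id="$1"
  printf '{"jsonrpc":"2.0","id":1,"method":"status/subscribe","params":{"rr_id":"%s"}}\n' "$rr_id"
}

# subscribe_status AF_URL RR_ID TOKEN
# Opens the SSE stream and prints events as they arrive (--no-buffer).
# Sending the token as a Bearer header is an assumption.
subscribe_status() {
  local af_url="$1" rr_id="$2" token="$3"
  curl --silent --no-buffer \
    -X POST "${af_url}/a2a/status" \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -H "Accept: text/event-stream" \
    --data "$(status_subscribe_body "$rr_id")"
}
```

For example, `subscribe_status https://<af-host> <rr-name> "$TOKEN"` prints `status/update` events until the server sends `status/closing`. If that event's `reconnect` field is set, call the function again to resume.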

CRD changes

HumanReviewReason enum: added operator_escalation — triggered by kubernaut_complete_no_action with escalation_reason
SubReason enum: added OperatorEscalation
AIAnalysis reasons: added InteractiveCancelled and ParentCancelled

See CRD Reference for the updated enum tables.

Operator CR updates

The Kubernaut Operator CRD now includes:

JWTProviderSpec at spec.kubernautAgent.interactive.jwtProviders[] and spec.apiFrontend.auth.jwtProviders[] — name, issuerURL, jwksURL, audiences, claimMappings (username, groups)
phaseModels at spec.kubernautAgent.llm.phaseModels — per-phase LLM override map with CEL validation for keys (rca, workflow_discovery, validation)
ConsoleSpec at spec.console — enabled, auth.secretName, route.enabled, route.host, resources

See Operator CR Reference for the full schema.

Platform hardening

Cascade terminal phase — When a RemediationRequest enters a terminal phase, all child resources (AIAnalysis, SignalProcessing, WorkflowExecution) are patched to PhaseFailed. Idempotent, non-fatal.
alignment_verdict audit event — Emitted after every investigation with structured payload: result (aligned/suspicious), circuit_breaker_activated, summary, findings[], and optional grounding_review

v1.5

API Frontend — new service

Kubernaut v1.5 introduces the API Frontend (AF), the 11th microservice. It acts as the unified external protocol layer for MCP, A2A, and REST clients — replacing direct access to internal services with a single authenticated entry point.

MCP gateway — Exposes investigation, workflow discovery, and remediation tools via the Model Context Protocol
A2A support — Agent-to-Agent protocol with agent card discovery at /.well-known/agent-card.json
SSE streaming — Real-time investigation output streamed token-by-token via Server-Sent Events
SAR authorization — Kubernetes-native SubjectAccessReview tool authorization with 6 per-persona ClusterRoles, fail-closed, and TTL-cached results
MCP bridge — Dispatches 21 kubernaut_* MCP tools to their backends (K8s API, KA MCP, DataStorage) with per-tool RBAC, rate limiting, and audit. It is not a transparent proxy: each tool has its own handler and routing

See API Frontend Architecture for the full design, and Configuration: API Frontend for Helm values.

Interactive MCP sessions

Operators and AI agents can now connect to Kubernaut via MCP for interactive investigation and remediation. This is the flagship v1.5 feature, replacing the autonomous-only pipeline with an operator-in-the-loop model when desired.

The API Frontend exposes 21 kubernaut_* MCP tools on its MCP endpoint (POST /mcp), organized by domain:

Domain	Tools
Investigation & session lifecycle	`kubernaut_investigate`, `kubernaut_message`, `kubernaut_complete`, `kubernaut_cancel`, `kubernaut_status`, `kubernaut_reconnect`
CRD operations	`kubernaut_list_remediations`, `kubernaut_get_remediation`, `kubernaut_approve`, `kubernaut_cancel_remediation`, `kubernaut_watch`, `kubernaut_list_approval_requests`, `kubernaut_get_approval_request`, `kubernaut_await_session`
Workflow	`kubernaut_discover_workflows`, `kubernaut_select_workflow`
Data & history	`kubernaut_list_workflows`, `kubernaut_get_remediation_history`, `kubernaut_get_effectiveness`, `kubernaut_get_audit_trail`
Presentation	`kubernaut_present_decision`

The Kubernaut Agent also runs a separate MCP server (/api/v1/mcp) with 3 tools (kubernaut_investigate, kubernaut_select_workflow, kubernaut_complete_no_action) for direct client connections with Lease-based session management.

See Interactive Sessions for the operator guide and API Frontend API for the full tool reference.

Interactive workflow discovery with LLM-populated parameters

The discover_workflows action on kubernaut_investigate returns workflow alternatives with parameters pre-populated by the LLM based on the root cause analysis (PR #1171, PR #1188). Operators review and edit parameters before confirming execution via kubernaut_select_workflow.

Parameter safety is enforced through comprehensive validation with LLM self-correction (PR #1187): type checking against the workflow's declared parameter schema, regex pattern matching, required field enforcement, and automatic retry when the LLM provides invalid values.

See Workflow Authoring: Parameters for the parameter schema reference.

Breaking: SAR-based tool authorization replaces file-based RBAC

The API Frontend's static rbac_roles.yaml ConfigMap has been replaced by Kubernetes-native SubjectAccessReview (SAR) authorization at tools/call time (PR #1222). tools/list remains unfiltered (ADR-020).

Migration required: customers who customized rbac_roles.yaml must create equivalent ClusterRoleBinding resources. The Helm chart ships 6 per-persona ClusterRoles:

ClusterRole	Persona	MCP Tools
`kubernaut-tool-sre`	Investigation + remediation (no approval)	20
`kubernaut-tool-ai-orchestrator`	Automated agent orchestration	15
`kubernaut-tool-cicd`	CI/CD pipeline integration	3
`kubernaut-tool-observability`	Read-only observability	5
`kubernaut-tool-l3-audit`	Compliance and auditing	6
`kubernaut-tool-remediation-approver`	Human approval workflows	4

SAR uses verb use on resource tools in apiGroup kubernaut.ai. Authorization is fail-closed: SAR API errors deny the tool call. Results are cached with a configurable TTL (default 30s via apifrontend.config.rbac.sarCacheTTL).

See Security & RBAC: Tool Authorization for the full model and binding examples.
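Below is a minimal sketch that binds a persona ClusterRole and spot-checks access using standard kubectl commands. The binding name, group, and user are placeholders. The group must match what your JWT claim mappings produce. Checking the tool name as the SAR `resourceName` is an assumption, so confirm the exact SAR attributes in Security & RBAC: Tool Authorization. Setting `KUBECTL=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# bind_persona BINDING_NAME CLUSTERROLE GROUP
# Grants one of the shipped per-persona ClusterRoles to an identity-provider group.
bind_persona() {
  local name="$1" role="$2" group="$3"
  "$KUBECTL" create clusterrolebinding "$name" \
    --clusterrole="$role" \
    --group="$group"
}

# check_tool USER GROUP TOOL
# Asks the API server whether USER (as a member of GROUP) may "use" TOOL
# in the kubernaut.ai API group. Using TOOL as the resourceName is an assumption.
check_tool() {
  local user="$1" group="$2" tool="$3"
  "$KUBECTL" auth can-i use "tools.kubernaut.ai/${tool}" --as="$user" --as-group="$group"
}
```

For example, run `bind_persona sre-tools kubernaut-tool-sre sre-team`, then `check_tool alice sre-team kubernaut_investigate`.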

Unified ServiceAccount model (ADR-022)

All AF Kubernetes API calls use the AF pod's own ServiceAccount — there is no per-user impersonation or token forwarding. Application-level authorization is enforced entirely through SAR-based tool gating; user attribution is preserved in the application audit trail (tool.executed events with UserID, actor_ip).

See Security & RBAC: Unified SA model for accepted risks and mitigations.

Generic cluster context tools replace narrow triage tools

The 4 narrow AF triage tools (af_get_pods, af_get_workloads, af_list_events, af_resolve_owner) have been replaced with 3 generic internal tools that can inspect any namespaced Kubernetes resource (#1230):

New Tool	Replaces	Purpose
`kubectl_get`	`af_get_pods`, `af_get_workloads` (single)	Get any namespaced resource by kind/name/namespace
`kubectl_list`	`af_get_pods`, `af_get_workloads` (list)	List any namespaced resources with optional label selector
`kubectl_list_events`	`af_list_events`	List events with reason/object filters (renamed for consistency)

af_resolve_owner is removed — KA independently resolves the owner chain during RCA. The new tools use RESTMapper for dynamic kind-to-GVR resolution. Secret .data fields are redacted before returning to the LLM.

All internal tools (kubectl_*, kubernaut_check_existing_remediation, kubernaut_remediate) run inside the AF's A2A agent loop and are not exposed on the MCP bridge. They are still SAR-gated via newRBACGuard() and included in the per-persona ClusterRoles. The external MCP surface consists of the 21 kubernaut_* MCP tools listed above, spanning investigation and session lifecycle, CRD operations, workflow, data and history, and presentation.

Session takeover security (SEC-TAKEOVER-001)

When a second user connects to an active MCP session, the original user's investigation is abandoned, not completed. This prevents a takeover from inheriting or completing work under a different identity. The abandoned session is logged as an audit event.

DataStorage advanced configuration (v1.5)

Three new configuration sub-blocks for DataStorage:

server — maxBodySize (5 MiB default), corsAllowedOrigins for browser-based access, signerCertDir for audit event signing
redis — dlqMaxLen (10,000), TLS configuration for Redis/Valkey connections
retention — Automatic data retention cleanup with configurable interval (24h), batchSize (1,000), and defaultDays (2,555 ≈ 7 years)

See Configuration: DataStorage for all parameters.

OLM-first disconnected installation

The disconnected installation guide has been rewritten with the Operator (OLM) path as the primary method. The Helm chart path is retained as a development/testing appendix.

The OLM flow uses oc-mirror v2 with an upstream digest-pinned ImageSetConfiguration from the operator repository, producing IDMS and CatalogSource resources automatically.

Lease RBAC for session management

The Kubernaut Agent ServiceAccount now requires list permission on coordination.k8s.io/leases (in addition to the existing create/get/update/delete) for orphaned session reclamation at startup. The Helm chart and Operator both provision this automatically.
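If you manage RBAC yourself, you can confirm the agent's Lease permissions with `kubectl auth can-i` while impersonating the ServiceAccount. The sketch assumes the Leases live in the agent's own namespace. This page does not name the ServiceAccount, so pass your actual namespace and ServiceAccount name. Setting `KUBECTL=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# check_lease_rbac NAMESPACE SERVICEACCOUNT
# For each Lease verb the agent needs, prints "verb: " followed by
# kubectl's yes/no answer.
check_lease_rbac() {
  local ns="$1" sa="$2" verb
  for verb in create get update delete list; do
    printf '%s: ' "$verb"
    "$KUBECTL" auth can-i "$verb" leases.coordination.k8s.io \
      --namespace="$ns" --as="system:serviceaccount:${ns}:${sa}"
  done
}
```

For example, `check_lease_rbac kubernaut-system <agent-serviceaccount>` should answer `yes` for all five verbs.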

Platform hardening

SessionDrainer (BR-OPS-013) — Active MCP sessions are drained before KA pod termination during rolling updates
Race-safe session transitions — Mutex-protected session state machine prevents concurrent state corruption

v1.4

Prompt injection defense — Shadow Agent

Kubernaut v1.4 introduces a fail-closed shadow agent that evaluates every LLM tool output for prompt injection. Two evaluation layers provide defense-in-depth:

Per-step scanning with random boundary markers and data exfiltration detection
Full-context grounding review at the RCA-to-workflow boundary that detects distributed "boiling frog" injection attacks

Enforcement modes (monitor or enforce) control whether suspicious content is logged or triggers a circuit breaker that cancels the investigation. See Security & RBAC: Shadow Agent for details.
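Random boundary markers are a general technique for isolating untrusted content. Tool output is wrapped between delimiter lines that embed a fresh random token, so injected text cannot predict the closing delimiter and cannot fake the end of the untrusted block. The shell sketch below only illustrates that idea. The marker format and function names are hypothetical and do not describe how the shadow agent is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of random boundary markers (not Kubernaut's code).

# new_marker: prints 32 random lowercase hex characters (16 random bytes)
new_marker() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# wrap_untrusted MARKER: copies stdin between begin/end lines that embed MARKER
wrap_untrusted() {
  local marker="$1"
  printf 'BEGIN_UNTRUSTED_%s\n' "$marker"
  cat
  printf 'END_UNTRUSTED_%s\n' "$marker"
}

# A fresh marker per tool call means injected text cannot know the closing line.
marker="$(new_marker)"
printf 'ignore previous instructions\n' | wrap_untrusted "$marker"
```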

Operator workflow overrides

Operators can now override the AI-selected workflow when approving a RemediationApprovalRequest. The authwebhook validates that the override workflow exists and is active; the orchestrator merges the override with full audit trail. See Human Approval: Overrides.

PagerDuty and Microsoft Teams notifications

Two new delivery channels join Slack:

PagerDuty — Events API v2 delivery with circuit breaker and CredentialRef config pattern
Microsoft Teams — Adaptive Card delivery with circuit breaker

All delivery channels now share a generic circuit breaker pattern. See Notification Channels.

NetworkPolicies

12 NetworkPolicy templates with default-deny ingress posture are deployed for all Kubernaut services. Configurable CIDRs and per-service toggles via networkPolicies.<service>.enabled. See Security & RBAC: NetworkPolicies.

Breaking: Kubernaut Agent config restructured

The Kubernaut Agent configuration has three breaking changes:

camelCase migration (#908) — All YAML config fields migrated from snake_case to camelCase
Three-domain layout — Config reorganized into runtime, ai, and integrations top-level domains
Config split (#916) — Static ConfigMap (mounted at startup) and hot-reloadable ConfigMap (watched at runtime)

See Kubernaut Agent SDK Config for the updated reference.
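When you port a customized snake_case config, a mechanical key rename covers only the first of the three changes. The bash 4+ helper below is a hypothetical sketch. It does not perform the three-domain relocation or the ConfigMap split. Keys that contain acronyms may not follow the mechanical rule, so check each result against Kubernaut Agent SDK Config.

```shell
#!/usr/bin/env bash
# to_camel KEY: converts a snake_case key to camelCase (requires bash 4+ for ${var^})
to_camel() {
  local IFS=_ parts out part
  read -ra parts <<< "$1"          # split on underscores
  out="${parts[0]}"                # first segment stays lowercase
  for part in "${parts[@]:1}"; do
    out+="${part^}"                # capitalize the first letter of each later segment
  done
  printf '%s\n' "$out"
}

for key in max_retries timeout_seconds api_key; do
  printf '%s -> %s\n' "$key" "$(to_camel "$key")"
done
```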

Parallel tool execution

The investigation pipeline now executes multiple LLM tool calls concurrently when the model returns batched requests. The investigation prompt also instructs the LLM to batch independent tool calls for reduced round-trips.

Platform hardening

Inconclusive outcome exponential backoff (#1091) — Inconclusive outcomes trigger exponential backoff (1m → 10m cap) and 3-strikes blocking, preventing RR flood for persistent alerts
SA token refresh (#1055) — Custom token path constructor with 401 cache invalidation for Kubernaut Agent
CRD-aware engine registration (#868) — Engine registration validates CRD availability; enters degraded status when required CRDs are missing
Session hardening (#1078) — Panic recovery, two-tier TTL eviction, 25-minute wall-clock investigation timeout
Gateway security hardening (#673) — 256KB body limits, generic RFC 7807 errors, header stripping, RBAC least-privilege, trusted proxy middleware
Unified monitoring config (#463) — Prometheus and AlertManager configuration unified into a single monitoring block
Standardized log levels (#875) — Log level configuration standardized across all services
Verdict label rename (#1077) — VerdictClean changed from "clean" to "aligned". Breaking: update Prometheus queries
Audit event batching fix (#1056) — Audit 401/403 errors reclassified as retryable; token source extracted for shared cache across all callers
API version validation gate (#1044) — Detects when the LLM omits api_version for ambiguous Kubernetes Kinds (e.g., Event in both v1 and events.k8s.io/v1), retries with a correction listing all conflicting API groups, and escalates to human review on exhaustion to prevent incorrect RBAC grants
CRD TTL enforcement (#265) — Terminal RemediationRequest resources are garbage-collected after 24h (configurable via retention.period), preventing CRD accumulation in high-volume clusters
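The inconclusive-outcome backoff can be pictured as a capped exponential schedule. This page states only the endpoints (1m and the 10m cap). The sketch below assumes the delay doubles on each consecutive inconclusive outcome, and it does not model the 3-strikes block.

```shell
#!/usr/bin/env bash
# backoff_seconds ATTEMPT: delay after the Nth consecutive inconclusive outcome.
# Assumes doubling from 60s with a 600s cap (the factor is an assumption).
backoff_seconds() {
  local attempt="$1" delay=60 cap=600 i
  for (( i = 1; i < attempt; i++ )); do
    delay=$(( delay * 2 ))
    if (( delay >= cap )); then
      delay=$cap
      break
    fi
  done
  printf '%s\n' "$delay"
}

for attempt in 1 2 3 4 5 6; do
  printf 'attempt %d: %ds\n' "$attempt" "$(backoff_seconds "$attempt")"
done
```

Under that assumption the schedule is 60s, 120s, 240s, 480s, and then 600s for every later attempt.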

Dry-run mode

When dryRun is enabled, the pipeline stops after AI analysis — no WorkflowExecution, RAR, or EA CRDs are created. The RemediationRequest completes with outcome DryRun.

Kubernaut Operator

The Kubernaut Operator — introduced in v1.3 — is the recommended deployment method for OpenShift. v1.4 adds:

OLM lifecycle management — Install, upgrade, and uninstall via Operator Lifecycle Manager with automatic CRD installation and cleanup
Supply chain security — Container images ship with SBOM, Cosign signatures, and SLSA provenance attestations
postgresql.sslMode — Configurable SSL mode for PostgreSQL connections (disable, require, verify-ca, verify-full)
notification.routing BYO — Bring-your-own routing ConfigMap with hot-reload support
runtimeConfigMapName — Separate hot-reloadable ConfigMap for Kubernaut Agent runtime configuration
Init image mirroring — RELATED_IMAGE_* environment variables for disconnected/air-gapped installs

See the Operator installation guide for deployment instructions.

Deprecated: OCP-specific Helm chart

The OCP-specific Helm chart is deprecated (#848). Use the unified kubernaut chart with the Kubernaut Operator for OpenShift deployments.

Removed: Conversation API

Conversational mode for Kubernaut Agent (#592) has been removed from v1.4 and deferred to v1.5 as part of the interactive session model.

v1.3

Kubernaut Agent (formerly HolmesGPT)

The LLM integration component has been renamed from HolmesGPT / HAPI to Kubernaut Agent across all services, Helm values, ConfigMaps, and documentation.

Before (v1.2)	After (v1.3)
`holmesgptApi.*` Helm values	`kubernautAgent.*`
`holmesgpt-sdk-config` ConfigMap	`kubernaut-agent-sdk-config`
`holmesgpt-config` ConfigMap	`kubernaut-agent-config`

Two-invocation investigation architecture

The investigation pipeline has been redesigned from a single three-phase LLM session (v1.1/v1.2) into two independent LLM invocations:

Invocation 1 — Root Cause Analysis: A full tool-access session that performs live Kubernetes inspection and produces a structured RCA result.
Invocation 2 — Workflow Selection: A separate session with no memory of Invocation 1, receiving only structured context fields. Selects a workflow or reports that none is applicable.

This separation improves reliability and makes each invocation independently testable.

mTLS and three-port model

All inter-service communication can now be secured with mutual TLS. API-serving components (Gateway, DataStorage, Kubernaut Agent, AIAnalysis) expose three ports:

HTTPS serving port — mTLS-protected API traffic
Health port — plaintext liveness/readiness probes
Metrics port — plaintext Prometheus scrape target

Certificate rotation is handled automatically when tls.mode: hook is set, or delegated to cert-manager. See Monitoring for port details and probe configuration.

SDK config hot-reload

The Kubernaut Agent SDK config (LLM model, endpoint, API key, toolset settings) now supports hot-reload via fsnotify. Active investigations pin a config snapshot at session start, so in-flight work is unaffected. Provider-level settings (llm.provider, OAuth2 credentials) still require a pod restart.

Expanded LLM provider support

The Kubernaut Agent now supports Vertex AI, OpenAI, Anthropic, Bedrock, Ollama, and additional providers via LangChainGo.

Custom HTTP headers for LLM endpoints

Users can now inject custom HTTP headers into outbound LLM API requests. This supports LLM proxies, API gateways, and corporate firewalls that require additional authentication headers. Three value sources are available: static values, Kubernetes Secret references (via environment variables), and file paths (for rotating tokens).

Effectiveness Monitor improvements

maxConcurrentReconciles for parallel EA processing
Configurable connectionTimeout, prometheusLookback, and scrapeInterval
Clarified stabilization window semantics (EM-internal vs RO-configured EA.spec)

See Effectiveness for configuration details.

Notification coverage

Block reasons and terminal failure states now produce notifications (BR-ORCH-036), closing gaps where operators were not informed of remediation failures.

Prometheus metric rename

Kubernaut Agent metrics have been renamed from the legacy holmesgpt_* namespace to aiagent_api_*. Update any Prometheus queries, alerting rules, or dashboards that reference the old metric names. See Monitoring for the current metric reference.
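Before you update dashboards and rules, it can help to find every remaining reference to the legacy names. The grep-based sketch below searches a directory, such as a checkout of your alerting rules and dashboards. It looks for the old `holmesgpt_` metric prefix and for the pre-v1.3 Helm value and ConfigMap names from the rename table. Map each match to its replacement using the Monitoring reference; the sketch does not rewrite anything.

```shell
#!/usr/bin/env bash
# find_legacy_refs DIR
# Prints file:line:text for every match of the pre-v1.3 names:
# holmesgpt_* metrics, the holmesgptApi Helm values key, and the
# holmesgpt-config / holmesgpt-sdk-config ConfigMap names.
# grep exits with status 1 when nothing is found.
find_legacy_refs() {
  grep -rnE 'holmesgpt_|holmesgptApi|holmesgpt-(sdk-)?config' "$1"
}
```

For example, `find_legacy_refs ./monitoring` lists every line in that directory that still needs updating.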

New notification types

v1.3 introduces additional notification types that may require routing configuration updates:

Escalation notifications for trust-ladder escalation events
StatusUpdate notifications for transient block conditions
ManualReview notifications now split by review-source for finer routing control

Data persistence

Comprehensive schema documentation rewritten from the live v1.3 database, including enrichment tables, metric baselines, and updated entity-relationship diagrams.

Feature enrichments and metrics

New documentation for feature enrichment pipeline stages and the notification metrics design decision (DD-METRICS-001).

v1.2

Per-workflow ServiceAccount and RBAC

Each workflow execution now runs under its own ServiceAccount with a dedicated TokenRequest. This replaces the shared SA model from v1.1 and provides fine-grained RBAC isolation per remediation workflow.

Declarative workflow catalog

The workflow catalog has moved from OCI-containerized workflow bundles to declarative RemediationWorkflow CRDs with category and label-based matching plus confidence scoring.

Effectiveness and notification pipeline

Updated effectiveness assessment configuration, notification routing semantics, and EM config key alignment.

DataStorage, audit, and monitoring

Updated data access patterns, audit event documentation, and monitoring metric names.

Signal Processing and Gateway

Rego policy entrypoint corrections, gateway label contract updates, and investigation tier-1 semantics fixes.

v1.1

Initial documented release of Kubernaut.

CRD-based microservices architecture with the full six-stage remediation pipeline
Prometheus AlertManager and Kubernetes Event ingestion
LLM-powered root cause analysis with Kubernetes inspection tools
Remediation execution via Kubernetes Jobs, Tekton Pipelines, or Ansible (AWX/AAP)
Effectiveness assessment with four-dimensional scoring
Human approval gates via RemediationApprovalRequest CRDs
Rego-based policy evaluation for signal processing and approval
Multichannel notifications (Slack, console, log, file)
Full audit trail with 7-year retention and CRD reconstruction
ActionType and RemediationWorkflow CRD registration via Auth Webhook
Alert decay detection (DD-EM-003)
Resource lock persistence with deterministic naming (DD-WE-003)

v1.0

End-of-life. No longer documented or supported.
