What's Next

The features below are planned for future Kubernaut releases. For features that shipped in v1.5, see What's New: v1.5. For features that shipped in v1.5.1 (Kubernaut Console, per-phase LLM routing, multi-provider JWT, severity triage LLM, SSE status endpoint), see What's New: v1.5.1.

v1.6: Fleet Remediation & ITSM (next)

Fleet Operations

Hub-and-spoke deployment using ACM/OCM (Red Hat Advanced Cluster Management / Open Cluster Management): policy-driven remediation across fleet-scale Kubernetes environments, taking an alert to remediation in seven steps with zero remote footprint (#54).

Remediation flow

1. Remote Prometheus forwards metrics to Thanos on the hub.
2. Alertmanager fires an alert, and the Kubernaut Engine (KE) triggers the pipeline.
3. KE obtains a JWT from Keycloak for the MCP investigation.
4. KE calls the MCP server on the target remote cluster to perform the RCA investigation.
5. KE obtains a JWT from Keycloak for remediation execution.
6. KE dispatches the remediation playbook to AWX.
7. AWX executes the fix on the target remote cluster through an ephemeral ServiceAccount.
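The seven steps above can be sketched as a sequential, fail-fast pipeline. This is a minimal illustration under stated assumptions, not Kubernaut's implementation: the `Run`, `step`, and `Token` types and the `investigate`/`remediate` scope names are hypothetical, Keycloak, MCP, and AWX are simulated in-process, and the sketch assumes each phase-scoped JWT is checked by the step that uses it.

```go
package main

import (
	"errors"
	"fmt"
)

// Token is a hypothetical stand-in for a Keycloak-issued JWT; only its scope matters here.
type Token struct{ Scope string }

// Run is hypothetical state handed from step to step (not Kubernaut's real types).
type Run struct {
	Cluster string
	Alert   string
	Trace   []string // names of steps that completed
	token   *Token
}

// step pairs a label with the action it performs.
type step struct {
	name string
	fn   func(*Run) error
}

// obtainToken simulates fetching a phase-scoped JWT from Keycloak.
func obtainToken(scope string) func(*Run) error {
	return func(r *Run) error {
		r.token = &Token{Scope: scope}
		return nil
	}
}

// requireToken fails unless the most recently obtained token has the given scope.
func requireToken(scope string) func(*Run) error {
	return func(r *Run) error {
		if r.token == nil || r.token.Scope != scope {
			return fmt.Errorf("missing %q token", scope)
		}
		return nil
	}
}

// noop stands in for steps whose work happens outside Kubernaut in this sketch.
func noop(*Run) error { return nil }

// pipeline lists the seven steps in order.
func pipeline() []step {
	return []step{
		{"remote Prometheus forwards metrics to Thanos", noop},
		{"Alertmanager fires; KE triggers pipeline", func(r *Run) error {
			if r.Alert == "" {
				return errors.New("no alert")
			}
			return nil
		}},
		{"KE obtains investigation JWT", obtainToken("investigate")},
		{"KE calls MCP on remote cluster for RCA", requireToken("investigate")},
		{"KE obtains remediation JWT", obtainToken("remediate")},
		{"KE dispatches playbook to AWX", requireToken("remediate")},
		{"AWX runs fix via ephemeral ServiceAccount", noop},
	}
}

// execute runs steps in order and stops at the first failure.
func execute(r *Run, steps []step) error {
	for i, s := range steps {
		if err := s.fn(r); err != nil {
			return fmt.Errorf("step %d (%s): %w", i+1, s.name, err)
		}
		r.Trace = append(r.Trace, s.name)
	}
	return nil
}

func main() {
	r := &Run{Cluster: "spoke-1", Alert: "KubePodCrashLooping"}
	if err := execute(r, pipeline()); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d steps completed for %s\n", len(r.Trace), r.Cluster)
}
```

Running it prints `7 steps completed for spoke-1`. Failing fast on a missing or wrongly scoped token keeps the AWX dispatch from running with investigation credentials.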

Zero persistent credentials

Remediation uses ephemeral ServiceAccounts with OCM-managed lifecycle — no long-lived secrets stored on remote clusters.

ServiceNow Incident Triage

Consume ServiceNow incidents as signals through the API Frontend, enabling Kubernaut to investigate and remediate ITSM tickets alongside Kubernetes alerts (#1338).
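A minimal sketch of normalizing a ServiceNow incident into a source-agnostic signal, assuming the incident arrives as Table API JSON. The `number`, `short_description`, and `priority` fields exist on ServiceNow's incident table; the `Signal` type, the `toSignal` function, and the priority-to-severity mapping are hypothetical and do not describe the API Frontend's actual contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Incident holds a few fields of a ServiceNow incident record.
// By default the Table API returns these values as strings, so priority is "1" (Critical) through "5" (Planning).
type Incident struct {
	Number           string `json:"number"`
	ShortDescription string `json:"short_description"`
	Priority         string `json:"priority"`
}

// Signal is a hypothetical source-agnostic shape shared with Kubernetes alerts.
type Signal struct {
	Source   string
	ID       string
	Summary  string
	Severity string
}

// severityByPriority is an assumed mapping for illustration only.
var severityByPriority = map[string]string{
	"1": "critical",
	"2": "high",
	"3": "medium",
	"4": "low",
	"5": "low",
}

// toSignal decodes an incident and maps it to a Signal; unknown priorities become "unknown".
func toSignal(raw []byte) (Signal, error) {
	var inc Incident
	if err := json.Unmarshal(raw, &inc); err != nil {
		return Signal{}, fmt.Errorf("decode incident: %w", err)
	}
	sev, ok := severityByPriority[inc.Priority]
	if !ok {
		sev = "unknown"
	}
	return Signal{Source: "servicenow", ID: inc.Number, Summary: inc.ShortDescription, Severity: sev}, nil
}

func main() {
	raw := []byte(`{"number":"INC0010001","short_description":"checkout pods crash-looping","priority":"2"}`)
	s, err := toSignal(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Normalizing at the edge lets the rest of the pipeline treat a ticket and an Alertmanager alert the same way.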

Future

Custom Agent Injection

Pluggable investigation and remediation agents via the AgenticWorkflow CRD, enabling operators to inject domain-specific automation into the Kubernaut pipeline (#1242, #883, #711).

SREs define reusable agentic workflows as declarative Goose recipes: YAML-based configurations that package instructions, MCP extensions, and parameters into shareable, reproducible agent behaviors. Kubernaut runs these recipes through the Goose runtime at three pipeline injection points, where they call external MCP tools. Each injection point accepts multiple stacked recipes.

Declarative Recipes — 3 pipeline injection points

Injection 1: Pre-Investigation (Kubernaut Agent)

Context injected into the LLM prompt before analysis begins.

Example: check-maintenance-window calls a CMDB MCP server to check whether the resource is in a maintenance window or has been deployed recently. The result is injected into the investigation context before the LLM starts. If the resource is under maintenance, alerting is skipped and the RCA is annotated as expected downtime.

Injection 2: Pre-Workflow Selection (Kubernaut Agent)

Constraints injected to bias workflow choice.

Example: enforce-cost-guardrails calls a Cost/Resource MCP server to retrieve the namespace's budget utilization and scaling limits. It returns constraints such as "do not select scale-up workflows", nudging the LLM toward restart or rollback rather than resource-intensive remediations.

Injection 3: EM Direct Execution (via Goose)

The recipe runs through the Kubernaut Agent endpoint at effectiveness-assessment time.

Example: verify-business-slo calls an SLO/Business Metrics MCP server to check p95 latency, error rate, and order throughput against the SLO budget. It returns a structured pass/fail verdict with business impact data, replacing the default Kubernetes health check with SRE-defined assessment SOPs.

Shipped in v1.5.1

Kubernaut Console — Web UI for interactive investigation and remediation. See What's New: v1.5.1.

Subject to change

Features listed here are planned but may change. See the Kubernaut milestones for the latest status. For the full roadmap including Collective Intelligence and Operational Expansion (cost, security, non-K8s), see ROADMAP.md.