Custom Resources (CRDs)¶
Kubernaut API reference for all Custom Resource Definitions.
API Group: kubernaut.ai/v1alpha1
AIAnalysis¶
AIAnalysis is the Schema for the aianalyses API.
| Field | Type | Description |
|---|---|---|
| apiVersion | string | kubernaut.ai/v1alpha1 |
| kind | string | AIAnalysis |
| metadata | ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
| spec | AIAnalysisSpec | |
| status | AIAnalysisStatus | |
AIAnalysisReason¶
Underlying type: string
AIAnalysisReason represents the umbrella failure or completion reason.
Appears in: - AIAnalysisStatus
Validation: - Enum: [AnalysisCompleted WorkflowResolutionFailed WorkflowNotNeeded NoWorkflowSelected RegoEvaluationError TransientError APIError]
| Value | Description |
|---|---|
| AnalysisCompleted | |
| WorkflowResolutionFailed | |
| WorkflowNotNeeded | |
| NoWorkflowSelected | |
| RegoEvaluationError | |
| TransientError | |
| APIError | |
AIAnalysisSpec¶
AIAnalysisSpec defines the desired state of AIAnalysis.
Spec immutability: AIAnalysis represents an immutable event (an AI investigation). Once created by the RemediationOrchestrator, the spec cannot be modified. This ensures:
- Audit trail integrity (the AI investigation matches the original RCA request)
- No tampering with RCA targets after HAPI validation
- No modification of the workflow selection after the AI recommendation

To re-analyze, delete and recreate the AIAnalysis resource.
Appears in: - AIAnalysis
| Field | Type | Description |
|---|---|---|
| remediationRequestRef | ObjectReference | Reference to the parent RemediationRequest CRD for audit trail |
| remediationId | string | Remediation ID for audit correlation |
| analysisRequest | AnalysisRequest | Complete analysis request with structured context |
| timeoutConfig | AIAnalysisTimeoutConfig | Optional timeout configuration for this analysis. Replaces the deprecated annotation-based timeout (security and validation). Passed through by the RO from RR.Status.TimeoutConfig.AIAnalysisTimeout (moved to Status). If nil, the AIAnalysis controller uses defaults (Investigating: 60s, Analyzing: 5s). |
AIAnalysisStatus¶
AIAnalysisStatus defines the observed state of AIAnalysis.
Appears in: - AIAnalysis
| Field | Type | Description |
|---|---|---|
| observedGeneration | integer | ObservedGeneration is the most recent generation observed by the controller. Used to prevent duplicate reconciliations and ensure idempotency. Standard pattern for all Kubernetes controllers. |
| phase | string | Phase tracking (no "Approving" or "Recommending" phase; simplified 4-phase flow) |
| message | string | |
| reason | AIAnalysisReason | Reason provides the umbrella failure or completion category. |
| subReason | string | SubReason provides the specific failure cause within the Reason category. Maps to needs_human_review triggers from HolmesGPT-API. Includes InvestigationInconclusive and ProblemResolved for new investigation outcomes. |
| startedAt | Time | Timestamps |
| completedAt | Time | |
| rootCause | string | Identified root cause |
| rootCauseAnalysis | RootCauseAnalysis | Root cause analysis details |
| selectedWorkflow | SelectedWorkflow | Selected workflow for execution (populated when phase=Completed) |
| alternativeWorkflows | AlternativeWorkflow array | Alternative workflows considered but not selected. INFORMATIONAL ONLY, NOT for automatic execution. Helps operators make informed approval decisions and provides an audit trail. Per the HolmesGPT-API team: alternatives are for CONTEXT, not EXECUTION. |
| approvalRequired | boolean | True if approval is required (confidence < 80% or policy requires) |
| approvalReason | string | Reason why approval is required (when ApprovalRequired=true) |
| approvalContext | ApprovalContext | Rich context for approval notification |
| needsHumanReview | boolean | Set by HAPI when the AI cannot produce a reliable result. True if human review is required (HAPI decision: RCA incomplete or unreliable). Triggers NotificationRequest creation in the RO. BR-496 v2: set when root_owner is missing (rca_incomplete) or on validation/confidence issues. |
| humanReviewReason | string | Reason why human review is needed (when NeedsHumanReview=true). Maps to HAPI's human_review_reason enum values. |
| actionability | string | #388: LLM's assessment of whether the alert warrants action. Empty when not yet assessed (pre-investigation or error paths). "Actionable" when the LLM determines the alert warrants action (default for all processed alerts). "NotActionable" when the LLM determines the alert is benign (e.g., orphaned PVCs). |
| investigationId | string | HolmesGPT investigation ID for correlation |
| investigationTime | integer | Investigation duration in seconds |
| warnings | string array | Non-fatal warnings from HolmesGPT-API (e.g., low confidence) |
| validationAttemptsHistory | ValidationAttempt array | ValidationAttemptsHistory contains the complete history of all HAPI validation attempts. HAPI retries up to 3 times with LLM self-correction. This field provides an audit trail for operator notifications and debugging. |
| degradedMode | boolean | DegradedMode indicates if the analysis ran with degraded capabilities (e.g., Rego policy evaluation failed, using safe defaults) |
| totalAnalysisTime | integer | TotalAnalysisTime is the total duration of the analysis in seconds |
| consecutiveFailures | integer | ConsecutiveFailures tracks retry attempts for exponential backoff. Reset to 0 on success, incremented on transient failure. Used for retry logic with jitter. |
| investigationSession | InvestigationSession | InvestigationSession tracks the async HAPI session for the submit/poll pattern. |
| postRCAContext | PostRCAContext | Runtime-computed cluster characteristics from HAPI. PostRCAContext holds data computed by HAPI after RCA (e.g., DetectedLabels). Immutable once set; use CEL validation on the PostRCAContext type. |
| conditions | Condition array | Conditions |
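The approval rule stated for approvalRequired ("confidence < 80% or policy requires") can be sketched as below. This is a minimal illustration, not the controller's code: the function name, the `policy_requires` flag, and the use of a 0.0-1.0 confidence scale are assumptions; only the 80% threshold comes from the field description.

```python
# Hypothetical helper mirroring the documented rule
# "confidence < 80% or policy requires". Not the controller's actual code.
APPROVAL_CONFIDENCE_THRESHOLD = 0.80


def approval_required(confidence: float, policy_requires: bool) -> bool:
    """Return True when a human must approve before execution."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0.0, 1.0]")
    # Either trigger alone is enough to require approval.
    return policy_requires or confidence < APPROVAL_CONFIDENCE_THRESHOLD
```

With this sketch, a confidence of exactly 0.80 does not require approval unless policy demands it.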
AIAnalysisTimeoutConfig¶
AIAnalysisTimeoutConfig defines timeout settings for AIAnalysis phases
Appears in: - AIAnalysisSpec
| Field | Type | Description |
|---|---|---|
| investigatingTimeout | Duration | Timeout for the Investigating phase (HolmesGPT-API call). Default: 60s if not specified. |
| analyzingTimeout | Duration | Timeout for the Analyzing phase (Rego policy evaluation). Default: 5s if not specified. |
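The default handling described above (60s for Investigating, 5s for Analyzing when nothing is specified) might be resolved along these lines. The `TimeoutConfig` dataclass and `resolve_timeouts` function are hypothetical names used only for illustration; the two default values are the ones documented in the table.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple

# Documented defaults for the two phases.
DEFAULT_INVESTIGATING = timedelta(seconds=60)
DEFAULT_ANALYZING = timedelta(seconds=5)


@dataclass(frozen=True)
class TimeoutConfig:
    """Hypothetical stand-in for AIAnalysisTimeoutConfig."""
    investigating: Optional[timedelta] = None
    analyzing: Optional[timedelta] = None


def resolve_timeouts(cfg: Optional[TimeoutConfig]) -> Tuple[timedelta, timedelta]:
    """Return (investigating, analyzing), filling unset values with defaults."""
    if cfg is None:
        return DEFAULT_INVESTIGATING, DEFAULT_ANALYZING
    investigating = cfg.investigating if cfg.investigating is not None else DEFAULT_INVESTIGATING
    analyzing = cfg.analyzing if cfg.analyzing is not None else DEFAULT_ANALYZING
    return investigating, analyzing
```

A nil config and a partially filled config both fall back per field, so overriding one timeout leaves the other at its default.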
ActionLink¶
ActionLink represents an external service action link
Appears in: - NotificationRequestSpec
| Field | Type | Description |
|---|---|---|
| service | ActionLinkServiceType | Service name (github, grafana, prometheus, kubernetes-dashboard, etc.) |
| url | string | Action link URL |
| label | string | Human-readable label for the link |
ActionLinkServiceType¶
Underlying type: string
Appears in: - ActionLink
| Value | Description |
|---|---|
| grafana | |
| prometheus | |
ActionType¶
ActionType is the Schema for the actiontypes API. Kubernetes-native action type taxonomy definition.
| Field | Type | Description |
|---|---|---|
| apiVersion | string | kubernaut.ai/v1alpha1 |
| kind | string | ActionType |
| metadata | ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
| spec | ActionTypeSpec | |
| status | ActionTypeStatus | |
ActionTypeDescription¶
ActionTypeDescription provides structured information about an action type.
Appears in: - ActionTypeSpec
| Field | Type | Description |
|---|---|---|
| what | string | What describes what this action type concretely does. |
| whenToUse | string | WhenToUse describes conditions under which this action type is appropriate. |
| whenNotToUse | string | WhenNotToUse describes specific exclusion conditions. |
| preconditions | string | Preconditions describes conditions that must be verified before use. |
ActionTypeSpec¶
ActionTypeSpec defines the desired state of ActionType. ActionType CRD lifecycle management.
Appears in: - ActionType
| Field | Type | Description |
|---|---|---|
| name | string | Name is the PascalCase action type identifier (e.g., RestartPod, ScaleReplicas). Immutable after creation. |
| description | ActionTypeDescription | Description provides structured information about the action type. Only this field is mutable after creation. |
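The two rules in this table (a PascalCase name that is immutable, and a description that stays mutable) could be checked on update roughly as follows. This is a sketch under assumptions: the regular expression is one plausible reading of "PascalCase", and `validate_action_type_update` is a hypothetical function, not the actual admission logic.

```python
import re
from typing import List

# Assumed interpretation of PascalCase: leading uppercase letter, then alphanumerics.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def validate_action_type_update(old_spec: dict, new_spec: dict) -> List[str]:
    """Return a list of violations; an empty list means the update is acceptable."""
    errors = []
    new_name = new_spec.get("name", "")
    if not PASCAL_CASE.match(new_name):
        errors.append("name must be PascalCase")
    if old_spec.get("name") != new_name:
        errors.append("name is immutable after creation")
    # description is intentionally not compared: it is the only mutable field.
    return errors
```

Changing only the description yields no violations, while renaming RestartPod to ScaleReplicas is rejected.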
ActionTypeStatus¶
ActionTypeStatus defines the observed state of ActionType.
Appears in: - ActionType
| Field | Type | Description |
|---|---|---|
| registered | boolean | Registered indicates whether the action type has been successfully registered in the DS catalog. |
| registeredAt | Time | RegisteredAt is the timestamp of initial registration in the catalog. |
| registeredBy | string | RegisteredBy is the identity of the registrant (K8s SA or user). |
| previouslyExisted | boolean | PreviouslyExisted indicates if this action type was re-enabled after being disabled. |
| activeWorkflowCount | integer | ActiveWorkflowCount is the number of active RemediationWorkflows referencing this action type. Best-effort, updated asynchronously by the RW admission webhook handler. |
| catalogStatus | CatalogStatus | CatalogStatus reflects the DS catalog lifecycle state. |
AlternativeApproach¶
AlternativeApproach describes an alternative approach with pros/cons
Appears in: - ApprovalContext
| Field | Type | Description |
|---|---|---|
| approach | string | Approach description |
| prosCons | string | ProsCons analysis |
AlternativeWorkflow¶
AlternativeWorkflow contains alternative workflows considered but not selected. INFORMATIONAL ONLY - NOT for automatic execution. Helps operators understand AI reasoning during approval decisions.
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
| workflowId | string | Workflow identifier (catalog lookup key) |
| executionBundle | string | Execution bundle OCI reference (digest-pinned), resolved by HolmesGPT-API |
| confidence | float | Confidence score (0.0-1.0); shows why it wasn't selected |
| rationale | string | Rationale explaining why this workflow was considered |
AnalysisContext¶
AnalysisContext captures AI analysis results.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
| approvalReason | string | ApprovalReason explains why approval was required. |
| rootCause | string | RootCause is the AI-determined root cause summary. |
| outcome | string | Outcome is the remediation outcome (e.g., "Success", "Failed"). |
AnalysisRequest¶
AnalysisRequest contains the structured analysis request: a self-contained context for AIAnalysis.
Appears in: - AIAnalysisSpec
| Field | Type | Description |
|---|---|---|
| signalContext | SignalContextInput | Signal context from SignalProcessing enrichment |
| analysisTypes | AnalysisType array | Analysis types to perform |
AnalysisType¶
Underlying type: string
AnalysisType represents a type of analysis to perform.
Appears in: - AnalysisRequest
Validation: - Enum: [Investigation RootCause WorkflowSelection]
| Value | Description |
|---|---|
| Investigation | |
| RootCause | |
| WorkflowSelection | |
ApprovalAlternative¶
ApprovalAlternative describes an alternative approach with pros/cons
Appears in: - RemediationApprovalRequestSpec
| Field | Type | Description |
|---|---|---|
| approach | string | Alternative approach description |
| prosCons | string | Pros and cons analysis |
ApprovalContext¶
ApprovalContext contains rich context for approval notifications
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
| reason | string | Reason why approval is required |
| confidenceScore | float | ConfidenceScore from AI analysis (0.0-1.0) |
| confidenceLevel | string | ConfidenceLevel: "low", "medium", or "high" |
| investigationSummary | string | InvestigationSummary from HolmesGPT analysis |
| evidenceCollected | string array | EvidenceCollected that led to this conclusion |
| recommendedActions | RecommendedAction array | RecommendedActions with rationale |
| alternativesConsidered | AlternativeApproach array | AlternativesConsidered with pros/cons |
| whyApprovalRequired | string | WhyApprovalRequired explains the need for human review |
| policyEvaluation | PolicyEvaluation | PolicyEvaluation contains Rego policy evaluation details |
ApprovalDecision¶
Underlying type: string
ApprovalDecision represents the operator's decision on an approval request
Appears in: - RemediationApprovalRequestStatus
Validation: - Enum: [ Approved Rejected Expired]
| Value | Description |
|---|---|
| `""` (empty string) | ApprovalDecisionPending indicates no decision has been made yet |
| Approved | ApprovalDecisionApproved indicates the operator approved the remediation |
| Rejected | ApprovalDecisionRejected indicates the operator rejected the remediation |
| Expired | ApprovalDecisionExpired indicates the approval request timed out |
ApprovalPolicyEvaluation¶
ApprovalPolicyEvaluation contains Rego policy evaluation results
Appears in: - RemediationApprovalRequestSpec
| Field | Type | Description |
|---|---|---|
| policyName | string | Policy name that was evaluated |
| matchedRules | string array | Rules that matched and triggered approval requirement |
| decision | string | Policy decision (PascalCase per K8s enum convention, values from PolicyDecision type) |
ApprovalRecommendedAction¶
ApprovalRecommendedAction describes a recommended action with rationale
Appears in: - RemediationApprovalRequestSpec
| Field | Type | Description |
|---|---|---|
| action | string | Action description |
| rationale | string | Rationale for this action |
BlockClearanceDetails¶
BlockClearanceDetails tracks the clearing of PreviousExecutionFailed blocks. Required for SOC2 CC7.3 (Immutability), CC7.4 (Completeness), and CC8.1 (Attribution). Preserves the audit trail when operators clear execution blocks after investigation.
Appears in: - WorkflowExecutionStatus
| Field | Type | Description |
|---|---|---|
| clearedAt | Time | ClearedAt is the timestamp when the block was cleared. |
| clearedBy | string | ClearedBy is the Kubernetes user who cleared the block. Extracted from the request context (if available) or the annotation value. Format: username@domain or service-account:namespace:name. Example: "admin@kubernaut.ai" or "service-account:kubernaut-system:operator". |
| clearReason | string | ClearReason is the operator-provided reason for clearing. Required for audit trail accountability. Example: "manual investigation complete, cluster state verified". |
| clearMethod | string | ClearMethod indicates how the block was cleared. Annotation: via the kubernaut.ai/clear-execution-block annotation. APIEndpoint: via a dedicated clearing API endpoint (future). StatusField: via a direct status field update (future). |
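The two documented clearedBy formats (username@domain and service-account:namespace:name) can be told apart with a small classifier. This is an illustrative sketch only: the regular expressions and the `classify_cleared_by` function are assumptions, since the reference does not specify how strictly these values are validated.

```python
import re

# Assumed patterns for the two documented formats; real validation may differ.
USER_RE = re.compile(r"^[^@\s:]+@[^@\s:]+$")
SERVICE_ACCOUNT_RE = re.compile(r"^service-account:[^:\s]+:[^:\s]+$")


def classify_cleared_by(value: str) -> str:
    """Return 'service-account', 'user', or 'unknown' for a clearedBy value."""
    if SERVICE_ACCOUNT_RE.match(value):
        return "service-account"
    if USER_RE.match(value):
        return "user"
    return "unknown"
```

Both examples from the field description classify as expected, and anything else is reported as unknown rather than rejected.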
BlockReason¶
Underlying type: string
BlockReason represents the reason why a RemediationRequest is blocked (non-terminal).
Appears in: - RemediationRequestStatus
Validation: - Enum: [ConsecutiveFailures DuplicateInProgress ResourceBusy RecentlyRemediated ExponentialBackoff UnmanagedResource IneffectiveChain]
| Value | Description |
|---|---|
| ConsecutiveFailures | BlockReasonConsecutiveFailures indicates remediation failed 3+ times consecutively. This is a temporary block with a 1-hour cooldown period. |
| DuplicateInProgress | BlockReasonDuplicateInProgress indicates another RR with the same fingerprint is active. This prevents Gateway RR flood by keeping the duplicate in non-terminal Blocked state. |
| ResourceBusy | BlockReasonResourceBusy indicates another WorkflowExecution is running on the same target. This prevents concurrent modifications to the same Kubernetes resource. |
| RecentlyRemediated | BlockReasonRecentlyRemediated indicates the same workflow+target was executed recently. This enforces a cooldown period (default 5 minutes) to prevent redundant executions. |
| ExponentialBackoff | BlockReasonExponentialBackoff indicates pre-execution failures require a backoff period. This implements graduated retry for transient infrastructure failures. |
| UnmanagedResource | BlockReasonUnmanagedResource indicates the target resource is not managed by Kubernaut. The resource or namespace does not have the kubernaut.ai/managed=true label. RO will retry with exponential backoff (5s → 10s → ... → 5min) until RR times out. |
| IneffectiveChain | BlockReasonIneffectiveChain indicates consecutive remediations for the same target have been ineffective (resource keeps reverting or health doesn't improve). Escalates to human review via NotificationRequest. |
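The UnmanagedResource retry schedule (5s → 10s → ... → 5min) is consistent with a doubling backoff capped at five minutes. The sketch below assumes a doubling factor, which the reference implies but does not state, and it omits any jitter; `unmanaged_backoff_seconds` is a hypothetical name.

```python
def unmanaged_backoff_seconds(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Delay before retry number `attempt` (0-based), doubling from 5s up to a 5min cap.

    The doubling factor is an assumption inferred from "5s -> 10s -> ... -> 5min".
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very large attempt counts cannot overflow.
    exponent = min(attempt, 16)
    return min(base * (2 ** exponent), cap)
```

Under these assumptions the cap is reached on the seventh attempt and every later retry waits the full five minutes.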
DedupContext¶
DedupContext captures deduplication context.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
| duplicateCount | string | DuplicateCount is the number of duplicate signals. |
DeduplicationStatus¶
DeduplicationStatus tracks signal occurrence for deduplication. OWNER: Gateway Service (exclusive write access)
Appears in: - RemediationRequestStatus
| Field | Type | Description |
|---|---|---|
| firstSeenAt | Time | FirstSeenAt is when this signal fingerprint was first observed |
| lastSeenAt | Time | LastSeenAt is when this signal fingerprint was last observed |
| occurrenceCount | integer | OccurrenceCount tracks how many times this signal has been seen |
DeliveryAttempt¶
DeliveryAttempt records a single delivery attempt to a channel
Appears in: - NotificationRequestStatus
| Field | Type | Description |
|---|---|---|
| channel | DeliveryChannelName | Channel name |
| attempt | integer | Attempt number (1-based) |
| timestamp | Time | Timestamp of this attempt |
| status | DeliveryAttemptStatus | Status of this attempt (success, failed, timeout, invalid) |
| error | string | Error message if failed |
| durationSeconds | float | Duration of delivery attempt in seconds |
DeliveryAttemptStatus¶
Underlying type: string
Appears in: - DeliveryAttempt
Validation: - Enum: [success failed timeout invalid]
| Value | Description |
|---|---|
| success | |
| failed | |
| timeout | |
| invalid | |
DeliveryChannelName¶
Underlying type: string
Appears in: - DeliveryAttempt
EAComponents¶
EAComponents tracks the completion state and scores of each assessment component. The EM updates these fields as each component check completes. This enables restart recovery: if EM restarts mid-assessment, it can skip already-completed components by checking these flags.
Appears in: - EffectivenessAssessmentStatus
| Field | Type | Description |
|---|---|---|
| healthAssessed | boolean | HealthAssessed indicates whether the health check has been completed. |
| healthScore | float | HealthScore is the health check score (0.0-1.0), nil if not yet assessed. |
| hashComputed | boolean | HashComputed indicates whether the spec hash comparison has been completed. |
| postRemediationSpecHash | string | PostRemediationSpecHash is the hash of the target resource spec after remediation. |
| currentSpecHash | string | CurrentSpecHash is the most recent hash of the target resource spec, re-computed on each reconcile after HashComputed is true. If it differs from PostRemediationSpecHash, spec drift was detected. |
| alertAssessed | boolean | AlertAssessed indicates whether the alert resolution check has been completed. |
| alertScore | float | AlertScore is the alert resolution score (0.0 or 1.0), nil if not yet assessed. |
| metricsAssessed | boolean | MetricsAssessed indicates whether the metric comparison has been completed. |
| metricsScore | float | MetricsScore is the metric comparison score (0.0-1.0), nil if not yet assessed. |
| alertDecayRetries | integer | AlertDecayRetries tracks the number of times the EM re-checked a firing alert during decay monitoring. Incremented each reconcile where isAlertDecay returns true. A non-zero value means the EM confirmed the resource was healthy but the alert persisted, indicating Prometheus lookback window decay. |
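Restart recovery and drift detection, as described for EAComponents, can be illustrated with two small helpers operating on the status fields as a plain dictionary. The helper names and the `sha256:` hash strings in the usage note are hypothetical; the reference does not specify the hash format or the EM's internal structure.

```python
from typing import Dict, List

# Maps each documented completion flag to a short component name.
COMPONENT_FLAGS = {
    "healthAssessed": "health",
    "hashComputed": "hash",
    "alertAssessed": "alert",
    "metricsAssessed": "metrics",
}


def pending_components(components: Dict[str, object]) -> List[str]:
    """Components still to be assessed; completed ones are skipped after a restart."""
    return [name for flag, name in COMPONENT_FLAGS.items() if not components.get(flag, False)]


def spec_drift_detected(components: Dict[str, object]) -> bool:
    """True when the current spec hash differs from the post-remediation hash."""
    if not components.get("hashComputed", False):
        return False  # nothing to compare until the hash step has completed
    post = components.get("postRemediationSpecHash")
    current = components.get("currentSpecHash")
    return bool(post) and bool(current) and post != current
```

For a status with healthAssessed and hashComputed set, and hashes "sha256:aaa" versus "sha256:bbb", the pending list contains only the alert and metrics checks and drift is reported.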
EAConfig¶
EAConfig contains assessment configuration set by the RO at creation time. StabilizationWindow controls how long the EM waits after remediation before starting assessment checks. HashComputeDelay and AlertCheckDelay are optional Duration-based delays that the RO computes based on target type and signal mode. All other assessment parameters (PrometheusEnabled, AlertManagerEnabled, ValidityWindow) are EM-internal configuration read from effectivenessmonitor.Config. The EM emits individual component audit events to DataStorage; the overall effectiveness score is computed by DataStorage on demand, not by the EM.
Appears in: - EffectivenessAssessmentSpec
| Field | Type | Description |
|---|---|---|
| stabilizationWindow | Duration | StabilizationWindow is the duration to wait after remediation before assessment. Set by the Remediation Orchestrator. The EM uses this to delay assessment until the system stabilizes post-remediation. |
| hashComputeDelay | Duration | HashComputeDelay is the duration to defer post-remediation spec hash computation after EA creation. Set by the RO for async-managed targets (GitOps, operator CRDs) where spec changes propagate after the WorkflowExecution completes. The EM computes the deferral deadline as: creation + HashComputeDelay. Nil means compute immediately (sync workflows, backward compatible). |
| alertCheckDelay | Duration | AlertCheckDelay is an additional duration to defer alert resolution checks beyond the StabilizationWindow. Set by the RO for proactive (predictive) alerts where the underlying Prometheus alert (e.g. predict_linear) requires extra time to resolve after remediation. The EM computes AlertManagerCheckAfter as: creation + StabilizationWindow + AlertCheckDelay. Nil means no additional delay (AlertManagerCheckAfter = PrometheusCheckAfter). |
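The three timing formulas in this table can be worked through in a short sketch. The `EATimeline` dataclass and `compute_timeline` function are hypothetical names; the arithmetic follows the documented formulas, with a nil delay treated as zero.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class EATimeline:
    """Hypothetical container for the derived assessment timestamps."""
    hash_compute_after: datetime
    prometheus_check_after: datetime
    alertmanager_check_after: datetime


def compute_timeline(
    created: datetime,
    stabilization_window: timedelta,
    hash_compute_delay: Optional[timedelta] = None,
    alert_check_delay: Optional[timedelta] = None,
) -> EATimeline:
    """Apply the documented formulas; None (nil) means no extra delay."""
    zero = timedelta(0)
    hash_after = created + (hash_compute_delay if hash_compute_delay is not None else zero)
    prometheus_after = created + stabilization_window
    alertmanager_after = prometheus_after + (alert_check_delay if alert_check_delay is not None else zero)
    return EATimeline(hash_after, prometheus_after, alertmanager_after)
```

With a five-minute stabilization window and no delays, the hash is computed immediately and both checks become eligible five minutes after creation; adding a two-minute AlertCheckDelay moves only the AlertManager check to seven minutes.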
EffectivenessAssessment¶
EffectivenessAssessment is the Schema for the effectivenessassessments API. It is created by the Remediation Orchestrator and watched by the Effectiveness Monitor.
| Field | Type | Description |
|---|---|---|
| apiVersion | string | kubernaut.ai/v1alpha1 |
| kind | string | EffectivenessAssessment |
| metadata | ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
| spec | EffectivenessAssessmentSpec | |
| status | EffectivenessAssessmentStatus | |
EffectivenessAssessmentSpec¶
EffectivenessAssessmentSpec defines the desired state of an EffectivenessAssessment.
The spec is set by the Remediation Orchestrator at creation time and is immutable. Immutability is enforced by CEL validation (self == oldSelf) to prevent tampering.
Appears in: - EffectivenessAssessment
| Field | Type | Description |
|---|---|---|
| correlationID | string | CorrelationID is the name of the parent RemediationRequest. Used as the correlation ID for audit events. |
| remediationRequestPhase | string | RemediationRequestPhase is the RemediationRequest's OverallPhase at the time the EA was created. Captured as an immutable spec field so the EM can branch assessment logic based on the RR outcome (Verifying, Completed, Failed, TimedOut). Verifying is the happy path: the WFE succeeded and the EA was created while the RR awaits assessment. Previously stored as the mutable label kubernaut.ai/rr-phase; moved to spec for immutability and security. |
| signalTarget | TargetResource | SignalTarget is the resource that triggered the alert. Source: RR.Spec.TargetResource (from Gateway alert extraction). Used by: health assessment, alert resolution, metrics queries. |
| remediationTarget | TargetResource | RemediationTarget is the resource the workflow modified. Source: AA.Status.RootCauseAnalysis.RemediationTarget (from HAPI RCA resolution). Used by: spec hash computation, drift detection. |
| config | EAConfig | Config contains the assessment configuration parameters. |
| remediationCreatedAt | Time | RemediationCreatedAt is the creation timestamp of the parent RemediationRequest. Set by the RO at EA creation time from rr.CreationTimestamp. Used by the audit manager to compute resolution_time_seconds in the assessment.completed event (CompletedAt - RemediationCreatedAt). |
| signalName | string | SignalName is the original alert/signal name from the parent RemediationRequest. Set by the RO at EA creation time from rr.Spec.SignalName. Used by the audit manager to populate the signal_name field in assessment.completed events (OBS-1: distinct from CorrelationID which is the RR name). |
| preRemediationSpecHash | string | PreRemediationSpecHash is the canonical spec hash of the target resource BEFORE remediation was applied. Copied from rr.Status.PreRemediationSpecHash by the RO at EA creation time. The EM uses this to compare pre- vs post-remediation state for spec drift detection, eliminating the need to query DataStorage audit events. |
EffectivenessAssessmentStatus¶
EffectivenessAssessmentStatus defines the observed state of an EffectivenessAssessment.
Appears in: - EffectivenessAssessment
| Field | Type | Description |
|---|---|---|
| phase | string | Phase is the current lifecycle phase of the assessment. |
| validityDeadline | Time | ValidityDeadline is the absolute time after which the assessment expires. Computed by the EM controller on first reconciliation as: EA.creationTimestamp + validityWindow (from EM config). This follows Kubernetes spec/status convention: the RO sets desired state (StabilizationWindow in spec), and the EM computes observed/derived state (ValidityDeadline in status). This prevents misconfiguration where StabilizationWindow > ValidityDeadline. |
| prometheusCheckAfter | Time | PrometheusCheckAfter is the earliest time to query Prometheus for metrics. Computed by the EM controller on first reconciliation as: EA.creationTimestamp + StabilizationWindow (from EA spec). Stored in status to avoid recomputation on every reconcile and for operator observability of the assessment timeline. |
| alertManagerCheckAfter | Time | AlertManagerCheckAfter is the earliest time to check AlertManager for alert resolution. Computed by the EM controller on first reconciliation as: EA.creationTimestamp + StabilizationWindow + AlertCheckDelay (if set). When AlertCheckDelay is nil, equals PrometheusCheckAfter. Stored in status to avoid recomputation on every reconcile and for operator observability of the assessment timeline. |
| components | EAComponents | Components tracks the completion state of each assessment component. |
| assessmentReason | string | AssessmentReason describes why the assessment completed with this outcome. |
| completedAt | Time | CompletedAt is the timestamp when the assessment finished. |
| message | string | Message provides human-readable details about the current state. |
| conditions | Condition array | Conditions represent the latest available observations of the EA's state. |
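The ValidityDeadline derivation (EA.creationTimestamp + validityWindow) and the resulting expiry check might look like the sketch below. The function names are hypothetical, and treating the assessment as expired only strictly after the deadline is an assumption based on the wording "the absolute time after which the assessment expires".

```python
from datetime import datetime, timedelta, timezone


def validity_deadline(created: datetime, validity_window: timedelta) -> datetime:
    """ValidityDeadline = EA.creationTimestamp + validityWindow (from EM config)."""
    return created + validity_window


def is_expired(now: datetime, deadline: datetime) -> bool:
    """Assumed semantics: expired only once `now` is strictly past the deadline."""
    return now > deadline
```

For an EA created at midnight UTC with a 30-minute validity window, a check at 00:30 is still valid and a check at 00:31 is expired.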
EnrichmentConfig¶
EnrichmentConfig specifies per-signal enrichment settings. V2.0 PLACEHOLDER: These fields are currently NOT read by the controller. All signals use the global enrichment config from the controller's YAML configuration (enrichment.cacheTtl, enrichment.timeout). Per-signal overrides will be implemented in V2.0.
Appears in: - SignalProcessingSpec
| Field | Type | Description |
|---|---|---|
| enableClusterState | boolean | Enable cluster state enrichment |
| enableMetrics | boolean | Enable metrics enrichment |
| enableHistorical | boolean | Enable historical enrichment |
| timeout | Duration | Timeout for enrichment operations |
Environment¶
Underlying type: string
Environment represents a canonical deployment environment. 4 canonical environments + Unknown fallback.
Appears in: - EnvironmentClassification
Validation: - Enum: [Production Staging Development Test Unknown]
| Value | Description |
|---|---|
| Production | |
| Staging | |
| Development | |
| Test | |
| Unknown | |
EnvironmentClassification¶
EnvironmentClassification records the classified environment, the source of the classification, and when it was performed.
V2.0: Removed signal-labels source (security vulnerability)
Appears in: - SignalProcessingStatus
| Field | Type | Description |
|---|---|---|
| environment | Environment | |
| source | string | Source of classification: namespace-labels, rego-inference, default |
| classifiedAt | Time | When classification was performed |
ExecutionConfig¶
ExecutionConfig contains minimal execution settings. ServiceAccountName moved to Spec.ServiceAccountName (engine-agnostic).
Appears in: - WorkflowExecutionSpec
| Field | Type | Description |
|---|---|---|
| timeout | Duration | Timeout for the entire workflow (Tekton PipelineRun timeout). Default: use global timeout from RemediationRequest or 30m. |
ExecutionContext¶
ExecutionContext captures execution and retry data.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
| retryCount | string | RetryCount is the number of retries attempted. |
| maxRetries | string | MaxRetries is the maximum number of retries allowed. |
| lastExitCode | string | LastExitCode is the last exit code from the workflow execution. |
| previousExecution | string | PreviousExecution is the name of the previous WorkflowExecution. |
| timeoutPhase | string | TimeoutPhase is the phase that timed out. |
| phaseTimeout | string | PhaseTimeout is the duration string for the phase timeout. |
ExecutionStatusSummary¶
ExecutionStatusSummary captures key execution resource status fields. Lightweight summary for both Tekton PipelineRun and K8s Job backends.
Appears in: - WorkflowExecutionStatus
| Field | Type | Description |
|---|---|---|
| status | ConditionStatus | Status of the execution resource (Unknown, True, False) |
| reason | string | Reason from the execution resource (e.g., "Succeeded", "Failed", "Running") |
| message | string | Message from the execution resource |
| completedTasks | integer | CompletedTasks count |
| totalTasks | integer | TotalTasks count (from pipeline spec) |
FailureDetails¶
FailureDetails contains structured failure classification information
Appears in: - WorkflowExecutionStatus
| Field | Type | Description |
|---|---|---|
| failedTaskIndex | integer | FailedTaskIndex is the 0-indexed position of the failed task in the pipeline. |
| failedTaskName | string | FailedTaskName is the name of the failed Tekton Task. |
| failedStepName | string | FailedStepName is the name of the failed step within the task (if available). Tekton tasks can have multiple steps; this identifies the specific step. |
| reason | string | Reason is a Kubernetes-style reason code. Used for deterministic failure classification by the RO. |
| message | string | Message is a human-readable error message (for logging/UI/notifications). |
| exitCode | integer | ExitCode from the container (if applicable). Useful for script-based tasks that return specific exit codes. |
| failedAt | Time | FailedAt is the timestamp when the failure occurred. |
| executionTimeBeforeFailure | Duration | ExecutionTimeBeforeFailure is how long the workflow ran before failing. |
| naturalLanguageSummary | string | NaturalLanguageSummary is a human/LLM-readable failure description. Generated by the WE controller from the structured data above. Used by the RO (included in failure notifications) and by Notification (included in user-facing failure alerts). |
| wasExecutionFailure | boolean | WasExecutionFailure indicates whether the failure occurred during workflow execution. true = the workflow RAN and failed (non-idempotent actions may have occurred). false = the workflow failed BEFORE execution (validation, image pull, quota, etc.). CRITICAL: execution failures (true) block ALL future retries for this target. Pre-execution failures (false) get exponential backoff. |
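The retry consequence of wasExecutionFailure can be summarized in a small decision function. This is a sketch of the documented rule only; `next_retry_action` and its return strings are hypothetical, and the real classification lives in the RO.

```python
def next_retry_action(failure_details: dict) -> str:
    """Map FailureDetails to the documented retry behavior.

    Returns "block" for execution failures (non-idempotent actions may have run)
    and "backoff" for pre-execution failures (validation, image pull, quota, etc.).
    """
    # The key is required here on purpose: guessing a default would be unsafe.
    if failure_details["wasExecutionFailure"]:
        return "block"
    return "backoff"
```

A failure such as an image pull error, recorded with wasExecutionFailure=false, would be retried with exponential backoff, whereas a task that ran and then failed blocks further retries for the target.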
FailurePhase¶
Underlying type: string
FailurePhase represents the orchestration phase where a failure occurred. PascalCase for CRD phase values.
Appears in: - RemediationRequestStatus
Validation: - Enum: [Configuration SignalProcessing AIAnalysis Approval WorkflowExecution Blocked]
| Value | Description |
|---|---|
| Configuration | |
| SignalProcessing | |
| AIAnalysis | |
| Approval | |
| WorkflowExecution | |
| Blocked | |
InvestigationSession¶
InvestigationSession tracks the async HAPI session lifecycle. It supports AA controller session tracking and session regeneration on 404 (HAPI restart).
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
| id | string | Session ID returned by HAPI on submit (cleared on session loss) |
| generation | integer | Generation counter tracking session regenerations (0 = first session, incremented on 404) |
| lastPolled | Time | LastPolled timestamp of the last poll attempt |
| createdAt | Time | CreatedAt timestamp when the current session was created |
| pollCount | integer | PollCount tracks the number of poll attempts for observability. Constant 15s poll interval (configurable from 1s to 5m). |
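The session-loss handling described by these fields (ID cleared and generation incremented on a 404) can be sketched as a state update applied after each poll. The `Session` dataclass and `record_poll` function are hypothetical, and counting the 404 response itself as a poll attempt is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical in-memory mirror of the InvestigationSession status fields."""
    id: str = ""
    generation: int = 0
    poll_count: int = 0


def record_poll(session: Session, http_status: int) -> Session:
    """Update session state after one poll of HAPI."""
    session.poll_count += 1  # assumption: every poll attempt is counted, including 404s
    if http_status == 404:
        # HAPI no longer knows the session (for example after a restart):
        # clear the ID so a new submit happens, and bump the generation.
        session.id = ""
        session.generation += 1
    return session
```

After a 404 the caller would submit again and store the new session ID; the generation value then shows how many times the session had to be regenerated.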
LineageContext¶
LineageContext tracks parent resource references for audit correlation.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
remediationRequest |
string | RemediationRequest is the name of the parent RemediationRequest. |
aiAnalysis |
string | AIAnalysis is the name of the parent AIAnalysis. |
NotificationContext¶
NotificationContext provides structured context for a notification, replacing the former unstructured Metadata map[string]string.
Appears in: - NotificationRequestSpec
| Field | Type | Description |
|---|---|---|
lineage |
LineageContext | Lineage tracks parent resource references for audit correlation. |
workflow |
WorkflowContext | Workflow captures selected workflow details (approval/completion notifications). |
analysis |
AnalysisContext | Analysis captures AI analysis results (approval/completion notifications). |
review |
ReviewContext | Review captures manual review context (manual-review notifications). |
execution |
ExecutionContext | Execution captures execution and retry context (manual-review WE source, timeout notifications). |
dedup |
DedupContext | Dedup captures deduplication context (bulk duplicate notifications). |
target |
TargetContext | Target captures target resource context (timeout notifications). |
verification |
VerificationContext | Verification captures EA verification results (completion notifications, #318). Enables routing rules to match on verification outcome (e.g., inconclusive -> escalation). |
NotificationPhase¶
Underlying type: string
Appears in: - NotificationRequestStatus
Validation: - Enum: [Pending Sending Retrying Sent PartiallySent Failed]
| Value | Description |
|---|---|
Pending |
|
Sending |
|
Retrying |
|
Sent |
|
PartiallySent |
|
Failed |
NotificationPriority¶
Underlying type: string
Appears in: - NotificationRequestSpec
Validation: - Enum: [critical high medium low]
| Value | Description |
|---|---|
critical |
|
high |
|
medium |
|
low |
NotificationRequest¶
NotificationRequest is the Schema for the notificationrequests API
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | NotificationRequest |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
NotificationRequestSpec | |
status |
NotificationRequestStatus |
NotificationRequestSpec¶
NotificationRequestSpec defines the desired state of NotificationRequest
Spec Immutability: ALL spec fields are immutable after CRD creation. Users cannot update notification content once created. To change a notification, delete and recreate the CRD.
Rationale: Notifications are immutable events, not mutable resources. This prevents race conditions, simplifies controller logic, and provides a complete audit trail.
Cancellation: Delete the NotificationRequest CRD to cancel delivery.
Appears in: - NotificationRequest
| Field | Type | Description |
|---|---|---|
remediationRequestRef |
ObjectReference | Reference to the parent RemediationRequest (if applicable). Used for audit correlation and lineage tracking. Optional: a NotificationRequest can be standalone (e.g., system-generated alerts). |
type |
NotificationType | Type of notification (escalation, simple, status-update, approval, manual-review, completion) |
priority |
NotificationPriority | Priority of notification (critical, high, medium, low) |
subject |
string | Subject line for notification |
body |
string | Notification body content |
severity |
string | Severity from the originating signal (used for routing). Promoted from a mutable label to an immutable spec field. |
phase |
string | Phase that triggered this notification (for phase-timeout notifications). Promoted from a mutable label to an immutable spec field. |
reviewSource |
ReviewSourceType | ReviewSource indicates what triggered manual review (for manual-review notifications). Promoted from a mutable label to an immutable spec field. |
context |
NotificationContext | Context provides typed, structured notification context replacing the former unstructured Metadata map. Each sub-struct is optional (nil means not applicable for this notification type). |
extensions |
object (keys:string, values:string) | Extensions holds arbitrary key-value pairs for routing and custom data that don't fit the typed Context schema (e.g., test routing overrides, vendor-specific tags). Routing rules can match on these keys. |
actionLinks |
ActionLink array | Action links to external services |
retryPolicy |
RetryPolicy | Retry policy for delivery |
retentionDays |
integer | Retention period in days after completion |
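As an illustration of the spec shape, the sketch below builds a NotificationRequest object as a plain Python dictionary and checks `type` and `priority` against the enums listed in this reference. It is a sketch only: `build_notification_request` is a hypothetical helper, and the choice of which fields to populate is an assumption, since this reference does not state which spec fields are required.

```python
# Enum values copied from NotificationType and NotificationPriority.
NOTIFICATION_TYPES = {
    "escalation", "simple", "status-update",
    "approval", "manual-review", "completion",
}
NOTIFICATION_PRIORITIES = {"critical", "high", "medium", "low"}


def build_notification_request(name, namespace, type_, priority, subject, body):
    """Return a NotificationRequest object as a dict (hypothetical helper)."""
    if type_ not in NOTIFICATION_TYPES:
        raise ValueError(f"unknown notification type: {type_!r}")
    if priority not in NOTIFICATION_PRIORITIES:
        raise ValueError(f"unknown notification priority: {priority!r}")
    return {
        "apiVersion": "kubernaut.ai/v1alpha1",
        "kind": "NotificationRequest",
        "metadata": {"name": name, "namespace": namespace},
        # The spec is immutable after creation: to change a notification,
        # delete and recreate the object rather than patching it.
        "spec": {
            "type": type_,
            "priority": priority,
            "subject": subject,
            "body": body,
        },
    }
```

Validating the enums client-side only gives earlier feedback; the API server still enforces the CRD validation rules.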
NotificationRequestStatus¶
NotificationRequestStatus defines the observed state of NotificationRequest
Appears in: - NotificationRequest
| Field | Type | Description |
|---|---|---|
phase |
NotificationPhase | Phase of notification lifecycle (Pending, Sending, Retrying, Sent, PartiallySent, Failed) |
conditions |
Condition array | Conditions represent the latest available observations of the notification's state |
deliveryAttempts |
DeliveryAttempt array | List of all delivery attempts across all channels |
totalAttempts |
integer | Total number of delivery attempts across all channels |
successfulDeliveries |
integer | Number of successful deliveries |
failedDeliveries |
integer | Number of failed deliveries |
queuedAt |
Time | Time when notification was queued for processing |
processingStartedAt |
Time | Time when processing started |
completionTime |
Time | Time when all deliveries completed (success or failure) |
observedGeneration |
integer | Observed generation from spec |
reason |
NotificationStatusReason | Reason for current phase |
message |
string | Human-readable message about current state |
NotificationStatusReason¶
Underlying type: string
Appears in: - NotificationRequestStatus
| Value | Description |
|---|---|
AllDeliveriesSucceeded |
|
PartialDeliverySuccess |
|
AllDeliveriesFailed |
|
NoChannelsResolved |
|
PartialFailureRetrying |
|
MaxRetriesExhausted |
NotificationType¶
Underlying type: string
Appears in: - NotificationRequestSpec
Validation: - Enum: [escalation simple status-update approval manual-review completion]
| Value | Description |
|---|---|
escalation |
|
simple |
|
status-update |
|
approval |
NotificationTypeApproval is used for approval request notifications. Added Dec 2025 per RO team request for explicit approval workflow support. |
manual-review |
NotificationTypeManualReview is used for notifications that require manual intervention. Added Dec 2025 for ExhaustedRetries/PreviousExecutionFailed scenarios requiring operator action. Distinct from 'escalation' to enable spec-field-based routing rules. |
completion |
NotificationTypeCompletion is used for successful remediation completion notifications. Created when WorkflowExecution completes successfully and the RR transitions to the Completed phase. Enables operators to track successful autonomous remediations. |
ObjectRef¶
ObjectRef is a lightweight reference to another object in the same namespace
Appears in: - RemediationApprovalRequestSpec
| Field | Type | Description |
|---|---|---|
name |
string | Name of the referenced object |
ObjectReference¶
ObjectReference contains enough information to let you locate the referenced object.
Appears in: - SignalProcessingSpec
| Field | Type | Description |
|---|---|---|
apiVersion |
string | API version of the referent |
kind |
string | Kind of the referent |
name |
string | Name of the referent |
namespace |
string | Namespace of the referent |
uid |
string | UID of the referent |
PolicyDecision¶
Underlying type: string
PolicyDecision represents the Rego policy evaluation outcome.
Appears in: - PolicyEvaluation
Validation: - Enum: [Approved ManualReviewRequired Denied DegradedMode]
| Value | Description |
|---|---|
Approved |
|
ManualReviewRequired |
|
Denied |
|
DegradedMode |
PolicyEvaluation¶
PolicyEvaluation contains Rego policy evaluation results
Appears in: - ApprovalContext
| Field | Type | Description |
|---|---|---|
policyName |
string | Policy name that was evaluated |
matchedRules |
string array | Rules that matched |
decision |
PolicyDecision | Decision from policy evaluation |
PostRCAContext¶
PostRCAContext holds data computed by HAPI after the RCA phase. DetectedLabels are computed at runtime by HAPI's LabelDetector and returned in the HAPI response for storage in the AIAnalysis status. This data is used by Rego policies for approval gating (e.g., stateful workload detection) and is immutable once set.
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
detectedLabels |
DetectedLabels | DetectedLabels contains cluster characteristics computed by HAPI's LabelDetector during get_namespaced_resource_context or get_cluster_resource_context tool invocations. |
setAt |
Time | SetAt records when the PostRCAContext was populated. Used as the immutability guard: once SetAt is non-nil, the entire PostRCAContext becomes immutable via CEL validation. |
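The SetAt immutability guard can be pictured as a check on old versus new values. The real enforcement is done by CEL validation on the CRD; the Python sketch below only mirrors the rule as described, and `post_rca_update_allowed` is a hypothetical name.

```python
from typing import Optional


def post_rca_update_allowed(old: Optional[dict], new: Optional[dict]) -> bool:
    """Return True if replacing `old` with `new` respects the SetAt guard."""
    # Before SetAt is populated, the context may still be written.
    if old is None or old.get("setAt") is None:
        return True
    # Once SetAt is non-nil, only an identical (no-op) value is accepted,
    # which makes the entire PostRCAContext immutable.
    return new == old
```

Because the guard compares the whole object, changing or removing any nested field after SetAt is populated is rejected, not just changes to SetAt itself.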
Priority¶
Underlying type: string
Priority represents an operational priority level.
Appears in: - PriorityAssignment
Validation: - Enum: [P0 P1 P2 P3]
| Value | Description |
|---|---|
P0 |
|
P1 |
|
P2 |
|
P3 |
PriorityAssignment¶
PriorityAssignment records the priority assigned to a signal and the source of that assignment.
Appears in: - SignalProcessingStatus
| Field | Type | Description |
|---|---|---|
priority |
Priority | |
source |
string | Source of assignment: rego-policy, severity-fallback, default |
policyName |
string | Which Rego rule matched (if applicable) |
assignedAt |
Time | When assignment was performed |
RecommendedAction¶
RecommendedAction describes a remediation action with rationale
Appears in: - ApprovalContext
| Field | Type | Description |
|---|---|---|
workflowId |
string | WorkflowId is the catalog workflow identifier for this recommendation |
rationale |
string | Rationale explaining why this action is recommended |
RecommendedWorkflowSummary¶
RecommendedWorkflowSummary contains a summary of the recommended workflow
Appears in: - RemediationApprovalRequestSpec
| Field | Type | Description |
|---|---|---|
workflowId |
string | Workflow identifier from catalog |
version |
string | Workflow version |
executionBundle |
string | Execution bundle OCI reference (digest-pinned) |
rationale |
string | Rationale for selecting this workflow |
RemediationApprovalRequest¶
RemediationApprovalRequest is the Schema for the remediationapprovalrequests API.
RemediationApprovalRequest CRD Architecture

- Follows Kubernetes CertificateSigningRequest pattern (immutable spec, mutable status)
- Owned by RemediationRequest
- AIAnalysis controller uses field index on spec.aiAnalysisRef.name for efficient lookup
- Timeout expiration handled by dedicated controller

Lifecycle:

1. RO creates when AIAnalysis.status.approvalRequired=true
2. Operator approves/rejects via status.conditions update
3. Dedicated controller detects decision or timeout
4. AIAnalysis controller watches and transitions phase accordingly
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | RemediationApprovalRequest |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
RemediationApprovalRequestSpec | |
status |
RemediationApprovalRequestStatus |
RemediationApprovalRequestSpec¶
RemediationApprovalRequestSpec defines the desired state of RemediationApprovalRequest.
Spec Immutability: ALL spec fields are immutable after CRD creation (follows the CertificateSigningRequest pattern). This provides a complete audit trail and prevents race conditions.
Appears in: - RemediationApprovalRequest
| Field | Type | Description |
|---|---|---|
remediationRequestRef |
ObjectReference | Reference to the parent RemediationRequest CRD (owner). RemediationRequest owns this CRD via ownerReferences. |
aiAnalysisRef |
ObjectRef | Reference to the AIAnalysis that requires approval. Used by the AIAnalysis controller for efficient field-indexed lookup. |
confidence |
float | Confidence score from AI analysis (0.0-1.0). Scores of 0.6-0.79 typically trigger approval (below the auto-approve threshold). |
confidenceLevel |
string | Confidence level derived from score |
reason |
string | Reason why approval is required |
recommendedWorkflow |
RecommendedWorkflowSummary | Recommended workflow from AI analysis |
investigationSummary |
string | Investigation summary from HolmesGPT |
evidenceCollected |
string array | Evidence collected during investigation |
recommendedActions |
ApprovalRecommendedAction array | Recommended actions with rationale |
alternativesConsidered |
ApprovalAlternative array | Alternative approaches considered |
whyApprovalRequired |
string | Detailed explanation of why approval is required |
policyEvaluation |
ApprovalPolicyEvaluation | Policy evaluation results if Rego policy triggered approval |
requiredBy |
Time | Deadline for the approval decision (approval expires after this time). Calculated by RO using the hierarchy: per-request → policy → namespace → default (15m). |
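The requiredBy hierarchy (per-request → policy → namespace → default of 15m) amounts to "first configured value wins". The Python sketch below illustrates that resolution order. The function and parameter names are hypothetical, and measuring the deadline from the request's creation time is an assumption not stated in this reference.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_APPROVAL_TIMEOUT = timedelta(minutes=15)


def resolve_approval_timeout(
    per_request: Optional[timedelta] = None,
    policy: Optional[timedelta] = None,
    namespace: Optional[timedelta] = None,
) -> timedelta:
    """Pick the most specific configured timeout, else the 15m default."""
    for candidate in (per_request, policy, namespace):
        if candidate is not None:
            return candidate
    return DEFAULT_APPROVAL_TIMEOUT


def compute_required_by(created_at: datetime, **overrides) -> datetime:
    """Deadline = creation time + resolved timeout (assumed starting point)."""
    return created_at + resolve_approval_timeout(**overrides)
```

For example, a policy-level timeout of 10 minutes takes precedence over a namespace-level 20 minutes, while a per-request value overrides both.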
RemediationApprovalRequestStatus¶
RemediationApprovalRequestStatus defines the observed state of RemediationApprovalRequest.
Appears in: - RemediationApprovalRequest
| Field | Type | Description |
|---|---|---|
decision |
ApprovalDecision | Decision made by the operator or by the system (on timeout). An empty string indicates a pending decision. |
decidedBy |
string | Who made the decision (username or "system" for timeout) |
decidedAt |
Time | When the decision was made |
decisionMessage |
string | Optional message from the decision maker |
conditions |
Condition array | Conditions represent the latest available observations Standard condition types: - "Approved" - Decision is Approved - "Rejected" - Decision is Rejected - "Expired" - Decision timed out |
createdAt |
Time | Time when the approval request was created |
timeRemaining |
string | Time remaining until expiration (human-readable, e.g., "5m30s"). Updated periodically by the controller. |
expired |
boolean | True if the approval request has expired |
observedGeneration |
integer | ObservedGeneration is the most recent generation observed |
reason |
string | Reason for current state (machine-readable) |
message |
string | Human-readable message about current state |
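The relationship between requiredBy, timeRemaining, and expired can be sketched as below. This is an illustrative Python sketch, not the controller's code: `format_time_remaining` is a hypothetical helper, and returning "0s" once the deadline has passed is an assumption. The output style follows the "5m30s" example given for timeRemaining.

```python
from datetime import datetime, timedelta
from typing import Tuple


def format_time_remaining(required_by: datetime, now: datetime) -> Tuple[str, bool]:
    """Return (timeRemaining, expired) for an approval deadline."""
    remaining = required_by - now
    if remaining <= timedelta(0):
        # Deadline reached or passed: nothing remains and the request expired.
        return "0s", True
    total_seconds = int(remaining.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    text = ""
    if hours:
        text += f"{hours}h"
    if hours or minutes:
        text += f"{minutes}m"
    text += f"{seconds}s"
    return text, False
```

A controller would recompute this value on each reconcile, which is why the field is described as updated periodically rather than continuously.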
RemediationPhase¶
Underlying type: string
RemediationPhase represents the orchestration phase of a RemediationRequest. These constants are exported for external consumers (e.g., Gateway) to enable type-safe cross-service integration.
Capitalized phase values per Kubernetes API conventions.
Appears in: - RemediationRequestStatus
Validation: - Enum: [Pending Processing Analyzing AwaitingApproval Executing Verifying Blocked Completed Failed TimedOut Skipped Cancelled]
| Value | Description |
|---|---|
Pending |
PhasePending is the initial state when RemediationRequest is created. |
Processing |
PhaseProcessing indicates SignalProcessing is enriching the signal. |
Analyzing |
PhaseAnalyzing indicates AIAnalysis is determining remediation workflow. |
AwaitingApproval |
PhaseAwaitingApproval indicates human approval is required. |
Executing |
PhaseExecuting indicates WorkflowExecution is running remediation. |
Verifying |
PhaseVerifying indicates remediation succeeded and EffectivenessAssessment is running. Non-terminal: Gateway deduplicates signals while EA assesses remediation effectiveness. RO transitions to Completed when EA reaches a terminal state or VerificationDeadline expires. |
Blocked |
PhaseBlocked indicates remediation cannot proceed due to an external blocking condition. This is a NON-terminal phase (Gateway deduplicates, prevents RR flood). V1.0: Unified blocking for 6 scenarios: - ConsecutiveFailures: After cooldown → Failed - ResourceBusy: When resource available → Proceeds to execute - RecentlyRemediated: After cooldown → Proceeds to execute - ExponentialBackoff: After backoff window → Retries execution - DuplicateInProgress: When original completes → Inherits outcome - UnmanagedResource: Retries until scope label added or RR times out |
Completed |
PhaseCompleted is the terminal success state. |
Failed |
PhaseFailed is the terminal failure state. |
TimedOut |
PhaseTimedOut is the terminal timeout state. |
Skipped |
PhaseSkipped is the terminal state when remediation was not needed. |
Cancelled |
PhaseCancelled is the terminal state when remediation was manually cancelled. Gateway treats this as terminal (allows new RR creation for retry) |
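The terminal and non-terminal split described in the table can be captured in a small lookup. This Python sketch uses the enum values listed above; `is_terminal` is a hypothetical helper, and treating every phase not described as terminal as non-terminal is an inference from the descriptions.

```python
# All values of the RemediationPhase enum, in the documented order.
ALL_PHASES = (
    "Pending", "Processing", "Analyzing", "AwaitingApproval", "Executing",
    "Verifying", "Blocked", "Completed", "Failed", "TimedOut", "Skipped",
    "Cancelled",
)

# Phases whose descriptions call them terminal states.
TERMINAL_PHASES = frozenset(
    {"Completed", "Failed", "TimedOut", "Skipped", "Cancelled"}
)


def is_terminal(phase: str) -> bool:
    """Return True if no further phase transition is expected."""
    if phase not in ALL_PHASES:
        raise ValueError(f"unknown RemediationPhase: {phase!r}")
    return phase in TERMINAL_PHASES
```

Note that Verifying and Blocked are explicitly non-terminal, so a consumer that deduplicates signals keeps doing so while a RemediationRequest is in either phase.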
RemediationRequest¶
RemediationRequest is the Schema for the remediationrequests API. It defines printer columns for operational triage.
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | RemediationRequest |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
RemediationRequestSpec | |
status |
RemediationRequestStatus |
RemediationRequestSpec¶
RemediationRequestSpec defines the desired state of RemediationRequest.
Spec Immutability: RemediationRequest represents an immutable event (signal received, remediation required). Once created (by Gateway or external source), spec cannot be modified to ensure:

- Audit trail integrity (remediation matches original signal)
- No signal metadata tampering during remediation lifecycle
- Consistent signal data across all child CRDs (SignalProcessing, AIAnalysis, WorkflowExecution)

Cancellation: Delete the RemediationRequest CRD (Kubernetes-native pattern).

Status updates: Controllers update .status fields (not affected by spec immutability).
Note: Individual field immutability (e.g., signalFingerprint) is redundant with full spec immutability, but retained for explicit documentation of critical fields.
Appears in: - RemediationRequest
| Field | Type | Description |
|---|---|---|
signalFingerprint |
string | Core signal identification. Unique fingerprint for deduplication (SHA256 of alert/event key fields). This field is immutable and is used for querying all occurrences of the same problem. |
signalName |
string | Human-readable signal name (e.g., "HighMemoryUsage", "CrashLoopBackOff") |
severity |
string | Signal classification. Severity level (external value from the signal provider). Examples: "Sev1", "P0", "critical", "HIGH", "warning". SignalProcessing normalizes the value via Rego policy. |
signalType |
string | Signal type: "alert" (generic signal type; adapter-specific values are deprecated). Used for signal-aware remediation strategies. |
signalSource |
string | Adapter that ingested the signal (e.g., "prometheus-adapter", "k8s-event-adapter") |
targetType |
string | Target system type: "kubernetes", "aws", "azure", "gcp", "datadog". Indicates which infrastructure system the signal targets. |
targetResource |
ResourceIdentifier | TargetResource identifies the Kubernetes resource that triggered this signal. Populated by Gateway from NormalizedSignal.Resource - REQUIRED. Used by SignalProcessing for context enrichment and RO for workflow routing. For Kubernetes signals, this contains Kind, Name, Namespace of the affected resource. |
firingTime |
Time | Temporal data. When the signal first started firing (from the upstream source). |
receivedTime |
Time | When Gateway received the signal |
signalLabels |
object (keys:string, values:string) | Signal labels and annotations extracted from provider-specific data. These are populated by the Gateway Service after parsing providerData. |
signalAnnotations |
object (keys:string, values:string) | |
providerData |
string | Provider-specific fields in raw JSON format. The Gateway adapter populates this based on the signal source; controllers parse it based on targetType/signalType. For Kubernetes (targetType="kubernetes"): {"namespace": "...", "resource": {"kind": "...", "name": "..."}, "alertmanagerURL": "...", ...} For AWS (targetType="aws"): {"region": "...", "accountId": "...", "instanceId": "...", "resourceType": "...", ...} For Datadog (targetType="datadog"): {"monitorId": 123, "host": "...", "tags": [...], "metricQuery": "...", ...} |
originalPayload |
string | Complete original webhook payload, for debugging and audit. Stored as a string to avoid base64 encoding in CEL validation. |
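Because providerData is raw JSON whose shape depends on targetType, a consumer has to branch on targetType before reading it. The Python sketch below shows one way to do that, using only the keys shown in the providerData examples. `parse_provider_data` and the shape of its return value are hypothetical, and real payloads may carry additional keys.

```python
import json


def parse_provider_data(target_type: str, provider_data: str) -> dict:
    """Extract the documented keys from providerData for a given targetType."""
    data = json.loads(provider_data) if provider_data else {}
    if target_type == "kubernetes":
        resource = data.get("resource", {})
        return {
            "namespace": data.get("namespace"),
            "kind": resource.get("kind"),
            "name": resource.get("name"),
        }
    if target_type == "aws":
        return {key: data.get(key) for key in ("region", "accountId", "instanceId")}
    if target_type == "datadog":
        return {key: data.get(key) for key in ("monitorId", "host")}
    # No example is documented for other target types: return the raw object.
    return data
```

Using `.get` throughout keeps the parser tolerant of missing keys, which matters because the examples end in "..." and do not define a closed schema.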
RemediationRequestStatus¶
RemediationRequestStatus defines the observed state of RemediationRequest.
Appears in: - RemediationRequest
| Field | Type | Description |
|---|---|---|
deduplication |
DeduplicationStatus | Deduplication tracks signal occurrence for this remediation. OWNER: Gateway Service (exclusive write access) |
observedGeneration |
integer | ObservedGeneration is the most recent generation observed by the controller. Used to prevent duplicate reconciliations and ensure idempotency. This is the standard pattern for all Kubernetes controllers. |
overallPhase |
RemediationPhase | Phase tracking for orchestration. Uses typed RemediationPhase constants for type safety and cross-service integration. Capitalized phase values per Kubernetes API conventions. |
message |
string | Human-readable message describing current status |
startTime |
Time | Timestamps |
completedAt |
Time | |
processingStartTime |
Time | ProcessingStartTime is when SignalProcessing phase started. Used for per-phase timeout detection (default: 5 minutes). |
analyzingStartTime |
Time | AnalyzingStartTime is when AIAnalysis phase started. Used for per-phase timeout detection (default: 10 minutes). |
executingStartTime |
Time | ExecutingStartTime is when WorkflowExecution phase started. Used for per-phase timeout detection (default: 30 minutes). |
verificationDeadline |
Time | VerificationDeadline is the deadline for the Verifying phase. Computed by RO as EA.Status.ValidityDeadline + 30s buffer. If exceeded, RR transitions to Completed with Outcome "VerificationTimedOut". |
signalProcessingRef |
ObjectReference | References to downstream CRDs |
remediationProcessingRef |
ObjectReference | |
aiAnalysisRef |
ObjectReference | |
workflowExecutionRef |
ObjectReference | |
notificationRequestRefs |
ObjectReference array | NotificationRequestRefs tracks all notification CRDs created for this remediation. Provides audit trail for compliance and instant visibility for debugging. |
effectivenessAssessmentRef |
ObjectReference | EffectivenessAssessmentRef tracks the EffectivenessAssessment CRD created for this remediation. Set by the RO after creating the EA CRD on terminal phase transitions. |
preRemediationSpecHash |
string | PreRemediationSpecHash is the canonical spec hash of the target resource captured by the RO BEFORE launching the remediation workflow. This enables the EM to compare pre vs post-remediation state without querying DataStorage audit events. Set once by the RO during the transition to WorkflowExecution phase; immutable after. |
approvalNotificationSent |
boolean | Approval notification tracking. Prevents duplicate notifications when AIAnalysis requires approval. |
skipReason |
SkipReason | SkipReason indicates why this remediation was skipped. Only set when OverallPhase = Skipped or Failed. |
skipMessage |
string | SkipMessage provides human-readable details about why remediation was skipped. Examples: - "Same workflow executed recently. Cooldown: 3m15s remaining" - "Another workflow is running on target: wfe-abc123" - "Backoff active. Next allowed: 2025-12-15T10:30:00Z" Only set when OverallPhase = "Skipped" or "Failed". |
blockingWorkflowExecution |
string | BlockingWorkflowExecution references the WorkflowExecution causing the block. Set for block reasons: ResourceBusy, RecentlyRemediated, ExponentialBackoff. Nil for: ConsecutiveFailures, DuplicateInProgress. Enables operators to investigate the blocking WFE for troubleshooting. |
duplicateOf |
string | DuplicateOf references the parent RemediationRequest that this is a duplicate of. V1.0: set when OverallPhase = "Blocked" with BlockReason = "DuplicateInProgress". Old behavior: set when OverallPhase = "Skipped" due to resource lock deduplication. |
duplicateCount |
integer | DuplicateCount tracks the number of duplicate remediations that were skipped because this RR's workflow was already executing (resource lock). Only populated on parent RRs that have duplicates. |
duplicateRefs |
string array | DuplicateRefs lists the names of RemediationRequests that were skipped because they targeted the same resource as this RR. Only populated on parent RRs that have duplicates. |
blockReason |
BlockReason | BlockReason indicates why this remediation is blocked (non-terminal). Valid values: - "ConsecutiveFailures": Max consecutive failures reached, in cooldown - "ResourceBusy": Another workflow is using the target resource - "RecentlyRemediated": Target recently remediated, cooldown active - "ExponentialBackoff": Pre-execution failures, backoff window active - "DuplicateInProgress": Duplicate of an active remediation Only set when OverallPhase = "Blocked". |
blockMessage |
string | BlockMessage provides human-readable details about why remediation is blocked. Examples: - "Another workflow is running on target deployment/my-app: wfe-abc123" - "Recently remediated. Cooldown: 3m15s remaining" - "Backoff active. Next retry: 2025-12-15T10:30:00Z" - "Duplicate of active remediation rr-original-abc123" - "3 consecutive failures. Cooldown expires: 2025-12-15T11:00:00Z" Only set when OverallPhase = "Blocked". |
blockedUntil |
Time | BlockedUntil indicates when blocking expires (time-based blocks). Set for: ConsecutiveFailures, RecentlyRemediated, ExponentialBackoff. Nil for: ResourceBusy, DuplicateInProgress (event-based, cleared when the condition resolves). After this time passes, the RR will retry or transition to Failed (for ConsecutiveFailures). |
nextAllowedExecution |
Time | NextAllowedExecution indicates when this RR can be retried after exponential backoff. Set when RR fails due to pre-execution failures (infrastructure, validation, etc.). Implements progressive delay: 1m, 2m, 4m, 8m, capped at 10m. Formula: min(Base × 2^(failures-1), Max) Nil means no exponential backoff is active. |
consecutiveFailureCount |
integer | ConsecutiveFailureCount tracks how many times this fingerprint has failed consecutively. Updated by RO when RR transitions to Failed phase. Reset to 0 when RR completes successfully. |
failurePhase |
FailurePhase | FailurePhase indicates which orchestration phase failed. Only set when OverallPhase = Failed. |
failureReason |
string | FailureReason provides a human-readable reason for the failure. Only set when OverallPhase = "Failed". |
requiresManualReview |
boolean | RequiresManualReview indicates that this remediation cannot proceed automatically and requires operator intervention. Set when: - WE skip reason is "ExhaustedRetries" (5+ consecutive pre-execution failures) - WE skip reason is "PreviousExecutionFailed" (execution failure, cluster state unknown) - AIAnalysis WorkflowResolutionFailed with LowConfidence or WorkflowNotFound |
outcome |
string | Outcome indicates the remediation result when completed. Values: - "Remediated": Workflow executed successfully - "NoActionRequired": AIAnalysis determined no action needed (problem self-resolved) - "ManualReviewRequired": Requires operator intervention - "VerificationTimedOut": EA assessment did not complete within deadline |
timeoutPhase |
RemediationPhase | TimeoutPhase indicates which orchestration phase timed out. Only set when OverallPhase = TimedOut. |
timeoutTime |
Time | TimeoutTime records when the timeout occurred. Only set when OverallPhase = "TimedOut". |
retentionExpiryTime |
Time | RetentionExpiryTime indicates when this CRD should be cleaned up (24 hours after completion) |
notificationStatus |
string | NotificationStatus tracks the delivery status of notification(s) for this remediation. Values: "Pending", "InProgress", "Sent", "Failed", "Cancelled". Status mapping from NotificationRequest.Status.Phase: - NotificationRequest Pending → "Pending" - NotificationRequest Sending → "InProgress" - NotificationRequest Sent → "Sent" - NotificationRequest Failed → "Failed" - NotificationRequest deleted by user → "Cancelled" For bulk notifications, this reflects the status of the consolidated notification. |
conditions |
Condition array | Conditions represent observations of RemediationRequest state. Standard condition types: - "NotificationDelivered": True if notification sent successfully, False if cancelled/failed - Reason "DeliverySucceeded": Notification sent - Reason "UserCancelled": User deleted NotificationRequest before delivery - Reason "DeliveryFailed": NotificationRequest failed to deliver Conditions follow Kubernetes API conventions (KEP-1623). |
timeoutConfig |
TimeoutConfig | TimeoutConfig provides operational timeout overrides for this remediation. OWNER: Remediation Orchestrator (sets defaults on first reconcile) MUTABLE BY: Operators (can adjust mid-remediation via kubectl edit) |
lastModifiedBy |
string | LastModifiedBy tracks the last operator who modified this RR's status. Populated by RemediationRequest mutating webhook. |
lastModifiedAt |
Time | LastModifiedAt tracks when the last status modification occurred. Populated by RemediationRequest mutating webhook. |
currentProcessingRef |
ObjectReference | CurrentProcessingRef references the current SignalProcessing CRD |
selectedWorkflowRef |
WorkflowReference | SelectedWorkflowRef captures the workflow selected by AI for this remediation. Populated from workflowexecution.selection.completed audit event. |
executionRef |
ObjectReference | ExecutionRef references the WorkflowExecution CRD for this remediation. Populated from workflowexecution.execution.started audit event. |
remediationTarget |
ResourceIdentifier | RemediationTarget identifies the Kubernetes resource the LLM determined should be remediated. Populated from AIAnalysis.Status.RootCauseAnalysis.AffectedResource. May differ from Spec.TargetResource (e.g., Deployment vs Pod). |
targetDisplay |
string | TargetDisplay is the Kubernetes-idiomatic Kind/Name of the RCA target (e.g., "Deployment/web-frontend"). Populated when RemediationTarget is set. |
confidence |
string | Confidence is the AI analysis confidence score as a display string (e.g., "0.97"). Populated from AIAnalysis.SelectedWorkflow.Confidence. |
workflowDisplayName |
string | WorkflowDisplayName is the human-readable workflow identifier (e.g., "GitRevertCommit:git-revert-v2"). Populated from AIAnalysis.SelectedWorkflow. |
signalTargetDisplay |
string | SignalTargetDisplay is the Kubernetes-idiomatic Kind/Name of the signal target (e.g., "Pod/web-frontend-cdbdbc4f8-6kn6j"). Populated from Spec.TargetResource. |
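Two of the time-based fields above are defined by simple arithmetic: nextAllowedExecution uses min(Base × 2^(failures-1), Max), and verificationDeadline is the EA validity deadline plus a 30s buffer. The Python sketch below works through both. Base = 1m and Max = 10m are inferred from the documented 1m, 2m, 4m, 8m progression capped at 10m; the function names are hypothetical, and the exponent clamp is a safety detail of this sketch only.

```python
from datetime import datetime, timedelta

BACKOFF_BASE = timedelta(minutes=1)   # inferred from the documented series
BACKOFF_MAX = timedelta(minutes=10)   # documented cap
VERIFICATION_BUFFER = timedelta(seconds=30)


def backoff_delay(failures: int) -> timedelta:
    """Delay before the next retry: min(Base * 2^(failures-1), Max)."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    # Clamp the exponent so very large counts cannot overflow timedelta;
    # the result is capped at BACKOFF_MAX long before this limit matters.
    exponent = min(failures - 1, 30)
    return min(BACKOFF_BASE * (2 ** exponent), BACKOFF_MAX)


def next_allowed_execution(failed_at: datetime, failures: int) -> datetime:
    """Earliest time a retry is allowed after a pre-execution failure."""
    return failed_at + backoff_delay(failures)


def verification_deadline(ea_validity_deadline: datetime) -> datetime:
    """EA.Status.ValidityDeadline plus the 30s buffer."""
    return ea_validity_deadline + VERIFICATION_BUFFER
```

With these constants the fifth consecutive failure would compute 16m, so the cap applies and the delay stays at 10m from that point on.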
RemediationTarget¶
RemediationTarget identifies the Kubernetes resource identified by the LLM as the actual target for remediation. This may differ from the signal's source resource (e.g., the signal comes from a Pod, but the Deployment should be patched).
Appears in: - RootCauseAnalysis
| Field | Type | Description |
|---|---|---|
kind |
string | Kind is the Kubernetes resource kind (e.g., "Deployment", "StatefulSet", "DaemonSet") |
name |
string | Name is the resource name |
namespace |
string | Namespace is the resource namespace. Empty for cluster-scoped resources (e.g., Node, PersistentVolume). |
RemediationWorkflow¶
RemediationWorkflow is the Schema for the remediationworkflows API. Kubernetes-native workflow schema definition.
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | RemediationWorkflow |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
RemediationWorkflowSpec | |
status |
RemediationWorkflowStatus |
RemediationWorkflowDependencies¶
RemediationWorkflowDependencies declares infrastructure resources
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
secrets |
RemediationWorkflowResourceDependency array | |
configMaps |
RemediationWorkflowResourceDependency array |
RemediationWorkflowDescription¶
RemediationWorkflowDescription provides structured information about a workflow
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
what |
string | What describes what this workflow concretely does |
whenToUse |
string | WhenToUse describes conditions under which this workflow is appropriate |
whenNotToUse |
string | WhenNotToUse describes specific exclusion conditions |
preconditions |
string | Preconditions describes conditions that must be verified through investigation |
RemediationWorkflowExecution¶
RemediationWorkflowExecution contains execution engine configuration
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
engine |
string | Engine is the execution engine type |
bundle |
string | Bundle is the execution bundle or container image reference |
bundleDigest |
string | BundleDigest is the digest of the execution bundle |
engineConfig |
JSON | EngineConfig holds engine-specific configuration |
serviceAccountName |
string | ServiceAccountName is the pre-existing ServiceAccount for the execution resource (Job, PipelineRun, or Ansible TokenRequest). Operators pre-create SAs with appropriate RBAC in the execution namespace. If absent, K8s assigns the namespace's default SA (Job/Tekton) or the Ansible executor uses the controller's in-cluster credentials (#500 fallback). |
RemediationWorkflowLabels¶
RemediationWorkflowLabels contains mandatory matching/filtering criteria
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
severity |
string array | Severity is the severity level(s) |
environment |
string array | Environment is the target environment(s) |
component |
string | Component is the Kubernetes resource type |
priority |
string | Priority is the business priority level |
RemediationWorkflowMaintainer¶
RemediationWorkflowMaintainer contains maintainer contact information
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
name |
string | |
email |
string |
RemediationWorkflowParameter¶
RemediationWorkflowParameter defines a workflow input parameter
Appears in: - RemediationWorkflowSpec
| Field | Type | Description |
|---|---|---|
name |
string | |
type |
string | |
required |
boolean | |
description |
string | |
enum |
string array | |
pattern |
string | |
minimum |
float | |
maximum |
float | |
default |
JSON | |
dependsOn |
string array |
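The field descriptions above are empty in the generated reference, so the following is only a sketch of how a client might check one parameter value against a definition. It assumes JSON-Schema-like semantics for `required`, `enum`, `pattern`, `minimum`, and `maximum`, and assumes values arrive as strings (as in the `parameters` maps elsewhere in this reference). The helper name `validate_parameter` is hypothetical.

```python
import re


def validate_parameter(definition, value):
    """Return a list of problems for one value; an empty list means it passed.

    Assumed semantics (not confirmed by the reference): enum is an allow-list,
    pattern must match the whole value, minimum/maximum are inclusive bounds.
    """
    name = definition["name"]
    if value is None:
        return [f"{name}: required"] if definition.get("required") else []

    problems = []
    allowed = definition.get("enum")
    if allowed and str(value) not in allowed:
        problems.append(f"{name}: not one of {allowed}")

    pattern = definition.get("pattern")
    if pattern and not re.fullmatch(pattern, str(value)):
        problems.append(f"{name}: does not match {pattern}")

    low, high = definition.get("minimum"), definition.get("maximum")
    if low is not None or high is not None:
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{name}: not a number")
        else:
            if low is not None and number < low:
                problems.append(f"{name}: below minimum {low}")
            if high is not None and number > high:
                problems.append(f"{name}: above maximum {high}")
    return problems
```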
RemediationWorkflowResourceDependency¶
RemediationWorkflowResourceDependency identifies a Kubernetes resource by name
Appears in: - RemediationWorkflowDependencies
| Field | Type | Description |
|---|---|---|
name |
string |
RemediationWorkflowSpec¶
RemediationWorkflowSpec defines the desired state of RemediationWorkflow. Maps to the spec content of a workflow-schema.yaml file. The workflow name is derived from the CRD's metadata.name (not duplicated in spec).
Appears in: - RemediationWorkflow
| Field | Type | Description |
|---|---|---|
version |
string | Version is the semantic version (e.g., "1.0.0") |
description |
RemediationWorkflowDescription | Description is a structured description for LLM and operator consumption |
actionType |
string | ActionType is the action type from the taxonomy (PascalCase). |
labels |
RemediationWorkflowLabels | Labels contains mandatory matching/filtering criteria for discovery |
customLabels |
object (keys:string, values:string) | CustomLabels contains operator-defined key-value labels for additional filtering |
detectedLabels |
JSON | DetectedLabels contains author-declared infrastructure requirements |
execution |
RemediationWorkflowExecution | Execution contains execution engine configuration |
dependencies |
RemediationWorkflowDependencies | Dependencies declares infrastructure resources required by the workflow |
maintainers |
RemediationWorkflowMaintainer array | Maintainers is optional maintainer information |
parameters |
RemediationWorkflowParameter array | Parameters defines the workflow input parameters |
rollbackParameters |
RemediationWorkflowParameter array | RollbackParameters defines parameters needed for rollback |
RemediationWorkflowStatus¶
RemediationWorkflowStatus defines the observed state of RemediationWorkflow
Appears in: - RemediationWorkflow
| Field | Type | Description |
|---|---|---|
workflowId |
string | WorkflowID is the UUID assigned by Data Storage upon registration |
catalogStatus |
CatalogStatus | CatalogStatus reflects the DS catalog lifecycle state. |
registeredBy |
string | RegisteredBy is the identity of the registrant |
registeredAt |
Time | RegisteredAt is the timestamp of initial registration |
previouslyExisted |
boolean | PreviouslyExisted indicates if this workflow was re-registered after deletion |
ResourceIdentifier¶
ResourceIdentifier identifies the target resource for remediation.
Appears in: - SignalData
| Field | Type | Description |
|---|---|---|
kind |
string | Resource kind (e.g., "Pod", "Deployment", "StatefulSet") |
name |
string | Resource name |
namespace |
string | Resource namespace. Empty for cluster-scoped resources (e.g., Node, PersistentVolume). |
RetryPolicy¶
RetryPolicy defines retry behavior for notification delivery
Appears in: - NotificationRequestSpec
| Field | Type | Description |
|---|---|---|
maxAttempts |
integer | Maximum number of delivery attempts |
initialBackoffSeconds |
integer | Initial backoff duration in seconds |
backoffMultiplier |
integer | Backoff multiplier (exponential backoff) |
maxBackoffSeconds |
integer | Maximum backoff duration in seconds |
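As a rough illustration of how these four fields interact, the sketch below computes the delay before each retry. The reference only states that backoff is exponential; the exact formula (start at `initialBackoffSeconds`, multiply by `backoffMultiplier` after each attempt, cap at `maxBackoffSeconds`) and the helper name `backoff_schedule` are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetryPolicy:
    max_attempts: int
    initial_backoff_seconds: int
    backoff_multiplier: int
    max_backoff_seconds: int


def backoff_schedule(policy: RetryPolicy) -> List[int]:
    """Delay in seconds before each retry (max_attempts - 1 retries in total)."""
    delays = []
    delay = policy.initial_backoff_seconds
    for _ in range(max(policy.max_attempts - 1, 0)):
        # Cap every delay at the configured maximum.
        delays.append(min(delay, policy.max_backoff_seconds))
        delay *= policy.backoff_multiplier
    return delays
```

Under these assumptions, `backoff_schedule(RetryPolicy(5, 30, 2, 120))` yields `[30, 60, 120, 120]`: four retries, with the last delay capped at the maximum.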
ReviewContext¶
ReviewContext captures manual review details.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
reason |
string | Reason is the high-level failure reason (e.g., "WorkflowResolutionFailed"). |
subReason |
string | SubReason provides granular detail (e.g., "WorkflowNotFound"). |
humanReviewReason |
string | HumanReviewReason from HAPI when needs_human_review=true. |
rootCauseAnalysis |
string | RootCauseAnalysis from AIAnalysis if available. |
ReviewSourceType¶
Underlying type: string
Appears in: - NotificationRequestSpec
Validation: - Enum: [AIAnalysis WorkflowExecution]
| Value | Description |
|---|---|
AIAnalysis |
|
WorkflowExecution |
RootCauseAnalysis¶
RootCauseAnalysis contains detailed RCA results
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
summary |
string | Brief summary of root cause |
severity |
string | Severity determined by RCA. Aligned with HAPI/workflow catalog severity levels (critical, high, medium, low, unknown). |
signalType |
string | Signal type determined by RCA (may differ from input) |
contributingFactors |
string array | Contributing factors |
remediationTarget |
RemediationTarget | RemediationTarget identifies the actual resource the LLM determined should be remediated. The LLM may identify a higher-level resource (e.g., Deployment) rather than the Pod that generated the signal. The WFE creator should prefer this over the RR's TargetResource when available to ensure the correct resource is patched. |
SelectedWorkflow¶
SelectedWorkflow contains the AI-selected workflow for execution. It is the output format that RO uses to create a WorkflowExecution.
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
workflowId |
string | Workflow identifier (catalog lookup key) |
actionType |
string | Action type from taxonomy (e.g., ScaleReplicas, RestartPod). Propagated from HAPI three-step discovery protocol to RO audit events. |
version |
string | Workflow version |
executionBundle |
string | Execution bundle OCI reference (digest-pinned) - resolved by HolmesGPT-API |
executionBundleDigest |
string | Execution bundle digest for audit trail |
confidence |
float | Confidence score (0.0-1.0) |
parameters |
object (keys:string, values:string) | Workflow parameters (UPPER_SNAKE_CASE keys) |
rationale |
string | Rationale explaining why this workflow was selected |
executionEngine |
string | ExecutionEngine specifies the backend engine for workflow execution. Populated from HolmesGPT-API workflow recommendation. When empty, defaults to "tekton" for backwards compatibility. |
engineConfig |
JSON | EngineConfig holds engine-specific configuration. For ansible: {"playbookPath": "...", "jobTemplateName": "...", "inventoryName": "..."}. |
serviceAccountName |
string | ServiceAccountName is the pre-existing ServiceAccount for the execution resource (Job, PipelineRun, or Ansible TokenRequest). Operators pre-create SAs with appropriate RBAC in the execution namespace. If absent, K8s assigns the namespace's default SA (Job/Tekton) or the Ansible executor uses the controller's in-cluster credentials (#500 fallback). |
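The constraints stated in the table (confidence between 0.0 and 1.0, UPPER_SNAKE_CASE parameter keys, `executionEngine` defaulting to "tekton" when empty) can be checked on the consumer side with something like the sketch below. The function names and the exact regular expression used for UPPER_SNAKE_CASE are assumptions for illustration, not part of the API.

```python
import re

# Assumed definition of UPPER_SNAKE_CASE: uppercase letters/digits joined by single underscores.
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")


def check_selected_workflow(selected):
    """Return a list of problems found in a SelectedWorkflow-shaped dict."""
    problems = []
    confidence = selected.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    for key in selected.get("parameters", {}):
        if not UPPER_SNAKE.match(key):
            problems.append(f"parameter key {key!r} is not UPPER_SNAKE_CASE")
    return problems


def effective_engine(selected):
    """Apply the documented default: an empty executionEngine means "tekton"."""
    return selected.get("executionEngine") or "tekton"
```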
SignalContextInput¶
SignalContextInput contains enriched signal context from SignalProcessing. Structured types replace the map[string]string anti-pattern.
Appears in: - AnalysisRequest
| Field | Type | Description |
|---|---|---|
fingerprint |
string | Signal fingerprint for correlation |
severity |
string | Signal severity: critical, high, medium, low, unknown (normalized by SignalProcessing Rego) |
signalName |
string | Signal name (e.g., OOMKilled, CrashLoopBackOff). Normalized by SignalProcessing: proactive names are mapped to base names. |
signalMode |
string | SignalMode indicates whether this is a reactive or proactive signal (see Proactive Signal Mode Prompt Strategy). Copied from SignalProcessing status by RemediationOrchestrator. Used by HAPI to switch the investigation prompt (RCA vs. predict & prevent). |
environment |
string | Environment classification. Examples: "production", "staging", "development", "qa-eu", "canary" |
businessPriority |
string | Business priority. Best practice examples: P0 (critical), P1 (high), P2 (normal), P3 (low) |
targetResource |
TargetResource | Target resource identification |
enrichmentResults |
EnrichmentResults | Complete enrichment results from SignalProcessing |
SignalData¶
SignalData contains all signal information copied from RemediationRequest. This makes SignalProcessing self-contained for processing.
Appears in: - SignalProcessingSpec
| Field | Type | Description |
|---|---|---|
fingerprint |
string | Unique fingerprint for deduplication (SHA256 of signal key fields) |
name |
string | Human-readable signal name (e.g., "HighMemoryUsage", "CrashLoopBackOff") |
severity |
string | Severity level (external/raw value from the monitoring system). No enum restriction, which allows external severity schemes (Sev1-4, P0-P4, etc.). The normalized severity is stored in Status.Severity. |
type |
string | Signal type: "alert" (generic signal type; adapter-specific values like "prometheus-alert" or "kubernetes-event" are deprecated) |
source |
string | Adapter that ingested the signal |
targetType |
string | Target system type. V2.0 PLACEHOLDER: Currently only "kubernetes" is supported by the enricher. Non-kubernetes values are accepted by validation but enrichment will run in degraded mode. |
targetResource |
ResourceIdentifier | Target resource identification |
labels |
object (keys:string, values:string) | Signal labels extracted from provider-specific data |
annotations |
object (keys:string, values:string) | Signal annotations extracted from provider-specific data |
firingTime |
Time | When the signal first started firing |
receivedTime |
Time | When Gateway received the signal |
providerData |
string | Provider-specific fields in raw JSON format |
SignalProcessing¶
SignalProcessing is the Schema for the signalprocessings API.
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | SignalProcessing |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
SignalProcessingSpec | |
status |
SignalProcessingStatus |
SignalProcessingPhase¶
Underlying type: string
SignalProcessingPhase represents the current phase of SignalProcessing reconciliation (the phase state machine). Phase values are capitalized per Kubernetes API conventions.
Appears in: - SignalProcessingStatus
Validation: - Enum: [Pending Enriching Classifying Categorizing Completed Failed]
| Value | Description |
|---|---|
Pending |
PhasePending is the initial state when SignalProcessing is created. |
Enriching |
PhaseEnriching is when K8s context enrichment is in progress. |
Classifying |
PhaseClassifying is when environment/priority classification is in progress. |
Categorizing |
PhaseCategorizing is when business categorization is in progress. |
Completed |
PhaseCompleted is the terminal success state. |
Failed |
PhaseFailed is the terminal error state. |
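The reference names the phases and marks `Completed` and `Failed` as terminal, but it does not spell out the allowed transitions. The sketch below assumes a linear progression in enum order, with `Failed` reachable from any non-terminal phase; both the ordering and the helper names are assumptions.

```python
# Assumed happy-path order, taken from the enum order in the reference.
PHASE_ORDER = ["Pending", "Enriching", "Classifying", "Categorizing", "Completed"]
TERMINAL_PHASES = {"Completed", "Failed"}


def is_terminal(phase):
    """Completed and Failed are documented as terminal states."""
    return phase in TERMINAL_PHASES


def can_transition(current, target):
    """True if target is the next phase, or Failed, from a non-terminal phase."""
    if current not in PHASE_ORDER or is_terminal(current):
        return False
    if target == "Failed":
        return True
    if target not in PHASE_ORDER:
        return False
    return PHASE_ORDER.index(target) == PHASE_ORDER.index(current) + 1
```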
SignalProcessingSpec¶
SignalProcessingSpec defines the desired state of SignalProcessing.
Spec Immutability: SignalProcessing represents an immutable event (signal enrichment). Once created by RemediationOrchestrator, the spec cannot be modified. This ensures:
- Audit trail integrity (processed signal matches original signal)
- No signal data tampering during enrichment
- Consistent context passed to AIAnalysis

To reprocess a signal, delete and recreate the SignalProcessing resource.
Appears in: - SignalProcessing
| Field | Type | Description |
|---|---|---|
remediationRequestRef |
ObjectReference | Reference to parent RemediationRequest |
signal |
SignalData | Signal data (copied from RemediationRequest for processing) |
enrichmentConfig |
EnrichmentConfig | Configuration for processing |
SignalProcessingStatus¶
SignalProcessingStatus defines the observed state of SignalProcessing.
Appears in: - SignalProcessing
| Field | Type | Description |
|---|---|---|
observedGeneration |
integer | ObservedGeneration is the most recent generation observed by the controller. Used to prevent duplicate reconciliations and ensure idempotency. This is the standard pattern for all Kubernetes controllers. |
phase |
SignalProcessingPhase | Phase: Pending, Enriching, Classifying, Categorizing, Completed, Failed |
startTime |
Time | Processing timestamps |
completionTime |
Time | |
kubernetesContext |
KubernetesContext | Enrichment results |
environmentClassification |
EnvironmentClassification | Categorization results |
priorityAssignment |
PriorityAssignment | |
businessClassification |
BusinessClassification | |
severity |
string | Normalized severity determined by Rego policy: "critical", "high", "medium", "low", or "unknown". Aligned with HAPI/workflow catalog severity levels for consistency across the platform. Enables downstream services (AIAnalysis, RemediationOrchestrator, Notification) to interpret alert urgency without understanding external severity schemes. |
policyHash |
string | PolicyHash is the SHA256 hash of the Rego policy used for severity determination. Provides an audit trail and policy version tracking for compliance requirements. Expected format: 64-character hexadecimal string. |
signalMode |
string | SignalMode indicates whether this is a reactive or proactive signal (see Proactive Signal Mode Classification and Prompt Strategy). Set during the Classifying phase alongside severity, environment, and priority. All signals MUST be classified; "reactive" is the default for unmapped types. |
signalName |
string | SignalName is the normalized signal name after proactive-to-base mapping (Signal Name Normalization). For proactive signals (e.g., "PredictedOOMKill"), this is the base name (e.g., "OOMKilled"). For reactive signals, this matches Spec.Signal.Name unchanged. This is the AUTHORITATIVE signal name for all downstream consumers (RO, AA, HAPI). |
sourceSignalName |
string | SourceSignalName preserves the pre-normalization signal name for the audit trail (SOC2 CC7.4). Only populated for proactive signals (e.g., "PredictedOOMKill"). Empty for reactive signals. |
conditions |
Condition array | Conditions for detailed status |
error |
string | Error information |
consecutiveFailures |
integer | ConsecutiveFailures tracks the number of consecutive transient failures. Used with shared backoff for exponential retry delays. Reset to 0 on successful phase transition. |
lastFailureTime |
Time | LastFailureTime records when the last failure occurred. Used to determine if enough time has passed for retry. |
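The relationship between `signalMode`, `signalName`, and `sourceSignalName` can be sketched as follows. Only the `PredictedOOMKill` to `OOMKilled` mapping comes from the reference; the mapping table as a Python dict, the function name `classify_signal`, and the literal value "proactive" are assumptions for illustration.

```python
# Proactive-to-base mapping; only this single pair appears in the reference.
PROACTIVE_TO_BASE = {"PredictedOOMKill": "OOMKilled"}


def classify_signal(spec_signal_name):
    """Derive the three status fields from Spec.Signal.Name."""
    base_name = PROACTIVE_TO_BASE.get(spec_signal_name)
    if base_name is None:
        # Unmapped types default to reactive; the name passes through unchanged
        # and sourceSignalName stays empty.
        return {
            "signalMode": "reactive",
            "signalName": spec_signal_name,
            "sourceSignalName": "",
        }
    # Proactive: downstream consumers see the base name, the original is kept for audit.
    return {
        "signalMode": "proactive",
        "signalName": base_name,
        "sourceSignalName": spec_signal_name,
    }
```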
SkipReason¶
Underlying type: string
SkipReason represents the reason why a RemediationRequest was skipped.
Appears in: - RemediationRequestStatus
Validation: - Enum: [RecentlyRemediated ResourceBusy ExhaustedRetries PreviousExecutionFailed]
| Value | Description |
|---|---|
RecentlyRemediated |
|
ResourceBusy |
|
ExhaustedRetries |
|
PreviousExecutionFailed |
TargetContext¶
TargetContext captures target resource context.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
targetResource |
string | TargetResource in "Kind/Name" format. |
TargetResource¶
TargetResource identifies a Kubernetes resource by kind, name, and namespace.
Appears in: - EffectivenessAssessmentSpec
| Field | Type | Description |
|---|---|---|
kind |
string | Kind is the Kubernetes resource kind (e.g., "Deployment", "StatefulSet"). |
name |
string | Name is the resource name. |
namespace |
string | Namespace is the resource namespace. Empty for cluster-scoped resources (e.g., Node, PersistentVolume). |
TimeoutConfig¶
TimeoutConfig provides fine-grained timeout configuration for remediations. Supports both global workflow timeout and per-phase timeouts for granular control.
Appears in: - RemediationRequestStatus
| Field | Type | Description |
|---|---|---|
global |
Duration | Global timeout for entire remediation workflow. Overrides controller-level default (1 hour). |
processing |
Duration | Processing phase timeout (SignalProcessing enrichment). Overrides controller-level default (5 minutes). |
analyzing |
Duration | Analyzing phase timeout (AIAnalysis investigation). Overrides controller-level default (10 minutes). |
executing |
Duration | Executing phase timeout (WorkflowExecution remediation). Overrides controller-level default (30 minutes). |
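Because each field overrides a controller-level default, the effective timeouts can be thought of as a merge of the defaults listed above with whatever fields are set. The sketch below illustrates that merge; the function name `effective_timeouts` and the dict representation are assumptions, while the default values are the ones stated in the table.

```python
from datetime import timedelta

# Controller-level defaults as stated in the field descriptions.
DEFAULT_TIMEOUTS = {
    "global": timedelta(hours=1),
    "processing": timedelta(minutes=5),
    "analyzing": timedelta(minutes=10),
    "executing": timedelta(minutes=30),
}


def effective_timeouts(overrides):
    """Merge per-request overrides onto the defaults; unset (None) fields keep the default."""
    resolved = dict(DEFAULT_TIMEOUTS)
    for field, value in overrides.items():
        if field not in DEFAULT_TIMEOUTS:
            raise KeyError(f"unknown timeout field: {field}")
        if value is not None:
            resolved[field] = value
    return resolved
```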
ValidationAttempt¶
ValidationAttempt contains details of a single HAPI validation attempt. HAPI retries up to 3 times with LLM self-correction. Each attempt feeds validation errors back to the LLM for correction.
Appears in: - AIAnalysisStatus
| Field | Type | Description |
|---|---|---|
attempt |
integer | Attempt number (1, 2, or 3) |
workflowId |
string | WorkflowID that the LLM tried in this attempt |
isValid |
boolean | Whether validation passed (always false for failed attempts in history) |
errors |
string array | Validation errors encountered |
timestamp |
Time | When this attempt occurred |
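The retry-with-feedback loop described above could look roughly like the following. This is a sketch, not HAPI's implementation: `propose` and `validate` are hypothetical callables standing in for the LLM call and the catalog validation, and only failed attempts are recorded, matching the note that `isValid` is always false in the history.

```python
from datetime import datetime, timezone

MAX_ATTEMPTS = 3  # the reference states up to 3 attempts


def run_validation(propose, validate):
    """Return (workflow_id or None, history of failed ValidationAttempt-shaped dicts).

    propose(feedback) -> workflow ID; feedback is the previous attempt's errors.
    validate(workflow_id) -> list of error strings (empty when valid).
    """
    history = []
    feedback = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        workflow_id = propose(feedback)
        errors = validate(workflow_id)
        if not errors:
            return workflow_id, history
        history.append({
            "attempt": attempt,
            "workflowId": workflow_id,
            "isValid": False,
            "errors": errors,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        feedback = errors  # fed back to the next proposal for self-correction
    return None, history
```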
VerificationContext¶
VerificationContext captures EA verification results for completion notifications. Enables programmatic routing (e.g., inconclusive outcomes -> escalation channel).
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
assessed |
boolean | Assessed indicates whether verification was performed at all. |
outcome |
string | Outcome is the high-level result: "passed", "completed", "partial", "inconclusive", "unavailable". "completed" indicates all components were assessed but some scores < 1.0. |
reason |
string | Reason maps to EffectivenessAssessment.Status.AssessmentReason. |
summary |
string | Summary is the operator-facing human-readable message. |
degraded |
boolean | Degraded indicates that the EA was unable to reliably compare pre- and post-remediation state because hash capture failed. Routing rules can match on this to escalate degraded notifications. |
degradedReason |
string | DegradedReason describes why the EA is degraded (e.g., RBAC Forbidden for the target CRD). Empty when Degraded is false. |
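As an illustration of the programmatic routing mentioned above, a consumer might pick a channel from `outcome` and `degraded` like this. The channel names "escalation" and "default" and the function name are hypothetical; the actual routing rule syntax is not part of this reference.

```python
def route_notification(verification):
    """Pick a channel name from a VerificationContext-shaped dict."""
    # Escalate when the EA could not compare state reliably, or the outcome is inconclusive.
    if verification.get("degraded") or verification.get("outcome") == "inconclusive":
        return "escalation"
    return "default"
```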
WorkflowContext¶
WorkflowContext captures selected workflow details.
Appears in: - NotificationContext
| Field | Type | Description |
|---|---|---|
selectedWorkflow |
string | SelectedWorkflow is the ID of the workflow selected by AI. |
confidence |
string | Confidence is the AI confidence score (as string, e.g. "0.95"). |
workflowId |
string | WorkflowID is the ID of the executed workflow. |
executionEngine |
string | ExecutionEngine is the engine used to execute the workflow. |
WorkflowExecution¶
WorkflowExecution is the Schema for the workflowexecutions API
| Field | Type | Description |
|---|---|---|
apiVersion |
string | kubernaut.ai/v1alpha1 |
kind |
string | WorkflowExecution |
metadata |
ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec |
WorkflowExecutionSpec | |
status |
WorkflowExecutionStatus |
WorkflowExecutionSpec¶
WorkflowExecutionSpec defines the desired state of WorkflowExecution
Spec Immutability: WorkflowExecution represents an immutable event (workflow execution attempt). Once created by RemediationOrchestrator, the spec cannot be modified. This ensures:
- Audit trail integrity (executed spec matches approved spec)
- No parameter tampering after HAPI validation
- No target resource changes after routing decisions

To change execution parameters, delete and recreate the WorkflowExecution.
Appears in: - WorkflowExecution
| Field | Type | Description |
|---|---|---|
remediationRequestRef |
ObjectReference | RemediationRequestRef references the parent RemediationRequest CRD |
workflowRef |
WorkflowRef | WorkflowRef contains the workflow catalog reference. Resolved from AIAnalysis.Status.SelectedWorkflow by RemediationOrchestrator. |
targetResource |
string | TargetResource identifies the K8s resource being remediated. Used for resource locking, which prevents parallel workflows on the same target. Format: "namespace/kind/name" for namespaced resources, "kind/name" for cluster-scoped resources. Examples: "payment/deployment/payment-api", "node/worker-node-1". |
parameters |
object (keys:string, values:string) | Parameters from LLM selection. Keys are UPPER_SNAKE_CASE for Tekton PipelineRun params. |
confidence |
float | Confidence score from LLM (for audit trail) |
rationale |
string | Rationale from LLM (for audit trail) |
serviceAccountName |
string | ServiceAccountName is the pre-existing ServiceAccount for the execution resource (Job, PipelineRun, or Ansible TokenRequest). Operators pre-create SAs with appropriate RBAC in the execution namespace. If absent, K8s assigns the namespace's default SA (Job/Tekton) or the Ansible executor falls back to the controller's in-cluster credentials. |
executionConfig |
ExecutionConfig | ExecutionConfig contains minimal execution settings |
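The `targetResource` string packs the namespace, kind, and name into a single slash-separated value. A minimal parser for the two documented shapes might look like this; the `ParsedTarget` type and function name are hypothetical, and the sketch assumes none of the segments contain a slash.

```python
from typing import NamedTuple, Optional


class ParsedTarget(NamedTuple):
    namespace: Optional[str]  # None for cluster-scoped resources
    kind: str
    name: str


def parse_target_resource(value: str) -> ParsedTarget:
    """Parse "namespace/kind/name" or "kind/name" into its parts."""
    parts = value.split("/")
    if any(not part for part in parts):
        raise ValueError(f"empty segment in target resource: {value!r}")
    if len(parts) == 3:
        return ParsedTarget(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return ParsedTarget(None, parts[0], parts[1])
    raise ValueError(f"expected 2 or 3 segments, got {len(parts)}: {value!r}")
```

With the examples from the table, `"payment/deployment/payment-api"` parses to namespace `payment`, kind `deployment`, name `payment-api`, and `"node/worker-node-1"` parses to a cluster-scoped `node` named `worker-node-1`.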
WorkflowExecutionStatus¶
WorkflowExecutionStatus defines the observed state of WorkflowExecution.
Appears in: - WorkflowExecution
| Field | Type | Description |
|---|---|---|
observedGeneration |
integer | ObservedGeneration is the most recent generation observed by the controller. Used to prevent duplicate reconciliations and ensure idempotency. This is the standard pattern for all Kubernetes controllers. |
phase |
string | Phase tracks the current execution stage. V1.0: the Skipped phase was removed because RO makes routing decisions before WFE creation. |
startTime |
Time | StartTime when execution started |
completionTime |
Time | CompletionTime when execution completed (success or failure) |
duration |
Duration | Duration of the execution |
executionRef |
LocalObjectReference | ExecutionRef references the created execution resource (PipelineRun or Job) |
executionStatus |
ExecutionStatusSummary | ExecutionStatus mirrors key execution resource status fields |
failureReason |
string | FailureReason explains why execution failed (if applicable). DEPRECATED: Use FailureDetails for structured failure information. |
failureDetails |
FailureDetails | FailureDetails contains structured failure information. Populated when Phase=Failed. |
blockClearance |
BlockClearanceDetails | BlockClearance tracks the clearing of PreviousExecutionFailed blocks. When set, allows new executions despite a previous execution failure. Preserves an audit trail of WHO cleared the block and WHY. |
ephemeralCredentialIDs |
integer array | EphemeralCredentialIDs stores AWX credential IDs created by the ansible executor for cleanup after execution. Written via the status subresource to avoid violating spec immutability. |
executionEngine |
string | ExecutionEngine is the backend engine resolved from the DS workflow catalog at runtime by the WE controller. Set once during Pending phase via WorkflowQuerier.GetWorkflowExecutionEngine; immutable thereafter. Values: "tekton", "job", "ansible". |
conditions |
Condition array | Conditions provide detailed status information |
WorkflowRef¶
WorkflowRef contains catalog-resolved workflow reference
Appears in: - WorkflowExecutionSpec
| Field | Type | Description |
|---|---|---|
workflowId |
string | WorkflowID is the catalog lookup key |
version |
string | Version of the workflow |
executionBundle |
string | ExecutionBundle resolved from the workflow catalog (Data Storage API). OCI bundle reference for the Tekton PipelineRun. |
executionBundleDigest |
string | ExecutionBundleDigest for audit trail and reproducibility |
engineConfig |
JSON | EngineConfig holds engine-specific configuration. For ansible: {"playbookPath": "...", "jobTemplateName": "...", "inventoryName": "..."}. For tekton/job: nil. |
WorkflowReference¶
WorkflowReference captures workflow catalog information for audit trail. Used in RemediationRequestStatus.SelectedWorkflowRef.
Appears in: - RemediationRequestStatus
| Field | Type | Description |
|---|---|---|
workflowId |
string | WorkflowID is the catalog lookup key |
version |
string | Version of the workflow |
executionBundle |
string | ExecutionBundle resolved from the workflow catalog. OCI bundle reference for the Tekton PipelineRun. |
executionBundleDigest |
string | ExecutionBundleDigest for audit trail and reproducibility |