Dual execution framework roadmap

Living document for refactoring Agentic Orchestration so workflow execution is pluggable: CrewAI in-process remains the default, with optional backends for subprocess workers and Kubernetes Jobs.

Status: F0–F5 MVP implemented (in-process default). Subprocess: AGENTIC_SUBPROCESS_WORKERS=1. Kubernetes: AGENTIC_EXECUTION_BACKEND=kubernetes + K8s env vars (cluster e2e manual).

Companion plan: Kubernetes execution upgrade — distributed workers, cluster manifests, MCP sidecars, and K8s-specific ops. This page owns the code seam; the K8s page owns cluster delivery. Future: Agent societies roadmap (K6) for multi-agent societies on top of the execution backends.

Related: Architecture, Dynamic planning, Configuration, Kubernetes execution upgrade


How the two plans relate

| Topic | This plan (dual framework) | Kubernetes execution upgrade |
|---|---|---|
| Scope | Python refactor: interfaces, backends, main.py wiring | Pods, Jobs, PVCs, sidecars, cluster testing |
| Default path | CrewAIExecutionBackend (identical to today) | Not required for local dev |
| Shared contracts | ExecutionBackend, StepSpec, StepResult | Consumes those contracts; defines worker CLI + K8s manifests |
| Ship value without K8s | ✅ Yes (F0–F4 complete) | ❌ Cluster phases only |
| Prerequisite order | F0–F4 complete | K0.6 + K2.3 worker image, then K3 (F5) |

flowchart LR
  subgraph framework [Dual execution framework]
    F0[F0 Contracts]
    F1[F1 CrewAI backend extract]
    F2[F2 Step spec + coordinator]
    F3[F3 main.py wiring]
    F4[F4 Subprocess backend]
  end

  subgraph k8s [Kubernetes upgrade plan]
    K0[K0 Design lock]
    K1[K1 Run store]
    K2[K2 Worker entrypoint]
    K3[K3 K8s backend]
    K4[K4 MCP sidecars]
  end

  F0 --> F1
  F0 --> K0
  F1 --> F2
  F2 --> F3
  F2 --> K1
  F0 -.-> K0
  F2 --> K2
  F3 --> F4
  F4 --> K3
  F2 --> K3
  K2 --> K3
  K3 --> K4

Rule: Finish F0–F3 before implementing KubernetesExecutionBackend (K8s Phase 3). F4 (subprocess) is recommended before K8s but not strictly required.


Goals

| Goal | Notes |
|---|---|
| Two orchestration frameworks | CrewAI in-process (default) + pluggable remote backends |
| No regression | AGENTIC_EXECUTION_BACKEND=inprocess preserves current behavior |
| Clear seam | Planning, catalogs, sessions, artifacts stay backend-agnostic |
| CrewAI stays the agent runtime | Both backends use CrewAI inside the execution unit (whole crew in-process; mini-Crew per worker remotely) |
| Incremental | First PR can be extract-only with zero behavior change |
| Shared YAML catalogs | Agent, MCP, and workflow YAML formats unchanged: one config surface for all backends |

Non-goals (initial phases)

  • Replacing CrewAI as the agent/tool loop inside workers
  • Abstracting AgentProvider away from CrewAI Agent (defer)
  • Third orchestration engines (Argo, Temporal) — interface should allow later, not build now
  • New agent or MCP YAML schemas for subprocess/K8s backends (see Shared configuration)
  • Parallel catalog directories per execution backend

Shared configuration (YAML unchanged)

Principle: All execution backends consume the same YAML catalogs and workflow shapes. The dual-framework refactor changes how steps run, not how agents and MCPs are defined.

What stays identical (no backend-specific forks)

| Config | Location | Used by |
|---|---|---|
| Agent providers | config/agent_providers/*.yaml, AGENTIC_EXTRA_AGENT_PROVIDERS_PATH | Planner, all backends |
| MCP providers | config/mcp_providers/*.yaml, AGENTIC_EXTRA_MCP_PROVIDERS_PATH | Planner, all backends |
| Static workflows | config/workflows/*.yaml | All backends |
| Dynamic plan output | Ephemeral WorkflowConfig (same fields as workflow YAML) | All backends |
| Catalog loaders | catalog_loader.py, mcp_providers_catalog.py, config_loader.py | Shared; not duplicated per backend |
| Provider factory | agent_provider_from_dict() → AgentProvider | In-process and worker pods |

Existing YAML fields (id, type, role, goal, backstory, model, stdio / streamable_http MCP blocks, required_env, planner_hint, etc.) remain authoritative. See Agent provider catalog, MCP providers, Workflows and router.

What is new (runtime only — not user-facing config)

| Artifact | Purpose | Replaces YAML? |
|---|---|---|
| StepSpec | Resolved, per-step payload passed to a worker or in-process executor | No; derived at runtime from WorkflowConfig + catalog resolution |
| StepResult | Per-step output written to run store | No |
| WorkflowExecutionResult | Final outcome for sessions/artifacts | No |

StepSpec embeds a copy of the resolved agent provider dict and resolved MCP configs (same structures build_workflow() uses today after filter_entries_by_api_credentials and resolve_workflow_mcp_refs). It is a transport format, not a second catalog format.

Data flow (one YAML path, many backends)

flowchart LR
  YAML[YAML catalogs + workflows]
  Load[config_loader / catalog_loader / mcp_providers_catalog]
  WC[WorkflowConfig]
  Mat[workflow_materializer]
  SS[StepSpec JSON]
  BE[Execution backend]

  YAML --> Load --> WC
  WC --> Mat --> SS
  SS --> BE

Workers and the coordinator never read alternate agent YAML layouts. Workers receive a StepSpec. If a spec points at catalog ids only, a worker can rebuild from the same resolved dicts via the shared Python loaders, but v1 should prefer the embedded resolved dict to avoid drift.

Backend-specific behavior without YAML forks

When a backend cannot support an MCP transport (e.g. stdio MCP on K8s without sidecar), handle via runtime policy, not new YAML:

| Situation | Approach | YAML change? |
|---|---|---|
| Stdio MCP unavailable in K8s | Filter the catalog at plan time (AGENTIC_EXECUTION_BACKEND=kubernetes) or rely on the env gates already present in MCP YAML | No |
| Missing credentials | Existing filter_entries_by_api_credentials / filter_mcp_entries_by_api_credentials | No |
| GPU / resource limits for K8s | Scheduling fields added later must be optional and ignored by the in-process backend | Only if added; in-process ignores |

If scheduling metadata is ever added to agent provider YAML (e.g. min_vram_gb already exists for hardware filtering), it must remain optional so CrewAI in-process and subprocess backends behave as today when unset.
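
The runtime-policy approach in the table above can be sketched as a plan-time filter. This is a minimal illustration, not the project's implementation: the helper name `filter_mcp_entries_for_backend` is hypothetical, and it assumes each resolved MCP entry is a dict that carries a `stdio` and/or `streamable_http` block, as the YAML field list above suggests.

```python
from typing import Any


def filter_mcp_entries_for_backend(
    entries: list[dict[str, Any]], backend: str
) -> list[dict[str, Any]]:
    """Hypothetical plan-time filter: hide stdio-only MCPs from the K8s backend.

    Assumes an entry is "stdio-only" when it has a `stdio` block and no
    `streamable_http` block. The YAML itself is never modified.
    """
    if backend != "kubernetes":
        # In-process and subprocess backends can run stdio MCPs as today.
        return list(entries)
    return [
        entry
        for entry in entries
        if not ("stdio" in entry and "streamable_http" not in entry)
    ]


catalog = [
    {"id": "local-fs", "stdio": {"command": "fs-mcp"}},
    {"id": "search", "streamable_http": {"url": "http://mcp.example/search"}},
]
kept = filter_mcp_entries_for_backend(catalog, "kubernetes")
print([entry["id"] for entry in kept])
```

Because the filter runs on already-loaded entries, the same catalog files serve every backend, which is the point of the "no YAML fork" rule.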

Implementation guardrails

  • G1 No config/agent_providers_k8s/ or duplicate MCP catalog trees.
  • G2 workflow_materializer calls existing loaders/resolvers — no parallel YAML parsers.
  • G3 Worker --execute-step uses build_workflow() + resolved MCP payloads from spec (task_mcp_overrides).
  • G4 Regression tests: same workflow YAML resolves identically for inprocess and subprocess step specs (step_specs_resolution_fingerprint, materializer parity tests).
  • G5 Wiki/catalog docs (Agent provider catalog, MCP providers) stay single source of truth; backends do not get separate catalog pages.
  • G6 New execution-backend behavior covered by Testing and CI markers (backend_inprocess + backend_subprocess shipped; backend_kubernetes deferred to F5).
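
Guardrail G4 relies on comparing resolved step specs across backends. One way such a comparison could work is a hash over canonical JSON, sketched below. The function name `fingerprint_step_specs` is hypothetical and the real `step_specs_resolution_fingerprint` helper may be built differently; step specs are represented here as plain dicts for illustration.

```python
import hashlib
import json
from typing import Any


def fingerprint_step_specs(step_specs: list[dict[str, Any]]) -> str:
    """Hash a list of step-spec dicts in a key-order-independent way.

    Sorting keys and fixing separators makes the JSON canonical, so two
    resolutions that differ only in dict ordering hash identically.
    """
    canonical = json.dumps(step_specs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two resolutions of the "same" workflow, with keys in a different order.
inprocess_specs = [{"step_id": "s1", "agent_provider": {"id": "writer", "type": "llm"}}]
subprocess_specs = [{"agent_provider": {"type": "llm", "id": "writer"}, "step_id": "s1"}]

assert fingerprint_step_specs(inprocess_specs) == fingerprint_step_specs(subprocess_specs)
```

A parity test can then assert that the fingerprints match for every workflow YAML in the repository, without comparing large nested structures field by field.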

Two layers (do not conflate)

| Layer | CrewAI path | K8s path |
|---|---|---|
| Orchestration | One process, crew.kickoff() | Coordinator loop, Job per step |
| Agent runtime | CrewAI Agent + MCP | Same: mini-Crew in worker pod |

The dual-framework refactor is about the orchestration layer. Kubernetes is one orchestration backend, not a replacement for CrewAI as the LLM/MCP engine.


Current architecture (after F0–F3)

WorkflowConfig
  → execution_backend_from_env()     # factory.py
  → CrewAIExecutionBackend           # default: build_workflow() + crew.kickoff()
  → SubprocessExecutionBackend       # optional: StepCoordinator + --execute-step workers
  → post-run                         # artifacts, sessions, learning (unchanged)

| Module | Role today |
|---|---|
| orchestration/backends/crewai.py | In-process kickoff (extracted from main.py) |
| orchestration/backends/factory.py | AGENTIC_EXECUTION_BACKEND selection |
| orchestration/workflow_materializer.py | WorkflowConfig → list[StepSpec] |
| orchestration/step_coordinator.py | Sequential step loop (subprocess/K8s) |
| orchestration/run_store.py | Filesystem {run_id}/{step_id}/result.json |
| orchestration/execute_step.py | Worker: one step from spec JSON |
| orchestration/execution_dispatch.py | Routes config → distributed execute_config or in-process execute_built (F4) |
| orchestration/runner.py | Still builds Crew + Agents + Tasks for in-process path |
| main.py execute_workflow_from_config() | CLI/static/dynamic entry → execute_workflow_config_resolved |

Already framework-agnostic: planner, catalogs, sessions, KB, learning, web UI spawn.
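
The filesystem run store layout listed above, `{run_id}/{step_id}/result.json`, can be sketched as follows. This is an illustration under assumptions: the class name `FileSystemRunStoreSketch` and its method names are hypothetical, and the write-then-rename step is a design suggestion, not a documented behavior of `run_store.py`.

```python
import json
from pathlib import Path
from typing import Any, Optional


class FileSystemRunStoreSketch:
    """Hypothetical store writing one result.json per step under a root dir."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _result_path(self, run_id: str, step_id: str) -> Path:
        return self.root / run_id / step_id / "result.json"

    def write_result(self, run_id: str, step_id: str, result: dict[str, Any]) -> Path:
        path = self._result_path(run_id, step_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file first, then rename, so a reader never sees
        # a half-written result (assumption: same filesystem for both paths).
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(result), encoding="utf-8")
        tmp.replace(path)
        return path

    def read_result(self, run_id: str, step_id: str) -> Optional[dict[str, Any]]:
        path = self._result_path(run_id, step_id)
        if not path.exists():
            return None  # step has not finished (or never ran)
        return json.loads(path.read_text(encoding="utf-8"))
```

Keeping the path scheme in one private method means a PVC-mounted store for Kubernetes could reuse the same class with a different root.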

Previous coupling (pre-refactor)

```
WorkflowConfig
  → build_workflow()        # runner.py: builds Crew, Agents, Tasks
  → run_built_workflow()    # main.py: crew.kickoff(), retries, lifecycle
  → post-run                # artifacts, sessions, learning
```

| Module | Problem (resolved in F1–F3) |
|---|---|
| main.py run_built_workflow() | Kickoff not swappable → now delegates to backend |
| BuiltWorkflow | CrewAI-shaped at API boundary → internal to CrewAIExecutionBackend only |

Target architecture

flowchart TB
  subgraph shared [Shared — all backends]
    Planner[Dynamic planner]
    Config[WorkflowConfig]
    Catalogs[Agent + MCP catalogs]
    Materializer[Workflow materializer / step specs]
    Post[Artifacts / sessions / KB]
  end

  subgraph backends [Execution backends]
    CrewAI[CrewAIExecutionBackend]
    Sub[SubprocessExecutionBackend]
    K8s[KubernetesExecutionBackend]
  end

  Planner --> Config
  Config --> Materializer
  Materializer --> CrewAI
  Materializer --> Sub
  Materializer --> K8s
  CrewAI --> Post
  Sub --> Post
  K8s --> Post

Backend selection

AGENTIC_EXECUTION_BACKEND=inprocess | subprocess | kubernetes

Default: inprocess → CrewAIExecutionBackend.

Factory: orchestration/backends/factory.py → execution_backend_from_env().
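
A minimal sketch of env-based selection with lazy loading follows. The backend classes are empty stand-ins, the loader registry is an assumption about structure, and raising `ValueError` on an unknown name is a choice made for this example; the real factory may fall back differently. A `kubernetes` loader would be registered the same way.

```python
import os
from typing import Callable, Mapping, Optional


class CrewAIExecutionBackend:
    """Stand-in for the in-process backend."""


class SubprocessExecutionBackend:
    """Stand-in for the subprocess backend."""


def _load_inprocess() -> object:
    return CrewAIExecutionBackend()


def _load_subprocess() -> object:
    # In a real package the import would live inside this function so the
    # module is only loaded when this backend is actually selected.
    return SubprocessExecutionBackend()


_LOADERS: dict[str, Callable[[], object]] = {
    "inprocess": _load_inprocess,
    "subprocess": _load_subprocess,
}


def execution_backend_from_env(environ: Optional[Mapping[str, str]] = None) -> object:
    """Pick a backend from AGENTIC_EXECUTION_BACKEND, defaulting to inprocess."""
    env = os.environ if environ is None else environ
    name = (env.get("AGENTIC_EXECUTION_BACKEND") or "inprocess").strip().lower()
    try:
        loader = _LOADERS[name]
    except KeyError:
        raise ValueError(f"unknown execution backend: {name!r}") from None
    return loader()
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.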


Proposed module layout

agentic-orchestration-tool/orchestration/
├── backends/
│   ├── __init__.py
│   ├── base.py              # ExecutionBackend protocol, RunOptions, WorkflowExecutionResult
│   ├── factory.py           # execution_backend_from_env()
│   ├── crewai.py            # CrewAIExecutionBackend (current kickoff path)
│   ├── subprocess.py        # SubprocessExecutionBackend
│   └── kubernetes.py        # KubernetesExecutionBackend (thin; details in K8s plan)
├── execution_dispatch.py    # execute_workflow_config_resolved (F4 CLI routing)
├── step_coordinator.py      # Sequential loop, inject prior output, shared lifecycle hooks
├── workflow_materializer.py # WorkflowConfig → list[StepSpec]; MCP resolution
├── run_store.py             # Abstract store + filesystem impl
├── runner.py                # Slim: build_workflow for CrewAI backend internal use (or moved into crewai.py)
└── ...                      # existing modules unchanged

main.py changes:

options = run_options_from_legacy(...)
result = execute_workflow_config_resolved(config, options=options)
# subprocess + AGENTIC_SUBPROCESS_WORKERS=1 → backend.execute_config (StepCoordinator)
# otherwise → build_workflow + backend.execute_built
# existing post-run unchanged
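
The routing rule in the comments above can be written as a small predicate. This is a sketch of the F4 rule only; the function name `should_use_step_workers` is hypothetical, and the Kubernetes routing that F5.0 adds to `execution_dispatch.py` is not modeled here.

```python
from typing import Mapping


def should_use_step_workers(environ: Mapping[str, str]) -> bool:
    """Return True when a run should go through execute_config (StepCoordinator).

    Per the F4 rule: the subprocess backend must be selected AND workers
    must be opted in with AGENTIC_SUBPROCESS_WORKERS=1. Every other
    combination stays on build_workflow + execute_built.
    """
    backend = (environ.get("AGENTIC_EXECUTION_BACKEND") or "inprocess").strip().lower()
    workers_enabled = environ.get("AGENTIC_SUBPROCESS_WORKERS", "").strip() == "1"
    return backend == "subprocess" and workers_enabled


print(should_use_step_workers({"AGENTIC_EXECUTION_BACKEND": "subprocess"}))
print(should_use_step_workers({
    "AGENTIC_EXECUTION_BACKEND": "subprocess",
    "AGENTIC_SUBPROCESS_WORKERS": "1",
}))
```

The first call prints `False` and the second prints `True`, which matches the safe default described for the subprocess backend: without the opt-in flag it behaves like in-process CrewAI.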

Core types (contracts)

Shared with Kubernetes execution upgrade — canonical JSON schemas live there; Python types mirror them.

StepSpec

Backend-agnostic, serializable description of one planned step. Built by workflow_materializer.build_step_specs(config).

Not a new config format. Fields are populated from existing WorkflowConfig + shared catalog resolution (same code path as build_workflow()). The agent_provider object in a step spec matches a resolved catalog entry dict; mcp_providers[].resolved matches resolve_workflow_mcp_refs() output.

Fields align with K8s plan step spec JSON schema for worker transport only.
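
To make the "transport format" idea concrete, here is a hedged sketch of a step spec that round-trips through JSON. Only `agent_provider` and `mcp_providers[].resolved` are named in this page; `step_id` and `description` are assumed fields for illustration, and the canonical schema lives in the K8s plan.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class StepSpecSketch:
    """Illustrative step spec; not the canonical field list."""

    step_id: str                       # assumed field
    description: str                   # assumed field
    agent_provider: dict[str, Any]     # resolved catalog entry dict
    mcp_providers: list[dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives stable output, which helps diffing and hashing.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "StepSpecSketch":
        return cls(**json.loads(payload))


spec = StepSpecSketch(
    step_id="step-1",
    description="Summarize the report",
    agent_provider={"id": "writer", "role": "Writer", "model": "example-model"},
    mcp_providers=[{"id": "search", "resolved": {"streamable_http": {"url": "http://mcp.example"}}}],
)
assert StepSpecSketch.from_json(spec.to_json()) == spec
```

Because the spec embeds the resolved dicts, a worker can run the step without reading any catalog files.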

StepResult

One step outcome. Aligns with K8s plan worker result contract.

WorkflowExecutionResult

@dataclass
class WorkflowExecutionResult:
    exit_code: int
    result_text: str | None
    error: BaseException | None
    step_results: list[StepResult] = field(default_factory=list)

Callers (main.py, sessions, artifacts) use this — never BuiltWorkflow.crew.

RunOptions

@dataclass
class RunOptions:
    quiet: bool = False
    emit_stdout_summary: bool = True
    emit_progress_lines: bool = True
    execution_error_sink: list[str] | None = None
    log_terminal_execution_failure: bool = True
    run_id: str = ""  # generated if empty

ExecutionBackend protocol

class ExecutionBackend(Protocol):
    def execute_built(
        self,
        built: BuiltWorkflow,
        *,
        options: RunOptions,
    ) -> WorkflowExecutionResult: ...

    def execute_config(
        self,
        config: WorkflowConfig,
        *,
        options: RunOptions,
    ) -> WorkflowExecutionResult: ...

Optional capability flag:

@property
def supports_distributed_steps(self) -> bool: ...
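
For tests, a fake backend that satisfies the protocol can stand in for CrewAI. The sketch below uses simplified stand-in types (`ResultSketch`, `Any` for the workflow and options arguments) instead of the real dataclasses, so treat the names as hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class ResultSketch:
    """Simplified stand-in for WorkflowExecutionResult."""

    exit_code: int
    result_text: Optional[str] = None


@runtime_checkable
class BackendSketch(Protocol):
    """Same shape as the protocol above, with loosened argument types."""

    @property
    def supports_distributed_steps(self) -> bool: ...

    def execute_built(self, built: Any, *, options: Any) -> ResultSketch: ...

    def execute_config(self, config: Any, *, options: Any) -> ResultSketch: ...


class EchoBackend:
    """Fake backend: no LLM calls, just echoes what it was asked to run."""

    supports_distributed_steps = False

    def execute_built(self, built: Any, *, options: Any) -> ResultSketch:
        return ResultSketch(exit_code=0, result_text=f"ran {built}")

    def execute_config(self, config: Any, *, options: Any) -> ResultSketch:
        return ResultSketch(exit_code=0, result_text=f"ran config {config}")


backend = EchoBackend()
assert isinstance(backend, BackendSketch)  # structural check, no inheritance
```

Callers can branch on `supports_distributed_steps` to decide between `execute_built` and `execute_config` without importing any concrete backend class.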

Backend implementations

CrewAIExecutionBackend (default)

Maps to: current build_workflow() + run_built_workflow().

| Aspect | Behavior |
|---|---|
| Execution unit | Whole Crew in one process |
| Step handoff | Existing _serial_crew_task_callback + inject |
| MCP stdio | Works as today |
| HF fallback | In-process rebuild + re-kickoff |
| Provider recovery | In-process retry |

Phase F1 is extract-only — no behavior change.

SubprocessExecutionBackend

Maps to: K8s plan Phase 2 bridge — no cluster required.

| Aspect | Behavior |
|---|---|
| Execution unit | python main.py --execute-step spec.json per step |
| Step handoff | StepCoordinator + filesystem run store |
| MCP stdio | Works in worker subprocess |
| HF fallback | Workflow-level via main._run_dynamic_workflow_with_hf_fallback; per-step via step_recovery + Job/subprocess retry |

Proves the worker contract before K8s Phase 3.
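
Spawning one worker per step can be sketched with the standard library. The command shape follows the execution unit described above (`python main.py --execute-step spec.json`); the helper names, the `entrypoint` parameter, and returning only the exit code are assumptions for this example, not the behavior of `subprocess_runner.py`.

```python
import subprocess
import sys
from typing import Optional, Sequence


def build_worker_command(spec_path: str, entrypoint: str = "main.py") -> list[str]:
    """Build the argv for one step worker (hypothetical helper).

    Uses the current interpreter so the worker runs in the same virtualenv.
    """
    return [sys.executable, entrypoint, "--execute-step", spec_path]


def run_worker(command: Sequence[str], timeout_s: Optional[float] = None) -> int:
    """Run the worker to completion and return its exit code.

    check=False leaves failure handling to the caller (for example, the
    coordinator deciding whether to retry the step).
    """
    completed = subprocess.run(
        list(command),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return completed.returncode


print(build_worker_command("runs/run-1/step-1/spec.json")[1:])
```

In a fuller version the coordinator would read the step result from the run store after the process exits, instead of parsing worker stdout.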

KubernetesExecutionBackend

Maps to: K8s plan Phase 3 — implementation details in Kubernetes execution upgrade.

| Aspect | Behavior |
|---|---|
| Execution unit | K8s Job → worker pod per step (kubernetes_runner.py, mirrors subprocess_runner.py) |
| Step handoff | StepCoordinator + PVC-mounted FileSystemRunStore |
| MCP stdio | Sidecars / HTTP MCPs (K8s plan Phase 4) |
| HF fallback | Workflow-level via main for K3 MVP; per-step Job retry via step_recovery |

This backend is a thin adapter over StepCoordinator + K8s client (same pattern as subprocess_runner.py); cluster manifests and worker image live in the K8s plan.


Shared: StepCoordinator

Both distributed backends use the same sequential loop logic (extracted from runner.py callbacks):

class StepCoordinator:
    def run_workflow(
        self,
        steps: list[StepSpec],
        *,
        execute_step: Callable[[StepSpec], StepResult],
        run_store: RunStore,
        run_options: RunOptions,
    ) -> WorkflowExecutionResult: ...

Responsibilities:

  • Iterate task_sequence
  • prepare_step_description(step, prior_output) — port of _inject_previous_output_into_next_task
  • Call AgentProvider.before_task / after_task via provider instances
  • Emit (progress) lines
  • Invoke per-step execution_fallback / provider recovery — deferred (workflow-level retry in main today; see Kubernetes execution upgrade)
  • Write/read run store between steps

The CrewAI backend could use StepCoordinator with an in-process execute_step closure, but per the decision log it keeps whole-crew kickoff until F4 (see phased roadmap).
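
The sequential loop and prior-output injection can be sketched in a few lines. Steps and results are plain dicts here, the injected text format is invented for illustration, and stopping at the first failing step is an assumption; provider hooks, progress lines, and recovery are left out.

```python
from typing import Any, Callable, Optional

Step = dict[str, Any]
Result = dict[str, Any]


def prepare_step_description(description: str, prior_output: Optional[str]) -> str:
    """Append the previous step's output to the next step's description."""
    if not prior_output:
        return description
    return f"{description}\n\nPrevious step output:\n{prior_output}"


def run_steps(
    steps: list[Step],
    execute_step: Callable[[Step], Result],
    results: dict[str, Result],
) -> Result:
    """Run steps in order, feeding each output into the next description.

    `results` plays the role of the run store: one entry per step id.
    """
    prior: Optional[str] = None
    for step in steps:
        prepared = dict(step, description=prepare_step_description(step["description"], prior))
        result = execute_step(prepared)
        results[step["step_id"]] = result
        code = result.get("exit_code", 1)
        if code != 0:
            # Assumption: abort the workflow on the first failing step.
            return {"exit_code": code, "result_text": None}
        prior = result.get("output")
    return {"exit_code": 0, "result_text": prior}
```

Because `execute_step` is injected, the same loop works with a subprocess spawner, a Kubernetes Job runner, or an in-process closure in tests.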


What stays CrewAI-specific

| Component | CrewAI-only? | Notes |
|---|---|---|
| agent_providers/* | Yes (for now) | Workers use same code |
| crewai_mcp_hotfix.py | Yes | Worker subprocess/pods |
| BuiltWorkflow.crew | Yes | Internal to CrewAIExecutionBackend only |
| Planner, catalogs, sessions | No | All backends |
| WorkflowConfig | No | All backends |
| StepSpec / StepResult | No | All backends; runtime transport, derived from YAML |
| Agent/MCP/workflow YAML files | No | Unchanged; see Shared configuration |

Phased roadmap

Track progress by checking boxes. Phases prefixed F belong to this plan only.

Phase F0 — Contracts (no behavior change)

Standalone: ✅ Yes.

Parallel with: K8s Phase 0 (schema review — pair in one session).

  • F0.1 Add orchestration/backends/base.py: ExecutionBackend, RunOptions, WorkflowExecutionResult, StepSpec, StepResult dataclasses.
  • F0.2 Align field names with step spec and result JSON in K8s plan.
  • F0.3 Add execution_backend_from_env() (returns CrewAIExecutionBackend by default; lazy-loads other backends).
  • F0.4 Document AGENTIC_EXECUTION_BACKEND in Configuration and .env.example.

Exit criteria: Types merged; K8s plan schemas and Python types match. ✅


Phase F1 — Extract CrewAIExecutionBackend (pure refactor)

Standalone: ✅ Yes — recommended first implementation PR.

Depends on: F0.

  • F1.1 Move run_built_workflow() kickoff logic → orchestration/backends/crewai.py.
  • F1.2 CrewAIExecutionBackend.execute_built() calls existing build_workflow() + kickoff path unchanged.
  • F1.3 main.py calls backend via factory; behavior identical.
  • F1.4 Regression: static workflows + --dynamic pass with AGENTIC_EXECUTION_BACKEND=inprocess (mocked tests in default CI + live LLM workflow opt-in).

Exit criteria: Zero user-visible change; all execution goes through ExecutionBackend. ✅ (default path)


Phase F2 — Materializer + StepCoordinator

Standalone: ✅ Yes — enables step-based execution for all backends.

Depends on: F1.

Parallel with: K8s Phase 1 (run store).

  • F2.1 Add workflow_materializer.py: build_step_specs(config) -> list[StepSpec] using existing catalog loaders and MCP resolvers (no new YAML schema).
  • F2.2 Port prior-output inject → prepare_step_description() (orchestration/step_context.py).
  • F2.3 Add step_coordinator.py — sequential loop, progress.
  • F2.4 Add run_store.py — filesystem implementation.
  • F2.5 Decision: keep whole-crew kickoff in CrewAIExecutionBackend until F4; distributed backends use StepCoordinator (see Decision log).
  • F2.6 Unit tests: materializer, inject, coordinator loop (run store round-trip).
  • F2.7 Guardrail G4: same workflow YAML resolves identically for distributed step specs (tests/test_workflow_materializer.py).

Exit criteria: Step specs produced from any WorkflowConfig; coordinator tested in isolation. ✅


Phase F3 — main.py wiring + post-run adapter

Standalone: ✅ Yes.

Depends on: F1 (required); F2 (recommended).

  • F3.1 Single entry: backend.execute_built() / execute_config() via factory.
  • F3.2 HF fallback + provider recovery remain in CrewAIExecutionBackend (not duplicated in main.py).
  • F3.3 output_artifacts.py accepts WorkflowExecutionResult / StepResult (extractable_text_from_*, offer_save_extracted_files_from_execution).
  • F3.4 Session JSON records optional last_execution_backend; dynamic paths pass backend name on update.

Exit criteria: main.py has no direct crew.kickoff() calls; post-run pipeline backend-agnostic. ✅


Phase F4 — SubprocessExecutionBackend

Standalone: ✅ Yes — no K8s.

Depends on: F2, F3.

Parallel with: K8s Phase 2 (worker entrypoint — coordinate on --execute-step CLI).

  • F4.1 Implement SubprocessExecutionBackend using StepCoordinator + run store (subprocess_runner.py).
  • F4.2 Spawn python main.py --execute-step (CLI + execute_step.py worker).
  • F4.3 Integration test: 2-step workflow via subprocess backend locally (tests/test_backend_subprocess.py, mocked worker).
  • F4.4 AGENTIC_EXECUTION_BACKEND=subprocess documented (.env.example, Configuration).
  • F4.5 CLI/static/dynamic paths route through execute_workflow_config_resolved when subprocess workers enabled (execution_dispatch.py).

Note: Subprocess workers are opt-in via AGENTIC_SUBPROCESS_WORKERS=1. Without it, the subprocess backend falls back to in-process CrewAI (safe default). With workers enabled, normal CLI (run_workflow, --dynamic, etc.) uses distributed step execution — not only direct execute_config calls.

Exit criteria: Full dynamic run via subprocess without cluster; proves distributed contract. ✅


Phase F5 — KubernetesExecutionBackend stub → full

Standalone: ❌ No — requires K8s plan Phases 2–3.

Depends on: F4 ✅; K8s plan Phases K2.3 + K3.

  • F5.0 Extend execution_dispatch.py so kubernetes backend routes to execute_config.
  • F5.2 Implement kubernetes_runner.py + kubernetes_jobs.py (Job spawn/wait/read).
  • F5.3 Feature parity matrix vs in-process — kind e2e in CI (scripts/k8s-kind-e2e.sh, stub worker; live LLM optional manual).

Exit criteria: AGENTIC_EXECUTION_BACKEND=kubernetes runs on kind — CI validated (stub worker); live LLM manual on Windows via k8s-kind-up.ps1.


Feature parity matrix (backends)

Track gaps when testing non-default backends.

Capability CrewAI in-process Subprocess Kubernetes
Static workflows ✅ today ✅ (workers opt-in) ✅ kind CI (stub) + manual LLM
--dynamic ✅ today ✅ (workers opt-in) ✅ mocked Jobs + kind CI (stub)
--dynamic-iterative ✅ today TBD TBD
Stdio MCPs K4 sidecars
HF execution fallback ✅ workflow-level ✅ workflow-level ✅ workflow-level (K3 MVP)
Provider recovery ✅ workflow-level ✅ workflow-level ✅ per-step (K3.5)
Web UI progress lines ✅ today K3
Session / KB / learning ✅ today K3

Testing strategy

Phase Tests  
F0 Type/import smoke See Testing and CI
F1 Full regression suite, inprocess only CI unit + import jobs
F2 Unit: materializer, coordinator, inject, run store ✅ shipped
F3 Integration: dynamic + static via factory ✅ post-run adapters + dispatch
F4 Subprocess 2-step e2e backend_subprocess
F5 kind/minikube (with K8s plan) ✅ CI stub e2e

Regression bar: After every phase, AGENTIC_EXECUTION_BACKEND=inprocess must match pre-refactor behavior.


Risks and mitigations

| Risk | Mitigation |
|---|---|
| BuiltWorkflow leaks outside CrewAI backend | Callers only see WorkflowExecutionResult |
| Duplicated retry/fallback | Shared StepCoordinator + existing execution_fallback.py |
| F1 refactor breaks web UI spawn | Run web integration smoke after F1 |
| Two plans drift | Cross-link checkboxes; F0/K0 schema review together |
| Over-abstraction before K8s | Ship F1 extract-only before F2 |

Decision log

| Date | Decision | Rationale |
|---|---|---|
| 2025-06-26 | Dual plans: this page (framework seam) + Kubernetes execution upgrade (cluster) | Separate concerns; framework ships value without K8s |
| 2025-06-26 | CrewAI remains agent runtime in all backends | Minimize MCP/provider rewrite |
| 2025-06-26 | Default backend = CrewAIExecutionBackend / inprocess | No regression for dev and existing deploys |
| 2025-06-26 | F0–F3 before K8s Phase 3 | K8s backend plugs into seam, not a fork of main.py |
| 2025-06-26 | YAML catalogs unchanged: shared agent/MCP/workflow format for all backends | One config surface; StepSpec is runtime transport only |
| 2025-06-26 | F2: whole-crew vs mini-Crew per step in in-process backend | Whole-crew until F4; CrewAIExecutionBackend unchanged; distributed backends use StepCoordinator |
| 2026-06-26 | F0–F4 merged to main branch | Default inprocess; subprocess workers behind AGENTIC_SUBPROCESS_WORKERS=1 |
| 2026-06-26 | K8s plan tightened post-F4 | PVC run store; kubernetes_runner mirrors subprocess; K3.0 dispatch gate; per-step retry deferred |

Open questions

  1. F2 timing: Switch in-process backend to step-based mini-Crew in F2, or keep whole-crew kickoff until F4? Resolved: whole-crew until F4.
  2. BuiltWorkflow fate: Keep as internal type in crewai.py only, or deprecate name in favor of StepSpec list?
  3. Iterative dynamic mode: Does coordinator replan require backend capability flag or separate code path in main.py?
  4. CLI flag vs env only: Support --execution-backend subprocess for one-off runs?

Suggested work order

Completed: F0 → F1 → F2 → F3 → F4 ✅

Next (K8s / F5): See Kubernetes execution upgrade — K0.6 → K2.3 worker image → K3.0 dispatch → K3.1–3.8 kind test → K4 sidecars.

Valid stopping points: After F4 (distributed without K8s) ✅ current stopping point. After K3 (K8s with HTTP MCPs).


Wiki maintenance