Testing and CI

How to run automated checks for agentic-orchestration locally and on GitHub or GitLab. The setup is designed to grow alongside the Dual execution framework and the Kubernetes execution upgrade, and the default tier requires neither API keys nor a cluster.

Quick reference

Tier	When	Needs API keys?	Runs in CI?
Unit	Every commit / MR	No	✅ Yes (default)
Import smoke	Every commit / MR	No	✅ Yes
Docker worker smoke	Every commit / MR	No	✅ Yes (K8s Phase 2.3)
kind Kubernetes e2e	Every commit / MR	No	✅ Yes (stub worker, no LLM; includes agent skills spec handoff)
Integration	Manual / with `AGENTIC_KIND_E2E=1` locally	Sometimes	❌ Excluded from default pytest
Live LLM	Local only unless secrets configured	Yes	❌ Never by default
Agent harness L0 (static catalog)	Every commit / MR	No	✅ Yes (`agent-harness-static` job)
Agent harness L1 (connectivity)	Every commit / MR (credentialed subset)	Sometimes	✅ Yes (`agent-harness-connectivity` job)
Agent harness L2+ (smoke / capability)	Nightly / manual	Yes	❌ Not by default (scheduled workflow `agent-harness-smoke-nightly.yml`)
Backend parity (F2.7 / dual framework)	After refactor lands	No for resolution tests	✅ Planned

The default CI command excludes tests marked integration, live_llm, or agent_harness (see pytest.ini).
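
The default filter can be pictured as a marker expression passed to pytest -m. The sketch below builds such an expression from the three excluded marker names. The exact filter line in pytest.ini is not reproduced on this page, so the helper name default_marker_expression and the resulting string are assumptions for illustration only.

```python
from typing import Iterable

# Marker names excluded from the default run, as listed on this page.
EXCLUDED_MARKERS = ("integration", "live_llm", "agent_harness")


def default_marker_expression(excluded: Iterable[str] = EXCLUDED_MARKERS) -> str:
    """Build a pytest -m expression that deselects the given markers.

    Hypothetical helper: pytest.ini may spell its filter differently.
    """
    return " and ".join(f"not {name}" for name in excluded)


if __name__ == "__main__":
    # Prints the expression that could follow `pytest -m`.
    print(default_marker_expression())
```

Passing a different expression on the command line (for example pytest -m unit) selects another tier.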

Agent harness (local)

cd agentic-orchestration-tool
python main.py --harness-batch --harness-tier static          # L0 full catalog
python main.py --harness-agent gpt_research --harness-tier smoke
pytest -m agent_harness -o addopts="-ra"
powershell -File scripts/run-agent-harness.ps1 -Tier static -Filter "gpt_*"
python scripts/harness-report.py

Local — before you commit

From agentic-orchestration-tool/:

# One-time (or after dependency changes)
pip install -r requirements.txt -r requirements-dev.txt

# Same as CI
pytest

Helpers:

Script	Platform
`scripts/run-tests.sh`	Linux / macOS / Git Bash
`scripts/run-tests.ps1`	Windows PowerShell

Examples:

pytest -m unit                      # explicit unit tier
pytest tests/test_config_loader.py  # one file
pytest -m integration               # include integration (when added)
pytest -m "live_llm"                # real LLM calls — local only

GitHub Actions

Workflow: .github/workflows/ci.yml (on push to main/master and on pull requests).

Job	What it does
python-unit	`pip install` + `pytest` (unit tier)
python-smoke-import	Install runtime deps; import `orchestration.*` and `main`
docker-worker-smoke	Build `docker/Dockerfile.worker`; invalid spec → exit 2 (`scripts/docker-worker-smoke.sh`)
kind-kubernetes-e2e	kind cluster + hostPath PVC + stub worker Jobs (`scripts/k8s-kind-e2e.sh`; `tests/test_kind_kubernetes_e2e.py`, including `test_agent_skills_smoke_kind_kubernetes_workflow`)

No repository secrets required for default CI.
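
The docker-worker-smoke job in the table above hinges on one observable fact: an invalid spec must make the worker exit with code 2. A minimal sketch of that kind of exit-code check follows. It uses a Python child process as a stand-in for the worker container, so the command shown is an assumption and not what scripts/docker-worker-smoke.sh actually runs.

```python
import subprocess
import sys
from typing import Sequence


def exit_code_of(command: Sequence[str], timeout: float = 60.0) -> int:
    """Run a command and return its exit code without raising on failure."""
    completed = subprocess.run(command, capture_output=True, timeout=timeout)
    return completed.returncode


# Stand-in for the worker container: a child process that exits with 2.
# The real smoke script runs the built image against an invalid spec instead.
STAND_IN_WORKER = [sys.executable, "-c", "import sys; sys.exit(2)"]

if __name__ == "__main__":
    code = exit_code_of(STAND_IN_WORKER)
    assert code == 2, f"expected exit code 2 for an invalid spec, got {code}"
```

Checking for a specific non-zero code, instead of any failure, distinguishes "the worker rejected the spec" from "the container failed to start".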

Local Docker smoke (Windows / Linux):

# Windows
powershell -File scripts/docker-worker-smoke.ps1

# Linux / macOS / Git Bash
bash scripts/docker-worker-smoke.sh

kind Kubernetes e2e (Linux / Git Bash — same as CI):

cd agentic-orchestration-tool
bash scripts/k8s-kind-e2e.sh

Windows local cluster (real LLM, manual):

.\scripts\k8s-kind-up.ps1
$env:AGENTIC_K8S_RUN_STORE_VOLUME = "hostpath"
.\scripts\k8s-apply-run-store.ps1
.\scripts\docker-worker-smoke.ps1   # builds agentic-orchestrator-worker:local
# load into kind: .\.tools\kind.exe load docker-image agentic-orchestrator-worker:local --name agentic
python main.py config/workflows/workflow_brainstorm.yaml --quiet

GitLab CI

Pipeline: .gitlab-ci.yml at the monorepo root (runs on merge requests and on pushes to the default branch, main, or master).

Job	What it does
unit-tests	`pip install -r requirements-test.txt` + `pytest`
import-smoke	Install full `requirements.txt`; import `orchestration.*` and `main`
docker-worker-smoke	DinD: build worker image + smoke script (same as GitHub job)
kind-kubernetes-e2e	kind + hostPath PVC + stub worker (`scripts/k8s-kind-e2e.sh`)

Enable: push .gitlab-ci.yml to your GitLab project — GitLab picks it up automatically when CI/CD is enabled (Settings → General → Visibility and Build → Pipelines).

Runners: the pipeline uses shared GitLab.com runners with the python:3.12-slim image by default. For self-hosted GitLab, ensure a runner with the Docker executor is available.

Merge requests: pipelines run on MR open/update; enable Pipelines must succeed under Settings → Merge requests if you want tests to block merges.

No CI variables or secrets required for the default jobs.

Layout

agentic-orchestration-tool/
├── pytest.ini              # markers, default -m filter
├── requirements-dev.txt    # pytest
├── tests/
│   ├── conftest.py         # tool_root, config_dir fixtures
│   ├── test_config_loader.py
│   ├── test_catalog_loader.py
│   ├── test_catalog_credentials.py
│   ├── test_mcp_catalog.py
│   ├── test_goal_format_hints.py
│   └── test_provider_goal_match.py
└── scripts/
    ├── run-tests.sh
    ├── run-tests.ps1
    ├── docker-worker-smoke.sh   # CI + Linux local
    ├── docker-worker-smoke.ps1  # Windows local
    ├── k8s-kind-up.sh           # kind cluster + host bind mount
    ├── k8s-kind-up.ps1          # Windows
    ├── k8s-apply-run-store.sh   # PVC backend (hostpath | nfs | filestore)
    └── k8s-kind-e2e.sh          # CI kind e2e (stub worker)

Add new tests under tests/ and mark fast, pure-logic tests with @pytest.mark.unit.

Pytest markers (convention)

Marker	Use
`unit`	Pure functions, YAML load, catalog filter — no network
`integration`	Subprocess worker, mini end-to-end without real LLM
`live_llm`	Real OpenAI/Ollama/HF calls
`backend_inprocess`	Full crew kickoff regression (future)
`backend_subprocess`	Subprocess execution backend (framework F4)
`backend_kubernetes`	kind/cluster integration
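`kind_e2e`	Live kind Jobs via stub worker (`AGENTIC_KIND_E2E=1`)
`agent_harness`	Platform agent harness tiers (L0 static, L1 connectivity, L2+ smoke)
`user_harness`	User agent harness pack unit tests (mocked kickoff)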
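
Markers must be registered so that pytest recognises them. This repository declares them in pytest.ini. The conftest.py hook below is an equivalent alternative, shown only to illustrate what registration amounts to. The MARKERS mapping is an abbreviated, hypothetical copy of the table above.

```python
# conftest.py (sketch): register markers programmatically instead of in pytest.ini.
MARKERS = {
    "unit": "pure functions, YAML load, catalog filter; no network",
    "integration": "subprocess worker, mini end-to-end without a real LLM",
    "live_llm": "real OpenAI/Ollama/HF calls",
    "kind_e2e": "live kind Jobs via stub worker (AGENTIC_KIND_E2E=1)",
}


def pytest_configure(config) -> None:
    """pytest hook: declare each marker so pytest recognises it."""
    for name, description in MARKERS.items():
        # addinivalue_line appends to the same "markers" setting pytest.ini uses.
        config.addinivalue_line("markers", f"{name}: {description}")
```

Keeping the declarations in pytest.ini, as the repository does, has the advantage that the marker list and the default -m filter live in one file.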

Roadmap — tests for dual execution framework

These tests align with the guardrails and phases of the Dual execution framework.

Now (shipped)

pytest + CI scaffold
Unit tests: config load, catalog credentials, MCP env substitution, planner guardrails
Import smoke job

Framework F1 — CrewAI backend extract

T-F1 run_built_workflow / backend factory smoke with mocked kickoff
T-F1 Regression: AGENTIC_EXECUTION_BACKEND=inprocess golden path (mocked LLM; tests/test_backend_inprocess_regression.py)

Framework F2 — Step spec + coordinator

T-F2 build_step_specs() from fixture WorkflowConfig
T-F2 prepare_step_description() inject / cap chars
T-F2 Run store round-trip
T-F2.7 / G4 Same workflow YAML → identical step spec resolution (materializer parity; step_specs_resolution_fingerprint)
T-F2 StepCoordinator sequential loop, failure stop, run store writes (tests/test_step_coordinator.py)
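
The parity guardrail (T-F2.7 / G4) compares step spec resolution across backends through a fingerprint. One way such a fingerprint could work is sketched below: serialize the resolved specs to canonical JSON and hash the result. The function name resolution_fingerprint and the spec fields are hypothetical. The project's own step_specs_resolution_fingerprint may be implemented differently.

```python
import hashlib
import json
from typing import Any, Mapping, Sequence


def resolution_fingerprint(step_specs: Sequence[Mapping[str, Any]]) -> str:
    """Return a SHA-256 hex digest of the specs in canonical JSON form.

    Key order inside each spec is ignored; step order is significant.
    """
    canonical = json.dumps(list(step_specs), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical resolved specs, as two backends might produce them.
inprocess_specs = [{"id": "research", "provider": "gpt_research", "max_chars": 4000}]
subprocess_specs = [{"max_chars": 4000, "provider": "gpt_research", "id": "research"}]

# Same content in a different key order yields the same fingerprint.
assert resolution_fingerprint(inprocess_specs) == resolution_fingerprint(subprocess_specs)
```

Comparing one digest per backend keeps the parity test short and makes a mismatch easy to report, at the cost of not showing which field differs.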

Framework F4 — Subprocess backend

T-F4 @pytest.mark.backend_subprocess two-step subprocess run (mock worker; tests/test_backend_subprocess.py)
T-F4-docker Docker worker image smoke in CI (scripts/docker-worker-smoke.sh, job docker-worker-smoke)

K8s plan K3+

T-K3 @pytest.mark.backend_kubernetes — mocked Job integration (tests/test_backend_kubernetes.py)
T-K3-kind @pytest.mark.kind_e2e — live kind cluster in CI (scripts/k8s-kind-e2e.sh, stub worker)

Platform agent harness (shipped v1.4.0)

See Agent harness roadmap for design.

T-H0 @pytest.mark.agent_harness — L0 static validation for full Agent provider catalog
T-H1 L1 connectivity for credentialed catalog subset in CI
T-H2 L2 smoke — nightly workflow (agent-harness-smoke-nightly.yml)
main.py --harness-agent / --harness-batch CLI (platform tiers)
scripts/run-agent-harness.ps1 / .sh and scripts/harness-report.py

User agent harness packs (shipped v1.5.0)

See User agent harnesses — domain scenario libraries (adopters maintain packs outside core; healthcare example in-repo).

T-UH1 --harness-dir / AGENTIC_EXTRA_AGENT_HARNESS_DIRS discovery
T-UH2 @pytest.mark.user_harness unit tests (mocked kickoff; harness CI job)
Scenario YAML + deterministic assertions + optional rubric (orchestration/user_agent_harness.py)
Healthcare vertical example pack under examples/verticals/healthcare/harnesses/gpt_research/
scripts/run-user-harness.ps1 / .sh; --example overlay merges vertical harnesses/
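
To make "deterministic assertions" concrete, here is a minimal sketch of checks that need no LLM to evaluate. The real scenario schema lives in orchestration/user_agent_harness.py and is not reproduced here. ScenarioChecks, its field names, and evaluate are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenarioChecks:
    """Hypothetical deterministic checks for one scenario."""

    must_contain: List[str] = field(default_factory=list)
    must_not_contain: List[str] = field(default_factory=list)
    max_chars: int = 10_000


def evaluate(output: str, checks: ScenarioChecks) -> List[str]:
    """Return failure messages; an empty list means the output passed."""
    failures: List[str] = []
    lowered = output.lower()  # phrase matching is case-insensitive
    for phrase in checks.must_contain:
        if phrase.lower() not in lowered:
            failures.append(f"missing required phrase: {phrase!r}")
    for phrase in checks.must_not_contain:
        if phrase.lower() in lowered:
            failures.append(f"found forbidden phrase: {phrase!r}")
    if len(output) > checks.max_chars:
        failures.append(f"output has {len(output)} chars, limit is {checks.max_chars}")
    return failures
```

Because these checks return the same result for the same output on every run, they can gate CI with a mocked kickoff, while a rubric stays optional.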

Optional: pre-commit (local)

Not required for GitHub CI. To run unit tests before every local commit:

pip install pre-commit
# .pre-commit-config.yaml (future): pytest -m unit from agentic-orchestration-tool

Defer until the team wants local hooks; GitHub Actions is the shared gate.

Optional: live LLM CI job

Separate workflow: .github/workflows/live-llm.yml. Not required for PR merges.

One-time GitHub setup

Open https://github.com/zlatko-lakisic/agentic-orchestration/settings/secrets/actions
New repository secret → name OPENAI_API_KEY, value = your OpenAI key
(Optional) add OPENAI_MODEL_NAME as a secret or rely on workflow env defaults

Run the job

Method	Command / action
GitHub UI	Actions → Live LLM → Run workflow
CLI	`gh workflow run live-llm.yml`

Weekly schedule: Mondays 06:00 UTC (disable by removing the schedule block in the workflow file).

What it runs

tests/test_live_llm_smoke.py (@pytest.mark.live_llm):

Static workflow.yaml through in-process backend
Short --dynamic goal with AGENTIC_PLANNER_MAX_STEPS=2

Local (same tests)

cd agentic-orchestration-tool
pip install pytest-timeout   # optional but used in CI
pytest -m live_llm -o addopts="-ra"

Requires OPENAI_API_KEY in .env or the shell environment. Forks and PRs from contributors do not receive your secrets; the job skips cleanly when the key is missing.
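
A common way to get a clean skip when a key is missing is a skipif marker evaluated against the environment. The sketch below assumes that approach. How tests/test_live_llm_smoke.py actually implements its skip is not shown on this page, and has_openai_key and the placeholder test are hypothetical names.

```python
import os
from typing import Mapping

import pytest

SKIP_REASON = "OPENAI_API_KEY is not set; skipping live LLM test"


def has_openai_key(env: Mapping[str, str]) -> bool:
    """Return True when OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


# Evaluated once at import time, against the real process environment.
requires_openai_key = pytest.mark.skipif(
    not has_openai_key(os.environ), reason=SKIP_REASON
)


@pytest.mark.live_llm
@requires_openai_key
def test_live_llm_placeholder() -> None:
    # Placeholder body: a real test would run a workflow here.
    assert has_openai_key(os.environ)
```

A skipped test is reported as skipped and not failed, so a fork without secrets still gets a green run.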

Adding a new unit test (checklist)

Prefer testing pure orchestration code (no crew.kickoff()).
Mark with @pytest.mark.unit.
Use tool_root / config_dir fixtures from conftest.py.
Avoid real network; use monkeypatch for env vars.
Run pytest locally before push.
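
The checklist above can be sketched as a single test file. The function under test, substitute_env, is a hypothetical stand-in for pure orchestration code and is not part of the repository. The tool_root and config_dir fixtures are left out because their return values are not described on this page.

```python
import os
import re

import pytest

_PLACEHOLDER = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def substitute_env(template: str) -> str:
    """Replace ${NAME} placeholders with values from os.environ.

    Hypothetical stand-in for pure orchestration code; unknown names are left as-is.
    """
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(0)), template)


@pytest.mark.unit
def test_substitute_env_uses_patched_value(monkeypatch: pytest.MonkeyPatch) -> None:
    # monkeypatch restores the environment after the test, so nothing leaks.
    monkeypatch.setenv("EXAMPLE_TOKEN", "abc123")
    monkeypatch.delenv("EXAMPLE_MISSING", raising=False)

    assert substitute_env("Bearer ${EXAMPLE_TOKEN}") == "Bearer abc123"
    assert substitute_env("${EXAMPLE_MISSING}") == "${EXAMPLE_MISSING}"
```

The test touches no network and never calls crew.kickoff(), which keeps it fast enough for the default tier.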

Wiki maintenance

When adding backend tests, update:

This page — marker table and roadmap checkboxes
Dual execution framework — guardrail G4 / F2.7
Kubernetes execution upgrade — testing strategy table