Testing and CI
How to run automated checks locally and on GitHub or GitLab for agentic-orchestration. The setup is designed to grow with the Dual execution framework and the Kubernetes execution upgrade, and the default tier requires neither API keys nor a cluster.
Related: Configuration, Architecture, Dual execution framework, Kubernetes execution upgrade, Agent harness roadmap, User agent harnesses
Quick reference
| Tier | When | Needs API keys? | Runs in CI? |
|---|---|---|---|
| Unit | Every commit / MR | No | ✅ Yes (default) |
| Import smoke | Every commit / MR | No | ✅ Yes |
| Docker worker smoke | Every commit / MR | No | ✅ Yes (K8s Phase 2.3) |
| kind Kubernetes e2e | Every commit / MR | No | ✅ Yes (stub worker, no LLM; includes agent skills spec handoff) |
| Integration | Manual / with AGENTIC_KIND_E2E=1 locally | Sometimes | ❌ Excluded from default pytest |
| Live LLM | Local only unless secrets configured | Yes | ❌ Never by default |
| Agent harness L0 (static catalog) | Every commit / MR | No | ✅ Yes (agent-harness-static job) |
| Agent harness L1 (connectivity) | Every commit / MR (credentialed subset) | Sometimes | ✅ Yes (agent-harness-connectivity job) |
| Agent harness L2+ (smoke / capability) | Nightly / manual | Yes | ❌ Not by default (weekly agent-harness-smoke-nightly.yml) |
| Backend parity (F2.7 / dual framework) | After refactor lands | No for resolution tests | ✅ Planned |
Default CI command excludes integration, live_llm, and agent_harness markers (see pytest.ini).
Agent harness (local)
cd agentic-orchestration-tool
python main.py --harness-batch --harness-tier static # L0 full catalog
python main.py --harness-agent gpt_research --harness-tier smoke
pytest -m agent_harness -o addopts="-ra"
powershell -File scripts/run-agent-harness.ps1 -Tier static -Filter "gpt_*"
python scripts/harness-report.py
Local — before you commit
From agentic-orchestration-tool/:
# One-time (or after dependency changes)
pip install -r requirements.txt -r requirements-dev.txt
# Same as CI
pytest
Helpers:
| Script | Platform |
|---|---|
| scripts/run-tests.sh | Linux / macOS / Git Bash |
| scripts/run-tests.ps1 | Windows PowerShell |
Examples:
pytest -m unit # explicit unit tier
pytest tests/test_config_loader.py # one file
pytest -m integration # include integration (when added)
pytest -m "live_llm" # real LLM calls — local only
GitHub Actions
Workflow: .github/workflows/ci.yml (on push to main/master and on pull requests).
| Job | What it does |
|---|---|
| python-unit | pip install + pytest (unit tier) |
| python-smoke-import | Install runtime deps; import orchestration.* and main |
| docker-worker-smoke | Build docker/Dockerfile.worker; invalid spec → exit 2 (scripts/docker-worker-smoke.sh) |
| kind-kubernetes-e2e | kind cluster + hostPath PVC + stub worker Jobs (scripts/k8s-kind-e2e.sh; tests/test_kind_kubernetes_e2e.py, including test_agent_skills_smoke_kind_kubernetes_workflow) |
No repository secrets required for default CI.
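The import smoke job can be approximated locally with a short script. The sketch below is an assumption about how such a check might look, not the contents of the CI job: `import_all` is a hypothetical helper that walks a package with the standard library's `pkgutil.walk_packages` and imports every submodule, recording failures instead of stopping at the first one. It is demonstrated on the standard-library `json` package so that it runs anywhere; in this repository the argument would presumably be `orchestration`.

```python
import importlib
import pkgutil


def import_all(package_name: str) -> tuple[list[str], dict[str, str]]:
    """Import a package and every submodule beneath it.

    Returns (imported module names, {module name: error message}).
    Hypothetical helper for illustration; not part of the repository.
    """
    package = importlib.import_module(package_name)
    imported = [package.__name__]
    failures: dict[str, str] = {}

    # Only packages have __path__; a plain module has nothing to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return imported, failures

    prefix = package.__name__ + "."
    for module_info in pkgutil.walk_packages(search_path, prefix):
        try:
            importlib.import_module(module_info.name)
            imported.append(module_info.name)
        except Exception as exc:  # record the failure and keep going
            failures[module_info.name] = f"{type(exc).__name__}: {exc}"
    return imported, failures


if __name__ == "__main__":
    names, errors = import_all("json")
    print(f"imported {len(names)} modules, {len(errors)} failures")
```

Collecting failures into a dictionary means a single run reports every broken import, which tends to be more useful in a CI log than failing on the first one.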
Local Docker smoke (Windows / Linux):
# Windows
powershell -File scripts/docker-worker-smoke.ps1
# Linux / macOS / Git Bash
bash scripts/docker-worker-smoke.sh
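The smoke scripts assert on a process exit status (an invalid spec should make the worker exit with 2). The same kind of assertion can be sketched in Python. Everything here is illustrative: `exits_with` is a hypothetical helper, and the worker is replaced by a Python one-liner so the example runs without Docker.

```python
import subprocess
import sys
from typing import Sequence


def exits_with(command: Sequence[str], expected_code: int, timeout: float = 60.0) -> bool:
    """Run a command and report whether it exited with the expected status."""
    completed = subprocess.run(
        list(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is the outcome under test, not an error
    )
    return completed.returncode == expected_code


# Stand-in for the worker: a Python one-liner that exits with status 2.
# A real check would run the worker image against an invalid spec instead;
# that command is not reproduced here.
stand_in_worker = [sys.executable, "-c", "import sys; sys.exit(2)"]

if __name__ == "__main__":
    print(exits_with(stand_in_worker, expected_code=2))
```

Passing `check=False` matters: with `check=True`, `subprocess.run` would raise on the non-zero status that the smoke test is trying to observe.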
kind Kubernetes e2e (Linux / Git Bash — same as CI):
cd agentic-orchestration-tool
bash scripts/k8s-kind-e2e.sh
Windows local cluster (real LLM, manual):
.\scripts\k8s-kind-up.ps1
$env:AGENTIC_K8S_RUN_STORE_VOLUME = "hostpath"
.\scripts\k8s-apply-run-store.ps1
.\scripts\docker-worker-smoke.ps1 # builds agentic-orchestrator-worker:local
# load into kind: .\.tools\kind.exe load docker-image agentic-orchestrator-worker:local --name agentic
python main.py config/workflows/workflow_brainstorm.yaml --quiet
GitLab CI
Pipeline: .gitlab-ci.yml at the monorepo root (runs on merge requests and on pushes to the default branch, main, or master).
| Job | What it does |
|---|---|
| unit-tests | pip install -r requirements-test.txt + pytest |
| import-smoke | Install full requirements.txt; import orchestration.* and main |
| docker-worker-smoke | DinD: build worker image + smoke script (same as GitHub job) |
| kind-kubernetes-e2e | kind + hostPath PVC + stub worker (scripts/k8s-kind-e2e.sh) |
Enable: push .gitlab-ci.yml to your GitLab project — GitLab picks it up automatically when CI/CD is enabled (Settings → General → Visibility and Build → Pipelines).
Runners: uses shared GitLab.com runners (python:3.12-slim) by default. For self-hosted GitLab, ensure a runner with Docker executor is available.
Merge requests: pipelines run on MR open/update; enable Pipelines must succeed under Settings → Merge requests if you want tests to block merges.
No CI variables or secrets required for the default jobs.
Layout
agentic-orchestration-tool/
├── pytest.ini # markers, default -m filter
├── requirements-dev.txt # pytest
├── tests/
│ ├── conftest.py # tool_root, config_dir fixtures
│ ├── test_config_loader.py
│ ├── test_catalog_loader.py
│ ├── test_catalog_credentials.py
│ ├── test_mcp_catalog.py
│ ├── test_goal_format_hints.py
│ └── test_provider_goal_match.py
└── scripts/
├── run-tests.sh
├── run-tests.ps1
├── docker-worker-smoke.sh # CI + Linux local
├── docker-worker-smoke.ps1 # Windows local
├── k8s-kind-up.sh # kind cluster + host bind mount
├── k8s-kind-up.ps1 # Windows
├── k8s-apply-run-store.sh # PVC backend (hostpath | nfs | filestore)
└── k8s-kind-e2e.sh # CI kind e2e (stub worker)
Add new tests under tests/; mark fast, pure-logic tests with @pytest.mark.unit.
Pytest markers (convention)
| Marker | Use |
|---|---|
| unit | Pure functions, YAML load, catalog filter; no network |
| integration | Subprocess worker, mini end-to-end without real LLM |
| live_llm | Real OpenAI/Ollama/HF calls |
| backend_inprocess | Full crew kickoff regression (future) |
| backend_subprocess | Subprocess execution backend (framework F4) |
| backend_kubernetes | kind/cluster integration |
| kind_e2e | Live kind Jobs via stub worker (AGENTIC_KIND_E2E=1) |
Register new markers in pytest.ini when adding tiers.
Roadmap — tests for dual execution framework
Align with Dual execution framework guardrails and phases.
Now (shipped)
- pytest + CI scaffold
- Unit tests: config load, catalog credentials, MCP env substitution, planner guardrails
- Import smoke job
Framework F1 — CrewAI backend extract
- T-F1 run_built_workflow / backend factory smoke with mocked kickoff
- T-F1 Regression: AGENTIC_EXECUTION_BACKEND=inprocess golden path (mocked LLM; tests/test_backend_inprocess_regression.py)
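A test with a mocked kickoff might look roughly like the sketch below. It is not taken from the repository's tests: `run_workflow_stub` is a hypothetical stand-in for the code under test, and the only assumption about the crew object is that it exposes a `kickoff()` method.

```python
from unittest.mock import MagicMock


def run_workflow_stub(crew) -> str:
    """Hypothetical stand-in for the code under test: kick off and stringify."""
    return str(crew.kickoff())


def test_kickoff_is_called_once_and_result_returned() -> None:
    crew = MagicMock()
    crew.kickoff.return_value = "stubbed result"  # no LLM is contacted

    assert run_workflow_stub(crew) == "stubbed result"
    crew.kickoff.assert_called_once_with()


def test_kickoff_failure_propagates() -> None:
    crew = MagicMock()
    crew.kickoff.side_effect = RuntimeError("LLM unavailable")

    try:
        run_workflow_stub(crew)
    except RuntimeError as exc:
        assert str(exc) == "LLM unavailable"
    else:
        raise AssertionError("expected RuntimeError")
```

Because the mock replaces the whole crew, tests of this shape need no API key and fit the default CI tier.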
Framework F2 — Step spec + coordinator
- T-F2 build_step_specs() from fixture WorkflowConfig
- T-F2 prepare_step_description() inject / cap chars
- T-F2 Run store round-trip
- T-F2.7 / G4 Same workflow YAML → identical step spec resolution (materializer parity; step_specs_resolution_fingerprint)
- T-F2 StepCoordinator sequential loop, failure stop, run store writes (tests/test_step_coordinator.py)
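The parity check in T-F2.7 compares resolved step specs across backends. One common way to make such a comparison cheap is to hash a canonical encoding of the specs. The sketch below illustrates that idea only; it is not the project's step_specs_resolution_fingerprint, and `sketch_fingerprint` and the spec fields (`id`, `agent`, `max_chars`) are hypothetical.

```python
import hashlib
import json
from typing import Any, Sequence


def sketch_fingerprint(step_specs: Sequence[dict[str, Any]]) -> str:
    """SHA-256 over a canonical JSON encoding of resolved step specs.

    Key order inside each spec is normalised; step order is preserved,
    because reordering steps is a real change in the workflow.
    """
    canonical = json.dumps(
        list(step_specs),
        sort_keys=True,          # same keys in any order hash identically
        separators=(",", ":"),   # no whitespace differences
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical resolutions of the same step, with keys in different order.
specs_a = [{"id": "research", "agent": "gpt_research", "max_chars": 4000}]
specs_b = [{"max_chars": 4000, "agent": "gpt_research", "id": "research"}]

if __name__ == "__main__":
    print(sketch_fingerprint(specs_a) == sketch_fingerprint(specs_b))
```

A parity test can then assert that two materializers produce equal fingerprints for the same workflow YAML, and fall back to a field-by-field diff only when they differ.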
Framework F4 — Subprocess backend
- T-F4 @pytest.mark.backend_subprocess two-step subprocess run (mock worker; tests/test_backend_subprocess.py)
- T-F4-docker Docker worker image smoke in CI (scripts/docker-worker-smoke.sh, job docker-worker-smoke)
K8s plan K3+
- T-K3 @pytest.mark.backend_kubernetes: mocked Job integration (tests/test_backend_kubernetes.py)
- T-K3-kind @pytest.mark.kind_e2e: live kind cluster in CI (scripts/k8s-kind-e2e.sh, stub worker)
Platform agent harness (shipped v1.4.0)
See Agent harness roadmap for design.
- T-H0 @pytest.mark.agent_harness: L0 static validation for full Agent provider catalog
- T-H1 L1 connectivity for credentialed catalog subset in CI
- T-H2 L2 smoke: nightly workflow (agent-harness-smoke-nightly.yml)
- main.py --harness-agent / --harness-batch CLI (platform tiers)
- scripts/run-agent-harness.ps1 / .sh and scripts/harness-report.py
User agent harness packs (shipped v1.5.0)
See User agent harnesses — domain scenario libraries (adopters maintain packs outside core; healthcare example in-repo).
- T-UH1 --harness-dir / AGENTIC_EXTRA_AGENT_HARNESS_DIRS discovery
- T-UH2 @pytest.mark.user_harness unit tests (mocked kickoff; harness CI job)
- Scenario YAML + deterministic assertions + optional rubric (orchestration/user_agent_harness.py)
- Healthcare vertical example pack under examples/verticals/healthcare/harnesses/gpt_research/
- scripts/run-user-harness.ps1 / .sh; --example overlay merges vertical harnesses/
Optional: pre-commit (local)
Not required for GitHub CI. To run unit tests before every local commit:
pip install pre-commit
# .pre-commit-config.yaml (future): pytest -m unit from agentic-orchestration-tool
Defer until the team wants local hooks; GitHub Actions is the shared gate.
Optional: live LLM CI job
Separate workflow: .github/workflows/live-llm.yml. Not required for PR merges.
One-time GitHub setup
- Open https://github.com/zlatko-lakisic/agentic-orchestration/settings/secrets/actions
- New repository secret → name OPENAI_API_KEY, value = your OpenAI key
- (Optional) add OPENAI_MODEL_NAME as a secret or rely on workflow env defaults
Run the job
| Method | Command / action |
|---|---|
| GitHub UI | Actions → Live LLM → Run workflow |
| CLI | gh workflow run live-llm.yml |
Weekly schedule: Mondays 06:00 UTC (disable by removing the schedule block in the workflow file).
What it runs
tests/test_live_llm_smoke.py (@pytest.mark.live_llm):
- Static workflow.yaml through in-process backend
- Short --dynamic goal with AGENTIC_PLANNER_MAX_STEPS=2
Local (same tests)
cd agentic-orchestration-tool
pip install pytest-timeout # optional but used in CI
pytest -m live_llm -o addopts="-ra"
Requires OPENAI_API_KEY in .env or the shell environment. Forks and PRs from contributors do not receive your secrets; the job skips cleanly when the key is missing.
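The "skips cleanly when the key is missing" behaviour can be expressed as a small guard. This is a sketch under assumptions, not the code in tests/test_live_llm_smoke.py: `live_llm_skip_reason` is a hypothetical helper, and it inspects only the process environment (loading a .env file is left to whatever the project already uses).

```python
import os
from typing import Mapping, Optional


def live_llm_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a skip reason when live LLM tests cannot run, else None.

    Hypothetical helper; an empty or whitespace-only key counts as missing.
    """
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY", "").strip():
        return "OPENAI_API_KEY is not set; skipping live LLM tests"
    return None


if __name__ == "__main__":
    reason = live_llm_skip_reason()
    print(reason or "live LLM tests can run")
```

In a test module, a guard like this could feed pytest's `skipif` marker, for example `pytest.mark.skipif(live_llm_skip_reason() is not None, reason="OPENAI_API_KEY is not set")`, so that a fork without secrets reports skipped tests rather than failures.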
Adding a new unit test (checklist)
- Prefer testing pure orchestration code (no crew.kickoff()).
- Mark with @pytest.mark.unit.
- Use tool_root / config_dir fixtures from conftest.py.
- Avoid real network; use monkeypatch for env vars.
- Run pytest locally before push.
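A test that follows the checklist above might look like this minimal sketch. The function under test, `resolve_model_name`, and the model names are hypothetical and exist only so the example is self-contained; the `monkeypatch` fixture and the `unit` marker are the pieces the checklist refers to. The tool_root / config_dir fixtures are omitted because their definitions are not shown on this page.

```python
import os

import pytest


def resolve_model_name(default: str = "fallback-model") -> str:
    """Hypothetical pure helper: read OPENAI_MODEL_NAME or use a default."""
    value = os.environ.get("OPENAI_MODEL_NAME", "").strip()
    return value or default


@pytest.mark.unit
def test_model_name_comes_from_env(monkeypatch):
    # monkeypatch restores the environment after the test finishes.
    monkeypatch.setenv("OPENAI_MODEL_NAME", "example-model")
    assert resolve_model_name() == "example-model"


@pytest.mark.unit
def test_model_name_falls_back_to_default(monkeypatch):
    monkeypatch.delenv("OPENAI_MODEL_NAME", raising=False)
    assert resolve_model_name(default="fallback-model") == "fallback-model"
```

Neither test touches the network or calls crew.kickoff(), so both belong in the unit tier and run under the default pytest command.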
Wiki maintenance
When adding backend tests, update:
- This page — marker table and roadmap checkboxes
- Dual execution framework — guardrail G4 / F2.7
- Kubernetes execution upgrade — testing strategy table