Zlatko Lakisic

Private Networks Architect & Practice Lead · Enterprise Presales & Connected Solutions

Local AI & MCP Architecture

Why MCP Matters

Model Context Protocol (MCP) standardizes how AI agents discover and invoke tools — the same integration problem enterprise architects have solved with API gateways and ESBs, now applied to agentic workflows. Without a protocol boundary, every agent hard-codes tool integrations; with MCP, tools become pluggable services with explicit capability contracts.
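
To make "pluggable services with explicit capability contracts" concrete, here is a minimal in-process sketch. It is not the MCP wire protocol and not code from this portfolio: `ToolContract`, `ToolRegistry`, and the `light.toggle` tool are hypothetical names chosen for illustration. The point is only that an agent asks a registry what exists and what each tool requires, instead of importing each tool directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolContract:
    """What a tool publishes about itself: name, purpose, required arguments."""
    name: str
    description: str
    required_args: Tuple[str, ...]


class ToolRegistry:
    """Hypothetical stand-in for the protocol boundary between agent and tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[ToolContract, Callable[..., Any]]] = {}

    def register(self, contract: ToolContract, handler: Callable[..., Any]) -> None:
        self._tools[contract.name] = (contract, handler)

    def discover(self) -> List[ToolContract]:
        # Agents read contracts here instead of hard-coding integrations.
        return [contract for contract, _ in self._tools.values()]

    def invoke(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        contract, handler = self._tools[name]
        missing = [arg for arg in contract.required_args if arg not in kwargs]
        if missing:
            raise ValueError(f"missing arguments for {name}: {missing}")
        return handler(**kwargs)


registry = ToolRegistry()
registry.register(
    ToolContract("light.toggle", "Toggle a light by entity id", ("entity_id",)),
    lambda entity_id: {"entity_id": entity_id, "toggled": True},
)
```

Adding a second tool is another `register` call; nothing in the agent changes, which is the property the protocol boundary is meant to buy.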

For organizations evaluating AI automation, MCP provides:


Enterprise Use Cases

| Use case | MCP role | Enterprise parallel |
| --- | --- | --- |
| Operational automation | Agents invoke Home Assistant, MQTT, or internal REST APIs via MCP servers | Integration bus connecting line-of-business systems |
| Knowledge retrieval | Docs/search MCP servers feed grounded context into planners | Enterprise search and RAG pipelines |
| Multi-step workflows | CrewAI crews chain tool calls through a catalog | BPM/orchestration with human approval gates |
| Vertical packs | Domain overlays under examples/verticals/ | Industry solution accelerators |
| Discovery workshops | YAML + env-var catalogs validate workflows before custom code | HLSD and POC scoping |
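
The discovery-workshop row can be sketched as a pre-flight check: before any custom code exists, confirm that every step of a workflow names a cataloged tool and that the environment supplies what that tool needs. The catalog layout, the workflow shape, and the `HA_URL` / `HA_TOKEN` variable names below are assumptions for illustration, not the repository's actual schema. The workflow is shown as the dictionary a YAML loader would return, so the example needs only the standard library.

```python
from typing import Dict, List, Mapping

# Hypothetical catalog: tool server name -> env vars it needs to be usable.
CATALOG: Dict[str, List[str]] = {
    "home_assistant": ["HA_URL", "HA_TOKEN"],
    "docs_search": [],
}

# Hypothetical workflow, as the dict a YAML loader would produce.
WORKFLOW = {
    "goal": "Summarize overnight sensor alerts",
    "steps": [
        {"tool": "home_assistant", "action": "list_alerts"},
        {"tool": "docs_search", "action": "lookup_runbook"},
    ],
}


def validate_workflow(
    workflow: Mapping,
    catalog: Mapping[str, List[str]],
    env: Mapping[str, str],
) -> List[str]:
    """Return human-readable problems; an empty list means the workflow can run."""
    problems: List[str] = []
    for index, step in enumerate(workflow.get("steps", []), start=1):
        tool = step.get("tool")
        if tool not in catalog:
            problems.append(f"step {index}: tool '{tool}' is not in the catalog")
            continue
        for var in catalog[tool]:
            if not env.get(var):
                problems.append(f"step {index}: tool '{tool}' needs env var {var}")
    return problems
```

In a workshop setting, `validate_workflow(WORKFLOW, CATALOG, os.environ)` would give a list of gaps to close before anyone writes integration code.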

Security Considerations

MCP introduces the same trust boundaries as any integration layer.

Enterprise deployments should treat MCP servers like microservices: authenticated endpoints, segmented networks, and logged invocation trails.
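
Two of those controls, authentication and a logged invocation trail, can be sketched in a few lines. This is an illustrative wrapper under stated assumptions, not how any particular MCP server implements them: `guarded_invoke`, the shared-token scheme, and the `mcp.audit` logger name are all hypothetical. Network segmentation is out of scope for a code sample.

```python
import hmac
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("mcp.audit")  # hypothetical audit logger name


def guarded_invoke(
    tools: Dict[str, Callable[..., Any]],
    expected_token: str,
    presented_token: str,
    caller: str,
    tool_name: str,
    **arguments: Any,
) -> Any:
    """Authenticate the caller, record the invocation, then run the tool."""
    # Constant-time comparison avoids leaking token contents through timing.
    if not hmac.compare_digest(expected_token.encode(), presented_token.encode()):
        logger.warning("denied caller=%s tool=%s", caller, tool_name)
        raise PermissionError("invalid token")
    if tool_name not in tools:
        raise KeyError(f"unknown tool: {tool_name}")
    # Log argument names only, so secrets in values stay out of the trail.
    logger.info("invoke caller=%s tool=%s args=%s", caller, tool_name, sorted(arguments))
    return tools[tool_name](**arguments)
```

Logging argument names but not values is a deliberate choice here: the trail shows who called what without copying payloads into log storage.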


Local vs Cloud MCP

| Dimension | Local MCP (this portfolio) | Cloud-hosted MCP |
| --- | --- | --- |
| Data sovereignty | Telemetry, video, and automation state stay on owned hardware | Data crosses provider boundaries |
| Latency | Sub-second tool round-trips on LAN | Depends on WAN and provider region |
| Cost model | Recycle-first bare-metal; no per-token egress surprises | Usage-based billing at scale |
| Model choice | Ollama, vLLM, JetStream on local GPU/TPU | Managed APIs (OpenAI, Anthropic, etc.) |
| Best fit | Edge AI, home-lab validation, regulated or air-gapped patterns | Burst capacity, frontier models, global teams |

The agentic-orchestration stack supports both — local backends by default, commercial APIs when credentials and latency profiles justify them.
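
A minimal sketch of "local by default, commercial when credentials allow", assuming a catalog where each backend declares the environment variable it requires. The `BACKENDS` list and the `prefer_cloud` flag are hypothetical; the variable names follow the usual vendor SDK conventions, and the repository's real catalog format may differ.

```python
from typing import List, Mapping

# Hypothetical backend catalog. `requires` names the env var that must be set;
# None marks a local backend that needs no credential.
BACKENDS = [
    {"name": "ollama", "local": True, "requires": None},
    {"name": "openai", "local": False, "requires": "OPENAI_API_KEY"},
    {"name": "anthropic", "local": False, "requires": "ANTHROPIC_API_KEY"},
]


def available_backends(env: Mapping[str, str]) -> List[str]:
    """Backends whose credential requirement is satisfied by the environment."""
    return [
        b["name"]
        for b in BACKENDS
        if b["requires"] is None or env.get(b["requires"])
    ]


def pick_backend(env: Mapping[str, str], prefer_cloud: bool = False) -> str:
    """Choose a local backend unless the caller explicitly prefers cloud."""
    usable = set(available_backends(env))
    local = [b["name"] for b in BACKENDS if b["local"] and b["name"] in usable]
    cloud = [b["name"] for b in BACKENDS if not b["local"] and b["name"] in usable]
    ordered = cloud + local if prefer_cloud else local + cloud
    if not ordered:
        raise RuntimeError("no backend is available")
    return ordered[0]
```

With no credentials set, a request that prefers cloud still falls back to the local backend, so a missing key degrades the deployment instead of breaking it.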


Lessons Learned

| Lesson | Detail |
| --- | --- |
| Catalog before code | YAML workflows and env-var backend catalogs reduce POC friction more than bespoke planner glue |
| Separate planning from execution | LiteLLM planner + CrewAI crew mirrors enterprise separation of orchestration and worker services |
| VRAM-aware routing | Smaller models for planning steps; reserve large models for synthesis, which directly reduces hardware spend |
| MCP is the integration hub | Resist one-off tool imports; every new capability should register as an MCP server |
| Pressure-test locally | Patterns validated on Proxmox clusters translate to client recommendations with real utilization data |
| Feedback loops matter | Session history and learning loops in the orchestrator mirror enterprise voice-of-customer pipelines |
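
The VRAM-aware routing lesson reduces to a small decision rule. The sketch below is one possible heuristic, not the orchestrator's actual router: the model names, VRAM figures, and quality ranks are invented placeholders, and a real router would read free memory from the GPU instead of taking it as an argument.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelProfile:
    name: str
    vram_gb: float  # memory the model needs when loaded (placeholder values)
    quality: int    # higher means stronger output (placeholder ranking)


# Hypothetical model profiles for illustration only.
MODELS: List[ModelProfile] = [
    ModelProfile("small-3b", 3.0, 1),
    ModelProfile("medium-8b", 6.0, 2),
    ModelProfile("large-70b", 42.0, 3),
]


def route(task_kind: str, free_vram_gb: float,
          models: Sequence[ModelProfile] = MODELS) -> ModelProfile:
    """Pick the cheapest fitting model for planning, the strongest for anything else."""
    fitting = [m for m in models if m.vram_gb <= free_vram_gb]
    if not fitting:
        raise RuntimeError("no model fits in the available VRAM")
    if task_kind == "planning":
        return min(fitting, key=lambda m: m.vram_gb)
    return max(fitting, key=lambda m: m.quality)
```

On a 48 GB card, planning steps go to the smallest model and synthesis to the largest; on an 8 GB card, synthesis drops to the mid-size model instead of failing.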

Overview

[Figure: MCP Architecture Blueprint, showing the client, server, and data source request flow]

Standard MCP request/response flow: the client translates AI requests into protocol format; servers fetch from external data sources and return structured context.
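
The "protocol format" in that flow is JSON-RPC 2.0, and a tool invocation is a `tools/call` request whose result carries a list of content parts. The sketch below builds such a request and pulls the text out of a response. The helper names and the `search_docs` tool are hypothetical, and transport (stdio or HTTP) and session initialization are deliberately left out.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def extract_text(response_json: str) -> str:
    """Return the concatenated text parts of a tools/call response."""
    message = json.loads(response_json)
    if "error" in message:
        # Protocol-level failure, e.g. unknown method or invalid params.
        raise RuntimeError(message["error"].get("message", "unknown error"))
    result = message["result"]
    if result.get("isError"):
        # The tool ran but reported a failure of its own.
        raise RuntimeError("tool reported an error")
    parts = result.get("content", [])
    return "\n".join(p["text"] for p in parts if p.get("type") == "text")


request = build_tool_call(1, "search_docs", {"query": "vlan"})
```

Keeping protocol errors and tool errors separate matters to a planner: the first usually means a misconfigured client, the second means the tool is reachable and the step can be retried or re-planned.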

This deep-dive covers self-hosted AI systems that extend enterprise integration thinking into edge inference: model-agnostic orchestration, multi-modal vision pipelines, and MCP tool servers that connect agents to real environments (Home Assistant, documentation, search, and custom catalogs).

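Primary repositories:

- github.com/zlatko-lakisic/agentic-orchestration
- github.com/zlatko-lakisic/CodeProjectAI-OmegaOllamaMLLM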


System Architecture

Conceptual MCP flow

How context moves from data sources through MCP into local execution — the pattern this repository implements end to end.

graph TB
    subgraph DataLayer["Data Layer"]
        A[Context Data Sources]
    end

    subgraph OrchestrationEngine["Orchestration Engine"]
        B(Agentic Orchestration Layer)
        C[Ollama Inference Engine]
    end

    subgraph SecurityIsolation["Security Isolation"]
        D[Localized Execution Sandbox]
    end

    A -->|Model Context Protocol| B
    B -->|Secure Local Payload| C
    C -->|Multi-Modal LLMs| D

Orchestration blueprint

Detailed component flow across planning, model routing, backends, and MCP tool servers.

flowchart LR
    subgraph Input
        UI[Web UI / YAML Goals]
        API[REST / WebSocket]
    end

    subgraph Orchestration
        Planner[LiteLLM Planner]
        Crew[CrewAI Agent Crew]
        Router[Model Router]
    end

    subgraph Backends
        Ollama[Ollama Local]
        Cloud[OpenAI / Anthropic / HF]
        TPU[vLLM / JetStream]
    end

    subgraph Tools
        MCP[MCP Tool Servers]
        HA[Home Assistant]
        Docs[Docs / Search]
    end

    UI --> API
    API --> Planner
    Planner --> Crew
    Crew --> Router
    Router --> Ollama
    Router --> Cloud
    Router --> TPU
    Crew --> MCP
    MCP --> HA
    MCP --> Docs

The orchestration layer separates planning (which model and which steps) from execution (agent roles and tool calls). MCP servers act as the integration boundary — the same pattern as REST API adapters in enterprise architecture, applied to agent tooling.
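
The split between planning and execution can be shown without either framework. In this sketch the planner is a hard-coded stand-in for a model call and the executor is a plain loop, where the real stack would use the LiteLLM planner and a CrewAI crew. `Step`, `plan`, `execute`, and the two tools are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    role: str      # which agent role performs the step
    tool: str      # which tool that role calls
    argument: str  # input passed to the tool


def plan(goal: str) -> List[Step]:
    """Stand-in planner: a real one would ask a model to produce these steps."""
    return [
        Step("researcher", "search", goal),
        Step("writer", "summarize", goal),
    ]


def execute(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Executor: runs each planned step against the tool catalog, in order."""
    return [f"{step.role}: {tools[step.tool](step.argument)}" for step in steps]


# Hypothetical tools that return canned strings instead of calling servers.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 documents found for '{query}'",
    "summarize": lambda topic: f"summary of '{topic}'",
}
```

Because `plan` returns plain data, the plan can be logged, reviewed, or approved by a human before `execute` touches any tool.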


Agentic Orchestration

Repository: github.com/zlatko-lakisic/agentic-orchestration

Architectural blueprint

Tech stack

| Layer | Components |
| --- | --- |
| Orchestration | CrewAI, YAML workflow definitions, dynamic planning modes |
| Model routing | Ollama, OpenAI-compatible APIs, Anthropic Claude, Hugging Face, vLLM, JetStream |
| Tooling | Model Context Protocol (MCP) catalog, Home Assistant, docs/search servers |
| Interfaces | CLI tool package, Web UI with local WebSockets, session and learning loops |

Key outcomes

Design lessons

| Challenge | Approach |
| --- | --- |
| Latency on local hardware | Per-task backend selection with VRAM heuristics; prefer smaller models for planning steps |
| Context window limits | Session management and knowledge-base retrieval instead of stuffing full history into prompts |
| Tool sprawl | MCP catalog as integration hub, the same role as an API gateway in distributed systems |
| Proof-of-concept friction | YAML + env-var catalogs so teams validate workflows before committing to custom code |
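
The context-window row can be sketched as a budgeted prompt builder: retrieved notes and the current question always go in, and session history is added newest-first until the budget runs out. This is an illustrative heuristic, not the orchestrator's session manager. `build_prompt` is a hypothetical name, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(history: List[str], retrieved: List[str],
                 question: str, budget: int) -> List[str]:
    """Keep retrieved notes and the question, then the newest history that fits."""
    fixed = retrieved + [question]
    remaining = budget - sum(estimate_tokens(text) for text in fixed)
    kept: List[str] = []
    # Walk history from newest to oldest and stop at the first turn that overflows.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    # Restore chronological order before the fixed parts.
    return list(reversed(kept)) + fixed
```

Older turns fall away first, so a long session keeps working within a small local context window while retrieval supplies the facts that history would otherwise have carried.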

Multi-Modal LLM Integration (CodeProject.AI)

Repository: github.com/zlatko-lakisic/CodeProjectAI-OmegaOllamaMLLM

Architectural blueprint

Plugin for CodeProject.AI Server that routes image and video analysis through Ollama vision models. Video is handled via frame sampling and summarization rather than sending full streams to the model.
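
Frame sampling followed by summarization can be illustrated with index arithmetic alone. The sketch below picks evenly spaced frames and merges their captions. It is an assumption about the general technique, not the plugin's code: `sample_frame_indices` and `summarize_clip` are hypothetical, and `describe` stands in for a call to a vision model on one frame.

```python
from typing import Callable, List


def sample_frame_indices(total_frames: int, max_samples: int) -> List[int]:
    """Pick up to max_samples evenly spaced frame indices from a clip."""
    if total_frames <= 0 or max_samples <= 0:
        return []
    if max_samples >= total_frames:
        return list(range(total_frames))
    # Integer arithmetic: take the middle frame of each equal-width segment.
    return [
        (2 * i + 1) * total_frames // (2 * max_samples)
        for i in range(max_samples)
    ]


def summarize_clip(total_frames: int, max_samples: int,
                   describe: Callable[[int], str]) -> str:
    """Caption the sampled frames and join them into one clip summary."""
    captions = [describe(i) for i in sample_frame_indices(total_frames, max_samples)]
    # Drop consecutive duplicates so a static scene is reported once.
    unique = [c for n, c in enumerate(captions) if n == 0 or c != captions[n - 1]]
    return "; ".join(unique)
```

Sampling caps the number of model calls per clip regardless of clip length, which is what keeps video analysis tractable on a small local vision model.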

Tech stack

Ollama · CodeProject.AI module pipeline · Moondream (default vision model) · containerized execution

Key outcomes


Relationship to Enterprise Work

| Enterprise pattern | Local AI equivalent |
| --- | --- |
| API gateway / integration bus | MCP catalog and model router |
| Credential-scoped service catalog | Backend catalog filtered by env credentials |
| HLSD and discovery artifacts | YAML workflows and vertical overlays |
| Feedback loop to product roadmap | Learning loop and session history in orchestrator |

The same architectural instincts — bounded integrations, catalog-driven adoption, outcome-first scoping — apply whether the deployment target is a Fortune 500 private network or a Proxmox cluster in a home lab.

