The Orchestrator: Owning the Loop
In the previous article, we established a simple but often misunderstood idea: an agent is not an autonomous entity. An agent is a loop. It observes input, reasons about the current situation, proposes a next step, executes that step through predefined mechanisms, observes the result, and then either continues or terminates. The language model participates in that loop, but it does not own it. Authority, state, and execution remain outside the model, in software that is explicitly designed and controlled. This distinction matters more in practice than it does in theory. Without it, systems drift toward hidden control paths, unclear ownership, and behavior that feels unpredictable. With it, agentic systems become inspectable, debuggable, and safe to evolve.
In this article, we move from abstraction to something tangible. We will describe a minimal on-prem agentic system that does almost nothing, by design. Its purpose is not to be impressive, but to make the control model visible.
A deliberately small system
The system we will build can perform exactly two operations. It can add numbers. It can multiply numbers. There are no other capabilities. If a request falls outside those boundaries, the system will say so plainly. This constraint is intentional. When capabilities are trivial, structure becomes obvious. There is nowhere for complexity to hide, and no room for accidental intelligence to creep in. The system consists of three conceptual pieces: an orchestrator, a local language model used only for reasoning, and two execution services that perform real work. The orchestrator sits at the center. Every request enters through it, every decision passes through it, and nothing executes unless it explicitly allows it.
flowchart TD
U[User Request / Intent]
O[Orchestrator Service]
L[Local LLM Runtime]
R[Tool Registry]
A[Addition Service]
M[Multiplication Service]
S[State & Observability]
U --> O
O -->|reasoning prompt| L
L -->|proposed action| O
O --> R
R -->|allowed tools| O
O -->|execute| A
O -->|execute| M
A -->|result| O
M -->|result| O
O --> S
O -->|final response| U
%% Styling
classDef orchestrator fill:#1f2937,color:#ffffff,stroke:#111827,stroke-width:2px
classDef llm fill:#7c3aed,color:#ffffff,stroke:#5b21b6
classDef tool fill:#2563eb,color:#ffffff,stroke:#1e40af
classDef registry fill:#0f766e,color:#ffffff,stroke:#134e4a
classDef state fill:#6b7280,color:#ffffff,stroke:#374151
classDef user fill:#f59e0b,color:#111827,stroke:#d97706
class O orchestrator
class L llm
class A,M tool
class R registry
class S state
class U user
The orchestrator receives the user’s intent and treats it as input to a controlled process rather than a command to be executed. It maintains task state, determines what information is passed to the model, and decides when reasoning is required. When the orchestrator invokes the language model, it does so with clear constraints. The model is told what services exist and what those services do. It is not asked to solve the problem directly. It is asked to propose what should happen next given the available capabilities. Crucially, the orchestrator does not trust the model blindly. Any proposal returned by the model is validated. If the proposal refers to a known service with valid inputs, the orchestrator may execute it. If it does not, the orchestrator declines and terminates the loop.
In this system, the orchestrator decides what is allowed, when execution occurs, and when the task is complete. The model never bypasses that control.
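Before writing any real code, the shape of that loop can be sketched in a few lines of Python. The snippet below is illustrative only: the stub function stands in for the language model, and the names are hypothetical, not the implementation we build later in this article.

def stub_model_propose(available_tools, user_input):
    # Stands in for the LLM: it only returns a structured proposal, nothing more.
    return {"action": "multiply", "arguments": {"a": 4, "b": 5}}

def run_task(user_input, tools):
    proposal = stub_model_propose(list(tools), user_input)   # the model proposes
    action = proposal.get("action")
    if action not in tools:                                   # the orchestrator validates
        return "No valid action exists for this request."
    return tools[action](**proposal.get("arguments", {}))     # the orchestrator executes and terminates

tools = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}
print(run_task("multiply 4 and 5", tools))  # prints 20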
The role of the language model
The language model in this system runs locally, on-prem, using a small, general-purpose model. Its job is not to compute results or perform actions. Its job is to interpret intent and suggest a structured next step. The model does not call services. It does not open files. It does not store memory. It does not loop on its own. It receives context and returns a proposal. That proposal may be correct or incorrect, but it never has the power to act on its own. This narrow role makes the model easy to reason about and easy to replace. If the model behaves poorly, the system remains intact. The orchestrator still owns state, execution, and termination.
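Concretely, a proposal is nothing more than a small piece of structured data. For a request like “multiply 4 and 5”, it might look like this (the exact schema is defined later by the orchestrator’s prompt):

{ "action": "multiply", "arguments": { "a": 4, "b": 5 } }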
Execution services as tools, not agents
The addition and multiplication services are ordinary microservices. They do one thing and return a result. They have no awareness of the orchestrator, the model, or the larger system. They are not intelligent components and they do not participate in reasoning. This is an important design choice. In an agentic system, tools should be simple, predictable, and explicit. Intelligence belongs in reasoning. Authority belongs in orchestration. When tools remain simple, systems remain stable. Because these services have narrow, well-defined contracts, the orchestrator can safely expose them to the model as capabilities without risking unintended behavior.
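As an illustration of that shape, an addition service could be as small as the sketch below, here assuming FastAPI purely for convenience. Later in this article the reference implementation uses simple in-process mock functions instead, so treat this as a picture of the contract rather than required infrastructure.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AddRequest(BaseModel):
    a: float
    b: float

@app.post("/add")
def add(request: AddRequest):
    # No state, no awareness of the orchestrator or the model: input in, result out.
    return {"result": request.a + request.b}

Served with something like uvicorn, the entire contract is a single endpoint that accepts two numbers and returns one.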
If a user asks the system to perform an operation that is not supported, the correct behavior is not to guess or approximate. The model, constrained by the tool descriptions it has been given, should conclude that no valid action exists. The orchestrator then ends the loop and returns a clear response indicating that the request cannot be answered with the available services.
Agentic systems become dangerous when models are encouraged to improvise beyond the system’s actual capabilities. By contrast, a system that clearly understands and enforces its own limits is predictable and trustworthy.
Although this system is trivial, it demonstrates the most important property of a well-designed agentic architecture: control flows outward from the orchestrator, not inward from the model. The model proposes. The system disposes.
Establishing the LLM runtime
Before building an orchestrator or defining any execution services, we need a reasoning engine that the system can invoke in a controlled and predictable way. For this series, that role will be filled by a local language model running entirely on-prem. We will use Ollama not because it is sophisticated, but because it is deliberately simple. It exposes a clean interface, runs comfortably on developer hardware, and does not impose architectural assumptions on how it should be used.
The choice of model itself is intentionally modest. We do not need deep reasoning or broad knowledge for this system to function correctly. The model’s task is narrow: map user intent to a small, explicitly declared set of actions, or conclude that no valid action exists. If the model occasionally proposes the wrong thing, the system remains safe, because the orchestrator retains full control over what is executed.
All of the work in this series will be done on Ubuntu 25, using a standard developer environment with no special assumptions. This keeps the system grounded in a realistic, reproducible setup and ensures that nothing we build depends on hidden platform behavior.
To install and activate the local language model runtime, we will use Ollama. Ollama provides a simple way to run language models locally and exposes them over a lightweight HTTP interface, which is exactly what we need for a controlled, on-prem reasoning dependency.
On Ubuntu 25, installation is a single step:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, you can verify that the runtime is available by checking its version:
ollama --version
To start the Ollama service manually, run:
ollama serve
At this point, the language model runtime is active and listening locally for requests. Nothing else in the system depends on it yet. We are simply ensuring that a reasoning engine exists and can be invoked when needed.
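If you want to confirm that the HTTP interface is reachable, a quick request against the default local port (11434) is enough; it should return a JSON list of the models installed locally:

curl http://localhost:11434/api/tags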
For convenience during development, Ollama can also be run as a background service managed by systemd:
sudo systemctl enable ollama
sudo systemctl start ollama
With this in place, the language model runtime becomes a stable, always-available dependency. From the orchestrator’s perspective, it is just another service: callable, observable, and replaceable. No control logic lives here, and no authority is granted. The model will reason when asked, and remain idle otherwise.
Pulling a modest model and testing the reasoning loop
With the Ollama service running, the next step is to download a small, modest model and confirm that it behaves the way we expect. At this stage, we are not evaluating intelligence or depth of knowledge. We are validating that the model can read instructions, respect constraints, and return structured, predictable output.
For this system, a lightweight general-purpose model is sufficient.
ollama pull llama3.2:3b
Once the model is available, we can interact with it directly to establish a baseline. This is not yet part of the agentic system; it is simply a sanity check that the reasoning engine is alive and responsive.
ollama run llama3.2:3b
You can now type a prompt and observe the response. For example, a simple instruction like:
You can add numbers and multiply numbers. What should you do with 4 and 5 if the user asks for multiplication?
The exact wording of the response is not important. What matters is that the model demonstrates it can interpret intent and map it to an action conceptually, rather than attempting to compute the result itself. If the model starts explaining or calculating, that is not a failure, but it does tell us something about how explicit we will need to be later.
This is where prompt shaping begins.
For the orchestrator, we are not interested in conversational answers. We want proposals. That means the prompt we eventually send to the model will be far more constrained and far less friendly. Instead of asking open questions, the orchestrator will describe the world as it exists and ask the model to choose from it.
A representative prompt shape looks like this in plain language:
The user asked: “multiply 4 and 5.”
The available services are add(a, b) and multiply(a, b).
Respond with which service should be used and the arguments, or state that no valid service exists.
When tested interactively, you should see the model gravitate toward naming the correct operation rather than inventing new ones. If it occasionally does the wrong thing, that is acceptable. The orchestrator will handle validation. The model’s job is to propose, not to be perfect.
The user asked: “multiply 4 and 5.”
... The available services are add(a, b) and multiply(a, b).
... Respond with which service should be used and the arguments, or state that no valid service exists.

Since the user asked for multiplication, we should use the "multiply" service.
The service to use is: multiply(4, 5)
This step is important because it reinforces a core design principle of the system: prompts are not conversations, they are contracts. The more explicit the constraints, the more predictable the model’s output becomes.
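The same contract can also be exercised over Ollama’s HTTP interface, which is how the orchestrator will eventually send it. A hand-written example, equivalent to what the code in the next section does programmatically:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "The user asked: multiply 4 and 5. The available services are add(a, b) and multiply(a, b). Respond with which service should be used and the arguments, or state that no valid service exists.",
  "stream": false
}'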
Defining the orchestrator
Once you are satisfied that the model responds sensibly under explicit constraints, there is no reason to interact with it directly again. From this point onward, the language model becomes an internal dependency. Humans stop prompting it. The system does. This transition is important. Manual interaction is useful for validation, but it is not how agentic systems operate in practice. In a real system, the model does not answer users. It answers the orchestrator. The orchestrator decides when reasoning is required, what context is provided, and whether the model’s response is even admissible.
The orchestrator is where intent becomes behavior.
Conceptually, the orchestrator is responsible for owning the entire task lifecycle. It receives the user request, determines the current state of the task, invokes the language model when a decision is needed, validates the proposed action, executes permitted services, records outcomes, and decides when the loop should terminate. Nothing else in the system has that authority.
This is also where agentic systems differ most sharply from prompt-driven applications. The model is no longer the center of gravity. It is a component that participates in a larger control loop. The orchestrator does not ask the model to “solve” the problem. It asks the model to propose what should happen next, given a world that the orchestrator defines.
From the orchestrator’s perspective, a model response is just data. It may be useful data, but it is not trusted by default. Every proposal is checked against known tools, allowed actions, and system rules. If the proposal is valid, the orchestrator executes it. If it is not, the orchestrator rejects it and either terminates the task or asks for another proposal. The model never executes code, and it never advances the system on its own.
In the sections that follow, we will build this orchestrator.
import requests
import json

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2:3b"


class Orchestrator:
    def __init__(self):
        # The tool registry is the complete set of executable capabilities.
        self.tools = {
            "add": self.add,
            "multiply": self.multiply,
        }

    def handle_request(self, user_input: str):
        # Minimal task state, owned by the orchestrator rather than the model.
        state = {
            "user_input": user_input,
            "result": None,
        }

        # Ask the model for a proposal and treat the response as untrusted data.
        proposal = self.invoke_model(state)
        action = proposal.get("action")
        arguments = proposal.get("arguments", {})

        # Validate the proposal against the tool registry before anything runs.
        if action not in self.tools:
            return "Sorry, I do not have a service capable of handling this request."

        try:
            result = self.tools[action](**arguments)
        except Exception:
            return "The proposed action was invalid."

        state["result"] = result
        return result

    def invoke_model(self, state):
        # The prompt defines a closed world and requires strictly formatted JSON.
        prompt = f"""
You are a reasoning component inside a controlled system.
Available services:
- add(a: number, b: number)
- multiply(a: number, b: number)
User request:
"{state['user_input']}"
Respond ONLY in JSON.
If a service applies, respond with:
{{ "action": "<service_name>", "arguments": {{ "a": number, "b": number }} }}
If no service applies, respond with:
{{ "action": "none" }}
"""
        response = requests.post(
            OLLAMA_URL,
            json={
                "model": MODEL_NAME,
                "prompt": prompt,
                "stream": False,
            },
        )
        raw_output = response.json()["response"]

        # Any deviation from the contract collapses to a safe default.
        try:
            return json.loads(raw_output)
        except json.JSONDecodeError:
            return {"action": "none"}

    # Mock execution services: no state, no awareness of the larger system.
    def add(self, a: float, b: float):
        return a + b

    def multiply(self, a: float, b: float):
        return a * b
The orchestrator is readable in one sitting. You can trace a request from entry to termination without guessing. You can log every step without reverse-engineering model behavior. You can reject bad proposals without special handling. Most importantly, the model cannot exceed its authority.
This implementation mirrors the conceptual model described earlier. The sections below walk through its key pieces.
Tool registration defines the execution boundary
self.tools = {
    "add": self.add,
    "multiply": self.multiply,
}
This dictionary is the complete set of executable capabilities. Nothing outside this map can ever run. The language model does not call functions directly; it can only propose an action name. If that name does not exist here, execution is impossible.
Adding or removing system capabilities is a code change, not a prompt change.
handle_request owns the full request lifecycle
def handle_request(self, user_input: str):
This method is the single entry point into the system. It creates minimal task state, invokes the model once, validates the proposal, executes an allowed tool if applicable, and returns the result.
There is no retry logic, no implicit looping, and no hidden control flow. The request either completes successfully or terminates explicitly.
Model output is treated as untrusted input
proposal = self.invoke_model(state)
action = proposal.get("action")
arguments = proposal.get("arguments", {})
The model’s response is parsed as data, not behavior. Only two fields are inspected: the proposed action and its arguments. Everything else is ignored.
If the action does not exist in the tool registry, the request ends immediately. If the arguments are malformed, execution fails safely. The model never advances the system on its own.
invoke_model is the only model interaction
def invoke_model(self, state):
This function encapsulates all interaction with the language model. The prompt defines a closed world, lists available services, and requires a strictly formatted JSON response.
Any deviation—hallucinated services, malformed JSON, or explanatory text—results in a default "action": "none". The orchestrator does not attempt to recover or reinterpret model output.
Mock execution services remain simple and isolated
def add(self, a: float, b: float):
def multiply(self, a: float, b: float):
These functions perform work and return results. They have no context, no state, and no awareness of the agentic system. This separation ensures that reasoning, control, and execution remain distinct.
This is the foundation on which larger agentic systems can be built.
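If you want to exercise the orchestrator end to end, a small driver like the one below is enough. It assumes the Ollama service is running locally with llama3.2:3b pulled; the second request illustrates the refusal path when no valid service exists.

# Minimal driver for the Orchestrator class above.
if __name__ == "__main__":
    orchestrator = Orchestrator()

    # Supported request: the model should propose multiply(4, 5),
    # which the orchestrator validates and executes.
    print(orchestrator.handle_request("multiply 4 and 5"))

    # Unsupported request: no valid service exists, so the loop
    # terminates with an explicit refusal instead of a guess.
    print(orchestrator.handle_request("what is the square root of 16?"))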
If you’ve made it this far, I hope you enjoyed working through this example as much as I did building it. In the next article, we’ll take this same foundation and explore how it evolves as systems grow. Until then, thanks for reading.
