Compass — Research Project

Compass is our internal research project on agentic workflows for complex research tasks. The goal is to translate multi-step work that a human would spread over several days into an orchestrated agent flow with control checkpoints, source provenance, and auditable reasoning. It is written in Python on the Claude Agent SDK, with MCP-based tool connections to our internal knowledge sources.

Python 3.13 · Claude Agent SDK · MCP · Pydantic

What Compass investigates

The open question behind agentic workflows is: where does the autopilot break? An agent that runs one hour of research autonomously saves a day of work. An agent that tips into a hallucination loop after forty minutes costs a day of cleanup plus a trust hit. Compass tests concrete research tasks (market studies, compliance research, vendor comparisons) with different agent architectures — single-pass vs. plan-execute-replan, with and without human-in-the-loop checkpoints, with various verification strategies.
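To make the architecture comparison above concrete, here is a minimal sketch of a plan-execute-replan loop with an optional human checkpoint after planning. It is an illustration under stated assumptions, not Compass code: `RunState`, `run_plan_execute_replan`, and the `planner`, `executor`, and `approve` callables are hypothetical names, and the sketch uses only the standard library rather than the Claude Agent SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# All names here are hypothetical; none come from Compass or the Claude Agent SDK.
Planner = Callable[[str, list[str]], list[str]]  # (task, findings so far) -> steps
Executor = Callable[[str], tuple[bool, str]]     # step -> (succeeded, result text)
Approver = Callable[[list[str]], bool]           # plan -> does the human approve?


@dataclass
class RunState:
    task: str
    plan: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    replans: int = 0
    status: str = "running"  # ends as "completed", "rejected", or "gave_up"


def run_plan_execute_replan(
    task: str,
    planner: Planner,
    executor: Executor,
    approve: Optional[Approver] = None,
    max_replans: int = 2,
) -> RunState:
    state = RunState(task=task)
    state.plan = planner(task, state.findings)

    # Optional human-in-the-loop checkpoint after the planning phase.
    if approve is not None and not approve(state.plan):
        state.status = "rejected"
        return state

    while state.plan:
        step = state.plan.pop(0)
        succeeded, result = executor(step)
        if succeeded:
            state.findings.append(result)
            continue
        # A failed step triggers a replan, bounded so the run cannot loop forever.
        if state.replans >= max_replans:
            state.status = "gave_up"
            return state
        state.replans += 1
        state.plan = planner(task, state.findings)

    state.status = "completed"
    return state
```

In this sketch, `max_replans=0` approximates a single-pass run, and leaving out `approve` gives a fully autonomous one. The hard cap on replans is one simple way to stop a run that keeps failing instead of letting it continue unattended.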

What we have learned so far

Three findings have held up across our research runs so far. First: source provenance is non-negotiable. Every factual claim by the agent must be traceable to a concrete document, otherwise the result is not verifiable. Second: plan-execute-replan beats single-pass on almost every complex task, but the quality of the plan decides everything. Third: targeted human-in-the-loop checkpoints (typically after the planning phase and before final synthesis) noticeably increase client acceptance without meaningfully reducing efficiency.
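The provenance requirement can be sketched as a small data structure plus a check that flags claims without a traceable source. This is an assumption-laden illustration, not the Compass schema: `SourceRef`, `Claim`, and `unsupported_claims` are hypothetical names, and plain dataclasses stand in for Pydantic models to keep the example dependency-free.

```python
from dataclasses import dataclass

# Hypothetical types for illustration; the real schema may differ.


@dataclass(frozen=True)
class SourceRef:
    document_id: str  # identifier of a document in the knowledge source
    locator: str      # where in the document, e.g. a page or section
    quote: str        # the passage the claim relies on


@dataclass(frozen=True)
class Claim:
    text: str
    sources: tuple[SourceRef, ...] = ()


def unsupported_claims(claims: list[Claim], known_documents: set[str]) -> list[Claim]:
    """Return claims with no source, or with a source pointing at an unknown document."""
    return [
        claim
        for claim in claims
        if not claim.sources
        or any(ref.document_id not in known_documents for ref in claim.sources)
    ]
```

A check like this could run before final synthesis, so that anything it returns is either removed or sent back for sourcing. Note that it only verifies that a reference exists and resolves to a known document; whether the quoted passage actually supports the claim still needs a separate verification step.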

Proof in practice for agentic-workflow advisory

Compass is intentionally not a product. It is the workbench where we test agentic workflows ourselves before we recommend them to customers. Concrete recommendations flow from there into our advisory mandates: which tasks are suitable for agents and which are not, what a realistic cost and effort path looks like, and which verification mechanisms regulation requires in the relevant industry. No marketing, just operating it ourselves.