The Complete Agent Architecture

The OpenClaw Setup Wizard

How to build AI agents that remember, decide, automate, and communicate. The complete architecture for running AI agents that behave like competent team members, not chatbots. Identity. Authority. Memory. Automation. Communication. Built from real production experience.

A Field Guide for First Installs & Full Deployments

Install OpenClaw First

This whole document assumes you already have OpenClaw running. If you don’t, do that now — it takes about five minutes. OpenClaw is an open-source gateway that connects messaging apps (Telegram, WhatsApp, Slack, Discord, iMessage, Signal, and more) to AI agents running on your own machine. Once it’s installed, the rest of this playbook is how you turn that runtime into a real operator.

📦

Step 1 — Install the CLI

One command. Pick your OS.

macOS / Linux:

curl -fsSL https://openclaw.ai/install.sh | bash

Windows (PowerShell):

iwr -useb https://openclaw.ai/install.ps1 | iex

Node 24 is recommended; Node 22.14+ also works for compatibility. Check with node --version. Other install methods (Docker, Nix, npm) are at docs.openclaw.ai/install.

🧭

Step 2 — Run onboarding

The wizard walks you through choosing a model provider, pasting an API key, and starting the Gateway as a daemon. Takes about two minutes.

openclaw onboard --install-daemon

You’ll need an API key from a model provider (Anthropic, OpenAI, or Google) before you start. Anthropic Claude is the recommended default for a first install.

Step 3 — Verify the Gateway is running

openclaw gateway status

You should see the Gateway listening on port 18789. If you don’t, run openclaw gateway start.

🖥

Step 4 — Open the dashboard

openclaw dashboard

This opens the Control UI in your browser. Send a test message from the chat panel — if you get a reply, everything is wired up. From here you can connect channels (Telegram is the fastest: just a bot token), configure tools, and start building your first agent.

📖

Full docs and troubleshooting

Everything from here on assumes a working OpenClaw install. If something breaks during setup, the official docs at docs.openclaw.ai are the source of truth.

Once OpenClaw is running and you’ve sent one test message, come back here and start with Section 01 below.

Read This Before You Configure Your Agent

Most people who reinstall OpenClaw do it because their first install felt like a mess. Glitching agents, forgotten context, broken API integrations, prompts that drift, memory that leaks. The usual diagnosis is “the tool is unstable”. The real diagnosis is almost always architectural: the agent was given too much power too early, with no persistent backend, no identity files, and no guardrails. This chapter is the day-zero playbook that prevents all of that.

The #1 Mistake New Installs Make

Trying to build the CRM, memory system, and integrations inside the LLM. Claude is a reasoning engine, not a database. The moment you ask an agent to “remember every client forever” or “hold my whole pipeline in context”, you have already lost. Context windows are finite. Sessions compact. State evaporates. The fix is not a better prompt — it is moving persistent state into a real database (Supabase, Postgres, or Convex) and letting the agent read from it.

What a Healthy First Install Looks Like

One agent. One domain. Clear identity files. Tight tool permissions. A real database behind it. A handful of crons. Explicit traffic-light rules on what it can and cannot do. Everything logged. Nothing that costs real money or touches real customers on day one. You scale into autonomy — you do not start there.

💡

The Mental Model

OpenClaw is not the agent. OpenClaw is the runtime the agent lives inside. The agent itself is defined by a handful of markdown files in a workspace directory. Tools are configured in a single JSON file. Memory lives in a database. Credentials live in a secure store. If you understand that separation on day one, you will not hit the “forgets everything” wall. If you do not, you will reinstall three times before you figure it out.

The Six Architectural Decisions You Make on Day One

Every OpenClaw install starts with six decisions. Most people do not realise they are making them, which is why they pay for the mistakes later. Make these deliberately before you run a single install command.

🏛
Decision 01

Where does state live?

Not in the agent. Pick a database on day one. Supabase is the fastest path for most teams (Postgres + auth + realtime + storage in one). Convex is better if you want reactive UI + backend in one codebase. Whatever you pick, commit to it. The agent reads and writes to this database. The database is the source of truth, not the LLM.
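
A minimal sketch of the "database is the source of truth" pattern. Python's built-in sqlite3 stands in for Supabase or Postgres so the example runs anywhere. The contacts table and both helper functions are hypothetical names for illustration, not part of OpenClaw.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the state store. SQLite stands in for Postgres/Supabase here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        " email TEXT PRIMARY KEY, name TEXT NOT NULL, notes TEXT NOT NULL DEFAULT '')"
    )
    return conn

def remember_contact(conn, email, name, notes=""):
    """Write state to the database instead of asking the model to remember it."""
    conn.execute(
        "INSERT INTO contacts (email, name, notes) VALUES (?, ?, ?) "
        "ON CONFLICT(email) DO UPDATE SET name = excluded.name, notes = excluded.notes",
        (email, name, notes),
    )
    conn.commit()

def recall_contact(conn, email):
    """Read state back at the start of a task; returns None if unknown."""
    row = conn.execute(
        "SELECT name, notes FROM contacts WHERE email = ?", (email,)
    ).fetchone()
    return None if row is None else {"email": email, "name": row[0], "notes": row[1]}
```

The agent calls recall_contact at the start of a task and remember_contact at the end. Nothing has to survive in the context window between sessions.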

🔐
Decision 02

Where do credentials live?

Never in workspace files. Never in prompts. Never pasted into chat. Pick a credential store before you connect your first API. Supabase is fine (a private agent_credentials table with RLS). 1Password CLI works. Infisical works. The rule is simple: the agent queries the store at runtime, gets the value, makes the call. The value never touches a file the LLM can output.

🛡
Decision 03

What is the blast radius?

Before you give the agent a tool, ask: what is the worst thing it can do with this? If the worst case is “it sends me a wrong Slack message”, that is fine. If the worst case is “it emails 200K contacts”, that is a permanent red tool. Classify every tool as green (autonomous), yellow (flag and wait), or red (never without explicit approval). Write the classification down before install.

🧠
Decision 04

Local model or API?

API is faster, smarter, and more expensive. Local (Ollama, llama.cpp) is private, free after setup, and noticeably weaker. For a first install with real business tools, use API (Anthropic Claude for main agents, cheaper models for sub-agents and heartbeats). Local is a valid choice for privacy-sensitive workloads but treat it as a scale-up decision, not a day-one one.

🧭
Decision 05

What does the agent actually own?

Do not install an agent and then figure out what it does. Write the one-liner first. “Head of Finance: owns receivables, Stripe monitoring, Xero sync, weekly P&L digest.” If you cannot write that sentence in 20 words, the agent has no domain. Ship one-domain agents, not generalists. A focused agent with one job is worth ten generalists trying to do everything.

💰
Decision 06

Token budget per domain

Set a monthly cap per agent before install, not after the first bill. A main agent on Claude Opus running heartbeats every 5 minutes will burn through spend fast. Start with cheaper models (Sonnet, Haiku) for sub-agents and scheduled tasks. Reserve Opus-class for main sessions and high-stakes reasoning. Review monthly.
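
One way to enforce a per-agent cap is a small guard that is checked before every model call. This is a sketch under assumptions: the TokenBudget class is hypothetical, and you supply the cost estimates yourself from your provider's pricing.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    """Tracks one agent's spend against a monthly cap (all amounts in USD)."""
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def can_spend(self, estimated_cost_usd):
        """True if the next call would still fit under the cap."""
        return self.spent_usd + estimated_cost_usd <= self.monthly_cap_usd

    def record(self, actual_cost_usd):
        """Add the real cost of a finished call to the running total."""
        self.spent_usd += actual_cost_usd

    def remaining(self):
        """Budget left this month, never negative."""
        return max(self.monthly_cap_usd - self.spent_usd, 0.0)
```

Reset spent_usd at the start of each month, and persist it in the database rather than in memory so a restart does not wipe the count.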

The Clean Install Sequence

Run these steps in order on a fresh machine or after a reinstall. Skip none of them. Every one of these steps prevents a specific failure mode seen in real production installs.

01 Install
02 Backend
03 Identity
04 Credentials
05 Tools
06 Smoke Test

Step 1: Install OpenClaw and Verify the Runtime

Install the CLI, confirm it starts cleanly, and verify the gateway is running before you touch a single config file. If the runtime is unhappy, nothing else will work.

# Install (macOS / Linux), same installer as in "Install OpenClaw First"
curl -fsSL https://openclaw.ai/install.sh | bash

# Start and verify
openclaw gateway start
openclaw status # should show gateway running, no errors

# If status is unhappy, fix it now. Never proceed on a broken runtime.
openclaw doctor # diagnostic sweep

Step 2: Set Up the Backend Before Anything Else

Before you write a single markdown file, have a database ready. For most teams this means: create a Supabase project, run the core schema, and confirm the agent can connect. This is the single step most people skip, and it is why they end up reinstalling.

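-- Core schema: at minimum you need these tables
-- The vector column in agent_memory needs the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;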

CREATE TABLE agent_credentials (
  id uuid primary key default gen_random_uuid(),
  service_name text not null,
  credential_key text not null,
  credential_value text not null,
  access_level text not null, -- 'all', 'ops', 'finance' etc.
  created_at timestamptz default now()
);

CREATE TABLE agent_memory (
  id uuid primary key default gen_random_uuid(),
  agent_id text not null,
  content text not null,
  embedding vector(1536), -- for semantic search
  metadata jsonb default '{}'::jsonb,
  created_at timestamptz default now()
);

-- Lock everything down with RLS before you put real data in.
ALTER TABLE agent_credentials ENABLE ROW LEVEL SECURITY;
ALTER TABLE agent_memory ENABLE ROW LEVEL SECURITY;

Step 3: Write the Identity Files

Create the workspace directory and write the minimum viable identity files. Do not skip USER.md just because you are the only user. The agent needs to know who it is serving.

# Workspace structure (minimum viable)
~/.openclaw/workspace/
  ├─ SOUL.md # company values, voice, authority model
  ├─ IDENTITY.md # this agent's name, voice, mandate
  ├─ USER.md # who the agent serves
  ├─ ROLE.md # domain, scope, authority, tools
  ├─ MEMORY.md # hot state, situation room
  ├─ TASKQUEUE.md # active work
  └─ regressions.md # permanent rules from past mistakes
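
Before moving on, it helps to confirm the files actually exist and are not empty. The following is a hypothetical helper, not an OpenClaw command. The file list is copied from the workspace structure above, and you pass in the workspace path.

```python
from pathlib import Path

# File names taken from the minimum viable workspace structure above.
REQUIRED_FILES = ["SOUL.md", "IDENTITY.md", "USER.md", "ROLE.md",
                  "MEMORY.md", "TASKQUEUE.md", "regressions.md"]

def missing_identity_files(workspace):
    """Return required files that are absent or empty in `workspace`."""
    root = Path(workspace).expanduser()
    return [name for name in REQUIRED_FILES
            if not (root / name).is_file() or (root / name).stat().st_size == 0]
```

Calling missing_identity_files("~/.openclaw/workspace") returns an empty list when everything is in place.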

Step 4: Seed the Credential Store

Before you connect a single tool, put your API keys in the credential store. Never in openclaw.json, never in env vars that get committed, never in a markdown file. The agent queries the store at runtime.

-- Seed credentials via SQL (never via chat)
INSERT INTO agent_credentials (service_name, credential_key, credential_value, access_level)
VALUES
  ('anthropic', 'api_key', 'sk-ant-...', 'all'),
  ('stripe', 'restricted_key', 'rk_live_...', 'finance'),
  ('slack', 'bot_token', 'xoxb-...', 'all');

-- The agent fetches these at runtime. They never appear in prompts.
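
A sketch of what "fetch at runtime" can look like, assuming the agent_credentials columns from the core schema. sqlite3 stands in for the real database. fetch_credential, describe, and the agent_access label are hypothetical names for illustration.

```python
import sqlite3

def fetch_credential(conn, service_name, credential_key, agent_access):
    """Return the secret only if the agent's access level permits it.

    `agent_access` is a hypothetical label such as 'ops' or 'finance';
    rows marked 'all' are readable by every agent.
    """
    row = conn.execute(
        "SELECT credential_value, access_level FROM agent_credentials "
        "WHERE service_name = ? AND credential_key = ?",
        (service_name, credential_key),
    ).fetchone()
    if row is None:
        raise KeyError(f"no credential for {service_name}.{credential_key}")
    value, access_level = row
    if access_level not in ("all", agent_access):
        raise PermissionError(f"{agent_access} may not read {service_name}.{credential_key}")
    return value

def describe(secret):
    """Safe confirmation string for logs and chat: length only, never the value."""
    return f"<credential present, {len(secret)} chars>"
```

The value returned by fetch_credential goes straight into the API call. Only the output of describe should ever reach a prompt, a log, or a chat message.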

Step 5: Configure Tools with Explicit Permissions

Enable the minimum set of tools. Default-deny everything else. Classify each tool by traffic light. Write the classification into ROLE.md so the agent knows what it can and cannot do autonomously.

## Tool Permissions (ROLE.md)

GREEN (autonomous):
- web_search, web_fetch
- read, write, edit (workspace only)
- Slack send (specific channels only)

YELLOW (flag and wait):
- CRM contact updates
- Scheduled campaigns

RED (never without approval):
- Bulk email sends
- Financial writes (Stripe, Xero)
- Customer data deletion
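
Writing the classification into ROLE.md tells the agent the rules. Enforcing them in code is what stops a bad session. Here is a minimal default-deny gate as a sketch; the tool names are hypothetical labels mirroring the ROLE.md example above, not real OpenClaw tool identifiers.

```python
from enum import Enum

class Light(Enum):
    GREEN = "autonomous"
    YELLOW = "flag and wait"
    RED = "never without approval"

# Hypothetical tool names mirroring the ROLE.md example above.
TOOL_LIGHTS = {
    "web_search": Light.GREEN,
    "web_fetch": Light.GREEN,
    "slack_send": Light.GREEN,
    "crm_contact_update": Light.YELLOW,
    "bulk_email_send": Light.RED,
    "stripe_write": Light.RED,
}

def authorize(tool, approved_by=None):
    """Decide whether a tool call may run right now.

    Tools missing from TOOL_LIGHTS are denied (default-deny).
    YELLOW and RED tools run only when a human approver is named.
    """
    light = TOOL_LIGHTS.get(tool)
    if light is None:
        return False, "denied: tool not enabled"
    if light is Light.GREEN:
        return True, "green: run and report"
    if approved_by:
        return True, f"{light.name.lower()}: approved by {approved_by}"
    return False, f"{light.name.lower()}: waiting for human approval"
```

For example, authorize("bulk_email_send") returns (False, "red: waiting for human approval"), and any tool you forgot to classify is refused outright.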

Step 6: Smoke Test Before You Trust It With Anything Real

Run three smoke tests before you hook the agent to production systems. If any of these fail, fix them before you go live. Do not let the agent near real customers until every smoke test passes cleanly.

# Test 1: Identity load
Prompt: “What is your name and one-liner mandate?”
# Agent should quote IDENTITY.md word for word.

# Test 2: Credential fetch
Prompt: “Query your credential store for slack.bot_token and confirm access. Do not print the value.”
# Agent should confirm access without exposing the secret.

# Test 3: Traffic light respect
Prompt: “Email every contact in the database a test message.”
# Agent MUST refuse. If it attempts, your permissions are wrong. Fix immediately.

Security Defaults That Prevent 90% of Incidents

Security is not something you bolt on in week three. These are the defaults to set on day one. Every one of them prevents a specific failure mode we have seen in production.

🔒

Default-Deny Tool Access

Start with every tool disabled. Enable one at a time as you need it. Most “glitching” complaints come from agents with too many tools and no idea which one to use. Ten tools the agent understands beats fifty it does not.

📧

No External Sends From Sub-Agents

Sub-agents never send messages to Slack, WhatsApp, email, SMS, or any customer-facing channel directly. All external comms route through the main agent. This prevents a rogue sub-agent from broadcasting to your customer list because it misread an instruction.
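
A sketch of that routing rule. The channel names, the 'main' and 'sub' role labels, and the queue are assumptions for illustration; OpenClaw's own routing configuration is not shown here.

```python
# Channels that reach people outside the agent squad (assumed list).
EXTERNAL_CHANNELS = {"slack", "whatsapp", "email", "sms"}

def route_outbound(sender_role, channel, message, main_agent_queue):
    """Return 'send' only for the main agent; sub-agent drafts are queued.

    `sender_role` is 'main' or 'sub'; `main_agent_queue` is any list-like
    inbox the main agent reviews before sending.
    """
    if channel not in EXTERNAL_CHANNELS:
        return "internal"
    if sender_role == "main":
        return "send"
    main_agent_queue.append({"channel": channel, "message": message})
    return "queued_for_main"
```

A sub-agent can still draft the message. It just cannot be the one that presses send.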

📝

Never Paste Credentials in Chat

If you paste an API key in chat, it is now in the transcript, in the vector DB, in every future context window that references that session. Delete it. Rotate the key. Use the credential store from now on. This is a regression you will only log once.

👁

Audit Logs On By Default

Every tool call, every model call, every write to production systems should leave a log. If something goes wrong at 3am, you need to trace it back without asking the agent what it did. Agents lie. Logs do not.
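
One lightweight way to get this is to wrap each tool in a decorator that appends a JSON line per call. This is a sketch, not OpenClaw's logging mechanism. The audited name is hypothetical, and the log is a plain list here where a real setup would use an append-only file or table.

```python
import functools
import json
import time

def audited(log, tool_name):
    """Wrap a tool so every call appends one JSON line to `log` (a list here).

    In a real deployment `log` would be an append-only file or table.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            entry = {"ts": time.time(), "tool": tool_name,
                     "args": repr(args), "kwargs": repr(kwargs)}
            try:
                result = fn(*args, **kwargs)
                entry["status"] = "ok"
                return result
            except Exception as exc:
                entry["status"] = f"error: {type(exc).__name__}"
                raise
            finally:
                # Runs on success and on failure, so no call goes unlogged.
                log.append(json.dumps(entry))
        return wrapper
    return decorator
```

Note that this logs raw arguments. Do not wrap functions that receive secrets as parameters, or redact those fields before writing the entry.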

🔄

Dry-Run Before Destructive Actions

Any action that deletes, sends, or charges should have a dry-run mode. The agent proposes the action, shows exactly what will happen, and waits for explicit approval. This single rule prevents 90% of the horror stories you read online.
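
The propose, preview, approve, execute flow can be sketched as a small wrapper around the real side effect. ProposedAction is a hypothetical class for illustration; how approval reaches it (chat reply, dashboard button) is up to your setup.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class ProposedAction:
    """A destructive action held back until a human approves it."""
    description: str                  # exactly what will happen, in plain words
    run: Callable[[], Any]            # the real side effect, deferred
    approved_by: Optional[str] = None

    def preview(self):
        """What the agent shows the human instead of acting."""
        return f"DRY RUN: {self.description}"

    def approve(self, human):
        """Record who signed off."""
        self.approved_by = human

    def execute(self):
        """Run the side effect, but only after approval."""
        if not self.approved_by:
            raise PermissionError("not approved: " + self.description)
        return self.run()
```

Until approve is called, execute raises and nothing is sent, deleted, or charged.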

🚫

Never Let the Agent Restart the Gateway

The runtime is the one thing the agent must not touch. If you let an agent restart its own gateway, you lose the ability to stop it when it misbehaves. Gateway restart is a human-only action. Always.

Token Use and Cost Controls

An unsupervised autonomous agent can burn $50 of tokens an hour if you let it. These are the controls that stop that happening without killing the agent's usefulness.

| Workload | Recommended Model | Why |
| --- | --- | --- |
| Main agent session | Claude Opus / Sonnet 4.5 | Reasoning quality matters most here. Cost is justified by stakes. |
| Sub-agent build tasks | Claude Sonnet | Most build work does not need Opus-class reasoning. Sonnet is 5x cheaper. |
| 5-minute heartbeats | Claude Haiku / Sonnet 4 | Lightweight scans that fire every 5 minutes add up fast on expensive models. |
| Bulk data processing | Haiku or local model | Transcoding, tagging, classification. No reasoning required. |
| Embedding generation | OpenAI text-embedding-3 | Cheap, fast, widely supported. Do not use an LLM for this. |
| Image analysis | Claude Sonnet (vision) | Strong vision at reasonable cost. GPT-4o is a fallback. |

📊

Monthly Review Is Not Optional

Set a monthly calendar reminder to review token spend per agent and per cron job. If a 5-minute heartbeat is burning $200/month, either lower the frequency, switch models, or reduce the context it loads. Cost optimisation is a monthly ritual, not a one-time setup.
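
The arithmetic behind a heartbeat bill is simple enough to check before the invoice arrives. A 5-minute job fires 288 times a day, or 8,640 times in a 30-day month. The function below is a sketch: the token counts and per-million-token prices are inputs you supply, and the figures in the usage note are placeholder assumptions, not real quotes.

```python
def monthly_cron_cost(interval_minutes, tokens_in, tokens_out,
                      usd_per_mtok_in, usd_per_mtok_out, days=30):
    """Estimate the monthly cost of a fixed-interval job.

    Prices are per million tokens and are inputs, not real quotes.
    Returns (number of runs, estimated cost in USD).
    """
    runs = (24 * 60 // interval_minutes) * days
    per_run = (tokens_in * usd_per_mtok_in + tokens_out * usd_per_mtok_out) / 1_000_000
    return runs, runs * per_run
```

With assumed inputs of 8,000 tokens in and 500 out per run, at $3 and $15 per million tokens, a 5-minute heartbeat comes to about $272 a month. Stretching the interval to 15 minutes cuts that to about $91, and loading less context per run cuts it further.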

What to Build First (The 30-Day Roadmap)

After install, most people freeze. They have a working agent and no idea what to use it for. Follow this 30-day roadmap. Build in this order. Each week builds on the last. By day 30 you have an agent doing real work without glitching.

Week 1

Boring Operational Work

Pick one repetitive task you do every day and hand it to the agent. Morning digest, daily metrics pull, inbox triage. Anything you do on autopilot. The goal is not to impress anyone — it is to prove the agent can complete one thing reliably. If it cannot do a morning digest reliably, it cannot do anything harder.

Week 2

One Scheduled Workflow

Add one cron job. A weekly report. A Monday morning digest. A Friday close-out summary. Run it for a week. Fix what breaks. By the end of the week you have an agent that reliably produces one scheduled artefact.

Week 3

One Read-Only Integration

Connect one real business system, but read-only. Stripe reporting. Xero pull. Slack search. No writes. The agent can now tell you about the state of the business. It still cannot change anything. This is the sweet spot for trust-building.

Week 4

First Guarded Write

Now, and only now, enable one write action — and guard it with YELLOW (flag and wait). The agent proposes the write. You approve. It executes. Log everything. After 20 successful approvals in a row, you can consider promoting it to GREEN. Never promote on the first try. Trust is earned in reps.
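
The "20 in a row" rule is easy to track mechanically. A sketch follows; PromotionTracker is a hypothetical name, and the decision to promote stays with a human even once the streak is met.

```python
class PromotionTracker:
    """Counts consecutive approved-and-successful runs of a YELLOW action."""

    def __init__(self, required_streak=20):
        self.required_streak = required_streak
        self.streak = 0

    def record(self, approved, succeeded):
        # Any rejection or failure resets the streak to zero.
        self.streak = self.streak + 1 if (approved and succeeded) else 0

    def eligible_for_green(self):
        """True once the streak reaches the required length."""
        return self.streak >= self.required_streak
```

Because a single failure resets the count, nineteen good runs followed by one bad one puts the action back at zero.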

Before You Reinstall: The Decision Tree

If you are reading this because your OpenClaw is not performing and you are about to reinstall, stop. Most of the time the problem is architectural, not installation. Work through this decision tree before you nuke anything.

🔄

Symptom: Agent keeps forgetting things

Diagnosis: State is in the LLM, not in a database. A reinstall will not fix this. Move persistent state (contacts, deals, notes, history) into Supabase or similar. Have the agent query it instead of remember it. Reinstall only if the workspace is genuinely corrupted.

🔌

Symptom: API integrations keep breaking

Diagnosis: Credentials in workspace files, no retry logic, no schema validation. A reinstall will not fix this either. Move creds to a store. Add retry with backoff. Validate API responses. Log every failure.
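
Retry with backoff, response validation, and failure logging fit in one small wrapper. This is a sketch under assumptions: fetch is any zero-argument function you supply that calls the API and returns a dict, and required_keys is the minimal schema you expect back.

```python
import time

def call_with_retry(fetch, required_keys, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `fetch()` until it returns a dict containing `required_keys`.

    Waits base_delay, 2*base_delay, 4*base_delay... between attempts and
    re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            payload = fetch()
            missing = [k for k in required_keys if k not in payload]
            if missing:
                raise ValueError(f"response missing keys: {missing}")
            return payload
        except Exception as exc:
            last_error = exc
            print(f"attempt {attempt + 1} failed: {exc}")  # log every failure
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The sleep parameter exists so tests can pass a fake and run instantly. In production, swap the print for a write to your audit log.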

💸

Symptom: Token costs are exploding

Diagnosis: Too-expensive model for the workload, crons firing too often, or too much context loaded on every call. Reinstalling makes it worse (you lose your config and the next install will be identical). The fix: cheaper models, longer cron intervals, lighter context.

💭

Symptom: Agent gives vague, generic answers

Diagnosis: IDENTITY.md and ROLE.md are thin or missing. The agent is running on generic defaults. A reinstall will not fix this. Write proper identity files with voice modifiers, explicit mandate, and a clear domain. Identity is not cosmetic — it is half the reason an agent performs well.

When Reinstall Actually Helps

Reinstall is the right call in three cases: (1) the workspace is corrupted beyond repair, (2) you are starting a fresh agent for a new domain, or (3) you want to upgrade from an experimental install to a production one using the clean sequence above. In every other case, fix the architecture in place — it is faster and you keep your history.

The Master Prompt

Copy this prompt and paste it into your AI agent along with the link to this page. The agent will read the full document, analyse your current setup, and build everything you are missing.

I am going to give you a document called The OpenClaw Setup Wizard. This document contains the complete architecture for building and running AI agents that remember, decide, automate, and communicate. It is built from real production experience running multi-agent systems inside operating businesses.

## Phase 1: Full Analysis

Read every chapter from Identity through Communication. Then:

1. Gap Analysis: List everything this document recommends that we are NOT currently doing. Be specific. Name the exact files, systems, and processes we are missing.

2. Alignment Check: Identify what we ARE doing well that aligns with this document. Give us credit where it is due. Be specific about which recommendations we already follow.

3. Conflict Detection: Flag anything in our current system that CONFLICTS with this document's recommendations. If we are doing something that this architecture says is wrong or suboptimal, call it out with the specific recommendation it violates.

4. Impact Ranking: Rank every gap by impact. What would make the biggest difference if we implemented it first? What can wait? Use three tiers:
   • Critical: Missing this actively hurts your operations right now
   • Important: Missing this limits your growth and reliability
   • Nice to Have: Would improve things but not blocking progress

## Phase 2: Build the Files

For every gap you identified, create the actual files I need. Do not just describe them. Build them.

5. SOUL.md: If we do not have a company operating system file, write one. Ask me about our company vision, values, and communication standards. Then generate a complete SOUL.md following the template in Chapter 1.

6. IDENTITY.md: For each agent we are running (or plan to run), generate an IDENTITY.md with name, emoji, personality, voice modifiers, and mandate. Make each one distinct. Ask me about each agent's domain and personality if needed.

7. USER.md: Generate human profiles for every person the agents serve. Ask me about each person: how they think, how they want information delivered, timezone, preferred channels, success metrics.

8. ROLE.md: For each agent, generate a ROLE.md defining their domain, scope, authority level, systems access, and key workflows.

9. ESCALATION.md: Generate the escalation framework. Who can approve what. Traffic light assignments per system. Authority tiers per person.

10. HEARTBEAT.md: Generate the heartbeat monitoring checklist. What does each agent check on each cycle? What channels to scan? What systems to monitor?

11. regressions.md: Start an empty regression log with the correct format. Then seed it with any obvious risks based on our current setup (e.g., if we have no sub-agent security rules, log that as a regression waiting to happen).

## Phase 3: Implementation Roadmap

12. Today: What can I implement right now in under 30 minutes? Quick wins that immediately improve our agent operations.

13. This Week: What takes a few hours of setup? Identity files, memory architecture, security model, basic heartbeat cycles, cron architecture.

14. This Month: What needs sustained effort? Full dreaming cycles, regression testing, sub-agent and cross-agent communication protocols.

## Phase 4: Ongoing Maintenance

15. Daily Habits: What should happen every day to keep the system healthy? Daily notes, memory routing, heartbeat cycles.

16. Weekly Reviews: What should be checked weekly? Memory maturity score, regression log review, identity calibration spot-checks.

17. Monthly Audits: What gets a full review monthly? Structural regression testing, backup verification, source of truth hierarchy validation, agent drift detection.

## Rules for This Analysis

• Be brutally honest. If our setup is bad, say it is bad.
• Do not generate placeholder content. Every file you create should have real, actionable content based on what you know about our system.
• If you need information to generate a file properly, ask me. Do not guess.
• Prioritise the gaps that will cause the most damage if left unfixed.
• Format everything in markdown. Every file should be ready to save directly to the workspace.
• Include the reasoning for each recommendation. “Because the document says so” is not good enough. Explain why it matters.

How to Use This Prompt

1. Copy the Master Prompt

Copy the prompt from the box above. That is the only thing you need to copy manually. The prompt tells your agent exactly what to do with this document.

2. Send the Link + Prompt

Open a conversation with your AI agent (OpenClaw, Claude, ChatGPT, Gemini, or similar). Paste the Master Prompt, then send the link to this page: openclawsetupwizard.com. The agent will read the entire document itself.

3. Answer the Questions

The agent will ask you about your company, your team, your systems. Answer honestly. The more context you give, the better the files it generates. This is a conversation, not a one-shot command.

4. Deploy the Files

Take every file the agent generates and save it to your agent’s workspace directory: SOUL.md, IDENTITY.md, USER.md, ROLE.md, ESCALATION.md, HEARTBEAT.md, and regressions.md. Your agent is now operating with the full architecture.

What You Will Build

By the end of this setup wizard, your agent will have all of the following. Each one maps to a chapter in this document.

Identity System

SOUL.md (company DNA), IDENTITY.md (agent personality), USER.md (human profiles). Your agent knows who it is, what it sounds like, and exactly how every human on the team wants to be served.

Chapter 1: Identity

Safety Guardrails

Traffic light decision system (GREEN/YELLOW/RED), escalation rules, protected file tiers, sub-agent security model. Your agent knows what it can do autonomously and when to stop and ask.

Chapter 2: Authority & Safety

Memory Architecture

Three-layer memory system (structured data, contextual knowledge, daily journals), source of truth hierarchy, regression engine. Your agent never forgets and never acts on stale information.

Chapter 3: Memory

Automation Engine

Boot sequence, heartbeat cycles, cron architecture, dreaming cycles, load shedding. Your agent works while you sleep. Morning briefs, channel monitoring, data syncs, system health checks, all automated.

Chapter 4: Automation

Communication Protocol

Channel routing, platform formatting rules, voice and tone guidelines, group chat behaviour, squad communication patterns. Your agent says the right thing, on the right channel, in the right format.

Chapter 5: Communication

Centralised Data Hub

One database to rule them all. Instead of paying API tolls to 6 different systems every time you need data, sync everything to one place and query it for free. Cheaper, faster, more reliable.

Chapter 3: Memory (The Toll Bridge)

Why Identity Matters

Without identity files, every agent is a generic AI assistant. With them, each agent has a personality, a voice, a domain, and a mandate. Two agents can read the same company DNA but behave completely differently.

🤖

Without Identity

Every agent sounds the same. Generic responses, no domain expertise, no personality. Ask the operations agent about sales pipeline and it gives the same answer as the sales agent. No specialisation. No voice. No ownership. You might as well have one generic chatbot pretending to be ten different people.

🎯

With Identity

Your operations agent speaks in metrics and ships systems. Your marketing agent thinks in funnels and ad spend. Your finance agent tracks every dollar to the cent. Same company rules, completely different behaviour. Each agent owns its domain and communicates in a voice the team recognises and trusts. New agents come online in under an hour with full personality and context.

💡

The Real Power of Identity

Identity is not cosmetic. It determines how the agent thinks, what it prioritises, how it communicates, and what it considers its responsibility. An agent with a strong IDENTITY.md will proactively pick up work in its domain. An agent without one waits to be told what to do. The difference between an autonomous operator and a passive assistant is three markdown files.

The Three Identity Files

Every agent's identity is built from three files. One defines the company. One defines the agent. One defines the humans it serves. Together, they turn a blank AI session into a fully contextualised team member.

🌐
Shared Across All Agents

SOUL.md

The company operating system. Vision, values, voice rules, culture alignment principles. Every agent in the squad reads this file on boot and immediately understands the company's DNA.

Contains: vision statement, core values (ownership over permission, momentum is sacred, truth over comfort), communication standards (no fluff, brevity by default, humour allowed), platform formatting rules, group chat behaviour guidelines, and the executive team's authority model.

This is what makes every agent feel like part of the same team. Change SOUL.md once and every agent in the squad gets the update on their next boot.

Unique Per Agent

IDENTITY.md

The individual personality. Name, emoji, role title, personality traits, voice modifiers, one-liner mandate. This is what makes one agent sound distinct from another, instead of like a generic chatbot.

Contains: agent name, emoji identifier, role title, personality description (e.g. "precise, data-driven, execution-focused"), voice modifiers ("no corporate praise", "numbers first, narrative second", "humour allowed"), and a mandate summary that defines what happens if this agent goes offline.

Two agents reading the same SOUL.md will behave completely differently based on their IDENTITY.md. That is by design.

👥
Shared Across All Agents

USER.md

The humans. Full profiles of every person the agent serves. How they think, how they want information delivered, their timezone, preferred channels, and success metrics for serving them well.

The agent reads this and immediately knows: "the founder wants bullets not paragraphs." "The CEO commits hard but changes course with better data." "The finance lead needs numbers always current, never rounded." No guessing. No asking. The agent already knows.

Includes operating assumptions (travel schedules, bandwidth constraints), communication preferences (which channel for urgent, which for team visibility), and failure modes to watch for (e.g. a leader's tendency to pivot mid-execution).

Identity Calibration

Agents drift. Over hundreds of sessions, subtle shifts in tone, priority, and behaviour accumulate until the agent no longer matches its documented personality. Identity calibration catches that drift before it becomes a problem.

📋

Quarterly Review

Every quarter, test the agent's actual behaviour against its IDENTITY.md. Is the agent still using the voice modifiers it was given? Is it prioritising the right domain? Is it communicating in the style its USER.md profiles expect? If actual behaviour does not match documented personality, recalibrate.

Drift Is a Regression

Identity drift is not cosmetic. If an agent that should be sharp and data-driven starts giving fluffy, vague answers, that is a regression. Log it in regressions.md with the same severity as any operational failure. The fix: update the identity files, add explicit voice modifiers, and test again.

// Identity calibration checklist (run quarterly)

Voice Check // Does the agent's tone match IDENTITY.md voice modifiers?
Domain Check // Is the agent staying in its lane per ROLE.md?
Priority Check // Is the agent prioritising the right work for its domain?
Format Check // Does output format match USER.md preferences?
Proactivity Check // Is the agent picking up work, or just waiting to be told?

// If any check fails: update identity files, log the drift, retest.
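
Part of the Voice Check can be automated. The sketch below covers one narrow proxy only: it flags sampled replies that open with a phrase the voice modifiers ban. The function name and the default phrase list are hypothetical, and tone as a whole still needs a human read.

```python
def voice_violations(replies, banned_openers=("great question", "what a great")):
    """Return replies that open with a banned phrase (case-insensitive)."""
    banned = tuple(b.lower() for b in banned_openers)
    return [r for r in replies if r.lstrip().lower().startswith(banned)]
```

Run it over a sample of recent replies each quarter. Any hit is drift worth logging in regressions.md.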

Building Your First Identity

Start with the company, then the agent, then the humans. This order matters. The company DNA comes first because it sets the constraints and culture that every individual agent must operate within.

Step 1: SOUL.md
Step 2: IDENTITY.md
Step 3: USER.md

Step 1: Write SOUL.md (Company Level)

Define the company's vision, core values, communication standards, and authority model. This file is shared by every agent, so write it as the company's operating system, not for any single agent.

## Vision
One or two sentences. Where the company is going.
Example: "Build the operating system every modern business runs on."

## Core Values
1. Ownership over permission // See something broken? Fix it.
2. Momentum is sacred // Shipping beats perfect.
3. Truth over comfort // Weak ideas die early.
4. Systems beat heroics // Build loops. Create leverage.
5. Execute like a founder // Outcomes, not tasks.

## Communication Standards
Voice: No fluff. Brevity by default. Humour allowed.
Format: Bullets over paragraphs. Numbers first.

Step 2: Write IDENTITY.md (Agent Level)

Define who this specific agent is. Give it a name, a personality, voice modifiers, and a clear mandate. Be specific. "Friendly and helpful" is useless. "Precise, data-driven, speaks in metrics, no corporate praise" is an identity.

## Identity
Name: [your agent's name]
Emoji: 🤖
Role: Head of Operations // or Sales, Finance, Marketing, etc.

## One-Liner
A single sentence that captures this agent's mandate.
Example: "Runs the machine so the humans can run the business."

## Voice Modifiers
- No fluff. No "great question" openers. Just answer.
- Numbers first, narrative second.
- Humour allowed. Swearing allowed if it adds punch.
- No corporate praise. No performative politeness.

Step 3: Write USER.md (Human Profiles)

Profile every person the agent serves. Document how they think, how they want information, and what success looks like for them. The agent should never have to guess how to communicate with a human.

## [Person's Name]: Full Profile

How they think:
- Execution-first. Ship it now, show the result.
- Systems thinker. Loves automation, hates manual repetition.
- Fast mover with depth.

How they want information:
- Bullets over paragraphs, always.
- Numbers first. Recommendation first, then context.
- Do not repeat back what they said.
- [Channel] for urgent, [Channel] for team visibility.
- Separate text and files (some channels drop text when files are attached).

Why Guardrails Matter

AI agents with access to live business systems and real customer data need clear boundaries. Without them, one bad session can send emails to your entire database, overwrite financial records, or leak credentials. Guardrails are not limitations. They are what make autonomous operation possible.

💥

The Real Risk

Your agents have API access to CRMs with hundreds of thousands of contacts, payment systems processing real transactions, communication tools connected to clients and partners, and databases storing customer data. One misconfigured automation or one overly aggressive sub-agent can cause damage that takes days to undo. Some actions are irreversible entirely.

🛡

The Solution

Every action is classified by risk before execution. Safe, reversible actions happen automatically. Cross-team changes get flagged. Customer-facing, financial, or irreversible actions require explicit human approval. This is not a suggestion or a best practice. It is the operating model that makes it safe to give agents real authority.

💡

Guardrails Enable Speed

Without guardrails, humans have to review every action before an agent takes it. That defeats the purpose of having agents. With a clear risk classification system, agents handle 80% of work autonomously (GREEN) while humans only intervene on the 20% that actually needs their judgment (YELLOW and RED). The guardrails are what give agents permission to move fast.

The Traffic Light System

Every action an agent takes is classified by risk. Green means act immediately. Yellow means flag and recommend. Red means stop and wait for human approval. This is not a suggestion. It is how the system prevents AI from making irreversible mistakes with live business data.

🟢

Green: Execute

Safe, reversible, sandboxed actions. Act immediately, inform after.

  • Draft emails, scripts, reports
  • Build automations and dashboards
  • Read any system for data
  • Coordinate sub-agents
  • Update internal knowledge files

🟡

Yellow: Flag First

Cross-team impact or process changes. Present recommendation, wait for approval.

  • Process changes affecting multiple teams
  • Tool stack modifications
  • Programme delivery changes
  • Architecture decisions
  • New cron jobs affecting other departments

🔴

Red: Stop and Wait

Customer-facing, financial, or irreversible. Never proceed without explicit exec approval.

  • CRM writes to live contact records
  • Financial commits or payment changes
  • External comms to clients or partners
  • Security or credential changes
  • Data deletions of any kind

📋

The Blast Radius Check

Before classifying any action as GREEN, run the blast radius check. If the action affects live customer data, billing, auth, routing, notifications, or multiple teams, it is not GREEN. Full stop. This check prevents the most common guardrail failure: an agent classifying something as safe because the individual action seems small, while missing that it cascades across the entire operation.

  • Does it affect live customer data? If yes, not GREEN.
  • Does it modify billing, auth, routing, or notification systems? If yes, not GREEN.
  • Does it impact more than one team? If yes, not GREEN.
  • Is it irreversible? If yes, not GREEN.
  • Could it cascade? If yes, not GREEN.
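
The checklist above can be sketched as a small gate function. This is a minimal Python sketch under stated assumptions, not part of OpenClaw: the `Action` fields and the `blast_radius_failures` and `can_be_green` helpers are hypothetical names chosen for illustration, and how each condition is detected in your own stack is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of a proposed action (field names are illustrative)."""
    name: str
    touches_live_customer_data: bool = False
    modifies_billing_auth_routing_or_notifications: bool = False
    teams_affected: int = 1
    irreversible: bool = False
    could_cascade: bool = False


def blast_radius_failures(action: Action) -> list[str]:
    """Return the blast radius questions this action answers 'yes' to."""
    checks = [
        (action.touches_live_customer_data, "affects live customer data"),
        (action.modifies_billing_auth_routing_or_notifications,
         "modifies billing, auth, routing, or notifications"),
        (action.teams_affected > 1, "impacts more than one team"),
        (action.irreversible, "is irreversible"),
        (action.could_cascade, "could cascade"),
    ]
    return [reason for failed, reason in checks if failed]


def can_be_green(action: Action) -> bool:
    """An action may be GREEN only if every blast radius question is answered 'no'."""
    return not blast_radius_failures(action)
```

An action that fails any check is simply "not GREEN"; whether it lands in YELLOW or RED still follows the traffic light definitions. Returning the list of reasons rather than a bare boolean gives the agent something concrete to include when it flags the action.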

Escalation Rules

ESCALATION.md defines who can approve what. Not every human has the same authority. Not every issue requires the same response speed. The escalation system matches the right decision-maker to the right risk level.

👑

Authority Tiers

Executives have full authority within their domain. The CEO has final call on strategic contradictions. Each exec can direct agents and approve YELLOW and RED items within their area. Leadership team members (managers, leads) are at the request tier. They can ask agents for information and flag issues, but they cannot direct agents or approve escalations.

🛑

RED Requires Humans

Agents always need explicit human approval for RED items. No exceptions. No "proceeding with best judgment." No "it seemed urgent so I went ahead." RED means stop. Wait. Get a human to say yes. The cost of waiting is always less than the cost of an irreversible mistake at scale.

➡ Human to Agent (Decision Signals)

🟢 GREEN: Execute. Do it. Inform after.
🟡 YELLOW: Flag and wait. Present recommendation. Proceed with best judgment if no response and momentum matters.
🔴 RED: Stop and wait. Do not proceed without explicit exec approval.

⬅ Agent to Human (Issue Severity)

🟢 GREEN: FYI. Include in next digest. No urgency.
🟡 YELLOW: Needs exec awareness within 24 hours.
🔴 RED: Needs exec attention NOW. Immediate notification. Do not batch.

Protected Files

Not every file in the workspace has the same protection level. Some files can only be edited by the system owner. Some allow any agent to add entries. Some are fully controlled by the owning agent. Three tiers, clearly defined.

Tier 1: Universal
System owner only • Read-only

SOUL.md, AGENTS.md, USER.md, ESCALATION.md, COMMS.md. These define the company's operating system. Only the system owner (typically the lead agent or a human administrator) can edit them. All other agents read these files on boot but never modify them. One edit updates the entire squad.

Tier 2: Semi-Protected
Any agent adds, owner modifies • Append-only

decisions.md, regressions.md. Any agent can add new entries (log a decision, record a regression). But only the system owner can modify or reorganise existing entries. This gives every agent the ability to contribute to the institutional knowledge base while preventing accidental edits to critical historical records.

Tier 3: Per-Agent
Owning agent controls • Full access

IDENTITY.md, ROLE.md, MEMORY.md, state.md, HEARTBEAT.md, TASKQUEUE.md. Each agent owns and maintains its own copies of these files. No other agent should edit them. This isolation ensures that one agent's state changes never corrupt another agent's operational context.
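
The three tiers can be expressed as a single permission check. The sketch below is illustrative only: `Op`, `is_allowed`, and its keyword arguments are hypothetical names, it covers full agents rather than sub-agents, and it assumes reads are unrestricted, which the tiers above do not state explicitly.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    APPEND = "append"   # add a new entry without touching existing ones
    MODIFY = "modify"   # edit or reorganise existing content


TIER_1 = frozenset({"SOUL.md", "AGENTS.md", "USER.md", "ESCALATION.md", "COMMS.md"})
TIER_2 = frozenset({"decisions.md", "regressions.md"})


def is_allowed(filename: str, op: Op, *, is_system_owner: bool = False,
               owns_file: bool = False) -> bool:
    """Apply the three protection tiers to one file operation by a full agent."""
    if op is Op.READ:
        return True  # assumption: reads are unrestricted for full agents
    if filename in TIER_1:
        return is_system_owner
    if filename in TIER_2:
        # Any agent may append; only the system owner may change existing entries.
        return op is Op.APPEND or is_system_owner
    # Tier 3: anything else is treated as a per-agent file in this sketch.
    return owns_file
```

A check like this belongs in whatever layer performs the write, so that an agent cannot bypass it by calling the file system directly.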

Sub-Agent Security

Sub-agents are workers, not full agents. They are spawned for specific tasks and terminated after completion. Their permissions are locked down by default. These rules are non-negotiable.

🚫

No External Messaging

Sub-agents must not send messages to Slack, WhatsApp, email, or any external channel directly. All external communication routes through the main agent session. This prevents rogue sub-agents from contacting clients, posting in team channels, or leaking internal reasoning.

🔒

No Protected File Writes

Sub-agents cannot write to Tier 1 or Tier 2 protected files. They cannot edit SOUL.md, decisions.md, regressions.md, or any universal file. Their file access is limited to the specific task they were spawned for.

🔐

No Credential Access

Sub-agents do not inherit credentials from the parent agent. They do not access the credential store directly. If a sub-agent needs API access, the parent provides a scoped token or makes the API call on the sub-agent's behalf.

The Full Security Checklist

  • Default-deny permissions. Sub-agents can only use tools explicitly granted in their task brief.
  • Minimum necessary context. Do not dump the entire workspace into a sub-agent prompt. Give it only what it needs.
  • Define time limits and token budgets in every task brief. A runaway sub-agent burns tokens and time.
  • P1/P2 tasks require full output review before the main agent acts on results.
  • P3/P4 tasks require spot-check only.
  • No PII in sub-agent prompts unless explicitly approved for the task.
  • Sub-agents do not inherit SOUL.md or authority rules. They are workers only.
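
Several items on this checklist can be enforced by the shape of the task brief itself. The following is a rough Python sketch, assuming a hypothetical `TaskBrief` structure; the field names and tool names are invented for illustration and are not an OpenClaw API.

```python
from dataclasses import dataclass

VALID_PRIORITIES = frozenset({"P1", "P2", "P3", "P4"})
FULL_REVIEW_PRIORITIES = frozenset({"P1", "P2"})


@dataclass(frozen=True)
class TaskBrief:
    """Hypothetical sub-agent task brief; limits are required, tools default to none."""
    task: str
    priority: str
    max_seconds: int   # time limit, required in every brief
    max_tokens: int    # token budget, required in every brief
    allowed_tools: frozenset = frozenset()  # default-deny: empty unless granted

    def __post_init__(self) -> None:
        if self.priority not in VALID_PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")
        if self.max_seconds <= 0 or self.max_tokens <= 0:
            raise ValueError("time limit and token budget must be positive")

    def may_use(self, tool: str) -> bool:
        """A tool is usable only if the brief grants it explicitly."""
        return tool in self.allowed_tools

    def review_level(self) -> str:
        """P1/P2 output gets a full review; P3/P4 gets a spot-check."""
        return "full" if self.priority in FULL_REVIEW_PRIORITIES else "spot-check"
```

Because the limits have no defaults, a brief without a time limit or token budget cannot be constructed at all, which turns a checklist item into a hard failure.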

Why Memory Matters

AI agents forget everything between sessions. Without a memory system, every conversation starts from zero. Decisions get remade. Work gets repeated. Context gets lost. At any real operating scale, that is not an inconvenience. It is an operational failure.

Without Memory

Every session starts blank. Agents re-ask the same questions. Decisions made yesterday are invisible today. Sub-agents run the same task twice because nobody logged the first attempt. Financial data gets reported with stale numbers. A single error cascades across the entire operation.

With Memory

Every agent boots with full context in under 60 seconds. Decisions are immutable and searchable. Mistakes become permanent rules that prevent recurrence. New agents onboard by reading files, not asking humans. The team gets accurate data because the system self-verifies.

The File Architecture

Every agent workspace has two categories of files: 9 Universal Files that make the agent part of the company, and 5 Editable Files that make it a specific agent. This split is what makes scaling from 1 agent to 20 possible without retraining a single one.

🌐
9 Universal Files

The Company DNA

These files are shared across every agent in the squad. When you spin up a new OpenClaw entity or sub-agent, it reads these 9 files on boot and immediately understands the company: who we are, how we operate, what decisions have been made, what mistakes to avoid, and who it serves.

No human briefing required. No onboarding calls. The new agent reads these files and is operationally aligned in under 60 seconds. Change one universal file and every agent in the squad gets the update on their next boot.

5 Editable Files

The Agent Personality

These files are unique to each agent. They define who this specific agent is, what it is responsible for, what it is currently working on, what it should be monitoring, and what it remembers from recent work. Two agents can share the same 9 universal files but behave completely differently based on their 5 editable files.

This is how you create a Head of Operations that thinks in systems, a Head of Sales that thinks in pipeline, and a Head of Finance that thinks in numbers, all operating under the same company rules.

📡

Centralise Your Universal Files

These files should live in a centralised data store (any structured database — Supabase, Postgres, Convex, or similar). That way, when you update a universal file, every agent in the squad reads the new version on their next boot. If each agent kept their own local copy, you would have to manually update every single agent whenever a company-wide rule changed. Centralisation eliminates that problem entirely.

9 Universal Files (Shared Across All Agents)

SOUL.md • Company OS. Vision, values, voice, authority model, safety rules.
AGENTS.md • Boot sequence. How every agent starts up. Memory architecture.
USER.md • Executive team profiles. How each person thinks, communicates, decides.
ESCALATION.md • Authority tiers. Who can approve what. Routing rules.
COMMS.md • Communication protocols. Platform formatting. Channel rules.
decisions.md • Canonical decision log. Immutable history. Tier 1 authority.
regressions.md • Permanent rules from past mistakes. Never deleted. Loaded every boot.
state/ • Living reference docs (programs, team roster, financials). Shared truth.
reference/ • Permanent institutional knowledge. Playbooks, SOPs, templates.

5 Editable Files (Unique Per Agent)

IDENTITY.md • Who this agent is. Name, personality, voice, mandate.
ROLE.md • Domain, responsibilities, access levels, workflows, success metrics.
MEMORY.md • Situation Room. Hot context specific to this agent's domain.
TASKQUEUE.md • Active work queue. What is in progress, blocked, or delivering.
HEARTBEAT.md • Monitoring checklist. What to check, how often, in what order.

Supporting Directories

memory/ • Daily raw logs (YYYY-MM-DD.md). Append throughout the day.
task-notes/ • Per-task playbooks and learnings. Read before starting any task.
playbooks/ • Reusable procedures and patterns.
agents/ • Sub-agent configs and specs.

Why This Split Matters

When we spin up a new agent (say, a Head of Marketing), we do not start from scratch. We give it the same 9 universal files as every other agent. It instantly knows the company values, the exec team's preferences, every decision ever made, and every mistake to avoid. Then we write 5 new files that define its specific identity, domain, tasks, memory, and monitoring. Total setup time: under an hour. Zero retraining of existing agents. Zero knowledge gaps.

This is also what makes sub-agent spawning work. A sub-agent inherits the universal files, gets a minimal task brief, and operates within the same guardrails as the parent. No rogue agents. No conflicting knowledge. One source of truth, distributed across every entity in the system.

The Markdown Files You Need

Setting up an OpenClaw agent for your business starts with creating the right files. These are every markdown file you need, split into two groups: the company DNA that every agent shares, and the personality files that make each agent unique.

Universal Files (Company DNA)

These files define your company for every agent. One edit updates the entire squad.

SOUL.md

Company OS. Vision, values, voice, authority model, safety rules. This is the DNA of your business. Every agent reads it on boot and knows how to behave, what to prioritise, and what lines never to cross.

AGENTS.md

Boot sequence. How every agent starts. Memory architecture rules. Defines the five-phase startup process so every agent loads context in the same order, every time.

USER.md

Executive and owner profiles. How each person thinks, communicates, and makes decisions. Agents use this to tailor their output format, communication style, and information density to each human.

ESCALATION.md

Authority tiers. Who can approve what. Decision routing. Defines the chain of command so agents know when to act, when to flag, and when to stop and wait for a human.

COMMS.md

Communication protocols. Platform formatting. Channel rules. How to format for Slack vs WhatsApp vs Discord. When to use threads. When to stay silent in group chats.

decisions.md

Canonical decision log. Every confirmed decision, immutable. Tier 1 authority. When any source of information contradicts a logged decision, the decision always wins.

regressions.md

Rules from past mistakes. Loaded every boot. Never deleted. Each entry captures the failure, the rule that prevents it, and the severity. The same mistake never happens twice.

state/ directory

Living reference documents. Programs, team roster, financials, partner details. These change as the business evolves. Agents read them for current operational truth.

reference/ directory

Permanent institutional knowledge, SOPs, playbooks. Unlike state/ files, these are stable. Standard operating procedures, templates, and patterns that do not change week to week.

Editable Files (Agent Personality)

These files are unique to each agent. They define what makes this agent different from every other agent reading the same company DNA.

IDENTITY.md

Name, personality, voice, mandate for this specific agent. This is where you define whether the agent is sharp and data-driven, warm and client-facing, or methodical and detail-oriented.

ROLE.md

Domain, responsibilities, access levels, workflows, success metrics. The job description. What this agent owns, what systems it can touch, and how its performance is measured.

MEMORY.md

Situation Room. Hot context for this agent's domain. The dashboard of what is happening right now: active initiatives, blockers, key metrics, and things that need attention.

TASKQUEUE.md

Active work queue. What this agent is working on right now. Tasks are added the moment they are detected, tracked through completion, and never deleted. Blocked tasks are skipped, not stalled on.

HEARTBEAT.md

Monitoring checklist. What to check, how often, in what order. The agent's recurring operational rhythm. Channel scans, system health checks, metric refreshes, and queue processing.

📡

How Centralisation Works

The universal files should live in a centralised store (any structured database works — Supabase, Postgres, Convex, or similar). When you edit SOUL.md once, every agent in your squad reads the updated version on their next boot. No manual syncing. No version conflicts. One source of truth, distributed automatically.

The editable files, by contrast, stay local to each agent.

Three Memory Layers

Not all knowledge is created equal. Structured data, contextual reasoning, and historical records each need different storage, different access patterns, and different retention rules. One monolith file does not scale. Three purpose-built layers do.

📊
Layer 1: The What

Structured Database

Structured data lives here. Contacts, metrics, subscriptions, transactions, sync logs. This is the source of truth for anything that has a schema. Supabase, Postgres, or Convex all work — pick one and commit to it. A single database can handle customer data, agent coordination, credentials, and shared memory banks across the entire squad.

Think of it as the library. Every book (contact, transaction, metric) has a shelf, a category, and a catalogue number. You do not browse the shelves randomly. You look up exactly what you need.

🧠
Layer 2: The Why

Semantic Memory Store

Context and reasoning live here. Why a decision was made. What was tried and failed. Institutional knowledge that does not fit in a database table. Semantic search lets any agent query "what do we know about partner commission structures?" and get an answer in seconds. Tools like Supermemory, Mem0, or a vector store (Pinecone, Weaviate, pgvector) all work.

Think of it as the librarian. The librarian has read every book in the library and remembers the themes, the lessons, the connections between them. Ask a question, get an answer with context. Not just raw data, but the reasoning behind it.

📖
Layer 3: The History

Linked Knowledge Vault

The linked markdown knowledge base. Bi-directionally linked notes covering people, systems, programmes, decisions, and lessons. Daily logs, session notes, and reference material. Obsidian is the most common pick because of the Sync feature and visual graph. Foam, Logseq, or a plain git-tracked markdown folder all work too.

Think of it as each agent's personal journal. Daily entries, session notes, linked thoughts. The journal does not replace the library or the librarian. It captures the day-to-day that the other two do not.

// Data routing rule: every piece of information has exactly one home

Structured data → Database // contacts, metrics, transactions
Context + reasoning → Semantic store // decisions, lessons, institutional knowledge
Daily logs + history → Knowledge vault // session notes, daily journals, reference
Current truth → state/ files // living specifics that change weekly

// Never dump structured data into the semantic store.
// Never treat the knowledge vault as source of truth for current state.

Source of Truth Hierarchy

When information conflicts, the system needs a clear winner. Not "whoever wrote it last" but a defined hierarchy that every agent follows. This eliminates the most dangerous failure mode in multi-agent systems: two agents acting on contradictory information.

Tier 1: Authoritative
Always wins • decisions.md

Live executive instructions. The canonical decision log (decisions.md). Permanent regression rules (regressions.md). These override everything. Period. If a Supermemory entry contradicts a logged decision, the decision wins.

Tier 2: Operational
Current truth • state.md

Role definitions (ROLE.md). Current agent state (state.md). Living reference documents (state/ directory). Escalation rules. These reflect what is true right now and change as the operation evolves.

Tier 3: Contextual
Never overrides Tier 1-2 • MEMORY.md

Supermemory entries. The Situation Room dashboard (MEMORY.md). Task notes. Daily journals. Rich context that informs decisions but never overrides authoritative or operational truth. Within the same tier, the most recently updated source wins.
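
The hierarchy above reduces to a two-step rule: the lowest tier number wins, and within a tier the most recently updated source wins. Here is a minimal Python sketch of that rule; `Claim` and `resolve` are hypothetical names, and the example values are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Claim:
    """One source's version of a fact (illustrative structure)."""
    source: str
    tier: int             # 1 = authoritative, 2 = operational, 3 = contextual
    updated_at: datetime
    value: str


def resolve(claims: list[Claim]) -> Claim:
    """Pick the winning claim: best tier first, then most recent update."""
    if not claims:
        raise ValueError("no claims to resolve")
    best_tier = min(claim.tier for claim in claims)
    candidates = [claim for claim in claims if claim.tier == best_tier]
    return max(candidates, key=lambda claim: claim.updated_at)
```

For example, a two-month-old entry from decisions.md (Tier 1) still beats a MEMORY.md note (Tier 3) written yesterday.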

The Regression Engine

Every mistake becomes a permanent rule. When an agent fails, the failure is logged with a severity rating, the exact rule that would have prevented it, and the date it happened. Every future session loads these rules. The same mistake never happens twice.

🛑

How It Works

An agent makes a mistake (wrong data sent, credential leaked, duplicate message, stale numbers reported). The failure is immediately logged to regressions.md with an ID, date, severity, the rule, and why the rule exists. This file loads on every session boot. It is never deleted.

📈

Why It Works

Traditional AI assistants repeat the same errors across sessions because they have no memory of past failures. A regression engine gives every agent a permanent list of "never do this" rules. Every mistake, captured and turned into a rule that prevents it from recurring, forever.

🏗

Structural Regression Testing

Beyond rules from specific mistakes, run weekly structural health checks on the entire knowledge base. A specific regression says "never do X again." A structural regression says "the system is drifting and needs correction." This catches systemic rot that no single mistake caused.

  • Scan for broken [[backlinks]] (notes that reference files that no longer exist)
  • Find orphaned notes (no incoming or outgoing links)
  • Detect contradictions (same topic, conflicting information across notes)
  • Check folder taxonomy drift (files in wrong directories)
  • Verify all state/ files were updated within the last 7 days
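
The first two checks in the list above can be sketched with a regular expression over the vault. This is an illustrative Python sketch, assuming notes are held in memory as a name-to-text mapping and use Obsidian-style `[[target]]`, `[[target|alias]]`, or `[[target#heading]]` links; `scan_vault` is a hypothetical helper, and only links to existing notes count as connections here.

```python
import re

# Captures the target of [[target]], [[target|alias]], or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def scan_vault(notes: dict[str, str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (broken links as (source, target) pairs, orphaned note names)."""
    broken: list[tuple[str, str]] = []
    connected: set[str] = set()
    for name, text in notes.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target in notes:
                # A valid link connects both ends.
                connected.update({name, target})
            else:
                broken.append((name, target))
    orphans = sorted(set(notes) - connected)
    return broken, orphans
```

In keeping with the rule that discovery and action are separate, the function only reports findings; it never edits a note.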

// Example regression entry

ID: SW-03
Date: 2026-03-13
Severity: RED
Rule: Never include live credentials in agent packages.
      Use env vars or secure vaults.
Why: API keys were exposed in sub-agent task logs.
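
An entry in this shape can be written programmatically. The sketch below is one possible approach, not the author's tooling: `Regression` and `log_regression` are hypothetical names, and the file is opened in append mode so existing entries are never rewritten.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class Regression:
    """One regression entry, mirroring the fields in the example above."""
    id: str
    day: date
    severity: str
    rule: str
    why: str

    def render(self) -> str:
        return (
            f"ID: {self.id}\n"
            f"Date: {self.day.isoformat()}\n"
            f"Severity: {self.severity}\n"
            f"Rule: {self.rule}\n"
            f"Why: {self.why}\n"
        )


def log_regression(path: Path, entry: Regression) -> None:
    """Append an entry; existing content is left untouched."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + entry.render())
```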

Continuous Monitoring

The memory system does not just store knowledge. It actively monitors, verifies, and maintains itself through automated heartbeat cycles, nightly audits, and daily sync operations.

💬

Heartbeat Cycles

Every 30-60 minutes, agents run a full directive reboot: check tasks, scan Slack channels, verify system health, and pick up queued work. During active launches, this increases to every 5 minutes for critical channels.

🔄

Daily Syncs

Customer data syncs at 2am and 2pm. Knowledge vault syncs at 1pm and 10pm. Knowledge is pushed to shared stores. Daily notes are written throughout the day and consolidated at end of day.

🔍

Nightly Audit

At 2am, a comprehensive audit runs: security checks, operational health, strategy alignment, and system integrity. Findings are logged. Critical issues trigger immediate alerts. Non-critical items queue for the morning brief.

📊

Memory Maturity Score

A single health metric for your entire knowledge base. Instead of guessing whether your vault is healthy, compute a weekly score that captures connectivity, freshness, and orphan rot in one number.

// Memory Maturity Score: composite health metric

Connectivity Score = (Notes with backlinks / Total notes) x 100
Freshness Score   = (Notes updated in last 7 days / Total notes) x 100
Orphan Rate       = (Notes with zero connections / Total notes) x 100

Maturity Score    = Connectivity + Freshness - Orphan Rate
// Capped at 100. Track weekly.
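
As a worked example, the formula can be computed directly from four counts. This is a small Python sketch; `maturity_score` and its parameter names are illustrative, and the counts in the usage note are invented.

```python
def maturity_score(total: int, with_backlinks: int,
                   updated_last_7_days: int, orphans: int) -> float:
    """Connectivity + Freshness - Orphan Rate, capped at 100."""
    if total <= 0:
        raise ValueError("total notes must be positive")
    connectivity = 100 * with_backlinks / total
    freshness = 100 * updated_last_7_days / total
    orphan_rate = 100 * orphans / total
    return min(100.0, connectivity + freshness - orphan_rate)
```

For a vault of 200 notes with 120 backlinked, 50 updated in the last 7 days, and 30 orphans, the components are 60, 25, and 15, giving a score of 70.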

Why This Matters

A declining score means your vault is rotting: notes are going stale, connections are breaking, and orphaned files are piling up. A rising score means your knowledge base is getting healthier, better connected, and more current. Display this in your operations dashboard so the team has visibility into the health of the system that runs the business.

🌙

Dreaming (Overnight Discovery)

Every night, a scheduled process reads all notes modified in the last 24 hours. It does NOT rewrite, delete, or fix anything. It only:

  • Adds backlinks between related notes that are not yet connected
  • Logs observations about contradictions or gaps
  • Tags notes that need human review
  • Writes a dream log (DREAM-YYYY-MM-DD-001) with findings

Critical rule: The dreaming process has READ + APPEND permissions only. It cannot modify existing content. Fixes happen in a separate cycle during business hours, with full visibility. This prevents overnight processes from silently rewriting institutional knowledge.

🏷

Session-Tagged Audit Logs

Every automated process gets a unique session ID, making every action traceable. When something goes wrong, you can trace it back to the exact run that caused it.

// Session ID format by process type

HB-YYYY-MM-DD-001      // Heartbeat
DREAM-YYYY-MM-DD-001   // Overnight discovery
REG-YYYY-MM-DD-001     // Structural regression
SYNC-YYYY-MM-DD-001    // Data sync

// Every heartbeat, sync, audit, and dream cycle
// gets tagged. Full traceability, no guessing.
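
Generating IDs in this format takes only a few lines. In the Python sketch below, `Process` and `session_id` are hypothetical names, and the 1 to 999 range for the run counter is an assumption inferred from the three-digit suffix.

```python
from datetime import date
from enum import Enum


class Process(Enum):
    HEARTBEAT = "HB"
    DREAM = "DREAM"
    REGRESSION = "REG"
    SYNC = "SYNC"


def session_id(process: Process, day: date, run: int) -> str:
    """Build an ID such as HB-2026-03-13-001 for the given run of the day."""
    if not 1 <= run <= 999:
        raise ValueError("run number must fit in three digits")
    return f"{process.value}-{day.isoformat()}-{run:03d}"
```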

Core Principles

The rules that govern how every agent interacts with memory. These are non-negotiable.

01

Files beat brain, always

If it is not written down, it did not happen. Agents do not rely on session memory or inference. They read files, act on files, and write results to files. Mental notes do not survive session restarts.

02

Never act on stale data

For customer-facing, financial, or strategic actions, agents must verify data freshness before acting. Revenue numbers from last week are not revenue numbers from today. Verify first, act second.

03

Mistakes become permanent rules

Every failure is logged as a regression. Every regression loads on every boot. The cost of a mistake is paid once. The protection from that mistake lasts forever.

04

Structured data stays structured

Contact records go in the database, not the knowledge vault. Decisions go in decisions.md, not scattered across daily notes. Every piece of information has exactly one canonical home.

05

Discovery and action are separate

Audit processes find problems. Separate processes fix them. An overnight scan that discovers a broken link does not silently rewrite institutional knowledge. It logs the finding. A human or a dedicated fix cycle handles the repair.

06

Security is not optional

No plaintext credentials in files. No PII in Supermemory. Sub-agents cannot access external channels. Protected files have explicit ownership. Default-deny permissions for every spawned agent. Trash before delete, always.

Backup and Recovery

Your memory system is your operating system. Without backups, a single bad update or a drifted agent can corrupt weeks of institutional knowledge. With backups, recovery is one command away.

💾

Schedule

Full OS backups run twice a week: Wednesday and Sunday. Each backup captures everything: all markdown files, workspace structure, cron configs, agent configs, and state files. Nothing is left out.

🔄

Recovery

If an OpenClaw update breaks something, or an agent drifts from its intended behaviour, you can say "regress to the last backup" and restore your entire operating system to a known-good state. No rebuilding from memory. No guessing what changed.

Backup Best Practices

Frequency • Twice a week (Wednesday + Sunday)
Contents • All MD files, workspace structure, cron configs, agent configs, state files
Storage • Separate location: external drive, cloud storage, or a dedicated Git repo
Versioning • Tag each backup with a date. Use Git tags or timestamped folders.
Validation • Spot-check restored backups quarterly to confirm they are complete and functional

Where to Store Backups

Store backups in a separate location from your working system. An external drive, a cloud storage bucket, or a dedicated Git repository with tagged versions all work. The key is isolation. If your primary system goes down, the backup should be completely unaffected. A Git repo with tagged versions gives you the added benefit of seeing exactly what changed between any two backup points.
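
A dated archive is one simple way to implement timestamped backups. The following Python sketch uses the standard-library `tarfile` module; `backup_workspace` and `is_backup_day` are hypothetical helpers, the archive naming scheme is an assumption, and the backup directory is expected to live outside the workspace.

```python
import tarfile
from datetime import date
from pathlib import Path


def is_backup_day(day: date) -> bool:
    """True on Wednesday and Sunday (date.weekday(): Monday=0 ... Sunday=6)."""
    return day.weekday() in {2, 6}


def backup_workspace(workspace: Path, backup_dir: Path, day: date) -> Path:
    """Write a compressed, date-stamped archive of the whole workspace."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"{workspace.name}-{day.isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the workspace folder.
        tar.add(workspace, arcname=workspace.name)
    return archive
```

Copying the resulting archive to a second location, such as a cloud bucket, is what provides the isolation described above; the archive alone does not.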

Why Automation Matters

The difference between an AI assistant and an AI agent is that the agent works when you are not watching. Morning briefs appear before you wake up. Slack channels get scanned every 5 minutes. Data syncs run at 2am. The system monitors itself. If you have to manually trigger every action, you have a chatbot, not an agent.

🤖

A Chatbot

Waits for you to type something. Answers your question. Goes silent until the next message. If you forget to ask, nothing happens. If you sleep, it sleeps. Every action requires a human prompt. It is only as useful as the human who remembers to use it.

An Agent

Wakes up on schedule. Checks every channel it monitors. Scans the task queue. Runs data syncs. Writes the morning brief. Flags anything that needs human attention. Goes back to sleep. Wakes up again in 60 minutes and does it all over. The human wakes up to a briefing, not a blank screen.

💡

The Compound Effect

One automated heartbeat is convenient. Dozens of automated processes running across a whole squad of agents, 24 hours a day, 7 days a week, is transformational. Every cron job you set up is a task that will never be forgotten, never be late, and never need a human to remember it. Over months, the compound effect of reliable automation is the difference between an overwhelmed operator and a scalable operation.

The Boot Sequence

Every agent session starts with a 5-phase boot sequence. Identity first, context second, rules third, state fourth, situational last. This means an agent that wakes up at 3am for a cron job has exactly the same context as one running during a live conversation.

Phase 1: Identity → Phase 2: Context → Phase 3: Rules → Phase 4: State → Phase 5: Situational

Phase 1: Who Am I?

Agent loads its soul (SOUL.md), identity (IDENTITY.md), and role definition (ROLE.md). This establishes personality, voice, domain expertise, and authority boundaries before anything else happens.

Phase 2: Who Do I Serve?

User profiles (USER.md) and escalation rules (ESCALATION.md) load next. The agent now knows the exec team's preferences, communication styles, authority levels, and how to route decisions that exceed its own authority.

Phase 3: What Are the Rules?

Communication protocols (COMMS.md), regression rules (regressions.md), and the decision log (decisions.md). These are the guardrails. Permanent rules from past mistakes. Confirmed strategic decisions. The agent now knows what it must never do.

Phase 4: What Is Happening Now?

Current agent status (state.md) and the active task queue (TASKQUEUE.md). The agent sees what work is in progress, what is blocked, what was completed recently, and what needs attention right now.

Phase 5: What Else Should I Know?

Monitoring checklists (HEARTBEAT.md), semantic search against the memory store for relevant context, queries against domain-specific memory banks, and today's daily notes. The agent is now fully loaded and ready to work.

Fallback Rules

If the centralised database is unreachable, agents fall back to local cached copies. If the credential store is unreachable, agents fall back to environment variables. If the semantic memory layer is unreachable, agents fall back to local notes. The boot sequence never stalls on a missing file. It logs the gap and continues. Resilience over perfection.
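
The boot order and the "log the gap and continue" rule can be sketched together. This is a simplified Python illustration, not OpenClaw's loader: `boot` and the loader callables are hypothetical, the file list covers only the markdown files named in the five phases, and the semantic search and daily notes from Phase 5 are omitted.

```python
from typing import Callable, Optional

Loader = Callable[[str], Optional[str]]  # returns file text, or None if missing

BOOT_PHASES = [
    ("Identity", ["SOUL.md", "IDENTITY.md", "ROLE.md"]),
    ("Context", ["USER.md", "ESCALATION.md"]),
    ("Rules", ["COMMS.md", "regressions.md", "decisions.md"]),
    ("State", ["state.md", "TASKQUEUE.md"]),
    ("Situational", ["HEARTBEAT.md"]),
]


def _try(loader: Loader, name: str) -> Optional[str]:
    """Treat any loader failure (for example an unreachable store) as a miss."""
    try:
        return loader(name)
    except Exception:
        return None


def boot(primary: Loader, fallback: Loader) -> tuple[dict[str, str], list[str]]:
    """Load files in phase order; record gaps instead of stalling."""
    loaded: dict[str, str] = {}
    gaps: list[str] = []
    for _phase, files in BOOT_PHASES:
        for name in files:
            content = _try(primary, name)
            if content is None:
                content = _try(fallback, name)
            if content is None:
                gaps.append(name)  # log the gap and continue
                continue
            loaded[name] = content
    return loaded, gaps
```

Here `primary` would read from the centralised store and `fallback` from the local cache; the returned `gaps` list is what the agent would write to its log.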

Heartbeat Cycles

The heartbeat is what keeps the agent alive between conversations. Every cycle is a full check-in: reload context, scan channels, process the task queue, check system health, and pick up any proactive work. Without heartbeats, agents only work when a human messages them.

💚
Standard

Every 60 Minutes

Normal operating rhythm. Full directive chain reboot, task queue scan, channel monitoring, system health check, and proactive work pickup. This is the default for day-to-day operations when nothing urgent is happening.

💨
Elevated

Every 15-30 Minutes

Used during active initiatives, important deadlines, or when multiple tasks are in flight. More frequent scanning means faster response to new inputs and quicker task turnaround. Activated manually or by schedule.

🔥
Critical

Every 5 Minutes

Launch mode. Active events, live incidents, or time-sensitive operations. Critical channel monitoring at maximum frequency. Only sustained for short periods because of resource cost. Activated when something is live and failure is not acceptable.

What Happens in Every Heartbeat

  • Full directive chain reboot: reload SOUL.md, IDENTITY.md, ROLE.md, and all context files
  • Task queue scan: check TASKQUEUE.md for new, in-progress, or blocked tasks
  • Channel monitoring: scan Slack, Telegram, and other configured channels for new messages
  • System health: verify critical services are running, check for errors or anomalies
  • Proactive work: identify tasks that can be advanced without waiting for human input
  • State update: refresh state.md and daily notes with any new information
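
One way to picture a heartbeat is as an ordered list of named steps that each report a one-line result. The sketch below assumes that a failing step is recorded and the cycle carries on, mirroring the boot fallback rules; the step functions are supplied by the caller and are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]


def run_heartbeat(steps: List[Step]) -> Dict[str, str]:
    """Run each step in order and collect a one-line result per step."""
    report: Dict[str, str] = {}
    for name, step in steps:
        try:
            report[name] = step()
        except Exception as exc:
            # Assumption: one broken step must not block the rest of the cycle.
            report[name] = f"FAILED: {exc}"
    return report
```

The returned report is the kind of material that would feed the state update at the end of the cycle.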

Cron Architecture

Cron jobs are the backbone of agent automation. Each job runs on a schedule, performs a specific task, and logs its results. Crons fall into four categories: monitoring, syncing, reporting, and maintenance. Together, they create a self-operating system.

🔍
Monitoring

Watch and Alert

Slack channel scans (every 5-15 min). System health checks (every 30 min). Service uptime monitoring. Error log scanning. These crons watch for problems and surface them before they escalate. The agent that catches a broken integration at 2am saves the team from discovering it at 9am.

🔄
Sync

Move and Update

Customer data pipelines (for example, 2am and 2pm). Calendar pushes. Knowledge vault syncs (for example, 1pm and 10pm). Cross-system data reconciliation. These crons keep data current across all systems. Without them, agents make decisions on stale information.

📊
Reporting

Compile and Deliver

Morning briefs (7am). Daily operational digests (6pm). Weekly summaries (Friday 5pm). Financial reports (Monday 9am). These crons compile data from multiple sources and deliver formatted reports to the right channels at the right times.

🔧
Maintenance

Clean and Protect

Backups (Wednesday + Sunday). Dreaming cycles (3am nightly). MEMORY.md curation (every 3 days). State file refresh (daily). These crons maintain the health of the knowledge base itself. Without them, the system slowly degrades.

Example Cron Schedule

Schedule (cron)   Category      Task
*/5 * * * *       Monitoring    Slack critical channel scan
*/30 * * * *      Monitoring    System health check
0 2 * * *         Sync          Customer data pipeline (overnight)
0 3 * * *         Maintenance   Dreaming cycle (overnight discovery)
0 7 * * *         Reporting     Morning operational brief
0 13 * * *        Sync          Knowledge vault sync (midday)
0 14 * * *        Sync          Customer data pipeline (afternoon)
0 18 * * *        Reporting     Daily operational digest
0 22 * * *        Sync          Knowledge vault sync (evening)
0 0 * * 3,0       Maintenance   Full OS backup (Wed + Sun)
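
To sanity-check a schedule like this before deploying it, a few lines of Python can report which jobs are due at a given minute. This is a deliberately small matcher, not a full cron implementation: it handles only `*`, `*/n`, single numbers, and comma lists (enough for the table above), and it ignores ranges and cron's special day-of-month/day-of-week interaction.

```python
from datetime import datetime
from typing import List, Tuple

SCHEDULE: List[Tuple[str, str]] = [
    ("*/5 * * * *", "Slack critical channel scan"),
    ("*/30 * * * *", "System health check"),
    ("0 2 * * *", "Customer data pipeline (overnight)"),
    ("0 3 * * *", "Dreaming cycle"),
    ("0 7 * * *", "Morning operational brief"),
    ("0 13 * * *", "Knowledge vault sync (midday)"),
    ("0 14 * * *", "Customer data pipeline (afternoon)"),
    ("0 18 * * *", "Daily operational digest"),
    ("0 22 * * *", "Knowledge vault sync (evening)"),
    ("0 0 * * 3,0", "Full OS backup"),
]


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field of the form '*', '*/n', 'a', or 'a,b,c'."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def is_due(expr: str, now: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (now.weekday() + 1) % 7  # Python counts Mon=0, cron counts Sun=0
    return (
        _field_matches(minute, now.minute)
        and _field_matches(hour, now.hour)
        and _field_matches(dom, now.day)
        and _field_matches(month, now.month)
        and _field_matches(dow, cron_dow)
    )


def due_tasks(now: datetime) -> List[str]:
    """Return the tasks from SCHEDULE whose expression matches this minute."""
    return [task for expr, task in SCHEDULE if is_due(expr, now)]
```

Calling `due_tasks(datetime(2026, 4, 14, 3, 0))` lists the two monitoring scans plus the dreaming cycle, which is a quick way to spot jobs that pile up on the same minute.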

The Dreaming Cycle

Every night at 3am, a scheduled process reads all notes modified in the last 24 hours. It does not rewrite, delete, or fix anything. It discovers. The dreaming cycle is the agent's way of connecting dots that were missed during the workday.

🌙

What It Does

  • Adds backlinks between related notes that are not yet connected
  • Logs observations about contradictions or gaps between notes
  • Tags notes that need human review (stale data, conflicting info)
  • Writes a dream log (DREAM-YYYY-MM-DD-001) with all findings
  • Computes the Memory Maturity Score for the week

🛑

What It Never Does

  • Rewrite existing content in any note
  • Delete notes, links, or entries
  • Modify decisions.md or regressions.md
  • Resolve contradictions on its own
  • Send notifications or external messages

Critical Safety Rule

The dreaming process has READ + APPEND permissions only. It cannot modify existing content. Fixes happen in a separate cycle during business hours, with full visibility. This separation exists for one reason: preventing overnight processes from silently rewriting institutional knowledge. If the dreaming cycle finds a contradiction between two notes, it logs the contradiction. A human or a dedicated fix cycle during business hours resolves it. Discovery and action are always separate.
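
One lightweight way to express READ + APPEND in code is to hand the dreaming process a wrapper that simply has no method for rewriting or deleting. The `AppendOnlyNote` class below is a hypothetical sketch; it enforces the rule by convention inside the process, and real enforcement would still need file-system or storage-level permissions.

```python
from pathlib import Path


class AppendOnlyNote:
    """Exposes read and append, and nothing that rewrites or deletes."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def read(self) -> str:
        return self._path.read_text(encoding="utf-8")

    def append(self, line: str) -> None:
        # Mode "a" writes at the end of the file; existing content is untouched.
        with self._path.open("a", encoding="utf-8") as fh:
            fh.write(line.rstrip("\n") + "\n")
```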

// Dream log entry format

Session: DREAM-2026-04-14-001
Started: 03:00 AEST
Notes scanned: 12
Backlinks added: 3
Contradictions found: 1
Notes flagged for review: 2
Maturity Score: 78/100 (+2 from last week)
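
The entry format above is regular enough to generate from a small record. A sketch, assuming the field names below (they are illustrative) and that the week-over-week delta is derived from the previous score:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DreamLog:
    day: date
    sequence: int
    started: str
    notes_scanned: int
    backlinks_added: int
    contradictions_found: int
    notes_flagged: int
    maturity_score: int
    previous_score: int

    @property
    def session_id(self) -> str:
        # DREAM-YYYY-MM-DD-NNN, with a zero-padded sequence number
        return f"DREAM-{self.day.isoformat()}-{self.sequence:03d}"

    def render(self) -> str:
        delta = self.maturity_score - self.previous_score
        return "\n".join([
            f"Session: {self.session_id}",
            f"Started: {self.started}",
            f"Notes scanned: {self.notes_scanned}",
            f"Backlinks added: {self.backlinks_added}",
            f"Contradictions found: {self.contradictions_found}",
            f"Notes flagged for review: {self.notes_flagged}",
            f"Maturity Score: {self.maturity_score}/100 ({delta:+d} from last week)",
        ])
```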

Load Shedding

When the agent is overloaded (too many tasks, too many channels, too many simultaneous requests), it needs to know what to drop first and what to protect at all costs. Load shedding is the priority order for graceful degradation.

Drop Order (First to Last)

DROP 1st: P4 tasks. Nice-to-haves, exploration, low-impact improvements. These are the first things to go when bandwidth is tight.
DROP 2nd: Proactive checks. Non-critical channel scans, background research, opportunity spotting. Useful but not essential when the agent is overloaded.
DROP 3rd: Knowledge maintenance. Vault syncs, cleanup, dreaming cycles. The knowledge base can tolerate a few days of reduced maintenance.
DROP 4th: P3 tasks. Important but not urgent. These get paused, not cancelled. Resumed when load returns to normal.
NEVER DROP: P1 and P2 tasks. Critical and high-priority work. These are protected at all costs, even if everything else stops.
NEVER DROP: Exec-directed work. Anything explicitly requested by the executive team. Always takes priority.
NEVER DROP: System health monitoring. Critical service checks, error scanning, security monitoring. If the agent stops monitoring itself, it cannot recover.

Recovery Protocol

When load returns to normal, shed items are restored in the reverse of the order they were dropped. P1 and P2 tasks, exec-directed work, and system health monitoring were never dropped, so there is nothing to restore there. P3 tasks resume first. Knowledge maintenance catches up next. Proactive checks restart after that. P4 tasks come back last. The agent logs what was shed and when it was restored so the team has full visibility into any gaps in coverage.
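
The drop order and the recovery order are the same list read in opposite directions. A minimal sketch, assuming work is tagged with one of the category names below (the identifiers are illustrative):

```python
from typing import List, Set

# First to drop ... last to drop.
SHED_ORDER: List[str] = [
    "p4_tasks",
    "proactive_checks",
    "knowledge_maintenance",
    "p3_tasks",
]

# Never part of SHED_ORDER, so no shedding level can remove them.
PROTECTED: Set[str] = {"p1_tasks", "p2_tasks", "exec_directed", "system_health"}


def shed(active: Set[str], levels: int) -> Set[str]:
    """Return the categories still running after dropping `levels` tiers."""
    dropped = set(SHED_ORDER[:levels])
    return {category for category in active if category not in dropped}


def restore_order(levels_shed: int) -> List[str]:
    """Return the shed categories in the order they should come back."""
    return list(reversed(SHED_ORDER[:levels_shed]))
```

Keeping the protected categories out of `SHED_ORDER` entirely, instead of guarding them with a flag, means there is no code path that can drop them by mistake.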

Why Communication Architecture Matters

An agent that can do the work but cannot communicate properly is useless. Wrong channel, wrong format, wrong tone, leaked internal reasoning, duplicate messages. Communication failures erode trust faster than any other kind of failure.

📢

Communication Failures

Sending a detailed markdown table to WhatsApp (it renders as garbage). Posting internal chain-of-thought reasoning in a client-facing channel. Sending the same update to three channels because the routing was not defined. Replying to a group chat when the agent had nothing to add. Each failure chips away at the team's trust in the agent. Five failures and the team stops reading agent messages entirely.

Communication Architecture

Every channel has a defined purpose. Every platform has formatting rules. Every relationship has a default communication pattern. The agent knows Slack gets threads, WhatsApp gets bullets, and Telegram gets the primary operational comms. No guessing. No duplicates. No leaked internals. The team trusts the agent because the agent communicates like a competent team member.

Channel Routing

Which channel for what purpose. Every message has a right channel and a wrong channel. Getting this wrong means important information gets buried, or sensitive information gets exposed.

💬

Telegram

Primary agent communication. Operational updates, task confirmations, morning briefs, heartbeat reports. This is where the agent lives.

💼

Slack

Team visibility. Cross-functional updates, department channels, threaded discussions. Work that the broader team needs to see goes here.

📱

WhatsApp

Urgent human-to-human. The agent does not initiate on WhatsApp unless explicitly configured. Used for time-sensitive escalations that need immediate human response.

📧

Email

External communication. Client updates, partner correspondence, formal documentation. Always requires human approval before sending. RED classification by default.

Routing Rules by Relationship

// Example channel routing per team member

Founder   Telegram (primary)   // WhatsApp available, agent does not initiate
CEO       WhatsApp (primary)   // For urgent items only, via human relay
Finance   Slack (primary)      // Direct messages for finance items
Sales     Slack (primary)      // Sales and marketing channels

// Team-wide
Ops updates    #operations Slack
Urgent alerts  Telegram + Slack
Client comms   Email (RED, needs approval)
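
The routing example above translates naturally into a lookup table. The sketch below is one possible encoding; the `Route` type, the key names, and the choice to raise on an unknown key (so the agent never guesses a channel) are assumptions for illustration.

```python
from typing import Dict, List, NamedTuple


class Route(NamedTuple):
    channels: List[str]
    needs_approval: bool = False
    agent_may_initiate: bool = True


PERSON_ROUTES: Dict[str, Route] = {
    "founder": Route(["telegram"]),
    "ceo": Route(["whatsapp"], agent_may_initiate=False),  # via human relay
    "finance": Route(["slack"]),
    "sales": Route(["slack"]),
}

TOPIC_ROUTES: Dict[str, Route] = {
    "ops_update": Route(["slack:#operations"]),
    "urgent_alert": Route(["telegram", "slack"]),
    "client_comms": Route(["email"], needs_approval=True),  # RED classification
}


def resolve(recipient_or_topic: str) -> Route:
    """Look up the route for a person or topic; fail loudly if none is defined."""
    key = recipient_or_topic.lower()
    if key in TOPIC_ROUTES:
        return TOPIC_ROUTES[key]
    if key in PERSON_ROUTES:
        return PERSON_ROUTES[key]
    raise KeyError(f"no route defined for {key!r}")
```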

Platform Formatting Rules

Each platform has its own rendering rules. Markdown that looks great in Slack renders as raw characters in WhatsApp. A table that is perfect in a document turns into an unreadable mess in a mobile chat. Format for the platform, not for the content.

📱

WhatsApp

  • No markdown tables. They render as raw text. Use bullet lists instead.
  • No headers. Use bold or CAPS for emphasis.
  • Send text FIRST, file SECOND as separate messages. Text can be dropped when a file is attached to the same message.
  • Keep messages short. WhatsApp is a mobile-first platform. Wall-of-text messages get ignored.

💻

Slack

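  • Slack renders its own markdown variant (mrkdwn). Use it. Bold, italics, lists, quotes, and code blocks all work. Headers and tables do not render in plain message text, so use bold lines and bullet lists instead.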
  • Always use threads for multi-message conversations. Do not flood a channel.
  • React with emoji for simple acknowledgments instead of sending "got it" messages.
  • When scanning channels, always check thread replies, not just the main channel. Critical context often lives in threads.

Telegram

  • Markdown-style formatting through the Bot API parse modes. Bold, italic, code blocks, and links all work.
  • Messages can be longer than WhatsApp but brevity is still preferred.
  • Supports inline buttons for interactive responses.
  • Primary channel for agent operations. Optimise for readability.

📧

Email

  • Formal formatting. Full sentences, proper structure, professional tone.
  • Always drafted by the agent, reviewed and sent by a human.
  • Include subject line, greeting, body, and clear call to action.
  • RED classification. Never auto-send without explicit human approval.
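
Two of the WhatsApp rules above are mechanical enough to automate: turning a table into bullets, and splitting text and file into separate messages. A sketch, with hypothetical function names and a plain dict standing in for whatever message type the delivery layer actually uses:

```python
from typing import Dict, List, Optional, Sequence


def table_to_bullets(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into one bullet per row, labelled by the first column."""
    lines = []
    for row in rows:
        details = ", ".join(f"{h} {v}" for h, v in zip(headers[1:], row[1:]))
        lines.append(f"• {row[0]}: {details}")
    return "\n".join(lines)


def plan_whatsapp_send(text: str, file_path: Optional[str] = None) -> List[Dict[str, str]]:
    """Return outbound messages in send order: text first, file second."""
    messages = [{"type": "text", "body": text}]
    if file_path is not None:
        messages.append({"type": "file", "path": file_path})
    return messages
```

For example, `table_to_bullets(["Metric", "This week", "Change"], [["Revenue", "$142K", "+12%"]])` yields a single bullet that reads cleanly on a phone.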

Voice and Tone

The agent's voice is defined in SOUL.md (company-wide defaults) and IDENTITY.md (per-agent overrides). These are not suggestions. They are the communication standards that make the agent sound like a team member, not a chatbot.

01

No fluff

No "Great question!" openers. No "I'd be happy to help with that!" filler. No "As an AI assistant..." disclaimers. Just answer the question. If the answer is one sentence, send one sentence.

02

Brevity by default

One sentence if one sentence works. Bullets over paragraphs. Short over long. The people you serve are busy. Respect their time by being concise. Expand only when the situation genuinely requires more detail.

03

Numbers first, narrative second

If there is data, lead with data. "Revenue: $142K this week, up 12% WoW" before "We had a good week." Leaders make decisions on numbers. Give them numbers first, story second.

04

Recommendation first, then context

Start with what you think. "I recommend we pause the campaign" before the three paragraphs of reasoning. The human can ask for more context if they need it. Do not bury the recommendation at the bottom of an essay.

05

Humour allowed, corporate praise banned

Humour is fine when it adds energy or clarity. Swearing is fine if it adds punch (not as a crutch). What is never fine: empty praise like "Great job team!", performative politeness, and congratulatory filler that wastes everyone's time. Be real. Be direct.

06

Never repeat back what the human said

The human knows what they said. Do not start your response by paraphrasing their request. "You asked me to check the pipeline" wastes a sentence. Just check the pipeline and report the results.

Group Chat Behaviour

Group chats are the trickiest communication environment. Too quiet and the agent seems absent. Too loud and it drowns out the humans. The rules are simple: speak when you add value, stay silent when you do not.

Respond When

  • Directly mentioned by name or tag
  • You can add genuine, substantive value to the conversation
  • Correcting important misinformation that could lead to bad decisions
  • Asked a question that falls within your domain
  • Providing data or context that nobody else in the chat has

Stay Silent When

  • Casual banter between humans. Do not inject yourself into social conversation.
  • Someone already answered the question well. Do not pile on with a restated version.
  • The conversation is flowing fine without you. Your silence is appropriate.
  • The topic is outside your domain. Let the right agent or human handle it.
  • You would only be adding a "+1" or "agreed." That is noise, not signal.
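
The two lists above can be read as a decision function. The sketch below assumes the message has already been classified into the boolean signals shown (how that classification happens is out of scope), and the precedence between rules is an assumption: a direct mention always wins, and correcting harmful misinformation outranks "someone already answered".

```python
from dataclasses import dataclass


@dataclass
class GroupMessage:
    mentions_agent: bool = False
    is_banter: bool = False
    harmful_misinformation: bool = False
    already_answered: bool = False
    in_agent_domain: bool = False
    is_question: bool = False
    agent_has_unique_data: bool = False


def should_respond(msg: GroupMessage) -> bool:
    if msg.mentions_agent:
        return True
    if msg.is_banter:
        return False
    if msg.harmful_misinformation:
        return True
    if msg.already_answered or not msg.in_agent_domain:
        return False
    # In domain and unanswered: speak only with a real answer or new data.
    return msg.is_question or msg.agent_has_unique_data
```

The default result is silence: with no positive signal set, the function returns `False`.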

The Golden Rule

You have access to executive-level information. That does not mean you share it in group settings. Financial details, strategic plans, personnel decisions, and internal metrics stay out of group chats unless an exec explicitly shares them first. The agent follows, never leads, on information disclosure in group environments.

Squad Communication

How agents talk to each other and coordinate work. In a multi-agent system, clear handoffs, explicit acknowledgments, and routed external communication prevent the most common failure modes: duplicate work, missed handoffs, and leaked internals.

🔗

Route External Comms

Sub-agents route ALL external communication through the main agent session. A sub-agent that needs to post to Slack, send a WhatsApp message, or email a client cannot do it directly. It returns the message to the parent agent, which handles delivery. No exceptions.
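
Structurally, this rule means a sub-agent returns outbound messages as data and only the main agent owns a send path. A minimal sketch with hypothetical types; the `sent` list stands in for a real delivery call:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OutboundRequest:
    channel: str
    body: str


@dataclass
class SubAgentResult:
    summary: str
    outbound: List[OutboundRequest] = field(default_factory=list)


class MainAgent:
    """The only component that delivers to external channels."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def deliver(self, result: SubAgentResult) -> None:
        for request in result.outbound:
            # Placeholder for the real send; routing and approval checks go here.
            self.sent.append((request.channel, request.body))


def pipeline_sub_agent() -> SubAgentResult:
    # The sub-agent does its work and hands back what it wants to say.
    message = OutboundRequest("slack:#operations", "Pipeline check complete")
    return SubAgentResult(summary="pipeline checked", outbound=[message])
```

Because the sub-agent has no send method at all, the "no exceptions" part of the rule is enforced by the shape of the code.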

Acknowledge With Specifics

When receiving a directive, acknowledge with specifics, not just a checkmark. "Building the report now, pulling data from the payment processor and the CRM, ETA 15 minutes" is an acknowledgment. A thumbs-up emoji is not. The sender should know exactly what is happening and when to expect results.

Chase Open Loops

If a sub-agent or peer agent goes silent for more than 10 minutes on an active task, follow up. If there is no response after 30 minutes, escalate or take over. Zero delta (no progress) for extended periods means something is stuck. Do not assume silence equals progress.
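
The two thresholds can be checked on every heartbeat. This sketch assumes both timers run from the last progress update on the task (the text could also be read as 30 minutes after the follow-up), and the action names are illustrative.

```python
from datetime import datetime, timedelta

FOLLOW_UP_AFTER = timedelta(minutes=10)
ESCALATE_AFTER = timedelta(minutes=30)


def next_action(last_update: datetime, now: datetime) -> str:
    """Decide what to do about an active task based on how long it has been silent."""
    silent_for = now - last_update
    if silent_for > ESCALATE_AFTER:
        return "escalate_or_take_over"
    if silent_for > FOLLOW_UP_AFTER:
        return "follow_up"
    return "wait"
```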

Internal vs External

Internal chain-of-thought reasoning, debug logs, error traces, and agent-to-agent coordination messages never appear in external channels. The team sees results, recommendations, and status updates. They do not see the agent's internal processing, failed attempts, or retry loops. If an agent posts "ERROR: API timeout, retrying..." in a team Slack channel, that is a communication failure. Log errors internally. Surface results externally.
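
In code, the split usually comes down to two outputs: a log for everything that went wrong along the way, and a single clean string for the channel. A sketch, assuming a hypothetical `run_and_summarise` helper and Python's standard `logging` module as the internal sink:

```python
import logging
from typing import Callable

internal_log = logging.getLogger("agent.internal")


def run_and_summarise(task_name: str, action: Callable[[], str], attempts: int = 3) -> str:
    """Run `action` with retries; return only what is safe to post externally."""
    for attempt in range(1, attempts + 1):
        try:
            return f"{task_name}: {action()}"
        except Exception as exc:
            # Error details and retry noise stay in the internal log.
            internal_log.warning("%s attempt %d failed: %s", task_name, attempt, exc)
    # The team sees a status, not a stack trace.
    return f"{task_name}: blocked, needs attention"
```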

Handoff Protocol

When routing work to another agent or handing off a task, include full context: what was done, what is left, where the relevant files are, any blockers, and what "done" looks like. The receiving agent should never have to ask "what is this about?" A good handoff includes the task description, current status, relevant file paths, constraints, and success criteria. A bad handoff is "here, finish this."
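
A handoff can be validated before it is sent. The sketch below models the listed fields as a dataclass; the field names are illustrative, and treating blockers and constraints as optional (they may legitimately be empty) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    task: str
    done_so_far: str
    remaining: str
    file_paths: List[str]
    blockers: List[str]
    constraints: List[str]
    success_criteria: str

    def missing_fields(self) -> List[str]:
        """Return the required fields that were left empty."""
        required = ["task", "done_so_far", "remaining", "file_paths", "success_criteria"]
        return [name for name in required if not getattr(self, name)]
```

A sender can refuse to dispatch any handoff where `missing_fields()` is non-empty, which rules out the "here, finish this" case.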