Mapping the Domain of a Personal AI Agent

In the wake of OpenClaw and the proliferation of personal agent implementations, I used formal ontology engineering to map the shared architecture: 115 classes across 9 modules.

The Missing Map

OpenClaw, Claude Code, custom builds on LangGraph and CrewAI: personal AI agents are proliferating. Every implementation makes the same design choices (sessions, memory tiers, tool invocations, permission policies) then invents its own vocabulary.

A “session” in one system is a “thread” in another. “Memory” means three different things depending on who you ask. “Tool use” might refer to function calling, MCP connections, or both. The architecture is converging, but the language is not.

I wanted to see what the shared structure looks like. Not the code, but the domain. What concepts does every implementation assume? What relationships hold between those concepts? Where do the real architectural boundaries fall?

To answer these questions, I turned to formal ontology engineering, the same methodology used for biomedical knowledge bases, industrial standards, and the semantic web. The result is the Personal Agent Ontology (PAO): 115 classes, 173 properties, and 75 named individuals organized into 9 modules. The ontology draws from academic research (cognitive architectures, BDI models, dialog act theory) and real implementations (Claude Code’s session model, Letta’s tiered memory, MCP’s capability discovery).

The full ontology is available as browsable HTML documentation and as OWL 2 DL in Turtle, JSON-LD, and OWL/XML.

This post walks through what the ontology covers.

What Is a Personal Agent?

Start with the simplest question: what is a personal AI agent? Most implementations treat this as obvious and move on. But even here, formal modeling forces you to make hidden structure explicit.

[Figure: PAO actor and identity class hierarchy showing Agent, AIAgent, HumanUser, SubAgent, and their relationships to Persona, AgentRole, and Organization]

An Agent is the base class. It splits into AIAgent and HumanUser, not because the distinction is surprising, but because the properties differ. An AI agent has a Persona (system prompt, behavioral configuration), belongs to a ModelDeployment, and can spawn a SubAgent. A human user has none of these.

SubAgent is a subclass of AIAgent, not a separate type. A SubAgent has everything an agent has (its own session, memory, and tool access) plus a delegation relationship to its parent. This matters for systems like Claude Code, where a sub-agent spun up for a background task needs the same architectural scaffolding as the primary agent.
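Expressed as OWL axioms in Turtle, the hierarchy looks roughly like this. The `pao:` namespace and the property names (`pao:hasPersona`, `pao:deployedOn`, `pao:delegatedBy`) are illustrative guesses, not the published IRIs:

```turtle
@prefix pao:  <https://example.org/pao#> .   # illustrative namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

pao:Agent     a owl:Class .
pao:AIAgent   a owl:Class ; rdfs:subClassOf pao:Agent .
pao:HumanUser a owl:Class ; rdfs:subClassOf pao:Agent .
pao:SubAgent  a owl:Class ; rdfs:subClassOf pao:AIAgent .

# Properties that apply only to AI agents
pao:hasPersona  a owl:ObjectProperty ; rdfs:domain pao:AIAgent  ; rdfs:range pao:Persona .
pao:deployedOn  a owl:ObjectProperty ; rdfs:domain pao:AIAgent  ; rdfs:range pao:ModelDeployment .
pao:delegatedBy a owl:ObjectProperty ; rdfs:domain pao:SubAgent ; rdfs:range pao:AIAgent .

# Nothing can be both an AI agent and a human user
pao:AIAgent owl:disjointWith pao:HumanUser .
```

Because SubAgent is a subclass rather than a sibling, every axiom that holds for AIAgent is inherited automatically; only the delegation link is new.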

Most implementations leave the identity layer implicit: AgentRole (the same agent can act as researcher, coder, or reviewer), Organization (agents belong to tenants), and Persona (the behavioral configuration that makes an agent act like “a helpful assistant” or “a senior engineer”). All three exist in every production agent system. Architects rarely diagram them.

PAO aligns every top-level class to BFO 2020 (ISO 21838-2), the same upper ontology used by the OBO Foundry’s biomedical ontologies. An AIAgent is a Generically Dependent Continuant, an information entity that depends on some physical substrate (a server, a process) for its existence. A HumanUser is an Object. The distinction is academic until you reason about deployment migration: the agent persists across servers because it is information, not matter.

The Conversation Stack

Every agent system models conversation. Most stop at “a list of messages.” Modeling conversation formally exposes five distinct layers, each with its own architectural concerns.

[Figure: PAO conversation stack from Conversation down through Session, Turn, Message, ContentBlock, with ToolInvocation and ContextWindow branches]

A Conversation spans multiple Sessions. A session is a bounded period of interaction: open a chat window, that is a session. Close it, come back tomorrow, that is a new session in the same conversation. Each session contains Turns, each turn contains a Message, and each message breaks down into ContentBlocks (text, code, images, tool calls).

This decomposition carries architectural weight. Each layer holds distinct state:

  • Conversation holds the long-running thread identity and the participants.
  • Session tracks status transitions (active, suspended, completed), owns the ContextWindow, and records CompactionEvents: what happened when the context window filled up and the system had to summarize or discard earlier turns.
  • Turn records who spoke and links to any ToolInvocations that occurred.
  • ToolInvocation connects to a ToolDefinition, captures inputs and outputs through a ToolResult, and groups related calls into a ToolInvocationGroup.
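The containment chain can be sketched as domain/range declarations. Again, the property names here are invented for illustration; the published ontology may spell them differently:

```turtle
@prefix pao:  <https://example.org/pao#> .   # illustrative namespace
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Containment chain of the conversation stack
pao:hasSession      rdfs:domain pao:Conversation ; rdfs:range pao:Session .
pao:hasTurn         rdfs:domain pao:Session      ; rdfs:range pao:Turn .
pao:hasMessage      rdfs:domain pao:Turn         ; rdfs:range pao:Message .
pao:hasContentBlock rdfs:domain pao:Message      ; rdfs:range pao:ContentBlock .

# State owned by each layer
pao:hasContextWindow    rdfs:domain pao:Session ; rdfs:range pao:ContextWindow .
pao:recordsCompaction   rdfs:domain pao:Session ; rdfs:range pao:CompactionEvent .
pao:performedInvocation rdfs:domain pao:Turn    ; rdfs:range pao:ToolInvocation .
```

Domain and range axioms make the layering machine-checkable: a ContextWindow attached to a Turn rather than a Session is a modeling error a reasoner can surface.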

Context overflow is the most interesting piece. In PAO, ContextWindow is a first-class entity with token capacity, current usage, and a CompactionDisposition (the policy that determines what happens when the window fills). A CompactionEvent records that tokens were reclaimed, how many, and what was lost. Most agent architectures bury context overflow as an implementation detail. PAO elevates it to an architectural concern. Memory, tool tracing, and conversation continuity all degrade when context compacts.
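A hypothetical compaction might be recorded as instance data like this. The individual names, datatype properties, and the `SummarizeOldestFirst` disposition are all invented for the sketch:

```turtle
@prefix pao: <https://example.org/pao#> .   # illustrative namespace

pao:session-42 a pao:Session ;
    pao:hasContextWindow pao:ctx-42 .

pao:ctx-42 a pao:ContextWindow ;
    pao:tokenCapacity 200000 ;
    pao:currentTokenUsage 192000 ;
    pao:hasCompactionDisposition pao:SummarizeOldestFirst .

pao:compaction-7 a pao:CompactionEvent ;
    pao:compactedWindow pao:ctx-42 ;
    pao:tokensReclaimed 85000 ;
    pao:summarizedTurns pao:turn-3, pao:turn-4 .
```

Modeled this way, "what did compaction discard?" becomes a query over recorded events rather than an unanswerable question about a silent truncation.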

The conversation module also includes a pragmatics layer drawn from DIT++ dialog act theory: DialogAct, CommunicativeFunction, CommonGround, and GroundingAct. These model why an utterance was made (requesting information, confirming understanding, changing topic), not just what was said. This layer matters for agents that need to track mutual understanding. Did the user actually acknowledge that the agent’s plan is correct, or did they just say “ok”?

Channels add another dimension. The same agent might interact through a CLI terminal, a Discord server, a web chat widget, an email thread, or a REST API. PAO models this through CommunicationChannel, which links to both sessions and individual messages. A ChannelType enumeration captures the six media that current implementations support: CLI, Messaging, WebChat, APIChannel, VoiceChannel, and EmailChannel. The enumeration is closed (via owl:oneOf), which means a reasoner can verify that every channel in the system has a recognized type. Governance depends on this distinction: a permission policy might allow tool execution through a CLI session but block it through an email-triggered agent. The channel model gives that policy something concrete to bind to.
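The closed enumeration is one of the few OWL constructs worth quoting directly. A sketch of the pattern, with an illustrative namespace:

```turtle
@prefix pao: <https://example.org/pao#> .   # illustrative namespace
@prefix owl: <http://www.w3.org/2002/07/owl#> .

pao:ChannelType a owl:Class ;
    owl:oneOf ( pao:CLI pao:Messaging pao:WebChat
                pao:APIChannel pao:VoiceChannel pao:EmailChannel ) .

pao:CLI          a pao:ChannelType .
pao:EmailChannel a pao:ChannelType .
```

With `owl:oneOf`, the class extension is exactly these six individuals: anything asserted to be a ChannelType must be inferable as equal to one of them, so a seventh, unrecognized channel type is a logical inconsistency rather than a silent data-quality problem.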

Memory Is Not a Database

Agent memory looks deceptively simple from the outside: store things, retrieve things. Formal modeling exposes an architecture closer to cognitive science than to database engineering.

[Figure: PAO memory tier architecture showing WorkingMemory, EpisodicMemory, SemanticMemory, and ProceduralMemory with memory operations flowing between them]

Letta (formerly MemGPT) was a major influence on PAO’s memory model. Their core insight, drawn from the MemGPT paper, is that agent memory should work like an operating system’s virtual memory: the LLM gets a fixed context window (analogous to RAM), and a tiered storage system pages data in and out. The LLM itself manages the paging. It decides what to store, what to retrieve, and what to evict, using designated tool calls rather than an external controller.

Letta implements this through four tiers: core memory (labeled text blocks always present in context), a message buffer (FIFO queue with recursive summarization), recall memory (full searchable conversation history), and archival memory (vector-indexed long-term knowledge). The architecture makes memory a structural concern rather than an afterthought on top of a chat interface.

PAO generalizes this pattern into four memory tiers, drawing also from the CoALA framework and LangMem’s cognitive science framing:

  • WorkingMemory: the agent’s current context. Volatile, bounded by the context window, directly accessible. This is where the active conversation lives.
  • EpisodicMemory: records of specific interactions. An Episode captures what happened in a particular session: the turns, the tools used, the outcomes. Episodic memory answers “what did we do last Tuesday?”
  • SemanticMemory: general knowledge extracted from episodes. A Claim is a proposition the agent holds with some confidence: “the user prefers dark mode,” “this API requires authentication.” Claims persist across sessions and carry provenance.
  • ProceduralMemory: how to do things. Learned patterns, tool usage sequences, workflow templates. The least-modeled tier in current implementations, but architecturally distinct from declarative knowledge.
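The four tiers partition a common parent. A disjoint-union axiom captures this in one statement; whether PAO uses exactly this axiom or a separate disjointness group, and what the parent class is named, are details of the published file:

```turtle
@prefix pao: <https://example.org/pao#> .   # illustrative namespace
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# MemoryStore is exactly the union of the four tiers, pairwise disjoint
pao:MemoryStore a owl:Class ;
    owl:disjointUnionOf ( pao:WorkingMemory  pao:EpisodicMemory
                          pao:SemanticMemory pao:ProceduralMemory ) .
```

The axiom entails both that every memory store falls into some tier and that no item can sit in two tiers at once, which is precisely what makes the operations between tiers (encoding, consolidation) meaningful.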

The operations between tiers matter more than the tiers themselves.

Encoding moves information from working memory into episodic or semantic storage. Retrieval pulls it back. Consolidation transforms episodic memories into semantic knowledge, extracting a general preference from three separate conversations where the user expressed it. Forgetting removes items, either by policy (retention periods) or by request (privacy-aware deletion). Rehearsal strengthens items the agent accesses frequently.

Each operation is a first-class process in the ontology. This matters because every operation has provenance. When a Claim lives in semantic memory, PAO tracks where it came from through PROV-O derivation chains: which episode, which turn, which message produced the original evidence. If a user asks to delete personal information, the system can follow provenance chains to find every derived memory item: not just the original statement, but every claim consolidated from it.
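A provenance chain for a consolidated claim might look like the following sketch, reusing the standard PROV-O vocabulary; the `pao:` individuals and datatype properties are hypothetical:

```turtle
@prefix pao:  <https://example.org/pao#> .   # illustrative namespace
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

pao:claim-17 a pao:Claim ;
    pao:claimText  "the user prefers dark mode" ;
    pao:confidence "0.9"^^xsd:decimal ;
    prov:wasGeneratedBy pao:consolidation-5 ;
    prov:wasDerivedFrom pao:episode-3, pao:episode-8 .

pao:consolidation-5 a pao:Consolidation ;
    prov:used pao:episode-3, pao:episode-8 .

pao:episode-3 a pao:Episode ;
    prov:wasDerivedFrom pao:message-112 .
```

A deletion request then becomes a graph traversal: the SPARQL property path `?item prov:wasDerivedFrom+ pao:message-112` finds every memory item transitively derived from the original message.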

The memory module also handles multi-agent scenarios. A SharedMemoryArtifact is a memory item visible to multiple agents, and a MemoryWriteConflict records what happens when two agents try to update the same item. Letta encountered this directly with their shared memory blocks, where memory_insert is safe for concurrent writes but memory_replace can cause conflicts.

Memory remains the hardest unsolved problem in agent architecture. Every framework tackles it differently: Letta with self-directed virtual context, Mem0 with automatic fact extraction into hybrid vector and graph stores, Zep with a temporal knowledge graph engine. No two implementations agree on what “remembering” means, let alone how to implement it. The write-back problem is especially acute: allowing an agent to write “facts” into long-term memory without rigorous validation creates self-reinforcing errors. The ontology maps the full design space (tiers, operations, provenance, governance) rather than solving this, so that builders can see the choices they are making and the ones they are ignoring.

The Full Map

The three modules above (identity, conversation, and memory) form the core that every personal agent needs. PAO maps six additional modules covering goals, governance, integration, error recovery, model identity, and scheduling.

[Figure: PAO module dependency map showing all 9 modules and their relationships]

Goals, Plans & Tasks borrows from the Belief-Desire-Intention model in philosophy of mind. An agent holds Beliefs (what it thinks is true), Desires (what it wants), and Intentions (what it commits to doing). Goals decompose into Plans, plans contain Tasks with dependencies and status. Deliberation is modeled explicitly: the act of choosing which goals to pursue and which plans to adopt. BDI gives agents a vocabulary for why they act, not just what they do.
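The goal-to-task decomposition reduces to a few classes and object properties. A sketch with illustrative names:

```turtle
@prefix pao:  <https://example.org/pao#> .   # illustrative namespace
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

pao:Belief    a owl:Class .   # what the agent thinks is true
pao:Desire    a owl:Class .   # what it wants
pao:Intention a owl:Class .   # what it commits to doing

pao:decomposesInto a owl:ObjectProperty ; rdfs:domain pao:Goal ; rdfs:range pao:Plan .
pao:containsTask   a owl:ObjectProperty ; rdfs:domain pao:Plan ; rdfs:range pao:Task .
pao:dependsOn      a owl:ObjectProperty ; rdfs:domain pao:Task ; rdfs:range pao:Task .

pao:Deliberation a owl:Class .  # the explicit act of adopting goals and plans
```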

Governance & Safety treats permissions and policies as structural components. A PermissionPolicy governs what tools an agent can use. A SafetyConstraint defines non-negotiable behavioral limits. ConsentRecord and RetentionPolicy handle privacy. AuditLog and AuditEntry record every authorization decision. The ontology aligns to ODRL (the W3C permissions vocabulary) for policy representation. Governance is architecture; PAO treats it as such.
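The ODRL alignment makes the channel-scoped policy from the conversation section expressible directly. A hypothetical policy, using real ODRL terms but an invented custom left operand (`pao:channelType` is not in the ODRL core vocabulary):

```turtle
@prefix pao:  <https://example.org/pao#> .   # illustrative namespace
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .

# agent-1 may execute the shell tool, but only through a CLI channel
pao:policy-1 a odrl:Set ;
    odrl:permission [
        odrl:assignee pao:agent-1 ;
        odrl:action   odrl:execute ;
        odrl:target   pao:tool-shell ;
        odrl:constraint [
            odrl:leftOperand  pao:channelType ;   # custom operand, assumed
            odrl:operator     odrl:eq ;
            odrl:rightOperand pao:CLI
        ]
    ] .
```

Because the policy is data rather than code, the same AuditEntry that records an authorization decision can point back to the exact permission that granted it.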

External Services & Integration covers how agents connect to the outside world. An ExternalService exposes capabilities (tool capabilities, resource capabilities, prompt capabilities) discovered through CapabilityDiscoveryEvent. This module aligns closely with the Model Context Protocol, modeling the same connection lifecycle: discovery, authentication, invocation, disconnection.

Things go wrong. Error Recovery & Observability classifies failures through ErrorRecoveryEvent (timeout, authentication, rate limiting) and tracks recovery strategies: RetryAttempt, ReplanEvent, RollbackEvent. Checkpoint and CheckpointDecision capture the agent’s state before risky operations. OperationalMetric and MetricObservation provide the observability hooks.

Model Identity tracks which foundation model, from which provider, through which deployment, produced each conversation turn. GenerationConfiguration captures temperature, top-p, and max tokens: the parameters that affect output quality.

Scheduling & Automation handles recurring and triggered agent tasks. A Schedule binds a RecurrencePattern to an action. Triggers fire on cron expressions, intervals, or external events. ScheduledExecution tracks each run’s outcome. ConcurrencyPolicy determines what happens when a new execution overlaps with one still running.
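A scheduled job might be recorded like this sketch; the cron expression, individual names, and the `SkipIfRunning` policy value are all hypothetical:

```turtle
@prefix pao: <https://example.org/pao#> .   # illustrative namespace

pao:schedule-nightly a pao:Schedule ;
    pao:hasRecurrencePattern [ a pao:RecurrencePattern ;
                               pao:cronExpression "0 3 * * *" ] ;
    pao:triggersAction pao:task-summarize-inbox ;
    pao:hasConcurrencyPolicy pao:SkipIfRunning .

pao:run-118 a pao:ScheduledExecution ;
    pao:executesSchedule pao:schedule-nightly ;
    pao:executionOutcome pao:Succeeded .
```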

Why Formalize?

The ontology is descriptive, not prescriptive. It maps what the domain contains, not how to build an agent.

That matters because the domain is stabilizing. Across OpenClaw, Claude Code, Letta, and a dozen other implementations, the same concepts keep appearing: tiered memory, tool invocation lifecycles, context compaction, permission policies, provenance chains. The vocabulary differs; the architecture converges.

PAO makes the convergence visible. It gives agent builders a shared map, not to constrain design but to clarify it. When you model your agent’s memory system, you can check whether you have accounted for consolidation and forgetting, not just storage and retrieval. When you design governance, you can verify that every tool invocation passes through an authorization decision. When you handle context overflow, you can stop treating it as a silent failure and model it as an architectural event.

Formal ontology also earns its keep through automated reasoning. OWL 2 DL supports disjointness axioms: PAO declares 12 AllDisjointClasses groups and 4 DisjointUnion axioms, which means a reasoner can prove that no individual can be both a WorkingMemory and an EpisodicMemory, or both an AIAgent and a HumanUser. If a new class violates these constraints, the reasoner catches it before any code is written.

The current ontology has zero unsatisfiable classes, meaning every class can in principle have instances without contradiction. On top of the OWL layer, 60 SHACL shapes validate structural constraints (required properties, cardinality, value ranges) that OWL alone cannot express. And 128 competency questions, formalized as SPARQL queries, serve as an automated test suite: 1,332 individual test assertions, all passing. This is the difference between a glossary and an ontology. A glossary defines terms. An ontology lets a machine verify that those definitions are logically consistent, structurally complete, and free of hidden contradictions.
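A SHACL shape closes the gap that OWL's open-world semantics leaves: OWL cannot require that a property be present, while SHACL validation can. A hypothetical shape in the spirit of the 60 that ship with PAO (the shape name and constrained property are assumptions):

```turtle
@prefix pao: <https://example.org/pao#> .   # illustrative namespace
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Every ContextWindow must declare exactly one positive token capacity
pao:ContextWindowShape a sh:NodeShape ;
    sh:targetClass pao:ContextWindow ;
    sh:property [
        sh:path         pao:tokenCapacity ;
        sh:datatype     xsd:integer ;
        sh:minInclusive 1 ;
        sh:minCount     1 ;
        sh:maxCount     1
    ] .
```

The OWL layer proves the class definitions consistent; the SHACL layer proves the instance data complete. The competency questions, run as SPARQL over both, are what turn the two layers into a test suite.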

The ontology contains 115 classes, 138 object properties, 35 data properties, 60 SHACL validation shapes, and 128 competency questions formalized as SPARQL queries. Every class aligns to BFO 2020. Every class has a formal definition. Every competency question has a passing test.

Browse the full documentation at /pao/, or explore the source on GitHub.


Built with ROBOT, oaklib, and eight custom ontology engineering skills in Claude Code.