The Anatomy of a Personal AI That Actually Knows You

A couple of weeks ago I wrote about the AI assistant I built that runs my life and my wife's. That post was the what. This one is the how.

Prepping for a demo at an AI meet-up in Singapore made me realise I'd skipped the most useful part of the story: the shape of the thing. Why it actually works instead of just demoing well. Why I trust it with my finances — and no, before you ask, it doesn't have a login to a single bank or brokerage. Why my wife uses hers without ever opening the web app.

If you skipped the first post: Max is my assistant and Ella is my wife's. Both live in our chats, both run on a small server, and both can do real things in the world only after we approve.

Memory is what makes it yours

A few weeks ago I uploaded a credit card statement screenshot and asked Max to reconcile it. He extracted the transactions, compared them to what was already in the system, and imported the new ones. Beautiful flow. Except he also imported a $0.10 pending bus fare (a hold, not a real charge) that the bank cleared an hour later. Once the settled charge came through, I had a duplicate.

The fix wasn't code. It was a single sentence — if a row is labelled "Pending", it's a hold; don't import it; the bank will settle it tomorrow — written into Max's memory store, pinned high-confidence so it would never decay. The next reconciliation went clean. The one after that, too. The fix to a bug became a permanent piece of what Max knows about how I bank.

This is the part that makes a personal AI feel like yours, and the part that took longest to get right.

The first version of the agent had no memory. Every conversation started fresh. Max could read tomorrow's calendar but couldn't tell which entries were standing rituals I didn't need to prep for, which needed transport sorted the night before, and which my wife, Carmen, was joining.

So I added a memory layer. It's a facts store with confidence scores and a subject scope — mine, Carmen's, or joint. A small pinned kernel never decays. Low-confidence facts fade unless something reinforces them. Important PDFs get parsed in with citations back to the page they came from, so when Max tells me something specific he can show me where he learned it.
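To make the decay idea concrete, here is a minimal sketch of a fact record with a scope, a confidence score and a pinned flag. Everything in it is an assumption for illustration: the field names, the half-life rule, the 30-day default and the reinforcement boost are not taken from the real store, and the second example fact is made up.

```typescript
// Hypothetical fact shape; the real schema is not shown in this post.
type Scope = "mine" | "carmen" | "joint";

interface Fact {
  text: string;
  scope: Scope;
  confidence: number;     // 0..1
  pinned: boolean;        // pinned facts never decay
  lastReinforced: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed rule: confidence halves every `halfLifeDays` without reinforcement.
function effectiveConfidence(fact: Fact, nowMs: number, halfLifeDays = 30): number {
  if (fact.pinned) return fact.confidence;
  const ageDays = Math.max(0, nowMs - fact.lastReinforced) / DAY_MS;
  return fact.confidence * Math.pow(0.5, ageDays / halfLifeDays);
}

// Reinforcement resets the clock and nudges confidence upward, capped at 1.
function reinforce(fact: Fact, nowMs: number, boost = 0.1): Fact {
  return {
    ...fact,
    confidence: Math.min(1, fact.confidence + boost),
    lastReinforced: nowMs,
  };
}

const today = Date.UTC(2026, 4, 31);

const pendingRule: Fact = {
  text: 'Rows labelled "Pending" are holds; do not import them.',
  scope: "mine",
  confidence: 1,
  pinned: true,
  lastReinforced: today - 365 * DAY_MS,
};

// Made-up low-confidence fact, last reinforced two half-lives ago.
const hunch: Fact = {
  text: "Friday dinner is a standing ritual.",
  scope: "joint",
  confidence: 0.4,
  pinned: false,
  lastReinforced: today - 60 * DAY_MS,
};

console.log(effectiveConfidence(pendingRule, today)); // pinned, so unchanged
console.log(effectiveConfidence(hunch, today));       // decayed by two half-lives
```

Computing the decayed value at read time, rather than rewriting stored scores on a schedule, is one way to keep the stored record as an honest log of what was observed and when.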

The thing that surprised me most: the store stays small. Mine has 162 facts in it today. Not thousands. The point isn't volume; it's signal. Each fact carries a category, a confidence score, a source citation, and an audit trail of the observations that built it. It stays that small because a quiet background agent called Curator runs alongside it — surfacing conflicting facts, merging duplicates, retiring anything that hasn't been re-grounded inside its decay window, and routing the open gaps to either "ask the user", "re-read the source document", or "accept unknown". Memory grows by use, not by ingestion.
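Two of the Curator's jobs, merging duplicates and retiring facts that have not been re-grounded inside the decay window, can be sketched in a few lines. This is a rough illustration under stated assumptions, not the actual Curator: the record shape, the normalised-text duplicate key and the 90-day window are all hypothetical, the last example fact is made up, and conflict surfacing and gap routing are left out.

```typescript
// Hypothetical record; field names are assumptions, not the real schema.
interface StoredFact {
  text: string;
  category: string;
  confidence: number;     // 0..1
  pinned: boolean;
  lastGrounded: number;   // epoch ms of the latest supporting observation
  observations: string[]; // audit trail
}

interface CurationResult {
  kept: StoredFact[];
  retired: StoredFact[];
}

function curate(facts: StoredFact[], nowMs: number, decayWindowMs: number): CurationResult {
  // Pass 1: merge duplicates, keyed by category plus normalised text (assumed rule).
  const merged: Record<string, StoredFact> = {};
  for (const fact of facts) {
    const key = fact.category + "|" + fact.text.trim().toLowerCase();
    const existing = merged[key];
    if (existing === undefined) {
      merged[key] = { ...fact, observations: fact.observations.slice() };
    } else {
      existing.confidence = Math.max(existing.confidence, fact.confidence);
      existing.pinned = existing.pinned || fact.pinned;
      existing.lastGrounded = Math.max(existing.lastGrounded, fact.lastGrounded);
      existing.observations = existing.observations.concat(fact.observations);
    }
  }

  // Pass 2: retire unpinned facts that were not re-grounded inside the window.
  const result: CurationResult = { kept: [], retired: [] };
  for (const key of Object.keys(merged)) {
    const fact = merged[key];
    if (fact === undefined) continue;
    const stale = !fact.pinned && nowMs - fact.lastGrounded > decayWindowMs;
    (stale ? result.retired : result.kept).push(fact);
  }
  return result;
}

const DAY = 24 * 60 * 60 * 1000;
const curatedAt = Date.UTC(2026, 4, 31);

const outcome = curate(
  [
    { text: 'Rows labelled "Pending" are holds; do not import them.', category: "corrections",
      confidence: 1, pinned: true, lastGrounded: curatedAt - 400 * DAY, observations: ["reconciliation"] },
    { text: "Annual obligations need a reminder a month ahead.", category: "corrections",
      confidence: 0.7, pinned: false, lastGrounded: curatedAt - 10 * DAY, observations: ["first sighting"] },
    { text: "annual obligations need a reminder a month ahead. ", category: "corrections",
      confidence: 0.8, pinned: false, lastGrounded: curatedAt - 2 * DAY, observations: ["second sighting"] },
    { text: "Made-up stale fact for illustration.", category: "preferences",
      confidence: 0.3, pinned: false, lastGrounded: curatedAt - 200 * DAY, observations: [] },
  ],
  curatedAt,
  90 * DAY
);

console.log(outcome.kept.length, outcome.retired.length);
```

The merged fact keeps the observations from both copies, so the audit trail survives the merge.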

The pending-bus-fare rule above is one entry in the corrections category — the operational lessons that come from real use. Another that's saved me money: annual obligations that bypass the monthly standing orders need their own reminders a month ahead; the once-a-year ones are how money slips through. Each one came from a moment where Max got something wrong, I said "no, actually—", and the lesson got written down — usually by Max himself, into the facts store, scoped and categorised so it surfaces next time the same situation comes up.

This is half of what building with AI in production really is: the running notebook of "things you should have known and didn't", carried from one conversation to the next.

The spine: a CLI, not an LLM

Memory only works because the data underneath it is honest. That's what the CLI gives me.

The agent doesn't talk to my data. It runs commands.

Every module is exposed as a typed command-line tool (a CLI) called pa. When I ask Max "how much did we spend on travel this month?", what actually happens is he plans a command, runs pa wealth transactions export --from 2026-05-01 --category travel, parses the result, and answers me.

It feels like a small thing. It's not. A typed CLI gives Max two things he can't fake:

  • Validated inputs and honest outputs. He can't pass a name where the system expects an ID, can't typo a column name into a silent failure, and gets structured data back instead of prose he has to invent from.
  • A natural unit of permission. Reads just run. Writes to my own data run too (everything's soft-deleted, so I can roll back). Writes to the outside world — emails, calendar invites, money moves — queue for my approval.
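The two properties in the list above can be sketched together: a parser that rejects bad arguments with a structured error instead of failing silently, and a lookup table that maps each kind of command to a permission decision. This is only an illustration. The --from and --category flags come from the example command earlier in the post, but the category list, the type names and the tier labels are assumptions, and pa itself may be built quite differently.

```typescript
// Assumed permission tiers, mirroring the three cases in the list above.
type Effect = "read" | "write-own-data" | "write-outside-world";
type Decision = "run" | "run-with-soft-delete" | "queue-for-approval";

const POLICY: Record<Effect, Decision> = {
  "read": "run",
  "write-own-data": "run-with-soft-delete",
  "write-outside-world": "queue-for-approval",
};

interface ExportArgs {
  from: string;
  category: string;
}

type ParseResult =
  | { kind: "ok"; args: ExportArgs }
  | { kind: "error"; message: string };

// Hypothetical category list; the real one is not shown in this post.
const KNOWN_CATEGORIES = ["travel", "groceries", "transport"];

function parseExportArgs(argv: string[]): ParseResult {
  // Collect "--flag value" pairs into a lookup.
  const flags: Record<string, string | undefined> = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || flag.indexOf("--") !== 0 || value === undefined) {
      return { kind: "error", message: "expected --flag value pairs" };
    }
    flags[flag.slice(2)] = value;
  }

  const from = flags["from"];
  const category = flags["category"];

  // Shape check plus a parse check; this catches malformed input, not every impossible date.
  if (from === undefined || !/^\d{4}-\d{2}-\d{2}$/.test(from) || isNaN(Date.parse(from))) {
    return { kind: "error", message: "--from must be a YYYY-MM-DD date" };
  }
  // A typo fails loudly here instead of silently matching nothing.
  if (category === undefined || KNOWN_CATEGORIES.indexOf(category) === -1) {
    return { kind: "error", message: "--category must be one of: " + KNOWN_CATEGORIES.join(", ") };
  }
  return { kind: "ok", args: { from, category } };
}

console.log(parseExportArgs(["--from", "2026-05-01", "--category", "travel"]));
console.log(parseExportArgs(["--from", "2026-05-01", "--category", "travle"]));
console.log(POLICY["write-outside-world"]);
```

Because the result is a tagged value rather than free text, the model has to branch on kind before it can read any arguments, which is what keeps a typo from turning into a confident wrong answer.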

A nice side effect: I share the surface. I open a terminal and run the exact same commands when I need to. Max and I have the same hands.

Modules are single files

The other choice that paid off: every module is one file.

There are 18 of them today: calendar, email, contacts, notes, todos, finance (the largest), travel, health, places, activities, documents, saved links, blog, chat search, and a few internal ones. Each one lives at a path like lib/plugins/modules/<name>.ts and declares four things in the same file: the tool schemas the model can call, a few sentences telling the model when to use them, a reference to the handler that runs them, and the plumbing (a route, a nav icon, an action type) if the module has a UI.

Modules are cheap because they're isolated. Schemas, handlers, prompts, nav entry — all colocated. The blast radius of a new module is the file it lives in.
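As a rough picture of what "one file, four declarations" could look like, here is a hypothetical module definition. None of these type or tool names come from the real codebase. ModuleDefinition, todos_add, the route and the icon are placeholders, and the in-memory array stands in for real storage.

```typescript
// Illustrative types only; the real module interface is not shown in this post.
interface ToolSchema {
  name: string;
  description: string;
  parameters: Record<string, "string" | "number" | "boolean">;
}

interface ModuleDefinition {
  name: string;
  tools: ToolSchema[]; // 1. tool schemas the model can call
  guidance: string;    // 2. a few sentences on when to use them
  handler: (tool: string, args: Record<string, unknown>) => unknown; // 3. handler reference
  ui?: { route: string; navIcon: string; actionType: string };       // 4. optional UI plumbing
}

// Hypothetical in-memory handler standing in for real storage.
const todoItems: string[] = [];

function handleTodos(tool: string, args: Record<string, unknown>): unknown {
  const title = args["title"];
  if (tool === "todos_add" && typeof title === "string") {
    todoItems.push(title);
    return { added: title, count: todoItems.length };
  }
  if (tool === "todos_list") {
    return { items: todoItems.slice() };
  }
  throw new Error("unknown tool or bad arguments: " + tool);
}

const todosModule: ModuleDefinition = {
  name: "todos",
  tools: [
    { name: "todos_add", description: "Add a todo item.", parameters: { title: "string" } },
    { name: "todos_list", description: "List todo items.", parameters: {} },
  ],
  guidance: "Use these tools when the user wants to capture or review tasks.",
  handler: handleTodos,
  ui: { route: "/todos", navIcon: "check", actionType: "todo" },
};

// Adding a module means adding one entry here; nothing else needs to change.
const moduleRegistry: ModuleDefinition[] = [todosModule];

function findModuleForTool(tool: string): ModuleDefinition | undefined {
  for (const mod of moduleRegistry) {
    if (mod.tools.some((t) => t.name === tool)) return mod;
  }
  return undefined;
}

const owner = findModuleForTool("todos_add");
if (owner) {
  console.log(owner.handler("todos_add", { title: "Book flights" }));
}
```

With schemas, guidance, handler and UI plumbing in one object, deleting an unused experiment is a one-file change plus one registry line.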

That changes the question I ask before building one. Not "is this worth the effort?" but "do I want Max to be able to do this?" Most experiments turn into things I actually use; some sit unused until I delete them. Either way, the cost of trying is low enough that I keep trying — which is the whole point of treating the agent as a platform instead of a product.

This is also the cleanest way to say how this project differs from the agent frameworks people are adopting right now. OpenClaw is the popular one at the moment — install it, give it your accounts, layer skills on top. The agent is the product; what changes is what you plug in. Mine isn't a product. Every module is a one-off shaped by a specific thing I wanted to be able to ask Max. The data model is mine, not generic. Which is also why I can't really hand it to you whole — the interesting parts are the parts you'd build yourself.

Safety is the substrate, not a feature

I'll tell two stories on myself.

The first you've heard: the day I gave Max email access, he tried to reply to spam. I added an approval queue the same day. Every write to the outside world — emails, calendar invites, money moves, chat messages — lands in a pending state. I tap approve. Nothing leaves the house without me.
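The approval queue boils down to a small state machine: the agent can only enqueue, and delivery happens on the human's decision. The sketch below is an assumption about its shape, not the real implementation. The class name, the state names and the deliver callback are all hypothetical, and a production version would persist the queue rather than hold it in memory.

```typescript
type ActionKind = "email" | "calendar-invite" | "money-move" | "chat-message";
type ActionState = "pending" | "approved" | "rejected";

interface OutboundAction {
  id: number;
  kind: ActionKind;
  summary: string;
  state: ActionState;
}

class ApprovalQueue {
  private actions: OutboundAction[] = [];
  private nextId = 1;
  private deliver: (action: OutboundAction) => void;

  constructor(deliver: (action: OutboundAction) => void) {
    this.deliver = deliver;
  }

  // The agent side: record the intent, deliver nothing.
  enqueue(kind: ActionKind, summary: string): OutboundAction {
    const action: OutboundAction = { id: this.nextId++, kind, summary, state: "pending" };
    this.actions.push(action);
    return action;
  }

  pending(): OutboundAction[] {
    return this.actions.filter((a) => a.state === "pending");
  }

  // The human side. Returns false if the action is unknown or already decided,
  // so an approval can never fire twice.
  decide(id: number, approve: boolean): boolean {
    const match = this.pending().filter((a) => a.id === id)[0];
    if (match === undefined) return false;
    match.state = approve ? "approved" : "rejected";
    if (approve) this.deliver(match);
    return true;
  }
}

// Usage: delivery is simulated by appending to an array.
const delivered: string[] = [];
const outbox = new ApprovalQueue((action) => {
  delivered.push(action.summary);
});

const spamReply = outbox.enqueue("email", "Reply to a spam message");
const invite = outbox.enqueue("calendar-invite", "Dinner on Friday");

outbox.decide(spamReply.id, false); // rejected: never delivered
outbox.decide(invite.id, true);     // approved: delivered once

console.log(delivered, outbox.pending().length);
```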

The second is more recent. The agent runs inside an ephemeral Docker container that's spun up per message, runs Claude with full tool access, and is killed when the reply is sent. For months, those containers held a real database token. One leaked log line or one compromised image, and the whole game would have been over.

I changed it. The containers now hold only an opaque token: a meaningless string that authenticates against my own API. The API resolves it into the actual database credential for that user, in memory, and that credential never leaves the server. On top of that, an allowlist means the opaque token only works from my server's address. Even a full container breach hands an attacker something that's useless from anywhere else.
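The server-side half of that pattern can be sketched as a single lookup guarded by an address check. Treat this as an illustration with made-up values: the function name, the record shape, the token string and the documentation-range IP addresses are all assumptions, and a real deployment would also need token rotation and a trustworthy way to learn the caller's address.

```typescript
interface Credential {
  userId: string;
  databaseCredential: string; // lives only in server memory in this sketch
}

interface ResolverConfig {
  tokens: Record<string, Credential>; // opaque token -> real credential
  allowedAddresses: string[];         // callers permitted to resolve tokens
}

// Returns the credential only when both the caller address and the token check out.
function resolveOpaqueToken(
  config: ResolverConfig,
  opaqueToken: string,
  callerAddress: string
): Credential | null {
  // Check the address first, so a stolen token is useless from anywhere else.
  if (config.allowedAddresses.indexOf(callerAddress) === -1) return null;
  // Own-property check avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(config.tokens, opaqueToken)) return null;
  const credential = config.tokens[opaqueToken];
  return credential === undefined ? null : credential;
}

// Made-up values for illustration; the addresses are from reserved documentation ranges.
const resolverConfig: ResolverConfig = {
  tokens: {
    "opaque-3f9c1a": { userId: "user-1", databaseCredential: "real-db-secret" },
  },
  allowedAddresses: ["203.0.113.10"],
};

const fromServer = resolveOpaqueToken(resolverConfig, "opaque-3f9c1a", "203.0.113.10");
const fromElsewhere = resolveOpaqueToken(resolverConfig, "opaque-3f9c1a", "198.51.100.7");

console.log(fromServer !== null, fromElsewhere !== null);
```

The container only ever sees the opaque string, so a leaked log line or a compromised image exposes nothing that works outside the allowlisted address.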

Boring engineering. The kind that doesn't demo well. The kind that lets me sleep.

What's next

A few directions I'm chewing on:

  • Tighter feedback on the morning brief. It's good. I want it to know when something landed weird and adjust without me asking.
  • More modules. Recipes. A reading list with proper highlights. Maybe a media tracker.
  • Family-of-agents patterns. Max and Ella share a database — when I move money for the family, Carmen sees the action queue too. Underneath each of them there's already a layer of specialised personas — wealth coach, travel planner, memory curator — that hand off to each other. That's the next post.
  • Opening more of it. The pieces that aren't personal — the modular plugin pattern, the CLI-as-spine architecture, the opaque-token pattern, the layered memory model — are reusable. I'm slowly carving them out.

The point

None of these choices are clever on their own. They just compound. The memory layer made Max feel like mine and not a clever stranger who'd read my files. The CLI made the model reliable. The plugin pattern made new domains cheap. The approval queue made me trust the writes. The opaque token made me trust the whole thing on a server.

If I were starting again today, I'd build the CLI, the approval queue, and the memory layer before I picked the model.


If you're building something similar — or thinking about it — I'd love to compare notes.

Thanks for reading. Find me on LinkedIn or Threads.