Discussion about this post

Phil Windley

This is a great take. You call on promise theory as an answer, but I don't think it's enough. Promises are declarations of intent. In a system based just on promises, you'd have to trust the other agent to keep promises (and then give them a bad reputation if they don't). This is a good start, but it's not sufficient. I think you need three things:

1. policies that set hard boundaries for the agent

2. promises derived from policies

3. reputation for agents

This would allow you to delegate a capability (say, a spending limit) using something like a verifiable credential, and then have some confidence that it would be followed: not as a prompt, but as a deterministic matter of policy. Things could still go wrong (fraud, for instance), but at least it wouldn't be from carelessness.
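A minimal sketch of that idea, assuming a hypothetical `SpendingCap` capability (the names and structure are illustrative, not from any specific credential library): the limit is enforced by deterministic code, so the agent cannot exceed it no matter what its prompt says.

```python
from dataclasses import dataclass

@dataclass
class SpendingCap:
    """A delegated capability, e.g. carried in a verifiable credential."""
    limit: float      # hard boundary set by policy
    spent: float = 0.0

    def authorize(self, amount: float) -> bool:
        """Deterministic policy check: approve only within the remaining limit."""
        if amount <= 0 or self.spent + amount > self.limit:
            return False  # hard stop, regardless of the agent's stated intent
        self.spent += amount
        return True

cap = SpendingCap(limit=100.0)
print(cap.authorize(60.0))   # True: within the limit
print(cap.authorize(60.0))   # False: would exceed the 100.0 cap
```

The point of the sketch is that the promise ("I'll stay under budget") is derived from a policy the agent cannot override, which is what makes reputation meaningful when it is broken.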

This creates a community of agents. I wrote about this idea for IoT things here: https://www.windley.com/archives/2015/07/social_things_trustworthy_spaces_and_the_internet_of_things.shtml and about the role promise theory might play in such communities here: https://www.windley.com/archives/2015/12/promises_and_communities_of_things.shtml

Pawel Jozefiak

"You can't prompt-engineer your way out of financial guardrails" - this needs to be on a t-shirt. I learned this the hard way when my agent hit API cost limits because the only guardrail was a system prompt saying "be cost-conscious."

The neurosymbolic integration argument is compelling. My agent's rules evolved from prose instructions to structured config files to actual enforcement code: https://thoughts.jock.pl/p/wiz-1-5-ai-agent-dashboard-native-app-2026
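A hedged sketch of that progression (the `BUDGET` config and `charge` helper are hypothetical, not from the linked post): the prose rule "be cost-conscious" becomes a structured limit, and enforcement code refuses the call once the budget is spent.

```python
# Structured config replacing a prose instruction in a system prompt.
BUDGET = {"max_api_cost_usd": 5.00}

class BudgetExceeded(Exception):
    """Raised when a call would push spend past the hard limit."""
    pass

def charge(ledger: dict, cost_usd: float) -> None:
    """Enforcement code: block the call before money is spent."""
    if ledger["spent"] + cost_usd > BUDGET["max_api_cost_usd"]:
        raise BudgetExceeded("hard guardrail, not a suggestion")
    ledger["spent"] += cost_usd

ledger = {"spent": 0.0}
charge(ledger, 3.0)          # fine: within budget
try:
    charge(ledger, 3.0)      # would exceed the $5.00 cap
except BudgetExceeded:
    print("call blocked before spending")
```

The design point is that the exception fires before the API call is made, which is exactly what a system prompt cannot guarantee.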

The expert systems revival angle is fascinating. We spent decades building rule engines, abandoned them for neural nets, and now we need both. History rhymes.
