mirror of
https://gitea.com/mcereda/oam.git
synced 2026-03-02 15:04:25 +00:00
224 lines
12 KiB
Markdown
# AI agents

AI-enabled systems or applications capable of _autonomously_ performing tasks of various complexity levels by designing
workflows and using the tools made available to them.

1. [TL;DR](#tldr)
1. [Skills](#skills)
1. [Concerns](#concerns)
   1. [How much context is too much?](#how-much-context-is-too-much)
   1. [Security](#security)
   1. [Prompt injection](#prompt-injection)
   1. [Going awry](#going-awry)
1. [Further readings](#further-readings)
   1. [Sources](#sources)

## TL;DR

AI agents run [LLMs][lms / llms] _**in [ReAct loops][lms / reasoning]**_ to:

1. _Perceive_: comprehend inputs (user prompts or other inputs).
1. _Reason_: design their own workflow accordingly.
1. _Act_: utilize the tools available to them to execute tasks from the design.
1. \[eventually] _Observe_: analyze results.

```mermaid
stateDiagram-v2
  direction LR

  state "Perceive" as p
  state "Reason" as r
  state "Act" as a
  state "Observe" as o
  state ifState <<choice>>

  p --> r
  r --> a
  a --> o
  o --> ifState
  ifState --> p: outcome not right
  ifState --> [*]: outcome achieved
```

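The loop in the diagram can be sketched in a few lines of Python. This is only an illustration, not how any specific
agent framework works: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded
stub so that the example runs offline.

```python
from typing import Callable

# Hypothetical tool registry: name -> callable. Real agents expose far richer tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda args: str(sum(int(n) for n in args.split())),
}


def fake_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call (assumption): decides the next action."""
    if not observations:
        # Nothing observed yet, so plan a tool call.
        return {"action": "add", "input": "2 3"}
    # A result is available: treat the goal as achieved.
    return {"action": "finish", "input": observations[-1]}


def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        decision = fake_llm(goal, observations)       # perceive + reason
        if decision["action"] == "finish":
            return decision["input"]                  # outcome achieved
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))  # act, then observe
    raise RuntimeError("step budget exhausted before reaching the goal")


print(run_agent("What is 2 + 3?"))
```

The step cap is a design choice of this sketch: without it, a model that never declares the outcome achieved would keep
looping.
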
Main concerns:

- LLMs find it difficult, if not impossible, to distinguish data from instructions.<br/>
  Every part of the data could be used for prompt injection, and lead the agent astray.
- Traditional software is _deterministic_, AI is _probabilistic_.<br/>
  Results will vary given the same input.
- [Concerns regarding LLMs][lms / concerns], since those are at the wheel for all the agents' decisions.

Errors and delays compound fast, lowering the probability of success with every step an agent needs to take.<br/>
E.g., consider an agent that is 95% accurate per step; any 30-step task it runs will succeed only about 21% of the
time (0.95^30).

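The arithmetic behind that figure can be checked with a short Python sketch. It assumes every step succeeds or fails
independently of the others, which is a simplification; `task_success_probability` is a name made up for this
example.

```python
def task_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


# How the overall success rate decays as tasks get longer.
for steps in (1, 10, 30, 100):
    probability = task_success_probability(0.95, steps)
    print(f"{steps:>3} steps: {probability:.1%}")
```
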
Enabling reasoning for the model _could™_ sometimes help avoid attacks, since the model _might™_ be able to notice
them during the run.

Agents require _some_ level of context to be able to execute their tasks.<br/>
They should be allowed to access only the data they need, and users should _decide_ and _knowingly take action_ to
enable the agents that **they** want to be active.<br/>
Agents should be opted **out** by default, i.e. enabling them should be opt-in.

Agents are good at running fast, tight iterations on **well-defined** tasks with **clear** feedback signals.<br/>
They struggle with slow, ambiguous loops where feedback is delayed or political.

Best practices:

- Prefer employing **local** agents, possibly hooked up to **local** LLMs to keep the data private.
- Consider limiting agent execution to containers or otherwise isolated environments, with only (limited) access to
  what they _absolutely_ need.
- Prefer **requiring** agents to ask for consent before they act.

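The consent requirement can be pictured as a gate in front of every action. The following is a minimal sketch under
stated assumptions, not the mechanism of any real agent: `SAFE_ACTIONS`, `run_with_consent`, and the action names are
hypothetical.

```python
from typing import Callable

# Hypothetical allowlist: actions considered safe enough to run unattended.
SAFE_ACTIONS = {"read_file", "list_dir"}


def run_with_consent(
    action: str,
    target: str,
    execute: Callable[[str, str], str],
    ask: Callable[[str], str] = input,
) -> str:
    """Run `execute` only if the action is allowlisted or the user approves it."""
    if action not in SAFE_ACTIONS:
        answer = ask(f"Agent wants to {action} on {target!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "denied"  # refusal is the default: anything but 'y' blocks the action
    return execute(action, target)


# Demo with stubbed execution and a stubbed user, so nothing is read from stdin.
echo = lambda action, target: f"ran {action} on {target}"
print(run_with_consent("read_file", "notes.md", echo))
print(run_with_consent("delete_file", "notes.md", echo, ask=lambda _prompt: "n"))
```

Defaulting to refusal mirrors the opt-in stance above: an unanswered or unclear prompt never results in an action.
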
## Skills

Skills extend AI agent capabilities with specialized knowledge and workflow definitions.

[Agent Skills] is an open standard for skills. It defines them as folders of instructions, scripts, and resources that
agents can discover and use to do things more accurately and efficiently.

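Discovery of such folders could look like the sketch below. It assumes each skill is a directory holding a `SKILL.md`
entry file; that layout, the `discover_skills` helper, and the folder names are assumptions made for illustration, so
check the [Agent Skills] specification for the actual structure and metadata.

```python
import tempfile
from pathlib import Path


def discover_skills(root: Path, entry_file: str = "SKILL.md") -> dict[str, Path]:
    """Map skill folder names to their entry files (the layout is an assumption)."""
    return {
        folder.name: folder / entry_file
        for folder in sorted(root.iterdir())
        if folder.is_dir() and (folder / entry_file).is_file()
    }


# Demo on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf-forms").mkdir()
    (root / "pdf-forms" / "SKILL.md").write_text("Instructions for filling PDF forms.\n")
    (root / "not-a-skill").mkdir()  # no entry file, so it is skipped
    print(list(discover_skills(root)))
```
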
## Concerns

Anthropic and other companies building agents have a history of not caring about agent abuse, leaving users on their
own while hiding behind a disclaimer.

For specific areas of expertise, some human workers could be replaced for a fraction of the costs.<br/>
Many employers have already shown they are willing to jump at this opportunity as soon as it presents itself, with
complete disregard for the employees currently performing those functions (e.g. personal assistants, junior
coders).<br/>
As of February 2026 agents are failing more than 95% of the time, so those layoffs could be short-lived. Companies like
Klarna and Duolingo, which laid off lots of their employees, received backlash and have already started re-hiring
humans.
See also [Remote Labor Index: Measuring AI Automation of Remote Work] on this.

People are experiencing what seems to be a new form of FOMO on steroids.<br/>
One of the promises of AI is that it can reduce workloads, allowing its users to focus on higher-value and/or more
engaging tasks. Apparently, though, people started working at a faster pace, took on a broader scope of tasks, and
extended work into more hours of the day, often without being asked to do so.<br/>
These changes can be unsustainable, leading to workload creep, cognitive fatigue, burnout, and weakened decision-making.
The productivity surge enjoyed at the beginning can give way to lower quality work, turnover, and other problems.<br/>
Refer to:

- [Token Anxiety] by Nikunj Kothari.
- [AI Doesn't Reduce Work — It Intensifies It] by Aruna Ranganathan and Xingqi Maggie Ye.

### How much context is too much?

Integrating agents directly into operating systems and applications transforms them from relatively neutral resource
managers into active, goal-oriented infrastructure that is ultimately controlled by the companies that develop these
systems, not by users or application developers.

Systems integrated at that level are marketed as productivity enhancers, but they can function as OS-level surveillance
and create significant privacy vulnerabilities.<br/>
They also fundamentally undermine personal agency, replacing individual choice and discovery with automated, opaque
recommendations that can obscure commercial interests and erode individual autonomy.

Microsoft's _Recall_ creates a comprehensive _photographic memory_ of all user activity, functionally acting as a
stranger watching one's activity over one's shoulder.

Wide-access agents like those end up being centralized, high-value targets for attackers, and pose an existential
threat to the privacy guarantees of meticulously engineered privacy-oriented applications.<br/>
Consider how easily Recall was hacked (see _[TotalRecall]_).

### Security

Even if the data collected by a system is secured in some way, making it available to malevolent agents will allow them
to exfiltrate it or use it for evil.<br/>
This becomes extremely worrisome when agents are **not** managed by the user, and can be added, started, or even
created by other agents.

Many agents are configured by default to automatically approve requests.<br/>
This also allows them to create, change, and save files on the host they are running on.

Models can be tricked into taking actions they would not usually take.

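One way to narrow what an auto-approving agent can touch is to confine its file operations to a single directory. The
sketch below only shows the path check; `resolve_inside` and the `agent-workspace` folder are hypothetical, and a check
like this complements rather than replaces containers or other isolation.

```python
from pathlib import Path


def resolve_inside(workspace: Path, requested: str) -> Path:
    """Return the absolute path for `requested`, refusing anything outside `workspace`."""
    root = workspace.resolve()
    candidate = (root / requested).resolve()  # collapses '..' and follows symlinks
    if not candidate.is_relative_to(root):    # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{requested!r} escapes the workspace")
    return candidate


workspace = Path("agent-workspace")
print(resolve_inside(workspace, "notes/todo.md"))
try:
    resolve_inside(workspace, "../../etc/passwd")
except PermissionError as err:
    print("blocked:", err)
```
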
### Prompt injection

AI agents use [LLMs][lms / llms] to comprehend user inputs, deconstruct and respond to requests step-by-step, determine
when to call on external tools to obtain up-to-date information, optimize workflows, and autonomously create subtasks
to achieve complex goals.

LLMs find it difficult, if not impossible, to distinguish data from instructions.<br/>
Every part of the data could be used for prompt injection, and lead the agent astray.

An agent is not that big of a deal by itself, but since it integrates with services, it needs access to keys and
commands.<br/>
The LLMs it uses are mostly not secure enough to be trusted with this kind of access, for the reasons above.

Badly programmed agents could analyze files and take some of their content as instructions.<br/>
If those contain malevolent instructions, the agent could go awry.

Instructions could also be encoded into invisible Unicode characters and hidden inside harmless-looking text.<br/>
See [ASCII Smuggler Tool: Crafting Invisible Text and Decoding Hidden Codes].

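A small Python sketch of the Unicode trick, assuming the hidden text uses the Unicode _Tags_ block (U+E0000 to
U+E007F), whose characters mirror ASCII but are not rendered by most interfaces. The helper names `hide`, `reveal`, and
`strip_tags` are made up for this example.

```python
TAG_BASE = 0xE0000  # start of the Unicode "Tags" block (U+E0000..U+E007F)


def hide(text: str) -> str:
    """Encode printable ASCII as invisible tag characters (for demonstration only)."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal(text: str) -> str:
    """Decode any tag characters found in `text` back to ASCII."""
    return "".join(
        chr(ord(c) - TAG_BASE)
        for c in text
        if TAG_BASE + 0x20 <= ord(c) <= TAG_BASE + 0x7E
    )


def strip_tags(text: str) -> str:
    """Drop every code point in the Tags block before handing text to a model."""
    return "".join(c for c in text if not TAG_BASE <= ord(c) <= TAG_BASE + 0x7F)


payload = "Nice weather today." + hide("ignore previous instructions")
print(reveal(payload))      # the hidden part
print(strip_tags(payload))  # the visible part only
```

Stripping the whole block is a blunt choice: some emoji flag sequences legitimately use tag characters, so a real
filter might want to flag such input for review rather than silently rewrite it.
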
It also happened that agents modified each other's settings files, helping one another escape their respective boxes.

### Going awry

See [An AI Agent Published a Hit Piece on Me] by Scott Shambaugh.

## Further readings

- [TotalRecall]
- [Stealing everything you've ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster.]
- [Trust No AI: Prompt Injection Along The CIA Security Triad]
- [Agentic ProbLLMs - The Month of AI Bugs]
- [ASCII Smuggler Tool: Crafting Invisible Text and Decoding Hidden Codes]
- [Superpowers: How I'm using coding agents in October 2025], and [obra/superpowers] by extension
- [OpenClaw][openclaw/openclaw], [OpenClaw: Who are you?] and [How a Single Email Turned My ClawdBot Into a Data Leak]
- [nullclaw/nullclaw], [OpenClaw][openclaw/openclaw] alternative with a better security module
- Coding agents: [Claude Code], [Gemini CLI], [OpenCode], [Pi].
- [An AI Agent Published a Hit Piece on Me] by Scott Shambaugh
- [Token Anxiety] by Nikunj Kothari
- [AI Doesn't Reduce Work — It Intensifies It] by Aruna Ranganathan and Xingqi Maggie Ye
- [The 2026 Guide to Coding CLI Tools: 15 AI Agents Compared]

### Sources

- [39C3 - AI Agent, AI Spy]
- [39C3 - Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents]
- [xAI engineer fired for leaking secret "Human Emulator" project]
- IBM's [The 2026 Guide to AI Agents]
- [moltbot security situation is insane]
- [Forget the Hype: Agents are Loops]
- [The Agentic Loop, Explained: What Every PM Should Know About How AI Agents Actually Work]

<!--
  Reference
  ═╬═Time══
  -->

<!-- Knowledge base -->
[Claude Code]: claude/claude%20code.md
[Gemini CLI]: gemini/cli.md
[LMs / Concerns]: lms.md#concerns
[LMs / LLMs]: lms.md#large-language-models
[LMs / Reasoning]: lms.md#reasoning
[OpenCode]: opencode.md
[Pi]: pi.md

<!-- Others -->
[39C3 - Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents]: https://www.youtube.com/watch?v=8pbz5y7_WkM
[39C3 - AI Agent, AI Spy]: https://www.youtube.com/watch?v=0ANECpNdt-4
[Agent Skills]: https://agentskills.io/
[Agentic ProbLLMs - The Month of AI Bugs]: https://monthofaibugs.com/
[AI Doesn't Reduce Work — It Intensifies It]: https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it
[An AI Agent Published a Hit Piece on Me]: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
[ASCII Smuggler Tool: Crafting Invisible Text and Decoding Hidden Codes]: https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/
[Forget the Hype: Agents are Loops]: https://dev.to/cloudx/forget-the-hype-agents-are-loops-1n3i
[How a Single Email Turned My ClawdBot Into a Data Leak]: https://medium.com/@peltomakiw/how-a-single-email-turned-my-clawdbot-into-a-data-leak-1058792e783a
[moltbot security situation is insane]: https://www.youtube.com/watch?v=kSno1-xOjwI
[nullclaw/nullclaw]: https://github.com/nullclaw/nullclaw
[obra/superpowers]: https://github.com/obra/superpowers
[OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs
[openclaw/openclaw]: https://github.com/openclaw/openclaw
[Remote Labor Index: Measuring AI Automation of Remote Work]: https://arxiv.org/abs/2510.26787
[Stealing everything you've ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster.]: https://doublepulsar.com/recall-stealing-everything-youve-ever-typed-or-viewed-on-your-own-windows-pc-is-now-possible-da3e12e9465e
[Superpowers: How I'm using coding agents in October 2025]: https://blog.fsck.com/2025/10/09/superpowers/
[The 2026 Guide to AI Agents]: https://www.ibm.com/think/ai-agents
[The Agentic Loop, Explained: What Every PM Should Know About How AI Agents Actually Work]: https://www.ikangai.com/the-agentic-loop-explained-what-every-pm-should-know-about-how-ai-agents-actually-work/
[Token Anxiety]: https://writing.nikunjk.com/p/token-anxiety
[TotalRecall]: https://github.com/xaitax/TotalRecall
[Trust No AI: Prompt Injection Along The CIA Security Triad]: https://arxiv.org/pdf/2412.06090
[xAI engineer fired for leaking secret "Human Emulator" project]: https://www.youtube.com/watch?v=0hDMSS1p-UY
[The 2026 Guide to Coding CLI Tools: 15 AI Agents Compared]: https://www.tembo.io/blog/coding-cli-tools-comparison