It is widely touted that “the future of AI is agentic.” However, there is an intricate trade-off in going fully agentic: the more autonomous the agents in an application, the less reliable the application will be.

Going agentic pays off only when the workflow is complex. For simple use cases, consider lighter techniques, such as augmenting LLMs with RAG systems and tools, and reserve agents for scenarios involving prompt chaining, routing, parallelisation, worker orchestration or evaluator-optimiser loops.

To address this trade-off, AI companies either train more capable models that support agentic workflows natively or create specialised application frameworks, such as LangGraph and Magentic-One, to help downstream AI application developers. Of course, you can use both.

2024 saw a few key features emerge that are often considered the backbone of a practical agentic solution and the benchmark for evaluating one.

State, memory management and persistence

Memory can be used to remember a single conversation (a thread) or to retain information across multiple conversations: for example, information about the user, their preferences, and their past interactions with the LLM. All of this can make future conversations between the user and the AI more natural, closer to a conversation between actual humans. It feels strange when someone forgets the previous conversation every time it is your turn to speak; a dialogue between the two participants cannot carry on that way.

Implementation details differ between frameworks, but the idea is the same. For example, in LangGraph you can use a checkpointer to remember a single thread and the Store interface for long-term memory. In LangChain, you can use RunnableWithMessageHistory to manage chat history between humans and LLMs.
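Stripped of any particular framework's API, the two memory scopes can be sketched in plain Python. The class and method names below are illustrative assumptions, not LangGraph or LangChain interfaces:

```python
from collections import defaultdict

class MemoryManager:
    """Illustrative only: thread-scoped history plus cross-thread storage."""

    def __init__(self):
        self.threads = defaultdict(list)   # thread_id -> messages in one conversation
        self.long_term = {}                # user_id -> facts persisted across threads

    def add_message(self, thread_id, role, content):
        # Short-term memory: everything said within a single thread.
        self.threads[thread_id].append({"role": role, "content": content})

    def remember(self, user_id, key, value):
        # Long-term memory: user preferences and facts that outlive a thread.
        self.long_term.setdefault(user_id, {})[key] = value

    def context_for(self, user_id, thread_id):
        # Combine long-term facts with the current thread's history;
        # this is what gets fed back to the LLM on each turn.
        return {
            "facts": self.long_term.get(user_id, {}),
            "history": self.threads[thread_id],
        }

mm = MemoryManager()
mm.remember("alice", "preferred_language", "French")
mm.add_message("t1", "user", "Bonjour!")
ctx = mm.context_for("alice", "t1")
```

A framework checkpointer plays the role of `threads` (persisted per thread), while a long-term store plays the role of `long_term` (persisted per user).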

Practitioners have tried to map the memory types of the human brain onto AI agents: semantic, episodic, and procedural. For AI agents, semantic memory often refers to information about a specific user. Episodic memory refers to the agent's past actions; besides using a permanent store, few-shot prompting is another convenient way to provide LLMs with episodic memories. Procedural memory refers to the agent's system prompts or out-of-the-box model capabilities. Memories can be updated on the hot path or in the background, with a trade-off between performance and simplicity.
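As a sketch of the few-shot approach to episodic memory, past action records can be rendered directly into the prompt as examples. The record format and helper below are assumptions for illustration:

```python
# Episodic memories: records of what the agent did on past tasks.
episodes = [
    {"task": "Refund order #12", "action": "issued refund via refund_tool"},
    {"task": "Reset password",   "action": "sent reset link via email_tool"},
]

def build_prompt(task, episodes):
    # Render past episodes as few-shot examples, then append the new task.
    shots = "\n".join(
        f"Task: {e['task']}\nAction: {e['action']}" for e in episodes
    )
    return f"{shots}\n\nTask: {task}\nAction:"

prompt = build_prompt("Cancel subscription", episodes)
```

The model then completes the final `Action:` line, guided by how similar tasks were handled before.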

One caveat is that memory does not follow a ‘the more, the better’ rule. With less advanced models, the longer the context window is filled with memory, the more likely the LLM is to hallucinate or become forgetful. Conciseness is key. One commonly used technique is to keep a running summary of the conversation instead of the entire transcript verbatim: after each new turn, an LLM updates the summary with any new information from the recent exchange. This is closer to a real-world conversation, where you may not remember every word the other person said, but you remember the critical points and can even resume the conversation the next time you meet.
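The summarisation loop can be sketched as follows, with `call_llm` standing in for whatever model client you actually use:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model API here.
    return f"[summary of: {prompt[:40]}...]"

def update_summary(summary: str, user_msg: str, ai_msg: str) -> str:
    # Ask the model to fold the latest exchange into the existing summary.
    prompt = (
        "Current summary:\n" + summary +
        "\n\nNew exchange:\nUser: " + user_msg +
        "\nAssistant: " + ai_msg +
        "\n\nRewrite the summary to include any new information, concisely."
    )
    return call_llm(prompt)

summary = ""
summary = update_summary(summary, "I'm planning a trip to Kyoto.", "Great! When?")
# Only the compact summary, not the full transcript, is kept in context.
```

On the next turn, the agent sees the updated summary plus the newest messages, keeping the context window short regardless of conversation length.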

Human-in-the-loop

The ability to intervene is essential for many real-world AI use cases, especially when AI safety, accountability, and regulatory requirements are involved. All popular agentic systems allow users to provide feedback, approve/refuse steps, and update the application state to influence AI agents’ behaviour.
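A minimal sketch of such an intervention point, with hypothetical names: before executing a sensitive step, the agent pauses and defers to a human approver (in production, this would be a UI prompt or a paused, persisted workflow):

```python
def require_approval(action: str, approver) -> bool:
    # `approver` is any callable that returns "approve" or "refuse";
    # in a real system this would surface the action to a human.
    return approver(action) == "approve"

def run_step(action: str, approver):
    if require_approval(action, approver):
        return f"executed: {action}"
    return f"skipped: {action}"

# Here the "human" refuses, so the sensitive action is not executed.
result = run_step("delete_customer_record", lambda action: "refuse")
```

Frameworks generalise this pattern by checkpointing state at the pause, so the human can respond hours later and the workflow resumes where it left off.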

Controllability

Fine-grained control over agents’ behaviour is key to developing high-performing agentic systems. Different tools have different opinions on this topic. Some empower engineers to design their agentic workflow from scratch, while some prescribe the orchestration part of the system.

Other considerations

LLMs are often better prompt authors than humans.

Using LLMs to construct or rewrite prompts for complex tasks has proved hugely effective. In addition, combined with long-term memory, we can ask LLMs to continuously improve the system or application prompts (self-improvement) based on human-in-the-loop feedback.
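A sketch of that self-improvement loop, with `call_llm` as a stand-in for a real model call and all names assumed for illustration:

```python
def call_llm(prompt: str) -> str:
    # Stub: a real call would send `prompt` to a model and return its reply.
    return "You are a concise assistant. Always cite sources."

def improve_prompt(system_prompt: str, feedback: list) -> str:
    # Feed the current prompt plus accumulated human feedback back to the
    # model and ask it to propose a revised system prompt.
    critique = "\n".join(f"- {f}" for f in feedback)
    request = (
        "Current system prompt:\n" + system_prompt +
        "\n\nUser feedback:\n" + critique +
        "\n\nRewrite the system prompt to address the feedback."
    )
    return call_llm(request)

new_prompt = improve_prompt(
    "You are a helpful assistant.",
    ["Answers are too long", "Missing citations"],
)
```

Storing the feedback and each prompt revision in long-term memory lets the application refine itself across deployments rather than within a single session.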

Not all problems require an agentic solution. A simple LLM application with well-designed, optimised prompts can achieve a lot. Although powerful, agentic systems come with increased cost and reduced speed, and they may be overkill in many use cases.

References