AI Agent
Artificial intelligence (AI) is a widely used term, applied in many different ways depending on the context. Its recent surge, often described as an AI bubble, has been driven by advances in generative AI, popularized by GPT and rapidly adopted by the general public, fueling both excitement and confusion.
Rather than being a true “intelligence,” AI is better understood as a computer system capable of performing tasks typically associated with human intelligence and improving its performance through learning.
In practice, this covers any action that reproduces a behavior once considered uniquely human. For example, imagine a company relocating its offices and wanting to analyze the impact on commuting expenses reimbursed to employees. It sets up a system that, for each employee, carries out the following steps:
- connect to the database to obtain the home address;
- open a web browser;
- navigate to Google Maps;
- input the addresses (employee and company);
- extract and store the distance information;
- compare this new distance to the old one;
- calculate the new total reimbursement cost, depending on the mode of transport of each employee;
- calculate an overall indicator across all employees.
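The steps above can be sketched as a plain script. This is a minimal sketch: the hypothetical `get_distance_km` helper stands in for the browser/Google Maps steps, an in-memory list stands in for the database, and the reimbursement rates are invented for illustration.

```python
# Minimal sketch of the rule-based reimbursement pipeline.
# get_distance_km replaces the real browser/Google Maps automation,
# and EMPLOYEES replaces the database query. Rates are hypothetical.

RATE_PER_KM = {"car": 0.30, "public": 0.10, "bike": 0.05}

EMPLOYEES = [
    {"name": "Alice", "home": "12 Oak St", "mode": "car", "old_km": 8.0},
    {"name": "Bob", "home": "34 Elm St", "mode": "public", "old_km": 15.0},
]

def get_distance_km(home_address: str, office_address: str) -> float:
    """Placeholder for the Google Maps lookup (open browser, input addresses)."""
    fake_distances = {"12 Oak St": 12.5, "34 Elm St": 9.0}
    return fake_distances[home_address]

def relocation_impact(office_address: str) -> dict:
    total_old, total_new = 0.0, 0.0
    for emp in EMPLOYEES:
        new_km = get_distance_km(emp["home"], office_address)  # fetch distance
        rate = RATE_PER_KM[emp["mode"]]                        # per-mode cost
        total_old += emp["old_km"] * rate                      # old reimbursement
        total_new += new_km * rate                             # new reimbursement
    # overall indicator: ratio of new to old total cost
    return {"old_cost": total_old, "new_cost": total_new,
            "global_indicator": total_new / total_old}

print(relocation_impact("1 New Plaza"))
```

Nothing in this loop learns or adapts; every step is a fixed rule, which is precisely the point made below.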
Does this process involve learning or intelligence? Not really. The system simply follows a fixed list of instructions, extracting information from a database and using APIs (Application Programming Interfaces) to access tools. Yet the entire process, although it requires no intelligence, reproduces a behavior that until then belonged to human competence and may therefore, according to the literature, be described as artificial intelligence.
Generative AI vs AI Agent
Generative artificial intelligence, or generative AI, is a type of AI system capable of generating text, images, videos, or other media in response to prompts. This has evolved considerably in recent years, moving from simple content-generation tools to more sophisticated systems capable of handling complex tasks.
To understand the differences between generative AI and AI agents, it helps to trace the evolution of the major generative AI models (GPT, Gemini, Claude, ...). Let’s take the example of OpenAI.
In 2022, GPT-3.5 fascinated the world with its ability to generate coherent text and answer complex questions. What seemed incredible was above all the breadth of its apparent knowledge, drawn from the immense corpus of web texts and other sources on which it had been trained. But this model also had important limitations: no memory between conversations, knowledge frozen at a specific date, and above all, no capacity for action.
The major difference between a classic AI (like GPT-3.5) and an AI agent (like most current generative applications) lies in autonomy and action. While a classic AI is a passive tool, dependent only on human commands, an AI agent acts by making decisions and carrying out tasks in a partially autonomous way.
For example, GPT-3.5 could:
- write a brilliant email;
- offer intelligent suggestions;
- answer complex questions.
However, it could not:
- automatically find the ideal recipient;
- decide the right moment to send it;
- send it itself without human intervention.
While traditional language models excel at generating content and answering questions, an AI agent is designed to observe relevant data in its environment, analyze this information, plan its actions, and act directly to achieve a specific goal.
The year 2023 marks a turning point in the evolution from generative AI to agentic AI. Models became multimodal, capable of understanding not only text but also images. More importantly, they acquired the ability to use tools: execute Python code, browse the web, generate images, and even interact by voice.
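Tool use typically works as a loop: the model emits a structured tool request, the host program executes it, and the result is fed back until the model produces a final answer. This is a minimal sketch of that loop; `fake_model`, the message format, and the `run_python` tool are all illustrative stand-ins, not a real API.

```python
# Minimal sketch of a tool-use loop. The model is assumed to return
# either {"tool": name, "args": {...}} or {"answer": text}.

def run_python(code: str) -> str:
    """Hypothetical 'execute Python code' tool (restricted eval here)."""
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {"run_python": run_python}

def fake_model(messages):
    """Stand-in for the LLM: requests a tool once, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "run_python", "args": {"code": "2**10"}}
    return {"answer": f"The result is {tool_results[-1]['content']}."}

def agent_loop(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # dispatch the tool call
        messages.append({"role": "tool", "content": result})

print(agent_loop("What is 2 to the power of 10?"))  # -> The result is 1024.
```

Real systems (code execution, web browsing, image generation) follow the same request/execute/feed-back pattern, just with richer tool catalogs.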
The introduction of memory also represents a crucial advance. The model can now remember who you are, your preferences, and your ongoing projects. This memory, fully transparent and under your control, enables fine personalization of responses and continuity in interactions.
LLM vs LRM
Despite all these evolutions, one key dimension was still missing from the definition of agentic AI: the ability to think.
This is precisely what Large Reasoning Models (LRM) introduce. While LLMs (Large Language Models) mainly generate fluent text based on correlations, LRMs are trained to take time to reason before answering. They add an explicit reasoning step: chaining facts, posing simple equations, ranking hypotheses, or deciding when to use external tools, before providing a final answer.
OpenAI, with its o-series (later renamed Thinking), presented in 2024 the first LRMs accessible to the public. Their particularity is being explicitly “trained to think longer before answering,” which significantly improves the relevance and accuracy of results.
A patient asks:
“I’ve had a runny nose for three days, I sneeze a lot, and my eyes are red and itchy. What could this be?”
LLM (ChatGPT, standard model)
“These symptoms may be related to a cold, the flu, or seasonal allergies. If it persists, consult a doctor.”
→ Here, the model recalls several possibilities, but without explicit reasoning or prioritization.
LRM (ChatGPT, reasoning model)
“Runny nose + sneezing + itchy eyes are typical of seasonal allergic rhinitis. The absence of fever or body aches makes flu or cold less likely. Considering that it’s spring and the current pollen level, the most plausible hypothesis is an allergy. An over-the-counter antihistamine could help, but consult a clinician if symptoms persist.”
→ Here, the model applies step-by-step reasoning (“itchy eyes and spring season → think allergies”, “no fever → rule out infection”) before concluding.
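The step-by-step logic of the LRM answer can be written down as a tiny rule chain. This is purely illustrative (and certainly not medical advice); the symptom names and rules mirror the example above.

```python
# Illustrative rule chain mirroring the LRM's explicit reasoning steps.
# Not medical advice; rules and labels are taken from the example above.

def diagnose(symptoms: set, season: str):
    steps = []
    if "itchy eyes" in symptoms and season == "spring":
        steps.append("itchy eyes + spring season -> think allergies")
    if "fever" not in symptoms:
        steps.append("no fever -> infection (flu/cold) less likely")
    if len(steps) == 2:
        return "seasonal allergic rhinitis", steps
    return "inconclusive, consult a clinician", steps

conclusion, trace = diagnose({"runny nose", "sneezing", "itchy eyes"}, "spring")
print(conclusion)  # -> seasonal allergic rhinitis
for step in trace:
    print(step)
```

The LLM answer corresponds to returning the candidate list directly; the LRM answer corresponds to running the chain and exposing the trace before concluding.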
In summary, Large Reasoning Models complete the definition of agentic AI: where the AI agent observes, analyzes, plans, and acts, the LRM adds the ability to think explicitly. This additional reasoning building block brings artificial systems even closer to structured human reasoning.
The AI Agent
An AI agent is an entity capable of interacting with its environment to achieve a defined objective. It combines three essential characteristics:
- observe: capture information from its environment — whether it is a user prompt, real-time data, documents, or a knowledge base.
- reason (think): analyze this information, infer meaning, and develop a strategy adapted to the objective.
- act: execute the necessary actions — for example run a calculation, call an API, generate a document, send a message, or trigger a business process.
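The three characteristics above can be sketched as a small observe → reason → act cycle. Everything here is illustrative (a hypothetical trip-planning agent, not a real framework).

```python
# Minimal sketch of the observe -> reason -> act cycle.
# All class and method names are illustrative assumptions.

class TripAgent:
    def observe(self, environment: dict) -> dict:
        # capture relevant signals from the environment
        return {"traffic": environment["traffic"], "goal": environment["goal"]}

    def reason(self, observation: dict) -> str:
        # develop a strategy adapted to the objective
        if observation["traffic"] == "heavy":
            return "take_ring_road"
        return "take_city_center"

    def act(self, plan: str) -> str:
        # execute the chosen action (here, just report it)
        return f"executing: {plan}"

agent = TripAgent()
obs = agent.observe({"traffic": "heavy", "goal": "reach office by 9am"})
print(agent.act(agent.reason(obs)))  # -> executing: take_ring_road
```

A real agent would plug sensors, APIs, and an LLM/LRM into these three methods, but the control flow stays the same.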
This combination of observation, planning, and action places AI agents at a higher level compared to traditional models. Take the case of a personal assistant AI agent organizing your trips:
- a classic generative AI: provides a list of the best routes according to your criteria, but the rest of the organization is up to you.
- an AI agent: analyzes traffic in real time, chooses an optimal route, books a parking spot, and adjusts the plan in case of unforeseen events.
📝 With this approach, the AI agent does not just assist you, but becomes a proactive and autonomous actor, capable of managing complex processes while adapting to dynamic situations.
The components of an AI agent
An AI agent is not just a language model: it is a complete architecture where several blocks cooperate to observe, reason, memorize, and act in complex environments.
The model (LLM or LRM): This is the cognitive core of the agent: it understands natural language, analyzes requests, reasons (more or less explicitly depending on whether it is an LLM or an LRM), and generates appropriate responses.
Memory: It maintains context over time: short-term (in a conversation or a session) and long-term (user preferences, project history, learning new rules). Without memory, the agent would respond to each request as if starting from scratch.
Tools: They extend the agent beyond text by allowing it to call external APIs and services (read, write, trigger actions), execute specialized functions or scripts (calculation, data transformation, code), and access data stores (databases, indexes, documents). Tools connect reasoning to concrete action.
Context: It brings together all the information available at a given moment — shared documents, business data, real-time signals (traffic, market, technical logs, etc.) — and constitutes the “current situation” in which the agent operates.
The contract / prompt: It sets the rules of the game and the objectives: explicit instructions, security constraints (not executing certain actions), or business framework (for example: “optimize transport costs”). This contract guides the agent’s mission and frames its autonomy.
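A rough way to see how these five blocks cooperate is to wire them together in code. This is a minimal sketch under stated assumptions: `fake_model` stands in for the LLM/LRM, the contract is a plain string, and the tool is a placeholder.

```python
# Minimal sketch wiring the five components together.
# Every name and behavior here is an illustrative assumption.

CONTRACT = "You optimize transport costs. Never act without approval."

class Memory:
    def __init__(self):
        self.long_term = {"user": "logistics team"}  # preferences, history
        self.short_term = []                         # current session
    def remember(self, fact):
        self.short_term.append(fact)

def distance_tool(origin, destination):
    """A 'tools' entry: an external capability the agent can call."""
    return 42.0  # placeholder result

def fake_model(contract, memory, context, request):
    # a real model would combine all four inputs; here we just echo them
    return f"[{contract.split('.')[0]}] handled '{request}' with context {context}"

class Agent:
    def __init__(self, model, memory, tools, contract):
        self.model, self.memory = model, memory
        self.tools, self.contract = tools, contract
    def handle(self, request, context):
        self.memory.remember(request)  # keep session continuity
        return self.model(self.contract, self.memory, context, request)

agent = Agent(fake_model, Memory(), {"distance": distance_tool}, CONTRACT)
print(agent.handle("estimate commute costs", {"office": "1 New Plaza"}))
```

The key design point is separation of concerns: the model reasons, memory carries continuity, tools act, context describes the current situation, and the contract frames it all.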
Comparison: AI Agent, Generative AI, and RPA (Robotic Process Automation)
| Aspect | Generative AI | AI Agent | RPA |
|---|---|---|---|
| Interaction | Static | Dynamic | None |
| Adaptation | Low | High | Low |
| Task level | Medium | High | Low |
| Example | Textual responses | Proactive support | Simple automation |
The 5 levels of autonomy
AI agents are evolving rapidly, and their future potential opens the way to even more remarkable innovations. To understand this trajectory, we can distinguish five levels of autonomy, which constitute the continuum of Agentic AI.
Level 1 – Rule-based automation This is the most basic stage, that of deterministic programming. The system follows a sequence of precise instructions, with no learning, no adaptation.
Example: a company sets up a system that, for each employee, connects to a database, opens Google Maps, calculates the home-work distance, compares the old and new, then updates the total reimbursement cost.
This process imitates human behavior but remains simple automation: no intelligence, just fixed rules.
Level 2 – Intelligent automation At this stage, we introduce language models (LLM) such as GPT into an automated flow. The system no longer just executes rules: it analyzes, writes, classifies, generates content. Its autonomy remains framed by a contract (instructions), a role (defined mission), and tools (APIs, databases, functions), often defined via a prompt.
Example: an agent receives a prospect profile, analyzes it with GPT, writes a personalized email, then waits for validation before sending.
We are already beyond simple rules: the model contributes with language capabilities, but remains dependent on humans for the final action.
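The Level 2 pattern (the model drafts, a human keeps the final action) can be sketched in a few lines. All names here are hypothetical; `draft_email` stands in for the GPT call and `human_approves` for the validation step.

```python
# Minimal sketch of Level 2: an LLM drafts, a human validates, then the
# system acts. draft_email stands in for the GPT call; names are illustrative.

def draft_email(prospect: dict) -> str:
    """Stand-in for the LLM: writes a personalized email from a profile."""
    return f"Hello {prospect['name']}, I noticed your interest in {prospect['topic']}..."

def send_email(to: str, body: str) -> str:
    """Stand-in for the actual sending action."""
    return f"sent to {to}"

def level2_flow(prospect: dict, human_approves) -> str:
    draft = draft_email(prospect)      # the model contributes language skills
    if human_approves(draft):          # the final action stays with the human
        return send_email(prospect["email"], draft)
    return "draft discarded"

result = level2_flow(
    {"name": "Ada", "topic": "data platforms", "email": "ada@example.com"},
    human_approves=lambda draft: True,  # simulate an approval
)
print(result)  # -> sent to ada@example.com
```

The `human_approves` hook is exactly where Level 2 autonomy stops: remove it and the flow slides toward Level 3 and beyond.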
Level 3 – Orchestrated systems of agents This is the current stage. Here, it is no longer a single agent, but a set of specialized agents, each with its own role and tools, coordinated by an orchestrator. The orchestrator distributes tasks, manages dependencies, and ensures overall coherence.
Example: a complex customer support request is handled by an orchestrator that mobilizes a technical agent, a performance agent, a compliance agent, a communication agent… before consolidating their contributions into a single clear response.
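An orchestrator like the one in this example can be sketched as a router that distributes the request to specialized agents and consolidates their contributions. The specialist roles, their canned outputs, and the routing list are all illustrative.

```python
# Minimal sketch of Level 3: an orchestrator dispatches one request to
# specialized agents and consolidates their answers. All names illustrative.

SPECIALISTS = {
    "technical": lambda req: "checked service logs: no outage",
    "compliance": lambda req: "request complies with data policy",
    "communication": lambda req: "drafted a clear customer reply",
}

def orchestrate(request: str, needed: list) -> str:
    contributions = []
    for role in needed:                                   # distribute tasks
        contributions.append(f"{role}: {SPECIALISTS[role](request)}")
    return " | ".join(contributions)                      # consolidate

print(orchestrate("customer reports slow dashboards",
                  ["technical", "compliance", "communication"]))
```

In production, each specialist would itself be an agent with its own model, tools, and contract; the orchestrator's job is task distribution, dependency management, and overall coherence, as described above.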
Level 4 – Autonomous collaboration This is the stage we are already moving towards. Agents are no longer just orchestrated: they perceive the world directly (via cameras, sensors, multimodality), reason and interact in real time, and learn from their own actions.
We are seeing the first signs of this with the emergence of Large Action Models (LAM).
LLMs taught machines to write and converse.
LRMs introduced the ability to think longer before acting.
LAMs, in turn, represent a further advance: they learn by observing and reproducing human actions at scale. Unlike LLMs, which train on text corpora, LAMs assimilate not only rules and language, but above all sequences of gestures, decisions, and strategies.
Example: instead of learning Power BI from manuals, a LAM is exposed to thousands of videos and tutorials showing experts building dashboards. It learns not by reading, but by observing and reproducing actions.
Projects like Google RT-2 or Meta Ego4D already show this trend: models capable of learning directly from videos, interactions, and human behaviors.
Thus, even though we are still mostly at level 3, level 4 is already being realized through these new models.
Level 5 – Strategic generalization The ultimate level: the agent becomes proactive and strategic. It detects weak signals, anticipates problems, makes complex decisions, and acts without explicit request.
Example: your agent identifies a critical regulatory change for your business on Monday at 6 a.m. It reads the legal text, simulates impacts, designs a mitigation plan, adjusts client communication, and informs your teams… before you even open your computer.