Building Production-Ready AI Agents: OpenAI's Practical Guide

Explore OpenAI's practical guide on building production-ready AI agents. Learn about key components like LLMs, tools, and instructions. Discover best practices for single vs multi-agent systems and essential guardrails. Optimize agent performance, cost, and latency for your use case.

29 يونيو 2025

This blog post provides a practical guide on building production-ready AI agents, drawing insights from OpenAI's recent release. It covers the key components of an agentic system, including the model, tools, and instructions, as well as strategies for orchestrating multi-agent workflows and implementing robust guardrails. This content offers valuable guidance for developers looking to leverage the power of AI agents to solve complex, real-world problems.

When Should You Build an Agent?
Key Components of an Agentic System
Selecting the Right Model
Defining Tools for Agents
Crafting Instructions for Agents
Orchestrating Agents: Single vs. Multi-Agent Systems
Implementing Guardrails for Agents
Conclusion

When Should You Build an Agent?

According to OpenAI, not every LLM-based solution requires an agentic use case. You can actually solve a lot of problems by building a normal rule-based decision-making solution. To determine whether you need an agentic solution for a given problem, OpenAI recommends looking at three core requirements:

Complex Decision Making: The system needs to have a need for nuanced reasoning and complex decision-making.
Difficult to Maintain Rules: At a certain point, the number of rules and the complexity will go beyond what is humanly maintainable. In such cases, you should consider more agentic solutions or LLM-based decision-making.
Heavy Reliance on Unstructured Data: Decision or rule-based decision-making systems usually work great on structured data, but if you have a lot of unstructured data such as natural language or information extraction from documents, you could potentially rely on an LLM-based system.

Before committing to building an agent, it's important to validate that your use case meets these criteria clearly. Otherwise, a deterministic solution may suffice.

Key Components of an Agentic System

According to OpenAI's practical guide, there are three main components of an agentic system:

Model: The language model (LLM) that powers the agent's reasoning and decision-making.
Tools: The capabilities expanded by function calls that the agent can use to interact with external systems.
Instructions: The system instructions, guidelines, and guardrails that control the agent's behavior.

When implementing a simple agent, you need to define these three components:

The agent is powered by an LLM model.
The agent has access to a set of tools that extend its capabilities.
The agent's behavior is controlled by a set of instructions.

To select the appropriate model for your agent, you should first establish a performance baseline using the most capable model. Then, you can try to optimize for cost and latency by replacing it with a weaker model while monitoring the impact on performance.

The tools provided to the agent can be categorized into three types: data, action, and orchestration. These tools enable the agent to retrieve information, interact with external systems, and coordinate with other agents, respectively.

The instructions define how the agent should make decisions and what it should consider. Best practices include using existing documents, breaking down tasks into subtasks, capturing edge cases, and iteratively refining the instructions based on observed failures.

Overall, the key to building effective agentic systems is to carefully design and balance these three components - the model, the tools, and the instructions - to meet the specific requirements of your use case.

Selecting the Right Model

According to the guide, when building an agentic system, you should first establish a performance baseline by using the most capable model for every task in the workflow. This allows you to understand the maximum potential performance.

Next, you can start replacing the models with weaker ones and observe the impact on performance. The goal is to optimize for performance, cost, and latency, rather than always using the most capable model.

The key recommendations are:

Establish a performance baseline: Start with the most capable models to understand the maximum potential performance.
Focus on meeting accuracy targets: Once the baseline is established, try to meet your accuracy targets using the best available models.
Optimize for cost and latency: Carefully consider the cost and latency constraints of your application, and select models accordingly, even if it means using a slightly weaker model.

By following this approach, you can build an effective agentic system that balances performance, cost, and latency requirements.

Defining Tools for Agents

When defining tools for agents, you need to provide a detailed tool description that the model or agent can use to pick the appropriate tool for a certain task. OpenAI categorizes tools into three different categories:

Data Tools: These enable agents to retrieve contacts and information necessary for executing workflows. Examples include API calls to retrieve data from databases, CRM systems, or other sources to enrich the context that the agent will use.
Action Tools: These enable agents to interact with external systems and take actions, such as sending emails, updating records, or handing off tasks to humans.
Orchestration Tools: These allow agents to coordinate with other agents, acting as tools for a higher-level orchestrator agent.

To implement tools in the OpenAI Agent SDK, you define simple Python functions with a @function_tool decorator. The docstring of the function should describe the tool's purpose, expected inputs, and outputs. You then provide the list of tools to the agent, allowing it to autonomously decide which tool to use for a given task.

It's important to limit the number of tools an agent has access to, as too many tools can make it difficult for the language model to reliably choose the right one. If you find the tool set becoming unwieldy, it may be better to split the tools into different categories and build separate agents around those specialized tool sets.

Crafting Instructions for Agents

When defining instructions for agents, OpenAI recommends the following best practices:

Use Existing Documents: Leverage existing operational procedures, workflows, or policy documents to inform the instructions. This helps ensure the agent's behavior aligns with established practices.
Prompt Agents to Break Down Tasks: Divide tasks into subtasks to make it easier for the language model to make decisions. Avoid assuming the agent is highly intelligent and can figure out the steps on its own.
Define Clear Instructions: Explicitly tell the agent what to do, rather than letting it make assumptions. For example, instruct the agent to ask the user for their order number or call an API to retrieve account details.
Capture Edge Cases: Anticipate and provide instructions for handling incomplete or unexpected user input. This helps the agent resolve edge cases gracefully.
Iteratively Refine: Continuously update the instructions, tools, and models based on the agent's performance and the failure cases you observe. Maintain an evolving dataset to drive these improvements.

By following these guidelines, you can craft clear and effective instructions that enable your agents to make autonomous decisions and accomplish tasks on your behalf.

Orchestrating Agents: Single vs. Multi-Agent Systems

According to the guide, there are two main patterns for orchestrating agents in agentic systems:

Single Agent System:
- In a single agent system, a single model is equipped with the necessary tools and instructions to execute workflows in a loop.
- The agent receives user input, goes through a reasoning/agentic loop, selects and executes tools, and produces the final output.
- This is usually an iterative process, where the agent's performance is refined over time based on user feedback and observed failures.
- The OpenAI Agents SDK provides a runner.run() method to facilitate the execution of a single agent system.
Multi-Agent System:
- In a multi-agent system, the workflow execution is distributed across multiple coordinated agents.
- There are two patterns for multi-agent systems:
  - Manager Agent Pattern: A central manager agent orchestrates the execution by calling upon other specialized agents as tools.
  - Decentralized Agent Pattern: Each agent works autonomously, with handoffs of control between agents as the workflow progresses.
- Both patterns can be modeled as graphs, with agents represented as nodes and tool calls/handoffs represented as edges.
- The decision to use a single or multi-agent system depends on factors like complexity of decision-making, number of tools, and maintainability of the system.

The guide recommends starting with a single agent system, even if you anticipate the need for a multi-agent system in the future. This allows you to establish a performance baseline and gradually optimize the system. When the complexity of the decision-making or the number of tools becomes unwieldy, it may be time to consider a multi-agent approach.

Implementing Guardrails for Agents

Guardrails are a critical component for any customer-facing agentic system. They help control and manage data privacy risks as well as reputational risks for the company. Here's a quick example of how these guardrails could be implemented:

The user input that goes into the agentic system created using the Agent SDK is passed in parallel to a set of independent guardrails. This parallel mechanism evaluates the input and output of the system to detect any potential risks, such as jailbreaks or information leakage. It's important to keep this guardrail system independent of the agent implementation so that the user is unaware of its existence.

Some key types of relevant classifiers that can be used as guardrails include:

Safety Classifier: Detects unsafe inputs and outputs.
PII Filter: Prevents leakage of personally identifiable information (PII).
Moderation Guardrail: Moderates the content to ensure it's appropriate.
Tool Safeguard: Ensures the safe usage of tools and prevents any potential breakouts.
Injection Prevention: Protects against SQL injections or prompt injections.

In addition to input guardrails, you should also have output guardrails to validate the responses generated by the agentic system. This is important because LLM-based systems can sometimes be jailbroken.

When setting up the guardrails, focus on addressing the risks you've already identified for your use case, and layer in additional ones as you encounter new vulnerabilities. Iterative refinement is key, so observe how your system is behaving and update the guardrails, system prompts, and tool calls as needed.

Here's a simple example of how you can add guardrails to your agents using the Agent SDK:

from agent_sdk.guardrails import input_guardrail, output_guardrail

@input_guardrail(['safety_classifier', 'pii_filter'])
def agent(user_input):
    # Agent implementation
    return response

@output_guardrail(['safety_classifier', 'moderation'])
def agent_output_guardrail(output):
    # Output validation
    return validated_output

In this example, the input_guardrail decorator applies the safety_classifier and pii_filter guardrails to the agent's input, while the output_guardrail decorator applies the safety_classifier and moderation guardrails to the agent's output.

Remember, building robust guardrails is essential for ensuring the safety and reliability of your agentic systems.

Conclusion

In this comprehensive guide, we have explored the key components and best practices for building effective agent-based systems. The insights shared by OpenAI provide a solid foundation for developers looking to leverage the power of language models and autonomous decision-making in their applications.

The core elements of an agent-based system, including the model, tools, and instructions, were discussed in detail. Emphasis was placed on selecting the appropriate model based on performance, cost, and latency requirements, as well as defining a robust set of tools to enhance the agent's capabilities.

Particular attention was given to the importance of clear and well-defined instructions, which serve as the guiding principles for the agent's behavior. The guide recommends leveraging existing documentation, breaking down tasks into subtasks, and capturing edge cases to ensure the agent operates within the desired parameters.

The guide also explored the two primary architectural patterns for agent-based systems: the single-agent and multi-agent approaches. The trade-offs and implementation details for each pattern were outlined, providing developers with a solid understanding of when to choose one over the other.

Finally, the critical role of guardrails in ensuring the safety and reliability of agent-based systems was highlighted. The guide emphasizes the need for independent input and output validation mechanisms to address data privacy, content safety, and other potential risks.

Overall, this practical guide from OpenAI serves as a valuable resource for developers looking to build agent-based systems that can autonomously accomplish tasks on behalf of users. By following the recommended best practices and principles, developers can create robust, scalable, and secure agent-based applications that meet the evolving demands of the market.

التعليمات

What is an agent according to OpenAI?

What are the three main components of an agentic system according to OpenAI?

When should you build an agent according to OpenAI?

What are the two patterns for building multi-agent systems according to OpenAI?

Why is it important to have guardrails in an agentic system according to OpenAI?