The Single-Model Trap
When we started building Iris, the obvious approach was to pick the best available language model and build everything around it. Many AI products take this path. But we quickly realized that no single model excels at everything. Some models are exceptional at reasoning but struggle with vision tasks. Others generate beautiful prose but falter on structured data extraction. Building a world-class AI agent meant embracing this reality rather than fighting it.
The Multi-Model Approach
Iris uses a multi-model architecture powered by LiteLLM, an open-source routing layer that lets us send requests to different model providers through a unified interface. This means we can swap, upgrade, or add models without rewriting a single line of application logic. It is the foundation that makes everything else possible and protects us from vendor lock-in.
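To make the unified interface concrete, here is a minimal sketch (not Iris's actual code) of a provider-agnostic helper built on LiteLLM's `completion()` call. The `ask` function and the injectable `complete` parameter are illustrative conveniences; the key point is that switching providers is just a different model string.

```python
# Illustrative sketch: one helper, any provider. LiteLLM exposes a single
# completion() signature and normalizes every provider's response to the
# OpenAI shape, so swapping models never touches application logic.
from typing import Callable, Optional

try:
    import litellm
    _default_complete = litellm.completion
except ImportError:
    _default_complete = None  # litellm not installed; callers inject a backend

def ask(model: str, prompt: str, complete: Optional[Callable] = None) -> str:
    """Send a prompt to `model` through a unified interface."""
    complete = complete or _default_complete
    resp = complete(model=model, messages=[{"role": "user", "content": prompt}])
    # Same access path regardless of which provider answered.
    return resp.choices[0].message.content
```

With this shape, upgrading from one provider's model to another is a one-line change to the `model` argument, which is the property the rest of the architecture relies on.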
Model Specialization in Practice
Here is how we think about model selection within Iris:
- Claude for Reasoning — When tasks require deep analytical thinking, multi-step planning, or careful instruction following, we route to Anthropic's Claude. Its strength in structured reasoning makes it ideal for orchestrating agent workflows and producing nuanced analysis.
- GPT-4 for Vision — Image understanding tasks, from analyzing charts and reading documents to describing photographs, go to OpenAI's GPT-4V. Its vision capabilities remain best-in-class for production workloads that require reliable visual interpretation.
- Specialized Models — For tasks like fast classification, embedding generation, and lightweight summarization, we route to smaller, purpose-tuned models that deliver results faster and at lower cost.
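The selection logic above can be sketched as a simple routing table. The task categories and model identifiers below are placeholders for illustration, not Iris's production configuration:

```python
# Illustrative task-to-model routing table. Names are placeholders;
# a real deployment would load this from configuration.
ROUTING_TABLE = {
    "reasoning": "anthropic/claude-3-5-sonnet",   # planning, analysis
    "vision": "gpt-4-vision-preview",             # charts, documents, photos
    "classification": "gpt-4o-mini",              # fast, low-cost tier
    "embedding": "text-embedding-3-small",        # vector generation
}

def pick_model(task_type: str) -> str:
    """Resolve a task category to a model identifier, with a safe default."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["reasoning"])
```

Keeping this mapping in one place is what makes specialization cheap: adding a new category or swapping the model behind an existing one is a single-entry change.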
Intelligent Routing and Fallbacks
The routing layer does more than just pick a model. It handles automatic fallback, so if one provider is experiencing downtime, Iris retries with an alternative without any disruption to your workflow. It manages cost optimization, using lighter models for simple tasks and reserving expensive frontier models for complex ones. And it tracks token usage and latency across every provider so we can continuously tune routing decisions based on real performance data.
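The fallback behavior reduces to a small core loop, sketched below under the assumption of an injected `complete` backend (production routers such as LiteLLM's add retry budgets, provider cooldowns, and latency tracking on top of this):

```python
# Minimal fallback sketch: try each candidate model in order and return
# the first success. This is the core of automatic failover.
from typing import Callable, Sequence, Tuple

def complete_with_fallback(
    candidates: Sequence[str],
    prompt: str,
    complete: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in `candidates`; return (model_used, response)."""
    last_error = None
    for model in candidates:
        try:
            return model, complete(model, prompt)
        except Exception as exc:  # provider downtime, rate limit, timeout
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError(f"all providers failed: {last_error!r}")
```

Ordering the candidate list cheapest-first for simple tasks and strongest-first for complex ones is one straightforward way to get the cost-optimization behavior described above out of the same loop.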
Consistent Tool Calling Across Models
One of the more challenging engineering problems was ensuring that tool calling works consistently regardless of which model is active. Claude and GPT-4 use different tool-calling formats, and not all models support structured tool use natively. Our orchestration layer in the run engine normalizes tool calls and results so that the agent loop operates seamlessly no matter which model handles a given step.
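A sketch of what that normalization looks like: the two input shapes below follow the publicly documented Anthropic format (`tool_use` blocks with a dict of inputs) and OpenAI format (`function` calls with JSON-string arguments), while the `ToolCall` dataclass is an illustrative common representation, not Iris's actual schema.

```python
# Normalize provider-specific tool-call payloads into one internal shape,
# so the agent loop never needs to know which model produced them.
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict  # always a parsed dict, regardless of provider

def normalize_tool_call(raw: dict) -> ToolCall:
    if raw.get("type") == "tool_use":
        # Anthropic: arguments arrive pre-parsed as a dict under "input".
        return ToolCall(id=raw["id"], name=raw["name"], arguments=raw["input"])
    if raw.get("type") == "function":
        # OpenAI: arguments arrive as a JSON string that must be decoded.
        fn = raw["function"]
        return ToolCall(id=raw["id"], name=fn["name"],
                        arguments=json.loads(fn["arguments"]))
    raise ValueError(f"unrecognized tool-call shape: {raw.get('type')!r}")
```

The symmetric half of the problem, encoding tool results back into each provider's expected message format, follows the same pattern in reverse.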
Future-Proofing the Platform
The AI landscape evolves at a staggering pace. New models launch every month, each with different strengths and trade-offs. By building Iris as a multi-model platform from day one, we can integrate new models the day they become available without disrupting the user experience. When a better reasoning model launches, we add it to the routing layer. When a faster vision model appears, we integrate it immediately.
This architecture is not just a technical choice. It is a strategic commitment to always giving our users access to the best AI capabilities available, regardless of which company builds them. The winners in AI tooling will be platforms that are model-agnostic and relentlessly focused on outcomes rather than locked to any single provider.
