I built a tool to make working with multiple AI models easier, and it’s now available on GitHub: https://github.com/ngstcf/llmbase
Dealing with API Fragmentation
Like many developers experimenting with AI, I found myself wanting to try different models for different tasks. GPT-4o is great for general-purpose work, Claude Sonnet handles complex reasoning well, Gemini shines with technical content, DeepSeek offers capable reasoning models at a fraction of the cost, and sometimes you just want to run things locally through Ollama for privacy.
Each provider has its own API, authentication scheme, and set of idiosyncrasies, and navigating them all creates unnecessary friction; I found that the constant context switching became a bottleneck in my development cycle. To solve this, I built a straightforward tool that normalizes these interactions behind a single, consistent interface. It has noticeably reduced the complexity of my code, it helps me keep a clean architecture, and I hope it proves useful to the wider community.
Core Functionality
The tool functions as a normalization layer that standardizes interactions across different AI ecosystems. It currently supports OpenAI, Azure OpenAI, Anthropic, Google Gemini, DeepSeek, xAI/Grok, Perplexity, and local models via Ollama. Rather than managing unique API schemas for each vendor, you can utilize a single, consistent interface to communicate with all of them.
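To make the idea concrete, here is a minimal, self-contained sketch of what a normalization layer like this does under the hood: one common request shape mapped onto two different provider payloads. The function names and dispatch table are illustrative, not the tool's actual API; the payload fields reflect the OpenAI chat-completions and Gemini generateContent formats as I understand them.

```python
# Minimal sketch of the normalization idea: one request shape, two provider payloads.
# Function names and the adapter table are illustrative, not llmbase's actual API.

def to_openai(model: str, prompt: str, max_tokens: int) -> dict:
    """OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def to_gemini(model: str, prompt: str, max_tokens: int) -> dict:
    """Gemini-style generateContent payload (the model is addressed via the URL, not the body)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_tokens},
    }

ADAPTERS = {"openai": to_openai, "gemini": to_gemini}

def build_request(provider: str, model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Single entry point: callers never touch provider-specific schemas."""
    return ADAPTERS[provider](model, prompt, max_tokens)

if __name__ == "__main__":
    print(build_request("openai", "gpt-4o", "Summarize this ticket."))
    print(build_request("gemini", "gemini-1.5-pro", "Summarize this ticket."))
```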
Design Philosophy: Transparency Over Abstraction
While ecosystems like LangChain and LiteLLM offer powerful capabilities, they often obscure the underlying mechanics through heavy abstraction layers. This opacity can complicate debugging when issues arise. I designed this tool to prioritize transparency. It ensures that you have full visibility into the exact payloads sent to each provider, complete with unique request IDs for tracing and dedicated endpoints for verifying configuration status.
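The transparency idea is easy to illustrate. Here is a minimal sketch, not the tool's actual implementation: every call gets a request ID, and the exact outbound payload is logged before anything is sent, so debugging starts from what the provider really received.

```python
# Sketch of the transparency idea: tag each call with a request ID and log the exact
# outbound payload before it is sent. Illustrative only, not llmbase's implementation.
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-client")

def send_traced(provider: str, payload: dict) -> dict:
    request_id = str(uuid.uuid4())
    # The payload that will actually hit the provider, visible before the HTTP call.
    log.info("request_id=%s provider=%s payload=%s",
             request_id, provider, json.dumps(payload))
    # ... the real HTTP call to the provider would go here ...
    return {"request_id": request_id, "provider": provider}

send_traced("openai", {"model": "gpt-4o",
                       "messages": [{"role": "user", "content": "ping"}]})
```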
Key Technical Features
- Deployment Flexibility: The tool is adaptable to your architecture. You can integrate it directly as a Python library to eliminate network overhead or deploy it as a standalone HTTP API for consumption by other services.
- Feature Normalization: Advanced features, such as the “thinking” or “reasoning” modes found in newer models, are normalized within the interface. This allows you to leverage specific model capabilities without writing vendor-specific code.
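As a sketch of what that normalization might look like, the snippet below maps one generic "reasoning" switch onto two vendor-specific knobs. The generic flag is hypothetical; the vendor parameter names reflect my reading of the OpenAI and Anthropic APIs at the time of writing and may change.

```python
# Sketch of normalizing a "reasoning"/"thinking" switch across vendors. The generic flag
# is hypothetical; the vendor parameter names follow my reading of their current docs.

def apply_reasoning(provider: str, payload: dict, effort: str = "medium") -> dict:
    if provider == "openai":
        # o-series models expose a reasoning-effort knob
        payload["reasoning_effort"] = effort
    elif provider == "anthropic":
        # Claude's extended thinking takes a token budget
        payload["thinking"] = {"type": "enabled", "budget_tokens": 2048}
    # providers without a reasoning mode are left untouched
    return payload
```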
Strategic Advantages
Reliability and Resilience – The tool protects application stability through two layers of defense. First, it incorporates essential fault tolerance, using retries and timeouts to handle transient network errors. Second, the unified interface allows for rapid provider switching, so you can migrate to a backup service if your primary vendor experiences a prolonged outage.
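The pattern behind those two layers fits in a few lines. This is a generic sketch of bounded retries with backoff plus provider fallback, not the tool's exact implementation; `send` stands in for whatever actually performs the provider call.

```python
# Generic resilience sketch: bounded retries with exponential backoff per provider,
# then fall through to the next provider. Not the tool's exact implementation.
import time

def call_with_fallback(providers, send, retries: int = 3, timeout: float = 30.0):
    """`send(provider, timeout)` is any callable that raises on failure."""
    last_error = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return send(provider, timeout)
            except Exception as exc:        # transient error: back off, then retry
                last_error = exc
                time.sleep(2 ** attempt)
        # retries exhausted for this provider: move on to the next one
    raise RuntimeError(f"all providers failed: {last_error}")
```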
Operational Optimization – Switching between models requires only a simple configuration update. This flexibility enables several strategic workflows (a small routing sketch follows the list):
- Privacy: Route sensitive data through local Ollama models to ensure it never leaves your infrastructure.
- Cost Management: Direct routine tasks to cost-effective models while reserving high-performance models for complex requirements.
- A/B Testing: Rapidly swap models to empirically determine the best fit for specific use cases.
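Here is a rough sketch of what configuration-driven routing can look like. The key names are hypothetical (the real ones live in the tool's JSON config); the point is that routing decisions become data rather than code.

```python
# Hypothetical routing table; key names are illustrative, not the real config schema.
ROUTES = {
    "sensitive": {"provider": "ollama",    "model": "llama3.1"},       # never leaves your infra
    "routine":   {"provider": "deepseek",  "model": "deepseek-chat"},  # cost-effective default
    "complex":   {"provider": "anthropic", "model": "claude-sonnet"},  # reserved for hard cases
}

def pick_model(task_kind: str) -> dict:
    """Swap models by editing the table above; application code stays untouched."""
    return ROUTES.get(task_kind, ROUTES["routine"])
```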
Who Might Find This Useful
If you’re a developer: It saves you from rewriting code when you want to switch providers. The error handling is built-in, so you don’t have to implement retries yourself.
If you’re running a business: You’re not locked into one vendor. If pricing changes or a provider has issues, switching is straightforward.
If you’re learning: You can experiment with different models without needing to understand each provider’s specific API quirks.
Real-World Examples
User Support Bot: Use GPT-4 for general user questions. Switch to Claude when things get complex. Route sensitive data to a local Ollama model for privacy.
Content Moderation Pipeline: Screen user-generated content by running it through multiple models simultaneously. GPT-4 checks for policy violations, Claude analyzes context and nuance, and your fine-tuned local model applies your specific community guidelines. If the models disagree, flag for human review. One API call, multiple perspectives.
Research Assistant: Build a tool that summarizes academic papers. Use Gemini for initial summarization (great with technical content), then pass complex sections to Claude for deeper analysis, and finally use GPT-4 to make the language accessible. Each model does what it does best, and you orchestrate them all through the same interface.
With this tool, all of that is just configuration changes. Your main application code stays the same.
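For instance, the research-assistant pipeline might look roughly like the sketch below, written against a single generic `complete(provider, model, prompt)` callable. The model names and the callable are illustrative; the point is that each stage can target a different provider without the pipeline itself changing.

```python
# Sketch of the research-assistant pipeline: three stages, three providers, one interface.
# `complete(provider, model, prompt)` is a stand-in for the unified client call.

def summarize_paper(complete, paper_text: str) -> str:
    summary   = complete("gemini",    "gemini-1.5-pro", f"Summarize this paper:\n{paper_text}")
    deep_dive = complete("anthropic", "claude-sonnet",  f"Analyze the difficult sections:\n{summary}")
    return      complete("openai",    "gpt-4o",         f"Rewrite for a general reader:\n{deep_dive}")
```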
Possible Improvements
I’m using this for my own projects and adding features as I need them. Some things I might explore:
- Multi-model orchestration: Combine outputs from multiple models for complex tasks like having different models “vote” on the best answer, or routing specialized requests to the most appropriate model automatically
- Smart caching with semantic matching: Cache responses not just by exact prompt, but by meaning – so similar questions hit the cache instead of your API bill
- Cost optimization tracking: Monitor which models are actually performing best for your specific use cases, with per-task cost analysis
- Async batch processing: Queue up multiple requests and process them in parallel across different providers to maximize throughput
- Provider health monitoring: Track response times, error rates, and uptime across providers – automatically route around degraded services
- Prompt testing framework: A/B test different prompts across models to see what actually works best for your specific application
- New provider support: adding providers as they launch – the architecture makes this straightforward
The goal is to keep it simple and transparent, while making it easier to work with different AI models without getting bogged down in provider-specific details.
Getting Started
If you’re interested in trying it out, there’s more detailed documentation at https://c3.unu.edu/projects/ai/llmbase.
The code is now available on GitHub: https://github.com/ngstcf/llmbase
You can:
- Use it as a Python library in your own code
- Run it as a simple HTTP API server
- Configure it with a JSON file that defines your models and settings
The setup involves adding your API keys to a .env file and creating a config file with your models. From there, you can call it directly in Python or make HTTP requests to the API server.
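As a rough picture of what a first session might look like (the actual module names, config keys, and endpoint paths are documented in the repo README and may differ from this sketch):

```python
# Hypothetical quick-start shape; check the README for the real names and paths.
# API keys live in .env (e.g. OPENAI_API_KEY=...) and are loaded by the tool.
import requests

# Library route (names illustrative):
#   from llmbase import Client
#   client = Client(config="config.json")
#   print(client.chat(model="gpt-4o", prompt="Hello"))

# HTTP route: the same config, served as a local API (endpoint path illustrative).
resp = requests.post("http://localhost:8000/chat",
                     json={"model": "gpt-4o", "prompt": "Hello"},
                     timeout=30)
print(resp.json())
```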
LLM Services is a simple tool for working with multiple AI providers through a unified interface. Check it out on GitHub: https://github.com/ngstcf/llmbase