I built a tool to make working with multiple AI models easier, and it’s now available on GitHub: https://github.com/ngstcf/llmbase
Dealing with API Fragmentation
Like many developers experimenting with AI, I found myself wanting to try different models for different tasks. GPT-4o is great for general-purpose work, Claude Sonnet handles complex reasoning well, Gemini shines with technical content, DeepSeek offers capable reasoning models at a fraction of the cost, and sometimes you just want to run things locally through Ollama for privacy.
But each provider has its own API, authentication method, and quirks – it was getting tedious to switch between them.
So, I put together a simple tool to make this easier. It’s nothing fancy – just a straightforward way to interact with multiple AI providers through a consistent interface. I’ve been finding it useful, so I thought I’d share it in case others run into the same problem.
What It Does
Basically, it acts as a translator that speaks the language of different AI providers. Instead of figuring out how each provider wants you to format your requests, you use one consistent way of talking to all of them.
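To make that concrete, here is a toy sketch of the idea: one call shape no matter which backend answers. The `ask` function and the stubbed provider handling below are made up for illustration; they are not the tool's actual API.

```python
# Illustrative only: `ask` and the stubbed provider handling are hypothetical,
# not the real library's interface.
def ask(provider: str, model: str, prompt: str) -> str:
    """One call shape, regardless of which backend answers."""
    # In a real client, each branch would format the request the way that
    # provider expects (OpenAI-style messages, Anthropic's format, and so on).
    stub_replies = {
        "openai": f"[openai/{model}] reply to: {prompt}",
        "anthropic": f"[anthropic/{model}] reply to: {prompt}",
        "ollama": f"[ollama/{model}] reply to: {prompt}",
    }
    return stub_replies[provider]

# The calling code stays identical whichever provider you pick.
print(ask("openai", "gpt-4o", "Summarize this changelog."))
print(ask("ollama", "llama3", "Summarize this changelog."))
```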
There are similar tools out there, like LangChain and LiteLLM. They’re powerful, but I found they hide a lot of what’s happening behind layers of abstraction. When something goes wrong, it can be hard to debug. I wanted something more transparent – where I could see exactly what was being sent to each provider and how errors were being handled.
A few things I’ve found helpful:
- Everything is visible: No hidden prompts or magic formatting. You can see exactly what’s being sent to each API and how errors are handled.
- Resilience without complexity: There’s basic retry logic and circuit breakers, but they’re straightforward. An OpenAI outage doesn’t crash everything – it just moves on to the next request.
- Same interface for different features: Some models have “thinking” or “reasoning” modes. The tool normalizes these so you don’t need different code for each provider’s specific implementation (there’s a small sketch of the idea after this list).
- Use it how you want: You can call it as a Python library directly (no network overhead), or run it as a simple HTTP API if you need to call it from other services.
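As a sketch of what that normalization means in practice, here is roughly how a single `reasoning` flag could be mapped onto each provider's own option. The field names and provider details below are assumptions for illustration, not the tool's documented request format.

```python
# Hypothetical sketch: map one `reasoning` flag onto provider-specific options.
# The field names below are assumptions, not the tool's real request format.
def build_request(provider: str, model: str, prompt: str, reasoning: bool = False) -> dict:
    request = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning:
        if provider == "anthropic":
            # Anthropic-style extended thinking block (roughly; check their docs).
            request["thinking"] = {"type": "enabled", "budget_tokens": 2048}
        elif provider == "openai":
            # OpenAI's reasoning models take an effort-style setting instead.
            request["reasoning_effort"] = "medium"
        # Providers with no reasoning mode just ignore the flag.
    return request

print(build_request("anthropic", "claude-sonnet", "Plan a database migration.", reasoning=True))
```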
What It Supports (So Far)
Right now, it works with OpenAI, Anthropic, Google Gemini, DeepSeek, and local models via Ollama. The idea is you don’t need to remember different API formats for each one.
Switching models is just a matter of updating a config file (there’s a sketch of what that might look like after the list below). I’ve found this handy for:
- Privacy: Running sensitive data through local Ollama models instead of sending everything to cloud providers
- Cost: Using cheaper models for routine tasks and saving the expensive ones for when they’re really needed
- Testing: Quickly trying different models to see what works best for a particular use case
- Reliability: If one provider is having issues, it’s easy to switch to another
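Here is the sketch mentioned above. The schema is a guess for illustration and the real format is described in the repo’s documentation, but the point is that swapping models is a data change, not a code change.

```python
import json

# Hypothetical config shape, just to show the switching idea.
config = {
    "default_model": "gpt-4o-mini",  # cheap model for routine tasks
    "models": {
        "gpt-4o-mini": {"provider": "openai", "timeout": 30},
        "claude-sonnet": {"provider": "anthropic", "timeout": 60},
        "local-llama": {"provider": "ollama", "host": "http://localhost:11434"},
    },
}

# Routing everything through the local model for privacy is a one-line change:
config["default_model"] = "local-llama"

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```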
There’s also basic error handling – retries and timeouts – so if a provider is temporarily down, it doesn’t crash your whole application.
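For what it’s worth, that kind of retry-and-fallback behaviour usually boils down to a pattern like the one below. This is a generic sketch of the pattern, not the library’s actual implementation.

```python
import time

def call_with_fallback(providers, prompt, retries=2, backoff=1.0):
    """Try each provider in order, retrying transient failures before moving on.

    `providers` is a list of callables that take a prompt and return a reply;
    this is a generic pattern sketch, not the library's internal code.
    """
    for call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except (TimeoutError, ConnectionError):
                time.sleep(backoff * (attempt + 1))  # simple linear backoff
        # This provider kept failing; fall through to the next one.
    raise RuntimeError("All providers failed")
```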
Who Might Find This Useful
- If you’re a developer: It saves you from rewriting code when you want to switch providers. The error handling is built-in, so you don’t have to implement retries yourself.
- If you’re running a business: You’re not locked into one vendor. If pricing changes or a provider has issues, switching is straightforward.
- If you’re learning: You can experiment with different models without needing to understand each provider’s specific API quirks.
Real-World Examples
User Support Bot: Use GPT-4 for general user questions. Switch to Claude when things get complex. Route sensitive data to a local Ollama model for privacy.
Content Moderation Pipeline: Screen user-generated content by running it through multiple models simultaneously. GPT-4 checks for policy violations, Claude analyzes context and nuance, and your fine-tuned local model applies your specific community guidelines. If the models disagree, flag for human review. One API call, multiple perspectives.
Research Assistant: Building a tool that summarizes academic papers. Use Gemini for initial summarization (great with technical content), then pass complex sections to Claude for deeper analysis, and finally use GPT-4 to make the language accessible. Each model does what it does best, and you orchestrate them all through the same interface.
With this tool, all of that is just configuration changes. Your main application code stays the same.
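To show how small that routing layer can be, here is a hypothetical version of the support-bot example. The model names, the routing table, and the `send` helper are placeholders standing in for the unified call; they are not part of the tool’s real interface.

```python
# Placeholder for the unified call shown earlier; a real version would go
# through the library or its HTTP API.
def send(provider: str, model: str, prompt: str) -> str:
    return f"[{provider}/{model}] {prompt}"

# Hypothetical routing table for the support-bot example.
ROUTES = {
    "general":   ("openai",    "gpt-4o"),
    "complex":   ("anthropic", "claude-sonnet"),
    "sensitive": ("ollama",    "llama3"),  # stays on local hardware
}

def route(task: str, prompt: str) -> str:
    provider, model = ROUTES.get(task, ROUTES["general"])
    return send(provider, model, prompt)

print(route("sensitive", "Summarize this customer record."))
```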
Possible Improvements
I’m using this for my own projects and adding features as I need them. Some things I might explore:
- Multi-model orchestration: Combine outputs from multiple models for complex tasks – like having different models “vote” on the best answer, or routing specialized requests to the most appropriate model automatically
- Smart caching with semantic matching: Cache responses not just by exact prompt, but by meaning – so similar questions hit the cache instead of your API bill
- Cost optimization tracking: Monitor which models are actually performing best for your specific use cases, with per-task cost analysis
- Async batch processing: Queue up multiple requests and process them in parallel across different providers to maximize throughput
- Provider health monitoring: Track response times, error rates, and uptime across providers – automatically route around degraded services
- Prompt testing framework: A/B test different prompts across models to see what actually works best for your specific application
- Adding support for new providers as they launch – the architecture makes this straightforward
The goal is to keep it simple and transparent, while making it easier to work with different AI models without getting bogged down in provider-specific details.
Getting Started
If you’re interested in trying it out, there’s more detailed documentation at https://c3.unu.edu/projects/ai/llmbase.
The code is now available on GitHub: https://github.com/ngstcf/llmbase
You can:
- Use it as a Python library in your own code
- Run it as a simple HTTP API server
- Configure it with a JSON file that defines your models and settings
The setup involves adding your API keys to a .env file and creating a config file with your models. From there, you can call it directly in Python or make HTTP requests to the API server.
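Concretely, a first run looks something like this. The environment variable names, import path, and endpoint below are illustrative guesses; the repo’s documentation has the real ones.

```python
# .env (loaded before the tool starts); exact variable names are in the docs:
#   OPENAI_API_KEY=...
#   ANTHROPIC_API_KEY=...

import requests  # pip install requests

# Option 1: call it in-process as a library (hypothetical import and function):
# from llmbase import chat
# reply = chat(model="gpt-4o-mini", prompt="Hello")

# Option 2: talk to the HTTP API server (hypothetical endpoint and payload):
resp = requests.post(
    "http://localhost:8000/chat",
    json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]},
    timeout=30,
)
print(resp.json())
```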
LLM Services is a simple tool for working with multiple AI providers through a unified interface. Check it out on GitHub: https://github.com/ngstcf/llmbase