The Exact AI Stack We Install for Every Client. No Fluff. No Extras. Just What Works.
The complete AI operations stack we use at Zune Lab. Every tool, why we chose it, what we dropped, and how they connect into one system.
Every week someone sends me a "Top 50 AI Tools You Need in 2026" article and asks what I think. I think it's noise. I think the people writing those articles have never installed a single one of those tools inside a real business. I think they're optimizing for clicks, not for operations.
Here's what I actually think matters: what are you installing, how does it connect, and does it survive contact with real employees on a Monday morning?
That's the bar. Not "this tool has cool features." Not "this raised $200M in Series C." Does it work when your ops manager needs an answer at 7am and nobody from your team is awake?
We have installed this stack in IT consulting firms, MSPs, agencies, and service businesses doing anywhere from $500K to $30M in revenue. The tools have been tested. The ones that didn't make it got cut. What's left is what actually works.
Let me walk you through all of it.
The Philosophy Before the Tools
Before I name a single product, you need to understand the principle that drives every decision we make.
We build systems. Not tool collections.
Most companies buy AI tools the way they buy SaaS: one at a time, for one problem, with no thought about how it talks to anything else. Six months later they have nine subscriptions, three of which overlap, two of which nobody uses, and zero of which actually connect to each other.
Our stack is different because every piece was chosen for one reason: it fits into the system. If a tool is the best in its category but doesn't integrate well with the rest of the stack, we drop it. No exceptions.
The system has six layers:
- Workflow Automation: the connective tissue
- Vector Storage: the memory
- LLM Layer: the brain
- Communication Routing: the nervous system
- Knowledge Base Input: the food
- Custom Agents: the specialists
Every layer depends on the others. Pull one out and the whole thing degrades. That's by design. Let me show you what sits in each layer and why.
Layer 1: Workflow Automation (Make, with n8n as Backup)
What it does in the stack
Make is the central nervous system. Every automation, every trigger, every "when this happens do that" runs through Make. Client sends an email? Make routes it. New employee starts? Make triggers the onboarding sequence. Ticket comes in after hours? Make decides whether it needs a human or whether the AI agent can handle it.
If the stack were a body, Make would be the spinal cord. Everything passes through it.
Why Make over the alternatives
We tested Zapier for about four months. It works for simple things. One trigger, one action, done. But the second you need conditional logic, branching paths, error handling, or any kind of data transformation mid-flow, Zapier falls apart. The interface fights you. The pricing punishes complexity. And the execution speed on anything beyond basic triggers is noticeably slower.
Make gives us a visual workflow builder that can handle 30-, 40-, 50-step automations without becoming unreadable. The pricing model is based on operations, not on "zaps," which means a complex workflow doesn't suddenly cost you 5x more.
What we tried and dropped: Zapier (too rigid, too expensive at scale), Power Automate (great if you're a Microsoft shop, terrible if you're not), Tray.io (overkill for most of our clients, pricing is enterprise only).
Where n8n fits
For clients who need self-hosted automation, usually because of data residency requirements or because they handle sensitive government contracts, we deploy n8n on their own infrastructure. It's open source, it's fast, and the workflow builder is nearly as good as Make's. The tradeoff is that someone needs to maintain the server. For most clients under $5M revenue, Make is the answer. Above that, or in regulated industries, n8n.
How it connects
Make talks to everything. It pushes data to Pinecone when new knowledge is added. It calls OpenAI and Anthropic APIs when an agent needs to generate a response. It listens to Slack and email for incoming messages. It updates Notion and Airtable when records change. It is the glue.
Layer 2: Vector Storage (Pinecone, with Weaviate for Specific Cases)
What it does in the stack
This is where your company's knowledge actually lives in a format that AI can use. When we say "the AI knows your SOPs" or "the AI can answer questions about your service agreements," that knowledge is stored as vector embeddings inside Pinecone.
Think of it this way: your documents, your processes, your FAQs, your pricing guides: all of that gets chunked, embedded, and stored in Pinecone. When someone asks a question, the system searches Pinecone for the most relevant chunks, pulls them out, and feeds them to the LLM as context. That's how the AI "knows" your business. It's not magic. It's architecture.
Why Pinecone over the alternatives
Speed and reliability. That's it. We tested ChromaDB, FAISS, Weaviate, Qdrant, and Milvus. Here's what happened:
- ChromaDB: great for prototyping, not great for production. The moment you go beyond 100K vectors it starts showing cracks. We used it in early builds and ripped it out twice.
- FAISS: Meta's library. Incredibly fast for local similarity search. But it's a library, not a service. You need to build the entire infrastructure around it yourself. For a consulting firm doing $2M in revenue, that's not practical.
- Qdrant: solid product, good performance. But the managed offering was less mature than Pinecone's when we standardized, and we didn't want to self-host vector databases for every client.
- Milvus: powerful but heavy. Built for massive scale. Our clients don't need to search through 500 million vectors. They need to search through 50,000 vectors really well.
Pinecone's managed service just works. Serverless pricing means clients aren't paying for idle compute. Query latency is consistently under 50ms. And the metadata filtering is clean enough that we can build namespace separation for multi-tenant setups without any hacks.
Where Weaviate fits
For clients who want hybrid search, meaning they want to combine traditional keyword search with vector similarity, Weaviate is better out of the box. Pinecone added sparse vector support, but Weaviate's BM25-plus-vector hybrid approach is more mature. We also use Weaviate when a client has complex object schemas where the data isn't just text chunks but structured entities with relationships. Weaviate handles that natively. Pinecone doesn't.
How it connects
When new documents land in Notion or Airtable, Make triggers a processing pipeline. The documents get chunked (we use recursive character splitting with overlap, typically 512 tokens with a 50-token overlap). The chunks get embedded via OpenAI's embedding model. The embeddings get upserted into Pinecone with metadata tags: source, date, department, document type. When an agent needs to answer a question, it queries Pinecone, retrieves the top 5 to 10 most relevant chunks, and passes them to the LLM as context.
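Here's that pipeline stripped to its core, as a sketch. It assumes the official openai and pinecone Python clients; the index name, namespace, and character-based chunking are placeholders for the real configuration, not what we ship:

```python
# Ingestion sketch: chunk a document, embed the chunks, upsert to Pinecone.
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("ops-knowledge")

def chunk(text: str, size: int = 2048, overlap: int = 200) -> list[str]:
    # Character-based stand-in for recursive token splitting:
    # 512 tokens is roughly 2,048 characters of English text.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def ingest(doc_id: str, text: str, metadata: dict) -> None:
    chunks = chunk(text)
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    index.upsert(
        vectors=[
            {
                "id": f"{doc_id}#{i}",  # doc-prefixed ids make later cleanup easy
                "values": item.embedding,
                "metadata": {**metadata, "text": chunks[i]},
            }
            for i, item in enumerate(resp.data)
        ],
        namespace="client-acme",  # placeholder: one namespace per tenant
    )
```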
Layer 3: LLM Layer (OpenAI and Anthropic)
What it does in the stack
This is the brain. The LLM takes context from Pinecone, instructions from the system prompt, and the user's question, then generates a response. Every client-facing AI interaction, whether it's answering a Slack question, drafting an email reply, summarizing a ticket, or walking a new employee through setup, runs through this layer.
Why both OpenAI and Anthropic
Because they're good at different things, and betting your entire operation on one provider is reckless.
OpenAI (GPT-4o, GPT-4.1): This is our default for most client-facing interactions. It's fast, it follows instructions well, and the function calling is the most reliable in the industry. When we need an agent to take actions (create a ticket, update a record, send a message), GPT-4o's structured output and function calling make it the obvious choice.
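To make that concrete, here's a minimal sketch of a tool-calling request with the OpenAI Python client. The create_ticket tool and its schema are hypothetical examples, not a tool we ship:

```python
# Function-calling sketch: give GPT-4o a tool schema, get a structured call back.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",  # hypothetical action
        "description": "Open a support ticket in the PSA system.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "urgency": {"type": "string", "enum": ["low", "normal", "high"]},
            },
            "required": ["title", "urgency"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Client says their VPN is down before a demo."}],
    tools=tools,
)

# If the model decided to act, the structured call comes back ready to execute.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```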
Anthropic (Claude Opus, Claude Sonnet): This is what we use for anything requiring long context, nuanced reasoning, or document analysis. When we need to process a 40 page service agreement and extract specific clauses, Claude handles it better. When the task requires careful reasoning about ambiguous situations, such as deciding whether a support ticket needs escalation, Claude's responses are more measured and accurate.
We also use Anthropic as a fallback. If OpenAI's API has latency issues or goes down (which happens more often than people admit), the system automatically routes to Anthropic. Zero downtime for the client.
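Stripped down, the failover logic looks something like this sketch. Model names are illustrative, and production code adds retries with backoff and error-type discrimination rather than one bare except:

```python
# Provider fallback sketch: try OpenAI first, route to Anthropic on failure.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()
anthropic_client = Anthropic()

def generate(system: str, user: str) -> str:
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
            timeout=15,  # fail fast so the fallback actually helps
        )
        return resp.choices[0].message.content
    except Exception:
        resp = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model id
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return resp.content[0].text
```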
What we tried and dropped
Google Gemini: tested it extensively. The multimodal capabilities are interesting but the API reliability wasn't where it needed to be for production systems when we evaluated it. The responses were also less consistent. We'd get great output 80% of the time and bizarre output 20% of the time. That's not good enough when a client's employee is relying on it.
Open source models (Llama, Mistral): we've deployed these for specific use cases where data can never leave the client's infrastructure. But for general operations AI, the performance gap is still real. You need serious GPU infrastructure to run them at acceptable speeds, and most of our clients don't want to manage that. We revisit this every quarter.
How it connects
Make handles the routing logic. Based on the task type, it sends the request to either OpenAI or Anthropic. The system prompt is stored in the workflow configuration. The context comes from Pinecone. The response goes back through Make and gets routed to wherever it needs to go: Slack, email, Notion, or a custom dashboard.
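In code terms, the flow Make orchestrates per request looks roughly like this sketch (index name, namespace, and metadata fields are placeholders):

```python
# Retrieve-then-generate sketch: embed the question, pull top chunks from
# Pinecone, and hand them to the LLM as context.
import os
from openai import OpenAI
from pinecone import Pinecone

llm = OpenAI()
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("ops-knowledge")

def answer(question: str, client_name: str) -> str:
    q_vec = llm.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    hits = index.query(
        vector=q_vec,
        top_k=8,
        namespace="client-acme",         # placeholder namespace
        filter={"client": client_name},  # metadata filter, fields illustrative
        include_metadata=True,
    )
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)
    resp = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```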
Layer 4: Communication Routing (Slack and Email Integrations)
What it does in the stack
This is how the AI system talks to humans and how humans talk to it. Every inbound message, whether it comes through Slack, email, or a web form, gets captured, classified, and routed. The AI either handles it directly or escalates it to the right person with full context attached.
The Slack setup
We deploy a custom Slack bot (usually named after the client's brand) that lives in designated channels. Employees can ask it questions in natural language. "What's our process for onboarding a new client in the healthcare vertical?" "What's the SLA for priority one tickets?" "Draft a response to this email from the prospect."
The bot receives the message, queries Pinecone for relevant context, sends the prompt to the LLM, and posts the response in thread. If the bot isn't confident in its answer (we measure this through a confidence scoring system built into the prompt), it flags the message for a human and tags the appropriate person.
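One way to build that scoring, sketched with prompt-based self-assessment. The JSON contract and the 0.7 threshold are our illustrative choices, not a hard rule:

```python
# Confidence-gating sketch: the model returns its answer plus a self-assessed
# confidence; low scores get flagged for a human instead of posted as fact.
import json
from openai import OpenAI

client = OpenAI()
SYSTEM = (
    "Answer from the provided context only. Respond as JSON: "
    '{"answer": "...", "confidence": 0.0-1.0}. '
    "Use low confidence when the context does not clearly cover the question."
)

def answer_or_escalate(question: str, context: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force valid JSON back
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    result = json.loads(resp.choices[0].message.content)
    if result["confidence"] < 0.7:
        result["escalate"] = True  # Make tags the right human in the thread
    return result
```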
Why Slack specifically: because that's where our clients already live. We don't force new interfaces. The fastest way to get adoption is to put the AI where people already work. If a client's team uses Microsoft Teams instead, we build the same system there. But 80% of our clients are on Slack.
The email setup
Inbound emails to specific addresses (like support@ or info@) get captured by Make, parsed for intent and urgency, and routed accordingly. Simple questions get auto-drafted responses that a human reviews before sending. Complex issues get classified, summarized, and assigned to the right team member with a suggested response attached.
This alone saves most of our clients 15 to 25 hours per week. That's not a guess. We measure it during the first 30 days post install and show the client the data.
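The parsing step itself is not exotic. Here's a sketch of the intent and urgency classification; the category buckets are illustrative:

```python
# Email triage sketch: classify intent and urgency into fixed buckets so the
# workflow can branch on them.
import json
from openai import OpenAI

client = OpenAI()

def classify_email(subject: str, body: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Classify this inbound email. Respond as JSON with "
                '"intent" (support, sales, billing, other) and '
                '"urgency" (low, normal, high).'
            )},
            {"role": "user", "content": f"Subject: {subject}\n\n{body}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

A Make HTTP module can call a small service exposing this and branch the rest of the workflow on the returned fields.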
What we tried and dropped
Intercom / Zendesk AI features: built-in AI in these platforms has gotten better, but it's limited to their own ecosystem. The moment you need the AI to pull context from your internal knowledge base or trigger an action in another tool, you're stuck. We'd rather build the routing layer ourselves and keep full control.
Custom chat widgets: we built a few. The adoption rate was terrible. People don't want another tab to check. They want answers where they already are. We stopped building standalone chat interfaces for internal use entirely.
How it connects
Slack and email are input/output channels. Make listens for triggers from both. When a message comes in, Make orchestrates the entire flow: classify the message, pull context from Pinecone, call the LLM, format the response, deliver it back through the same channel. The user never knows there are six systems working behind a single reply.
Layer 5: Knowledge Base Input (Notion and Airtable)
What it does in the stack
This is where the raw knowledge lives before it becomes AI-usable. Your SOPs, your process docs, your pricing sheets, your client information, and your templates all sit in Notion or Airtable (sometimes both), and the system pulls from them automatically.
Why Notion
Notion is where unstructured knowledge lives. Long form documents, process guides, meeting notes, project briefs. The API is mature, the block structure makes it easy to parse programmatically, and most of our clients already use it (or can be migrated in a day).
When a team member updates an SOP in Notion, a webhook fires. Make picks it up, re-chunks the document, generates new embeddings, and updates Pinecone. The AI's knowledge is current within minutes of a document change. No manual retraining. No batch processing overnight. Minutes.
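The refresh step can be sketched like this, assuming vector ids were written with a doc-id prefix at ingestion time (the id-prefix listing is a pattern Pinecone supports on serverless indexes):

```python
# Refresh sketch: when a Notion page changes, drop its stale chunks before
# re-ingesting. Ids are assumed to look like "{doc_id}#{n}".
import os
from pinecone import Pinecone

index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("ops-knowledge")

def delete_stale_chunks(doc_id: str, namespace: str = "client-acme") -> None:
    # index.list() pages through ids matching the prefix.
    for id_batch in index.list(prefix=f"{doc_id}#", namespace=namespace):
        index.delete(ids=list(id_batch), namespace=namespace)

# After deletion, the same chunk-embed-upsert pipeline from the ingestion
# sketch runs against the updated page body, so the new version is live
# within one workflow execution.
```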
Why Airtable
Airtable is where structured knowledge lives. Client lists, service catalogs, pricing tiers, vendor information, employee directories. Anything that fits in rows and columns. The reason we use Airtable over a traditional database is that non-technical team members can maintain it. Your ops manager doesn't need to write SQL to update the pricing table. They edit a cell in Airtable and the AI knows about it.
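Reading that structured data back out is trivial. A sketch using the pyairtable client, with placeholder base, table, and field names:

```python
# Airtable read sketch: flatten rows into plain text the LLM can take as context.
import os
from pyairtable import Api

api = Api(os.environ["AIRTABLE_API_KEY"])
pricing = api.table("appXXXXXXXXXXXXXX", "Pricing Tiers")  # placeholder ids

def pricing_as_context() -> str:
    rows = pricing.all()
    return "\n".join(
        f"{r['fields'].get('Tier', '?')}: {r['fields'].get('Monthly Price', '?')}"
        for r in rows
    )
```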
What we tried and dropped
Google Docs / Drive: the API is painful. The permissions model is a nightmare for automated access. Parsing Google Docs programmatically is surprisingly hard because of how they structure the underlying JSON. If a client is deeply embedded in Google Workspace we'll integrate it, but we never recommend it as the primary knowledge source.
Confluence: too heavy, too slow, too Atlassian. The API works but the user experience of maintaining knowledge in Confluence is so bad that people stop updating it. Dead knowledge bases kill AI systems. We'd rather use a tool people actually enjoy using.
SharePoint: same story as Confluence but worse. The API is technically capable but practically miserable. Every permission issue becomes a two hour troubleshooting session.
How it connects
Notion and Airtable are the inputs. Changes in either trigger Make workflows that process, embed, and store the updated knowledge in Pinecone. The LLM never talks to Notion or Airtable directly. It only talks to Pinecone. This means the knowledge source can be swapped without touching the AI layer. If a client wants to move from Notion to something else in six months, we re-point the ingestion pipeline and everything else stays the same.
Layer 6: Custom Agents, Purpose-Built for Onboarding, Triage, and Reporting
What they do in the stack
This is where the generic becomes specific. The layers above give us a general purpose AI operations system. Custom agents turn that system into something that does a particular job extremely well.
The Onboarding Agent
New employee or new client? The onboarding agent handles the first 48 hours. It sends welcome messages. It walks people through required steps. It answers questions about tools, access, and process. It checks whether setup tasks have been completed and sends reminders if they haven't.
For one MSP client, this agent reduced their employee onboarding time from two weeks to three days. The new hire got instant answers to every "where do I find X" and "how do I do Y" question instead of waiting for a busy team lead to respond.
The Triage Agent
Incoming requests, support tickets, or client emails hit the triage agent first. It reads the message, classifies the urgency, identifies the category, checks whether an existing knowledge base article answers the question, and either resolves it or routes it with full context to the right human.
The classification accuracy after the first month is typically above 90%. By month three, it's above 95%. The agent learns from corrections. Every time a human reclassifies something the agent got wrong, that feedback gets incorporated.
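One simple way to wire that loop, sketched as corrections replayed as few-shot examples. The storage shape and the 20-example cap are our assumptions:

```python
# Feedback-loop sketch: human reclassifications are stored and replayed as
# few-shot examples in the triage prompt, so repeated mistakes get corrected.
from openai import OpenAI

client = OpenAI()
corrections: list[dict] = []  # in production: a table Make appends to

def record_correction(message: str, right_label: str) -> None:
    corrections.append({"message": message, "label": right_label})

def classify(message: str) -> str:
    shots = []
    for ex in corrections[-20:]:  # most recent corrections win
        shots.append({"role": "user", "content": ex["message"]})
        shots.append({"role": "assistant", "content": ex["label"]})
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": (
            "Classify the ticket as one of: connectivity, access, billing, "
            "hardware, other. Reply with the label only."
        )}] + shots + [{"role": "user", "content": message}],
    )
    return resp.choices[0].message.content.strip()
```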
The Reporting Agent
End of week, end of month, end of quarter: this agent compiles data from across the stack and generates summary reports. How many tickets were handled automatically? What was the average response time? Which knowledge base articles are being accessed most? Where are the gaps?
This isn't a dashboard. It's a written report, delivered to Slack or email, in natural language, with specific recommendations. "Ticket volume for network issues increased 34% this month. Three of the most common questions don't have knowledge base articles yet. Here are the questions. Here are draft articles."
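Under the hood it can be as simple as this sketch: aggregate the numbers, then ask the model for the narrative. Metric names are illustrative:

```python
# Reporting sketch: compile the week's metrics, then generate a written report.
from openai import OpenAI

client = OpenAI()

def weekly_report(metrics: dict) -> str:
    summary = "\n".join(f"{k}: {v}" for k, v in metrics.items())
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Write a short weekly operations report in plain language. "
                "Flag trends and end with concrete recommendations."
            )},
            {"role": "user", "content": summary},
        ],
    )
    return resp.choices[0].message.content

# Example input a Make workflow might assemble before the call:
# weekly_report({"tickets_handled_automatically": 412,
#                "avg_first_response_seconds": 84,
#                "kb_articles_missing": 3})
```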
How they're built
Every custom agent is a Make workflow with a specific system prompt, specific Pinecone namespaces it can access, specific actions it can take, and specific escalation rules. They're not separate software. They're configurations within the same stack. That means adding a new agent takes days, not months.
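In practice, "configuration" can be as plain as a record like this sketch. The field names illustrate the pattern; they're not a schema we ship:

```python
# Agent-as-configuration sketch: each agent is data, not new software.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    name: str
    system_prompt: str
    pinecone_namespaces: list[str]  # knowledge it may read
    allowed_actions: list[str]      # tools it may call
    escalation_channel: str         # where low-confidence items go

triage_agent = AgentConfig(
    name="triage",
    system_prompt="Classify and route inbound requests...",
    pinecone_namespaces=["sops", "kb-articles"],
    allowed_actions=["create_ticket", "assign_ticket"],
    escalation_channel="#ops-escalations",
)
```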
How the Whole System Connects: A Real Example
Let me walk you through a real scenario so you can see how all six layers work together.
Tuesday, 7:14 AM. A client of your MSP sends an email to support@ saying their VPN isn't connecting and they have a presentation in 45 minutes.
- Make captures the inbound email and triggers the triage workflow.
- The Triage Agent reads the email, classifies it as "connectivity issue, high urgency" based on the 45 minute deadline.
- Pinecone gets queried for VPN troubleshooting articles specific to that client's setup (using metadata filters for client name and issue category).
- OpenAI (GPT-4o) generates a response with three troubleshooting steps based on the retrieved context, plus a note that if these don't work, a technician will be assigned immediately.
- Make sends the auto-drafted response to the client via email and simultaneously posts in the appropriate Slack channel so the on-call tech has visibility.
- If the client replies saying the steps didn't work, Make escalates to a human, attaches the full conversation history, the troubleshooting steps already attempted, and the client's configuration details pulled from Airtable.
Total time from email received to first response: under 90 seconds.
No human touched it. The client got a personalized, accurate response with troubleshooting steps specific to their environment. And if it needed escalation, the tech got everything they needed without asking the client to repeat themselves.
That's not a demo. That's what's running right now in production for multiple clients.
What This Stack Costs
I'm going to be direct because nobody else in this space talks about pricing honestly.
For a typical install, a service business with 15 to 50 employees, the monthly tool cost for this entire stack runs between $300 and $800. That's Make, Pinecone, OpenAI/Anthropic API usage, and whatever integrations are needed.
That's not our fee. That's the raw tool cost. The AI API spend for most operations use cases is shockingly low because you're not generating millions of tokens per day. You're handling hundreds of queries. At current pricing, that's pennies per interaction.
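Run the arithmetic yourself. The token counts and per-million-token rates below are illustrative placeholders; check current pricing before quoting anyone:

```python
# Back-of-envelope sketch of per-interaction API cost.
IN_TOKENS, OUT_TOKENS = 3000, 400    # context-heavy prompt, short answer
PRICE_IN, PRICE_OUT = 2.50, 10.00    # assumed $/1M tokens, input/output

cost = IN_TOKENS / 1e6 * PRICE_IN + OUT_TOKENS / 1e6 * PRICE_OUT
print(f"${cost:.4f} per interaction")  # ~$0.0115: about a penny
```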
Compare that to hiring one additional operations person at $4,000 to $6,000 per month. The math isn't close.
What We Deliberately Left Out
A few things we consciously chose not to include:
- No dedicated AI platform (like Jasper, Writer, etc.): These are wrappers around the same LLMs we're already using directly. You're paying a premium for a UI you don't need because your team interacts through Slack and email.
- No standalone chatbot builders (Botpress, Voiceflow, etc.): We tried these. They add a layer of abstraction that makes debugging harder and customization more limited. Building agents directly in Make with API calls gives us full control.
- No RPA tools (UiPath, Automation Anywhere): RPA is for clicking buttons in legacy software that doesn't have an API. Most modern tools have APIs. If your operation still requires screen scraping bots, you have a bigger problem than AI can solve.
The Honest Truth About This Stack
Is it perfect? No. Nothing is. Here's what's genuinely hard about it:
Maintenance is real. APIs change. Models get updated. Prompts need tuning. A system like this isn't "set it and forget it." It's more like "set it, monitor it, improve it monthly." If someone tells you AI operations are maintenance free, they're either lying or they've never built one.
Data quality is everything. The AI is only as good as what's in Pinecone, and what's in Pinecone is only as good as what's in Notion and Airtable. If your knowledge base is outdated, incomplete, or wrong, the AI will confidently give outdated, incomplete, or wrong answers. Garbage in, garbage out. That law hasn't changed.
Adoption takes effort. Building the system is 40% of the work. Getting your team to actually use it is the other 60%. We spend significant time during the first month training teams, answering questions, and showing people how to interact with the agents naturally. The tech is the easy part. The people are the hard part.
But here's what's also true: when this stack is installed and running well, it fundamentally changes how a company operates. Not because of any single tool. Because of how they work together. Because your knowledge is always accessible. Because your responses are always fast. Because your team spends their time on work that actually requires a human brain instead of copying information from one place to another.
That's the stack. Every tool, every connection, every decision, every tradeoff. No affiliate links. No "use my code for 20% off." Just what we actually install, because it actually works.
If you've read this far, you're probably the kind of operator who wants this running inside your business. Not the theory. The thing itself.
Want this stack installed for your operation?
Get Your Free AI Audit →