The Lab
What we're testing right now
Every tool — and every workflow — goes through our own operations before we recommend it.
New (5)
⚙ Workflow
Automated Competitor Intelligence
⚙ Workflow
AI Sales Call Analysis
⚙ Workflow
AI Contract Review Workflow
⚙ Workflow
AI Vendor Due Diligence
⚙ Workflow
AI Employee Onboarding Q&A Bot
Up Next (7)
⚙ Workflow
RAG Internal Wiki Chatbot
⚙ Workflow
Site Photo → ISO-Compliant Daily Diary
Site photos → Claude vision extracts progress + safety notes → ISO-compliant daily diary entry
⚙ Workflow
Voice Memo → Structured Project Update
Voice memo → transcription → Claude structures into a standardised project update in Notion
⚙ Workflow
Municipal Budget Outreach Generator
Scans municipal budget publications → identifies relevant line items → drafts tailored outreach
⚙ Workflow
Automated Proposal Generator
⚙ Workflow
AI Customer Feedback Analysis
Testing (6)
Nous Research's self-improving agent — persistent memory, skills, browser automation. Evaluating for agentic test execution (reads a test plan, drives a real browser).
⚙ Workflow
Knowledge Base Auto-Update
⚙ Workflow
Hilma Tender Intel Brief
Pulls new Hilma procurement notices → Claude scores fit + drafts a go/no-go brief into Notion
⚙ Workflow
Inbox → Action Queue
Inbound email → Claude triage → structured action items pushed to a Notion task queue
⚙ Workflow
Meeting → Action Pipeline
All notes filed, action items created automatically — saves ~45 min/day
⚙ Workflow
Slack-to-Notion Knowledge Base Builder
Slack messages → n8n → Claude summary → Notion wiki entry, automated daily
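The Inbox → Action Queue card describes Claude triage producing structured action items. Below is a minimal sketch of how such a reply could be validated before anything is pushed to a task queue. The JSON shape, the field names (`title`, `owner`, `priority`), and the `parse_action_items` helper are assumptions made for illustration, not the schema Since Labs uses, and the model and Notion calls are deliberately left out.

```python
import json
from dataclasses import dataclass

ALLOWED_PRIORITIES = {"low", "medium", "high"}  # assumed priority levels


@dataclass
class ActionItem:
    title: str
    owner: str
    priority: str


def parse_action_items(raw_reply: str) -> list[ActionItem]:
    """Parse a model reply expected to be a JSON array of objects.

    Entries without a title are dropped; unknown priorities fall back to "medium".
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # a malformed reply yields no tasks instead of a crash
    if not isinstance(data, list):
        return []

    items = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        title = str(entry.get("title") or "").strip()
        if not title:
            continue
        priority = str(entry.get("priority") or "medium").lower()
        if priority not in ALLOWED_PRIORITIES:
            priority = "medium"
        owner = str(entry.get("owner") or "unassigned").strip()
        items.append(ActionItem(title=title, owner=owner, priority=priority))
    return items


# Hypothetical reply text standing in for a real model response.
sample_reply = (
    '[{"title": "Send revised quote", "owner": "Anna", "priority": "HIGH"},'
    ' {"title": ""}]'
)
action_items = parse_action_items(sample_reply)
```

Validating at this boundary means a badly formed reply produces an empty queue update rather than a broken task, which is one way to keep an automated triage step auditable.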
Validated (15)
◎ Model
Primary model for all Since Labs production workflows. Best instruction-following at scale.
◎ Model
Best cost-to-speed ratio for high-volume classification, routing, and extraction tasks
◎ Model
1M context, native audio/video, 20x cheaper than Sonnet — best for high-volume pipelines
◎ Model
Strong vision capabilities; useful for eval comparison and multimodal pipelines
◎ Model
Best open-source reasoning model — strong for math, code, and structured logic
◎ Model
Strong general-purpose open-source model at near-zero inference cost when self-hosted
Best for full-repo refactors, autonomous dev tasks, and MCP-integrated agent loops
Core orchestration layer for all Since Labs AI agent stacks. Cheapest at scale.
Good for wikis and quick lookups; use custom RAG for production agent workflows
Validated for async meeting docs — not strong enough to replace dedicated transcription
Strong for long-context document analysis and multimodal tasks via the web interface
Preferred vector store for all Since Labs RAG pipelines — pgvector on managed Postgres
⚙ Workflow
RAG Document Q&A Pipeline
Core Since Labs RAG template: PDF/URL → Firecrawl → Voyage embeddings → pgvector → Claude
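The retrieval half of a pipeline like the one in the RAG Document Q&A card can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Since Labs template: `embed` is a bag-of-words stand-in for a real embedding model such as Voyage, the in-memory `store` list stands in for a pgvector table, the sample documents are invented, and the final call to Claude is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(question: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda row: cosine(q, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the context-plus-question text that would be sent to the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents standing in for scraped PDF/URL content.
docs = [
    "Invoices are due within 30 days of issue.",
    "Refunds are processed within 14 days of a return request.",
    "Support is available on weekdays from 9 to 17.",
]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

question = "How long do refunds take?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
```

In a production setup the similarity ranking would typically run inside the database rather than in Python, and the chunk size would be tuned to the embedding model's input limits.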
Skip (11)
◎ Model
Marginal gains over GPT-4o at significantly higher cost — not worth it for most workflows
◎ Model
Claude Sonnet wins on instruction following and tool use; use only if self-hosting is required
◎ Model
Smaller Llama 4 variant — outclassed by Haiku 3.5 on instruction following
◎ Model
Previous gen; superseded by Llama 4 — self-hosting cost not justified vs. API alternatives
◎ Model
Good but Claude Haiku wins on instruction following at similar price points
◎ Model
Competitive but harder to deploy reliably vs. Claude or DeepSeek alternatives
◎ Model
Impressive for size but not a replacement for frontier models in production workflows
◎ Model
Good open-source option but Gemini 2.0 Flash API is cheaper and more capable
Replaced Zapier for anything beyond 2-step flows
Best for simple 2-step integrations only
Local-first messaging-channel agent (WhatsApp, Telegram, etc.). Fun, but n8n + Claude covers our automation needs with better control and auditability.