7 SaaS Architecture Mistakes That Kill Scalability

By Syed Daud Ghaznavi, Co-Founder · April 2025 · 6 min read

We review a lot of codebases — inherited projects, pre-acquisition due diligence, "please tell us why this is falling over" calls. The same architectural mistakes appear with startling regularity. Not because the engineers who built them were bad — but because these patterns look fine at 1,000 users and catastrophic at 100,000.

Here are the seven we see most often, what they look like, and what to do instead.

Good architecture is boring. If your system design gets interesting at scale, something went wrong earlier.

Synchronous everything

Processing emails, PDF generation, notifications, third-party API calls — all in the HTTP request-response cycle. Works fine under light load. At scale, slow tasks block request threads, latency climbs, and the service falls over.

→ Fix: move any task that doesn't need to return data immediately to an async queue (BullMQ, SQS, RabbitMQ). The HTTP handler enqueues and responds immediately; a worker processes separately. This is non-optional at scale.
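
The enqueue-and-respond shape can be sketched with the Python standard library. This is a minimal in-process illustration, not a production setup: the in-memory queue stands in for a real broker such as BullMQ, SQS, or RabbitMQ, and `handle_signup`, `send_welcome_email`, and `sent_emails` are hypothetical names invented for the example.

```python
import queue
import threading

# Hypothetical in-process stand-in for a real broker (BullMQ, SQS, RabbitMQ).
jobs: queue.Queue = queue.Queue()
sent_emails: list[str] = []  # records what the worker has processed


def send_welcome_email(address: str) -> None:
    """Placeholder for the slow work (SMTP call, PDF render, third-party API)."""
    sent_emails.append(address)


def handle_signup(address: str) -> tuple[int, dict]:
    """HTTP handler: enqueue the slow task and respond straight away."""
    jobs.put({"type": "welcome_email", "address": address})
    return 202, {"status": "queued"}  # 202 Accepted: the work happens later


def worker() -> None:
    """Runs outside the request cycle, pulling jobs until it sees the stop signal."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # stop signal
                return
            if job["type"] == "welcome_email":
                send_welcome_email(job["address"])
        finally:
            jobs.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The handler's latency no longer depends on how long the email takes. With a real broker the worker would normally be a separate process, so a crash or restart of the web tier does not lose queued work.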

No tenant data isolation at the database layer

Single-tenant apps retrofitted for multi-tenancy by adding a tenant_id column and relying on application-layer filtering. One missing WHERE clause in any query leaks data across tenants. This is how data breaches happen.

→ Fix: enforce tenant isolation at the database layer, not just application code. Use row-level security (PostgreSQL RLS) or separate schemas per tenant for anything sensitive. Make it structurally impossible to query without a tenant scope.
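
As a rough sketch of what the RLS route involves, the helper below builds the PostgreSQL statements a migration might run for one table. The helper name `rls_migration`, the setting name `app.current_tenant`, and the `%s` placeholder style (as used by psycopg) are assumptions made for illustration; the SQL itself uses standard PostgreSQL row-level security syntax.

```python
def rls_migration(table: str, tenant_column: str = "tenant_id") -> list[str]:
    """Build the PostgreSQL statements that turn on row-level security for one table."""
    # Identifiers are interpolated, so only accept plain names from trusted migration code.
    if not (table.isidentifier() and tenant_column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    policy = f"{table}_tenant_isolation"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY {policy} ON {table} "
        f"USING ({tenant_column}::text = current_setting('app.current_tenant'));",
    ]


# Run at the start of each request's transaction, before any other query.
# The third argument (is_local = true) scopes the value to that transaction.
SET_TENANT_SQL = "SELECT set_config('app.current_tenant', %s, true);"
```

With a policy like this in place, a query that forgets its WHERE clause is still filtered by the database. Note that superusers and roles with BYPASSRLS are not subject to policies, so the application should connect as an ordinary role.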

Ignoring N+1 query problems until it's too late

An endpoint that runs one query to fetch a list and then one more query per row. It looks fast in development with 10 rows. In production with 10,000 rows, that is more than 10,000 database round-trips per request. We've seen this take down databases on otherwise healthy traffic.

→ Fix: add query-count monitoring in CI (using Hibernate's statistics, Prisma's query logging, or custom middleware). Reject PRs that introduce N+1 patterns. Use eager loading and batch queries by default, not as an optimisation.
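
To make the difference concrete, here is a small self-contained sketch using SQLite from the Python standard library. The `authors` and `posts` tables and the helper names are hypothetical; the point is only to show how a query counter can expose an N+1 pattern and how a JOIN removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO posts VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 3, 'C'), (4, 1, 'D');
""")

select_count = 0


def _count_selects(statement: str) -> None:
    """Trace callback: count every SELECT the connection executes."""
    global select_count
    if statement.lstrip().upper().startswith("SELECT"):
        select_count += 1


conn.set_trace_callback(_count_selects)


def posts_n_plus_one() -> list:
    """One query for the posts, then one more per post to fetch its author."""
    rows = conn.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    result = []
    for title, author_id in rows:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        result.append((title, name))
    return result


def posts_batched() -> list:
    """The same data in a single round-trip using a JOIN."""
    return conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()


def count_queries(fn) -> int:
    """Run fn and report how many SELECT statements it issued."""
    global select_count
    select_count = 0
    fn()
    return select_count
```

A CI check along these lines could assert that `count_queries(handler)` stays under a fixed budget per endpoint, so the count cannot silently grow with the number of rows.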

Monolithic scheduled jobs

A single cron job that processes all pending records — billing, notifications, data sync — sequentially. At small scale it finishes in seconds. At scale it runs for 4 hours, overlaps with the next run, and creates cascading failures.

→ Fix: decompose into atomic, idempotent units of work. Each job processes one record and is safe to run multiple times. Use a proper job queue with concurrency control instead of raw cron.
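
A minimal sketch of the one-record, safe-to-repeat shape, assuming a hypothetical in-memory `BillingStore` in place of real billing tables and a counter in place of the payment-provider call:

```python
from dataclasses import dataclass, field


@dataclass
class BillingStore:
    """Hypothetical in-memory stand-in for the billing tables."""
    pending: dict[str, int]  # invoice_id -> amount in cents
    charged: dict[str, int] = field(default_factory=dict)
    charge_calls: int = 0  # how many times the (pretend) payment call ran


def charge_invoice(store: BillingStore, invoice_id: str) -> bool:
    """Process exactly one invoice; safe to run again after a retry or overlap.

    Returns True if this call performed the charge, False if it was already done.
    """
    if invoice_id in store.charged:
        return False  # already processed, so a second run is a no-op
    amount = store.pending[invoice_id]
    store.charge_calls += 1  # stands in for the payment-provider call
    store.charged[invoice_id] = amount
    return True


def enqueue_billing_run(store: BillingStore) -> list[str]:
    """The scheduler only fans out IDs; it does no processing itself."""
    return sorted(store.pending)
```

In a real database the check and the write would need to be atomic, for example through a unique constraint or a transaction, otherwise two overlapping workers could both pass the check. The in-memory version skips that detail.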

No database connection pooling strategy

Each application instance opens its own pool. Under high concurrency, the combined pools exhaust the database's connection limit (PostgreSQL's max_connections defaults to 100) and requests queue. This is often mistaken for a database performance problem when it's actually a connection management problem.

→ Fix: use a connection pooler (PgBouncer for PostgreSQL) between your app and database. Size your pool correctly based on actual query duration, not intuition. Monitor active connections in production from day one.
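
One way to turn measured query duration into a pool size is Little's law: average queries in flight equals arrival rate multiplied by average duration. The sketch below applies it; the function names and the 1.5x headroom factor are assumptions chosen for illustration, not recommended values.

```python
import math


def connections_needed(
    queries_per_second: float, avg_query_ms: float, headroom: float = 1.5
) -> int:
    """Estimate pool size from Little's law, padded by an assumed headroom factor."""
    in_flight = queries_per_second * avg_query_ms / 1000  # average concurrent queries
    return max(1, math.ceil(in_flight * headroom))


def direct_pools_fit(
    instances: int, pool_size: int, max_connections: int = 100
) -> bool:
    """Without a pooler, every instance's pool counts against the server limit."""
    return instances * pool_size <= max_connections
```

For example, 400 queries per second at an average of 20 ms is 8 queries in flight, or 12 connections with the assumed headroom. Twelve app instances each holding a pool of 12 would ask for 144 connections, which is over the default limit of 100 and is the situation a shared pooler such as PgBouncer is meant to absorb.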

Putting business logic in the frontend

Pricing calculations, permission checks, discount logic — in client-side JavaScript. Besides being trivially bypassable, this means any rule change requires a frontend deployment, and logic that should live in a central service ends up hard-coded in the client.

→ Fix: all business rules live on the server. The frontend is for presentation only. This is especially non-negotiable for anything that affects billing, access control, or data visibility.
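
A small sketch of the server-side version, assuming a hypothetical price catalogue and discount table. The handler reads only identifiers and quantities from the client and recomputes the total itself, so a tampered price in the payload has no effect.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical catalogue and discount codes; real values would live in the database.
PRICES = {"starter": Decimal("29.00"), "pro": Decimal("99.00")}
DISCOUNTS = {"LAUNCH20": Decimal("0.20")}


def quote(plan: str, seats: int, discount_code: Optional[str] = None) -> Decimal:
    """Compute the total from server-owned rules only."""
    if plan not in PRICES:
        raise ValueError(f"unknown plan: {plan}")
    if seats < 1:
        raise ValueError("seats must be at least 1")
    subtotal = PRICES[plan] * seats
    rate = DISCOUNTS.get(discount_code, Decimal("0"))  # unknown codes give no discount
    return (subtotal * (1 - rate)).quantize(Decimal("0.01"))


def handle_checkout(payload: dict) -> dict:
    """Any price or total the client sends is ignored; only plan, seats, and code are read."""
    total = quote(payload["plan"], int(payload["seats"]), payload.get("discount_code"))
    return {"plan": payload["plan"], "total": str(total)}
```

The frontend can still display a price preview, but it would fetch that figure from the same server-side function rather than recalculating it.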

No observability until something breaks

Structured logging, distributed tracing, and metrics added "later" — which means after the first major outage. Without them, diagnosing production issues takes 10x longer and often requires reading raw log files at 2am.

→ Fix: instrument your application from day one. Every request should have a correlation ID, every slow query should be logged, every service boundary should emit a trace. It costs almost nothing to add at the start and everything to retrofit.
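
As an illustration of the correlation-ID part, the sketch below attaches an ID to every log line for a request using only the Python standard library. The `X-Correlation-ID` header name, the logger name `app`, and the helper names are assumptions for the example; tracing and metrics would need additional tooling.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's ID; "-" means "outside any request".
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line, always including the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(stream) -> logging.Logger:
    """Configure the hypothetical "app" logger to write structured lines to stream."""
    logger = logging.getLogger("app")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, headers: dict) -> str:
    """Reuse the caller's ID if present so the trail spans services; otherwise mint one."""
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("request started")
        logger.info("request finished")
        return cid
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

Because the ID lives in a context variable, code deep in the call stack can log without having the ID passed to it explicitly, and searching the logs for one ID returns the full story of a single request.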

The pattern behind all seven

Every mistake on this list has the same root cause: a decision that was "good enough for now" became load-bearing before anyone noticed. Architecture problems compound: fixing them at scale means rewriting running production systems rather than doing greenfield work.

The answer isn't to over-engineer from the start. It's to make the right choice for where you're going, not just where you are. If you're planning to scale to 50k+ users, design for that now. The cost is a few weeks of extra thought at the start. The alternative is months of fire-fighting later.

If you want an independent review of your SaaS architecture — whether you're pre-launch or already hitting walls — get in touch. We'll tell you what we see honestly.
