Why Node.js Scales Poorly by Default
Node.js runs JavaScript on a single thread driven by an event loop. CPU-intensive tasks block that loop, stalling every request behind them. Memory leaks accumulate in a long-lived process. Callback hell becomes promise hell. Without deliberate architecture, Node.js applications collapse under production load.
We have seen applications that handle 1,000 requests per second in development grind to 100 requests per second in production, not because of the framework but because of architectural decisions made early that were never revisited.
Memory management is equally critical. Node.js applications run in a single process with a single heap. Without explicit limits, memory usage grows until the operating system kills the process. The symptoms are subtle at first: gradual slowdown, occasional restarts, then cascading failures.
The Cluster Module
Use the Node.js cluster module or PM2 to fork multiple processes per machine, one per core. The operating system schedules each process on its own core, turning a single-threaded runtime into a multi-core server. This is essential for any production deployment.
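A minimal sketch using the built-in cluster module; the port and the blind restart-on-exit policy are illustrative placeholders, not a production supervisor:

```js
// Fork one worker per core; restart any worker that dies.
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

if (cluster.isPrimary) { // Node 16+; use cluster.isMaster on older versions
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  // Replace crashed workers so capacity is not silently lost.
  cluster.on('exit', (worker, code) => {
    console.log(`worker ${worker.process.pid} exited (${code}), forking replacement`);
    cluster.fork();
  });
} else {
  // Workers share the listening socket; the primary distributes connections.
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(3000);
}
```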
Worker Threads for CPU Work
For CPU-intensive tasks (image processing, PDF generation, complex calculations), use worker threads. They run on separate threads, off the event loop, so the main thread stays responsive for I/O operations.
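A minimal sketch with the built-in worker_threads module; the naive Fibonacci is a stand-in for real CPU work:

```js
const { Worker, isMainThread, parentPort, workerData } = require('node:worker_threads');

// Deliberately expensive: stands in for image processing, PDF generation, etc.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

if (isMainThread) {
  // The event loop stays free for I/O while the worker burns CPU.
  const worker = new Worker(__filename, { workerData: 40 });
  worker.on('message', (result) => console.log('fib(40) =', result));
  worker.on('error', (err) => console.error('worker failed:', err));
} else {
  parentPort.postMessage(fib(workerData));
}
```

For recurring workloads, a pool of long-lived workers avoids paying thread startup cost on every task.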
Memory Management
Set explicit memory limits with --max-old-space-size. Monitor heap usage in production. Profile memory leaks with clinic.js or the built-in inspector. The most common leak sources are: unclosed connections, growing caches without eviction, and event listeners that are never removed.
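One lightweight sketch of that monitoring: cap the heap at startup and sample usage on an interval, so a leak shows up as a trend rather than a surprise OOM kill. The 512 MB cap and one-minute interval are arbitrary choices:

```js
// Start with an explicit heap cap: node --max-old-space-size=512 app.js
// Then log heap usage periodically; a steadily rising line is a leak.
const toMb = (n) => (n / 1024 / 1024).toFixed(1);

setInterval(() => {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  console.log(`heap ${toMb(heapUsed)}/${toMb(heapTotal)} MB, rss ${toMb(rss)} MB`);
}, 60_000).unref(); // unref() so this timer never keeps the process alive on its own
```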
Error Handling and Resilience
Unhandled Promise rejections crash Node.js processes (the default behaviour since Node.js 15). Always attach catch handlers to async operations. Use process.on('unhandledRejection') as a safety net, but never rely on it; it is a last resort, not a strategy.
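A sketch of that layering, assuming crash-and-restart under a supervisor (PM2, systemd, Kubernetes) is the recovery policy; doWork() is a placeholder for real async work:

```js
// Last-resort handlers: log and exit so the supervisor restarts the process
// in a known-good state. Do not try to keep serving traffic from here.
process.on('unhandledRejection', (reason) => {
  console.error('unhandled rejection:', reason);
  process.exit(1);
});
process.on('uncaughtException', (err) => {
  console.error('uncaught exception:', err);
  process.exit(1);
});

async function doWork() {
  throw new Error('simulated failure'); // placeholder for real async work
}

// The real fix is local: every async operation gets its own handler,
// so errors are dealt with in context and never reach the safety net.
async function handleRequest() {
  try {
    await doWork();
  } catch (err) {
    console.error('request failed:', err);
  }
}

handleRequest();
```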
Streaming and Backpressure
Node.js streams are powerful but dangerous. Backpressure, where a fast producer overwhelms a slow consumer, causes memory spikes and eventual crashes. Always handle it explicitly: when write() returns false the destination's buffer is full, so stop producing and resume on the 'drain' event. The pipe() method handles this automatically, but manual stream programming requires explicit backpressure management.
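A minimal sketch of the manual pattern, writing a large file without buffering it all in memory: stop when write() returns false, continue on 'drain':

```js
const fs = require('node:fs');

function writeManyLines(dest, done) {
  let i = 0;
  function writeSome() {
    while (i < 1_000_000) {
      const ok = dest.write(`line ${i++}\n`);
      if (!ok) {
        // The destination's buffer is full: wait for it to drain, then resume.
        dest.once('drain', writeSome);
        return;
      }
    }
    dest.end(done);
  }
  writeSome();
}

writeManyLines(fs.createWriteStream('out.txt'), () => console.log('done'));
```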
We use streams for file uploads, real-time data processing, and large response generation. The key pattern is to never buffer entire datasets in memory. Stream data from source to destination, transforming it incrementally. For example, when generating a CSV export of a million records, stream rows from the database through a transform stream that formats them as CSV, then pipe to the HTTP response. Memory usage remains constant regardless of dataset size.
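A sketch of that export pattern; queryStream() is a placeholder for whatever row stream your database driver provides (pg-query-stream, for example), and the column layout is invented:

```js
const { Transform, pipeline } = require('node:stream');
const http = require('node:http');

// Formats one database row at a time into a CSV line.
const toCsv = () => new Transform({
  objectMode: true, // rows in, strings out
  transform(row, _enc, cb) {
    cb(null, `${row.id},${row.name},${row.total}\n`);
  },
});

http.createServer((req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.write('id,name,total\n');
  // pipeline() wires up backpressure and propagates errors from any stage.
  pipeline(queryStream('SELECT id, name, total FROM orders'), toCsv(), res, (err) => {
    if (err) res.destroy(err); // abort rather than send a truncated file
  });
}).listen(3000);
```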
Monitoring and Observability
Production Node.js applications need three layers of visibility: application metrics (request rates, response times, error rates), system metrics (CPU, memory, event loop lag), and business metrics (active users, transactions per minute, revenue per request). We instrument applications with the prom-client library for Prometheus metrics, exposing a /metrics endpoint that scrapers poll every 15 seconds.
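A minimal sketch of that endpoint with prom-client; the histogram name and label set are illustrative, not a prescribed schema:

```js
const http = require('node:http');
const client = require('prom-client');

client.collectDefaultMetrics(); // process CPU, memory, event loop lag, GC stats

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request latency in seconds',
  labelNames: ['method', 'route', 'status'],
});

http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics()); // async in prom-client v13+
    return;
  }
  // In a real app, label with the route template, not the raw URL,
  // to keep label cardinality bounded.
  const end = httpDuration.startTimer({ method: req.method, route: req.url });
  res.end('ok');
  end({ status: res.statusCode });
}).listen(3000);
```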
Our Recommendation
Cluster by default. Worker threads for CPU work. Explicit memory limits. Systematic profiling for leaks. Comprehensive error handling. Streaming for large data. Monitoring for everything. And always profile before you optimise — premature optimisation wastes time and often makes things worse.
The Node.js ecosystem is mature, the tooling is excellent, and the performance is competitive with any runtime when architected correctly. The failures we see are rarely due to Node.js itself; they are due to assumptions carried over from other platforms, insufficient operational visibility, or architectural decisions made under time pressure that were never revisited.