Monad RPC/WSS Best Practices: Rate Limits, Caching, and 'Don't Melt Your Node'

Builder-first notes and practical takeaways.

— Natsai

The Monad free tier enforces a strict hard cap of 5 requests per minute, as documented by freerpc.com/monad. Builders running anything beyond trivial smoke tests will immediately hit this ceiling, so production workloads should never rely on free endpoints for sustained use.

For higher throughput, the Tatum Monad endpoint supports up to 200 requests per second, per docs.tatum.io/monad. This is a hard provider-side limit; exceeding it triggers HTTP 429 errors, so batching and rate control are non-negotiable for stable integrations.
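One way to stay under a provider-side cap like this is to rate-limit on the client before the request ever leaves your process. Below is a minimal token-bucket sketch (the class name and parameters are mine, not from any provider SDK); the idea is to configure the refill rate below the published cap, e.g. 180 for a 200 RPS limit, to leave headroom.

```python
import time

class TokenBucket:
    """Client-side token bucket to stay under a provider's RPS cap."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second (set below the provider cap)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Return True if n tokens are available; otherwise the caller should wait."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A caller that gets `False` should sleep briefly and retry rather than fire the request and eat a 429.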

QuickNode’s Monad testnet infrastructure has handled real-world scale—over 40 billion requests monthly, averaging 25,000 RPS with peaks at 50,000 RPS and 99.99% uptime (QuickNode case study). These numbers are only sustainable with disciplined batching, cache utilization, and error-aware retry logic.

HTTP endpoints on Monad are stateless and scale horizontally, making them suitable for high-throughput polling and batch queries. In contrast, WSS endpoints support real-time subscriptions but enforce stricter disconnects and session state, so they’re less forgiving of bursty or idle traffic (QuickNode docs).
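For the HTTP polling side, batch queries are plain JSON-RPC 2.0 batch arrays. A small sketch of building one (the helper name is mine; the methods shown are standard Ethereum-style JSON-RPC, which Monad exposes):

```python
import json

def build_batch(calls):
    """Build a JSON-RPC 2.0 batch payload; distinct ids let responses be
    matched back even if the node returns them out of order."""
    return [
        {"jsonrpc": "2.0", "id": i, "method": method, "params": params}
        for i, (method, params) in enumerate(calls)
    ]

batch = build_batch([
    ("eth_blockNumber", []),
    ("eth_getBalance", ["0x0000000000000000000000000000000000000000", "latest"]),
])
payload = json.dumps(batch)  # POST this body to the HTTP endpoint
```

Because HTTP is stateless, each batch is an independent request and can be spread across horizontally scaled endpoints.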

WSS connections require regular heartbeat/ping messages to avoid silent disconnects, as detailed in monad.xyz/docs/rpc. Missed heartbeats or protocol violations often result in disconnect codes like 1006 (abnormal closure) or 1011 (internal error), which are not always accompanied by explicit error payloads.
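Since a 1006 closure often surfaces only as silence, clients need their own staleness detector rather than waiting for an error payload. A minimal sketch (class name, interval, and threshold are illustrative choices, not values from the Monad docs):

```python
import time

class HeartbeatMonitor:
    """Track pong replies; declare the socket dead after `max_missed`
    silent heartbeat intervals, since abnormal closures (e.g. code 1006)
    may never deliver an explicit error."""

    def __init__(self, interval: float = 15.0, max_missed: int = 2,
                 clock=time.monotonic):
        self.interval = interval      # seconds between pings
        self.max_missed = max_missed  # tolerated consecutive missed pongs
        self.clock = clock
        self.last_pong = clock()

    def on_pong(self):
        self.last_pong = self.clock()

    def is_stale(self) -> bool:
        return self.clock() - self.last_pong > self.interval * self.max_missed
```

The send loop pings every `interval` seconds and checks `is_stale()` before each send; a stale result should trigger a controlled reconnect, not a tight retry loop.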

Error handling diverges: HTTP endpoints return explicit 429 payloads when rate limits are breached, while WSS endpoints may simply drop the connection or messages without warning. Monitoring for disconnect codes and dropped subscriptions is essential for WSS consumers.

Batch RPC call support is robust on Monad, but payload limits and practical batch size tuning are critical. Oversized batches can trigger node throttling or outright rejection, so start with conservative batch sizes and progressively increase under feature flag control, monitoring for error spikes.
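The "start conservative, grow under control" approach maps naturally onto an additive-increase/multiplicative-decrease tuner. A sketch with illustrative defaults (the class and its numbers are assumptions, not Monad-specified limits):

```python
class BatchSizer:
    """AIMD batch-size tuner: grow slowly on success, halve on throttling."""

    def __init__(self, start: int = 10, step: int = 5,
                 ceiling: int = 200, floor: int = 1):
        self.size = start
        self.step = step        # additive increase per clean batch
        self.ceiling = ceiling  # hard upper bound from your own testing
        self.floor = floor

    def on_success(self):
        self.size = min(self.ceiling, self.size + self.step)

    def on_throttle(self):
        # Back off hard on a 429 or batch rejection.
        self.size = max(self.floor, self.size // 2)
```

Halving on failure recovers quickly from an oversized batch, while the small additive step keeps probing upward without re-triggering throttling.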

Compute unit cost varies significantly by RPC method and payload complexity. High-cost queries (e.g., deep state reads or large batch payloads) can exhaust quotas rapidly, leading to throttling or downtime. Incident analysis has shown that unbatched, repetitive queries can cause compute unit spikes that destabilize nodes.
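Tracking compute-unit spend per method makes those spikes visible before they become throttling. A sketch of a simple meter; the weights below are purely illustrative placeholders, not Monad's or any provider's published schedule:

```python
from collections import defaultdict

# Illustrative weights only -- substitute your provider's published
# compute-unit schedule.
CU_WEIGHTS = {"eth_blockNumber": 1, "eth_call": 21, "eth_getLogs": 75}

class CUMeter:
    """Accumulate estimated compute-unit spend per RPC method."""

    def __init__(self, weights, default: int = 10):
        self.weights = weights
        self.default = default  # assumed cost for unlisted methods
        self.spent = defaultdict(int)

    def record(self, method: str) -> int:
        cost = self.weights.get(method, self.default)
        self.spent[method] += cost
        return cost

    def total(self) -> int:
        return sum(self.spent.values())
```

Per-method totals point directly at the unbatched, repetitive queries that incident analysis identifies as spike sources.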

Caching strategies should be tied to Monad block height or state changes, not just time-based TTLs. Serving stale data between blocks is often acceptable, but cache invalidation must be triggered on new block events to avoid unnecessary load and quota exhaustion.
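A block-height-aware cache can be very small: key entries to the current height and clear them wholesale when a new-block event arrives. A minimal sketch (names are mine):

```python
class BlockCache:
    """Cache RPC results for the current block; invalidate on new-block
    events rather than on a wall-clock TTL."""

    def __init__(self):
        self.height = -1
        self.store = {}

    def on_new_block(self, height: int):
        if height > self.height:
            self.height = height
            self.store.clear()  # everything cached belongs to a prior block

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value
```

Between blocks every repeated query is a cache hit, so quota is spent only once per block per distinct query.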

Exponential backoff retry logic should be tuned specifically to Monad error codes and node-side throttling behaviors. For HTTP 429s, backoff intervals should increase aggressively; for WSS disconnects, reconnect logic must account for silent drops and avoid rapid reconnect loops.
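For both the HTTP 429 case and WSS reconnects, full-jitter exponential backoff avoids synchronized retry storms. A sketch with illustrative base and cap values:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a uniform draw from
    [0, min(cap, base * 2**attempt)], so retries from many clients
    never synchronize into a reconnect stampede."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

For WSS, feed the same function into the reconnect loop and reset `attempt` only after the connection has stayed healthy for a while, so silent drops don't degenerate into rapid reconnect cycles.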

Monitoring for compute unit spikes, accidental DoS patterns, and near-threshold usage is non-optional. Instrumentation should track both request rates and compute unit consumption per method, with alerts for anomalous surges or sustained high usage.
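Near-threshold alerting can be a sliding-window counter compared against a fraction of the quota. A sketch (quota, window, and the 80% threshold are illustrative assumptions):

```python
import time
from collections import deque

class RateAlarm:
    """Flag when requests in the last `window` seconds exceed a fraction
    of the quota, giving time to shed load before the provider throttles."""

    def __init__(self, quota: int, window: float = 1.0,
                 threshold: float = 0.8, clock=time.monotonic):
        self.quota = quota
        self.window = window
        self.threshold = threshold
        self.clock = clock
        self.events = deque()

    def record(self):
        now = self.clock()
        self.events.append(now)
        # Drop events that have aged out of the window.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()

    def near_limit(self) -> bool:
        return len(self.events) >= self.quota * self.threshold
```

The same structure works per method, which pairs naturally with per-method compute-unit accounting.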

Progressive rollout and rollback of batch size, cache, and retry logic changes should be managed via feature flags. This allows for rapid mitigation if new patterns trigger unexpected rate limiting or node instability, minimizing downtime and user impact.
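The flag-gated rollout can be as simple as routing a tuning parameter through a flag lookup, so rollback is a flag flip rather than a deploy. A sketch with a hypothetical in-process flag store (real deployments would back this with a config service):

```python
# Hypothetical flag store; a real system would read these from a
# config service so they can change without a redeploy.
FLAGS = {"batch_size_50": False}

def effective_batch_size(flags=FLAGS, baseline: int = 10,
                         experimental: int = 50) -> int:
    """Serve the experimental batch size only when its flag is on,
    so a spike in errors can be mitigated by flipping the flag off."""
    return experimental if flags.get("batch_size_50") else baseline
```

The same pattern applies to cache TTL policy and retry parameters: every risky tuning value gets a flag, and the monitoring above decides when to flip it.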

Batch payload limits on Monad endpoints are not just theoretical; real-world production workloads have seen node instability when batch sizes were increased without monitoring compute unit consumption. Tuning batch size should be an iterative process, with rollback plans in place if error rates or disconnects spike.

When using WSS endpoints, heartbeat/ping intervals must be calibrated to avoid disconnects, especially under fluctuating network conditions. Silent disconnects (code 1006) are common if heartbeats are missed, so production clients should implement robust ping logic and monitor for these disconnects as a signal to adjust intervals (monad.xyz/docs/rpc).

HTTP 429 errors from exceeding the Tatum Monad endpoint’s 200 RPS cap are explicit, but WSS disconnects can be silent or use codes like 1011, making it critical to differentiate between rate limiting and underlying node issues. Exponential backoff for HTTP should be paired with jittered reconnects for WSS to prevent cascading failures.

Cache invalidation tied to block height, rather than fixed TTL, is essential for high-frequency polling on QuickNode’s Monad testnet, which regularly sustains 25,000+ RPS. This approach reduces redundant queries and helps avoid compute unit exhaustion during block production surges (QuickNode case study).

Production workloads should never rely on the Monad free tier for anything beyond smoke tests, as the 5 requests per minute cap is enforced at the provider level and will result in immediate throttling or dropped requests (freerpc.com/monad).