Building Scalable Web Applications: Lessons from CheckID SA
Technical insights from building CheckID SA, which serves thousands of South African users monthly with 99.9% uptime.
CheckID SA is an identity verification platform serving South African users. Built to handle thousands of monthly validations with 99.9% uptime, the project taught me crucial lessons about building applications that scale reliably.
This article covers the technical decisions, architecture choices, and operational practices that keep CheckID SA running smoothly.
The Challenge
CheckID SA validates South African ID numbers in real-time. The requirements were clear:
- Near-instant validation (under 500ms response time)
- Handle traffic spikes (validation requests can surge)
- Maintain uptime during peak usage periods
- Scale cost-effectively
Architecture Decisions
Serverless-First Approach
I chose Next.js with serverless functions for the API layer. This provides:
- Automatic scaling (no server management)
- Pay-per-use pricing
- Built-in redundancy
- Global edge distribution
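As a rough sketch of what this looks like in practice (the route path, handler shape, and the `validateIdNumber` helper below are illustrative, not the production code), a validation endpoint in a Next.js App Router project runs as its own serverless function:

```typescript
// app/api/validate/route.ts -- hypothetical route path
import { NextResponse } from "next/server";

// Placeholder for the real validation logic described later in this article.
function validateIdNumber(id: string): { valid: boolean; reason?: string } {
  return /^\d{13}$/.test(id)
    ? { valid: true }
    : { valid: false, reason: "ID number must be 13 digits" };
}

export async function POST(request: Request) {
  const { idNumber } = await request.json();

  if (typeof idNumber !== "string") {
    return NextResponse.json({ error: "idNumber is required" }, { status: 400 });
  }

  // Each request runs in its own serverless invocation, so the platform
  // scales instances up and down with traffic automatically.
  return NextResponse.json(validateIdNumber(idNumber.trim()));
}
```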
Database Strategy
Rather than a traditionally hosted, single-region relational database, I used:
- Edge-compatible database (Vercel Postgres, accessible from edge runtimes)
- Connection pooling for efficient queries
- Read replicas for geographic distribution
- Caching layer for frequently accessed data
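The exact database client isn't the point here, but connection pooling is what keeps serverless functions from exhausting the database. A minimal sketch using node-postgres (the pool limits, environment variable name, and `validation_rules` table are assumptions for illustration):

```typescript
import { Pool } from "pg";

// Reuse one pool per function instance instead of opening a connection
// per request; the pool caps concurrent connections to protect the database.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // assumed env var name
  max: 10,                   // hard cap on connections per instance
  idleTimeoutMillis: 30_000, // release idle connections quickly
});

export async function loadValidationRules() {
  // Pooled query: borrows a connection, runs the statement, returns it.
  const { rows } = await pool.query("SELECT * FROM validation_rules");
  return rows;
}
```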
Caching Layer
A multi-tier caching strategy:
- Edge caching: Static validation rules cached at CDN edge
- Application caching: Frequently validated IDs cached in memory
- Database caching: Query results cached with TTL
This reduces database load by 80% during peak traffic.
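The application-level tier can be as simple as an in-memory map with a TTL sitting in front of the database. A minimal sketch (the cache key scheme, TTL value, and `fetchFromDatabase` callback are illustrative):

```typescript
type CacheEntry<T> = { value: T; expiresAt: number };

const cache = new Map<string, CacheEntry<unknown>>();
const TTL_MS = 60_000; // short TTL keeps results fresh and limits retention

export async function cachedLookup<T>(
  key: string,
  fetchFromDatabase: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key) as CacheEntry<T> | undefined;

  // Cache hit that hasn't expired: answer without touching the database.
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  // Cache miss: fall through to the database and remember the result.
  const value = await fetchFromDatabase();
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```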
Performance Optimization
Response Time Targets
- API response: < 200ms
- Database queries: < 50ms
- Cache hits: < 10ms
Every component was optimized to meet these targets.
Database Query Optimization
The ID validation algorithm runs entirely in the database using stored procedures. This:
- Eliminates application-level processing time
- Reduces network round trips
- Leverages database query optimization
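From the API layer, pushing the work into the database means a single call per validation. The stored procedure name below is hypothetical (the real routine isn't shown in this article), but the shape of the call is the same:

```typescript
import { Pool } from "pg";

// Shared pool, as in the earlier connection-pooling sketch.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function validateInDatabase(idNumber: string) {
  // One round trip: the database runs the full validation routine and
  // returns a single row, rather than shipping data back for app-side checks.
  const { rows } = await pool.query(
    "SELECT * FROM validate_sa_id($1)", // hypothetical stored procedure
    [idNumber]
  );
  return rows[0];
}
```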
Edge Deployment
All static assets and API routes are deployed to edge locations close to users. This reduces latency for South African users significantly.
Reliability and Uptime
Monitoring and Alerting
I implemented comprehensive monitoring:
- Uptime monitoring (checks every 60 seconds)
- Error rate tracking
- Response time monitoring
- Database connection pool monitoring
Alerts trigger if:
- Response time exceeds 1 second
- Error rate exceeds 1%
- Database connections approach limits
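Those thresholds translate directly into a small alerting check. A sketch of the idea (the metric shapes and threshold constants mirror the rules above; how metrics are collected and where alerts are sent is left out):

```typescript
// Illustrative thresholds matching the alerting rules above.
const MAX_RESPONSE_TIME_MS = 1_000;
const MAX_ERROR_RATE = 0.01;
const MAX_POOL_USAGE = 0.9;

interface Metrics {
  p95ResponseTimeMs: number;
  errorRate: number; // errors / total requests over the window
  poolUsage: number; // active connections / pool limit
}

export function checkAlerts(metrics: Metrics): string[] {
  const alerts: string[] = [];
  if (metrics.p95ResponseTimeMs > MAX_RESPONSE_TIME_MS)
    alerts.push("Response time exceeds 1 second");
  if (metrics.errorRate > MAX_ERROR_RATE)
    alerts.push("Error rate exceeds 1%");
  if (metrics.poolUsage > MAX_POOL_USAGE)
    alerts.push("Database connections approaching limit");
  return alerts; // caller forwards non-empty results to the paging channel
}
```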
Graceful Degradation
If the primary database is unavailable, the system:
- Falls back to cached validation rules
- Returns partial results (basic validation only)
- Logs errors for post-processing
Users always get a response, even if full validation isn't available.
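In code, the fallback is a straightforward try/catch around the database path. A sketch only (the `./db` import refers to the earlier database sketch, and `basicFormatCheck` stands in for the cached, format-only rules):

```typescript
import { validateInDatabase } from "./db"; // hypothetical module path

function basicFormatCheck(idNumber: string): boolean {
  // Offline fallback: structural check only (13 digits).
  return /^\d{13}$/.test(idNumber.trim());
}

export async function validateWithFallback(idNumber: string) {
  try {
    // Full validation path against the primary database.
    const result = await validateInDatabase(idNumber);
    return { ...result, degraded: false };
  } catch (error) {
    // Primary database unavailable: degrade to format-only validation
    // and log the failure for post-processing.
    console.error("Full validation unavailable, degrading", { error });
    return { valid: basicFormatCheck(idNumber), degraded: true };
  }
}
```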
Database Redundancy
The database setup includes:
- Primary database with automatic failover
- Read replicas in multiple regions
- Automated backups every 6 hours
- Point-in-time recovery capability
Load Testing
Before launch, I ran load tests simulating:
- 100 concurrent requests
- 1,000 requests per minute
- Sustained traffic over 1 hour
This identified bottlenecks before they affected users.
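A dedicated load-testing tool does this more rigorously, but even a short script firing concurrent requests surfaces obvious bottlenecks. A rough sketch (the URL and request payload are placeholders):

```typescript
// Fire N concurrent requests and report a rough p95 latency.
async function loadTest(url: string, concurrency: number) {
  const latencies: number[] = [];

  await Promise.all(
    Array.from({ length: concurrency }, async () => {
      const start = Date.now();
      await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ idNumber: "0000000000000" }), // dummy payload
      });
      latencies.push(Date.now() - start);
    })
  );

  latencies.sort((a, b) => a - b);
  const p95 = latencies[Math.floor(latencies.length * 0.95)];
  console.log(`p95: ${p95}ms over ${concurrency} concurrent requests`);
}

loadTest("https://example.com/api/validate", 100); // placeholder URL
```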
Scaling Patterns
Horizontal Scaling
The serverless architecture scales horizontally automatically. As traffic increases:
- More function instances spin up
- Load distributes across instances
- No manual intervention required
Database Scaling
Database scaling follows a tiered approach:
- Connection pooling (maximize existing resources)
- Read replicas (distribute read load)
- Vertical scaling (upgrade instance size)
- Horizontal scaling (sharding if needed)
Currently at the second tier (read replicas), with capacity for 10x the current load.
Cost Management
Serverless pricing means costs scale with usage:
- Low traffic periods: minimal costs
- Peak traffic: higher costs, but still predictable
- No idle server costs
I monitor costs weekly and set alerts for unusual spikes.
Security Considerations
Input Validation
All ID numbers are validated before processing:
- Format validation (correct length and structure)
- Sanitization (remove whitespace, normalize)
- Rate limiting (prevent abuse)
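For context, a South African ID number is 13 digits ending in a Luhn check digit, so the format check can reject most bad input before anything else runs. This is a sketch of the structural checks only, not the full validation pipeline:

```typescript
// Structural checks on a South African ID number: 13 digits, a plausible
// date-of-birth prefix (YYMMDD), and a Luhn check digit at the end.
export function isStructurallyValid(raw: string): boolean {
  const id = raw.replace(/\s+/g, ""); // sanitize: strip whitespace

  if (!/^\d{13}$/.test(id)) return false;

  const month = Number(id.slice(2, 4));
  const day = Number(id.slice(4, 6));
  if (month < 1 || month > 12 || day < 1 || day > 31) return false;

  // Luhn checksum over all 13 digits must be divisible by 10.
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    let digit = Number(id[i]);
    if ((13 - i) % 2 === 0) {
      // Double every second digit from the right.
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
  }
  return sum % 10 === 0;
}
```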
Data Privacy
- ID numbers are never stored (only validation results)
- Results are cached with short TTLs
- No personal information collected
- GDPR-compliant data handling
API Security
- API key authentication
- Rate limiting per key
- Request logging for audit trails
- DDoS protection via Cloudflare
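A minimal version of the API key check can live in Next.js middleware, so unauthenticated requests never reach the handlers. The header name, environment variable, and the rate-limiting hook below are assumptions, not the production setup:

```typescript
// middleware.ts -- runs before matched API routes
import { NextResponse, type NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("x-api-key"); // assumed header name

  // Reject requests without a recognised key before they reach the handler.
  if (!apiKey || apiKey !== process.env.API_KEY) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Per-key rate limiting and audit logging would hook in here.
  return NextResponse.next();
}

export const config = { matcher: "/api/:path*" };
```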
Operational Practices
Deployment Strategy
- Zero-downtime deployments
- Feature flags for gradual rollouts
- Automated rollback on errors
- Database migrations run separately from code deployments
Error Handling
Every error is:
- Logged with context
- Categorized by severity
- Tracked in error monitoring
- Reviewed weekly for patterns
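A thin logging helper keeps this consistent across handlers. The severity levels and the logging sink below are illustrative; in production the output is forwarded to the error-monitoring service:

```typescript
type Severity = "info" | "warning" | "critical";

// Log with enough context to categorize and group errors for weekly review.
export function logError(
  error: unknown,
  context: Record<string, unknown>,
  severity: Severity = "warning"
) {
  console.error(
    JSON.stringify({
      severity,
      message: error instanceof Error ? error.message : String(error),
      ...context,
      timestamp: new Date().toISOString(),
    })
  );
}
```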
Performance Monitoring
I track:
- Average response times
- P95 and P99 response times
- Error rates by endpoint
- Database query performance
Weekly reviews identify optimization opportunities.
Lessons Learned
1. Start with Edge Computing
Deploying to edge locations from day one improved performance significantly. Users in South Africa experience sub-200ms response times despite servers being geographically distant.
2. Cache Aggressively
The caching layer handles most requests without hitting the database. This reduces costs and improves reliability.
3. Monitor Everything
Comprehensive monitoring caught issues before they affected users. Response time alerts prevented degraded performance.
4. Plan for Failure
Graceful degradation ensures users always get value, even when components fail. This builds trust and reliability.
5. Load Test Early
Load testing before launch identified bottlenecks that would have caused issues under real traffic. Fixing these proactively saved headaches later.
Current Performance
CheckID SA now handles:
- Thousands of validations monthly
- 99.9% uptime (monitored)
- Average response time: 150ms
- P95 response time: 300ms
- Zero unplanned downtime
The architecture continues to scale as usage grows, with clear paths for further optimization when needed.
Future Considerations
As traffic grows, potential optimizations include:
- Additional read replicas in more regions
- More aggressive caching strategies
- Database query optimization
- CDN integration for static assets
The foundation is solid, and scaling is straightforward.