How We Improved the Performance of an Assessment Platform to Handle 5000+ Concurrent Users

One of our clients was facing a significant performance issue with their assessment platform. When a large number of concurrent users (around 2000-5000) accessed the platform, it became very slow and practically unusable. They turned to us to resolve this critical problem.

Initial Audit and Analysis

We began with a thorough audit of the application. The frontend was built with Next.js and deployed as a static site on S3 and CloudFront. The backend was a NestJS application running in a Kubernetes cluster on Amazon EKS, and the database was PostgreSQL hosted on Amazon RDS. The architecture was a conventional three-tier setup: frontend, backend, and database.

Identifying API Inefficiencies

To understand the user experience and identify bottlenecks, we used the application ourselves and documented every REST API call made during the main user flow. Some of these calls were redundant, and others returned more data than the frontend needed. We streamlined the flow by removing the redundant calls and trimming the response payloads down to the fields the frontend actually uses.
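
As a rough illustration of payload trimming, the sketch below maps a full record to a smaller response object before it is returned. The `AssessmentRecord` shape, its field names, and the `toAssessmentSummary` helper are hypothetical and are not taken from the client's codebase; the sample values are made up.

```typescript
// Hypothetical shape of a full assessment record as loaded by the backend.
interface AssessmentRecord {
  id: number;
  title: string;
  durationMinutes: number;
  createdBy: string;
  internalNotes: string;
  answerKey: Record<string, string>;
}

// The subset that a (hypothetical) listing page actually renders.
type AssessmentSummary = Pick<AssessmentRecord, "id" | "title" | "durationMinutes">;

// Return only the fields the frontend needs instead of the whole record.
function toAssessmentSummary(record: AssessmentRecord): AssessmentSummary {
  const { id, title, durationMinutes } = record;
  return { id, title, durationMinutes };
}

// Made-up sample record, used only to compare serialized sizes.
const fullRecord: AssessmentRecord = {
  id: 1,
  title: "Sample assessment",
  durationMinutes: 60,
  createdBy: "admin@example.com",
  internalNotes: "Reviewed by the content team; do not expose to candidates.",
  answerKey: { q1: "b", q2: "d", q3: "a" },
};

const summaryDto = toAssessmentSummary(fullRecord);
const fullBytes = JSON.stringify(fullRecord).length;
const trimmedBytes = JSON.stringify(summaryDto).length;
console.log(`full: ${fullBytes} bytes, trimmed: ${trimmedBytes} bytes`);
```

Mapping to an explicit response type also keeps internal fields, such as the answer key in this example, out of the response by construction.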

Optimizing ORM Usage

The backend used MikroORM as the ORM layer on top of PostgreSQL. In MikroORM, the populate option controls which related entities are loaded along with the main entity. We noticed that some frequently called APIs were populating relations they never used, which produced extra database queries and added latency. By removing these unnecessary populate entries, we significantly reduced the number of database queries and improved API response times.
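
The effect can be sketched with a toy model. The code below is not MikroORM's API; it is an in-memory stand-in that assumes a select-in style loader, where each populated relation costs one additional query. The relation names and the `loadAssessments` helper are hypothetical.

```typescript
// Toy model, NOT MikroORM's real API. In MikroORM the relation list is passed
// as the `populate` option of calls such as `em.find`; here we only record the
// queries a select-in style loader would issue for a given populate list.
type Relation = "questions" | "author" | "attempts";

class QueryLog {
  queries: string[] = [];
  run(sql: string): void {
    this.queries.push(sql);
  }
}

function loadAssessments(db: QueryLog, populate: Relation[]): void {
  // One query for the root entity.
  db.run("select * from assessment");
  // Assumption: one extra query per populated relation.
  for (const relation of populate) {
    db.run(`select * from ${relation} where assessment_id in (...)`);
  }
}

// Before: the endpoint populated every relation, whether it used it or not.
const withAllRelations = new QueryLog();
loadAssessments(withAllRelations, ["questions", "author", "attempts"]);

// After: only the relation the response actually needs.
const withNeededRelations = new QueryLog();
loadAssessments(withNeededRelations, ["questions"]);

console.log(withAllRelations.queries.length, withNeededRelations.queries.length);
```

With a joined loading strategy the cost shows up as wider joins rather than extra round trips, but the conclusion is the same: every unused entry in the populate list is work the database does for nothing.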

Database Optimization

Because the database is often the bottleneck in an application like this, we turned to database performance next. We enabled enhanced metrics in RDS to find the most frequently run and most expensive queries, and we enabled the pg_stat_statements extension to collect detailed per-query statistics. Running EXPLAIN and EXPLAIN ANALYZE on the worst offenders showed that some queries were performing full table scans on large tables. We added indexes, including partial indexes, to speed up these queries. Indexing is often underappreciated, and here it resolved many of the performance issues.
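
A minimal sketch of this workflow follows. It ranks rows shaped like the output of pg_stat_statements by total execution time, then shows what a plain and a partial index statement might look like. The table names, column names, index names, and sample numbers are all hypothetical and are not measurements from the client's system; the `total_exec_time` column name applies to PostgreSQL 13 and later.

```typescript
// Subset of the columns exposed by the pg_stat_statements view (PostgreSQL 13+).
interface StatementStat {
  query: string;
  calls: number;
  total_exec_time: number; // milliseconds
}

type RankedStatement = StatementStat & { meanMs: number };

// Rank statements by total time spent and attach the mean time per call.
function costliestStatements(stats: StatementStat[], limit: number): RankedStatement[] {
  return stats
    .slice()
    .sort((a, b) => b.total_exec_time - a.total_exec_time)
    .slice(0, limit)
    .map((s) => ({ ...s, meanMs: s.calls > 0 ? s.total_exec_time / s.calls : 0 }));
}

// Made-up rows for illustration only.
const sampleStats: StatementStat[] = [
  { query: "SELECT ... FROM attempt WHERE user_id = $1", calls: 12000, total_exec_time: 540000 },
  { query: "SELECT ... FROM question WHERE id = $1", calls: 90000, total_exec_time: 45000 },
  { query: "SELECT ... FROM assessment", calls: 300, total_exec_time: 96000 },
];

const topStatements = costliestStatements(sampleStats, 2);
for (const s of topStatements) {
  console.log(`${s.meanMs.toFixed(1)} ms/call over ${s.calls} calls: ${s.query}`);
}

// Hypothetical index for the slow lookup above. CONCURRENTLY avoids blocking
// writes while the index builds, but cannot run inside a transaction block.
const indexSql =
  "CREATE INDEX CONCURRENTLY idx_attempt_user_id ON attempt (user_id)";

// Hypothetical partial index: covers only the rows a hot query touches,
// which keeps the index small.
const partialIndexSql =
  "CREATE INDEX CONCURRENTLY idx_attempt_in_progress ON attempt (user_id) " +
  "WHERE status = 'in_progress'";

console.log(indexSql);
console.log(partialIndexSql);
```

Sorting by total time rather than mean time surfaces queries that are individually cheap but run so often that they dominate database load.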

Monitoring and Metrics

To gain deeper insights, we set up a monitoring system in our EKS cluster using the open-source SigNoz monitoring platform. SigNoz provides comprehensive metrics, including P99 latency of APIs, the most used APIs, most used queries, Kubernetes pod and node metrics, database query metrics, costliest queries, CPU and memory usage, and more. This monitoring setup also allowed us to view all application exceptions and trace API requests to identify bottlenecks more effectively.
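
For readers unfamiliar with P99 latency, the sketch below computes it with the nearest-rank method and compares it with the mean. This illustrates the metric itself and is not how SigNoz computes it internally; the `percentile` helper and the latency samples are made up for the example.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("rank out of range");
  return value;
}

// Made-up latencies: 98 fast requests (20 ms) and 2 slow ones (900 ms).
const latenciesMs: number[] = [];
for (let i = 0; i < 98; i++) latenciesMs.push(20);
latenciesMs.push(900, 900);

const meanMs = latenciesMs.reduce((sum, v) => sum + v, 0) / latenciesMs.length;
const p50Ms = percentile(latenciesMs, 50);
const p99Ms = percentile(latenciesMs, 99);

console.log({ meanMs, p50Ms, p99Ms });
```

In this sample the mean is 37.6 ms and the median is 20 ms, while P99 is 900 ms. The tail is what users notice under load, which is why P99 is a more useful signal than the average.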

Scaling and Load Testing

We considered adding Redis caching to speed up API responses further. However, the optimizations described above had already delivered the performance we needed, so we decided against it and avoided the complexity of cache invalidation.

We also integrated Karpenter into our EKS cluster, enabling auto-scaling and adding more nodes as needed to handle increased load.

To ensure our optimizations were effective, we conducted distributed load testing using Artillery, Playwright, and AWS Fargate. This comprehensive load testing confirmed that our improvements could handle the expected user load efficiently.

Conclusion

Through a combination of API optimization, database indexing, enhanced monitoring, and auto-scaling, we successfully improved the performance of our client's assessment platform. These measures ensured that the platform could handle over 5000 concurrent users smoothly, providing a seamless and responsive user experience.

At CyberMind Works, we specialize in delivering robust and scalable software solutions tailored to meet your unique needs. Whether you’re experiencing performance issues or need a custom-built application, our team has the expertise to transform your challenges into successes. Partner with us to leverage our comprehensive approach to software development and optimization, ensuring your platform performs flawlessly under any load.

Ready to take your platform to the next level? Contact us today to learn how we can help you achieve your performance and scalability goals.

About Boopesh Mahendran
