Building a Scalable Online Code Execution Platform (LeetCode Clone)
Designing a reliable, scalable, and safe code execution pipeline: Judge0 sandboxing, async polling, verdict computation (AC/WA/TLE/CE), and persistence with PostgreSQL.
Overview
When I started building a LeetCode-style platform, I thought the hardest part would be the UI. I was wrong. The real challenge turned out to be designing a reliable, scalable, and safe code execution pipeline.
In this post, I’ll walk through how I designed and built a full-stack online judge that can evaluate code submissions asynchronously, handle multiple languages, and return verdicts like AC, WA, TLE, and CE.
The Problem
Running untrusted user code is dangerous and expensive:
- You can’t run it on your own server directly
- It can infinite-loop
- It can eat memory
- It can crash your process
At the same time, users expect:
- Fast feedback
- Accurate verdicts
- Support for multiple languages
- Reliable result storage
So the core problem became: how do you build a safe, asynchronous, scalable code evaluation system?
High-Level Architecture
The system is split into:
- Next.js frontend for UI
- Node.js API for submissions & DB
- Judge0 as the sandboxed execution engine
- PostgreSQL for persistence
High-level flow:
Client → API → Submission DB → Judge0 → Polling Worker → Verdict → Client
Why Judge0 + Async Polling
Instead of running code myself, I use Judge0, which runs code in containers, enforces time/memory limits, and supports many languages.
But Judge0 is async: you submit code, get back a token per submission, and poll for the results later.
So I designed the system around batch submission, async polling, and deferred verdict storage.
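To make the batch-submission idea concrete, here is a sketch of building and sending one Judge0 submission per test case. The `TestCase` shape and `submitBatch` helper are illustrative assumptions for this post; the endpoint shape (`POST /submissions/batch`) and per-submission fields follow Judge0's API, assuming a self-hosted instance.

```typescript
// One Judge0 submission is created per test case; Judge0 replies with
// one token per submission, which we store for later polling.
interface TestCase {
  stdin: string;
  expectedOutput: string;
}

interface Judge0Submission {
  source_code: string;
  language_id: number; // e.g. 71 is Python 3 in Judge0's language table
  stdin: string;
  expected_output: string;
}

// Pure helper: turn (code, language, test cases) into a batch payload.
function buildBatchPayload(
  sourceCode: string,
  languageId: number,
  testCases: TestCase[],
): { submissions: Judge0Submission[] } {
  return {
    submissions: testCases.map((tc) => ({
      source_code: sourceCode,
      language_id: languageId,
      stdin: tc.stdin,
      expected_output: tc.expectedOutput,
    })),
  };
}

// Network step (sketch, assumes Node 18+ fetch): POST the batch and
// collect the tokens Judge0 returns.
async function submitBatch(
  judge0Url: string,
  payload: { submissions: Judge0Submission[] },
): Promise<string[]> {
  const res = await fetch(
    `${judge0Url}/submissions/batch?base64_encoded=false`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  );
  const tokens: { token: string }[] = await res.json();
  return tokens.map((t) => t.token);
}
```

The key design point: the API only builds and forwards this payload; it never runs the code itself.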
Submission Pipeline
When a user clicks “Submit”:
- Code + language + problem ID → API
Backend flow:
- API stores submission as PENDING
- API sends batch request to Judge0
- Judge0 returns tokens
- Backend polls Judge0 for results
- Backend matches outputs vs expected and computes verdict
- DB is updated
- Frontend shows final result
This keeps the API fast, the UI responsive, and execution fully isolated.
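The polling step of that flow can be sketched as follows. In Judge0, status IDs 1 ("In Queue") and 2 ("Processing") mean a submission is still running, while 3 and above are final results; the URL shape and `fields` selection assume a self-hosted Judge0, and the `PolledResult` type is an assumption for this example.

```typescript
// Shape of one polled result, trimmed to the fields this sketch requests.
interface PolledResult {
  token: string;
  status_id: number; // 1-2 = still running, >= 3 = final
  stdout: string | null;
  stderr: string | null;
  time: string | null;
  memory: number | null;
}

// Pure helper: separate finished results from tokens we must re-poll.
function splitFinished(results: PolledResult[]): {
  finished: PolledResult[];
  pending: string[];
} {
  const finished = results.filter((r) => r.status_id >= 3);
  const pending = results.filter((r) => r.status_id < 3).map((r) => r.token);
  return { finished, pending };
}

// Polling loop (sketch, assumes Node 18+ fetch): keep asking Judge0 for
// the remaining tokens until every test case has a final status.
async function pollUntilDone(
  judge0Url: string,
  tokens: string[],
): Promise<PolledResult[]> {
  const done: PolledResult[] = [];
  let remaining = tokens;
  while (remaining.length > 0) {
    const res = await fetch(
      `${judge0Url}/submissions/batch?tokens=${remaining.join(",")}` +
        `&base64_encoded=false&fields=token,status_id,stdout,stderr,time,memory`,
    );
    const body: { submissions: PolledResult[] } = await res.json();
    const { finished, pending } = splitFinished(body.submissions);
    done.push(...finished);
    remaining = pending;
    // Wait before re-polling so we don't hammer Judge0.
    if (remaining.length > 0) await new Promise((r) => setTimeout(r, 1000));
  }
  return done;
}
```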
Verdict System
Each test case produces stdout, stderr, execution time, and memory usage. The backend compares actual output against expected output and assigns verdicts like:
- Accepted (AC)
- Wrong Answer (WA)
- Time Limit Exceeded (TLE)
- Runtime Error (RE)
- Compilation Error (CE)
Only the backend decides verdicts — never the client.
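One way to sketch that backend-side decision: map each Judge0 status ID to a per-test verdict, then aggregate. Judge0's status table uses 3 = Accepted, 4 = Wrong Answer, 5 = Time Limit Exceeded, 6 = Compilation Error, and 7 and above for runtime and internal errors. The aggregation rule here (CE dominates, otherwise first failure wins) is one reasonable policy, not the only one.

```typescript
type Verdict = "AC" | "WA" | "TLE" | "RE" | "CE";

// Map one Judge0 status ID to a per-test-case verdict.
function verdictFromStatus(statusId: number): Verdict {
  switch (statusId) {
    case 3: return "AC";
    case 4: return "WA";
    case 5: return "TLE";
    case 6: return "CE";
    default: return "RE"; // statuses 7+ cover runtime/internal errors
  }
}

// Aggregate per-test verdicts into one submission verdict.
function overallVerdict(statusIds: number[]): Verdict {
  const verdicts = statusIds.map(verdictFromStatus);
  // A compilation error applies to the whole submission.
  if (verdicts.includes("CE")) return "CE";
  // Otherwise, the first failing test case decides the verdict.
  return verdicts.find((v) => v !== "AC") ?? "AC";
}
```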
Security Model
Key rules:
- Never execute user code on my server
- Never trust client verdicts
- All evaluation is sandboxed, time-limited, and memory-limited
Data Model
Core entities:
- User
- Problem
- TestCase
- Submission
- SubmissionResult
This enables submission history, solved tracking, and per-user progress.
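A minimal sketch of those entities as TypeScript types. The field names here are illustrative assumptions; the real schema lives in PostgreSQL.

```typescript
type Verdict = "PENDING" | "AC" | "WA" | "TLE" | "RE" | "CE";

interface TestCase {
  id: number;
  problemId: number;
  stdin: string;
  expectedOutput: string;
}

interface Submission {
  id: number;
  userId: number;
  problemId: number;
  sourceCode: string;
  languageId: number;
  verdict: Verdict; // starts as PENDING, updated by the polling worker
  createdAt: Date;
}

// One row per (submission, test case) pair, keeping the raw Judge0 output
// so verdicts can be recomputed or audited later.
interface SubmissionResult {
  id: number;
  submissionId: number;
  testCaseId: number;
  statusId: number; // raw Judge0 status ID
  stdout: string | null;
  timeMs: number | null;
  memoryKb: number | null;
}
```

Storing per-test-case results separately from the submission row is what makes submission history and per-user progress queries cheap.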
Performance Problems I Hit
- Too many polls → rate limits
- Large batches → slow UI
- Mapping language IDs reliably
- Handling partial failures
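The "too many polls" problem has a simple first-line fix: exponential backoff with a cap, so early polls are fast but long-running submissions don't hammer Judge0. The constants below are illustrative, not tuned values.

```typescript
// Delay before poll attempt `attempt` (0-indexed): doubles each time,
// capped at maxMs so a slow submission still gets polled regularly.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```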
What I’d Change At Scale
If this had 100k users, I’d:
- Replace polling with background workers + queues (Redis/BullMQ)
- Cache problem data
- Split submission service and worker service
- Consider running my own sandbox instead of Judge0
- Shard the submissions database
Final Thoughts
This project taught me how real judge systems are designed, how to think in async pipelines, and how to build safer execution systems.
It’s no longer “a LeetCode clone” — it’s a distributed evaluation system.