Building a Scalable Online Code Execution Platform (LeetCode Clone)
Designing a reliable, scalable, and safe code execution pipeline: Judge0 sandboxing, async polling, verdict computation (AC/WA/TLE/CE), and persistence with PostgreSQL.
Overview
When I started building a LeetCode-style platform, I thought the hardest part would be the UI. I was wrong. The real challenge turned out to be designing a reliable, scalable, and safe code execution pipeline.
In this post, I’ll walk through how I designed and built a full-stack online judge that can evaluate code submissions asynchronously, handle multiple languages, and return verdicts like AC, WA, TLE, and CE.
The Problem
Running untrusted user code is dangerous and expensive:
- You can’t run it on your own server directly
- It can infinite-loop
- It can eat memory
- It can crash your process
At the same time, users expect:
- Fast feedback
- Accurate verdicts
- Support for multiple languages
- Reliable result storage
So the core problem became: how do you build a safe, asynchronous, scalable code evaluation system?
High-Level Architecture
The system is split into:
- Next.js frontend for UI
- Node.js API for submissions & DB
- Judge0 as the sandboxed execution engine
- PostgreSQL for persistence
High-level flow:
Client → API → Submission DB → Judge0 → Polling Worker → Verdict → Client
Why Judge0 + Async Polling
Instead of running code myself, I use Judge0, which runs code in containers, enforces time/memory limits, and supports many languages.
But Judge0 is async: you submit code, get back a token per submission, and poll for the results later.
So I designed the system around batch submission, async polling, and deferred verdict storage.
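To make the batch-submission idea concrete, here is a sketch of building and sending one Judge0 submission per test case. The `TestCase` shape and `submitBatch` helper are illustrative assumptions for this post; the endpoint shape (`POST /submissions/batch`) and per-submission fields follow Judge0's API, assuming a self-hosted instance.

```typescript
// One Judge0 submission is created per test case; Judge0 replies with
// one token per submission, which we store for later polling.
interface TestCase {
  stdin: string;
  expectedOutput: string;
}

interface Judge0Submission {
  source_code: string;
  language_id: number; // e.g. 71 is Python 3 in Judge0's language table
  stdin: string;
  expected_output: string;
}

// Pure helper: turn (code, language, test cases) into a batch payload.
function buildBatchPayload(
  sourceCode: string,
  languageId: number,
  testCases: TestCase[],
): { submissions: Judge0Submission[] } {
  return {
    submissions: testCases.map((tc) => ({
      source_code: sourceCode,
      language_id: languageId,
      stdin: tc.stdin,
      expected_output: tc.expectedOutput,
    })),
  };
}

// Network step (sketch, assumes Node 18+ fetch): POST the batch and
// collect the tokens Judge0 returns.
async function submitBatch(
  judge0Url: string,
  payload: { submissions: Judge0Submission[] },
): Promise<string[]> {
  const res = await fetch(
    `${judge0Url}/submissions/batch?base64_encoded=false`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  );
  const tokens: { token: string }[] = await res.json();
  return tokens.map((t) => t.token);
}
```

The key design point: the API only builds and forwards this payload; it never runs the code itself.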
Submission Pipeline
When a user clicks “Submit”:
- Code + language + problem ID → API
Backend flow:
- API stores submission as PENDING
- API sends batch request to Judge0
- Judge0 returns tokens
- Backend polls Judge0 for results
- Backend matches outputs vs expected and computes verdict
- DB is updated
- Frontend shows final result
This keeps the API fast, the UI responsive, and execution fully isolated.
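The polling step of that flow can be sketched as follows. In Judge0, status IDs 1 ("In Queue") and 2 ("Processing") mean a submission is still running, while 3 and above are final results; the URL shape and `fields` selection assume a self-hosted Judge0, and the `PolledResult` type is an assumption for this example.

```typescript
// Shape of one polled result, trimmed to the fields this sketch requests.
interface PolledResult {
  token: string;
  status_id: number; // 1-2 = still running, >= 3 = final
  stdout: string | null;
  stderr: string | null;
  time: string | null;
  memory: number | null;
}

// Pure helper: separate finished results from tokens we must re-poll.
function splitFinished(results: PolledResult[]): {
  finished: PolledResult[];
  pending: string[];
} {
  const finished = results.filter((r) => r.status_id >= 3);
  const pending = results.filter((r) => r.status_id < 3).map((r) => r.token);
  return { finished, pending };
}

// Polling loop (sketch, assumes Node 18+ fetch): keep asking Judge0 for
// the remaining tokens until every test case has a final status.
async function pollUntilDone(
  judge0Url: string,
  tokens: string[],
): Promise<PolledResult[]> {
  const done: PolledResult[] = [];
  let remaining = tokens;
  while (remaining.length > 0) {
    const res = await fetch(
      `${judge0Url}/submissions/batch?tokens=${remaining.join(",")}` +
        `&base64_encoded=false&fields=token,status_id,stdout,stderr,time,memory`,
    );
    const body: { submissions: PolledResult[] } = await res.json();
    const { finished, pending } = splitFinished(body.submissions);
    done.push(...finished);
    remaining = pending;
    // Wait before re-polling so we don't hammer Judge0.
    if (remaining.length > 0) await new Promise((r) => setTimeout(r, 1000));
  }
  return done;
}
```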
Verdict System
Each test case produces stdout, stderr, execution time, and memory usage. The backend compares actual output against expected output and assigns verdicts like:
- Accepted (AC)
- Wrong Answer (WA)
- Time Limit Exceeded (TLE)
- Runtime Error (RE)
- Compilation Error (CE)
Only the backend decides verdicts — never the client.
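One way to sketch that backend-side decision: map each Judge0 status ID to a per-test verdict, then aggregate. Judge0's status table uses 3 = Accepted, 4 = Wrong Answer, 5 = Time Limit Exceeded, 6 = Compilation Error, and 7 and above for runtime and internal errors. The aggregation rule here (CE dominates, otherwise first failure wins) is one reasonable policy, not the only one.

```typescript
type Verdict = "AC" | "WA" | "TLE" | "RE" | "CE";

// Map one Judge0 status ID to a per-test-case verdict.
function verdictFromStatus(statusId: number): Verdict {
  switch (statusId) {
    case 3: return "AC";
    case 4: return "WA";
    case 5: return "TLE";
    case 6: return "CE";
    default: return "RE"; // statuses 7+ cover runtime/internal errors
  }
}

// Aggregate per-test verdicts into one submission verdict.
function overallVerdict(statusIds: number[]): Verdict {
  const verdicts = statusIds.map(verdictFromStatus);
  // A compilation error applies to the whole submission.
  if (verdicts.includes("CE")) return "CE";
  // Otherwise, the first failing test case decides the verdict.
  return verdicts.find((v) => v !== "AC") ?? "AC";
}
```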
Security Model
Key rules:
- Never execute user code on my server
- Never trust client verdicts
- All evaluation is sandboxed, time-limited, and memory-limited
Data Model
Core entities:
- User
- Problem
- TestCase
- Submission
- SubmissionResult
This enables submission history, solved tracking, and per-user progress.
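A minimal sketch of those entities as TypeScript types. The field names here are illustrative assumptions; the real schema lives in PostgreSQL.

```typescript
type Verdict = "PENDING" | "AC" | "WA" | "TLE" | "RE" | "CE";

interface TestCase {
  id: number;
  problemId: number;
  stdin: string;
  expectedOutput: string;
}

interface Submission {
  id: number;
  userId: number;
  problemId: number;
  sourceCode: string;
  languageId: number;
  verdict: Verdict; // starts as PENDING, updated by the polling worker
  createdAt: Date;
}

// One row per (submission, test case) pair, keeping the raw Judge0 output
// so verdicts can be recomputed or audited later.
interface SubmissionResult {
  id: number;
  submissionId: number;
  testCaseId: number;
  statusId: number; // raw Judge0 status ID
  stdout: string | null;
  timeMs: number | null;
  memoryKb: number | null;
}
```

Storing per-test-case results separately from the submission row is what makes submission history and per-user progress queries cheap.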
Performance Problems I Hit
- Too many polls → rate limits
- Large batches → slow UI
- Mapping language IDs reliably
- Handling partial failures
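The "too many polls" problem has a simple first-line fix: exponential backoff with a cap, so early polls are fast but long-running submissions don't hammer Judge0. The constants below are illustrative, not tuned values.

```typescript
// Delay before poll attempt `attempt` (0-indexed): doubles each time,
// capped at maxMs so a slow submission still gets polled regularly.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```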
What I’d Change At Scale
If this had 100k users, I’d:
- Replace polling with background workers + queues (Redis/BullMQ)
- Cache problem data
- Split submission service and worker service
- Consider running my own sandbox instead of Judge0
- Shard the submissions database
Final Thoughts
This project taught me how real judge systems are designed, how to think in async pipelines, and how to build safer execution systems.
It’s no longer “a LeetCode clone” — it’s a distributed evaluation system.