Building a Realtime 2D Multiplayer Metaverse in the Browser

A technical breakdown of the architecture behind a browser-based 2D multiplayer world: server-authoritative movement, WebSockets state sync, Phaser rendering, and proximity video calling via WebRTC.

Next.js · React · Phaser · WebSockets · WebRTC · Node.js · Express · Prisma · PostgreSQL · Turborepo

Overview

Over the last few months, I’ve been building a browser-based 2D multiplayer metaverse inspired by products like Gather, Zep, and Spatial. The goal was simple on paper: let multiple users join a shared space, move around, see each other in real time, chat, and even start a video call when they’re nearby.

In practice, this turned into a deep dive into realtime systems, server-authoritative movement, state synchronization, and multiplayer architecture.

This post is a technical breakdown of how I designed and built the system.

High-Level Architecture

The project is structured as a monorepo:

  apps/web     → Next.js frontend (Phaser client)
  apps/http    → REST API (auth, spaces, admin, metadata)
  apps/ws      → WebSocket server (realtime multiplayer)
  packages/db  → Prisma + database client

  • HTTP server handles auth, spaces, and admin actions
  • WebSocket server handles presence, movement, chat, and realtime state
  • Frontend uses Phaser to render the world and players

Why WebSockets + Server Authority

For a multiplayer system, I wanted:

  • Low-latency updates
  • No client-side cheating
  • A single source of truth

So I went with WebSockets for a persistent realtime connection and server-authoritative movement.

Every movement works like this:

  • Client sends intent: { x, y }
  • Server validates collision + bounds
  • Server updates state
  • Server broadcasts the new position to everyone

This ensures there are no teleport hacks and no client/server desync: the server's state is the single source of truth, and every client sees the same world.
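The server-side validation step can be sketched roughly like this. The names and shapes here are illustrative, not the project's actual types; the key idea is that the server checks bounds, collision, and step size before accepting a move:

```typescript
type Point = { x: number; y: number };

// grid[y][x] === 1 marks a blocked tile (walls, furniture, etc.).
function isValidMove(from: Point, to: Point, grid: number[][]): boolean {
  // Bounds check against the collision grid.
  if (to.y < 0 || to.y >= grid.length) return false;
  if (to.x < 0 || to.x >= grid[0].length) return false;
  // Reject moves onto blocked tiles.
  if (grid[to.y][to.x] === 1) return false;
  // Only single-tile orthogonal steps are allowed: no teleporting.
  const dx = Math.abs(to.x - from.x);
  const dy = Math.abs(to.y - from.y);
  return dx + dy === 1;
}
```

Only if this check passes does the server update its state and broadcast the new position; otherwise the move is silently dropped and the client stays where it was.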

World & Collision System

Visually, the world is rendered using Phaser 3 with tilemaps.

Internally, the map uses floor/wall/foreground layers and a collision grid.

Movement is grid-based (32px steps) and the server uses the collision grid to reject invalid moves.

This separation between visual tiles and logical collision grid turned out to be critical for correctness.
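A minimal sketch of that separation, assuming tile ID 0 means "empty" in the wall layer (the actual tileset conventions may differ): the visual layers stay in Phaser, while the server only keeps a boolean grid plus a pixel-to-tile conversion for the 32px steps.

```typescript
const TILE_SIZE = 32; // movement happens in 32px grid steps

// Convert a pixel coordinate into a tile index on the logical grid.
const toTile = (px: number): number => Math.floor(px / TILE_SIZE);

// Derive the logical collision grid from a wall layer's tile data.
// Assumes tile ID 0 means "empty"; any other ID blocks movement.
function buildCollisionGrid(wallLayer: number[][]): boolean[][] {
  return wallLayer.map(row => row.map(tileId => tileId !== 0));
}
```

Because the server never touches Phaser, the same collision grid can be validated on the backend and (optionally) mirrored on the client for instant feedback.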

Presence System & Room Management

Each “space” is a separate multiplayer room. When a user joins:

  • JWT is verified
  • User is fetched from DB
  • Spawn point is assigned
  • User is added to room state
  • Presence is broadcast to others

When a user leaves:

  • They’re removed from the room
  • A “player left” event is broadcast

Each room maintains connected players, their positions, and their avatars.

Realtime Chat

Chat is implemented as room-scoped messages, server-validated, and broadcast only to users in the same space.

This prevents messages from leaking across rooms and avoids trusting client input blindly.
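The two halves of that, validation and room-scoped fan-out, can be sketched as below. The 500-character cap is an illustrative limit, and `send` is an abstraction over the per-user WebSocket, not the project's actual API:

```typescript
// Server-side validation before a chat message is accepted.
function validateChat(text: string): string | null {
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > 500) return null;
  return trimmed;
}

type ChatEvent = { type: "chat"; from: string; text: string };

// Fan out only to members of the sender's room, never across rooms.
function broadcastToRoom(
  members: string[],
  event: ChatEvent,
  send: (userId: string, payload: string) => void
): void {
  const payload = JSON.stringify(event);
  for (const userId of members) send(userId, payload);
}
```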

Proximity Video Calling with WebRTC

One of the most fun features was 1-on-1 video calling when two players are nearby.

Flow:

  • Client detects proximity
  • Allows “start call”
  • Signaling happens over WebSocket
  • Media flows peer-to-peer using WebRTC

This keeps latency low and server load minimal.
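The proximity check that gates the "start call" option is simple given the tile grid. The range threshold here is an assumption for illustration; Chebyshev distance is used so diagonal neighbors count as one step:

```typescript
type Point = { x: number; y: number };

// Illustrative threshold: players within 2 tiles can call each other.
const CALL_RANGE = 2;

// Chebyshev distance in tile units, so diagonals count as one step.
function areNearby(a: Point, b: Point): boolean {
  return Math.max(Math.abs(a.x - b.x), Math.abs(a.y - b.y)) <= CALL_RANGE;
}
```

Once a call is accepted, the WebSocket server only relays SDP offers/answers and ICE candidates between the two peers; the actual audio and video never touch the server.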

Admin System

Admins can create maps, create elements, and design spaces visually. This makes the world data-driven instead of hardcoded.

Hard Problems & Lessons

Some interesting challenges:

  • Smoothing movement without client prediction
  • Preventing jitter
  • Keeping Phaser rendering and server state in sync
  • Managing room lifecycles
  • Designing message protocols
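On the last point, designing message protocols: a discriminated union plus a strict parser makes malformed or hostile payloads easy to reject at the socket boundary. The message shapes below are a simplified sketch, not the project's actual wire format:

```typescript
type ClientMessage =
  | { type: "move"; x: number; y: number }
  | { type: "chat"; text: string }
  | { type: "signal"; to: string; data: unknown };

// Parse and validate a raw WebSocket frame; null means "drop it".
function parseMessage(raw: string): ClientMessage | null {
  try {
    const msg = JSON.parse(raw);
    if (msg.type === "move" && typeof msg.x === "number" && typeof msg.y === "number") {
      return { type: "move", x: msg.x, y: msg.y };
    }
    if (msg.type === "chat" && typeof msg.text === "string") {
      return { type: "chat", text: msg.text };
    }
    if (msg.type === "signal" && typeof msg.to === "string") {
      return { type: "signal", to: msg.to, data: msg.data };
    }
    return null;
  } catch {
    return null;
  }
}
```

Anything that fails validation is dropped before it can touch room state, which pairs naturally with the server-authoritative model above.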

What I’d Do If I Scaled This

  • Interest management (only send nearby players)
  • Snapshot + delta compression
  • Region-based servers
  • Dedicated SFU for voice
  • Proper matchmaking & sharding
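Interest management, the first item above, boils down to filtering each broadcast to a viewer's neighborhood instead of the whole room. A minimal sketch, with an assumed square radius in tile units:

```typescript
type Player = { userId: string; x: number; y: number };

// Only send updates about players within `radius` tiles of the viewer.
function visibleTo(viewer: Player, all: Player[], radius: number): Player[] {
  return all.filter(
    p =>
      p.userId !== viewer.userId &&
      Math.abs(p.x - viewer.x) <= radius &&
      Math.abs(p.y - viewer.y) <= radius
  );
}
```

With this in place, per-client bandwidth scales with local density rather than total room size, which is what makes large spaces feasible.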

Closing Thoughts

This project taught me more about realtime systems and multiplayer architecture than any tutorial ever could. It’s still evolving, but the core architecture is now solid and extensible.