Building a Realtime 2D Multiplayer Metaverse in the Browser
A technical breakdown of the architecture behind a browser-based 2D multiplayer world: server-authoritative movement, WebSocket state sync, Phaser rendering, and proximity video calling via WebRTC.
Overview
Over the last few months, I’ve been building a browser-based 2D multiplayer metaverse inspired by products like Gather, Zep, and Spatial. The goal was simple on paper: let multiple users join a shared space, move around, see each other in real time, chat, and even start a video call when they’re nearby.
In practice, this turned into a deep dive into realtime systems, server-authoritative movement, state synchronization, and multiplayer architecture.
This post is a technical breakdown of how I designed and built the system.
High-Level Architecture
The project is structured as a monorepo:
```
apps/web     → Next.js frontend (Phaser client)
apps/http    → REST API (auth, spaces, admin, metadata)
apps/ws      → WebSocket server (realtime multiplayer)
packages/db  → Prisma + database client
```
- HTTP server handles auth, spaces, admin actions
- WebSocket server handles presence, movement, chat, and realtime state
- Frontend uses Phaser to render the world and players
Why WebSockets + Server Authority
For a multiplayer system, I wanted:
- Low-latency updates
- No client-side cheating
- A single source of truth
So I went with WebSockets for a persistent realtime connection and server-authoritative movement.
Every movement works like this:
- Client sends intent: { x, y }
- Server validates collision + bounds
- Server updates state
- Server broadcasts the new position to everyone
This ensures there are no teleport hacks and that every client sees the same authoritative positions.
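The intent → validate → update → broadcast loop above can be sketched as a pure server-side handler. This is an illustrative sketch, not the project's actual code: the `Room` shape, `isWalkable` callback, and message names are assumptions, and coordinates here are tile coordinates.

```typescript
// Hypothetical sketch of a server-authoritative move handler.
type MoveIntent = { type: "move"; x: number; y: number };

interface Player { id: string; x: number; y: number }

interface Room {
  players: Map<string, Player>;
  // Assumed collision callback backed by the room's collision grid.
  isWalkable: (x: number, y: number) => boolean;
}

// Returns the payload to broadcast on success, or a rejection for the sender.
function handleMove(room: Room, playerId: string, intent: MoveIntent) {
  const player = room.players.get(playerId);
  if (!player) return { type: "move-rejected" as const };

  // Only single-tile steps are legal: this rejects teleports outright.
  const dist = Math.abs(intent.x - player.x) + Math.abs(intent.y - player.y);
  if (dist !== 1 || !room.isWalkable(intent.x, intent.y)) {
    return { type: "move-rejected" as const };
  }

  // The server is the single source of truth: mutate state, then broadcast.
  player.x = intent.x;
  player.y = intent.y;
  return { type: "player-moved" as const, playerId, x: intent.x, y: intent.y };
}
```

Because the server applies the move before broadcasting, every client receives the same position, and a hacked client can at worst spam invalid intents that get rejected.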
World & Collision System
Visually, the world is rendered using Phaser 3 with tilemaps.
Internally, the map uses floor/wall/foreground layers and a collision grid.
Movement is grid-based (32px steps) and the server uses the collision grid to reject invalid moves.
This separation between visual tiles and logical collision grid turned out to be critical for correctness.
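A minimal sketch of that separation, assuming a 2D array where 0 is walkable floor and 1 is a wall (the actual grid encoding in the project may differ):

```typescript
// Illustrative: mapping 32px world coordinates onto the logical collision grid.
const TILE_SIZE = 32;

// 0 = walkable floor, 1 = wall (assumed encoding, derived from the wall layer)
type CollisionGrid = number[][];

function isWalkable(grid: CollisionGrid, pixelX: number, pixelY: number): boolean {
  const col = Math.floor(pixelX / TILE_SIZE);
  const row = Math.floor(pixelY / TILE_SIZE);
  // Out-of-bounds positions count as blocked.
  if (row < 0 || row >= grid.length) return false;
  if (col < 0 || col >= grid[row].length) return false;
  return grid[row][col] === 0;
}
```

The visual layers can then change freely (decorations, foreground overlays) without ever affecting what the server considers walkable.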
Presence System & Room Management
Each “space” is a separate multiplayer room. When a user joins:
- JWT is verified
- User is fetched from DB
- Spawn point is assigned
- User is added to room state
- Presence is broadcast to others
When a user leaves:
- They’re removed from the room
- A “player left” event is broadcast
Each room maintains connected players, their positions, and their avatars.
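The join/leave bookkeeping above can be sketched as a small room-state class. The event names and `PlayerState` shape here are illustrative assumptions, not the project's actual protocol:

```typescript
// Hedged sketch of per-room presence bookkeeping.
interface PlayerState { id: string; x: number; y: number; avatar: string }

class RoomState {
  private players = new Map<string, PlayerState>();

  // The joining client receives `others`; everyone else is sent `player-joined`.
  join(player: PlayerState) {
    const others = [...this.players.values()];
    this.players.set(player.id, player);
    return { type: "player-joined" as const, player, others };
  }

  // Returns the event to broadcast, or null if the player wasn't in the room.
  leave(playerId: string) {
    if (!this.players.delete(playerId)) return null;
    return { type: "player-left" as const, playerId };
  }

  size(): number {
    return this.players.size;
  }
}
```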
Realtime Chat
Chat messages are room-scoped: the server validates each message and broadcasts it only to users in the same space. This prevents cross-room leaks and avoids blindly trusting client input.
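A sketch of that fan-out logic, under assumed shapes (the `Conn` interface, message schema, and length limit are illustrative, not the project's actual code):

```typescript
// Room-scoped chat broadcast: validate server-side, deliver only within the space.
interface Conn { userId: string; spaceId: string; send: (msg: string) => void }

const MAX_CHAT_LENGTH = 500; // assumed server-side limit

// Returns how many connections received the message (0 if rejected).
function broadcastChat(conns: Conn[], from: Conn, text: string): number {
  // Never trust the client: validate content on the server.
  const trimmed = text.trim();
  if (!trimmed || trimmed.length > MAX_CHAT_LENGTH) return 0;

  const payload = JSON.stringify({ type: "chat", from: from.userId, text: trimmed });
  let delivered = 0;
  for (const conn of conns) {
    // Room scoping: only users in the sender's space receive the message.
    if (conn.spaceId === from.spaceId) {
      conn.send(payload);
      delivered++;
    }
  }
  return delivered;
}
```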
Proximity Video Calling with WebRTC
One of the most fun features to build was 1-on-1 video calling when two players are nearby.
Flow:
- Client detects proximity
- The UI offers a “start call” action
- Signaling happens over WebSocket
- Media flows peer-to-peer using WebRTC
This keeps latency low and server load minimal.
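The server's only job in this flow is relaying signaling messages; media never touches it. A minimal sketch, with assumed message names (`call-offer`, `call-answer`, `ice-candidate` are illustrative, not necessarily the project's protocol):

```typescript
// Signaling relay over the WebSocket server: forward SDP and ICE verbatim.
type Signal =
  | { type: "call-offer"; to: string; sdp: string }
  | { type: "call-answer"; to: string; sdp: string }
  | { type: "ice-candidate"; to: string; candidate: string };

interface Peer { id: string; send: (msg: string) => void }

// Returns true if the signal was forwarded to a known, distinct peer.
function relaySignal(peers: Map<string, Peer>, fromId: string, signal: Signal): boolean {
  const target = peers.get(signal.to);
  if (!target || signal.to === fromId) return false; // unknown peer or self-call
  // Tag the sender so the callee knows who to answer.
  target.send(JSON.stringify({ ...signal, from: fromId }));
  return true;
}
```

Once the offer/answer and ICE candidates have been exchanged this way, the browsers connect directly via `RTCPeerConnection` and the WebSocket server drops back to its normal duties.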
Admin System
Admins can create maps, create elements, and design spaces visually. This makes the world data-driven instead of hardcoded.
Hard Problems & Lessons
Some interesting challenges:
- Smoothing movement without client prediction
- Preventing jitter
- Keeping Phaser rendering and server state in sync
- Managing room lifecycles
- Designing message protocols
What I’d Do If I Scaled This
- Interest management (only send nearby players)
- Snapshot + delta compression
- Region-based servers
- Dedicated SFU for voice
- Proper matchmaking & sharding
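Interest management is the simplest of these to illustrate. The project currently broadcasts every move to the whole room; a scaled version would filter recipients by distance. This is a sketch of the idea, not something implemented yet:

```typescript
// Interest management sketch: only players within `radius` receive an update.
interface Pos { id: string; x: number; y: number }

function nearbyPlayers(all: Pos[], self: Pos, radius: number): Pos[] {
  const r2 = radius * radius;
  return all.filter((p) => {
    if (p.id === self.id) return false;
    const dx = p.x - self.x;
    const dy = p.y - self.y;
    return dx * dx + dy * dy <= r2; // squared distance avoids a sqrt per player
  });
}
```

With this in place, a move broadcast goes only to `nearbyPlayers(...)` instead of the whole room, which turns per-move cost from O(room size) into O(neighborhood size).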
Closing Thoughts
This project taught me more about realtime systems and multiplayer architecture than any tutorial ever could. It’s still evolving, but the core architecture is now solid and extensible.