CRDT and Real-time Collaboration 2026 — Multi-User Sync Architecture à la Figma/Notion with Yjs, Automerge, WebSocket, and Presence/Awareness
Posted on: 4/17/2026 7:11:25 AM
Table of contents
- 1. Why real-time collaboration became default UX in 2026
- 2. The journey from Google Wave to mature CRDTs
- 3. OT vs CRDT — An in-depth comparison for technology choosers
- 4. CRDT theory — state-based vs op-based, and why YATA won
- 5. Yjs — internal architecture, shared types, and update format
- 6. Automerge 3 — JSON-first, columnar storage, and sync protocol
- 7. Production architecture — the four most common patterns in 2026
- 8. Persistence — snapshots, log compaction, and versioning
- 9. Scaling — room-based sharding, tombstone GC, and backpressure
- 10. Security — auth, room permissions, and end-to-end encryption
- 11. Six common anti-patterns in CRDT production
- 12. A 2026 go-live checklist for real-time collaboration systems
- 13. The future — AI agents as the Nth CRDT peer
- 14. Conclusion
- 15. References
1. Why real-time collaboration became default UX in 2026
Ten years ago, having a "Save" button in a SaaS product was considered normal. In 2026 the opposite is true — a product that still has a Save button feels dated. Users are used to Figma, Notion, Linear, Google Docs, Miro, and FigJam: you type — the other person sees it instantly; you drag a block — the whole meeting watches your cursor move; you go offline for ten minutes, come back, and no "conflict" dialog asks you to pick a version. Behind that experience sits a family of algorithms rooted in the early 2000s but only truly production-ready in the last five years: CRDTs — Conflict-free Replicated Data Types.
This article is an in-depth handbook for engineers building or evaluating a real collaboration system. We'll cover four layers: theory (what a CRDT is and how it differs from Operational Transformation), implementation (Yjs and Automerge 3 — the two libraries dominating the market), backend architecture (WebSocket transport, presence/awareness, persistence, scaling), and finally the anti-patterns plus a go-live checklist for teams choosing technology in 2026.
Real-time isn't just chat
Three layers of "real-time" are often conflated: broadcast (chat, notifications — handled well by SignalR/Socket.io), shared state (presence, cursors, "who's looking with me" — Liveblocks/Phoenix Channels), and collaborative documents (text, JSON, drawing — Yjs/Automerge). This article focuses on the third layer, which takes the most work but also creates the clearest product differentiation.
2. The journey from Google Wave to mature CRDTs
To see why CRDTs win many 2026 use cases over OT, you need the 25-year arc. Many design decisions in Yjs and Automerge are direct reactions to failures of earlier systems.
3. OT vs CRDT — An in-depth comparison for technology choosers
This is the first and most important architectural decision. Don't trust the "CRDTs are always better" claim — Google Docs still uses OT, Quip uses OT, Etherpad uses OT. CRDTs win some problems, OT wins others. The table below is an honest comparison based on real production experience.
| Criterion | Operational Transformation (OT) | CRDT (Yjs / Automerge) |
|---|---|---|
| Ordering arbiter | Central server required | None needed (peer-to-peer feasible) |
| Offline editing | Hard — must re-transform on reconnect | Easy — merges naturally on reconnect |
| Document memory | Only the current snapshot | Needs metadata (tombstones, logical timestamps) |
| Algorithmic complexity | High (transform function hard to get right for rich text) | Moderate (op + merge rules well-defined) |
| Rich text formatting | Quill OT, ShareDB OT are mature | Yjs Y.XmlFragment, Automerge Rich Text recently stabilized |
| Per-user undo/redo | Needs complex custom logic | Yjs UndoManager built in |
| Peak throughput | High with a well-tuned server (Google Docs level) | High, but needs tombstone GC |
| Ease of reasoning about correctness | Hard — transform property is tricky to verify | Easier — mathematical proofs of convergence exist |
| Strongest use case | Server-centric, online-only document (Google Docs) | Local-first, offline-capable, peer-to-peer (Linear, Figma) |
| Weakest use case | Mobile offline, peer-to-peer | Very large documents (>100 MB) — tombstones balloon |
Quick decision rule
If your product needs (1) offline-first, (2) mobile, (3) wants to reuse open-source editors (Tiptap, Slate, Lexical, ProseMirror), or (4) has a future peer-to-peer need — pick CRDT. If you need (1) online-only, (2) heavy engineering resources, (3) an existing long-tenured OT team, or (4) very large documents with few concurrent ops — OT is still a safe bet. In 2026, the default for new teams is CRDT.
4. CRDT theory — state-based vs op-based, and why YATA won
CRDTs come in two main families. Understanding the difference helps you read Yjs or Automerge source without getting lost.
4.1. State-based CRDTs (CvRDT — Convergent)
Each replica holds full state and defines a merge function that must have three properties: commutative (a+b = b+a), associative ((a+b)+c = a+(b+c)), and idempotent (a+a = a). If all three hold, every replica merging states in any order reaches the same result — that's eventual consistency.
Classic example: the G-Counter (grow-only counter). Each replica keeps a map {nodeId: localCount}. The counter value is the sum of all localCount values. Merge is element-wise max. Property: if two replicas increment simultaneously then sync, the result is always the correct total.
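A minimal sketch of that idea in JavaScript (illustrative only; the function names are mine, not a library API):
// Illustrative G-Counter: a state-based CRDT whose merge is element-wise max
function increment(counter, nodeId) {
  return { ...counter, [nodeId]: (counter[nodeId] ?? 0) + 1 }
}
function value(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0)
}
// merge is commutative, associative, and idempotent
function merge(a, b) {
  const result = { ...a }
  for (const [nodeId, count] of Object.entries(b)) {
    result[nodeId] = Math.max(result[nodeId] ?? 0, count)
  }
  return result
}
// Two replicas increment concurrently, then sync; the total is never lost
const a = increment({}, 'nodeA')   // { nodeA: 1 }
const b = increment({}, 'nodeB')   // { nodeB: 1 }
console.log(value(merge(a, b)))    // 2, regardless of merge order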
Upside of state-based: simple, no causal ordering needed. Downside: you must send the whole state each sync — impractical for large documents. That's why production rarely uses pure state-based CRDTs for text/JSON documents.
4.2. Operation-based CRDTs (CmRDT — Commutative)
Replicas send operations instead of state. Requirements: concurrent ops must commute (applying them in either order gives the same result), and the transport must deliver each op reliably, exactly once, and in causal order (parent ops arrive before child ops).
Example: the OR-Set (observed-remove set): when adding an element, tag it with a unique id; when removing, record which ids have been removed. Concurrent add and remove of the same element resolves to add-wins (remove only clears ids it has observed).
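A minimal add-wins OR-Set sketch (illustrative only; names are mine, not a library API). The add wins because a concurrent remove on another replica has not observed the new tag:
// Illustrative OR-Set: adds carry unique tags, removes record observed tags
function makeORSet() {
  return { added: new Map(), removed: new Set() } // added: tag -> element
}
function add(set, element, tag) {
  set.added.set(tag, element)                     // tag must be globally unique, e.g. nodeId + counter
}
function remove(set, element) {
  for (const [tag, el] of set.added) {
    if (el === element) set.removed.add(tag)      // only tags observed locally are removed
  }
}
function has(set, element) {
  for (const [tag, el] of set.added) {
    if (el === element && !set.removed.has(tag)) return true
  }
  return false
}
function merge(a, b) {
  const out = makeORSet()
  for (const [tag, el] of [...a.added, ...b.added]) out.added.set(tag, el)
  for (const tag of [...a.removed, ...b.removed]) out.removed.add(tag)
  return out
}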
Op-based is more bandwidth-efficient but demands a stronger transport layer. Yjs and Automerge are both op-based with an optimization: the op log is compressed into binary updates that can be repackaged as "snapshots" or "deltas".
4.3. List/Text CRDTs — YATA (Yjs) and RGA (Automerge)
The hardest list CRDT problem: two users both insert a character at position 5 — who wins? Indexes (numeric) don't work (they shift after inserts). The solution: assign each character a stable identifier (ID = nodeId + clock), describe an insert as "insert X to the right of Y", then use a tie-breaking rule when both end up at the same position.
graph LR
subgraph U1["User A types "X" after "He""]
A1["H"] --> A2["e"] --> A3["X"]
end
subgraph U2["User B types "Y" after "He""]
B1["H"] --> B2["e"] --> B3["Y"]
end
subgraph MERGE["After merge — YATA tie-breaks by clientID"]
M1["H"] --> M2["e"] --> M3["X (A.5)"] --> M4["Y (B.7)"]
end
style A3 fill:#e94560,color:#fff
style B3 fill:#4CAF50,color:#fff
style M3 fill:#e94560,color:#fff
style M4 fill:#4CAF50,color:#fff
Yjs's YATA is simpler than RGA: each item has origin (the ID of the character to the left at creation), rightOrigin (the character to the right at creation), and tie-breaks by (clientID, clock). On merge, the new item is "nested" between origin and rightOrigin by a deterministic rule. Efficiency: O(N) for normal inserts, with index hashing optimizable toward O(1).
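To make the data shape concrete, here is a heavily simplified sketch of what a list item carries (conceptual only; the real Yjs Item class and its integration rule are more involved):
// Conceptual sketch of a YATA list item (not the actual Yjs internals)
const item = {
  id: { client: 5, clock: 12 },          // stable identity: (clientID, clock)
  origin: { client: 5, clock: 11 },      // id of the item to the left at creation time
  rightOrigin: { client: 3, clock: 40 }, // id of the item to the right at creation time
  content: 'X',
  deleted: false                         // deletion only flips a tombstone flag
}
// Two concurrent items that share the same origin are ordered deterministically,
// e.g. by comparing the (clientID, clock) of their ids
const comesFirst = (a, b) =>
  a.id.client < b.id.client || (a.id.client === b.id.client && a.id.clock < b.id.clock)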
5. Yjs — internal architecture, shared types, and update format
Yjs is the most popular text CRDT in 2026. It's not an editor and has no UI — it's a shared data model: you structure your data with Y.Map, Y.Array, Y.Text, and Y.XmlFragment, and every change automatically syncs with every other peer.
graph TB
subgraph CLIENT["Yjs Client (Browser/Node)"]
DOC["Y.Doc
(root container)"]
TYPES["Shared Types
Y.Text / Y.Array / Y.Map / Y.XmlFragment"]
STORE["DocStore
(Item list, indexed by clientID)"]
ENCODER["Update Encoder
(binary, lib0)"]
AWARE["Awareness Protocol
(presence, cursor, user)"]
end
subgraph TRANSPORT["Provider (transport agnostic)"]
WS["y-websocket"]
WEBRTC["y-webrtc"]
REDIS["y-redis"]
IDB["y-indexeddb (persistence)"]
end
subgraph BACKEND["Backend"]
SYNCSERVER["Sync Server
(broadcasts updates)"]
DB[("Persistence
Postgres / S3 / LevelDB")]
PUBSUB["Redis Pub/Sub
(cross-node)"]
end
DOC --> TYPES --> STORE --> ENCODER
DOC --> AWARE
ENCODER --> WS
ENCODER --> WEBRTC
ENCODER --> REDIS
ENCODER --> IDB
AWARE --> WS
WS --> SYNCSERVER
SYNCSERVER --> DB
SYNCSERVER --> PUBSUB
PUBSUB --> SYNCSERVER
style DOC fill:#e94560,color:#fff
style ENCODER fill:#e94560,color:#fff
style SYNCSERVER fill:#2c3e50,color:#fff
5.1. Shared types and composability
You can nest shared types inside each other: Y.Map<string, Y.Array<Y.Map>> describes a complete Trello board — a map of columns → array of cards → map of card fields. Each sub-tree change is encoded as a minimal update, no need to rebroadcast the whole board.
// Structure of a Notion-like document
import * as Y from 'yjs'
const doc = new Y.Doc()
const blocks = doc.getArray('blocks')
const heading = new Y.Map()
heading.set('type', 'heading')
heading.set('text', new Y.Text('CRDT 2026'))
blocks.push([heading])
const paragraph = new Y.Map()
paragraph.set('type', 'paragraph')
paragraph.set('text', new Y.Text('Hello collaborative world'))
blocks.push([paragraph])
// Every other user will automatically see these 2 blocks after sync
5.2. Binary update format and sync protocol
A Yjs update is tightly optimized binary: VarInt for numbers, dictionary encoding for repeated characters, run-length encoding for consecutive ids. A 1,000-character paragraph typed sequentially compresses to ~150 bytes of update because consecutive IDs get run-length-encoded into a single range.
The sync protocol has two steps (sync step 1 and step 2): the client sends a state vector (a map clientID → max clock seen), and the server returns a diff update (only the ops the client doesn't have). This is why Yjs syncs fast even with large documents: client A has 1 MB of state, reopens after 5 minutes offline, and sync costs only a few KB if there weren't many changes.
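In code, the two steps map onto three real Yjs calls. Here clientDoc and serverDoc stand for whichever Y.Doc instances sit on each side; transport wiring is omitted:
// Diff sync with state vectors
import * as Y from 'yjs'
// Sync step 1: the client describes what it already has
const stateVector = Y.encodeStateVector(clientDoc)
// Sync step 2: the server encodes only the ops the client is missing
const diff = Y.encodeStateAsUpdate(serverDoc, stateVector)
// The client applies the diff; the opposite direction works symmetrically
Y.applyUpdate(clientDoc, diff)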
Tombstones never truly disappear
When you delete a character, Yjs doesn't really delete it — it marks it deleted. The tombstone keeps the ID so late-arriving ops can still anchor correctly. A heavily edited document can balloon over time. Production strategy: periodically snapshot with Y.encodeStateAsUpdate(doc) to produce a new update that only contains the current state; old unneeded tombstones get compressed.
5.3. The awareness protocol — presence and cursors
Awareness is a concept separate from the document: it's ephemeral state (cursor position, selection range, "user X is viewing"). Not persisted, no tombstones, expires after ~30 seconds without a heartbeat.
// Presence on the client
import { Awareness } from 'y-protocols/awareness'
const awareness = new Awareness(doc)
awareness.setLocalStateField('user', { name: 'Anh Tu', color: '#e94560' })
awareness.setLocalStateField('cursor', { anchor: 120, head: 145 })
awareness.on('change', () => {
  for (const [clientId, state] of awareness.getStates()) {
    if (clientId === doc.clientID) continue // skip our own local state
    if (!state.cursor) continue // this peer has not published a cursor yet
    renderRemoteCursor(clientId, state.user, state.cursor) // app-specific rendering
  }
})
6. Automerge 3 — JSON-first, columnar storage, and sync protocol
Automerge 3 is Yjs's main rival. Different philosophy: Yjs prioritizes text editors, Automerge prioritizes arbitrary JSON documents. If your app isn't an editor but structured data (kanban board, todo list, config sync), Automerge feels more like "just a JSON object".
| Criterion | Yjs | Automerge 3 |
|---|---|---|
| Core language | JavaScript (with C++/Rust ports) | Rust (browser via WASM) |
| API style | Shared types (Y.Map, Y.Text, ...) | JSON proxy + change function |
| Text performance | Best in benchmarks | On par since v3; still slightly slower |
| Arbitrary JSON nesting | Possible but requires declaration | Natural like a regular object |
| Storage format | Binary update list | Columnar binary (better compression) |
| Sync protocol | State vector exchange | Heads-based + bloom filter |
| Multi-language | JavaScript primary, Rust port (yrs) | Rust core, JS/Python/Swift bindings official |
| Editor ecosystem | Tiptap, Slate, ProseMirror, Quill, Lexical, Monaco, CodeMirror | Custom integration needed for most editors |
| When to pick | Rich text editor is the core (Notion, Linear) | Arbitrary JSON documents, native mobile, multi-language stack |
// Automerge 3 — feels like a regular JSON object
import { next as Automerge } from '@automerge/automerge'
let doc = Automerge.from({
todos: [],
filter: 'all'
})
doc = Automerge.change(doc, d => {
d.todos.push({ id: 1, text: 'Learn CRDT', done: false })
d.todos.push({ id: 2, text: 'Refactor backend', done: false })
})
// Sync with other peers
const sync = Automerge.initSyncState()
// generateSyncMessage returns the updated sync state and the message to send
// (message is null when the peer already has everything)
const [nextSync, message] = Automerge.generateSyncMessage(doc, sync)
// send message over WebSocket / HTTP / any transport
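The receiving peer feeds the message back in with receiveSyncMessage and then calls generateSyncMessage again; the loop ends when both sides produce a null message. incomingMessage below stands for whatever bytes your transport delivered:
// Receiving side of the Automerge sync loop
const [updatedDoc, updatedSync] = Automerge.receiveSyncMessage(doc, sync, incomingMessage)
// Repeat generateSyncMessage with updatedDoc/updatedSync until it returns a null message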
7. Production architecture — the four most common patterns in 2026
The client code is the easy part. The backend is where 90% of production bugs happen. There are four architectural patterns to choose between, each with clear trade-offs.
7.1. Pattern A — Monolithic WebSocket node keeping state in RAM
Each document is "pinned" to a single node. Clients connect to that node via WebSocket. The node keeps the Y.Doc in memory and broadcasts updates between clients on the same node. Periodic snapshotting to disk (every 30 s).
graph LR
C1["Client 1"] --> WS1["WS Node A
(Y.Doc room1)"]
C2["Client 2"] --> WS1
C3["Client 3"] --> WS2["WS Node B
(Y.Doc room2)"]
LB["Load Balancer
(sticky by roomId)"] --> WS1
LB --> WS2
WS1 --> DB[("Snapshot Store
S3 / Postgres")]
WS2 --> DB
style WS1 fill:#e94560,color:#fff
style WS2 fill:#e94560,color:#fff
Fits: under 100k concurrent users, under 10k concurrent rooms, moderate document size. Problems: a node restart loses presence, horizontal scaling needs sticky sessions, cold start is slow when loading documents from disk.
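A minimal sketch of Pattern A, assuming clients send raw binary Yjs updates rather than the full y-websocket protocol (production setups normally use an off-the-shelf server such as y-websocket's or Hocuspocus instead):
// Pattern A sketch: one in-memory Y.Doc per room, relay to peers, periodic snapshot
import WebSocket, { WebSocketServer } from 'ws'
import * as Y from 'yjs'
import { writeFile } from 'node:fs/promises'
const rooms = new Map() // roomId -> { doc: Y.Doc, sockets: Set<WebSocket> }
function getRoom(roomId) {
  if (!rooms.has(roomId)) rooms.set(roomId, { doc: new Y.Doc(), sockets: new Set() })
  return rooms.get(roomId)
}
const wss = new WebSocketServer({ port: 4000 })
wss.on('connection', (ws, req) => {
  const roomId = new URL(req.url, 'http://localhost').searchParams.get('room')
  const room = getRoom(roomId)
  room.sockets.add(ws)
  // Send the full current state to the newcomer (a real setup would diff by state vector)
  ws.send(Y.encodeStateAsUpdate(room.doc))
  ws.on('message', (data) => {
    const update = new Uint8Array(data)
    Y.applyUpdate(room.doc, update) // keep the server copy current
    for (const peer of room.sockets) {
      if (peer !== ws && peer.readyState === WebSocket.OPEN) peer.send(update)
    }
  })
  ws.on('close', () => room.sockets.delete(ws))
})
// Naive periodic snapshot every 30 s (assumes the snapshots/ directory exists)
setInterval(() => {
  for (const [roomId, room] of rooms) {
    writeFile(`snapshots/${roomId}.bin`, Y.encodeStateAsUpdate(room.doc)).catch(console.error)
  }
}, 30_000)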
7.2. Pattern B — Stateless WebSocket nodes + Redis pub/sub
No WebSocket node "owns" a fixed room. Updates arriving at a node are decoded → pushed through a Redis pub/sub channel doc:{roomId} → every node subscribed to the channel receives it and broadcasts to its own clients. Document state lives in Redis (or a leader node via Raft).
graph TB
subgraph CLIENTS["Clients"]
C1["Client 1"]
C2["Client 2"]
C3["Client 3"]
C4["Client 4"]
end
subgraph NODES["WebSocket Nodes (stateless)"]
N1["Node A"]
N2["Node B"]
N3["Node C"]
end
subgraph SHARED["Shared State"]
REDIS[("Redis
Pub/Sub + Stream
doc:{roomId}")]
BLOB[("Persistence
Postgres / S3
snapshot + log")]
end
C1 --> N1
C2 --> N2
C3 --> N2
C4 --> N3
N1 <--> REDIS
N2 <--> REDIS
N3 <--> REDIS
REDIS --> BLOB
style REDIS fill:#e94560,color:#fff
style BLOB fill:#2c3e50,color:#fff
Fits: 100k+ concurrent users, Kubernetes with many nodes, need to restart nodes without breaking clients. Problems: Redis becomes a SPOF (need Cluster/Sentinel), high Redis bandwidth cost without filtering, document state needs a leader-based mechanism to avoid write conflicts.
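A sketch of the Pattern B fan-out, assuming ioredis; the channel naming follows the doc:{roomId} convention above and the helper names are illustrative:
// Each node republishes incoming updates to Redis so every other node can forward them
import Redis from 'ioredis'
const pub = new Redis()
const sub = new Redis() // subscriber connections must be dedicated
const localSockets = new Map() // roomId -> Set<WebSocket> connected to THIS node
export async function joinRoom(roomId, ws) {
  if (!localSockets.has(roomId)) {
    localSockets.set(roomId, new Set())
    await sub.subscribe(`doc:${roomId}`) // subscribe lazily, once per room per node
  }
  localSockets.get(roomId).add(ws)
}
export function onClientUpdate(roomId, update) {
  // update is a binary Yjs update received from a client on this node
  pub.publish(`doc:${roomId}`, Buffer.from(update))
}
// Binary-safe delivery from Redis to the local clients of that room
sub.on('messageBuffer', (channel, message) => {
  const roomId = channel.toString().slice('doc:'.length)
  for (const ws of localSockets.get(roomId) ?? []) {
    ws.send(message)
  }
})
// Note: the publishing node also receives its own message; echoing an update back to
// its author is harmless for Yjs, but tagging messages with an origin id saves bandwidth.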
7.3. Pattern C — Actor model (Orleans / Erlang / Cloudflare Durable Objects)
Each room is an actor (a grain in Orleans, a GenServer in Phoenix, a Durable Object in Cloudflare Workers). The actor system guarantees single-writer per room — no race conditions. Clients are routed to the right actor; the actor holds state in RAM and persists asynchronously.
Cloudflare Durable Objects is the most polished implementation for the web today: each document = one Durable Object, running at the edge near users, persisting to Cloudflare's SSD storage. Liveblocks and PartyKit are built on similar ideas.
Fits: global apps that need low latency, teams fine with platform lock-in. Problems: higher cost than Pattern B, harder to debug without actor-model familiarity.
7.4. Pattern D — Managed service (Liveblocks / PartyKit / Pusher)
You don't build the backend. Liveblocks handles WebSocket, persistence, auth, and presence. You pay USD/month based on MAU. Clear trade-off: fast launch and low engineering effort, but vendor lock-in and a bill that grows linearly with usage.
Pattern selection rule
Startup at idea-validation stage → Pattern D (Liveblocks). After Series A, MAU > 100k → migrate to Pattern B (Redis). Strong budget and team → Pattern C (Durable Objects/Orleans). Pattern A should be used only for prototypes or internal tools under 1,000 users.
8. Persistence — snapshots, log compaction, and versioning
A common mistake: persist every Yjs update straight into Postgres. After a month, the doc_updates table has 50 million rows and loading a document takes 10 seconds. The right approach combines an append-only log with periodic snapshots.
graph LR
UPD["Update arrives
(binary, ~100B)"] --> APPEND["Append to
log table"]
APPEND --> CHECK{Log size
>= threshold?}
CHECK -->|No| END1[Done]
CHECK -->|Yes| MERGE["Apply all updates
into an in-memory Y.Doc"]
MERGE --> SNAP["Y.encodeStateAsUpdate
=> binary snapshot"]
SNAP --> WRITE["Write snapshot to
doc_snapshot table/S3"]
WRITE --> DELETE["Delete old log entries
(before the snapshot)"]
DELETE --> END2[Done]
style MERGE fill:#e94560,color:#fff
style SNAP fill:#4CAF50,color:#fff
Suggested Postgres schema:
-- Append-only log, very fast to write
CREATE TABLE doc_update (
id BIGSERIAL PRIMARY KEY,
doc_id UUID NOT NULL,
update_data BYTEA NOT NULL, -- Yjs binary update
client_id BIGINT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_update_doc ON doc_update(doc_id, id);
-- Periodic snapshot — fast to load
CREATE TABLE doc_snapshot (
doc_id UUID PRIMARY KEY,
snapshot BYTEA NOT NULL, -- encodeStateAsUpdate
last_update_id BIGINT NOT NULL, -- final log id included in the snapshot
state_vector BYTEA NOT NULL, -- for diff sync
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- On load: doc_snapshot.snapshot + doc_update WHERE id > last_update_id
Common thresholds: snapshot every 100 updates or 1 MB of accumulated log size. Snapshots bigger than 5 MB should move to S3 with the URL stored in Postgres.
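A sketch of the load-then-compact flow from the diagram, using real Yjs APIs. Here db is a placeholder for your data-access layer; the queries mirror the schema above:
// Load = snapshot + log tail; compact = rebuild snapshot, then trim the log
import * as Y from 'yjs'
export async function loadDoc(docId) {
  const doc = new Y.Doc()
  const [snap] = await db.query(
    'SELECT snapshot, last_update_id FROM doc_snapshot WHERE doc_id = $1', [docId])
  if (snap) Y.applyUpdate(doc, snap.snapshot)
  const tail = await db.query(
    'SELECT id, update_data FROM doc_update WHERE doc_id = $1 AND id > $2 ORDER BY id',
    [docId, snap ? snap.last_update_id : 0])
  for (const row of tail) Y.applyUpdate(doc, row.update_data)
  const lastUpdateId = tail.length ? tail[tail.length - 1].id : (snap ? snap.last_update_id : 0)
  return { doc, lastUpdateId }
}
export async function compactDoc(docId) {
  const { doc, lastUpdateId } = await loadDoc(docId)
  const snapshot = Y.encodeStateAsUpdate(doc)  // one update containing the whole current state
  const stateVector = Y.encodeStateVector(doc) // stored so future syncs can diff cheaply
  await db.query(
    `INSERT INTO doc_snapshot (doc_id, snapshot, last_update_id, state_vector, updated_at)
     VALUES ($1, $2, $3, $4, NOW())
     ON CONFLICT (doc_id) DO UPDATE
       SET snapshot = $2, last_update_id = $3, state_vector = $4, updated_at = NOW()`,
    [docId, Buffer.from(snapshot), lastUpdateId, Buffer.from(stateVector)])
  // Only now is it safe to drop the log entries the snapshot already covers
  await db.query('DELETE FROM doc_update WHERE doc_id = $1 AND id <= $2', [docId, lastUpdateId])
}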
8.1. Versioning and time travel
Yjs supports snapshots that can be "frozen" into versions and then diffed. Y.snapshot(doc) returns a small object containing the state vector plus the delete set; combined with the update log (and a document created with garbage collection disabled), the content can be reconstructed at any historical point. This is the same idea behind Notion's "Page History" and Figma's "Version History".
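In code, using real Yjs APIs (note that restoring from a snapshot requires the source doc to keep tombstones, i.e. gc disabled):
// Version tagging and time travel with Yjs snapshots
import * as Y from 'yjs'
const doc = new Y.Doc({ gc: false })
doc.getText('content').insert(0, 'v1 of the page')
// Freeze a named version: a snapshot is just a state vector plus a delete set
const version1 = Y.snapshot(doc)
const stored = Y.encodeSnapshot(version1) // small binary blob, safe to persist
// ... more edits happen ...
doc.getText('content').insert(0, 'v2 prefix: ')
// Reconstruct the document as it looked at version1
const restored = Y.createDocFromSnapshot(doc, Y.decodeSnapshot(stored))
console.log(restored.getText('content').toString()) // 'v1 of the page'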
9. Scaling — room-based sharding, tombstone GC, and backpressure
9.1. Room-based sharding
Real-time collaboration documents don't need global consistency — only per-room consistency. This is a beautiful property for scaling: you fully shard by roomId. Each shard can be a consumer group, a Redis cluster, or a Durable Object.
Suggested approach: a naive shard = hash(roomId) % N (where N is the WebSocket node count) remaps most rooms every time N changes. Use real consistent hashing (a hash ring) or rendezvous hashing instead, so that when N changes during auto-scale only roughly 1/N of the rooms migrate; a minimal sketch follows.
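A small rendezvous (highest-random-weight) hashing sketch: every room scores every node, the highest score wins, and adding or removing a node only moves the rooms whose winner changed. Any stable hash works; SHA-256 is used here for illustration:
// Rendezvous hashing for room-based sharding
import { createHash } from 'node:crypto'
function score(roomId, nodeId) {
  const digest = createHash('sha256').update(`${roomId}:${nodeId}`).digest()
  return digest.readUInt32BE(0) // first 4 bytes as an unsigned score
}
export function pickNode(roomId, nodeIds) {
  let best = nodeIds[0]
  for (const node of nodeIds) {
    if (score(roomId, node) > score(roomId, best)) best = node
  }
  return best
}
// pickNode('room-42', ['ws-node-a', 'ws-node-b', 'ws-node-c'])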
9.2. Tombstone GC
The longer a document lives, the more tombstones it accumulates. Yjs doesn't auto-GC because late ops still need anchors. The pragmatic approach: periodically create a "compaction snapshot" — not fully deleting tombstones but packing them into a single block. Mature production stacks use Yjs document v2 (in preview in 2026), which supports "permanent delete" after a safe-time threshold (>1 day = no more delayed ops possible).
9.3. Backpressure when users type too fast
An auto-typing keyboard at 100 chars/second generates 100 updates/second. Multiplied by 50 users in the same room, that's 5,000 messages/second to broadcast. Backpressure patterns:
- Client debounce: a Yjs transaction (doc.transact) batches many changes into a single update.
- Server batching: wait 50 ms and merge all incoming updates into one broadcast (see the sketch after this list).
- Drop awareness: cursor updates can be dropped if a client can't keep up — nothing is persisted, no harm done.
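A server-side batching sketch: buffer updates per room for 50 ms, then collapse them into a single broadcast with Y.mergeUpdates (a real Yjs API). The broadcast callback and helper names are placeholders for your own transport layer:
// 50 ms batching window per room
import * as Y from 'yjs'
const pending = new Map() // roomId -> { updates: Uint8Array[], timer }
export function enqueueUpdate(roomId, update, broadcast) {
  let entry = pending.get(roomId)
  if (!entry) {
    entry = { updates: [], timer: null }
    pending.set(roomId, entry)
  }
  entry.updates.push(update)
  if (!entry.timer) {
    entry.timer = setTimeout(() => {
      const merged = Y.mergeUpdates(entry.updates) // one binary update instead of N
      pending.delete(roomId)
      broadcast(roomId, merged) // broadcast is supplied by the caller
    }, 50)
  }
}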
10. Security — auth, room permissions, and end-to-end encryption
Before CRDTs enter the picture, a plain WebSocket server already has two familiar auth problems: who can connect, and once connected, who can join which room. With CRDTs, a third appears: who's allowed to apply which update.
10.1. JWT handshake
The browser WebSocket API can't attach custom headers to the upgrade request. Two common approaches: send the JWT in the query string on connect (wss://server/yjs?token=xxx), or rely on a cookie that rides along with the handshake. The server verifies the token, attaches the userId to the connection state, and every subsequent message is checked against that userId.
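A sketch of the query-string variant, assuming the jsonwebtoken package; the secret, claim names, and close code are placeholders:
// Verify the JWT during the WebSocket handshake, then tag the connection
import jwt from 'jsonwebtoken'
import { WebSocketServer } from 'ws'
const wss = new WebSocketServer({ port: 4000 })
wss.on('connection', (ws, req) => {
  const token = new URL(req.url, 'http://localhost').searchParams.get('token')
  try {
    const claims = jwt.verify(token, process.env.JWT_SECRET)
    ws.userId = claims.sub // attach identity to the connection state
  } catch {
    ws.close(4401, 'invalid token') // application-defined close code for "unauthorized"
    return
  }
  ws.on('message', (data) => {
    // every subsequent message is checked against ws.userId (room permission lookup, quotas, ...)
  })
})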
10.2. Room permissions
When a client subscribes to room doc:{roomId}, the server checks whether userId has access. Cache permissions in Redis with a 60 s TTL to avoid hitting the DB on every message. When permissions change (an admin revokes), publish an event permission:revoked:{userId}:{roomId} so every node disconnects the relevant connections.
10.3. End-to-end encryption with CRDTs
This is a big advantage of CRDTs: because merging is deterministic and the server doesn't need to understand content, you can encrypt updates on the client with a key the server doesn't know. The server just relays binary blobs. Common pattern: a room key derived from a shared password, each Yjs update encrypted with AES-GCM before going over the WebSocket.
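A client-side sketch of that pattern using the Web Crypto API; key derivation (e.g. PBKDF2/HKDF from a shared passphrase) is assumed to happen elsewhere and produce a CryptoKey:
// Encrypt/decrypt a Yjs update with AES-GCM before it crosses the WebSocket
async function encryptUpdate(update, roomKey) {
  const iv = crypto.getRandomValues(new Uint8Array(12)) // fresh IV per message
  const ciphertext = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, roomKey, update)
  // Prepend the IV so the receiver can decrypt; the server only ever sees this blob
  const out = new Uint8Array(12 + ciphertext.byteLength)
  out.set(iv, 0)
  out.set(new Uint8Array(ciphertext), 12)
  return out
}
async function decryptUpdate(blob, roomKey) {
  const iv = blob.slice(0, 12)
  const ciphertext = blob.slice(12)
  const plain = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, roomKey, ciphertext)
  return new Uint8Array(plain) // feed this into Y.applyUpdate
}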
E2EE trades off server-side awareness
When you encrypt updates, the server can't run content-based logic (search, mention notifications, full-text indexing). Every such feature must move to the client or use a delegated relay that can decrypt. Weigh carefully before going E2EE.
11. Six common anti-patterns in CRDT production
- Persisting every update straight into Postgres without snapshots. The table balloons and document loading slows. Ship snapshot + log compaction from day one.
- Forgetting to broadcast updates over Redis pub/sub when horizontally scaling. User on node A types, user on node B sees nothing. Test with a multi-instance load balancer from the start.
- Sticky sessions that last too long. A user disconnects, reconnects to another node, and waits 5 seconds for the document to load from the DB. Pattern B (stateless + Redis) avoids this.
- Not debouncing updates. A 50 KB paste produces 50,000 tiny ops instead of one transaction. Always wrap bulk changes in doc.transact(() => ...).
- Awareness leak. Disconnecting without cleaning up awareness state means users see "ghost" cursors for people who've left. Handle cleanup in the onClose handler.
- No per-room quotas. One user sending a 1 MB text over WebSocket can OOM a node. Impose message-size limits (e.g. 256 KB), per-user connection caps (10), and document-size caps (50 MB).
12. A 2026 go-live checklist for real-time collaboration systems
| Item | Requirement |
|---|---|
| Choose the CRDT | Benchmarked Yjs vs Automerge on a real sample document (10 MB, 50 users, 1,000 ops/s) |
| Editor integration | Tiptap/ProseMirror/Slate/Lexical selected, Yjs plugins verified for every needed rich-text feature |
| Transport | WebSocket with 30 s ping/pong heartbeats, exponential-backoff reconnect, long-polling fallback when proxies block WS |
| Persistence | Append log + snapshot every 100 updates, large snapshots in S3, an offline log-compaction script |
| Scaling | Stateless WS nodes + Redis pub/sub, room-based sharding, auto-scale on CPU + connection count |
| Auth | JWT in query string, refresh before expiry, room permission cache TTL 60 s |
| Awareness | Cursor + selection broadcast, 30 s expiry, cleanup on disconnect |
| Backpressure | 50 ms server debounce, drop cursors under overload, 256 KB message size limit |
| Observability | OpenTelemetry traces for every sync round-trip, metrics: connection count, doc size, updates/s, snapshot lag |
| Disaster recovery | Hourly snapshot backups, replay log from S3 within the last 24 hours, RPO < 1 minute |
| Versioning | Time-travel UI for users, "named version" tagged snapshots, fork from an older version |
| Testing | Load test with 10k concurrent WS connections, chaos testing (kill random nodes, network partitions), property-based tests for merge convergence |
13. The future — AI agents as the Nth CRDT peer
2026 brings a new perspective: if a human user is a CRDT peer, why can't an AI agent be one? ElectricSQL's "AI agents as CRDT peers" post argues this is far more natural than designing a bespoke RPC protocol for agents writing into documents.
Concretely: a Claude agent generates a paragraph and applies it to Y.Text just like the user typing. If the user is typing simultaneously, Yjs merges automatically — no "AI overrides user" or "user overrides AI". Both coexist as equal peers. This is the pattern Notion AI and Linear AI are adopting, and it's the cleanest path for generative agents inside multi-user documents.
When designing a new collaboration system in 2026, plan for three peer types from the start: human, AI agent, and integration bot. All three write through the same CRDT layer with the same permission model. That's the durable architecture for the next decade.
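A minimal sketch of an agent writing as a peer, reusing the doc from the earlier Yjs examples; generateParagraph is a placeholder for whatever model call you use, and the origin tag lets the UI attribute the change:
// An AI agent edits the same Y.Text the humans are editing
const text = doc.getText('content')
async function agentAppend(prompt) {
  const paragraph = await generateParagraph(prompt) // placeholder for the model call
  doc.transact(() => {
    text.insert(text.length, '\n' + paragraph)      // merges with concurrent human edits
  }, 'ai-agent')                                    // transaction origin identifies the peer type
}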
14. Conclusion
Real-time collaboration is no longer a "nice to have" SaaS feature in 2026 — it's the baseline expectation. CRDTs have solved the hardest part (merge convergence) through mathematics, leaving engineers with the practical parts: pick the right library (Yjs for editors, Automerge for arbitrary JSON), design the right backend (Pattern B for scale, Pattern D for quick launch), persist correctly (snapshot + log), and avoid the easy anti-patterns.
Don't wait six months post-launch to retrofit collaboration — the migration cost later is always 5-10× higher than building it in from the start. Also don't over-engineer: a three-person startup doesn't need Pattern C on day one; a $99/month Liveblocks subscription is enough to validate the idea before investing in a backend.
Three questions to answer before starting: (1) Is your document mostly text or JSON? (chooses Yjs vs Automerge). (2) Do you need offline-first or online-only? (chooses CRDT vs OT). (3) What's your expected MAU in the next 12 months? (chooses the backend pattern). With those three answers, every remaining technical decision becomes clear.
15. References
- Yjs — Shared data types for building collaborative software (GitHub)
- Yjs Documentation — Introduction and shared types
- y-websocket — WebSocket connector and awareness protocol
- Automerge — JSON CRDT library (GitHub)
- CRDT Benchmarks — Yjs, Automerge, and ShareDB compared
- crdt.tech — Papers, implementations, and CRDT resources
- Local-first software (Ink & Switch) — foundational local-first philosophy
- How CRDTs make multiplayer text editing part of Zed's DNA — Zed Blog
- Architectures for Central Server Collaboration — Matthew Weidner
- AI agents as CRDT peers — building collaborative AI with Yjs (ElectricSQL)
- Liveblocks Yjs — hosted CRDT infrastructure
- PartyKit Documentation — edge multiplayer platform