kiwicopple 6 months ago

A couple of years ago we shared[0] an experimental extension for CRDTs in Postgres. This was a good start, but it had a major limitation: CRDTs can be very chatty and so they quickly fill up the WAL.

We just released a new version which solves this problem: https://github.com/supabase/pg_crdt

The new approach uses UNLOGGED tables for documents and logged tables for changes. This enables us to merge changes efficiently in-memory, then persists just the changes directly to the WAL (previously it was sending the entire CRDT).

The new version also takes advantage an advanced in-memory object persistence feature in Postgres called "expanded datum"[1] and some optimizations to this feature that are coming in Postgres 18[2]. This allows for more complex operations without serializing and deserializing the documents between every operation, which gets progressively slower as documents grow.

What is a CRDT? A CRDT (Conflict-free Replicated Data Type) is a data structure that syncs across multiple devices or users. They are used in multi-player applications like Figma and Notion. For example: A shared text editor where multiple people type at once, and everyone sees the same final result without manual conflict resolution.

The end goal here is that you will be able to store CRDTs as a data type in your database, and treat Postgres simply as a "peer". A good example would be storing text for a blog post:

    create table posts (
      id serial primary key,
      title text not null
      content autodoc not null default '{}'
    );
Repo: https://github.com/supabase/pg_crdt

Docs: https://supabase.github.io/pg_crdt/automerge/

----

[0] initial release/discussion: https://news.ycombinator.com/item?id=33931971

[1] expanded datum: https://www.postgresql.org/docs/current/storage-toast.html#S...

[2] PG18 optimizations: https://www.postgresql.org/message-id/3363452.1737483125%40s...

  • michelpp 6 months ago

    Main contributor here, had a lot of fun working with automerge and its amazing C API around the core Rust library and collaborating with Supabase. This allowed me to take advantage of expanded datum features and improvement in postgres 18 driven by a feature I needed for another project storing large in-memory sparse matrices [1][2]. Based on these new features, lots of rich object data model API can now be used very efficiently from sql when doing multiple complex operations in a function.

    This is still experimental bleeding edge stuff that's only available on the unreleased pg 18 so far (using the current dev mainline branch) but will open up a lot of possibilities in the future. Future improvements I hope to make to pg_crdt include implementing the sync() api for easy peer-to-peer synchronization and a full round-trip example application. Word is automerge 3.0 will have some nice new features and I look forward to that.

    As an HN specific side note, there was a choice to make initially to use either Rust via pgrx or the automerge C API. At the time (and maybe still?) pgrx did not support expanded datum, which I think would be an amazing addition, but it was out of my scope, and my familiarity with the postgres C extension API made us choose C. Happy to discuss further improvements in this regard as I go up the Rust learning curve myself.

    [1] https://www.postgresql.org/message-id/flat/647219.1736019347...

    [2] https://onesparse.com/docs.html