On the use of Upsert (Alternative title: When not to use Upsert)
Published: 2024-04-06

I think this is a classic example of when all you have is a hammer everything looks like a nail

This is a short summary about how I spent almost a whole afternoon trying to make upsert work for something I now think upsert is not suitable for.

Initial usecase & constrains

  • I was making a CRUD application
  • I wanted to use the pg_uuidv7 extension to generate the PK (id column) for the table i was working with

What I tried

DEFAULT column (id)

  • DEFAULT is not what is populated with when column value is NULL, it’s only used when INSERT has not specified that column only.

INSERT..ON CONFLICT

Issues

  • I came real close to the final solution with this, with this, users of the API had the ability to specify their own ID and the query will simply have it. It’ll see that it’s not created so it’ll create it.
    • In the best cases it’ll be NULL or "" (empty string), in which cases our uuid generation function will kick in
  • So it seems upsert really is not the ideal candidate for this
  • Because we’re managing uuid generation at database level, we need to do something to the id column if it does not
  • It seems like this kind of upsert (when you want to control content of the ID column) is not super suitable, I found these kind of upserts very useful in bulk upsert kind of operations etc. But this was a simple create or update thing that I wanted to do as part of a basic crud API.

Example

INSERT INTO player AS t (id, name, address)
VALUES (
  coalesce(NULLIF(sqlc.narg(id)::text,''), uuid_generate_v7()::text),
  $1, $2
)
ON CONFLICT (id)
    DO UPDATE SET
        name = coalesce(excluded.name, t.name),
        address = coalesce(excluded.address, t.address)
    RETURNING
        id;
  • When id is NULL, '', or an existing id. In all these 3 cases it’ll work as expected.
  • When user will provide an id which is not-existent in the db and not NULL or an empty string, this query will accept it! So now our db as an id which could be any string while other records have uuidv7 id values. This will mess everything up even if say the id column has other constraints such as UNIQUE. Esp. if we want to do operations expecting that all values of id will be of format uuidv7.
  • So we need to give up on this for our usecase of simple crud api
  • This blogpost(Upsert UUIDs with sqlc Adriano Caloiaro’s personal blog) also has the same problem, no matter how clever we try to be with the uuid generation, if the identifier we upsert(on conflict) on is the one that we want to have control over, this logic is flawed.

GENERATED column (id)

What were the more suitable options

  • Have the uuid generation in the application(backend) if the id is empty
  • Have custom code which checks existence before update. Which is susceptible to race conditions
  • Have different procedures for insert and update.
  • Use MERGE
    • MERGE looks interesting but it has some issues

    • It has a USING (join) thing where instead of table we can use VALUES

          -- name: UpsertTeam :exec
          MERGE INTO team AS t
          USING (
                VALUES(coalesce(NULLIF(sqlc.narg(id)::text, ''), uuid_generate_v7 ()::text), $1, $2, $3)
          ) AS new (id, name, short_name, abbr)
          ON team.id = new.id
          WHEN matched THEN
              UPDATE SET
                  name = coalesce(excluded.name, t.name),
                  short_name = coalesce(excluded.short_name, t.short_name),
                  abbr = coalesce(excluded.abbr, t.abbr)
          WHEN NOT matched THEN
              INSERT VALUES (id, name, short_name, abbr);
      
  • What we doin now? at the moment i don’t want to be clever anymore. simple separate routes for update and insert is what i am going to do.

Resources on upsert