TypeIDs for Knowledge Bases
TypeID
TypeID is a micro-standard combining a number of well-thought-out decisions.
TypeIDs are a modern, type-safe extension of UUIDv7, inspired by a similar use of prefixes in Stripe's APIs.
TypeIDs are canonically encoded as lowercase strings consisting of three parts:
- A type prefix (at most 63 characters in all lowercase snake_case ASCII [a-z_]).
- An underscore '_' separator.
- A 128-bit UUIDv7 encoded as a 26-character string using a modified base32 encoding.
user_2x4y6z8a0b1c2d3e4f5g6h7j8k
└──┘ └────────────────────────┘
type    uuid suffix (base32)
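The three-part layout above can be checked mechanically. Here is a minimal sketch in Python; the function name `parse_typeid` is hypothetical, and it assumes a non-empty prefix built only from the [a-z_] characters described above. Because 26 base32 characters carry 130 bits while a UUID has 128, the first suffix character can be at most 7.

```python
import re

# Lowercase Crockford base32 alphabet: digits plus letters, without i, l, o, u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

# Assumption for this sketch: the prefix is 1 to 63 characters from [a-z_].
PREFIX_RE = re.compile(r"[a-z_]{1,63}")
# 26 characters; the first is limited to 0-7 so the value fits in 128 bits.
SUFFIX_RE = re.compile(f"[0-7][{ALPHABET}]{{25}}")


def parse_typeid(s: str) -> tuple[str, str]:
    """Split a TypeID into (prefix, suffix); raise ValueError if malformed."""
    # The suffix never contains '_', so the last underscore is the separator.
    prefix, sep, suffix = s.rpartition("_")
    if not sep:
        raise ValueError("missing '_' separator")
    if not PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid prefix: {prefix!r}")
    if not SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"invalid suffix: {suffix!r}")
    return prefix, suffix
```

Splitting on the last underscore is what lets a prefix such as md_quote contain underscores of its own.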
Most identifiers, barring a specialized context, should include a "type". Interconnected computer systems are always passing identifiers around, and for someone holding an identifier in their database it is really valuable to know which system it belongs to as well as what it identifies. For example, in a financial institution handling investments, md_quote_2x4y6z8a0b1c2d3e4f5g6h7j8k tells the receiver that they're holding a reference to a stock price quote owned by the market data (md) system. Identifier types give engineers metadata, which is especially useful when there's a bug in the system: no need to wonder whether the integer identifier you're holding came from the user table or the news table.
UUIDv7 has some nice properties. Its values sort by creation time because they begin with a 48-bit millisecond timestamp, which keeps items created close together in time close together in a database index and speeds up their retrieval. Each value also ends in a random suffix (at most 74 random bits, fewer than UUIDv4's 122, but enough), so values can be generated independently on distributed systems with a negligible chance of collision.
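To make the timestamp-first layout concrete, here is a rough Python sketch that assembles the 16 bytes of a UUIDv7 by hand. The helper name `uuid7_bytes` is hypothetical, and a real implementation would normally come from a library. The sketch only illustrates why byte-wise ordering follows creation time.

```python
import os
import time
from typing import Optional


def uuid7_bytes(unix_ms: Optional[int] = None) -> bytes:
    """Return 16 bytes laid out as a UUIDv7 (illustrative sketch only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # 80 random bits; 6 of them are overwritten by version and variant below.
    rand = int.from_bytes(os.urandom(10), "big")
    # The top 48 bits hold the millisecond timestamp.
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Bits 79..76: version field, set to 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Bits 63..62: variant field, set to binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return value.to_bytes(16, "big")
```

Two values built from different milliseconds compare in timestamp order as raw bytes. Within the same millisecond the order depends on the random bits, unless the generator adds a counter, which this sketch does not.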
TypeID uses Crockford's Base32 encoding for the UUIDv7 bytes. Base64 would have produced more compact identifiers (22 characters, compared with 26 for Base32), but it is less suited to TypeID's human-friendly purpose. TypeIDs are intended to be easy to read and easy to copy and paste. Crockford's alphabet omits the letters i, l, and o, which are easily confused with digits, and, together with the use of _ as the separator, this lets a double-click anywhere on the identifier select the full identifier on all major operating systems.
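The encoding itself is small enough to sketch. The Python below treats the 16 bytes as one big-endian 128-bit integer and emits 26 five-bit digits, so the first character only ever carries 3 bits. The function names `encode_suffix` and `decode_suffix` are hypothetical and are not taken from any TypeID library.

```python
# Lowercase Crockford base32 alphabet: digits plus letters, without i, l, o, u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_suffix(raw: bytes) -> str:
    """Encode 16 bytes as a 26-character base32 suffix."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    n = int.from_bytes(raw, "big")
    # 26 digits of 5 bits each, most significant digit first.
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(125, -1, -5))


def decode_suffix(s: str) -> bytes:
    """Decode a 26-character suffix back into 16 bytes."""
    if len(s) != 26 or s[0] > "7":
        raise ValueError("suffix must be 26 characters and start with 0-7")
    n = 0
    for ch in s:
        # str.index raises ValueError for characters outside the alphabet.
        n = (n << 5) | ALPHABET.index(ch)
    return n.to_bytes(16, "big")
```

Because the alphabet is in ascending ASCII order and every suffix has the same length, sorting suffixes as strings gives the same order as sorting the underlying bytes.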
Since hearing of TypeID, I've defaulted to using it anywhere I want UUIDs.
TypeID and Knowledge Bases
If we use Knowledge Bases on different machines for the same repository, merging commits becomes necessary. In this scenario, simple integer identifiers are prone to conflicts that are awful to resolve. Instead, we'll use both a TypeID and an integer identifier. The TypeID will serve as the collision-free primary identifier across machines, and the system will recalculate the integer identifier for every commit. I expect that coding agents, while they should be alright working with TypeIDs, will have an easier time working with the integer identifiers (and they'll use fewer tokens).
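One way the recalculation could work, sketched in Python: take the union of the entries from both machines keyed by TypeID, then hand out integers in suffix order, which is creation order. Everything here is an assumption made for illustration, including the function name `merge_and_renumber`, the dictionary shape, and the sort-by-suffix numbering rule. It is not a description of the actual system.

```python
def merge_and_renumber(
    local: dict[str, str], remote: dict[str, str]
) -> list[tuple[int, str, str]]:
    """Merge two TypeID-keyed maps and assign fresh integer identifiers.

    Assumed scheme: integers follow the UUIDv7 suffix order, so older
    entries get smaller numbers regardless of their type prefix.
    """
    # TypeIDs do not collide across machines, so a plain union suffices.
    # If both sides hold the same TypeID, the local value wins here.
    merged = {**remote, **local}
    # Sort by the part after the last underscore (the base32 suffix).
    ordered = sorted(merged, key=lambda tid: tid.rpartition("_")[2])
    return [(i, tid, merged[tid]) for i, tid in enumerate(ordered, start=1)]
```

Under this scheme the integers are only stable within a single commit: merging in an entry that was created earlier on another machine shifts every later number, which is why the TypeID stays the primary identifier.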
A TypeID package exists in opam, but I didn't want to pull in all of Batteries just to use a couple of utility functions. So I pulled in the typeid code and adapted it, placing it in Data.Uuid.
Related commits: