Nix Base32 Encoding
Nix uses a custom base32 encoding that differs from both RFC 4648 and other common base32 variants.
Alphabet
Standard (RFC 4648): A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 2 3 4 5 6 7
Nix: 0 1 2 3 4 5 6 7 8 9 a b c d f g h i j k l m n p q r s v w x y z
The Nix alphabet has 32 characters: digits 0-9 and lowercase letters, but with e, o, t, u removed.
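Why these four? The Nix source only notes that they are omitted. The rationale usually given is that dropping them makes it less likely for a hash to spell an offensive word by accident; visual similarity (e/3, o/0) is sometimes mentioned as well, though the exact reasoning is historical.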
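As a quick check of the description above, the alphabet can be derived rather than typed out. This is only an illustrative sketch; the names `OMITTED` and `NIX_BASE32_CHARS` are chosen here and are not taken from Nix itself.

```python
import string

# Hypothetical names for illustration only.
OMITTED = set("eotu")
NIX_BASE32_CHARS = string.digits + "".join(
    ch for ch in string.ascii_lowercase if ch not in OMITTED
)

print(NIX_BASE32_CHARS)       # 0123456789abcdfghijklmnpqrsvwxyz
print(len(NIX_BASE32_CHARS))  # 32
```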
Bit extraction
This is where Nix base32 differs most significantly from RFC 4648. The bit extraction order is reversed.
RFC 4648 approach
Standard base32 processes input left-to-right, taking 5-bit groups from the most significant bits first:
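Input bytes: [b0]     [b1]     [b2]     ...
Bits:        76543210 76543210 76543210 ...
Group:       00000111 11222223 33334444 ...
Each digit in the last row is the index of the output character that the bit above it belongs to: character 0 takes the top five bits of b0, character 1 takes the remaining three bits of b0 plus the top two bits of b1, and so on.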
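For comparison, here is a minimal sketch of the MSB-first grouping, checked against Python's standard `base64.b32encode`. The helper name `rfc4648_groups` is made up for this illustration.

```python
import base64

RFC_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def rfc4648_groups(data: bytes) -> list[int]:
    """Split data into 5-bit groups, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-pad on the right so the final group is a full 5 bits.
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]

data = b"\x2c\xf2\x4d"
encoded = "".join(RFC_ALPHABET[g] for g in rfc4648_groups(data))
print(encoded)  # FTZE2

# The standard library agrees once its "=" padding is stripped.
assert encoded == base64.b32encode(data).decode("ascii").rstrip("=")
```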
Nix approach
Nix processes from the last 5-bit position down to the first, extracting bits across byte boundaries:
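# data: the input bytes
CHARS = "0123456789abcdfghijklmnpqrsvwxyz"
out_len = (len(data) * 8 + 4) // 5        # ceil(8n / 5) output characters
output = []
for i in range(out_len - 1, -1, -1):      # high positions first
    b = i * 5                             # bit offset of this 5-bit group
    j = b // 8                            # byte index
    k = b % 8                             # bit offset within byte
    c = data[j] >> k                      # bits from byte j
    if j + 1 < len(data):
        c |= data[j + 1] << (8 - k)       # bits from byte j+1
    output.append(CHARS[c & 0x1f])        # keep only the low 5 bits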
This means:
- The first output character encodes the highest 5-bit group
- The input is read as one little-endian number: byte 0 supplies the least significant bits, so it ends up in the last output characters
- Cross-byte extraction: a 5-bit group can span two adjacent input bytes
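One way to see these three properties together is to treat the whole input as a single little-endian integer and read off base-32 digits from the top. The sketch below follows that reading; it is an illustration, not Nix's actual implementation, and `nix_base32_encode` is a name chosen here.

```python
CHARS = "0123456789abcdfghijklmnpqrsvwxyz"

def nix_base32_encode(data: bytes) -> str:
    """Encode bytes in the Nix digit order (illustrative sketch)."""
    out_len = (len(data) * 8 + 4) // 5    # ceil(8n / 5)
    n = int.from_bytes(data, "little")    # byte 0 holds the lowest bits
    # Emit the highest 5-bit group first, the lowest last.
    return "".join(
        CHARS[(n >> (5 * i)) & 0x1F] for i in range(out_len - 1, -1, -1)
    )

print(nix_base32_encode(b"\x01"))  # 01
print(nix_base32_encode(b"\xff"))  # 7z
```

Shifting the big integer by `5 * i` does the same job as the per-byte loop: any 5-bit window covers at most two adjacent bytes.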
Consequence
The same input bytes produce completely different outputs between RFC 4648 and Nix base32, even if you swap the alphabets. The encoding is structurally different, not just an alphabet substitution.
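To make this concrete, the sketch below encodes the same bytes two ways: RFC 4648 bit order respelled with the Nix alphabet, and the Nix bit order. The function names are hypothetical, and the Nix encoder is the integer-based illustration rather than Nix's own code.

```python
import base64

RFC_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
NIX_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"

def nix_base32_encode(data: bytes) -> str:
    """Nix digit order: little-endian integer, highest group first."""
    out_len = (len(data) * 8 + 4) // 5
    n = int.from_bytes(data, "little")
    return "".join(
        NIX_ALPHABET[(n >> (5 * i)) & 0x1F] for i in range(out_len - 1, -1, -1)
    )

def rfc_with_nix_alphabet(data: bytes) -> str:
    """RFC 4648 bit order, spelled with the Nix alphabet and unpadded."""
    rfc = base64.b32encode(data).decode("ascii").rstrip("=")
    return rfc.translate(str.maketrans(RFC_ALPHABET, NIX_ALPHABET))

data = bytes(range(1, 21))  # 20 arbitrary bytes, the size of a store path hash
print(rfc_with_nix_alphabet(data))
print(nix_base32_encode(data))
assert rfc_with_nix_alphabet(data) != nix_base32_encode(data)
```

For inputs whose length is a multiple of 5 bytes, reversing the byte order first makes the two agree, because both then read the same number from the top. For other lengths the zero padding bits land at opposite ends, so even that trick fails.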
Output length
For n input bytes, the output is ceil(n * 8 / 5) characters:
| Input bytes | Output chars | Usage |
|---|---|---|
| 0 | 0 | — |
| 16 | 26 | MD5 hashes (legacy) |
| 20 | 32 | Store path hashes |
| 32 | 52 | SHA-256 hashes |
| 64 | 103 | SHA-512 hashes |
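The table can be reproduced with integer arithmetic alone. A small sketch, with `nix_base32_len` as a name invented for this example:

```python
import math

def nix_base32_len(n_bytes: int) -> int:
    """Number of Nix base32 characters for n_bytes of input (no padding)."""
    return (n_bytes * 8 + 4) // 5  # integer form of ceil(n_bytes * 8 / 5)

for n in (0, 16, 20, 32, 64):
    assert nix_base32_len(n) == math.ceil(n * 8 / 5)
    print(n, nix_base32_len(n))
```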
Decoding
Decoding reverses the process: iterate over the input string in reverse, placing 5-bit groups into the output byte array at the appropriate positions.
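# s: the encoded string; CHARS: the 32-character Nix alphabet
out_len = len(s) * 5 // 8                 # decoded size in bytes
result = bytearray(out_len)
for i, ch in enumerate(reversed(s)):      # the last character holds the lowest bits
    digit = CHARS.index(ch)               # raises ValueError on an invalid character
    b = i * 5
    j = b // 8
    k = b % 8
    result[j] |= (digit << k) & 0xFF      # low part goes into byte j
    carry = digit >> (8 - k)              # bits that overflow into byte j+1
    if carry and j + 1 < out_len:
        result[j + 1] |= carry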
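Wrapped into a function, the same loop might look like the sketch below. It additionally rejects any non-zero bits that would fall past the end of the output; treating that as an error is an assumption made for this example, and `nix_base32_decode` is a hypothetical name.

```python
CHARS = "0123456789abcdfghijklmnpqrsvwxyz"

def nix_base32_decode(s: str) -> bytes:
    """Decode a Nix base32 string (illustrative sketch)."""
    out_len = len(s) * 5 // 8
    result = bytearray(out_len)
    for i, ch in enumerate(reversed(s)):
        digit = CHARS.find(ch)
        if digit < 0:
            raise ValueError(f"invalid Nix base32 character: {ch!r}")
        j, k = divmod(i * 5, 8)
        # The low part lands in byte j, any overflow in byte j + 1.
        for index, part in ((j, (digit << k) & 0xFF), (j + 1, digit >> (8 - k))):
            if part == 0:
                continue
            if index >= out_len:
                raise ValueError("non-zero bits beyond the decoded length")
            result[index] |= part
    return bytes(result)

print(nix_base32_decode("7z"))  # b'\xff'
```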
Comparison table
| Property | RFC 4648 | Nix |
|---|---|---|
| Alphabet | A-Z2-7 | 0-9a-z minus eotu |
| Padding | = pad to multiple of 8 chars | No padding |
| Bit order | MSB-first, left-to-right | LSB-first, right-to-left |
| Case | Case-insensitive | Lowercase only |
| Byte order | Big-endian grouping | Little-endian grouping |
Example
SHA-256 of "hello":
Hex: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Nix base32: 094qif9n4cq4fdg459qzbhg1c6wywawwaaivx0k0x8xhbyx4vwic
The same bytes in RFC 4648 base32 would produce a completely different string, and a longer one: 52 characters plus four = padding characters, 56 in total.
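The example can be reproduced with Python's `hashlib` and the extraction loop from the Nix approach section. This is a sketch for checking the values by hand, not Nix's own code; `nix_base32_encode` is a name chosen here.

```python
import hashlib

CHARS = "0123456789abcdfghijklmnpqrsvwxyz"

def nix_base32_encode(data: bytes) -> str:
    """Encode bytes with the Nix bit extraction order (illustrative sketch)."""
    out_len = (len(data) * 8 + 4) // 5
    output = []
    for i in range(out_len - 1, -1, -1):   # highest 5-bit group first
        j, k = divmod(i * 5, 8)            # byte index, bit offset in that byte
        c = data[j] >> k
        if j + 1 < len(data):
            c |= data[j + 1] << (8 - k)    # pull in bits from the next byte
        output.append(CHARS[c & 0x1F])
    return "".join(output)

digest = hashlib.sha256(b"hello").digest()
encoded = nix_base32_encode(digest)
print(digest.hex())
print(encoded)
```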