Skip to content

Nix Base32 Encoding

Nix uses a custom base32 encoding that differs from both RFC 4648 and other common base32 variants.

Alphabet

Standard (RFC 4648):  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 2 3 4 5 6 7
Nix:                  0 1 2 3 4 5 6 7 8 9 a b c d f g h i j k l m n p q r s v w x y z

The Nix alphabet has 32 characters: digits 0-9 and lowercase letters, but with e, o, t, u removed.

Why these four? They could be confused with other characters (e/3, o/0, t/+), though the exact rationale is historical.

Bit extraction

This is where Nix base32 differs most significantly from RFC 4648. The bit extraction order is reversed.

RFC 4648 approach

Standard base32 processes input left-to-right, taking 5-bit groups from the most significant bits first:

Input bytes:   [b0] [b1] [b2] ...
Bits:          76543210 76543210 76543210 ...
Groups:        |4444433333|22222|11111|00000|...

Nix approach

Nix processes from the last 5-bit position down to the first, extracting bits across byte boundaries:

for i in range(out_len - 1, -1, -1):   # high positions first
    b = i * 5
    j = b // 8                          # byte index
    k = b % 8                           # bit offset within byte
    c = (data[j] >> k)                  # bits from byte j
    if j + 1 < len(data):
        c |= data[j + 1] << (8 - k)    # bits from byte j+1
    output.append(CHARS[c & 0x1f])

This means:

  • The first output character encodes the highest 5-bit group
  • Bits are extracted from the input in little-endian byte order
  • Cross-byte extraction: a 5-bit group can span two adjacent input bytes

Consequence

The same input bytes produce completely different outputs between RFC 4648 and Nix base32, even if you swap the alphabets. The encoding is structurally different, not just an alphabet substitution.

Output length

For n input bytes, the output is ceil(n * 8 / 5) characters:

Input bytes Output chars Usage
0 0
16 26 MD5 (not used by Nix)
20 32 Store path hashes
32 52 SHA-256 hashes
64 103 SHA-512 hashes

Decoding

Decoding reverses the process: iterate over the input string in reverse, placing 5-bit groups into the output byte array at the appropriate positions.

for i, ch in enumerate(reversed(s)):
    digit = ALPHABET.index(ch)
    b = i * 5
    j = b // 8
    k = b % 8
    result[j] |= (digit << k) & 0xFF
    carry = digit >> (8 - k)
    if carry and j + 1 < out_len:
        result[j + 1] |= carry

Comparison table

Property RFC 4648 Nix
Alphabet A-Z2-7 0-9a-z minus eotu
Padding = pad to multiple of 8 chars No padding
Bit order MSB-first, left-to-right LSB-first, right-to-left
Case Case-insensitive Lowercase only
Byte order Big-endian grouping Little-endian grouping

Example

SHA-256 of "hello":

Hex:        2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Nix base32: 094qif9n4cq4fdg459qzbhg1c6wywawwaaivx0k0x8xhbyx4vwic

The same bytes in RFC 4648 base32 would produce a completely different (and longer, with padding) string.