Validating Usernames in a Smart Contract

Alexander Codes

Introduction

One of the first challenges in building an application is handling user registration.

We often need to allow users to choose their own username, while ensuring that each one is unique.

Usernames also usually require validation, with constraints on the minimum and maximum number of characters and the types of characters allowed.

These constraints are easy to handle in traditional web applications, but they can be tricky to implement within the confines of a smart contract.

The Puzzle

Let's write a smart contract method that validates a username against the following constraints:

  • A username must contain 5-15 characters (same as X handles)

  • A username can only contain lowercase letters, digits, and hyphens

  • A username cannot contain consecutive hyphens

  • A username cannot start or end with a hyphen

💡
The smart contract will be written in Algorand Python.
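Before moving on-chain, it can help to have an off-chain reference to test against. The sketch below expresses the four constraints in regular Python with a regular expression. It is only a reference: the contract itself cannot use a regex, and the names `USERNAME_PATTERN` and `is_valid_username` are illustrative rather than part of the contract.

```python
import re

# One or more letters/digits, optionally followed by hyphen-separated
# groups of letters/digits. This rules out leading, trailing, and
# consecutive hyphens in one pattern.
USERNAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_username(username: str) -> bool:
    """Off-chain reference check for the four username constraints."""
    if not 5 <= len(username) <= 15:
        return False
    return USERNAME_PATTERN.fullmatch(username) is not None


for candidate in ["alice", "a-b-c", "al--ice", "-alice", "alice-", "Alice", "bob"]:
    print(candidate, is_valid_username(candidate))
```

Only the first two candidates pass; the others each break one of the rules (consecutive hyphens, leading hyphen, trailing hyphen, uppercase letter, too short).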

Validating the Character Set

The valid character set is:

>>> import string
>>> print(f"{string.ascii_lowercase}{string.digits}-")
'abcdefghijklmnopqrstuvwxyz0123456789-'

In regular Python, it's easy to check whether a string only contains these characters:

VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"

def is_valid(username: str) -> bool:
    return all(c in VALID_CHARS for c in username)

But if we adopt a similar approach in a smart contract, we'll end up scanning VALID_CHARS once for every character in the username, which can quickly exhaust the opcode budget.

It would look something like this:

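for a in username:
    for b in VALID_CHARS:
        if a == b:
            break  # found a match, move on to the next character
    else:
        # the inner loop finished without finding a match
        raise ValueError("Invalid character")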
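To get a feel for the cost of the nested approach, here is a rough off-chain sketch that counts how many character comparisons it performs. The helper `count_comparisons` is hypothetical, and the count is only a proxy: on-chain, each comparison and loop step costs additional opcodes.

```python
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"


def count_comparisons(username: str) -> int:
    """Count the character comparisons made by the nested-loop check."""
    comparisons = 0
    for a in username:
        for b in VALID_CHARS:
            comparisons += 1
            if a == b:
                break  # stop scanning once a match is found
    return comparisons


best = count_comparisons("a" * 15)   # 'a' is the first entry in VALID_CHARS
worst = count_comparisons("9" * 15)  # '9' is near the end of VALID_CHARS
print(best, worst)
```

For a 15-character username this ranges from 15 comparisons to 540, depending on where the characters sit in VALID_CHARS.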

The goal should be to iterate over the username once and avoid iterating over anything else.

So how can we achieve that?

Encoding

Lowercase letters, digits, and hyphens are all part of the ASCII character set.

ASCII defines 128 characters, so each one fits in a single byte (an 8-bit integer between 0 and 255).

To check whether a particular character in a string is valid as part of a username, we can convert it to an integer and check whether it falls in the correct range.

The valid numbers corresponding to the character set are:

  • 45 = hyphen

  • [48, 57] = the digits [0, 9]

  • [97, 122] = the lowercase letters, a-z.
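These numbers can be derived rather than memorised. The sketch below converts each valid character with `ord` and collapses the results into contiguous ranges; `to_ranges` is an illustrative helper, not something the contract needs.

```python
import string

VALID_CHARS = string.ascii_lowercase + string.digits + "-"


def to_ranges(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse a sorted list of integers into inclusive (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    for code in codes:
        if ranges and code == ranges[-1][1] + 1:
            # Extend the current range by one.
            ranges[-1] = (ranges[-1][0], code)
        else:
            # Start a new range.
            ranges.append((code, code))
    return ranges


codes = sorted(ord(c) for c in VALID_CHARS)
print(to_ranges(codes))  # [(45, 45), (48, 57), (97, 122)]
```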

We can check for these in Python using:

valid_number = lambda x: x == 45 or 48 <= x <= 57 or 97 <= x <= 122

Adapting the previous example:

def is_valid(username: str) -> bool:
    valid_number = lambda x: x == 45 or 48 <= x <= 57 or 97 <= x <= 122
    return all(valid_number(ord(c)) for c in username)

The ord function in Python converts a character to its Unicode code point, which for ASCII characters is the same as the ASCII number.

Smart Contract: First Approach

To make it a bit more readable, let's define some constants:

HYPHEN = 45
ZERO = 48
NINE = 57
LOWER_A = 97
LOWER_Z = 122

Then we can validate a username against the constraints as follows:

from algopy import ARC4Contract, String, UInt64, arc4, op

class Registration(ARC4Contract):

    @arc4.abimethod
    def validate_username(self, username: String) -> None:
        assert 5 <= username.bytes.length <= 15, "Username must be between 5 and 15 characters"

        prev = UInt64(HYPHEN)
        for byte in username.bytes:
            curr = op.btoi(byte) # ord
            assert not curr == prev == HYPHEN, "Username cannot start with a hyphen or contain consecutive hyphens"
            assert curr == HYPHEN or LOWER_A <= curr <= LOWER_Z or ZERO <= curr <= NINE, "Username can only contain lowercase letters, digits, or hyphens"
            prev = curr
        assert prev != HYPHEN, "Username cannot end with a hyphen"

A username cannot start or end with a hyphen, and it can't contain any consecutive hyphens.

To track this, we iterate over the username one byte at a time, storing the last byte seen in a variable called prev.

If at any point we find that the current character and the previous character are both hyphens, the username is invalid.

prev is initially set to the hyphen number, which means we don't need to implement separate logic to check that the first character is not a hyphen.

If it is, this will raise an error:

assert not curr == prev == HYPHEN

After the username has been iterated over, we need to add one more step to check that the last character is not a hyphen:

assert prev != HYPHEN

This works pretty well, and can validate a username of up to 15 characters while staying within the opcode budget.

But the conditional steps leave some room for improvement.
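The walk-through above can be mirrored in regular Python to experiment with the logic off-chain. This is a sketch, not the contract: it raises `ValueError` where the contract uses `assert`, and `validate_username_offchain` and `is_valid` are illustrative names.

```python
HYPHEN, ZERO, NINE, LOWER_A, LOWER_Z = 45, 48, 57, 97, 122


def validate_username_offchain(username: str) -> None:
    """Mirror of the single-pass check: raise ValueError on the first violation."""
    data = username.encode("utf-8")
    if not 5 <= len(data) <= 15:
        raise ValueError("Username must be between 5 and 15 characters")

    # Starting with prev = HYPHEN makes a leading hyphen look like a
    # consecutive hyphen, so no separate first-character check is needed.
    prev = HYPHEN
    for curr in data:  # iterating over bytes yields integers
        if curr == HYPHEN and prev == HYPHEN:
            raise ValueError("Username cannot start with a hyphen or contain consecutive hyphens")
        if not (curr == HYPHEN or LOWER_A <= curr <= LOWER_Z or ZERO <= curr <= NINE):
            raise ValueError("Username can only contain lowercase letters, digits, or hyphens")
        prev = curr

    if prev == HYPHEN:
        raise ValueError("Username cannot end with a hyphen")


def is_valid(username: str) -> bool:
    try:
        validate_username_offchain(username)
    except ValueError:
        return False
    return True


print(is_valid("al-ice"), is_valid("-alice"), is_valid("al--ice"), is_valid("alice-"))
```

Only the first of the four usernames is accepted; the others fail on the leading, consecutive, and trailing hyphen checks respectively.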

Smart Contract: Second Approach

Since a byte can only take 256 different values, we can store the set of valid characters in a 256-bit (32-byte) bitmask.

Each set bit represents a valid character.

For example, if the bitmask is 00100000..., then bit 2 (counting from the left, starting at 0) is set, which means that the ASCII character corresponding to the number 2 is valid.

Instead of chaining conditions:

curr == HYPHEN or LOWER_A <= curr <= LOWER_Z or ZERO <= curr <= NINE

All we need to do is get the corresponding bit: op.getbit(bitmask, curr), and check whether it's set.
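To see what this lookup does, here is a plain-Python emulation of getbit for byte arrays. It assumes the AVM convention that index 0 is the leftmost bit of the first byte; the function and the one-byte `tiny_mask` are illustrative only.

```python
def getbit(data: bytes, index: int) -> int:
    """Return bit `index` of `data`, counting from the leftmost bit of the first byte."""
    byte = data[index // 8]               # which byte holds the bit
    return (byte >> (7 - index % 8)) & 1  # shift that bit into the lowest position


# The example mask 00100000: only bit 2 is set.
tiny_mask = bytes([0b00100000])
print([getbit(tiny_mask, i) for i in range(8)])  # [0, 0, 1, 0, 0, 0, 0, 0]
```

The lookup costs the same regardless of how many characters are valid, which is what makes it attractive compared with a chain of range checks.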

We can construct the bitmask in regular Python, and then use it as a constant in algopy:

from functools import reduce

VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"
ords = {ord(c) for c in VALID_CHARS}
# Bits are counted from the left, so ASCII number i maps to bit (255 - i) of a 256-bit integer
bitmask = reduce(lambda acc, i: acc | (1 << (255 - i)), filter(ords.__contains__, range(256)), 0).to_bytes(32, "big")
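Since the mask is precomputed, it is worth checking it off-chain before hard-coding it. The sketch below rebuilds the mask with a plain loop and confirms that it agrees with the chained range conditions for every possible byte value. The helper names `in_ranges` and `in_mask` are illustrative.

```python
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"

# Set one bit per valid character, counting bit positions from the left
# of a 256-bit value.
mask_int = 0
for c in VALID_CHARS:
    mask_int |= 1 << (255 - ord(c))
bitmask = mask_int.to_bytes(32, "big")


def in_ranges(x: int) -> bool:
    """The chained conditions from the first approach."""
    return x == 45 or 48 <= x <= 57 or 97 <= x <= 122


def in_mask(x: int) -> bool:
    """The bitmask lookup, emulated with integer shifts."""
    return (mask_int >> (255 - x)) & 1 == 1


# Any byte value where the two checks disagree would show up here.
mismatches = [x for x in range(256) if in_ranges(x) != in_mask(x)]
print(bitmask.hex())
print(mismatches)  # []
```

An empty list of mismatches means the two approaches accept exactly the same characters.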

The updated contract is:

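from algopy import ARC4Contract, Bytes, String, UInt64, arc4, op

# Precomputed off-chain: bit i is set if the ASCII number i is allowed in a username
USERNAME_CHARS = b'\x00\x00\x00\x00\x00\x04\xff\xc0\x00\x00\x00\x00\x7f\xff\xff\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'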

class Registration(ARC4Contract):

    @arc4.abimethod
    def validate_username(self, username: String) -> None:
        assert 5 <= username.bytes.length <= 15, "Username must be between 5 and 15 characters"

        prev = UInt64(HYPHEN)
        for byte in username.bytes:
            curr = op.btoi(byte)
            assert not curr == prev == HYPHEN, "Username cannot start with a hyphen or contain consecutive hyphens"
            assert op.getbit(Bytes(USERNAME_CHARS), curr), "Username must only contain lowercase alphanumeric characters and hyphens"
            prev = curr
        assert prev != HYPHEN, "Username cannot end with a hyphen"

Conclusion

Bitmasks can be a great way to save opcodes in smart contracts.

They allow us to efficiently represent a large number of binary variables.

We can precompute the values off chain, enabling quick single-operation lookups within the contract.

Check out my previous article 'Building a Sudoku Validator on Algorand' if you want to take a deeper dive into bitmasks.

Written by

Alexander Codes

Data Engineer ⌨️ | Pythonista 🐍 | Blogger 📖 You can support my work at https://app.nf.domains/name/alexandercodes.algo