Black binocular on round device

Tries are cool, have you heard of them? I first learned about them when I was doing the CS50 course on edX. In this course, we have to implement a trie and use it to find words that are spelled incorrectly or do not even exist in a excerpt of text.

But what is trie, anyway? Trie is this tree-like data structure whose nodes maps, commonly, to characters. That way, if we are looking at the root node (that represent an empty string), and traverse to a node representing the letters “M”, “A”, “R”, “I”, “O”, for example, we would be able to check if the world “MARIO” exists in our trie. But we can add another word such as “MARIA” and we can use the already existing characters to save up some space.

Trie with the words "MARIA" and "MARIO added

So by adding words to a trie, we can lookup easily if the words exist or are valid. There are various posts on tries on the internet if you want to learn more about them, but in this one we are going to take a look into a problem: let’s try (pun unintended) to add a wildcard (i.e. a symbol used to replace or represent one character) in our trie structure.

Well, let’s start with a common trie implementation. We will make a trie with a hashmap that maps characters in other nodes and also with an isLeaf boolean value to indicate if a word is finished at that node or not.

class Trie:
  def __init__(self):
    self.children: Dict[str, Trie] = {}
    self.isLeaf: bool = False

Now let’s add two methods: insert in which we insert a word to the trie and find to check if a word is in the trie.

In the insert method we want to iterate through the characters in the word and traverse them on our trie, creating new Tries where they do not exist. When we are at our final node we set the isLeaf to True.

def insert(self, word: str) -> None:
  node = self
  for char in word:
    if char not in node.children:
      node.children[char] = Trie()
    node = node.children[char]
  node.isLeaf = True

Similarly, in our first find method implementation, we will iterate in our word while traversing the trie. If at some point our trie does not have the children for the character we are at, our word does not exist. When we finish iterating through the word the isLeaf property should be returned, because we could be in the middle of a inserted word, and not at the end of it (imagine for example we are trying to find the word “FULL” in a trie that has only the word “FULLY” inserted in it).

def find(self, word: str) -> bool:
  node = self
  for char in word:
    if char not in node.children:
      return False
    node = node.children[char]
  return node.isLeaf

Okay, so our basic trie is implemented. To add a wildcard search functionality to it we will have to modify our find method. Now, when the char is a wildcard we will traverse once all the children of the node, and keep looking in the children.

While we are traversing the trie we have to keep track of 3 different values:

  • The node we are at the Trie
  • The character index we are looking
  • The matching word until now

and every time we find a matching word we append it to the results array. The following GIF illustrates the process when we are searching for “W?RD” in a trie that has “WORD” and “WARD” inserted.

Trie with the words "MARIA" and "MARIO added

So in the code, we will use a Depth First Search iterative approach (using a stack) to achieve this.

def find(self, word: str) -> List[str]:
  stack = [(self, 0, '')]
  result = []
  while stack:
    curr, count, currWord = stack.pop()
    if count == len(word):
      if curr.isLeaf: result.append(currWord)
    currChar = word[count]
    if currChar == WILDCARD:
      for childChar, node in curr.children.items():
        stack.append((node, count + 1, currWord + childChar))
    if currChar in curr.children:
      node = curr.children[currChar]
      stack.append((node, count + 1, currWord + currChar))
  return result

The complete code for this implementation with some tests can be found here. If you want to learn more about tries checkout this awesome article on basecs.