Remove method for Ternary Search Tree

ariak · July 21st, 2007, 01:48 PM

Hi,

I enjoyed reading parts of this book as it really shins lights on the inner work of most sought after collection algorithm from a very simplistic point of view.

In Chapter 14, where Ternary Search Tree is discussed, authors do not include a removal method to exclude a word from the tree. Although, I assume such algorithm is intended for initial huge number of insertions and subsequence lookups, I too intend to use it for the reasons aforementioned. But I also need to very rarely remove a word from the tree (perhaps for every 100 insertions, one removal).

At first glance, I assumed writing a remove method would be simple but I was wrong. One has to take in many possible scenarios to effectively and correctly accomplish this task. After a few minutes of ponder, I came to the conclusion to do the following:

Find the last character of the word with the search method

Starting from that node, go up the tree, removing potentially "qualified" nodes

Going up the tree branch requires an additional node to the "node" class, let's call it a "parent node." So the insertion method had to be modified... All good so far.

Removing each node is based on many assumptions as to whether its has any child node depending on it, or even larger and smaller "sub-branches" hanging from it. I successfully wrote a remove method that "cosmetically" removes a set of nodes that are part of the "word." That is, if the search method is run against our word, it shows that it has been removed -- I can attest to that.

But removal of the word does not guarantee that "all" the character in the tree would be eliminated. That's due to the fact that some nodes are a part of another word(s). So one has to be very careful in implementation of the removal method.

I thought I had everything working perfectly until I noticed (thanks to vigorous testing) that number of nodes are not removed from the tree because they contain a sub-branch (larger or/and smaller) but the parent to these sub-branches that is also part of "removing" word can be safely removed _IF_ all the sub-trees are moved up the tree to its parent which requires a whole of work (perhaps a method that insert a single node to a branch and that single node can have an entire sub-tree hanging from it).

So obviously some of the node may truly belong to any word but the removal of such node requires rearranging a whole lot of sub-trees around which seems to be a very daunting task.

At the end, I would like to know whether has anyone implemented a sane and fast removal method for Ternary Search Tree that might want to share before I go on with my own implementation which seems to be not very elegant?

How about word replacement method? ... And how about a thread-safe version of this class? Is it safe to assume if a method uses recursion, it is best to be entirely synchronized? :)

Thank you