Monthly Archives: June 2015

Printing Pretty Ascii Trees

I’m terrible at writing recursive algorithms. Absolutely horrible. So for practice I decided to try printing a binary tree where all branches have two children, all in the console. It turns out this is a pretty useful quick-n-dirty debugging tool when creating trees.

I really like formatting of Tree for linux so I decided to steal it.

After 3 straight hours of trying out stupid stuff I finally come to a working solution (my earlier attempts were around 150 lines of code, the solution was like 70):

Which will print cool trees like this one:

I then realized that I could use singly linked lists to represent trees with any number of children, which is totally necessary for the parser I am currently writing. So after 30 minutes I was able to form version 2 for arbitrary trees; this version is surprisingly right around the same length and complexity as the previous version:

And will print trees like this monster:

I also ended up finding a pretty cool implementation by a random stack-overflow user that prints trees in horizontal format (which turns out to be much more limited for the purposes of a console, since it can’t print larger trees, but might look a lot prettier for small trees):

It may be preferable to upgrade to the limited edition Box Drawing characters for best viewing pleasure:

Capture

 

In which case you’ll probably want to use some specific character codes (I used 195, 196 and 192 in the above image).

TwitterRedditFacebookShare

Parsing C Style Expressions

Turns out that constructing a hand-written C-style parser has a few parts that were very difficult for me.

The first thing was realizing that Backus Naur Form (BNF) largely sucks if you want to hand-write your own parser. BNF is really verbose and expressing simple things like optional terminals or lists is difficult. BNF is also poor for expressing operator precedence, as many intermediate and redundant non-terminals are required to be evaluated during parse-tree derivation. As an alternative Extended Backus Naur Form is perfect for languages that plan to use hand-written parsers instead of parsers created by parser generators. Left-factoring a BNF for LL parsing is also not very useful since handling infinite recursion with hand-written code is trivial.

The second thing is that parsing expressions with various types of operators can be really difficult, especially if there’s a lack of confidence in recursion (like myself). Creating a parse tree given a string representing an expression is a very recursive problem.

In C expressions consist of atoms and operators. An atom would be a literal, constant value, identifier, or an expression wrapped in parentheses. Operators are the usual + or – kind of tokens.

If each operator has an associated precedence value there are a few different algorithms out there with references for learning. I ended up face in the dirt and in the end derived what is known as “precedence climbing“. According to Eli Bendersky precedence climbing is what is currently used by Clang to parse C++ expressions! That ought to instill some perceived merit. From what I can tell Lua 5.3 uses (well, very close to) the same method.

The idea of precedence climbing is to think of two major recursive operations:

  • Compute righthand-side node
  • Make a binary operator node and connect lefthand-side and righthand-side children nodes

The first point is the complex one (that is, conceptually complex). The algorithm starts given a lefthand-side node, however, righthand-side nodes do not come in through the input stream in tree format; the next token can represent a node that should be much deeper in the tree — this means that computing the righthand-side node ought to be the main recursive path.

Realizing that the righthand-side node computation is the recursive path led me to notice a key observation that tipped me off to a working algorithm.

Say we have the following input string as an expression: A 2 B 1 C 4 D 3 E 7 F

Numbers are operators, and the number itself is precedence (higher number is higher precedence), letters are atoms (like a const int variable). Here’s the valid parse tree:

chart

The lowest leaves are evaluated first. It’s easy to see that the tree itself encodes the operator precedence.

If we begin parsing our input string the first atom is A, which will always be a lefthand-side node for most any parsing algorithm used, and will likely be the left-most node in the tree. The next token is the 2 operator followed by B. It’s easy enough to construct the subtree of node 2 pointing to A and B.

The next input token is the operator 1 and atom C. C is bound by operator precedence to the operator 4, though the current state of the algorithm has yet to even read in the token 4. Studying this scenario is what tipped me off to a working solution; C must be treated as a lefthand-side node, though at the current state is considered a potential righthand-side node.

Wikipedia, and this link from earlier, both show great pseudo code for the precedence climbing algorithm. The main difference between the two links is wikipedia includes a nested for-loop in favor of less overall recursive calls. My own code ended up looking something like after I cleaned it up from influences of previous links:

In the end I’m quite happy with the result, and even hooked up a nice ascii-tree printer courtesy of a random stack-overflow user. Here’s a dot product and initialization trees in ascii:

My favorite part about the operator precedence climbing algorithm is how it handles parentheses and prefix unary operators: parentheses can be considered an atom, and when the atom function finds a parentheses is just calls the expression parsing function directly and returns the result! The same can be done for prefix unary operators (if they have really high precedence). The algorithm also trivially handles right-associativity. I haven’t yet thought about unary postfix operators, so if any reader has thoughts on this topic please do comment!

Here’s psuedo-y atom code: