Learning ANTLR 2: Some more examples

In this post, I will go through a project that recognizes lists of integers in curly braces, such as {1,2,3}, including nested ones like {1,2,{3,4}}.

This can be useful for Java code refactoring, for example to convert a Java array initialization such as

static short[] data = {1,2,3};
to
static String data = "\u0001\u0002\u0003";

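The string form makes initialization more efficient. Java compiles an array initializer into individual assignments (data[0] = 1; data[1] = 2; and so on) that run in the class's static initializer, so the data is not stored in compact form; a string literal, by contrast, is stored compactly in the class file's constant pool.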
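With the string form, the program reads the values back by casting each char to short, since a Java char is an unsigned 16-bit value. A minimal sketch, assuming every value fits in 16 bits; the class StringToShorts and its decode method are hypothetical names used only for illustration:

```java
import java.util.Arrays;

// Hypothetical helper (not from the original post): recovers short values
// from a string constant produced by the array-to-string refactoring.
public class StringToShorts {

    // A Java char is an unsigned 16-bit value, so casting it to short
    // restores the original 16-bit pattern (0xFFFF becomes -1, for example).
    static short[] decode(String packed) {
        short[] out = new short[packed.length()];
        for (int i = 0; i < packed.length(); i++) {
            out[i] = (short) packed.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "\u0001\u0002\u0003";  // the refactored constant
        System.out.println(Arrays.toString(decode(data)));  // prints [1, 2, 3]
    }
}
```

Decoding happens once at run time, but the constant itself costs far less bytecode than one store instruction per element.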

ANTLR structure

ANTLR has two main components: the ANTLR tool and the runtime.

Tool

Running the ANTLR tool on a grammar generates a parser and a lexer that recognize sentences described by that grammar.

Runtime

The runtime is a library of classes and methods needed by the generated parser and lexer, and by the tokens they work with.

Running the ANTLR tool

Running the ANTLR tool on a grammar file generates a bunch of Java files. Suppose our grammar, saved as ArrayInit.g4 (ANTLR requires the file name to match the grammar name), looks like this:

grammar ArrayInit;
/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ;
/** A value can be either a nested array/struct or a simple integer (INT) */
value : init | INT;

INT : [0-9]+ ;              // one or more digits
WS  : [ \t\r\n]+ -> skip ;  // skip spaces, tabs and newlines
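To see what the two parser rules accept, the sketch below hand-codes one Java method per rule in recursive-descent style, skipping whitespace the way a WS rule with -> skip does. It is an illustration only, not the code ANTLR generates, and the class name ArrayInitRecognizer is hypothetical:

```java
// Hand-written recursive-descent recognizer mirroring the ArrayInit grammar.
// Illustration only: this is NOT what ANTLR generates.
//   init  : '{' value (',' value)* '}' ;
//   value : init | INT ;
public class ArrayInitRecognizer {
    private final String input;
    private int pos = 0;

    private ArrayInitRecognizer(String input) {
        this.input = input;
    }

    /** Returns true if the whole input is exactly one init, ignoring whitespace. */
    static boolean matches(String input) {
        ArrayInitRecognizer r = new ArrayInitRecognizer(input);
        try {
            r.init();
            r.skipWs();
            return r.pos == input.length();  // reject trailing input
        } catch (IllegalStateException e) {
            return false;                    // syntax error
        }
    }

    // init : '{' value (',' value)* '}' ;
    private void init() {
        expect('{');
        value();
        while (peek() == ',') {
            expect(',');
            value();
        }
        expect('}');
    }

    // value : init | INT ;
    private void value() {
        char c = peek();
        if (c == '{') {
            init();                          // recursion allows nesting
        } else if (isDigit(c)) {
            while (pos < input.length() && isDigit(input.charAt(pos))) {
                pos++;                       // INT : [0-9]+ ;
            }
        } else {
            throw new IllegalStateException("expected '{' or INT at offset " + pos);
        }
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalStateException("expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    /** Skips whitespace, then returns the next char, or '\0' at end of input. */
    private char peek() {
        skipWs();
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Mirrors a WS rule that skips spaces, tabs, carriage returns and newlines.
    private void skipWs() {
        while (pos < input.length() && " \t\r\n".indexOf(input.charAt(pos)) >= 0) {
            pos++;
        }
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(matches("{1,2,3}"));      // true
        System.out.println(matches("{1,2,{3,4}}"));  // true
        System.out.println(matches("{ 1, 2 }"));     // true
        System.out.println(matches("{1,2,}"));       // false
    }
}
```

Each rule becomes a method and the (',' value)* loop becomes a while loop; the call from value back into init is what allows nested input such as {1,2,{3,4}}.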

Generated files from ANTLR

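ArrayInitParser.java

This contains the parser class, ArrayInitParser, which recognizes sentences in our array-initializer language.

ArrayInitLexer.java

This contains the lexer class, ArrayInitLexer, which splits the input characters into tokens such as INT, '{', ',' and '}'.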

ArrayInit.tokens

ANTLR assigns each token type a number and stores the mapping here.

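ArrayInitListener.java

When a tree walker walks the parse tree, it fires “events” (enter and exit callbacks) to a listener object that we provide. ANTLR also generates ArrayInitBaseListener.java, which implements every callback as an empty method, so a listener only needs to override the methods it cares about.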
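The listener idea can be sketched without ANTLR: a walker does a depth-first traversal, calling enter on a rule node, walking its children, then calling exit. The Node, Listener and walk names below are hypothetical stand-ins for ANTLR's parse tree, the generated listener interface and ParseTreeWalker; this is not the ANTLR runtime API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-rolled sketch of the listener pattern (NOT the ANTLR runtime API).
public class ListenerSketch {

    /** A parse-tree node: either a rule (with children) or a token (a leaf). */
    static class Node {
        final String label;
        final boolean isToken;
        final List<Node> children = new ArrayList<>();

        private Node(String label, boolean isToken, Node... kids) {
            this.label = label;
            this.isToken = isToken;
            children.addAll(Arrays.asList(kids));
        }

        static Node rule(String name, Node... kids) { return new Node(name, false, kids); }
        static Node token(String text) { return new Node(text, true); }
    }

    /** Callbacks fired by the walker; empty defaults play the role of a base listener. */
    interface Listener {
        default void enterRule(Node n) {}
        default void exitRule(Node n) {}
        default void visitToken(Node n) {}
    }

    /** Depth-first walk: enter a rule, walk its children left to right, then exit it. */
    static void walk(Listener listener, Node node) {
        if (node.isToken) {
            listener.visitToken(node);
            return;
        }
        listener.enterRule(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exitRule(node);
    }

    /** The tree a parser would build for the input {1,2}. */
    static Node sampleTree() {
        return Node.rule("init",
                Node.token("{"),
                Node.rule("value", Node.token("1")),
                Node.token(","),
                Node.rule("value", Node.token("2")),
                Node.token("}"));
    }

    /** Walks the tree with a listener that records which rule callbacks fire, in order. */
    static List<String> events(Node tree) {
        List<String> log = new ArrayList<>();
        walk(new Listener() {
            @Override public void enterRule(Node n) { log.add("enter " + n.label); }
            @Override public void exitRule(Node n) { log.add("exit " + n.label); }
        }, tree);
        return log;
    }

    public static void main(String[] args) {
        events(sampleTree()).forEach(System.out::println);
    }
}
```

In the real runtime, tokens reach the listener through visitTerminal rather than enter and exit methods, and each grammar rule gets its own pair of callbacks such as enterInit and exitInit.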

Compiling the generated code

javac *.java

If javac reports that it cannot find the ANTLR runtime classes, add the ANTLR jar to your CLASSPATH:

export CLASSPATH=".:/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH"

We then use grun (usually an alias for java org.antlr.v4.gui.TestRig) to run the parser starting at the init rule and print the tokens created by the lexer:

grun ArrayInit init -tokens
{1,2,3}
(press Ctrl-D to end the input)
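Conceptually, the lexer turns {1,2,3} into a stream of tokens before the parser sees it. The hand-written tokenizer below mimics the INT and WS rules to show that stream; it is not ANTLR code, the class name TokenSketch is hypothetical, and its TYPE:'text' output format is made up and differs from what grun -tokens prints:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written tokenizer mimicking the ArrayInit lexer rules (illustration only).
public class TokenSketch {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                  // WS : [ \t\r\n]+ -> skip ;
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;                              // INT : [0-9]+ ;
                }
                tokens.add("INT:'" + input.substring(start, i) + "'");
            } else if (c == '{' || c == '}' || c == ',') {
                tokens.add("'" + c + "'");            // literal tokens used by the parser rules
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: '" + c + "'");
            }
        }
        tokens.add("EOF");                            // the parser expects an end-of-file token
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{1,2,3}"));
    }
}
```

Whitespace never reaches the parser, which is why the init rule does not have to mention it.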

Performing actions based on the parse tree

An application needs to extract data from the parse tree to perform actions. We will use callbacks that react to the input as the tree is walked.

We extend the generated base listener class, ArrayInitBaseListener, and override enterInit, exitInit and enterValue:

/** Convert short array inits like {1,2,3} to "\u0001\u0002\u0003" */
public class ShortToUnicodeString extends ArrayInitBaseListener {
    /** Translate { to " */
    @Override
    public void enterInit(ArrayInitParser.InitContext ctx) {
        System.out.print('"');
    }

    /** Translate } to " */
    @Override
    public void exitInit(ArrayInitParser.InitContext ctx) {
        System.out.print('"');
    }

    /** Translate each integer to a four-digit hex Unicode escape */
    @Override
    public void enterValue(ArrayInitParser.ValueContext ctx) {
        // Assumes no nested array initializers: for a nested value, ctx.INT() is null
        int value = Integer.parseInt(ctx.INT().getText());
        System.out.printf("\\u%04x", value);
    }
}

In enterValue, we ask the context object for the INT token and convert its text to an int. Each value is then printed as a four-digit hexadecimal Unicode escape.
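The per-value translation can be tried on its own. This sketch, using a hypothetical EscapeSketch class, parses each INT text and formats it with the same "\\u%04x" format string the listener uses, assuming every value fits in 16 bits:

```java
// Hypothetical stand-alone version of the listener's translation step.
public class EscapeSketch {

    /** Wraps the values in double quotes, writing each one as a four-digit hex escape. */
    static String toUnicodeLiteral(String... intTexts) {
        StringBuilder sb = new StringBuilder("\"");
        for (String text : intTexts) {
            int value = Integer.parseInt(text);          // like ctx.INT().getText() in enterValue
            sb.append(String.format("\\u%04x", value));  // backslash, 'u', then 4 hex digits
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeLiteral("1", "2", "99"));
    }
}
```

Because %04x only pads to four digits, a value above 0xFFFF would produce five hex digits and no longer form a valid escape, which is why the technique suits short data.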

Next, we parse the input to build a parse tree, starting at the init rule, and then use a ParseTreeWalker to walk the tree. As it walks, the walker triggers the listener's enter and exit methods for every rule node it visits.

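ArrayInitLexer lexer = new ArrayInitLexer(CharStreams.fromStream(System.in));
ArrayInitParser parser = new ArrayInitParser(new CommonTokenStream(lexer));
ParseTree tree = parser.init(); // begin parsing at rule init
ParseTreeWalker walker = new ParseTreeWalker();
// Walk the tree created during the parse, trigger callbacks
walker.walk(new ShortToUnicodeString(), tree);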

The only code we had to write is the listener. It isolates the language application from the grammar, so the same grammar can drive a different application simply by plugging in a different listener.
