In this post, I will walk through a small project that recognizes integers in (possibly nested) curly braces, such as {1,2,3} and {1,2,{3,4}}.
This can be useful when refactoring Java code, for example to convert an array initialization such as
static short[] data = {1,2,3};
to
static String data = "\u0001\u0002\u0003";
The string form is more efficient to initialize: for the array, Java executes data[0] = 1, data[1] = 2, and so on, one element at a time, and the array data is not stored as compact bytes.
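To see why the string can stand in for the array, here is a minimal sketch of how a program might read the values back out of the string. The names `PackedData` and `unpack` are made up for illustration; they are not part of ANTLR or of the refactoring itself.

```java
import java.util.Arrays;

/** Hypothetical helper: recovers short values from a string-packed initializer. */
class PackedData {
    // Same data as: static short[] data = {1,2,3};
    static final String DATA = "\u0001\u0002\u0003";

    /** Each char of the string holds one 16-bit value. */
    static short[] unpack(String packed) {
        short[] out = new short[packed.length()];
        for (int i = 0; i < packed.length(); i++) {
            out[i] = (short) packed.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(unpack(DATA))); // prints [1, 2, 3]
    }
}
```

The cast works because a Java char is a 16-bit value, the same width as a short.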
ANTLR structures
ANTLR has two main components: the ANTLR tool and the runtime.
Tool
Running the ANTLR tool on a grammar generates a parser and a lexer that recognize sentences in the language the grammar describes.
Runtime
The runtime is a library of classes and methods needed by the generated parser, lexer, and tokens.
Running ANTLR tool
Running the ANTLR tool on a grammar file produces a bunch of Java files. Suppose our grammar, saved as ArrayInit.g4 (the file name must match the grammar name), looks like this:
grammar ArrayInit;
/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ;
/** A value can be either a nested array/struct or a simple integer (INT) */
value : init | INT;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ; // discard whitespace (space, tab, newline)
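One way to read the two parser rules is as a pair of mutually recursive methods. The hand-written recognizer below is only a sketch of that idea under simplifying assumptions: it works on characters directly and does not skip whitespace. It is not the code ANTLR generates, and the class name `ArrayInitRecognizer` is invented for illustration.

```java
/** Hand-written sketch mirroring the rules init and value. Not ANTLR output. */
class ArrayInitRecognizer {
    private final String input;
    private int pos = 0;

    ArrayInitRecognizer(String input) { this.input = input; }

    /** Returns true if the whole input matches rule init. */
    static boolean matches(String text) {
        ArrayInitRecognizer r = new ArrayInitRecognizer(text);
        return r.init() && r.pos == text.length();
    }

    // init : '{' value (',' value)* '}' ;
    private boolean init() {
        if (!eat('{')) return false;
        if (!value()) return false;
        while (eat(',')) {
            if (!value()) return false;
        }
        return eat('}');
    }

    // value : init | INT ;
    private boolean value() {
        if (peek() == '{') return init();
        return intToken();
    }

    // INT : [0-9]+ ;
    private boolean intToken() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start; // at least one digit is required
    }

    /** Current character, or NUL at end of input. */
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    /** Consumes the current character if it equals c. */
    private boolean eat(char c) {
        if (peek() == c) { pos++; return true; }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("{1,2,3}"));      // true
        System.out.println(matches("{1,2,{3,4}}"));  // true
        System.out.println(matches("{1,2,"));        // false
    }
}
```

Note that {} is rejected, because the init rule requires at least one value between the braces.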
Generated files from ANTLR
ArrayInitParser.java
Contains the parser class, which recognizes our array language syntax.
ArrayInit.tokens
Each token type will be assigned a token number and stored here.
ArrayInitListener.java
Describes the callbacks we can implement: a tree walk fires “events” to a listener object that we provide.
Compile ANTLR code
javac *.java
If the compiler cannot find the ANTLR classes, add the ANTLR jar to the classpath first:
export CLASSPATH=".:/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH"
We use grun (the usual alias for ANTLR's TestRig class) to run the parser and print the tokens created by the lexer:
grun ArrayInit init -tokens
{1,2,3}
Ctrl-D
Perform actions based on parse tree
An application needs to extract data from the parse tree in order to act on it. We will use callbacks that react to the input as the tree is walked.
We extend the base listener class, ArrayInitBaseListener, and override enterInit, exitInit, and enterValue.
/** Convert short array inits like {1,2,3} to "\u0001\u0002\u0003" */
public class ShortToUnicodeString extends ArrayInitBaseListener {
    /** Translate { to " */
    @Override
    public void enterInit(ArrayInitParser.InitContext ctx) {
        System.out.print('"');
    }

    /** Translate } to " */
    @Override
    public void exitInit(ArrayInitParser.InitContext ctx) {
        System.out.print('"');
    }

    /** Translate integers to 4-digit hexadecimal strings prefixed with \\u */
    @Override
    public void enterValue(ArrayInitParser.ValueContext ctx) {
        // Assumes no nested array initializers
        int value = Integer.parseInt(ctx.INT().getText());
        System.out.printf("\\u%04x", value);
    }
}
In enterValue, we ask the context object for its INT token and read the token's text to get the integer value.
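The format string in the printf call does the real work: `%04x` prints the value in hexadecimal, left-padded with zeros to four digits, which is the shape of a Java Unicode escape. A small standalone check, using a hypothetical helper named `toEscape` that is not part of the listener:

```java
/** Hypothetical helper showing what the printf call in enterValue emits. */
class EscapeFormat {
    /** Formats a value as a backslash, the letter u, and four hex digits. */
    static String toEscape(int value) {
        return String.format("\\u%04x", value);
    }

    public static void main(String[] args) {
        // One escape per line for the values 1, 255 and 4096
        System.out.println(toEscape(1));
        System.out.println(toEscape(255));
        System.out.println(toEscape(4096));
    }
}
```

For 1, 255, and 4096 this prints `\u0001`, `\u00ff`, and `\u1000`.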
Next, we parse the input to build a parse tree, then use a walker to traverse it. As the walker visits each node, it triggers the matching enter and exit callbacks along the way.
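ArrayInitLexer lexer = new ArrayInitLexer(CharStreams.fromStream(System.in));
ArrayInitParser parser = new ArrayInitParser(new CommonTokenStream(lexer));
ParseTree tree = parser.init(); // parse, starting at rule init
ParseTreeWalker walker = new ParseTreeWalker();
// Walk the tree created during the parse, trigger callbacks
walker.walk(new ShortToUnicodeString(), tree);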
The only code we have to touch is the listener, which isolates the language application from the grammar.
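To make the callback mechanism concrete without depending on the ANTLR runtime, the sketch below imitates it with a tiny hand-built tree. Everything here (`Node`, `Listener`, `walk`, `translate`) is a simplified stand-in invented for illustration, not ANTLR's actual classes. It only aims to show how a depth-first walk fires enter and exit callbacks that a listener turns into output.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a listener-driven tree walk. Not the ANTLR API. */
class MiniWalkDemo {
    /** A node is either an init (with children) or a value holding an integer. */
    static class Node {
        final Integer intValue; // null for init nodes
        final List<Node> children = new ArrayList<>();
        Node(Integer intValue) { this.intValue = intValue; }
    }

    /** Callbacks fired by the walk, in the spirit of a generated listener. */
    interface Listener {
        void enterInit(Node n);
        void exitInit(Node n);
        void enterValue(Node n);
    }

    /** Depth-first walk that triggers the callbacks. */
    static void walk(Listener listener, Node n) {
        if (n.intValue != null) {
            listener.enterValue(n);
            return;
        }
        listener.enterInit(n);
        for (Node child : n.children) {
            walk(listener, child);
        }
        listener.exitInit(n);
    }

    /** Builds a flat init node for the given values and translates it. */
    static String translate(int... values) {
        Node init = new Node(null);
        for (int v : values) {
            init.children.add(new Node(v));
        }
        StringBuilder out = new StringBuilder();
        walk(new Listener() {
            public void enterInit(Node n) { out.append('"'); }
            public void exitInit(Node n) { out.append('"'); }
            public void enterValue(Node n) { out.append(String.format("\\u%04x", n.intValue)); }
        }, init);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate(1, 2, 3));
    }
}
```

Swapping in a different anonymous `Listener` would change the output without touching `walk` or the tree, which is the same separation the ANTLR listener gives us.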