Learning ANTLR 3: More examples are coming

Matching arithmetic expression

An arithmetic grammar for +-x/ and parenthesis:

grammar Expr;

/** The start rule; begin parsing here. */
prog:   stat+ ;         // a program contains multiple statement
//a statement can either be expr \n or assignment \n or \n
stat:   expr NEWLINE                
    |   ID '=' expr NEWLINE        
    |   NEWLINE                   
    ;

expr:   expr ('*'|'/') expr   
    |   expr ('+'|'-') expr   
    |   INT                    
    |   ID                    
    |   '(' expr ')'         
    ;

ID  :   [a-zA-Z]+ ;      // match identifiers <label id="code.tour.expr.3"/>
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

Some highlights

  1. Grammar consists of rules for synatctic structure and rules for vocabulary symbols(tokens).
  2. program consisits of multiple statements
  3. Seperate alternatives of a rule with |
  4. Group symbols with parenthesis into subrules(‘*’ | ‘/’)
  5. We can do left recursion(invoke itself at the start of alternative rules.

To test, we run

**$ antlr4 Expr.g4
$ javac Expr*.java
$ grun Expr prog -gui t.expr # launches org.antlr.v4.runtime.misc.TestRig**

We will see the parse tree for input in t.expr and **invoke the parser with rule prog **.

Calling a rule method is like invoking that rule. We can call any parser rule method.

Building a calculator

We need to perform some actions using parse-tree visitor.

To do so wee need to add

1.label the alternative rules so we can have different visitor per rule so we can have different event for different input.

expr: expr op=('*'|'/') expr # MulDiv
| expr op=('+'|'-') expr # AddSub
| INT # int
| ID # id
| '(' expr ')' # parens
;
  1. We need to define token names for the operator literals so we can reference token names as Java constants in the visitor
MUL:'*';
DIV:'/';

This time, we will use the visitor pattern and ANTLR will generate a method for each labeled alternative name

$ antlr4 -no-listener -visitor LabeledExpr.g4

It will also also generate default visitor implementation and we can subclass it. We will need to override methods associated with statement and expression alternatives.

Some interesting override is:

    @Override
    public Integer visitMulDiv(LabeledExprParser.MulDivContext ctx) {
        int left = visit(ctx.expr(0));  // get value of left subexpression
        int right = visit(ctx.expr(1)); // get value of right subexpression
        if ( ctx.op.getType() == LabeledExprParser.MUL ) return left * right;
        return left / right; // must be DIV
    }

Here, when we visit multi or divide, we will first infer the left and right expression. Then we will use ctx.op.getType to determine the type of the op. Then do operation accordingly.

Translation of java Source code

import java.util.List;
import java.util.Map;
public class Demo {
    void f(int x, String y) { }
    int[ ] g(/*no args*/) { return null; }
    List<Map<String, Integer>>[] h() { return null; }
}

We want to translate the source code into interface(just like header definition) while preserving whitespace and common.

<br />interface IDemo {
    void f(int x, String y);
    int[ ] g(/*no args*/);
    List<Map<String, Integer>>[] h();
}

We are given a Java Grammar and make a parse tree. We will derive interface from class name and grab signature(return type, name and arg list).

WE want to know how a class is decleared and how the method is defined.

classDeclaration
    :   'class' Identifier typeParameters? ('extends' type)?
        ('implements' typeList)?
        classBody
    ;
methodDeclaration
    :   type Identifier formalParameters ('[' ']')* methodDeclarationRest
    |   'void' Identifier formalParameters methodDeclarationRest
    ;

We will use antlr4 to get the visitors

$ antlr4 Java.g4

We do a

ls Java*.java 

and got

 JavaBaseListener.java JavaListener.java
 JavaLexer.java JavaParser.java

We will then extract an extended class from JavaBaseListener, (Java here means a listener for the Java grammar). We need to have 3 strategies:

  1. print out interface header at the start of a class
  2. print a terminating } at the end of class definition
  3. When we see each method definition, print out signature.
    @Override
    public void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx){
        System.out.println("interface I"+ctx.Identifier()+" {");
    }

    @Override
    public void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("}");
    }

    @Override
    public void enterMethodDeclaration(
        JavaParser.MethodDeclarationContext ctx
    )
    {
        // need parser to get tokens
        TokenStream tokens = parser.getTokenStream();
        String type = "void";
        if ( ctx.type()!=null ) {
            type = tokens.getText(ctx.type());
        }
        String args = tokens.getText(ctx.formalParameters());
        System.out.println("\t"+type+" "+ctx.Identifier()+args+";");
    }

We also need to write a main to run the parser and walk the tree with listener.

public class ExtractInterfaceTool {
    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);

        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        ParseTree tree = parser.compilationUnit(); // parse

        ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
        ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser);
        walker.walk(extractor, tree); // initiate walk of tree with listener
    }

embedding actions in a grammar

For more control, we can embed actions in grammar. These actions will be copied into the recursive-descent parser. actions are code snippets surrounded by curly braces.

Disclaimer :code snippets are from the ANTLR reference. If you found this note to be infrindging upon its copy right, please contact me and I will take it down.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax