Blogged by Ujihisa. Standard methods of programming and thoughts including Clojure, Vim, LLVM, Haskell, Ruby and Mathematics written by a Japanese programmer. github/ujihisa

Sunday, December 20, 2009

Yacc, JavaCC and Racc

To compare the three parser generators, here I'll show an easy sample written in them.

Target Syntax

The following codes are accepted.

  • "1+2"
  • "23423423432 + 923401"
  • "23432 + 2"

The compiler will calculate the single addition and shows the value.

The following codes aren't accepted.

  • "1+2+3"
  • "1-2"
  • "1+"

JavaCC

The following code is from Standard Compiler by Minero Aoki.

The filename is A.jj.

PARSER_BEGIN(A)
import java.io.*;

class A {
  static public void main(String[] args) {
    for (String arg : args) {
      try {
        System.out.println(evaluate(arg));
      }
      catch (ParseException ex) {
        System.err.println(ex.getMessage());
      }
    }
  }

  static public long evaluate(String src) throws ParseException {
    Reader reader = new StringReader(src);
    return new A(reader).expr();
  }
}
PARSER_END(A)

SKIP: { <[" ", "\t", "\r", "\n"]> }

TOKEN: {
  <INTEGER: (["0"-"9"])+>
}

long expr():
{
  Token x, y;
}
{
  x=<INTEGER> "+" y=<INTEGER> <EOF>
    {
      return Long.parseLong(x.image) + Long.parseLong(y.image);
    }
}

To run this, type the following commends

$ javacc A.jj
$ javac A.java
$ java A '1 +  3'
4

This build process produces the following files automatically.

  • A.class
  • A.java
  • AConstants.class
  • AConstants.java
  • ATokenManager.class
  • ATokenManager.java
  • ParseException.class
  • ParseException.java
  • SimpleCharStream.class
  • SimpleCharStream.java
  • Token.class
  • Token.java
  • TokenMgrError.class
  • TokenMgrError.java

Yacc and Lex

a.y

%token NUMBER

%%

expr : NUMBER '+' NUMBER { printf("%d", $1 + $3); }

%%
#include <stdio.h>
#include "lex.yy.c"

a.l

%{
#include "a.tab.h"
%}
%%
[0-9]+    {sscanf(yytext,"%d",&yylval); return(NUMBER);}
[ \r\n\t]   ;
.         return(yytext[0]);
%%
#ifndef yywrap
yywrap() { return 1; }
#endif

run

$ bison -d a.y && lex a.l
$ gcc a.tab.c -ly -ll
$ ./a.out

files automatically generated

  • a.out
  • a.tab.c
  • a.tab.h
  • lex.yy.c

Racc

I referred this blog entry http://d.hatena.ne.jp/morphine57/20090824/1251129740

a.racc

class A
  token NUM
rule
   expr : NUM '+' NUM { result = val[0] + val[2] }
end
---- header
require 'strscan'
---- inner
  def parse(str)
    @tokens = []
    s = StringScanner.new(str)
    until s.eos?
      case
      when s.scan(/[0-9]+/)
        @tokens << [:NUM, s[0].to_i]
      when s.skip(/[ \t\r\n]/)
      else
        @tokens << [s.getch, nil]
      end
    end
    do_parse()
  end

  def next_token
    @tokens.shift
  end

And runs

$ racc a.racc
$ ruby192 -r ./a.tab.rb -e 'p A.new.parse "1+ 2"'
3

Files automatically generated

  • a.tab.rb

Statistics

The lines of code which I have to write by myself:

  • JavaCC: 39
  • Yacc: 9 + 11
  • Racc: 26

The lines of code which are automatically generated:

  • JavaCC: 1446
  • Yacc: 3280
  • Racc: 117 (assuming the gem library racc is already installed)

3 comments:

  1. For java, it's too producing files.i think so.

    ReplyDelete
  2. The only thing I would suggest is that if you don't mind a static parser, you can eliminate the "options" part of the grammar completely. That only saves 3 lines, though...

    ReplyDelete
  3. Thanks!

    (I updated this blog entry now to eliminate the "options" and added the lists of automagically generated by yacc and racc)

    ReplyDelete

Followers