Indrek's SableCC page

SableCC is a cool parser generator that generates LALR(1)-based parsers in the Java language. Its approach to parser technology is extremely minimalistic and clean. This results in very readable and maintainable grammars, as opposed to other systems (notably the messy bison and the like, where code and grammar rules are mixed).

I have created a system for SableCC to generate alternative output for other programming languages, like C++ and C#. Of course, this is not limited to targeting languages. For example, you can write an XSS or XSL stylesheet to generate an alternative tree walker.

The system used to consist of three parts, but that has changed now. All code has been integrated into SableCC and there are no longer any dependencies on Apache Xalan.

This release is built on sablecc-3-beta.3, which contains the advanced AST transformation syntax developed by Komivi Agbakpem. There is not much documentation available; all you have are some examples on this site and elsewhere. Also note that beta.3 is quite different from beta.2, as it requires a complete specification of the AST target production operator and does not allow any ambiguities. See http://www.sablecc.org/ for more information.

News

2004-11-14 - releasing beta.3.altgen.20041114
Two new backends: a Python backend developed by Fidel Viegas and a C backend developed by me. The C backend needs some more work; there are some ideas here that have not been completely verified. Other changes include fuller xss/xpath support, slight changes to C# to make generated parsers CLS-compliant, and a few patches to the O'Caml backend.

Installation

Binary sablecc.altgen distribution file: sablecc-3-beta.3.altgen.20041114.zip. Installation should be exactly the same as for regular sablecc. You only need this file; it contains all of sablecc and the new output generation framework.

Subversion repository for the source code:

    Tagged releases: svn://svn.sablecc.org/developers/indrek/tags/
    Sablecc.altgen: svn://svn.sablecc.org/developers/indrek/sandbox/sablecc-indrek/

Please also read the README.altgen.

Supported languages (backends)

  • java - Java (copies the original SableCC's Java output)
  • csharp - C# (mimics the Java interfaces, C# style)
  • cxx - C++ (by Indrek Mandre)
  • ocaml - O'Caml (by Patrick Lam)
  • python - Python (by Fidel Viegas)
  • c - C (by Indrek Mandre), see its README. Download an example project.
  • dotgraph - graph file of the AST for the Graphviz's dot tool, see simple tree example and full tree example.
  • xml
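
To give a rough idea of what a dot file for an AST looks like, here is a minimal Python sketch that prints Graphviz dot text for a small tree. The tuple layout, the node labels and the function name tree_to_dot are made up for illustration; this is not the actual output format of the dotgraph backend.

```python
from itertools import count


def tree_to_dot(tree):
    """Render a (label, [children]) tuple tree as Graphviz dot text."""
    ids = count()
    lines = ["digraph ast {"]

    def visit(node):
        label, children = node
        node_id = f"n{next(ids)}"
        # Declare the node first, then one edge per child.
        lines.append(f'  {node_id} [label="{label}"];')
        for child in children:
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical AST for the expression 1 + 2 * 3
example = ("add", [("1", []), ("mul", [("2", []), ("3", [])])])
print(tree_to_dot(example))
```

Saving the printed text to a file such as ast.dot and running `dot -Tpng ast.dot -o ast.png` should render the tree as an image.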

Technology

The development and evolution of the alternative generation system:

  • The very first version of the alternative generation system was Java-based
    • It was extensible through Java plugins
    • Only one plugin was written - C++
  • After that I wrote an XML generating plugin
    • Example of generated XML: parser.xml
    • All normalised data required to generate a parser were put into XML
    • This includes lexer and parser tables, AST structure, parser rules, etc.
  • Once I had XML, I could use XSL templates to generate parsers
  • Almost anything out there is happy with XML data and could be used
  • It turned out XSL templates were not pretty, so an alternative solution was desired
  • I wrote XSS - a new language and a translator to do xss->xsl conversion.
    • See XSS documentation
    • If you want to write your own backend I suggest you also study existing files
  • XSS itself was much simpler to use than XSL
  • XSS was still dependent on XSL and the huge Apache Xalan XSLT processor
  • I wrote an XSS interpreter (XSS2) that implements all the XSS features and a subset of the W3C XPath language
  • XSS2 was integrated into indrek's branch of the sablecc source; it's small and lean and there are no dependencies anymore
  • Future developments:
    • Rewrite of some parts of the sablecc3 beta - organised by Etienne Gagnon
    • SableScript being written by Etienne Gagnon
    • Integrate SableScript into XSS to drop all dependencies on Java
    • Merging indrek's branch of sablecc into the main branch
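
The core idea of the steps above (dump normalised parser data to XML, then let templates turn it into source code) can be sketched in a few lines of Python. This is only an illustration of the approach: the XML element and attribute names below are invented and far simpler than the real parser.xml, and the "template" is plain string formatting rather than XSS or XSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the generated parser.xml.
PARSER_XML = """
<parser package="demo">
  <tokens>
    <token name="TNumber"/>
    <token name="TPlus"/>
  </tokens>
  <prods>
    <prod name="PExp">
      <alt name="AAddExp"/>
      <alt name="ANumberExp"/>
    </prod>
  </prods>
</parser>
"""


def generate_stubs(xml_text):
    """Emit one Python class stub per token, production and alternative."""
    root = ET.fromstring(xml_text)
    out = [
        f"# package: {root.get('package')}",
        "class Node: pass",
        "class Token(Node): pass",
    ]
    for token in root.iter("token"):
        out.append(f"class {token.get('name')}(Token): pass")
    for prod in root.iter("prod"):
        out.append(f"class {prod.get('name')}(Node): pass")
        # Each alternative becomes a subclass of its production.
        for alt in prod.iter("alt"):
            out.append(f"class {alt.get('name')}({prod.get('name')}): pass")
    return "\n".join(out)


source = generate_stubs(PARSER_XML)
print(source)
```

Because the generator only reads XML, the same data could just as well feed a template that emits C++, C# or any other target; that is what makes the XML step language-neutral.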

Deprecated and old packages

The alternative generation used to consist of multiple packages and was built on Apache Xalan-Java to get XSL support. That has changed now, but I'll provide links to the old stuff anyway for people who might want it. Most of it is either incorporated into the altgen sablecc package or rewritten/overridden by other solutions.

The newly developed SableCC version 3 introduces several new features. The most important of those is the ability to specify transformation rules to create an AST (Abstract Syntax Tree) out of the more mundane default grammar tree (CST - Concrete Syntax Tree). This greatly reduces the work that needs to be carried out by the programmer and also results in cleaner grammars.

This is a simple example I threw together to get into the SableCC3 transformation magic. Here's the link to the complete example sablecc3example.tar.gz and you can view the example grammar inline: test.sablecc3.txt.

1 + 2 * 3
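
To illustrate the effect of a CST to AST transformation on the input above, here is a small Python sketch. In SableCC 3 the transformation is declared in the grammar itself, not written by hand like this; the node names (exp, term, factor, add, mul) and the tuple layout are assumptions made only for this example.

```python
# Hypothetical CST for "1 + 2 * 3" under a classic grammar:
#   exp    = exp '+' term | term
#   term   = term '*' factor | factor
#   factor = number
CST = ("exp", [
    ("exp", [("term", [("factor", ["1"])])]),
    "+",
    ("term", [
        ("term", [("factor", ["2"])]),
        "*",
        ("factor", ["3"]),
    ]),
])

OPERATORS = {"+": "add", "*": "mul"}


def to_ast(node):
    """Collapse single-child chains and fold operator tokens into node names."""
    if isinstance(node, str):
        return node  # a leaf token such as "1"
    _name, kids = node
    if len(kids) == 1:
        # Pure precedence wrapper (exp -> term -> factor): skip it.
        return to_ast(kids[0])
    if len(kids) == 3:
        left, op, right = kids
        return (OPERATORS[op], to_ast(left), to_ast(right))
    raise ValueError(f"unexpected node shape: {node!r}")


ast = to_ast(CST)
print(ast)
```

The nested exp/term/factor wrappers exist only to encode precedence; once the tree is built they carry no information, which is why dropping them leaves a much smaller tree to walk.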

This PHP language grammar, oriented towards version 4, took some work (two days) to convert and test from my college thesis project's plain SableCC grammar.

It's currently distributed under the GNU LGPL license. Get it here: php4sablecc-0.9e.tar.gz. And here's the grammar itself for a quick look: php4.sablecc3.txt. Note that the grammar has been updated to work with sablecc-3-beta.3.altgen.20040327 and later.

The PHP4 grammar is also stored in SableCC's Subversion repository. If you're looking for the bleeding edge or for changes that have not made it into a release, please check it out:

    svn://svn.sablecc.org/developers/indrek/sandbox/php4/

Here's the changelog:

  • 15-11-2003 break and continue can now take expression as argument
  • 30-10-2003 allow case insensitive <?php, ignore any other <?other as html
  • 30-10-2003 hack: explicitly allow ! $a = foo() that translates to !($a = foo())
  • 30-10-2003 hack: include-s can now be silenced (with @)
  • 15-10-2003 added support for missing backtick operator
  • 26-07-2003 the list() operator mishandled empty fields
  • 24-07-2003 jar building script make_jar.sh was missing .dat files
  • 24-07-2003 heredoc line numbers were not correctly parsed
  • 24-07-2003 lexer did not properly accept "{"

Missing, incomplete or buggy parts:

  • serious problems with some expression ambiguities. Due to bison's lax attitude towards operator precedence, expressions like $a = 1 + $b = 2 + $c = 4 + 8; are allowed, and I have no idea how to duplicate this weird syntax with sablecc. I made some exceptions, like @include() and !$var = foo(), but inherently the problem remains unsolved.
  • no read-only expressions exist for things like default arguments of functions or class variables, e.g. it accepts incorrect arguments that call functions, use variables etc.
            function foo ($bar = foo2()) { } /* illegal in Zend, but this grammar accepts it */
  • recursive heredoc blocks are not allowed.. don't ask.
  • old_function is not supported
  • when assigning a reference the rvalue can be invalid: $a = &42;
  • the same goes for building arrays out of references: array (&42);
  • the same goes for calling functions: foobar ($a, $b, &42);
  • it could be that ternary operation has wrong associativity, not sure, haven't checked
  • if file ends with ?>\n then the ending newline is not removed
  • lots of bugs

Please note that you need a custom lexer class for the generated parser to function. You can find that in the archive.

<?php
  echo 'Hello, world!\n';

  for ( $i = 1; $i < 10; $i++ ) echo $i;

  function func (&$b, $c = "wof")
  {
    return $c;
  }
?>
 
---
Copyright © 2001-2024 Indrek Mandre