Memory management, pooling
~~~~~~~~~~~~~~~~~~~~~~~~~~

The C backend is designed to be fast, memory efficient and easy to use.
For memory management we use a system of pools. A pool tracks the memory
allocated from it, and when you destroy a pool all memory allocated with
that pool is released. Pools also let used nodes be recycled, which makes
parsing more efficient.

The general rules for using pools:

 * nodes survive the destruction of parser and lexer objects; they are
   tied only to pools
 * never mix nodes from different pools (e.g. setting a node from one
   pool as a sub-node of a node from another pool). The same goes for
   freeing nodes: free them through the same pool they were created from.
 * the last rule has one exception - when cloning nodes you may use a
   different pool

Currently, always use the same pool for the lexical analyzer and the
parser. This rule might be revised later.

A safe bet is to create and release a pool for every parse (a short
sketch of this pattern is given at the end of the Parameters section
below). That always works and you will not get leaks. Right now the
parser leaks discarded nodes, so they do not get recycled, and the
lexer's token value allocation has no way of releasing the data
separately (unless the pool is destroyed). Creating and destroying pools
is a relatively cheap operation; you might also want to look into the
nodes_alloc and tokens_alloc parameters.

Parameters, customisation
~~~~~~~~~~~~~~~~~~~~~~~~~

The C backend is highly customizable through a set of command line
parameters that can be passed to SableCC. Here is a list of them with a
general description.

package - the default name of the package from the SableCC source file.

libname - the library name, by default derived from the package name by
replacing dots with underscores. It is used to determine the names of the
generated files, which are $libname.c and $libname.h.

malloc, free, realloc - use these parameters if you need to globally
override the functions used for basic memory management inside the
parser.

inline_keyword - by default 'inline'; depends on the compiler used. It is
applied to some internal functions to get the compiler to inline them.

nodes_alloc - the number of nodes to allocate when the pool is empty.
This creates a slight node-dispatch overhead as the pool is filled, but
in general it is a good thing as it reduces the number of calls to
malloc.

tokens_alloc - the amount of memory allocated per token pool increase.
This memory is used for token values. The token pool is not really a
pool: it just keeps track of the allocated memory and acquires more and
more without releasing any. Only when the pool is destroyed is the memory
taken up by tokens freed.

parser_initial_stack_size - the initial parser stack size. A good value
depends on the grammar; the stack is doubled every time it runs out.

lexer_initial_tokenbuf_size - the buffer initially used to hold token
values as they are read, so it can grow fairly large, again by doubling.
When a token has been read and accepted by the lexer, its value is copied
into the memory inside the pool.

lexer_backbuffer_chunk_size - when the lexer has accepted a token and
wants to push back data that was read but did not make up the tail of the
token, chains of buffers of this size are allocated. Whether increasing
it is necessary depends on the grammar and usage.

omit_long_errors - omit the long error messages that list the alternative
tokens expected at a given spot; instead a plain "parse error at x,y" is
given. This saves some memory (executable size).

omit_ignored_token_values - the values of ignored tokens are not placed
into the pooled token memory at the lexical analyzer level. If you always
use the lexer together with the parser, turn this on, as it saves memory.
omit_node_names - omit the node name strings. If you never intend to use
the node_name() function you can turn this on; it saves a tiny bit of
executable size.

omit_token_defaults - by default, tokens with an explicit textual value
can keep their values statically inside the executable. This is designed
to save memory, since those values do not need to be put into the pooled
token space. You can use this option to turn that behaviour off; it saves
a little executable size but in most cases is of no use.

omit_alloca_for_malloc - use malloc() instead of alloca() in the reverse
node iterator (REVERSE_PRODUCTION_ITERATE). Because the nodes use a
singly linked list to represent the * operator, reverse traversal is
problematic: a buffer is allocated, filled in order with all the nodes,
and then iterated in reverse. If your compiler does not support alloca,
or you are afraid of running out of stack space (due to a small stack or
long parsed lists), it is better to use malloc; alloca is usually faster.

parser_inline - by default the generated parser uses a sort of internal
command interpreter for rule reductions. The alternative is to explicitly
inline all of that code into the parser. This results in a larger
executable, a longer compile time and usually faster parsing, but the
latter depends greatly on your cache size. As a rule, on CPUs with a
small instruction cache the performance may actually be worse. It depends
a lot, so you need to test.

prefix - since there are no namespaces in C, prefixing is currently the
only way to use different parsers inside one program. Define this
parameter and you get a parser that is completely prefixed as you set.
But be careful: code written for one prefix will not compile for another,
so you will have a lot of work if you decide to change it later - pick
the prefix early on. The prefix is automatically mapped to a wider range
of prefixes for the different language objects; you may want to tune
these by hand: prefix_macro, prefix_type, prefix_function and
prefix_nodetype.

release_discarded_nodes - during transformation many nodes are discarded
or left unused by the parsing process; usually these are redundant
language constructs that are not needed in the final AST. With this
option turned on the parser will look through the parse stack and release
unused nodes as it goes. This may make the parser a couple of per cent
slower, but it can be a great saver of memory (the actual amount saved
depends on the AST transforms).

user_properties_include - the argument is the path to a file containing
the user-defined property struct that is attached to every node (a sketch
of such a file follows at the end of this section). By default the struct
is named 'struct user_properties' and the member inside the node struct
is named 'props'. Currently the argument is simply #included into the
parser's .h file. This behaviour will probably change in future releases
so that the content is copied directly into the header, removing the
extra header file dependency.

After generating the parser, have a look at the generated .h file: at the
top it lists all the parameters chosen at generation time.
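
To make the pool-per-parse pattern concrete, here is a minimal sketch of
the intended lifecycle. Every identifier below (pool_t, pool_create,
lexer_new, parser_new, parser_parse and so on, as well as the header
name) is a placeholder invented for the illustration - the real names are
generated into $libname.h and depend on the prefix parameters - only
node_name() is the function mentioned above:

    /* Pool-per-parse sketch.  NOTE: every identifier here except
     * node_name() is hypothetical; look into your generated $libname.h
     * for the real (possibly prefixed) type and function names. */
    #include <stdio.h>
    #include "mylang.h"                   /* stands in for $libname.h */

    static int parse_one_file(FILE *in)
    {
        /* One pool per parse: cheap to create, and destroying it frees
         * every node and token value allocated during the parse, so the
         * nodes the parser currently leaks are reclaimed as well. */
        pool_t *pool = pool_create();

        /* The lexer and the parser must currently share the same pool. */
        lexer_t  *lexer  = lexer_new(pool, in);
        parser_t *parser = parser_new(pool, lexer);

        node_t *ast = parser_parse(parser);
        if (ast != NULL) {
            /* Nodes stay valid after the lexer and parser are destroyed;
             * they are tied only to the pool, so never use them after
             * pool_destroy().  (The exact node_name() signature may
             * differ from what is assumed here.) */
            printf("root: %s\n", node_name(ast));
        }

        parser_destroy(parser);
        lexer_destroy(lexer);
        pool_destroy(pool);               /* releases all nodes and tokens */
        return ast != NULL ? 0 : 1;
    }

The point of the sketch is only the ordering: create the pool first, hand
the same pool to both lexer and parser, and destroy the pool last, after
you are done with the nodes.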
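
As an illustration of user_properties_include, a file passed through that
parameter could look roughly like the following. The struct name 'struct
user_properties' and the node member name 'props' are the documented
defaults; the fields inside the struct are invented for the example:

    /* user_props.h - example argument for user_properties_include.
     * The struct has to be named 'struct user_properties' (the default);
     * its contents are entirely up to you - these fields are made up. */
    struct user_properties {
        int   visited;    /* e.g. a marker set by a later compiler pass */
        void *symbol;     /* e.g. a link into your own symbol table     */
    };

Each generated node then carries these fields through its 'props' member,
so they can be reached as, for example, node->props.visited (assuming
'props' is embedded directly in the node struct).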