block by GerHobbelt 3772069

JavaScript: a state-machine-based lexer + recursive descent (LL(k)) grammar parser + abstract syntax tree (AST) + generator

Full Screen

Basic Compiler Technology

Lexers, Parsers, Code Generators are of all ages (old skool: lex/flex, yacc/bison (modern-ish: PCCTS/ANTLR), burg/lburg/…)

Here’s an example of a JavaScript based domain-specific language being parsed and converted into HTML. It is very similar to wiki and MarkDown formats; the only purpose of this code is to showcase the parsing technology, so a minimum number of ‘shortcuts’ and other production code ‘smartness’ is applied.

The recursive descent parser works by evaluating the grammar rules in order of precedence, where each function call is a grammar rule match attempt and the first one returning success is a ‘grammar rule match’ which will allow the parser to continue parsing the input.

Also, the input is assumed to be size limited, i.e. a infinite lookahead grammar is fine with us. (This doesn’t work in situations where you need to parse a stream, but that is a problem area that few of us have to deal with in actual reality. Mostly we like limited lookahead grammars because those will guarantee a good parsing performance; infinite look-ahead can produce a quadratic or even exponential run time.)