Unsorted Documentation

How to write a grammar

  • Look at the JsonGrammar and read the architecture section below.

Tips / Troubleshooting

  • Set $lexer->useCache = false to disable the cache.
  • Set $lexer->debug = true to print debug information.
  • Set $lexer->inspect_loop = 23 to print the full directive being checked on the 23rd loop.
  • rewind can cause an infinite loop. Example: match == : and onMatch[rewind] == 1. The : is matched and we rewind 1, so the same : is matched again and we rewind 1 again, and so on.
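
The rewind pitfall can be reproduced without the lexer. The following is a minimal standalone sketch, not the library's code: rewindDemo and its $maxLoops guard are hypothetical names that only simulate consuming one character at a time, matching :, and then rewinding.

```php
<?php
// Hypothetical simulation of "match ':' then rewind N". Not the library's code.
// Returns the number of loop iterations used, capped by $maxLoops.
function rewindDemo(string $input, int $rewind, int $maxLoops = 50): int
{
    $pos = 0;   // number of characters consumed so far
    $loops = 0;
    while ($pos < strlen($input) && $loops < $maxLoops) {
        $loops++;
        $pos++;                             // consume one more character
        if ($input[$pos - 1] === ':') {     // the target ':' just matched
            $pos -= $rewind;                // rewind: hand characters back
        }
    }
    return $loops;
}

echo rewindDemo('a:b', 0), "\n"; // 3: every character is consumed once
echo rewindDemo('a:b', 1), "\n"; // 50: stuck on ':' until the guard trips
```

If a real grammar behaves like the second call, setting $lexer->inspect_loop to a loop number inside the repeating stretch should show which directive keeps matching.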

v0.5 Architecture

The Token manages our position in the input string. The Lexer manages the directive stack. The Grammars declare directives, which have instructions to perform when they're matched. These instructions can modify the token and the lexer, and can create new Abstract Syntax Trees (ASTs) to build a structured representation of the source.

The Pieces

  • Lexer: takes Grammars to process an input string using a Token
    • $directiveStack: A multi-layered stack of directives. Each layer can have multiple directives. Each layer has a 'started' and 'unstarted' list.
    • $astStack: The stack of ASTs. Generally, the head AST is the one operated on.
  • Token: Contains the input string. Manages our position in the input string.
  • Grammar: Declares directives
    • directive: A set of targets & instructions. Generally contains a string or regex (target) to match against & instructions for what to do upon that match.
  • Ast: An Abstract Syntax Tree. Holds values and can be output as an array.
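
To make the layered directive stack more concrete, here is a rough sketch of one way it could be modeled. It is not the library's implementation: the class DirectiveStackSketch, its methods, and the 'target' and 'instructions' array keys are all assumed names chosen for illustration.

```php
<?php
// Hypothetical model of the layered directive stack. Not the library's code.
final class DirectiveStackSketch
{
    /** @var array<int, array{started: array, unstarted: array}> */
    private array $layers = [];

    // Add a new layer on top; every directive in it begins as unstarted.
    public function pushLayer(array $directives): void
    {
        $this->layers[] = ['started' => [], 'unstarted' => $directives];
    }

    // Move a named directive between the two lists of the top layer.
    private function move(string $name, string $from, string $to): void
    {
        $top = array_key_last($this->layers);
        if ($top === null || !isset($this->layers[$top][$from][$name])) {
            throw new InvalidArgumentException("No $from directive named $name");
        }
        $this->layers[$top][$to][$name] = $this->layers[$top][$from][$name];
        unset($this->layers[$top][$from][$name]);
    }

    public function start(string $name): void { $this->move($name, 'unstarted', 'started'); }
    public function stop(string $name): void  { $this->move($name, 'started', 'unstarted'); }

    // Return the top layer, or null when the stack is empty.
    public function top(): ?array
    {
        $top = array_key_last($this->layers);
        return $top === null ? null : $this->layers[$top];
    }
}

// A directive here is just a target plus instructions; the keys are assumed names.
$stack = new DirectiveStackSketch();
$stack->pushLayer([
    'quote' => ['target' => '"', 'instructions' => ['begin string']],
    'colon' => ['target' => ':', 'instructions' => ['store key']],
]);
$stack->start('quote');
print_r(array_keys($stack->top()['started']));   // started now holds: quote
print_r(array_keys($stack->top()['unstarted'])); // unstarted still holds: colon
```

Keeping 'started' and 'unstarted' as separate lists per layer means only the top layer has to be scanned on each loop, which is the behavior the description above suggests.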

Setting up the lexer environment

  1. Lexer is initialized
  2. Grammars are added to the lexer.
  3. An Ast is created to be the root
    • Lexer has convenience methods, or you can create your own AST to lex.
  4. The ast is set as the head
  5. A Token is created from the input string
  6. onLexerStart($lexer, $ast, $token) is called on each grammar
  7. Perform the lexing (see below)
  8. onLexerEnd($lexer, $ast, $token) is called on each grammar
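
The order of these steps can be sketched with small stand-in classes. Only the onLexerStart and onLexerEnd signatures come from the list above. GrammarSketch, LexerSketch, addGrammar(), lex(), and the shapes of the AST and token objects are hypothetical and exist only to show the sequence.

```php
<?php
// Hypothetical stand-ins that only record the order of the setup steps.
final class GrammarSketch
{
    public array $log = [];

    public function onLexerStart($lexer, $ast, $token): void
    {
        $this->log[] = 'onLexerStart';
    }

    public function onLexerEnd($lexer, $ast, $token): void
    {
        $this->log[] = 'onLexerEnd';
    }
}

final class LexerSketch
{
    public array $grammars = [];
    public array $astStack = [];

    // Step 2: grammars are added to the lexer.
    public function addGrammar(GrammarSketch $grammar): void
    {
        $this->grammars[] = $grammar;
    }

    public function lex(string $input): object
    {
        $ast = (object) ['type' => 'root', 'values' => []];   // step 3: root AST
        $this->astStack[] = $ast;                             // step 4: set as head
        $token = (object) ['source' => $input, 'buffer' => '']; // step 5: token
        foreach ($this->grammars as $grammar) {               // step 6
            $grammar->onLexerStart($this, $ast, $token);
        }
        // Step 7: the lexing loop itself would run here.
        foreach ($this->grammars as $grammar) {               // step 8
            $grammar->onLexerEnd($this, $ast, $token);
        }
        return $ast;
    }
}

$lexer = new LexerSketch();       // step 1: lexer is initialized
$grammar = new GrammarSketch();
$lexer->addGrammar($grammar);
$root = $lexer->lex('{"a":1}');
echo implode(', ', $grammar->log), "\n"; // onLexerStart, onLexerEnd
```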

The Lexing

  1. Using a while loop, set $token = $token->next(), which returns the token itself with an updated buffer, or false if there are no more characters to process
    • next() adds one character to the buffer at a time.
  2. Each started directive is checked for match and stop. If there are no started directives, the unstarted directives are checked for start
  3. Started directives that stop are moved back into unstarted. Unstarted directives that start are added to started and removed from unstarted
    • When unstarted directives are started, $unstartedDirective->_matches is set to the result of regex/string matching.
  4. Any targets (regex or string) that matched in step #3 now have their instructions processed, in the order those instructions were declared.
  5. The then instructions are processed, and any directives they target are added to a new layer of the directive stack
  6. Repeat from #1 until the token has been fully buffered
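
The loop can be illustrated with a much smaller model than the real lexer. This is a sketch under stated assumptions, not the library's code: TokenSketch and lexQuoted are hypothetical names, and a single hard-coded directive (start on a double quote, stop on the next one) stands in for the directive stack. Only the behavior of next(), which adds one character to the buffer and returns the token or false, is taken from the steps above.

```php
<?php
// Hypothetical token: next() adds one character to the buffer,
// or returns false when there are no more characters to process.
final class TokenSketch
{
    public string $buffer = '';
    private string $source;
    private int $index = 0;

    public function __construct(string $source)
    {
        $this->source = $source;
    }

    /** @return TokenSketch|false */
    public function next()
    {
        if ($this->index >= strlen($this->source)) {
            return false;
        }
        $this->buffer .= $this->source[$this->index++];
        return $this;
    }
}

// Hypothetical driver with one directive: start on '"', stop on the next '"'.
function lexQuoted(string $input): array
{
    $token = new TokenSketch($input);
    $started = false;
    $found = [];
    while ($token = $token->next()) {
        $last = substr($token->buffer, -1);
        if (!$started && $last === '"') {
            $started = true;       // unstarted -> started
            $token->buffer = '';   // begin collecting the string body
        } elseif ($started && $last === '"') {
            $found[] = substr($token->buffer, 0, -1); // drop the closing quote
            $started = false;      // started -> unstarted
            $token->buffer = '';
        }
    }
    return $found;
}

print_r(lexQuoted('{"a":"bc"}')); // collects the strings a and bc
```

The sketch clears the buffer by hand on start and stop. In the real lexer that kind of change would presumably be one of the directive's instructions rather than logic in the loop.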