Unsorted Documentation
How to write a grammar
- Go look at the JsonGrammar & read the software architecture below.
Tips / Troubleshooting
- Set `$lexer->useCache` to false to disable the cache.
- Set `$lexer->debug = true` to print debug information.
- Set `$lexer->inspect_loop = 23` to print the full directive being checked on the 23rd loop.
- `rewind` can cause an infinite loop. Ex: `match == :` & `onMatch[rewind] == 1`. The `:` is matched, then we rewind 1, then the `:` is matched & we rewind 1 & so on.
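
The rewind loop can be reproduced outside the lexer with a toy scanner. This is a minimal sketch in plain PHP, not the library's code: `scanWithRewind()` and its `$maxLoops` guard are hypothetical names used only to show why matching a character and then rewinding past it never makes progress.

```php
<?php
/**
 * Toy scanner (hypothetical, not part of the lexer). Buffers one character
 * per loop; whenever the buffer ends with $target, the position is moved
 * back by $rewind. Returns the number of loops run, or -1 if the $maxLoops
 * guard is hit.
 */
function scanWithRewind(string $input, string $target, int $rewind, int $maxLoops = 100): int
{
    $pos = 0;   // number of characters buffered so far
    $loops = 0;
    while ($pos < strlen($input)) {
        if (++$loops > $maxLoops) {
            return -1; // guard tripped: the scan is not making progress
        }
        $pos++; // buffer one more character
        $buffer = substr($input, 0, $pos);
        if (substr($buffer, -strlen($target)) === $target) {
            $pos -= $rewind; // rewinding re-exposes the character we just matched
        }
    }
    return $loops;
}

// Without a rewind the scan finishes; rewinding 1 after every ":" never does.
$finished = scanWithRewind('a:b', ':', 0); // 3
$stuck = scanWithRewind('a:b', ':', 1);    // -1
```

In a real grammar, a fix for this kind of loop is to make sure the directive that rewinds cannot match the same text again after the rewind.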
v0.5 Architecture
The Lexer takes Grammars to process an input string using a Token. The Token manages our placement in the input string. The Lexer manages the directive stack. The Grammars declare directives, which have instructions to perform when they're matched. These instructions can modify the token and the lexer, and can create new Asymmetrical Syntax Trees (ASTs) to build a structured representation of the source.
The Pieces
- `Lexer`: takes Grammars to process an input string using a Token
  - `$directiveStack`: A multi-layered stack of directives. Each layer can have multiple directives. Each layer has a 'started' and 'unstarted' list.
  - `$astStack`: The stack of ASTs. Generally the head ast is operated on
- `Token`: Contains the input string. Manages our position in the input string.
- `Grammar`: Declares directives
  - `directive`: A set of targets & instructions. Generally contains a string or regex (target) to match against & instructions for what to do upon that match.
- `Ast`: An Asymmetrical Syntax Tree... holds values & can be output as an array
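
One way to picture the directive stack is as an array of layers, each holding a 'started' and an 'unstarted' list. The sketch below only models that shape. `DirectiveStackSketch` and its methods are hypothetical and are not the lexer's API, and directives are reduced to plain names.

```php
<?php
// Hypothetical model of the layered directive stack; all names are illustrative.
final class DirectiveStackSketch
{
    /** @var array<int, array{started: string[], unstarted: string[]}> */
    private array $layers = [];

    /** Push a new layer; all of its directives begin as unstarted. */
    public function pushLayer(array $directiveNames): void
    {
        $this->layers[] = ['started' => [], 'unstarted' => array_values($directiveNames)];
    }

    /** Move a directive in the top layer from 'unstarted' to 'started'. */
    public function start(string $name): void
    {
        $this->move($name, 'unstarted', 'started');
    }

    /** Move a directive in the top layer from 'started' back to 'unstarted'. */
    public function stop(string $name): void
    {
        $this->move($name, 'started', 'unstarted');
    }

    /** Return the top layer, or null when the stack is empty. */
    public function top(): ?array
    {
        return $this->layers ? $this->layers[count($this->layers) - 1] : null;
    }

    private function move(string $name, string $from, string $to): void
    {
        $i = count($this->layers) - 1;
        if ($i < 0) {
            throw new LogicException('No layer on the stack');
        }
        $key = array_search($name, $this->layers[$i][$from], true);
        if ($key === false) {
            throw new InvalidArgumentException("$name is not in the $from list");
        }
        array_splice($this->layers[$i][$from], $key, 1); // remove and reindex
        $this->layers[$i][$to][] = $name;
    }
}
```

For example, after `pushLayer(['string', 'number'])` and `start('string')`, the top layer holds `string` in 'started' and `number` in 'unstarted'.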
Setting up the lexer environment
- Lexer is initialized
- Grammars are added to the lexer.
- An `Ast` is created to be the root.
  - Lexer has convenience methods, or you can create your own AST to lex.
- The ast is set as the `head`.
- A `Token` is created from the input string.
- For each grammar, `onLexerStart($lexer, $ast, $token)` is called.
- Perform the lexing (see below).
- For each grammar, `onLexerEnd($lexer, $ast, $token)` is called.
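
The order of these setup steps can be sketched with stand-in objects. Only the hook names `onLexerStart` / `onLexerEnd` and their argument order come from the list above. `LoggingGrammarSketch`, `runLifecycleSketch()`, and the `stdClass` / `ArrayObject` placeholders are assumptions made for illustration, not the lexer's real classes.

```php
<?php
// Hypothetical stand-in for a grammar that records when its hooks fire.
class LoggingGrammarSketch
{
    public array $log = [];

    public function onLexerStart($lexer, $ast, $token): void
    {
        $this->log[] = 'start';
    }

    public function onLexerEnd($lexer, $ast, $token): void
    {
        $this->log[] = 'end';
    }
}

/**
 * Walk through the setup steps in order. $lex is a callable that stands in
 * for the lexing itself and receives ($lexer, $ast, $token).
 */
function runLifecycleSketch(array $grammars, string $input, callable $lex): ArrayObject
{
    $lexer = new stdClass();                     // stands in for the initialized lexer
    $lexer->grammars = $grammars;                // grammars are added to the lexer
    $ast = new ArrayObject(['type' => 'root']);  // placeholder root AST
    $lexer->head = $ast;                         // the root AST becomes the head
    $token = new stdClass();                     // stands in for the Token
    $token->source = $input;

    foreach ($grammars as $grammar) {
        $grammar->onLexerStart($lexer, $ast, $token);
    }
    $lex($lexer, $ast, $token);
    foreach ($grammars as $grammar) {
        $grammar->onLexerEnd($lexer, $ast, $token);
    }
    return $ast;
}
```

Passing a closure such as `function ($lexer, $ast, $token) { $ast['length'] = strlen($token->source); }` as `$lex` shows that the start hooks run before the lexing and the end hooks after it.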
The Lexing
1. Using a while loop, set `$token = $token->next()`, which returns itself with an updated buffer, or `false` if there are no more characters to process.
   - `next()` adds one character to the buffer at a time.
2. Each `started` directive is checked for `match` and `stop`. If there are no `started` directives, then `unstarted` directives are checked for `start`.
   - If `started` directives stop, they're moved back into `unstarted`. When `unstarted` directives start, they're added to `started` & removed from `unstarted`.
   - When `unstarted` directives are started, `$unstartedDirective->_matches` is set to the result of regex/string matching.
3. Any regexes that passed in the previous step are now processed for their instructions, in the order those instructions were declared.
4. `then`s are processed & any target directives are added to a new layer of the directive stack.
5. Repeat from #1 until the token has been fully buffered.
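
The loop described above can be approximated in a few dozen lines. This is a simplified sketch under stated assumptions, not the lexer's implementation: `TokenSketch` and `lexSketch()` are hypothetical, directives are reduced to plain `start` / `stop` string targets compared against the end of the buffer, and `then` handling (new stack layers) is left out.

```php
<?php
// Hypothetical token: next() grows the buffer by one character and returns
// the token itself, or false once the input is exhausted.
final class TokenSketch
{
    public string $buffer = '';
    private string $source;
    private int $pos = 0;

    public function __construct(string $source)
    {
        $this->source = $source;
    }

    /** @return self|false */
    public function next()
    {
        if ($this->pos >= strlen($this->source)) {
            return false;
        }
        $this->buffer .= $this->source[$this->pos++];
        return $this;
    }
}

/**
 * Run one layer of directives over $input and return the text captured
 * between each start and stop. $directives maps a name to
 * ['start' => string, 'stop' => string].
 */
function lexSketch(string $input, array $directives): array
{
    $unstarted = $directives;
    $started = [];   // name => buffer length at the moment the directive started
    $captured = [];
    $token = new TokenSketch($input);
    while ($token = $token->next()) {
        if ($started) {
            // Started directives are checked for their stop target.
            foreach ($started as $name => $from) {
                $stop = $directives[$name]['stop'];
                if (substr($token->buffer, -strlen($stop)) === $stop) {
                    $length = strlen($token->buffer) - $from - strlen($stop);
                    $captured[] = substr($token->buffer, $from, $length);
                    unset($started[$name]);
                    $unstarted[$name] = $directives[$name]; // back to unstarted
                }
            }
            continue;
        }
        // No started directives: check unstarted ones for their start target.
        foreach ($unstarted as $name => $directive) {
            $start = $directive['start'];
            if (substr($token->buffer, -strlen($start)) === $start) {
                $started[$name] = strlen($token->buffer);
                unset($unstarted[$name]);
            }
        }
    }
    return $captured;
}
```

With a single directive whose `start` and `stop` are both `"`, calling `lexSketch('a "hi" b "yo"', ...)` captures `hi` and `yo`, because the directive returns to the unstarted list each time it stops.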