Write a Grammar

A Grammar is an array declaration of directives that define instructions. Those instructions may call built-in commands or may explicitly call methods on the grammar, the lexer, the token, or the head ast.

First, look at the example in docs/GrammarExample.md.
Then read the sections below.

Tips

  • $grammar->getDirectives(':directive_name') returns an associative array of directives.
  • then directive:
    • You can pass a directive declaration to then, like then :name=>['start'=>[/*instructions*/]], to override the target directive.
    • You can pass :directive_name.stop to use that directive's stop instruction set as the start.
      • Overriding probably also works in this case, but this is unverified.
  • then.pop :directive_name X lets you pop X layers when :directive_name is matched.
  • inherit :directive.stop or inherit :directive.start lets you auto-execute all commands from the named directive & instruction set.
  • Pass :+name to then to create a new directive, rather than loading from/merging with an existing directive.
    • :_blank and :_blank-name are deprecated alternatives
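
The tips above can be sketched as a directive array. This is only an illustration: the directive names (string, string_escape, string_body, heredoc) are hypothetical, and writing argument-only instructions such as inherit as bare strings is an assumption, not something this document confirms.

```php
<?php
// Sketch only. Directive names are hypothetical, and the bare-string form
// used for 'inherit' is an assumption about how such instructions are written.
$directives = [
    'string' => [
        'start' => [
            'match' => '"',
            // Pass a declaration to `then` to override the target directive.
            'then :string_escape' => [
                'start' => [
                    'match' => '\\',
                ],
            ],
            // ':+name' creates a new directive instead of merging with an existing one.
            'then :+string_body' => [
                'start' => [
                    'match' => '/[^"]/',
                ],
            ],
        ],
        'stop' => [
            'match' => '"',
        ],
    ],
    'heredoc' => [
        'start' => [
            'match' => '<<<',
        ],
        'stop' => [
            'match' => 'EOT;',
            // Auto-execute the commands of the string directive's stop set.
            'inherit :string.stop',
        ],
    ],
];
```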

Troubleshooting Tips

  • Always rewind BEFORE buffer.clear, or no rewind is performed.
  • match has special handling, and the recommended style is 'match'=>'string' or 'match'=>'/regex/'. The alternate styles, like match string or match /regex/, should work but might cause problems.
  • Set $lexer->useCache = false to disable the cache.
  • Set $lexer->debug = true to print debug information.
  • Set $lexer->stop_loop = 30 to stop processing on loop 30 & print debug info.
  • rewind can cause an infinite loop. Ex: the instructions match : and rewind 1 on the same directive. The : is matched, then we rewind 1, then the : is matched again & we rewind 1, & so on.
  • The stop instruction ALWAYS acts upon the top directive list at the time it is executed. If the current directive is not among that list's started directives, nothing happens; in particular, it is NOT added to the unstarted list.
  • directive.inherit instruction ALWAYS ignores the match instruction of the inherited directive.
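
As a small illustration of the rewind-before-buffer.clear rule, here is a hypothetical directive. The html_comment name, the 'rewind' => 1 form, and the bare 'buffer.clear' string are assumptions about how these instructions are declared.

```php
<?php
// Hypothetical directive; the 'rewind' => 1 form and the bare 'buffer.clear'
// string are assumptions about how these instructions are declared.
$directives = [
    'html_comment' => [
        'start' => [
            'match' => '<!--',
            'rewind' => 1,    // rewind first ...
            'buffer.clear',   // ... then clear the buffer
        ],
        'stop' => [
            'match' => '-->',
        ],
    ],
];

// PHP arrays keep insertion order, so 'rewind' stays ahead of 'buffer.clear'.
$order = array_keys($directives['html_comment']['start']);
print_r($order);
```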

Recommended structure

To keep files smaller & more organized, I keep my directives inside traits that my grammar uses.

  • MyGrammarClass extends \Tlf\Lexer\Grammar
    • use MyGrammar\Main_Directives
    • use MyGrammar\Comments_Directives
    • function buildDirectives(): $this->directives = array_merge( comments_directives, main_directives)
      • override onGrammarAdded() to implement this
    • onLexerStart()/onLexerEnd() if needed
    • methods your directives will call
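
A minimal sketch of this layout follows. To keep it runnable on its own, the class does not extend \Tlf\Lexer\Grammar and the traits are not namespaced. The property names ($_main_directives, $_comments_directives), the line_comment directive, and the directiveNames() helper are hypothetical. In a real grammar, buildDirectives() would be tied to onGrammarAdded() as described above.

```php
<?php
// Sketch only: the real class would extend \Tlf\Lexer\Grammar.
// The base class is left out so this file runs by itself.

trait Main_Directives {
    // Hypothetical property name holding this trait's directives.
    protected $_main_directives = [
        'php_open' => [
            'start' => ['match' => '<?php'],
            'stop'  => ['match' => '?>'],
        ],
    ];
}

trait Comments_Directives {
    // Hypothetical property name holding this trait's directives.
    protected $_comments_directives = [
        'line_comment' => [
            'start' => ['match' => '//'],
            'stop'  => ['match' => "\n"],
        ],
    ];
}

class MyGrammarClass {
    use Main_Directives, Comments_Directives;

    protected $directives = [];

    // Merge the per-trait arrays into the single directives array.
    public function buildDirectives(): void {
        $this->directives = array_merge(
            $this->_comments_directives,
            $this->_main_directives
        );
    }

    // Demo-only accessor so the merged result can be inspected.
    public function directiveNames(): array {
        return array_keys($this->directives);
    }
}

$grammar = new MyGrammarClass();
$grammar->buildDirectives();
print_r($grammar->directiveNames());
```

Each trait owns one group of directives, so adding a new group means adding one trait and one argument to array_merge.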

Structure of directives

The form is $directives -> directive_name -> instruction set -> array of instructions. There are two instruction sets: start and stop. There is a third instruction set, but I plan to remove or change it.

<?php  
protected $directives = [  
    'php_open'=>[  
        'start'=>[  
            'match'=>'<?php',   
            //instructions go here  
        ],  
        'stop'=>[  
            'match'=>'?>',  
            //instructions go here  
        ],  
    ],  
];

  1. When <?php matches, php_open becomes started.
  2. On subsequent loops stop will be checked.
  3. When ?> matches, php_open becomes stopped.
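
The three steps above can be modeled with a toy loop. This is not the library's implementation: traceLifecycle() is a hypothetical helper that only handles literal string matches, and it assumes the lexer reads one character at a time into a buffer.

```php
<?php
// Toy model of the start/stop lifecycle, NOT the library's implementation.
$directive = [
    'start' => ['match' => '<?php'],
    'stop'  => ['match' => '?>'],
];

// Hypothetical helper: records when the directive becomes started or stopped.
function traceLifecycle(array $directive, string $input): array {
    $events = [];
    $started = false;
    $buffer = '';
    foreach (str_split($input) as $char) {
        $buffer .= $char;
        // Check the start set while unstarted, the stop set once started.
        $set = $started ? 'stop' : 'start';
        $match = $directive[$set]['match'];
        if (substr($buffer, -strlen($match)) === $match) {
            $started = !$started;
            $events[] = $started ? 'started' : 'stopped';
            $buffer = '';
        }
    }
    return $events;
}

print_r(traceLifecycle($directive, '<?php echo 1; ?>'));
```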

Notes

  • The subsequent instructions only execute if match passes.
  • match is NOT a required instruction
  • match does NOT have to be the first instruction
  • match has a lot of special handling to handle merging of overridden directives.

Declaring instructions

Many commands have a shorthand and a longhand, like stop and directive.stop.
Examples of instruction declarations:

  • 'command arg1 arg2' => 'arg3'
  • 'command arg1 arg2 //comment' => 'arg3'
  • 'command arg1 ...' => ['arg2', 'arg3', 'arg4']
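
One way to read these forms is sketched below. parseInstruction() is a hypothetical helper, not a library function. It assumes the key holds the command, its leading arguments, and an optional //comment, that the value becomes the last argument, and that a trailing ... spreads an array value into the remaining arguments. The lexer's actual parsing may differ.

```php
<?php
// Hypothetical helper: splits an instruction declaration into a command and its arguments.
function parseInstruction(string $key, $value): array {
    // Drop a trailing '//comment' from the key.
    $commentPos = strpos($key, '//');
    if ($commentPos !== false) {
        $key = substr($key, 0, $commentPos);
    }
    $parts = preg_split('/\s+/', trim($key));
    $command = array_shift($parts);
    if (end($parts) === '...') {
        // '...' is read as: spread the array value into the remaining arguments.
        array_pop($parts);
        $args = array_merge($parts, (array) $value);
    } else {
        // Otherwise the value is taken as the final argument.
        $args = array_merge($parts, [$value]);
    }
    return ['command' => $command, 'args' => $args];
}

$parsed = parseInstruction('command arg1 ...', ['arg2', 'arg3', 'arg4']);
echo $parsed['command'] . ': ' . implode(', ', $parsed['args']) . "\n";
// prints "command: arg1, arg2, arg3, arg4"
```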

Instead of a command, you can use namespace:method to call a method directly on an object, taken either from the internally defined namespace targets or from one of the available grammars.
The available targets are defined here:

$namespaceTargets = [  
    'lexer'=>$this,  
    'token'=>$this->token,  
    'ast'=>$this->getHead(),  
];  
$grammarTargets = $this->grammars;  
$grammarTargets['this'] = $directive->_grammar ?? null;  

Special object-calling

Some commands, like ast.new, allow you to give values that call objects+methods in much the same way as instructions. Support for this is on a command-by-command basis.

The format is _namespace:method arg1 arg2 arg3
Example:

['ast.new'=>[  
    '_type'=>'class',  
    'name'=> '_token:buffer',  
    'docblock'=> '_lexer:unsetPrevious docblock'  
    ]  
]
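
To make the _namespace:method arg1 arg2 arg3 format concrete, here is a toy resolver. DemoToken, DemoLexer, and resolveValue() are hypothetical stand-ins, not the library's classes. What buffer() and unsetPrevious() return here is assumed for illustration only.

```php
<?php
// Stand-in for the token object; what buffer() returns is assumed.
class DemoToken {
    public function buffer(): string {
        return 'MyClass';
    }
}

// Stand-in for the lexer; returning the removed value is an assumption.
class DemoLexer {
    private $previous = ['docblock' => '/** A class */'];

    public function unsetPrevious(string $key) {
        $value = $this->previous[$key] ?? null;
        unset($this->previous[$key]);
        return $value;
    }
}

// Hypothetical resolver: '_namespace:method arg1 arg2' becomes a method call,
// anything else is kept as a literal value.
function resolveValue($value, array $targets) {
    if (!is_string($value) || !preg_match('/^_(\w+):(\w+)(.*)$/', $value, $m)) {
        return $value;
    }
    [, $namespace, $method, $rest] = $m;
    if (!isset($targets[$namespace])) {
        return $value;
    }
    $args = preg_split('/\s+/', trim($rest), -1, PREG_SPLIT_NO_EMPTY);
    return $targets[$namespace]->$method(...$args);
}

$targets = ['token' => new DemoToken(), 'lexer' => new DemoLexer()];
$declaration = [
    '_type' => 'class',
    'name' => '_token:buffer',
    'docblock' => '_lexer:unsetPrevious docblock',
];

$resolved = [];
foreach ($declaration as $key => $value) {
    $resolved[$key] = resolveValue($value, $targets);
}
print_r($resolved);
```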