Extending the lexer

Convert code or other text into a structured tree (multi-dimensional array).

In This File:

  • Create a grammar with directives & handler functions
  • Test a grammar
  • A complex directive that builds an AST without PHP
  • How to write an extension

Extending

To extend the lexer, we will do four things, iteratively.

  1. Create a Grammar
  2. Create an array of Directives
  3. Create a trait for callbacks Directives use
  4. Write Directive tests

To run it, I suggest using your preferred test suite. Note, though, that the built-in directive tests require taeluf/tester (available through composer).

1. Grammar

We'll be recreating part of the Bash Grammar.

<?php  
  
class MyBashGrammar extends \Tlf\Lexer\Grammar {  
  
    use MyBashGrammarCallbacks;  
  
    public function getNamespace(){return 'mybash';}  
  
    protected $directives = [];  
  
    public function __construct(){  
        // `require` takes a file path and evaluates to the array that directives.php returns  
        $this->directives = require(__DIR__.'/directives.php');  
        // @tip use `array_merge()` to load directives from multiple files  
    }  
  
    public function onLexerStart($lexer,$file,$token){  
        // if you have any additional setup to do  
    }  
}  
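The `@tip` in the constructor mentions `array_merge()`. A minimal sketch of that idea, using only plain PHP, might look like the following. The file names and the tiny directive arrays are hypothetical stand-ins written to a temp directory so the snippet runs on its own; in a real grammar you would `require` your own files instead.

```php
<?php
// Hypothetical stand-ins for real directive files: each file returns an array.
$dir = sys_get_temp_dir() . '/mybash_' . uniqid();
mkdir($dir);
$parts = [
    'docblock.php' => ['docblock' => ['start' => ['match' => '##']]],
    'function.php' => ['function' => ['start' => ['stop']]],
];
foreach ($parts as $name => $contents) {
    file_put_contents("$dir/$name", '<?php return ' . var_export($contents, true) . ';');
}

// `require` evaluates to whatever the required file returns.
$docblock = require "$dir/docblock.php";
$function = require "$dir/function.php";

// With string keys, a later array overrides an earlier one on a name clash.
$directives = array_merge($docblock, $function);

print_r(array_keys($directives)); // docblock, function
```

Because directive names are string keys, the order of the arguments decides which file wins when two files define the same directive.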

2. Directives

directives.php

<?php  
return [  
        'root'=>[  
            'is'=>[  
                // ':comment',  
                ':docblock',  
                ':function',  
            ],  
        ],  
  
        'docblock'=>[  
            'start'=>[  
                'match'=>'##',  
            ],  
            'stop'=>[  
                'match'=>'/(^\s*[^\#])/m',  
                'rewind 2',  
                'this:handleDocblockEnd',  
                'buffer.clear',  
                // 'forward 2'  
            ]  
        ],  
  
        'function'=>[  
            'start'=>[  
                'match'=>'/(?:function\s+)?([a-zA-Z\_0-9]*)(?:(?:\s*\(\))|\s+)\{/',  
                'this:handleFunction',  
                'stop',  
                'buffer.clear',  
            ]  
        ],  
        // an additional 'comment' directive is below  
    ];  
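To get a feel for what the `function` directive's `start` pattern captures, you can try it with plain `preg_match()` outside the lexer. This is only a sketch: the sample lines are made up, and it does not show how the lexer feeds its buffer to the pattern.

```php
<?php
// Pattern copied from the 'function' directive's start.match above.
$pattern = '/(?:function\s+)?([a-zA-Z\_0-9]*)(?:(?:\s*\(\))|\s+)\{/';

// Made-up sample lines covering the declaration styles the pattern allows.
$samples = [
    'function deploy(){',
    'deploy(){',
    'function build {',
];

$names = [];
foreach ($samples as $line) {
    if (preg_match($pattern, $line, $m)) {
        $names[] = $m[1]; // capture group 1 is the function name
    }
}

print_r($names); // deploy, deploy, build
```

Capture group 1 is the value that `$token->match(1)` is expected to hand to the `handleFunction` callback.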

3. Callbacks

We'll write a trait with functions to handle directive calls. You'll notice this:handleFunction in the directives above. That will call handleFunction(...) on your trait below.

<?php  
trait MyBashGrammarCallbacks {  
  
    public function handleDocblockEnd($lexer, $ast, $token, $directive){  
        $block = $token->buffer();  
        $clean_input = preg_replace('/^\s*#+/m','',$block);  
        $db_grammar = new \Tlf\Lexer\DocblockGrammar();  
        $ast = $db_grammar->buildAstWithAttributes(explode("\n",$clean_input));  
        $lexer->setPrevious('docblock', $ast);  
    }  
  
    public function handleFunction($lexer, $ast, $token, $directive){  
        // $func_name = $token->match(1);  
        $func = new \Tlf\Lexer\Ast('function');  
        $func->name = $token->match(1);  
        $func->docblock = $lexer->previous('docblock');  
        $lexer->getHead()->add('function', $func);  
    }  
}  
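The cleanup step in `handleDocblockEnd` can be tried on its own. The sketch below runs the same `preg_replace()` and `explode()` calls on a made-up buffer; what the lexer's buffer really contains at that point is an assumption here.

```php
<?php
// Made-up docblock text, standing in for $token->buffer().
$block = "## Deploy the app\n# Usage: deploy prod";

// Same pattern as in handleDocblockEnd: strip leading whitespace and '#' characters per line.
$clean_input = preg_replace('/^\s*#+/m', '', $block);

// One entry per line, as passed to buildAstWithAttributes() in the callback.
$lines = explode("\n", $clean_input);

print_r($lines); // " Deploy the app", " Usage: deploy prod"
```

Note that the space after each `#` is kept, since the pattern only removes the `#` characters and any whitespace before them.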

4. Directive Tests

You need to be using taeluf/tester for the built-in tests to work.

<?php  
  
class MyBashGrammarTest extends \Tlf\Lexer\Test\Tester {  
  
    protected $my_tests = [  
        'Comments'=>[  
            // the 'comment' directive is below and can be added to the `MyBashGrammar` that is above  
            'start'=>'comment', // t  
            'input'=>"var=\"abc\"\n#I am a comment\nvarb=\"def\"",  
            'expect'=>[  
                "comments"=>[  
                    0=>[  
                        'type'=>'comment',  
                        'src'=>'#I am a comment',  
                        'description'=> "I am a comment",  
                    ]  
                ],  
            ],  
        ],  
    ];  
  
    public function testBashDirectives(){  
        $myGrammar = new \MyBashGrammar();  
        $grammars = [  
            $myGrammar  
        ];  
        // $docGram->buildDirectives();  
  
        $this->runDirectiveTests($grammars, $this->my_tests);  
    }  
  
}  

A more complex directive

<?php  
// you would put this in your directives file (directives.php)  
$directives = [  
    'comment'=>[  
        'start'=>[  
            'match'=>'/#[^\#]/',  
            'rewind 2',  
            'buffer.clear',  
            'forward 1',  
            // you can create & modify ASTs all in the directive code, without php  
            'ast.new'=>[  
                '_addto'=>'comments',  
                '_type'=>'comment',  
                'src'=>'_token:buffer',  
            ],  
            'buffer.clear //again',  
        ],  
  
        // `match` gets called for each char after `start`  
        'match'=>[  
            'match'=>'/@[a-zA-Z0-9]/', // match an @attribute  
            'rewind 1',  
            'ast.append src',  
            'rewind 1 // again',  
            'ast.append description',  
            'forward 2',  
            'buffer.clear',  
            'then :+'=>[ // the :+ means that we're defining a new directive rather than referencing an existing one  
                'start'=>[  
                    //just immediately start  
                    'match'=>'',  
                    'rewind 1',  
                ],  
                'stop'=>[  
                    // I honestly don't know why I have this here.  
                    'match'=>'/(\\r|\\n)/',  
                    'rewind 1',  
                    'ast.append src',  
                    'buffer.clear',  
                ]  
            ],  
  
        ],  
        'stop'=>[  
            'match'=>'/(\\r|\\n)/',  
            'rewind'=>1,  
            'ast.append src',  
            'ast.append description',  
            'forward'=>1,  
            'buffer.clear',  
        ],  
    ]  
];
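To see how the `start` and `stop` patterns of the `comment` directive relate to the `'Comments'` test above, here is a rough imitation using plain string functions. It is not how the lexer works internally (there is no buffer, `rewind`, or `ast.append` here); it only shows which parts of the test input the two patterns pick out.

```php
<?php
// Input copied from the 'Comments' test case.
$input = "var=\"abc\"\n#I am a comment\nvarb=\"def\"";

// start.match: a '#' followed by a character that is not '#'.
preg_match('/#[^\#]/', $input, $start, PREG_OFFSET_CAPTURE);
$begin = $start[0][1]; // offset of the '#'

// stop.match: the first line break after the start of the comment.
preg_match('/(\\r|\\n)/', $input, $stop, PREG_OFFSET_CAPTURE, $begin);
$end = $stop[0][1];

$src = substr($input, $begin, $end - $begin); // "#I am a comment"
$description = substr($src, 1);               // "I am a comment"

echo $src, "\n", $description, "\n";
```

The two values line up with the `src` and `description` entries in the test's `expect` array.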