Php Lexer

A declarative lexer that hooks into PHP functions and builds ASTs for multiple languages.

See README2.md for new documentation

Development Status / Roadmap

May 13, 2022 Update:

Branch v0.8 is well tested for PhpGrammar. It does not support PHP 8.0+ features. Most other PHP features are supported, but likely not all.

The documentation is ... not good.

This project is in use by Code Scrawl & will be further developed as I personally need it to be. For example, today (May 13) I added support for use ($var) on anonymous functions, because I was unable to generate proper documentation for Lil Db.

I dream, one day, of supporting many languages & possibly transpilation. I doubt that dream will be realized.

I don't plan to significantly change the internal implementation of the lexer or grammars. I think it could be a lot better, but this would require serious development time I'm not likely to give to this project.

Install

For development, it depends on taeluf/php/php-tests, which is installed via Composer, and on taeluf/php/CodeScrawl, which is NOT installed via Composer because the circular dependency sometimes causes havoc.

composer require taeluf/lexer v0.8.x-dev   

or in your composer.json

{"require":{ "taeluf/lexer": "v0.8.x-dev"}}  

Generate an AST

See docs/Examples.md for more examples.

Example:

$lexer = new \Tlf\Lexer();
$lexer->useCache = false; // cache is disabled only for testing
// Register the PHP grammar; $phpGrammar keeps a reference to it
$lexer->addGrammar($phpGrammar = new \Tlf\Lexer\PhpGrammar());

// Lex the sample file into an AST
$ast = $lexer->lexFile(dirname(__DIR__).'/php/SampleClass.php');

// An array detailing the file
$tree = $ast->getTree();

See test/input/php/lex/SampleClass.php for the input file and test/output/php/tree/SampleClass.js for the output $tree.

Status of Grammars

  • Php: Early implementation that catches most class information (in a lazy form) but may have bugs
  • Docblock: Currently handles /* style, cleans up indentation, removes leading * from each line, and processes simple attributes (start a line with * @something description).
    • Coming soon (maybe): Processing of @method_attributes(arg1,arg2)
  • Bash: Coming soon, but will only catch function declarations & their docblocks.
    • The docblocks start with ##, and each subsequent line must start with #, optionally preceded by whitespace.
    • I'm writing it so I can document git-bent.
  • Javascript: Coming soon, but will only catch docblocks, classes, methods, static functions, and mayyybee properties on classes.
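
To make the Docblock behavior described above concrete, here is a minimal sketch of that kind of cleanup in plain PHP. It is NOT the library's Docblock grammar: the function name cleanDocblock and the returned array shape are assumptions made up for illustration, and the real grammar may differ in its details.

```php
<?php
// Hypothetical helper, not part of taeluf/lexer: illustrates the cleanup steps only.
function cleanDocblock(string $raw): array
{
    // Drop the opening /* (or /**) and the closing */
    $body = preg_replace('#^\s*/\*+|\*+/\s*$#', '', $raw);

    $description = [];
    $attributes = [];
    foreach (preg_split('/\R/', $body) as $line) {
        // Remove indentation, one leading *, and the single space after it
        $line = rtrim(preg_replace('/^\s*\*? ?/', '', $line));

        // A line that starts with @something is treated as a simple attribute
        if (preg_match('/^@(\w+)\s*(.*)$/', $line, $m)) {
            $attributes[] = ['name' => $m[1], 'description' => $m[2]];
        } else {
            $description[] = $line;
        }
    }

    return [
        'description' => trim(implode("\n", $description)),
        'attributes' => $attributes,
    ];
}

$doc = cleanDocblock("/**\n * Adds two numbers.\n * @param a the first number\n */");
// $doc['description'] is "Adds two numbers."
// $doc['attributes'][0] is ['name' => 'param', 'description' => 'a the first number']
```

The sketch keeps any indentation that follows the leading * and its single space, so nested text inside a docblock stays aligned relative to the other lines.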

Write a Grammar

A Grammar is an array declaration of directives that define instructions. Those instructions may call built-in commands or may explicitly call methods on a grammar, the lexer, the token, or the head AST.

Writing a grammar is very involved, so please see docs/GrammarWriting.md for details.

Warning

  • Sometimes when you run the lexer, there will be echoed output. Use output buffering if you want to suppress it.
  • During onLexerEnd(...), Docblock does $ast->add('docblock', $lexer->previous('docblock')) IF there's a previous docblock set.
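
One way to apply the output-buffering advice is to wrap the lexer call in PHP's ob_start() / ob_get_clean(). The sketch below uses a hypothetical helper named runQuietly and a stand-in closure instead of a real lexer call so that it runs on its own; in practice the closure would contain something like the lexFile(...) call from the example above.

```php
<?php
// Hypothetical helper, not part of taeluf/lexer: runs a callable and captures anything it echoes.
function runQuietly(callable $work): array
{
    ob_start();
    try {
        $result = $work();
    } finally {
        // Always close the buffer, even if $work throws
        $echoed = ob_get_clean();
    }
    return [$result, $echoed];
}

// Stand-in for a lexer run that echoes while it works
[$tree, $noise] = runQuietly(function () {
    echo "debug output";
    return ['type' => 'file'];
});
// $tree holds the returned value; $noise holds the captured text instead of it being printed
```

Returning the captured text rather than discarding it keeps the echoed output available for debugging when something goes wrong.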

Contribute

  • Need features? Check out the Status.md document and see what needs to be done. Open an issue if you're working on something, so we don't duplicate effort.