Lexer

Parse code and other text into structured trees.

The Lexer loops over the characters in a string and processes them through Grammars to generate Abstract Syntax Trees (ASTs). An AST is basically a multi-dimensional array that describes the code.
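
To make the idea concrete, here is a toy sketch of "loop over characters, build a nested array". It is a hypothetical illustration only: toy_lex() is not part of taeluf/lexer, and real Grammars are far richer than this brace-matching loop.

```php
<?php
// Toy illustration only (hypothetical, NOT taeluf/lexer's Grammar API):
// walk a string one character at a time and turn "{...}" groups into
// nested arrays, producing a tiny multi-dimensional tree.
function toy_lex(string $src, int &$pos = 0): array
{
    $children = [];
    $text = '';
    $len = strlen($src);
    while ($pos < $len) {
        $char = $src[$pos];
        $pos++;
        if ($char === '{') {
            // Flush any pending text, then recurse into the nested block.
            if (trim($text) !== '') {
                $children[] = trim($text);
            }
            $text = '';
            $children[] = ['type' => 'block', 'children' => toy_lex($src, $pos)];
        } elseif ($char === '}') {
            // End of the current block: hand control back to the caller.
            break;
        } else {
            $text .= $char;
        }
    }
    if (trim($text) !== '') {
        $children[] = trim($text);
    }
    return $children;
}

print_r(toy_lex('class A { function b() { return 1; } }'));
```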

Documentation

  • Installation & Getting Started are below
  • Create a Parser - Create a language parser that builds an AST, or modify an existing parser.

Other/Old Documentation

Supported Languages

Additional language support can be added through new grammars. See docs/Extend.md.

  • Docblocks: Parses standard docblocks (/** */) for the description and @param tags. Somewhat language-independent.
  • PHP: Very good (PHP 7.4 - 8.2)
  • Bash: Broken. An old grammar parsed functions and docblocks (using ##\n#), but it has not been updated to work with the current lexer.
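
For a rough idea of what "description and @param" extraction means, the sketch below pulls both out of a /** */ comment with regular expressions. It is a simplified, hypothetical helper (parse_docblock() is not the library's Docblock grammar) and ignores multi-line tags and other @ tags.

```php
<?php
// Simplified, hypothetical sketch: NOT the library's Docblock grammar.
// Extracts the free-text description and @param tags from a /** */ comment.
function parse_docblock(string $doc): array
{
    // Strip the opening "/**" and the closing "*/" delimiters.
    $body = preg_replace(['#^\s*/\*\*#', '#\*/\s*$#'], '', $doc);
    $description = [];
    $params = [];
    foreach (preg_split('/\R/', $body) as $line) {
        // Remove the leading " * " gutter on each line.
        $line = trim(preg_replace('/^\s*\*\s?/', '', $line));
        if ($line === '') {
            continue;
        }
        if (preg_match('/^@param\s+(.*)$/', $line, $m)) {
            $params[] = $m[1];
        } elseif ($line[0] !== '@') {
            // Any non-tag line counts toward the description.
            $description[] = $line;
        }
    }
    return ['description' => implode(' ', $description), 'params' => $params];
}

$doc = "/**\n * Get the tree.\n * @param mixed \$sourceTree unused\n */";
print_r(parse_docblock($doc));
```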

Install

composer require taeluf/lexer v0.8-new-php-grammar.x-dev   

or in your composer.json

{"require":{ "taeluf/lexer": "v0.8-new-php-grammar.x-dev"}}  

Parse with CLI

This prints an Abstract Syntax Tree (AST) as JSON.

# only file path is required  
bin/lex file rel/path/to/file.php -nocache -debug -stop_at -1  

Basic Usage

For built-in grammars, it is easiest to use the helper class.

<?php  
require 'vendor/autoload.php'; // Composer autoloader  
  
$full_path = '/path/to/file.php';  
$helper = new \Tlf\Lexer\Helper();  
$lexer = $helper->get_lexer_for_file($full_path); // \Tlf\Lexer  
  
# These are the default settings  
$lexer->useCache = true; // set false to disable caching  
$lexer->debug = false; // set true to show debug messages  
$lexer->stop_loop = -1; // set to a positive int to stop processing & print current lexer status (for debugging)  
  
$ast = $lexer->lexFile($full_path); // \Tlf\Lexer\Ast  
  
print_r($ast->getTree()); // array  

Example Tree

Grammars determine the tree structure. Below is the tree for code/Ast/StringAst.php, a PHP file in this repo. It has only one method and no properties. For a larger class, run bin/lex file code/Lexer.php from the root of this repo.

{  
    "type": "file",  
    "ext": "php",  
    "name": "StringAst",  
    "path": "\/path-to-downloads-dir\/Lexer\/code\/Ast\/StringAst.php",  
    "namespace": {  
        "type": "namespace",  
        "name": "Tlf\\Lexer",  
        "declaration": "namespace Tlf\\Lexer;",  
        "class": [  
            {  
                "type": "class",  
                "namespace": "Tlf\\Lexer",  
                "fqn": "Tlf\\Lexer\\StringAst",  
                "name": "StringAst",  
                "extends": "Ast",  
                "declaration": "class StringAst extends Ast",  
                "methods": [  
                    {  
                        "type": "method",  
                        "args": [  
                            {  
                                "type": "arg",  
                                "name": "sourceTree",  
                                "value": "null",  
                                "declaration": "$sourceTree = null"  
                            }  
                        ],  
                        "modifiers": [  
                            "public"  
                        ],  
                        "name": "getTree",  
                        "body": "return $this->get('value');",  
                        "declaration": "public function getTree($sourceTree = null)"  
                    }  
                ]  
            }  
        ]  
    }  
}  
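
As an example of consuming a tree, the sketch below lists method declarations per class. It assumes the array from getTree() (or the CLI's JSON, decoded) has the same file -> namespace -> class[] -> methods[] shape shown above; method_declarations() is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper, assuming the tree shape shown above
// (file -> namespace -> class[] -> methods[]). Missing levels yield [].
function method_declarations(array $tree): array
{
    $found = [];
    foreach ($tree['namespace']['class'] ?? [] as $class) {
        foreach ($class['methods'] ?? [] as $method) {
            $found[$class['fqn']][] = $method['declaration'];
        }
    }
    return $found;
}

// A trimmed copy of the example tree, decoded from JSON.
$json = '{"type":"file","namespace":{"type":"namespace","name":"Tlf\\\\Lexer",
  "class":[{"type":"class","fqn":"Tlf\\\\Lexer\\\\StringAst","name":"StringAst",
  "methods":[{"type":"method","name":"getTree",
  "declaration":"public function getTree($sourceTree = null)"}]}]}}';
print_r(method_declarations(json_decode($json, true)));
```

Using the null-coalescing operator keeps the helper tolerant of files without a namespace or classes without methods.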

A PHP class property looks like this:

{  
    "type": "property",  
    "modifiers": [  
        "public",  
        "int"  
    ],  
    "docblock": {  
        "type": "docblock",  
        "description": "The loop we're on. 0 means not started. 1 means we're executing the first loop. 2 means 2nd loop, etc."  
    },  
    "name": "loop_count",  
    "value": "0",  
    "declaration": "public int $loop_count = 0;"  
}
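
Note that the "modifiers" list above also contains the type hint ("int"). The sketch below separates PHP keywords from the remaining entry, assuming (not confirmed by the grammar) that any non-keyword modifier is the property's type. summarize_property() is a hypothetical helper.

```php
<?php
// Hypothetical helper for a property node shaped like the JSON above.
// Assumption: any entry in "modifiers" that is not a PHP keyword is the type.
function summarize_property(array $prop): array
{
    $keywords = ['public', 'protected', 'private', 'static', 'readonly', 'var'];
    $modifiers = $prop['modifiers'] ?? [];
    $rest = array_values(array_diff($modifiers, $keywords));
    return [
        'name' => $prop['name'],
        'keywords' => array_values(array_intersect($modifiers, $keywords)),
        'type' => $rest[0] ?? null,
        'default' => $prop['value'] ?? null,
        'description' => $prop['docblock']['description'] ?? '',
    ];
}

$prop = [
    'type' => 'property',
    'modifiers' => ['public', 'int'],
    'docblock' => ['type' => 'docblock', 'description' => "The loop we're on."],
    'name' => 'loop_count',
    'value' => '0',
    'declaration' => 'public int $loop_count = 0;',
];
print_r(summarize_property($prop));
```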