Docblock + Comments Grammar

July 13, 2021

I decided to just do a custom php implementation of the whole thing. The grammar only catches the start & end of the docblock & calls the appropriate php function. I couldn't really store anything to the ast until I had already gone through the entire docblock, & I need to know about lines, so doing it all in the grammar seemed really complex. Writing pure php seemed easier.

I still think it should be re-written as a proper grammar, but there's no explicit reason to do this.

I may want a grammar that processes strings containing attributes, though, so I can capture @attr(arg, arg2, etc) and its description.

But I don't need that right now, so I'm not doing it right now.

The "OLD" notes below are likely a good starting point for writing a proper docblock grammar.

OLD (before July 13, 2021)

I wrote most of a Docblock Grammar. Then I started trying to handle the padding issue & it was a non-starter. So that basically has to be scrapped so I can implement this new version.

I think I'll just go through the breakdown WITH attributes & set that one up, instead of trying to work attributes in after setting up the simpler attribute-free one.

I would maybe like to set up some smaller tests. But I dunno. I'm probably okay with just parsing one big docblock & making sure it's how I expect it. Especially since docblocks aren't complex & don't have a lot of components.

Some rules

  • leading whitespace is always removed from the first line. Like /** something becomes something
  • left-pad consists of whitespace * whitespace & is converted to full whitespace
    • the * is always replaced with a (space) before further processing
  • leftPadToRemove = the length of the shortest left-pad among the lines that count.
    • Does NOT count any lines after a start-of-line @attribute
    • Does NOT count the first line
    • Does NOT count empty lines (though they will be trimmed)
    • Then each non-empty line has that much left-pad removed from it.
  • Attribute names MUST match [a-zA-Z_]+
  • Attributes MAY have parenthetical arguments like @attr(arg1, arg2)
    • Can occur anywhere
    • Can only have a description if it's the first thing on a line
      • \n * something [this](/this) thing yes please. There is no description for [this](/this)
      • \n * [this](/this) thing yes please. thing yes please is the description
  • Attributes without parenthetical arguments
    • MUST be the first non-whitespace non-star character. Like \n * @attr whatever
    • Captures a description until the next @attribute
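
A minimal sketch of the attribute rules above, written in Python rather than the php implementation. The regexes and the find_attributes name are my own assumptions, not the real code. It only looks at a single raw line, so a bare attribute's description stops at the end of that line here.

```python
import re

# Assumption: attribute names follow the rule above, [a-zA-Z_]+.
# @name(arg1, arg2) may occur anywhere on a line.
PAREN_ATTR = re.compile(r"@([a-zA-Z_]+)\(([^)]*)\)")
# A bare @name must be the first non-whitespace, non-star thing on the line.
BARE_ATTR = re.compile(
    r"^[ \t]*\*?[ \t]*@([a-zA-Z_]+)(?![a-zA-Z_(])[ \t]*(.*)$"
)


def find_attributes(line):
    """Return (name, args, description) tuples for one raw docblock line.

    args is None for a bare attribute; description is None when the
    attribute is not the first thing on the line.
    """
    found = []
    bare = BARE_ATTR.match(line)
    if bare:
        found.append((bare.group(1), None, bare.group(2).strip()))
    for m in PAREN_ATTR.finditer(line):
        args = [a.strip() for a in m.group(2).split(",") if a.strip()]
        # Only whitespace and the left-pad star may come before it.
        at_line_start = line[:m.start()].strip(" \t*") == ""
        description = line[m.end():].strip() if at_line_start else None
        found.append((m.group(1), args, description))
    return found
```

An inline attribute without parentheses is simply ignored by this sketch.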

The Lexer breakdown

  /* abc  
   * def  
   *    ghi  
       *   jkl  
      mno  
   */  
  1. match /* & discard everything before it.
  2. match \n & store abc as docblock.line1
  3. match * & replace it with a space
  4. match \n & push the line def onto docblock.lines
  5. match * & replace it with a space
  6. match \n & push the line ghi onto docblock.lines
  7. match * & replace it with a space
  8. match \n & push the line jkl onto docblock.lines
  9. match \n & push the line mno onto docblock.lines
  10. match */ & set the line to docblock.last_line
  11. call php method to clean up the docblock, which will do:
    • trim the first line
    • iterate over all lines & find the shortest left-pad
    • remove left-padding from all lines per the rules
    • trim the last line
    • combine all the lines for the description & the attributes.
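
The cleanup step could look roughly like this. It is a Python sketch under my own assumptions (the clean_docblock and star_to_space names are hypothetical, and it takes the whole raw docblock as one string instead of the lexer's line1 / lines / last_line pieces). It is not the actual php method.

```python
import re


def star_to_space(line):
    # Replace the left-pad "*" (after optional whitespace) with a space.
    return re.sub(r"^([ \t]*)\*", r"\1 ", line, count=1)


def clean_docblock(raw):
    """Strip the opener/closer and remove the shared left-pad."""
    body = re.sub(r"^/\*+", "", raw.strip())  # drop the opener
    body = re.sub(r"\*+/$", "", body)         # drop the closer
    first, *rest = body.split("\n")
    last = rest.pop() if rest else ""
    rest = [star_to_space(line).rstrip() for line in rest]

    # Shortest left-pad: skip empty lines, stop at a start-of-line @attribute.
    pads = []
    for line in rest:
        text = line.lstrip()
        if text.startswith("@"):
            break
        if text:
            pads.append(len(line) - len(text))
    remove = min(pads, default=0)

    cleaned = [first.strip()]  # the first line is always trimmed
    for line in rest:
        pad = len(line) - len(line.lstrip())
        # Never cut into the text of a line with a shorter pad.
        cleaned.append(line[min(remove, pad):])
    last = star_to_space(last).strip()  # the last line is trimmed too
    if last:
        cleaned.append(last)
    return "\n".join(cleaned)
```

Run on the example above, this sketch gives abc, def, ghi (indented 3), jkl (indented 6) and mno (indented 1), each on its own line.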

Breakdown with attributes

  /* abc  
   * def  
     @param this is a sentence about @param  
         line 2 param  
   * @param(two,three) has a description. [something](/something)  
   */  
  1. match /* & discard everything before it.
  2. match \n & store abc as docblock.line1
  3. match * & replace it with a space
  4. match \n & push the line def onto docblock.lines
  5. match @param , create new attribute ast & set its name
  6. match \n. Store the line this is a sentence about @param on the attribute ast as attr.line1
  7. match \n. Store the line line 2 param on attr.lines
    • it will be trimmed via the same left-pad trimming as docblock lines
  8. match * & replace it with a space
  9. match @param(, terminate processing of any prior @attribute. Create new attribute ast & set its name & create an empty args list
  10. match ,. push two onto attribute.args
  11. match ). push three onto attribute.args.
  12. match the opening of the inline attribute (rendered above as [something](/something)). Create a new attribute ast & set its name. Create its attribute.args list.
    • Append it to @param(two,three)'s attributes
    • OR add the attribute to the docblock? (probably not)
  13. match ). push something to attribute.args. immediately stop this directive.
  14. match \n. Store has a description. [something](/something) as attribute.line1 on @param(two,three)
  15. match */. set the line to attribute.last_line
  16. call php method to clean up the docblock, which will do:
    • trim the first line
    • iterate over all lines & find the shortest left-pad
    • remove left-padding from all lines per the rules
    • trim the last line
    • combine all the lines for the description & the attributes.
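
To make the resulting structure concrete, here is a rough Python sketch of the ast this walk-through builds. The class and field names (Docblock, Attribute, lines, attributes) are my guesses based on the steps above, not the php classes. It works on already-cleaned text, and the inline attribute is written as a made-up @see(something), because its real name did not survive in the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    args: list = field(default_factory=list)
    lines: list = field(default_factory=list)       # description lines
    attributes: list = field(default_factory=list)  # nested inline attributes


@dataclass
class Docblock:
    lines: list = field(default_factory=list)
    attributes: list = field(default_factory=list)


# Start-of-line attribute, with or without parenthetical args.
START_ATTR = re.compile(
    r"^\s*@([a-zA-Z_]+)(?:\(([^)]*)\)|(?![a-zA-Z_(]))\s*(.*)$"
)
# Inline attributes must have parentheses.
INLINE_ATTR = re.compile(r"@([a-zA-Z_]+)\(([^)]*)\)")


def split_args(raw):
    return [a.strip() for a in raw.split(",") if a.strip()]


def parse_cleaned(text):
    doc = Docblock()
    current = None  # the start-of-line attribute collecting lines, if any
    for line in text.split("\n"):
        m = START_ATTR.match(line)
        if m:
            # A new start-of-line attribute terminates the previous one.
            current = Attribute(m.group(1), split_args(m.group(2) or ""))
            doc.attributes.append(current)
            line = m.group(3)
        target = current if current else doc
        target.lines.append(line)
        for inline in INLINE_ATTR.finditer(line):
            target.attributes.append(
                Attribute(inline.group(1), split_args(inline.group(2)))
            )
    return doc
```

Nesting the inline attribute under the attribute whose description contains it follows the first option in step 12, since adding it to the docblock is marked "probably not".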

Older notes

The indentation problem

    /* abc  
     * def  
     *    ghi  
         *   jkl  
        mno  
    */  

Should become:

abc  
def  
   ghi  
      jkl  
 mno  

So the indent to remove is the shortest indent, other than the first line.

The shortest indent is:
the 3 columns (space, *, space) before def

For the rest of the lines, it doesn't matter if they have a star or not.
It just matters the distance from the 0 column to the first non-whitespace, non-star char

So we find the SMALLEST distance from the 0 column to the first non-whitespace, non-star char

The first line is always free from indentation considerations, due to how the grammar processes it
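
A quick way to check the distances for the example, as a Python sketch (pad_width is a hypothetical helper, and it assumes at most one star in the left-pad):

```python
import re

# Optional whitespace, at most one star, then optional whitespace.
PAD = re.compile(r"^[ \t]*\*?[ \t]*")


def pad_width(line):
    """Distance from column 0 to the first non-whitespace, non-star char."""
    return len(PAD.match(line).group(0))


# Every line of the example except the first.
lines = [" * def", " *    ghi", "     *   jkl", "    mno"]
widths = [pad_width(line) for line in lines]  # [3, 6, 9, 4]
indent_to_remove = min(widths)                # 3
```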

The old notes

  • Write DocBlock + Comments Grammar
    • src, description, and attributes
    • Can come from multiple different styles (##\n#\n#\n# or /**\n*\n* or // comment or ! comment or whatever)
      • Maybe there is a docblock cleanup step that is non lexerryy?? Or I have to add something to make it flexible ...
      • I suppose I could have a method that modifies one of my directives, so basically you just set the docblock type to /*, then the relevant directives are modified accordingly. This is a fairly convoluted approach, but it would also work. It could even be "aware" of what other language grammar is present, maybe. (maybe not though).
    • catches @name and the description about it, @name(arg1,arg2) description about it, ... what about @arg prop_name description about it & catch prop_name??? I think this is a case-by-case kind of thing. Maybe depending upon the particular attribute? maybe the language grammar can define it

/**
*
*/

or /** */

or

or

or
//
//
//

So, first I have to make a decision about "what is a docblock?", well the common answer is:

/**  
*  
*  
*/  

But if I want docblock functionality in different languages (bash), how tf do I do that?
Well I've proposed:

##  
#  
#  
#  

Which doesn't supply a clear terminator, but it makes sense.

I could also do just a series of single-line comments. But I think that breaks a common expectation. So no. I won't do that.

So right now, I'm counting ##\n#\n# and /**\n*\n*/ as well as their single-line counterparts ## stuff and /** */

But the ## stuff breaks the docblock contract

But how do I count ## whatever as a docblock but NOT catch it as a comment?

By simply defining things separately, basically. The # would start a comment, but listen for a docblock. If the next char is also #, we have a docblock, so the comment listening can stop & we go into docblock mode.
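
Something like this, as a line-based Python sketch. The real lexer works character by character, so this is only an approximation, and scan_hash_comments is a made-up name. It assumes a docblock continues for as long as the following lines start with #, since there is no clear terminator.

```python
def scan_hash_comments(source):
    """Return a list of (kind, lines) pairs, kind being "comment" or "docblock"."""
    blocks = []
    in_docblock = False
    for line in source.split("\n"):
        stripped = line.lstrip()
        if in_docblock and stripped.startswith("#"):
            # Still inside the docblock: collect the line without its "#".
            blocks[-1][1].append(stripped[1:].strip())
            continue
        in_docblock = False  # any non-# line ends the docblock
        if stripped.startswith("##"):
            # "#" followed by another "#": switch to docblock mode.
            in_docblock = True
            blocks.append(("docblock", [stripped[2:].strip()]))
        elif stripped.startswith("#"):
            blocks.append(("comment", [stripped[1:].strip()]))
    return blocks
```

This sketch does not special-case things like a #! shebang line, which would be caught as a plain comment.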

But how does docblock processing actually work? Let's start with a single line example.

/** I am a simple docblock */  

/* starts a docblock
*/ ends a docblock
I am a simple docblock is the description / body

Then

/**  
* Simple multi-line docblock  
*/  

So ...
Previously, I was just parsing all of the stars out after getting the end of the docblock. I could do that still, I suppose, but it's kind of a jank solution, especially considering I have a lexer!

So the things I want to remove are:
^\s+\*
If there is no * on a line,

  1. Get a /* to start a docblock
  2. Get a \n to end a line
    • rewind 1
    • append to body. Append to description?
    • forward 1
  3. Get a \n or a * and discard what's before *

Attributes:

  • Start-of-line attributes like @param $argName description of arg
  • in-line attributes like I want you to @see TheThing
  • start-of-line like @param($argName, string) description of the arg
  • in-line like: Let's @see(TheThing)

I don't really care about the "This is @abad Inline Type" style, because it's just ... not really clear how that should be interpreted & I don't think I want to make decisions about it & Code Scrawl doesn't use this style. Or it hasn't, anyway.