
Added troff/nroff lexer #264

Closed
rhaberkorn wants to merge 2 commits into ScintillaOrg:master from rhaberkorn:troff

Conversation

@rhaberkorn
Contributor

@rhaberkorn rhaberkorn commented Aug 18, 2024

Among many other things, this allows lexing of manpage source documents.
The lexer is macro-package agnostic, though. Troff is also used as a typesetting package.

This lexer goes further than most Troff lexers by actually trying to parse the syntax instead of only applying heuristics. Unfortunately, strictly speaking the language cannot be parsed, even with intimate knowledge of the syntax of all of its requests, because its parsing behavior often depends on runtime state (and we all know that the halting problem is undecidable). Any Troff lexer is therefore necessarily a compromise.
This implementation tries hard to guarantee at least the following:

  • Reliably detect the beginning of requests and commands (macro calls) within known restrictions. This works even after flow control commands.
  • Highlight all of the escape sequences supported by Groff. However, it does not currently take into account that backslashes often have to be escaped (\\) in string assignments and macro definitions. I am not sure if we can reliably predict where to resolve these indirections. (What does the bash lexer do in this case, anyway?)
  • Fold macro definitions, ignore blocks and flow control blocks (between \{ and \}).
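To make the first and third guarantees concrete, here is a minimal C++ sketch. It is not the code in LexTroff.cxx, and `requestName` and `braceFoldDelta` are hypothetical helpers. It assumes the default control characters `.` and `'` (so it shares the `.cc`/`.c2` restriction listed below) and simplifies name scanning by stopping at whitespace or a backslash.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the request/macro name if the line starts
// with a control character, or std::nullopt for a text line.
// Assumption: only the default control characters '.' and '\'' are known.
std::optional<std::string> requestName(std::string_view line) {
    if (line.empty() || (line[0] != '.' && line[0] != '\''))
        return std::nullopt;  // not a control line: plain text
    std::size_t pos = 1;
    // Spaces and tabs may separate the control character from the name.
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t'))
        pos++;
    const std::size_t start = pos;
    // Simplification: the name ends at whitespace or at a backslash, so
    // escapes inside names are not interpreted (and ".\"" yields "").
    while (pos < line.size() && line[pos] != ' ' && line[pos] != '\t' &&
           line[pos] != '\\')
        pos++;
    return std::string(line.substr(start, pos - start));
}

// Hypothetical helper: net fold-level change of one line, counting the
// \{ (open) and \} (close) escapes. Every other escape, including an
// escaped backslash "\\", is skipped as a two-character unit.
int braceFoldDelta(std::string_view line) {
    int delta = 0;
    for (std::size_t i = 0; i + 1 < line.size(); i++) {
        if (line[i] != '\\')
            continue;
        const char next = line[i + 1];
        if (next == '{')
            delta++;
        else if (next == '}')
            delta--;
        i++;  // consume the character following the backslash
    }
    return delta;
}
```

For example, `.if t \{\` opens one fold level and `.\}` closes it, while `a\\{` contains no block escape because the first backslash escapes the second.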

Currently there are the following restrictions:

  • Escapes are not interpreted everywhere, e.g. as part of command names.
  • For the same reason, changing control characters via .cc or .c2 will not affect lexing: subsequent requests will not be styled correctly. Luckily this feature is rarely used.
  • Line feeds cannot be escaped everywhere; this would require a state machine for all parsing. However, the C lexer apparently has the same restriction.
  • It is impossible to predict which macro argument is a numeric expression and where a number is actually treated as text. Currently, numbers are highlighted in conditional expressions (like after .if) and after all presumed macro calls, i.e. if the command is not a known request from the keyword list.
  • Also, escapes with levels of indirection (e.g. \\$1) cannot currently be highlighted, as it is impossible to predict the context in which an expansion will be used. Perhaps we should add an exception at least for \\$n, as they are practically never used without escaping the backslash?
  • Indirect blocks (.ami, .ami1, .dei and .dei1) cannot be folded.
  • No effort is made to highlight any of the preprocessors (tbl, pic, grap...). This could theoretically be done, but I have already spent 10 times more time on this lexer than I initially planned.
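The indirection problem with \\$1 can be seen in a minimal C++ sketch of one level of copy-mode interpretation. This is a simplification for illustration, not LexTroff.cxx code, and both helpers are hypothetical. When a macro body is stored, troff reduces each `\\` to `\`, so the source text `\\$1` only turns into the argument escape `\$1` once the definition has been read; a lexer looking at the source sees an escaped backslash followed by plain `$1`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: one level of copy-mode reduction. Each "\\" pair
// becomes a single "\"; all other characters (and escapes) are kept as is.
std::string reduceEscapedBackslashes(std::string_view body) {
    std::string out;
    out.reserve(body.size());
    for (std::size_t i = 0; i < body.size(); i++) {
        if (body[i] == '\\' && i + 1 < body.size() && body[i + 1] == '\\') {
            out += '\\';  // "\\" collapses to "\"
            i++;          // consume the second backslash
        } else {
            out += body[i];
        }
    }
    return out;
}

// Hypothetical helper: true if the text contains a single-digit macro
// argument escape \$n. Escape pairs are skipped as units, so the second
// backslash of "\\" never starts a new escape.
bool hasArgumentEscape(std::string_view text) {
    for (std::size_t i = 0; i + 1 < text.size(); i++) {
        if (text[i] != '\\')
            continue;
        if (text[i + 1] == '$' && i + 2 < text.size() &&
            text[i + 2] >= '0' && text[i + 2] <= '9')
            return true;
        i++;  // skip the escaped character
    }
    return false;
}
```

Applied to the source text `\\$1`, `hasArgumentEscape` reports no argument escape, but after `reduceEscapedBackslashes` it does. A lexer would have to know how many such reductions a given span undergoes before use, which is exactly what cannot be predicted in general.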

SciTECO already added support for this lexer.

Here's a proper SciTE properties file supporting the lexer.
It was only tested with SciTE 5.13, but I don't think that much has changed in the properties syntax. I can also send it to the Scintilla mailing list once this gets merged, unless @nyamatongwe just takes it from here.

@nyamatongwe
Member

I will look at this after 5.4.0 is released in a few days.

@nyamatongwe
Member

There are some shadowed variables which produce warnings. Shadowed variables can make it easier for maintainers to add new bugs, so it's best to rename either the inner or outer variables. This is best done by the original author as they are more likely to see problems in behaviour. This log is from Visual C++ Code Analysis and similar results are produced by cppcheck.

G:\u\hg\pull\lexilla\lexers\LexTroff.cxx(850): warning C6246: 
Local declaration of 'i' hides declaration of the same name in outer scope. 
For additional information, see previous declaration at line '793' of 'g:\u\hg\pull\lexilla\lexers\lextroff.cxx'.
G:\u\hg\pull\lexilla\lexers\LexTroff.cxx(851): warning C6246: 
Local declaration of 'style' hides declaration of the same name in outer scope. 
For additional information, see previous declaration at line '796' of 'g:\u\hg\pull\lexilla\lexers\lextroff.cxx'.
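For readers unfamiliar with C6246, here is a minimal, self-contained C++ illustration of the reported pattern and of the usual fix, renaming the inner variable. It is hypothetical code, not the actual lines of LexTroff.cxx cited in the log. GCC's `-Wshadow` targets the same pattern.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before: the inner 'i' hides the outer 'i'. This still works, but a later
// edit that means "the outer i" inside the inner loop silently gets the
// inner one. MSVC Code Analysis reports this kind of declaration as C6246.
std::size_t linesWithEscapesShadowed(const std::vector<std::string> &lines) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < lines.size(); i++) {
        const std::string &line = lines[i];
        for (std::size_t i = 0; i < line.size(); i++) {  // shadows outer 'i'
            if (line[i] == '\\') {
                count++;
                break;  // count each line at most once
            }
        }
    }
    return count;
}

// After: distinct, descriptive names; same behaviour, no shadowing.
std::size_t linesWithEscapes(const std::vector<std::string> &lines) {
    std::size_t count = 0;
    for (std::size_t lineIndex = 0; lineIndex < lines.size(); lineIndex++) {
        const std::string &line = lines[lineIndex];
        for (std::size_t pos = 0; pos < line.size(); pos++) {
            if (line[pos] == '\\') {
                count++;
                break;  // count each line at most once
            }
        }
    }
    return count;
}
```

Both functions return the same result; the renamed version simply removes the ambiguity the warning is about.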

Among many other things, this allows lexing of manpage source documents.
The lexer is macro-package agnostic, though.

Lexer restrictions are documented in LexTroff.cxx.
@rhaberkorn
Contributor Author

cppcheck doesn't warn about anything by default for some strange reason. Even more strangely, adding -Wshadow to CXXFLAGS/WARNINGS does not produce any warnings about the shadowed variables.

Anyway, I refactored my code and also renamed a few variables to conform to the camelCase style of Lexilla.

The troff branch is rebased on top of master.

nyamatongwe pushed a commit that referenced this pull request Aug 23, 2024
@nyamatongwe nyamatongwe added the committed Issue fixed in repository but not in release label Aug 23, 2024
@nyamatongwe
Member

Squashed then committed with some minor changes.

Changed header inclusion order to match scripts\HeaderOrder.txt. Fixed a couple of misspellings 'similiar' and 'supportin'. Added to LexillaHistory.html and cppcheck.suppress.

@nyamatongwe
Member

cppcheck doesn't warn about anything by default

The command line used (from the directory above lexilla) is:

cppcheck -j 8 --enable=all --suppressions-list=lexilla/cppcheck.suppress --max-configs=120 -I lexilla/include -I lexilla/access -I lexilla/lexlib -I scintilla/include --template=gcc --quiet lexilla

@rhaberkorn
Contributor Author

Thank you! Will you add Troff support to the upcoming SciTE release?

IMHO you can close this PR.

@nyamatongwe
Member

The troff.properties file includes *.mm in file.patterns.troff which conflicts with its use in file.patterns.cpp in cpp.properties for Objective C++ for Apple development.

I couldn't find *.mm associated with troff on the popular file extension sites.

@rhaberkorn
Contributor Author

rhaberkorn commented Aug 27, 2024

I couldn't find *.mm associated with troff on the popular file extension sites.

It's for the Memorandum Macros, a classic macro package. See man groff_mm and here.

Still, as Troff is generally little used nowadays, you should probably give precedence to Objective C++ and leave the *.mm extension in troff.properties commented out.

Even the Groff repository contains only a single letter.mm file. I used the Memorandum Macros here, so it was obvious I had to give precedence to Troff in my SciTECO lexer config. ;-)

@nyamatongwe
Member

Added troff.properties to SciTE but to avoid long menus it is inactive until user enables it by changing imports.exclude.

https://sourceforge.net/p/scintilla/scite/ci/dcaf189779c4ab8408e40e0b98b61f44a1ffa02c/

@rhaberkorn
Contributor Author

Great, thanks! I will drop a note on the Groff mailing list once it gets into the next SciTE release.

PS: Why did you remove all of the comments in troff.properties?

@rhaberkorn rhaberkorn closed this Aug 29, 2024
@nyamatongwe
Member

PS: Why did you remove all of the comments in troff.properties?

That's the version embedded into the single file executable which doesn't have any comments since you can't read inside the executable and comments could muck up its slightly different format.

I'd actually forgotten to add the main copy of troff.properties so committing that:
https://sourceforge.net/p/scintilla/scite/ci/1726b0201bc43b76c15bb68e8242f62d8543ff2b/

@nyamatongwe nyamatongwe removed the committed Issue fixed in repository but not in release label Dec 31, 2024