
Inconsistent lexing of '-' at line start by bash lexer #202

@nyamatongwe

Description


The bash lexer styles the following text inconsistently: -a receives two different styles [7,8] while -b receives [8,8], where 7=SCE_SH_OPERATOR and 8=SCE_SH_IDENTIFIER.

-a
#
-b
#

Styles are applied differently depending on where lexing starts: add a character to the comment on line 4 (after -b) and -b changes to two different styles, like -a.

This is caused by the IsASpace(sc.chPrev) test in the 'test operator or short and long option' branch: sc.chPrev is initialized to 0 instead of to the character before the start of lexing, so a '-' at the point where lexing starts never passes the test.
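
The failure can be reproduced outside Lexilla with a small model. This sketch is hypothetical: IsSpaceChar is assumed to accept the same characters as IsASpace, IsLetter stands in for IsUpperOrLowerCase, the Arithmetic check is omitted, and OptionAt is an invented helper that reports whether the '-' at pos would be styled as an option when lexing begins at start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed to match IsASpace: ' ' plus '\t', '\n', '\v', '\f', '\r'.
constexpr bool IsSpaceChar(int ch) noexcept {
	return ch == ' ' || (ch >= 0x09 && ch <= 0x0d);
}

// Stand-in for IsUpperOrLowerCase (ASCII letters only).
constexpr bool IsLetter(int ch) noexcept {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Byte at pos as an unsigned value, or 0 past the end of the text.
int CharAt(const std::string &text, std::size_t pos) {
	return pos < text.size() ? static_cast<unsigned char>(text[pos]) : 0;
}

// Reduced form of the 'test operator or short and long option' condition.
bool IsOptionStart(int chPrev, int ch, int chNext) {
	return ch == '-' && (IsLetter(chNext) || chNext == '-') && IsSpaceChar(chPrev);
}

// Current behaviour: chPrev is 0 for the first character lexed and the real
// preceding character after that, so the result depends on start.
bool OptionAt(const std::string &text, std::size_t start, std::size_t pos) {
	assert(start <= pos && pos < text.size());
	const int chPrev = (pos == start) ? 0 : CharAt(text, pos - 1);
	return IsOptionStart(chPrev, CharAt(text, pos), CharAt(text, pos + 1));
}
```

For the text "-a\n#\n-b\n#\n", OptionAt(text, 0, 5) is true (chPrev is '\n') while OptionAt(text, 5, 5) is false (chPrev is 0), which is the -b flip described above; -a at position 0 is never treated as an option.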

With this patch, the StyleContext constructor initializes chPrev to the preceding character (often '\n'):

--- a/lexlib/StyleContext.cxx
+++ b/lexlib/StyleContext.cxx
@@ -40,6 +40,8 @@ StyleContext::StyleContext(Sci_PositionU startPos, Sci_PositionU length,
 	styler.StartAt(startPos /*, chMask*/);
 	styler.StartSegment(startPos);
 
+	chPrev = GetRelativeCharacter(-1);
+
 	// Variable width is now 0 so GetNextChar gets the char at currentPos into chNext/widthNext
 	GetNextChar();
 	ch = chNext;
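
In the same hypothetical model, patch 1 only changes how chPrev is initialized. InitialChPrev is an invented helper standing in for GetRelativeCharacter(-1); it assumes, as the next paragraph describes, that there is no preceding character (modelled as 0) at the start of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed to match IsASpace: ' ' plus '\t', '\n', '\v', '\f', '\r'.
constexpr bool IsSpaceChar(int ch) noexcept {
	return ch == ' ' || (ch >= 0x09 && ch <= 0x0d);
}

// Stand-in for IsUpperOrLowerCase (ASCII letters only).
constexpr bool IsLetter(int ch) noexcept {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Byte at pos as an unsigned value, or 0 past the end of the text.
int CharAt(const std::string &text, std::size_t pos) {
	return pos < text.size() ? static_cast<unsigned char>(text[pos]) : 0;
}

// Unchanged option condition.
bool IsOptionStart(int chPrev, int ch, int chNext) {
	return ch == '-' && (IsLetter(chNext) || chNext == '-') && IsSpaceChar(chPrev);
}

// Patch 1 model: chPrev starts as the character before start, or 0 when
// lexing starts at the beginning of the file.
int InitialChPrev(const std::string &text, std::size_t start) {
	return start == 0 ? 0 : CharAt(text, start - 1);
}

bool OptionAtPatch1(const std::string &text, std::size_t start, std::size_t pos) {
	assert(start <= pos && pos < text.size());
	const int chPrev = (pos == start) ? InitialChPrev(text, start) : CharAt(text, pos - 1);
	return IsOptionStart(chPrev, CharAt(text, pos), CharAt(text, pos + 1));
}
```

Here -b at position 5 gives the same answer whether lexing starts at 0 or at 5, but OptionAtPatch1(text, 0, 0) is still false for -a.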

However, that doesn't fix -a at the start of the file, as there is no preceding '\n' to place in chPrev. A pretend '\n' could be placed before the file start, but that is likely to be unexpected, and it would be unusual for IsASpace to treat '\0' as a space. To fix the file-start case, an additional clause can be added:

--- a/lexers/LexBash.cxx
+++ b/lexers/LexBash.cxx
@@ -1092,7 +1092,7 @@ void SCI_METHOD LexerBash::Lex(Sci_PositionU startPos, Sci_Position length, int
 			} else if (sc.ch == '-' && // test operator or short and long option
 					   cmdState != CmdState::Arithmetic &&
 					   (IsUpperOrLowerCase(sc.chNext) || sc.chNext == '-') &&
-					   IsASpace(sc.chPrev)) {
+					   ((sc.chPrev == 0) || IsASpace(sc.chPrev))) {
 				sc.SetState(SCE_SH_WORD | insideCommand);
 				sc.Forward();
 			} else if (setBashOperator.Contains(sc.ch)) {
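
In the same reduced, hypothetical form (IsSpaceChar assumed to match IsASpace, IsLetter standing in for IsUpperOrLowerCase, OptionAtPatch2 an invented helper), patch 2's condition accepts a chPrev of 0 as if it were whitespace, so a '-' at the very start of the file can begin an option without changing IsASpace itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed to match IsASpace: ' ' plus '\t', '\n', '\v', '\f', '\r'.
constexpr bool IsSpaceChar(int ch) noexcept {
	return ch == ' ' || (ch >= 0x09 && ch <= 0x0d);
}

// Stand-in for IsUpperOrLowerCase (ASCII letters only).
constexpr bool IsLetter(int ch) noexcept {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Byte at pos as an unsigned value, or 0 past the end of the text.
int CharAt(const std::string &text, std::size_t pos) {
	return pos < text.size() ? static_cast<unsigned char>(text[pos]) : 0;
}

// Patch 2: chPrev == 0 (no preceding character known) counts like whitespace.
bool IsOptionStartPatch2(int chPrev, int ch, int chNext) {
	return ch == '-' && (IsLetter(chNext) || chNext == '-') &&
		((chPrev == 0) || IsSpaceChar(chPrev));
}

// Patch 2 alone: chPrev is still 0 for the first character lexed.
bool OptionAtPatch2(const std::string &text, std::size_t start, std::size_t pos) {
	assert(start <= pos && pos < text.size());
	const int chPrev = (pos == start) ? 0 : CharAt(text, pos - 1);
	return IsOptionStartPatch2(chPrev, CharAt(text, pos), CharAt(text, pos + 1));
}
```

With this condition, -a at position 0 and -b at position 5 are both treated as options for every start position in the example text.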

Patch 2 alone removes the need for patch 1, but it is likely safer to make both changes so that lexing is more stable in unexpected cases.
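
One kind of unexpected case can be illustrated with the same hypothetical model. If lexing were ever to begin in the middle of a line (an assumption; the issue does not say when that happens), patch 2 alone would treat the '-' in x-a as an option because chPrev is 0 there, while with both patches chPrev is the real 'x'. SameForAllStarts is an invented check that every possible start position gives the same answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed to match IsASpace: ' ' plus '\t', '\n', '\v', '\f', '\r'.
constexpr bool IsSpaceChar(int ch) noexcept {
	return ch == ' ' || (ch >= 0x09 && ch <= 0x0d);
}

// Stand-in for IsUpperOrLowerCase (ASCII letters only).
constexpr bool IsLetter(int ch) noexcept {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Byte at pos as an unsigned value, or 0 past the end of the text.
int CharAt(const std::string &text, std::size_t pos) {
	return pos < text.size() ? static_cast<unsigned char>(text[pos]) : 0;
}

// Patch 2's condition: chPrev == 0 counts like whitespace.
bool IsOptionStart(int chPrev, int ch, int chNext) {
	return ch == '-' && (IsLetter(chNext) || chNext == '-') &&
		((chPrev == 0) || IsSpaceChar(chPrev));
}

// Patch 2 only: chPrev is 0 at whatever position lexing starts.
bool OptionPatch2Only(const std::string &text, std::size_t start, std::size_t pos) {
	assert(start <= pos && pos < text.size());
	const int chPrev = (pos == start) ? 0 : CharAt(text, pos - 1);
	return IsOptionStart(chPrev, CharAt(text, pos), CharAt(text, pos + 1));
}

// Both patches: chPrev starts as the preceding character, so it is 0 only
// when lexing starts at the beginning of the file.
bool OptionBothPatches(const std::string &text, std::size_t start, std::size_t pos) {
	assert(start <= pos && pos < text.size());
	const int chPrev = (pos == start && start == 0) ? 0 : CharAt(text, pos - 1);
	return IsOptionStart(chPrev, CharAt(text, pos), CharAt(text, pos + 1));
}

// True if every start position up to pos gives the same answer for pos.
template <typename Model>
bool SameForAllStarts(Model model, const std::string &text, std::size_t pos) {
	const bool first = model(text, 0, pos);
	for (std::size_t start = 1; start <= pos; ++start) {
		if (model(text, start, pos) != first) {
			return false;
		}
	}
	return true;
}
```

For "x-a", SameForAllStarts(OptionPatch2Only, text, 1) is false and SameForAllStarts(OptionBothPatches, text, 1) is true; for the text in this issue both models are consistent.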


Labels: bash (Caused by the bash lexer), lexlib (The utility library used by lexers)
