
Auto Detect UTF-8 Encoding for French is broken in Notepad++ 7.6.x (detects vietnamese windows-1258) #5202

@MetaChuh

Description of the Issue

Auto Detect UTF-8 Encoding for French is broken in Notepad++ 7.6.x

Steps to Reproduce the Issue

  1. create a new utf-8 document in notepad++
  2. paste or write the word "Mosaïque" in it
  3. save as test.txt
  4. close test.txt in notepad++ and reopen it
    (or close notepad++ and reopen it if sessions are enabled)
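
The steps above can also be reproduced without typing into the editor. The following is a minimal sketch, assuming the "utf-8" document in step 1 means UTF-8 without a BOM; the helper name `write_utf8_sample` and the default file name are hypothetical and not part of Notepad++.

```python
from pathlib import Path


def write_utf8_sample(path, text="Mosaïque"):
    """Write `text` as UTF-8 without a BOM and return the raw bytes written.

    Assumption: the file from the report has no BOM. With a BOM, the
    encoding would be unambiguous and no guessing would be involved.
    """
    data = text.encode("utf-8")
    Path(path).write_bytes(data)
    return data


if __name__ == "__main__":
    raw = write_utf8_sample("test.txt")
    # Show the exact bytes that end up on disk, e.g. to compare with a hex viewer.
    print(raw)
```

Opening the resulting test.txt in Notepad++ should then correspond to step 4.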

the same happens with this file (direct download link to paquet.xml):
https://zone.spip.net/trac/spip-zone/export/HEAD/spip-zone/_plugins_/mosaique/trunk/paquet.xml

further information and files to test, submitted by Franckybleu, at: https://notepad-plus-plus.org/community/topic/16873/encodage/3

Expected Behavior

test.txt should be detected as utf-8

Actual Behavior

test.txt is detected as vietnamese windows-1258

Debug Information

Notepad++ v7.6.2 (32-bit)
Build time : Jan 1 2019 - 00:00:08
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Admin mode : OFF
Local Conf mode : OFF
OS : Windows 7 (64-bit)
Plugins : DSpellCheck.dll mimeTools.dll NppConverter.dll NppExport.dll

Edit:
if you write the word “Réservation” to a new utf-8 file, save it and reopen it, it is also detected as vietnamese.

@guy038 has discovered that the sentence "Cette mosaïque était jolie" in a new utf-8 file is detected correctly as utf-8.
but further tests show that a utf-8 file containing only one of the words mosaïque or était is detected as vietnamese, so in this example the file is only detected correctly when both an ï and an é are present.

the same vietnamese detection happens with spanish text, except when an ñ is present.
german characters seem to work fine.
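
To make the ambiguity concrete, here is a small sketch that is not Notepad++'s detection code. It only uses Python's built-in `utf-8` and `cp1258` codecs to show that the sample strings from this report are strictly valid UTF-8 and, at the same time, can be read byte for byte as windows-1258. The names `SAMPLES`, `is_valid_utf8` and `decode_both` are made up for this illustration.

```python
# Words and sentence taken from the report above.
SAMPLES = ["Mosaïque", "Réservation", "était", "Cette mosaïque était jolie"]


def is_valid_utf8(raw):
    """Return True if `raw` decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_both(text):
    """Encode `text` as UTF-8, then decode the bytes as UTF-8 and as windows-1258."""
    raw = text.encode("utf-8")
    as_utf8 = raw.decode("utf-8")
    # errors="replace" keeps the sketch from raising if a byte is unmapped in cp1258.
    as_1258 = raw.decode("cp1258", errors="replace")
    return raw, as_utf8, as_1258


if __name__ == "__main__":
    for sample in SAMPLES:
        raw, as_utf8, as_1258 = decode_both(sample)
        print(raw.hex(" "), "| utf-8:", as_utf8, "| cp1258:", as_1258,
              "| valid utf-8:", is_valid_utf8(raw))
```

Since every sample passes the strict UTF-8 check, a validity test alone would classify these files as utf-8. That suggests, without proving it, that the wrong result comes from a statistical guess between candidate encodings rather than from invalid bytes in the file.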
