Simple word indexer (15)
Continued from the previous.
Positioning to a character (3)
I made a much simplified variant of my test program, in order to learn more about the nature of the bug I described in my previous article. The test file now contains only one character. It is written and then read through a single stream.
When the source file is compiled with the command:
cc siworin15.c -o x
and run as simply:
./x
no infinite loop occurs! The output is:
Line  35, i = 1
Line  38, i = 1
Line  41, error 84 Invalid or incomplete multibyte or wide character
And that is correct. If however the program is run as:
./x loop
the output is:
Line  35, i = 0
Line  38, i = 0
Line  35, i = 1
and the program never terminates, unless it is interrupted by a signal, for example by pressing Ctrl-C.
The difference is that without a command line argument, the program starts reading at the second byte (byte number 1) of the file. The file contains the byte sequence c3 a1 (in hex), which is the UTF-8 encoding of Unicode character U+00E1, á. That second byte, hex a1, is invalid as a starting point because its top two bits are 10, so it cannot be the first byte of a UTF-8 sequence, only a continuation byte.
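The bit test just described can be written down in a few lines of C. This is only a minimal sketch; the function name is_utf8_continuation is made up for illustration and does not come from the test program.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the test program:
   returns true if the byte has the bit pattern 10xxxxxx,
   which means it can only continue a UTF-8 sequence,
   never start one. */
static bool is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

For the two bytes in the test file, is_utf8_continuation(0xC3) is false (a valid lead byte) and is_utf8_continuation(0xA1) is true.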
The fseek succeeds, and fgetwc
   (get a wide character from a multibyte stream) sets the error
   code to:
84 Invalid or incomplete multibyte or wide character.
Correctly handled.
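To make the seek-then-read step concrete, here is a rough sketch of it in C. It is not the source of siworin15.c; the helper names read_wide_at and select_utf8_locale and the two locale names tried are my own assumptions. A UTF-8 locale has to be selected before the stream is first used for wide-character I/O, otherwise the conversion does not apply.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: try two common UTF-8 locale names.
   Which names exist depends on the system (assumption).
   Returns 1 on success, 0 if neither is available. */
static int select_utf8_locale(void)
{
    return setlocale(LC_ALL, "C.UTF-8") != NULL
        || setlocale(LC_ALL, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: seek to a byte offset, then try to read
   one wide character there. Returns the character, or WEOF.
   *err receives the errno of a failed fseek or fgetwc, and
   stays 0 on success or at a plain end of file. */
static wint_t read_wide_at(FILE *fp, long offset, int *err)
{
    *err = 0;
    if (fseek(fp, offset, SEEK_SET) != 0) {
        *err = errno;
        return WEOF;
    }
    errno = 0;
    wint_t wc = fgetwc(fp);
    if (wc == WEOF && ferror(fp))
        *err = errno;   /* e.g. EILSEQ for an invalid sequence */
    return wc;
}
```

With the one-character test file, a single call read_wide_at(fp, 1L, &err) corresponds to the case described here, where the error code is set. Note that calling it first with offset 0 and then with offset 1 on the same stream is the very sequence that does not return on the glibc versions discussed in this article, so take care when experimenting with it.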
In the other test case, calling the program with a command line argument (any will do, "loop" is just an example), an fseek is first done to the first byte (byte 0) of the file (without any effect, as the file pointer was already at the start), and fgetwc reads a correct 2-byte character from it. THEN the fseek to the incorrect position (second byte, byte number 1) is done, and THAT gets GNU glibc (versions 2.31 and 2.33) into an endless loop.
This means the bug is less severe than I first thought. In the case where I might have needed to fseek to a possibly incorrect character position in a file, for providing context for a found search word, I would NOT first have read a full character, because I don’t know where one starts. So I probably would not have encountered the bug. I only encountered it in a test program that does things that do not make sense in a real-life application.
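As an aside, when the bytes around a position are already in memory, finding where a character starts does not need the wide-character functions at all. The following is only a sketch of one possible approach, not something the indexer is stated to do; the function name utf8_char_start is hypothetical, and the buffer is assumed to hold valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: given a buffer of UTF-8 bytes and an
   arbitrary byte position inside it, step back over continuation
   bytes (bit pattern 10xxxxxx) to the byte that starts the
   character. A UTF-8 character is at most 4 bytes long, so at
   most 3 steps back are ever needed. */
static size_t utf8_char_start(const unsigned char *buf, size_t pos)
{
    size_t steps = 0;
    while (pos > 0 && steps < 3 && (buf[pos] & 0xC0) == 0x80) {
        pos--;
        steps++;
    }
    return pos;
}
```

For the two-byte test file c3 a1, position 1 maps back to position 0, and position 0 stays where it is.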
Yet, I insist that a library function must never get into an infinite loop, so the bug should be repaired. But it is less urgent than I thought.
Update 23 August 2021
The day before yesterday I wrote that I hadn’t tested with glibc version 2.34, which is the latest stable version. Today however I did, with a library, loader and locale freshly compiled and locally installed from the sources in glibc-2.34.tar.xz, downloaded from GNU itself. Result: glibc 2.34 also contains the bug, as do 2.28 (under Debian), 2.31 (Mint and Ubuntu) and 2.33 (Ubuntu).
Update 2: Bug reported
To the next article (GPLv3), and see also this one on the same sub-subject.
Copyright © 2021 by R. Harmsen, all rights reserved.