This function is declared noexcept.
Inspects at most a given number of bytes of a buffer to determine how many bytes must be read to convert one UTF-8 encoded character into a single wide character in UCS format, and optionally writes the resulting wide character into a caller-provided buffer.
| Direction | Parameter | Description |
| --- | --- | --- |
| [out] | pUcs | An optional pointer to a buffer that receives the converted wide character. It may be NULL, in which case the character is not stored. |
| [in] | pUtf | A pointer to a multibyte character in UTF-8 format. pUtf can be NULL; in this case the function ignores the other parameters and simply returns 0, behaving like the mbtowc function defined by the C standard for state-independent encodings such as UTF-8. |
| [in] | cbUtf | The maximum number of bytes the function inspects. The actual byte length of the character can be less than cbUtf; in that case the remaining bytes are ignored. |
If the pUtf pointer is NULL, the function returns 0 and ignores the other parameters. If pUtf is not NULL, the function returns 0 if pUtf points to a null character; the number of bytes in the UTF-8 code if the conversion succeeds; or -1, with errno set to EILSEQ, if the first cbUtf bytes of the character pointed to by pUtf do not provide the information needed to perform the conversion or the resulting code cannot be represented by the wchar_t type.

The function does not fully validate the UTF-8 code pointed to by pUtf. Validity can be ascertained by calling u8check_cp with the value written to pUcs as the argument. If pUcs is NULL, or if the caller needs to perform the check before calling u8towc, the u8check function can be used instead, although the first method is preferable for efficiency.

The function converts a UTF-8 encoded character to the corresponding UCS-encoded form held by a wide character, independently of the current locale. The result of the conversion depends on the size of the wchar_t type. For instance, on Windows sizeof(wchar_t) equals 2, which is not enough to cover all Unicode code points; in this case the function performs a conversion to UCS-2. On the other hand, some Linux compilers define wchar_t as a 4-byte type, in which case the function performs a UCS-4 conversion.
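The dependence on the size of wchar_t can be observed directly through sizeof(wchar_t) and the WCHAR_MAX macro from <wchar.h>. The following C sketch uses two hypothetical helpers, ucs_form and fits_in_wchar, which are not part of the documented library; it assumes a code point can be stored without loss exactly when it does not exceed WCHAR_MAX.

```c
#include <stddef.h>
#include <wchar.h>  /* wchar_t, WCHAR_MAX */

/* Hypothetical helper (not part of the documented library): names the UCS
   form a wchar_t-based conversion yields on this platform, judged only by
   sizeof(wchar_t). */
static const char *ucs_form(void)
{
    return sizeof(wchar_t) >= 4 ? "UCS-4" : "UCS-2";
}

/* Hypothetical helper: nonzero if code point cp fits in a wchar_t without
   loss. With Windows' 2-byte unsigned wchar_t, WCHAR_MAX is 0xFFFF, so
   supplementary-plane code points such as U+1F600 do not fit. */
static int fits_in_wchar(unsigned long cp)
{
    return cp <= (unsigned long)WCHAR_MAX;
}
```

For example, fits_in_wchar(0x1F600) is nonzero with a 4-byte wchar_t and zero on Windows, which is the case the next paragraph describes.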
If converting the UTF-8 code pointed to by pUtf to wchar_t would lose data, for example a code point above U+FFFF when wchar_t is 2 bytes wide, the function returns -1 and sets errno to EILSEQ.
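To make the return-value contract concrete, here is a minimal C sketch of a decoder that follows the documented behavior. It is not the library's implementation: the name sketch_u8towc is hypothetical, the parameter order (pUcs, pUtf, cbUtf) is an assumption modeled on mbtowc, sequences are limited to 4 bytes as in RFC 3629 (the actual value of UTF8_MAX_LEN is not stated here), and, in line with the note that validity is checked separately, it does not reject overlong forms or surrogate code points.

```c
#include <errno.h>   /* errno, EILSEQ */
#include <stddef.h>  /* size_t, NULL */
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

/* Reports a conversion failure as the documentation describes. */
static int fail_eilseq(void)
{
    errno = EILSEQ;
    return -1;
}

/* Hypothetical sketch of the documented contract; not the library's code. */
static int sketch_u8towc(wchar_t *pUcs, const char *pUtf, size_t cbUtf)
{
    const unsigned char *s = (const unsigned char *)pUtf;
    unsigned long cp;
    size_t len, i;

    if (pUtf == NULL)
        return 0;             /* UTF-8 is state-independent */
    if (cbUtf == 0)
        return fail_eilseq(); /* no bytes may be inspected */

    /* The lead byte gives the sequence length and the top bits of the code point. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else
        return fail_eilseq(); /* stray continuation byte or invalid lead byte */

    if (len > cbUtf)
        return fail_eilseq(); /* cbUtf bytes do not hold the whole character */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return fail_eilseq();
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Loss of data, e.g. a code point above U+FFFF with a 2-byte wchar_t. */
    if (cp > (unsigned long)WCHAR_MAX)
        return fail_eilseq();

    if (pUcs != NULL)
        *pUcs = (wchar_t)cp;
    return s[0] == 0 ? 0 : (int)len; /* 0 for the null character */
}
```

A caller that needs strict validation would still pass the decoded value to a separate check, as the documentation recommends with u8check_cp.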
The interface is similar to that of mbtowc defined by the C standard, except that the maximum possible byte length of a UTF-8 character is UTF8_MAX_LEN rather than MB_CUR_MAX or MB_LEN_MAX.
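For comparison, the following C sketch uses the standard mbtowc in the default "C" locale (no setlocale call) to show the conventions the documentation says u8towc shares: a null string pointer yields 0 for state-independent encodings, a null character yields 0, and otherwise the return value is the byte count used to advance through a string. The helper name count_chars is hypothetical.

```c
#include <stdlib.h>  /* mbtowc */
#include <string.h>  /* strlen */

/* Hypothetical helper: counts the multibyte characters in a NUL-terminated
   string with mbtowc, advancing by each returned byte count. Returns -1 at
   the first invalid sequence. */
static long count_chars(const char *s)
{
    size_t left = strlen(s) + 1;  /* include the terminator so mbtowc can see it */
    long n = 0;
    int r;

    (void)mbtowc(NULL, NULL, 0);  /* reset the internal conversion state */
    while ((r = mbtowc(NULL, s, left)) > 0) {
        s += r;
        left -= (size_t)r;
        ++n;
    }
    return r == 0 ? n : -1;       /* 0 means the null character was reached */
}
```

Setting a UTF-8 locale would make the same loop count code points, whereas u8towc is documented to decode UTF-8 regardless of the current locale.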