chsvlib
chsv helper source code

u8towc()

int Chusov::String::u8towc(wchar_t *restrict pUcs,
                           const char *restrict pUtf,
                           std::size_t cbUtf) noexcept

Inspects at most the given number of bytes of a buffer to determine how many bytes must be read from it to convert one UTF-8 code to a single wide character in UCS format, and optionally writes the resulting wide character to a caller-provided buffer.

Parameters
[out] pUcs is an optional pointer (may be NULL) to a buffer that receives the converted wide character.
[in] pUtf is a pointer to a multibyte character in UTF-8 format. The pUtf pointer can be NULL. In this case the function ignores the other parameters and simply returns 0, behaving like the mbtowc function as defined by the C standard for state-independent encodings such as UTF-8.
[in] cbUtf is the maximum number of bytes for the function to inspect. The actual byte size of the character can be less than cbUtf, in which case the remaining bytes are ignored.
Returns
If the pUtf pointer is NULL, the function returns 0 and ignores the other parameters. Otherwise the function returns 0 if pUtf points to a null character, the actual number of bytes of the UTF-8 code on success, or -1 (with errno set to EILSEQ) if the first cbUtf bytes of the character pointed to by pUtf do not provide the information needed to perform the conversion or the resulting code cannot be represented by the wchar_t type.
Warning
A successful call does not guarantee that the multibyte code in pUtf is valid. Validity can be checked by calling u8check_cp with the value written to pUcs as the parameter. If pUcs is NULL, or the caller needs to perform the check before calling u8towc, the u8check function can be used instead. The first method is preferable for efficiency reasons.
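To illustrate what a check on the decoded value might look like, here is a minimal sketch. The signature and exact rules of u8check_cp are not given in this section, so the helper below is hypothetical (the name sketch_is_scalar_value is an assumption) and applies only one plausible notion of validity: the value must be a Unicode scalar value.

```cpp
#include <cstdint>

// Hypothetical check, NOT the library's u8check_cp: treats a code point as
// valid when it is a Unicode scalar value.
constexpr bool sketch_is_scalar_value(std::uint32_t cp) noexcept
{
    const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF; // reserved for UTF-16 pairs
    return !surrogate && cp <= 0x10FFFF;                 // upper bound of Unicode
}
```

A caller would apply such a check to the wide character obtained from a successful conversion before using it.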

The function converts a UTF-8 encoded character to the corresponding UCS-encoded form held by a wide character, independently of the current locale. The result of the conversion depends on the size of the wchar_t type. For instance, on Windows sizeof(wchar_t) equals 2, which is not enough to cover all possible Unicode code points; in this case a conversion to UCS-2 takes place. On the other hand, some Linux compilers define the size of wchar_t as 4, which results in a UCS-4 based conversion.

If converting the UTF-8 code at pUtf to wchar_t would result in a loss of data, the function returns -1 and sets errno to EILSEQ.
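The dependence on the width of wchar_t can be sketched as a range check. The helper name fits_in_wchar is hypothetical and not part of the library; it only shows one way to decide whether a decoded code point would be stored without loss on the current platform.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (an assumption, not a chsvlib function): true if the
// code point cp can be stored in wchar_t on this platform without loss.
constexpr bool fits_in_wchar(std::uint32_t cp) noexcept
{
    return cp <= static_cast<std::uint32_t>(std::numeric_limits<wchar_t>::max());
}
```

With a 2-byte wchar_t, a code point such as U+1F600 fails this check, which corresponds to the -1 / EILSEQ case described above. With a 4-byte wchar_t the same code point passes.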

The interface is similar to mbtowc defined by the C standard, except that the maximum possible byte length of the UTF-8 characters is UTF8_MAX_LEN and not MB_CUR_MAX or MB_LEN_MAX.
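The mbtowc-style contract can be exercised with a decoding loop. Since the library itself is not available here, the sketch below uses a hypothetical stand-in, sketch_u8towc, that follows the documented return values. It is not the chsvlib implementation. It assumes sequences of at most 4 bytes (the value of UTF8_MAX_LEN is not stated in this section), treats cbUtf == 0 as an error, and rejects malformed continuation bytes, which may be stricter than the real function.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in with the documented contract; NOT the library code.
int sketch_u8towc(wchar_t* pUcs, const char* pUtf, std::size_t cbUtf) noexcept
{
    if (pUtf == nullptr)
        return 0;                                // UTF-8 is state-independent
    if (cbUtf == 0) { errno = EILSEQ; return -1; } // assumption: nothing to inspect

    const unsigned char lead = static_cast<unsigned char>(pUtf[0]);
    int len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else { errno = EILSEQ; return -1; }          // not a usable lead byte

    if (static_cast<std::size_t>(len) > cbUtf) { // cbUtf bytes are not enough
        errno = EILSEQ;
        return -1;
    }
    for (int i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(pUtf[i]);
        if ((c & 0xC0) != 0x80) { errno = EILSEQ; return -1; }
        cp = (cp << 6) | (c & 0x3F);             // append 6 payload bits
    }
    // Reject values that wchar_t cannot hold on this platform.
    if (cp > static_cast<std::uint32_t>(std::numeric_limits<wchar_t>::max())) {
        errno = EILSEQ;
        return -1;
    }
    if (pUcs != nullptr)
        *pUcs = static_cast<wchar_t>(cp);
    return cp == 0 ? 0 : len;                    // 0 signals a null character
}

// Decodes up to cb bytes of UTF-8 into out, one character per call.
// Returns false if any character cannot be converted.
bool sketch_decode(const char* utf, std::size_t cb, std::wstring& out)
{
    out.clear();
    while (cb > 0) {
        wchar_t wc = 0;
        const int n = sketch_u8towc(&wc, utf, cb);
        if (n < 0) return false;                 // errno is EILSEQ
        if (n == 0) break;                       // reached a null character
        out.push_back(wc);
        utf += n;
        cb -= static_cast<std::size_t>(n);
    }
    return true;
}
```

The loop advances by the returned byte count and passes the remaining buffer size on each call, so no locale-dependent limit such as MB_CUR_MAX is involved.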

See also
wctou8;
u8len;
u8stowcs.