chsvlib
chsv helper source code

u8toucp()

int Chusov::String::u8toucp(ucp_t *restrict pUcp, const char *restrict pUtf, std::size_t cbUtf) noexcept

Inspects at most cbUtf bytes of a buffer to determine how many bytes must be read from it to convert one UTF-8 encoded character to a single Unicode code point, and optionally writes the resulting code point to a caller-provided buffer.

Parameters
[out] pUcp: an optional pointer to a buffer where the converted Unicode code point is written. It can be NULL, in which case the code point is not stored.
[in] pUtf: a pointer to a multibyte character in UTF-8 format. pUtf can be NULL; in this case the function ignores the other parameters and returns 0, behaving like the mbtowc function defined by the C standard for state-independent encodings such as UTF-8.
[in] cbUtf: the maximum number of bytes the function inspects. The actual byte length of the character can be less than cbUtf; in that case the remaining bytes are ignored.
Returns
If pUtf is NULL, the function returns 0 and ignores the other parameters. Otherwise it returns 0 if pUtf points to a null character; -1, with errno set to EILSEQ, if the first cbUtf bytes at pUtf do not provide the information needed to perform the conversion; and the number of bytes in the UTF-8 sequence in all other cases.
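The following is a minimal sketch of the contract described above, written only to illustrate the return values; it is not the chsvlib implementation. The name u8toucp_sketch and the ucp_t alias (char32_t) are hypothetical, and so are three behaviours the text does not pin down: treating cbUtf == 0 and a stray continuation lead byte as "insufficient information", and storing 0 in *pUcp for a null character as mbtowc does. Consistent with the caveat that a successful call does not imply validity, the sketch assembles the bits without checking the continuation bytes.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical stand-in; the library's actual ucp_t typedef is not shown here.
using ucp_t = char32_t;

// Minimal sketch of the documented u8toucp contract (not chsvlib's code).
// It does not validate the decoded sequence.
int u8toucp_sketch(ucp_t* pUcp, const char* pUtf, std::size_t cbUtf) noexcept
{
    if (pUtf == nullptr)
        return 0;                         // other parameters are ignored
    if (cbUtf == 0) {                     // assumption: zero bytes carry no information
        errno = EILSEQ;
        return -1;
    }
    const unsigned char lead = static_cast<unsigned char>(pUtf[0]);
    if (lead == 0) {                      // null character
        if (pUcp != nullptr)
            *pUcp = 0;                    // assumption: stored as mbtowc would
        return 0;
    }
    int len;
    ucp_t cp;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else {                                // assumption: stray continuation or 0xF8..0xFF
        errno = EILSEQ;
        return -1;
    }
    if (static_cast<std::size_t>(len) > cbUtf) {
        errno = EILSEQ;                   // cbUtf bytes cannot hold the whole sequence
        return -1;
    }
    for (int i = 1; i < len; ++i)         // continuation bytes are not checked here
        cp = (cp << 6) | (static_cast<unsigned char>(pUtf[i]) & 0x3Fu);
    if (pUcp != nullptr)
        *pUcp = cp;
    return len;                           // bytes consumed; extra bytes are ignored
}

void example_usage()
{
    ucp_t cp = 0;
    assert(u8toucp_sketch(&cp, nullptr, 4) == 0);                    // NULL pUtf
    assert(u8toucp_sketch(&cp, "", 1) == 0 && cp == 0);              // null character
    assert(u8toucp_sketch(&cp, "\xC3\xA9!", 3) == 2 && cp == 0xE9);  // U+00E9, '!' ignored
    errno = 0;
    assert(u8toucp_sketch(&cp, "\xE2\x82\xAC", 2) == -1 && errno == EILSEQ); // truncated U+20AC
}
```

A caller walking a buffer would advance pUtf (and reduce cbUtf) by each positive return value and stop on 0 or -1.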
Warning
A successful call does not guarantee that the multibyte sequence in pUtf is valid. Validity can be verified by passing the code point stored through pUcp to u8check_cp. If pUcp is NULL, or if the caller needs to perform the check before calling u8toucp, the u8check function can be used instead. The first method is preferable for efficiency reasons.
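The criteria applied by u8check_cp are not stated here. As an assumption, the sketch below shows the usual Unicode scalar-value rule (at most U+10FFFF, excluding the surrogates U+D800..U+DFFF) under the hypothetical name is_scalar_value. It also adds a hypothetical length-aware is_shortest_form check, because an overlong form such as C0 AF decodes to an otherwise valid code point (U+002F) and cannot be recognised from the code point alone; whether u8check_cp covers that case is not stated.

```cpp
#include <cassert>

using ucp_t = char32_t;  // hypothetical stand-in for the library's ucp_t

// Hypothetical check (not chsvlib's u8check_cp): a Unicode scalar value is any
// code point up to U+10FFFF outside the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(ucp_t cp) noexcept
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical companion check: the decoded byte length must equal the
// shortest UTF-8 length for cp, which rejects overlong encodings.
constexpr bool is_shortest_form(ucp_t cp, int len) noexcept
{
    return len == (cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4);
}

void example_usage()
{
    assert(is_scalar_value(0xE9));        // U+00E9 is valid
    assert(!is_scalar_value(0xD800));     // lone surrogate
    assert(!is_scalar_value(0x110000));   // beyond the Unicode range
    assert(is_shortest_form(0xE9, 2));    // C3 A9
    assert(!is_shortest_form(0x2F, 2));   // overlong C0 AF for '/'
}
```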

The function is implemented portably and converts UTF-8 to a Unicode 11.0 code point independently of the current locale.

The interface is similar to mbtowc as defined by the C standard, except that the maximum possible byte length of a UTF-8 character is UTF8_MAX_LEN rather than MB_CUR_MAX or MB_LEN_MAX.
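Because the interface mirrors mbtowc, the usual mbtowc loop carries over: advance by each positive return value and stop at 0 (null character) or -1 (undecodable input). The sketch below shows that loop with std::mbtowc itself, whose result depends on the current LC_CTYPE locale; per the description above, u8toucp in the same loop would decode UTF-8 regardless of locale. count_mb_chars is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: counts multibyte characters in the first cb bytes of s
// using std::mbtowc, the function u8toucp's interface is modelled on.
// Returns -1 if an invalid or incomplete sequence is met.
long count_mb_chars(const char* s, std::size_t cb)
{
    std::mbtowc(nullptr, nullptr, 0);              // return to the initial conversion state
    long n = 0;
    while (cb > 0) {
        const int r = std::mbtowc(nullptr, s, cb); // decoded character not needed
        if (r == 0)
            break;                                 // reached the null character
        if (r < 0)
            return -1;                             // undecodable input
        s += r;
        cb -= static_cast<std::size_t>(r);
        ++n;
    }
    return n;
}

void example_usage()
{
    // At program start the "C" locale is active; ASCII decodes one byte at a time.
    assert(count_mb_chars("abc", 4) == 3);  // stops at the terminating null
    assert(count_mb_chars("ab", 1) == 1);   // only cb bytes are inspected
}
```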

See also
ucptou8;
u8len;
u8stoucps.