chsvlib
chsv helper source code

ConvertUCSToUTF8()

std::size_t Chusov::String::ConvertUCSToUTF8(
    char *restrict pUtf,
    std::size_t cbUtf,
    const wchar_t *restrict pUcs,
    std::size_t *pcchUcs
) noexcept

Converts a wide UCS-2/UCS-4 string to its multibyte UTF-8 equivalent, independently of the current C locale, and returns the number of bytes occupied by the converted multibyte string.

Parameters
[out] pUtf is a pointer to the output buffer that receives the converted multibyte string in UTF-8 format. The size of the buffer, in bytes, is specified by cbUtf. The actual length of the converted string, in bytes, is returned by the function. If a null character is found in the input string, the function stops the conversion and returns. The null terminator is not converted.
[in] cbUtf specifies the size, in bytes, of the pUtf buffer. If pUtf is NULL, the parameter specifies the maximum number of output bytes the caller is interested in; the function then returns the number of bytes that would have been written had the caller passed a non-NULL buffer of cbUtf bytes.
[in] pUcs is a pointer to the wide string to be converted to its multibyte equivalent. The string must be represented in the UCS-2 format (or in the UCS-4 format if objects of the wchar_t type are able to store UCS-4 characters). Its length, in wide characters, is given by the input value of *pcchUcs or by a zero terminator. Thus, if pcchUcs is not NULL and the value it points to is not (size_t) -1, the input wide string need not be zero-terminated.
[in,out] pcchUcs on input specifies the length, in wide characters, of the string to be converted. On output it holds the number of wide characters successfully converted and, if pUtf is not NULL, written to the output buffer. The parameter can be NULL. If it is NULL, or the value it points to is (size_t) -1, the input string must be zero-terminated. If a null character appears among the first *pcchUcs characters of the string, the remainder of the string is ignored.
Returns
On success the function returns the number of bytes written (or that would have been written if pUtf were NULL) within the cbUtf bytes of the output buffer, not counting the zero terminator, which is not converted. On failure the function returns (size_t) -1 and sets the corresponding chsvlib error code.
Remarks
The length of the source wide string, in characters, is given by the input value of *pcchUcs, or by the null character if pcchUcs is NULL, if *pcchUcs is (size_t) -1, or if a zero terminator appears among the first *pcchUcs characters of the source string.
If the pUtf buffer is not large enough to store all of the converted complete symbols, the function stops the conversion and returns the number of bytes successfully written to the pUtf buffer. No incomplete symbols are written to the output.
If the pcchUcs pointer is NULL or if *pcchUcs is (size_t) -1, the input string must be zero-terminated.
If pUtf is NULL, the function returns the number of bytes needed to store *pcchUcs characters of the converted string (or the whole string if pcchUcs is NULL) without the null terminator. Whether or not pUtf is NULL, the output value of *pcchUcs (if pcchUcs is specified) contains the number of characters converted such that the resulting string is no longer than cbUtf bytes.
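The truncation and counting rules above can be illustrated with a small self-contained sketch. It is not the chsvlib implementation: sketch_ucs_to_utf8 and utf8_length are hypothetical names, the sketch does not set any chsvlib error code, and treating surrogates and values above U+10FFFF as errors is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of chsvlib: UTF-8 length of one code point,
// or 0 if the value is not encodable (surrogate or above U+10FFFF).
inline std::size_t utf8_length(std::uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 3;
    return cp <= 0x10FFFF ? 4 : 0;
}

// Hypothetical stand-in that mirrors the documented parameter order.
inline std::size_t sketch_ucs_to_utf8(char* pUtf, std::size_t cbUtf,
                                      const wchar_t* pUcs, std::size_t* pcchUcs)
{
    assert(pUcs != nullptr);
    const std::size_t npos = static_cast<std::size_t>(-1);
    const std::size_t limit = pcchUcs ? *pcchUcs : npos;
    std::size_t written = 0;   // bytes produced (or counted when pUtf is null)
    std::size_t consumed = 0;  // wide characters fully converted

    while (consumed < limit && pUcs[consumed] != L'\0') {
        const std::uint32_t cp = static_cast<std::uint32_t>(pUcs[consumed]);
        const std::size_t n = utf8_length(cp);
        if (n == 0) return npos;         // assumption: unencodable input is an error
        if (n > cbUtf - written) break;  // the whole symbol does not fit: stop here
        if (pUtf) {
            char* p = pUtf + written;
            if (n == 1) {
                p[0] = static_cast<char>(cp);
            } else {
                // Lead byte: n leading 1 bits, then the top payload bits.
                const unsigned lead[] = {0, 0, 0xC0, 0xE0, 0xF0};
                p[0] = static_cast<char>(lead[n] | (cp >> (6 * (n - 1))));
                // Continuation bytes: 10xxxxxx, six payload bits each.
                for (std::size_t i = 1; i < n; ++i)
                    p[i] = static_cast<char>(0x80 | ((cp >> (6 * (n - 1 - i))) & 0x3F));
            }
        }
        written += n;
        ++consumed;
    }
    if (pcchUcs) *pcchUcs = consumed;  // terminator is neither counted nor written
    return written;
}
```

With a 4-byte buffer and the input L"a\u00E9\u20AC" (1 + 2 + 3 bytes in UTF-8), the sketch writes the first two characters, returns 3, and reports 2 converted characters, because the 3-byte euro sign no longer fits. With a null buffer and cbUtf of (size_t) -1 it returns 6.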
Note
Pass NULL for pUtf and set cbUtf to (size_t) -1 to get the size of the output buffer required to hold the entire converted string.
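A common way to use this sizing mode is a two-call pattern: query the required size with a null buffer, allocate, then convert. The sketch below is an illustration only. The helper to_utf8 is hypothetical and is written as a template over any callable with the documented parameter order, so that it compiles without the chsvlib headers; ascii_stand_in is a deliberately limited ASCII-only placeholder for Chusov::String::ConvertUCSToUTF8, not a model of the real function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Placeholder with the documented parameter order (ASCII input only), so the
// sketch builds without chsvlib. In real code pass the library function instead.
inline std::size_t ascii_stand_in(char* pUtf, std::size_t cbUtf,
                                  const wchar_t* pUcs, std::size_t* pcchUcs)
{
    const std::size_t npos = static_cast<std::size_t>(-1);
    const std::size_t limit = pcchUcs ? *pcchUcs : npos;
    std::size_t i = 0;
    while (i < limit && i < cbUtf && pUcs[i] != L'\0') {
        if (static_cast<unsigned long>(pUcs[i]) > 0x7F) return npos;  // not ASCII
        if (pUtf) pUtf[i] = static_cast<char>(pUcs[i]);
        ++i;
    }
    if (pcchUcs) *pcchUcs = i;
    return i;
}

// Hypothetical helper: converts a zero-terminated wide string in two calls.
template <class Convert>
std::string to_utf8(Convert convert, const wchar_t* text)
{
    const std::size_t npos = static_cast<std::size_t>(-1);

    // Call 1: null buffer, cbUtf = (size_t)-1 -> required size in bytes.
    std::size_t cch = npos;  // (size_t)-1 means "input is zero-terminated"
    const std::size_t need = convert(nullptr, npos, text, &cch);
    if (need == npos) throw std::runtime_error("size query failed");

    // Call 2: convert into a buffer of exactly that size.
    std::string out(need, '\0');
    cch = npos;
    const std::size_t got = convert(&out[0], out.size(), text, &cch);
    if (got == npos) throw std::runtime_error("conversion failed");

    out.resize(got);  // std::string keeps its own terminator after the data
    return out;
}
```

Returning a std::string sidesteps the missing terminator: the function writes only the converted bytes, and the string object maintains the trailing null itself. For example, to_utf8(ascii_stand_in, L"abc") yields "abc".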
Remarks
This function and ConvertUTF8ToUCS complement the ConvertWideToMBS and ConvertMBSToWide functions, respectively. These additions work with the UTF-8 and UCS-2/UCS-4 encodings directly, independently of the C locale settings, whereas ConvertWideToMBS and ConvertMBSToWide convert between the encodings specified by the current environment settings. In particular, the libraries shipped with MS C/C++ compilers do not correctly handle encodings whose code points have a variable byte length, such as UTF-8. ConvertWideToMBS and ConvertMBSToWide, which rely on standard library calls, therefore have the same limitations. ConvertUCSToUTF8 and ConvertUTF8ToUCS implement their own conversion mechanisms, which makes it possible to work with UTF-8 even when the code is compiled with the MS compilers.
Warning
The function neither converts nor generates the terminating zero character in the output string even if a zero wide character is in the input.
See also
ConvertUTF8ToUCS;
ConvertWideToMBS;
ConvertMBSToWide;
wctou8;
wcstou8s;
wctou8_s;
wcstou8s_s.