chsvlib
chsv helper source code

◆ ConvertUTF8ToUCS()

std::size_t Chusov::String::ConvertUTF8ToUCS (
    wchar_t *restrict  pStringW,
    std::size_t  cchStringW,
    const char *restrict  pStringMB,
    std::size_t  cbStringMB
) noexcept

Converts a multibyte UTF-8 string into its wide UCS-2 (or UCS-4) equivalent and returns the number of successfully converted characters, independently of the current C locale.

Parameters
[out] pStringW is an optional buffer able to hold cchStringW wide characters of the converted string (UCS-2, or UCS-4 if objects of the wchar_t type can store UCS-4 characters). If the pointer is NULL, cchStringW specifies the maximum number of wide characters required by the caller. In either case the function returns the number of wide characters that would be produced if a valid pStringW buffer of cchStringW characters were passed to the function with the same values of the other parameters. The resulting string is not zero-terminated.
[in] cchStringW specifies the capacity of the output buffer, in wide characters, or, if pStringW is NULL, the maximum number of wide characters that would be written to the output as a result of the conversion.
[in] pStringMB specifies the input UTF-8 multibyte string. The length of the string is given by cbStringMB, in bytes, or, if cbStringMB is (size_t) -1, by the terminating zero character. If a terminating zero appears among the first cbStringMB bytes of the input string, it is ignored together with the characters that follow it. Thus, if pStringMB is zero-terminated, cbStringMB can be (size_t) -1; if cbStringMB specifies the actual length of the string to convert, the string need not be zero-terminated.
[in] cbStringMB is the length, in bytes, of the multibyte string. If cbStringMB is (size_t) -1, the string must be zero-terminated. If a null terminator is found among the first cbStringMB bytes of the multibyte string, neither the terminator nor the rest of the string is taken into account.
Returns
If pStringW is NULL, the function returns the number of characters that would be produced if a valid pStringW buffer of cchStringW characters were passed to the function with the same other parameters. If pStringW is not NULL, the function returns the actual length of the converted string, in wide characters. The returned value does not include a terminating zero, which is not converted. On failure the function returns (size_t) -1 and sets the corresponding chsvlib error code. If the conversion fails because of an invalid multibyte sequence, the chsvlib error code is set to CHSVERROR_INVALID_CHAR_SEQUENCE.
Remarks
The length of the source multibyte string is given by cbStringMB, in bytes, or by the null character if cbStringMB is (size_t) -1 or if a zero terminator appears among the first cbStringMB bytes of the source string.
If pStringW is NULL, the function returns the number of characters that would be produced if a valid pStringW buffer of cchStringW characters were passed to the function with the same other parameters.
If the pStringW buffer is not large enough to store all of the converted characters, the function stops the conversion and returns the number of characters successfully written to the pStringW buffer.
If a null character is found in the input string, the function stops the conversion and returns. The null terminator is not converted.
If the last part of the source string constitutes an incomplete but potentially valid multibyte character, it is ignored by the function.
This function and the ConvertUCSToUTF8 function are implemented as additions to the ConvertMBSToWide and ConvertWideToMBS functions respectively. These additions work with the UTF-8 and UCS-2/UCS-4 encodings directly, independently of the C locale settings, while ConvertWideToMBS and ConvertMBSToWide perform conversions between the encodings specified by the current environment settings. In particular, the libraries shipped with the MS C/C++ compilers do not correctly handle encodings whose code points have a variable byte length, such as UTF-8. Therefore ConvertWideToMBS and ConvertMBSToWide, which rely on standard library calls, have the same limitations. ConvertUCSToUTF8 and ConvertUTF8ToUCS, by contrast, implement their own conversion mechanisms, which allow working with UTF-8 even when their code is compiled with the MS compilers.
Warning
The function neither converts nor generates the terminating zero character in the output string even if a zero character is in the input.
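Example
The sketch below illustrates the calling pattern implied by the contract above: one call with a NULL buffer to count the wide characters, a second call to convert. It is not the chsvlib implementation. sketch_utf8_to_wide is a hypothetical stand-in written only so that the example runs without the library, and utf8_to_wstring is likewise an illustrative helper. The stand-in assumes that wchar_t can hold every code point it is given, does not reject overlong encodings, and reports failure only through its return value (it sets no chsvlib error code). With the library available, the same two calls would presumably be made to ConvertUTF8ToUCS instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in that mimics the documented contract of ConvertUTF8ToUCS.
// NOT the library code: no overlong-form checks, no error code, and it assumes
// wchar_t is wide enough for each decoded code point.
std::size_t sketch_utf8_to_wide(wchar_t* out, std::size_t cchOut,
                                const char* in, std::size_t cbIn) noexcept
{
    const std::size_t failure = static_cast<std::size_t>(-1);
    std::size_t produced = 0;
    std::size_t i = 0;
    while (produced < cchOut && (cbIn == failure || i < cbIn)) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead == 0) break;                      // terminator stops conversion, is not converted
        std::size_t len = 0;
        unsigned long cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return failure;                       // invalid lead byte
        bool incomplete = false;
        for (std::size_t k = 1; k < len; ++k) {
            if (cbIn != failure && i + k >= cbIn) { incomplete = true; break; }
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if (c == 0) { incomplete = true; break; }   // sequence cut off by the terminator
            if ((c & 0xC0) != 0x80) return failure;     // invalid continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (incomplete) break;                     // incomplete trailing character is ignored
        if (out != nullptr) out[produced] = static_cast<wchar_t>(cp);
        ++produced;
        i += len;
    }
    return produced;
}

// Illustrative helper: count first, then convert into a std::wstring.
std::wstring utf8_to_wstring(const char* utf8)
{
    assert(utf8 != nullptr);
    const std::size_t failure = static_cast<std::size_t>(-1);
    const std::size_t cb = std::strlen(utf8);
    // Each wide character consumes at least one input byte, so cb is a safe cap.
    const std::size_t needed = sketch_utf8_to_wide(nullptr, cb, utf8, cb);
    if (needed == failure) throw std::runtime_error("invalid UTF-8 sequence");
    std::wstring result(needed, L'\0');
    if (needed != 0) {
        const std::size_t written = sketch_utf8_to_wide(&result[0], needed, utf8, cb);
        if (written == failure) throw std::runtime_error("invalid UTF-8 sequence");
        result.resize(written);
    }
    return result;   // std::wstring supplies the terminator the conversion omits
}
```

For instance, utf8_to_wstring("h\xC3\xA9llo") yields a wide string of five characters. A caller that uses a raw wchar_t array instead of std::wstring would need to reserve one extra element and store the terminating L'\0' itself, since the conversion does not write one.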
See also
ConvertUCSToUTF8;
ConvertMBSToWide;
ConvertWideToMBS;
u8towc;
u8stowcs;
u8stowcs_s.