chsvlib
chsv helper source code

◆ u8stowcs_s()

errno_t Chusov::String::u8stowcs_s ( std::size_t *restrict  pcchConverted,
wchar_t *restrict  pUcs,
rsize_t  cchUcsMax,
const char *restrict  pszUtf,
rsize_t  cchUtf 
)
noexcept

Converts the specified sequence of UTF-8 characters into the corresponding sequence of UCS-2 (or UCS-4) wide characters. If the output buffer pointer is not NULL, the function stores at most the specified number of wide characters in that buffer and terminates the output string with a null wide character.

Parameters
[out] pcchConverted is a mandatory pointer to an output variable that receives the result of the conversion operation. If the pointer is NULL, there is a runtime-constraint violation. If the pointer is not NULL and either another runtime constraint is violated or an encoding error occurs, *pcchConverted is set to (size_t)-1. Otherwise, the function stores there the number of multibyte UTF-8 characters of the pszUtf string that were successfully converted to their wide UCS-2 (UCS-4) equivalents, not counting the terminating null character (if any).
[out] pUcs is an optional pointer to an output buffer that receives at most cchUcsMax wide characters (see remarks) of the UCS-2 string (or UCS-4, if an object of the wchar_t type is large enough to hold UCS-4 characters) to which the UTF-8 string is converted. If a runtime-constraint violation occurs, the pUcs pointer is not NULL, and cchUcsMax is greater than 0 and less than or equal to RSIZE_MAX, pUcs[0] is set to the null wide character.
[in] cchUcsMax is the maximum number of wide characters to be written to the pUcs buffer. If the pUcs pointer is NULL, the value of cchUcsMax must be zero. If pUcs is not NULL, cchUcsMax must be neither zero nor greater than RSIZE_MAX. If the first cchUtf characters of the UTF-8 string do not contain a null character, cchUcsMax must be greater than cchUtf. If any of these conditions is not met, there is a runtime-constraint violation (see below).
[in] pszUtf is a mandatory pointer to the UTF-8 string to be converted into its UCS-2 (or UCS-4) equivalent. No UTF-8 characters that follow a null character (which is converted into a null wide character and, if pUcs is not NULL, stored) are examined or converted. If the pUcs pointer to the output buffer is NULL, the UTF-8 string must be zero-terminated, because all of its multibyte characters are converted and the value of cchUtf is ignored. If pUcs is not NULL, the maximum number of multibyte characters to be converted is specified by cchUtf. If the UTF-8 string is zero-terminated within the bound of cchUtf characters, the null character is converted and written to the output buffer, and the remainder of the multibyte string is ignored. If cchUtf is not less than cchUcsMax, the input UTF-8 string must be zero-terminated within the bound given by the value of cchUcsMax. Otherwise, there is a runtime-constraint violation.
[in] cchUtf is the maximum number of characters to be written to the buffer pointed to by the pUcs parameter. No more than that number of wide characters of the buffer will be modified. If the pUcs pointer is NULL, the value of cchUtf is ignored by the function.
Returns
The function returns zero if no runtime-constraint violation and no encoding error occurred. Otherwise, a corresponding non-zero error code is returned.

The function converts the specified UTF-8 multibyte string to the corresponding UCS-2 wide string (or UCS-4, if a wchar_t object can hold UCS-4 values). The conversion from UTF-8 to the UCS format is performed independently of the current locale. The result of the conversion depends on the size of the wchar_t type. For instance, on Windows sizeof(wchar_t) equals 2, which is not enough to cover all possible Unicode code points, so a conversion to UCS-2 takes place. On the other hand, some Linux compilers define the size of the wchar_t type as 4, which results in a UCS-4 based conversion.
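
The dependence on the width of wchar_t can be checked at compile time. The following is a minimal sketch, not code from chsvlib: the helper names wchar_holds_ucs4 and fits_in_wchar are hypothetical and only illustrate when a code point can be stored in a single wchar_t.

```cpp
#include <cstdint>
#include <cwchar>   // WCHAR_MAX

// Hypothetical helper (not part of chsvlib): true when a wchar_t object
// can hold every Unicode code point, i.e. the UCS-4 case.
constexpr bool wchar_holds_ucs4() noexcept
{
    return static_cast<std::uint64_t>(WCHAR_MAX) >= 0x10FFFFu;
}

// Hypothetical helper (not part of chsvlib): true when the code point cp
// is a Unicode value that fits into a single wchar_t without loss of data.
constexpr bool fits_in_wchar(std::uint32_t cp) noexcept
{
    return cp <= 0x10FFFFu &&
           static_cast<std::uint64_t>(cp) <= static_cast<std::uint64_t>(WCHAR_MAX);
}
```

With a 2-byte wchar_t, fits_in_wchar(0x1F600) yields false, which corresponds to the case where the conversion cannot represent the character; with a 4-byte wchar_t it yields true.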

If converting a UTF-8 code to wchar_t would result in a loss of data, the function returns EILSEQ and sets *pcchConverted to (size_t)-1.
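
The loss-of-data check can be illustrated with a single-character decoder. This is a sketch under stated assumptions, not the library's implementation: decode_one is a hypothetical helper, and the rejection of overlong encodings and surrogate code points is an assumption about what counts as an invalid UTF-8 code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cwchar>   // WCHAR_MAX

// Hypothetical helper (not part of chsvlib). Decodes one UTF-8 sequence that
// starts at p, with at most n bytes available. On success returns 0, stores
// the wide character in *out and the number of consumed bytes in *used.
// Returns EILSEQ for malformed input or when the code point does not fit
// into wchar_t (the loss-of-data case).
int decode_one(const char* p, std::size_t n, wchar_t* out, std::size_t* used) noexcept
{
    assert(p != nullptr && out != nullptr && used != nullptr);
    if (n == 0) return EILSEQ;

    const unsigned char b0 = static_cast<unsigned char>(p[0]);
    std::uint32_t cp = 0;
    std::size_t len = 0;
    if (b0 < 0x80)                { cp = b0;         len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1Fu; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0Fu; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07u; len = 4; }
    else return EILSEQ;                    // stray continuation or invalid lead byte

    if (len > n) return EILSEQ;            // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(p[i]);
        if ((b & 0xC0) != 0x80) return EILSEQ;   // not a continuation byte
        cp = (cp << 6) | (b & 0x3Fu);
    }

    // Assumption: overlong forms, surrogates and values above U+10FFFF are invalid.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len] || cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return EILSEQ;

    // Loss of data: the code point cannot be represented by wchar_t.
    if (static_cast<std::uint64_t>(cp) > static_cast<std::uint64_t>(WCHAR_MAX))
        return EILSEQ;

    *out = static_cast<wchar_t>(cp);
    *used = len;
    return 0;
}
```

A caller that loops over decode_one and stops at the first EILSEQ would then report the failure through its own output count, in the way described above for *pcchConverted.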

It is a secure variant of the u8stowcs function. The relation of u8stowcs_s to u8stowcs is similar to that of the analogous mbstowcs_s function to its non-secure counterpart, mbstowcs. The mbstowcs_s function is defined by Annex K of the C11 standard and by ISO/IEC TR 24731-1, an extension to the C99 standard.

The function verifies adherence to the following runtime-constraints.

  1. Neither pcchConverted nor pszUtf can be a null pointer.
  2. If pUcs is not a null pointer, then neither cchUtf nor cchUcsMax can be greater than RSIZE_MAX.
  3. If pUcs is a null pointer, then cchUcsMax must equal zero.
  4. If pUcs is not a null pointer, then cchUcsMax must not equal zero.
  5. If pUcs is not a null pointer and cchUtf is not less than cchUcsMax, then a null character must occur within the first cchUcsMax multibyte characters of the array pointed to by pszUtf.
  6. If there is a runtime-constraint violation, then the function does the following. If pcchConverted is not a NULL pointer, then *pcchConverted is set to (size_t)(-1). If pUcs is not a NULL pointer and cchUcsMax is greater than zero and not greater than RSIZE_MAX, then pUcs[0] is set to the null wide character.
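
The checks listed above can be sketched as follows. This is an illustrative outline rather than the actual chsvlib code: check_constraints and kRsizeMax are hypothetical names, EINVAL is an assumed error code (the documentation only promises a non-zero value), and constraint 5 is simplified to scan bytes instead of multibyte characters, which makes it stricter than the documented rule.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Assumption: a stand-in for RSIZE_MAX, which is not available in every toolchain.
constexpr std::size_t kRsizeMax = SIZE_MAX >> 1;

// Hypothetical checker (not part of chsvlib) mirroring constraints 1-5 and the
// recovery action of item 6. Returns 0 when all constraints hold.
int check_constraints(std::size_t* pcchConverted, wchar_t* pUcs, std::size_t cchUcsMax,
                      const char* pszUtf, std::size_t cchUtf) noexcept
{
    bool ok = pcchConverted != nullptr && pszUtf != nullptr;        // constraint 1
    if (pUcs == nullptr) {
        ok = ok && cchUcsMax == 0;                                   // constraint 3
    } else {
        ok = ok && cchUtf <= kRsizeMax && cchUcsMax <= kRsizeMax;    // constraint 2
        ok = ok && cchUcsMax != 0;                                   // constraint 4
        if (ok && cchUtf >= cchUcsMax) {                             // constraint 5
            // Simplification: looks for the terminator among the first
            // cchUcsMax bytes rather than the first cchUcsMax characters.
            bool found = false;
            for (std::size_t i = 0; i < cchUcsMax && !found; ++i)
                found = (pszUtf[i] == '\0');
            ok = found;
        }
    }
    if (ok) return 0;

    // Item 6: report the violation through the output parameters.
    if (pcchConverted != nullptr)
        *pcchConverted = static_cast<std::size_t>(-1);
    if (pUcs != nullptr && cchUcsMax > 0 && cchUcsMax <= kRsizeMax)
        pUcs[0] = L'\0';
    return EINVAL;   // assumed error code
}
```

For example, passing a NULL pUcs together with a non-zero cchUcsMax fails constraint 3, so the sketch stores (size_t)(-1) in *pcchConverted and returns a non-zero code.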

It follows from these runtime constraints that, if pUcs is not NULL, *pcchConverted receives the actual number of wide characters written to pUcs, not counting the terminating null wide character.

Also, unlike its non-secure u8stowcs counterpart, the function verifies that all of the converted UTF-8 codes are valid and can be represented with the wchar_t type.

Warning
The implementation adheres to the requirements defined by the C11 standard and the ISO/IEC TR 24731-1 extension of C99 for the analogous mbstowcs_s function, but differs from the Microsoft mbstowcs_s, which does not conform to the standard interface definition.

The differences are the following:

  1. In the case of a successful conversion, the standard demands that the output value of pcchConverted, i.e. the number of successfully converted characters, not include the null terminator, whereas the Microsoft definition includes it.
  2. In case of an error the standard function does not necessarily set the errno code, while the Microsoft definition always does.
  3. The standard requires that neither cchUcsMax nor cchUtf be greater than RSIZE_MAX, whereas Microsoft does not define this requirement.
  4. The standard considers equality of pcchConverted to NULL a runtime-constraint violation whereas Microsoft does not assert it.
See also
u8stowcs;
wcstou8s_s;
u8towc.