chsvlib
chsv helper source code

u8blen() [2/3]

constexpr std::size_t Chusov::String::u8blen(InputIterator itSymbol, std::size_t cbSymbol = UTF8_MAX_LEN)
constexpr noexcept

Returns a byte length of a UTF-8 character addressed by an iterator.

Template Parameters
do_full_checks specifies whether the function should perform all validity checks of the input UTF-8 code as specified by section 3.9 of the Unicode 11.0 standard. If the flag is false, the function performs only the basic checks necessary for the operation, i.e. whether cbSymbol is greater than or equal to 1, so that the first byte of the UTF-8 code, which carries the length-encoding prefix, can be read. The default value is true.
throw_on_failure is a boolean flag which specifies whether the function should throw an exception when the multi-byte value addressed by itSymbol is an invalid or incomplete UTF-8 character. If the flag is false, the function is marked noexcept and returns std::size_t(-1) on failure. The default value is true.
InputIterator is a deduced parameter which is the type of itSymbol.
Parameters
itSymbol is an iterator referencing the multi-byte character whose size is to be obtained. The iterator must meet the InputIterator requirements and its elements must be of the char type.
cbSymbol is the maximum number of bytes, i.e. elements addressed by itSymbol, to inspect. The default value is UTF8_MAX_LEN.
Returns
On success the function returns the number of bytes occupied by the UTF-8 character addressed by itSymbol. On failure, if throw_on_failure is false, the function returns std::size_t(-1).

The function is constexpr when compiled by a C++14 compiler.
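The basic check described for do_full_checks set to false can be pictured with a minimal sketch. The function below is a hypothetical stand-in, not the chsvlib implementation: the name lead_byte_length is invented for illustration, and it assumes that the length is derived solely from the prefix of the first byte, with std::size_t(-1) as the failure value.

```cpp
#include <cstddef>

// Hypothetical stand-in (not part of chsvlib): derives the byte length of a
// UTF-8 sequence from the length-encoding prefix of its first byte only.
// No continuation bytes are inspected, so success does not imply validity.
template <class InputIterator>
constexpr std::size_t lead_byte_length(InputIterator it, std::size_t cb) noexcept
{
    if (cb < 1)
        return std::size_t(-1);                 // nothing to read
    const unsigned char b = static_cast<unsigned char>(*it);
    if (b < 0x80)           return 1;           // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return 2;           // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;           // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;           // 11110xxx
    return std::size_t(-1);                     // continuation byte or invalid prefix
}

// Evaluated at compile time, which requires C++14 or later.
static_assert(lead_byte_length("A", 1) == 1, "ASCII");
static_assert(lead_byte_length("\xC3\xA9", 2) == 2, "U+00E9");
static_assert(lead_byte_length("\xE2\x82\xAC", 3) == 3, "U+20AC");
static_assert(lead_byte_length("\xF0\x9F\x98\x80", 4) == 4, "U+1F600");
static_assert(lead_byte_length("\x80", 1) == std::size_t(-1), "stray continuation byte");
static_assert(lead_byte_length("A", 0) == std::size_t(-1), "empty input");
```

Because only the first element is dereferenced, a sketch of this kind works with single-pass input iterators as well as with plain pointers.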

Exceptions
Chusov::Exceptions::InvalidCharSequenceException is thrown when the multi-byte character given by the parameters is not a valid or complete UTF-8 code. The exception is only thrown when throw_on_failure is true.
Note
A successful call guarantees the validity of the UTF-8 character only when do_full_checks is true. For performance reasons it may be preferable to avoid repeated full per-byte validation of UTF-8 characters and to validate only once. To do that, call the function with do_full_checks set to false and validate the value later with the u8_check function, or simply call the read_u8_char_data function, which performs all the necessary checks and yields both the Unicode code point and the byte length of its UTF-8 representation at once.
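To make the cost of full validation concrete, the sketch below shows what a fully checked length query might involve. It is a hypothetical illustration, not chsvlib code: the names checked_length, in_range and npos are invented, and the byte ranges are those of the well-formed UTF-8 byte sequence table in section 3.9 of the Unicode standard.

```cpp
#include <cstddef>

constexpr std::size_t npos = std::size_t(-1);   // hypothetical failure value

constexpr bool in_range(unsigned char b, unsigned char lo, unsigned char hi) noexcept
{
    return lo <= b && b <= hi;
}

// Hypothetical stand-in (not part of chsvlib): returns the length of the
// sequence at p only if every byte is well-formed, otherwise npos.
constexpr std::size_t checked_length(const char* p, std::size_t cb) noexcept
{
    if (cb < 1) return npos;
    const unsigned char b0 = static_cast<unsigned char>(p[0]);
    if (b0 < 0x80) return 1;                    // single-byte (ASCII) code

    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;         // allowed range of the second byte
    if (in_range(b0, 0xC2, 0xDF)) {
        len = 2;
    } else if (in_range(b0, 0xE0, 0xEF)) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;              // rejects overlong 3-byte forms
        if (b0 == 0xED) hi = 0x9F;              // rejects surrogates U+D800..U+DFFF
    } else if (in_range(b0, 0xF0, 0xF4)) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;              // rejects overlong 4-byte forms
        if (b0 == 0xF4) hi = 0x8F;              // rejects values above U+10FFFF
    } else {
        return npos;                            // 80..C1 and F5..FF never start a sequence
    }

    if (cb < len) return npos;                  // incomplete sequence
    if (!in_range(static_cast<unsigned char>(p[1]), lo, hi)) return npos;
    for (std::size_t i = 2; i < len; ++i)
        if (!in_range(static_cast<unsigned char>(p[i]), 0x80, 0xBF)) return npos;
    return len;
}

static_assert(checked_length("\xC3\xA9", 2) == 2, "U+00E9");
static_assert(checked_length("\xF0\x9F\x98\x80", 4) == 4, "U+1F600");
static_assert(checked_length("\xC0\x80", 2) == npos, "overlong encoding");
static_assert(checked_length("\xED\xA0\x80", 3) == npos, "surrogate");
static_assert(checked_length("\xE2\x82", 2) == npos, "truncated sequence");
static_assert(checked_length("\xF4\x90\x80\x80", 4) == npos, "above U+10FFFF");
```

Every byte of the sequence is read and range-checked here, whereas a basic check reads only the first byte. That difference is why validating once, rather than on every length query, can pay off when scanning long strings.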