UTF-8 is a variable-length encoding that represents each Unicode code point using 1 to 4 bytes. It is widely used for text storage and transmission because it is compact and backward compatible with ASCII. A wide character (wchar_t) is a type that holds a single code unit of a wide character encoding (usually UTF-16 or UTF-32). The size of wchar_t varies across platforms (e.g., 2 bytes on Windows, 4 bytes on most Unix-like systems).
In this article, we'll explore how to convert UTF-8 strings to wide character (wchar_t) strings using the C++ standard library and platform-specific APIs.
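To make the variable-length scheme concrete, here is a minimal sketch that reads the length of a UTF-8 sequence from its first byte and counts the code points in a string. The helper names utf8_sequence_length and utf8_code_point_count are hypothetical, chosen only for illustration, and the sketch assumes well-formed input (it does not reject overlong or out-of-range sequences).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many bytes the UTF-8 sequence starting
// with `lead` occupies, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation or invalid byte
}

// Hypothetical helper: counts code points by skipping continuation bytes
// (10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t utf8_code_point_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the string "Hello, 世界" used throughout this article, the byte count (std::string::size()) and the code point count differ, because each of the two CJK characters takes 3 bytes in UTF-8. This is why the wide string produced by a conversion is usually shorter than the UTF-8 input.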
Methods to Convert UTF-8 characters to Wide Char in C++
There are several ways to convert a UTF-8 string to a wide character (wchar_t) string in C++. Two of the methods below use only the C++ standard library, and the other two rely on platform-specific APIs:
1. Convert UTF-8 characters to Wide Char using std::wstring_convert
std::wstring_convert is a class template introduced in C++11 and declared in the <locale> header. The conversion facets it is typically used with, such as codecvt_utf8, are declared in <codecvt>. It performs conversions between byte strings and wide strings. Note that both std::wstring_convert and the <codecvt> facets have been deprecated since C++17.
Syntax to Create std::wstring_convert
wstring_convert<facet> converter;
where facet is the codecvt facet that performs the conversion between the byte string and the wide string. For UTF-8 to wchar_t conversion, it is codecvt_utf8<wchar_t>.
Afterwards, we can call the converter's from_bytes() member function to convert the given string, as shown in the example below.
Example
C++
// C++ program to convert utf8 to wchar_t using wstring_convert
#include <iostream>
#include <string>
#include <codecvt>
#include <locale>
using namespace std;
int main() {
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // Create a wstring_convert object
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    // Convert UTF-8 string to wide string
    wstring wide_str = converter.from_bytes(utf8_str);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
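Output
Converted wide string: Hello, ??
The non-ASCII characters typically appear as ?? here because wcout is still using the default "C" locale, not because the conversion failed; wide_str itself holds the converted characters.
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n) for the converted wide string.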
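When the input is not valid UTF-8, from_bytes() throws std::range_error (unless an error string was passed to the converter's constructor). The sketch below wraps the conversion in a function that reports failure through its return value instead. The function name utf8_to_wide is hypothetical, and the same C++17 deprecation caveat applies, so some compilers may emit a warning.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to a wide string and returns false
// on malformed input instead of letting std::range_error propagate.
bool utf8_to_wide(const std::string& utf8, std::wstring& out) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    try {
        out = converter.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        // from_bytes could not decode the input as UTF-8
        out.clear();
        return false;
    }
}
```

A caller can then test the result, for example `if (!utf8_to_wide(input, wide)) { /* handle bad input */ }`, without writing a try block at every call site.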
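2. Convert UTF-8 characters to Wide Char Using std::mbstowcs
The std::mbstowcs function converts a multibyte string to a wide character string. It is declared in the <cstdlib> header file.
Syntax
mbstowcs(dest, src, len);
where,
- dest: pointer to the destination wide character array.
- src: pointer to the null-terminated source multibyte string.
- len: maximum number of wide characters to write to dest.
The function returns the number of wide characters written (excluding the terminating null), or (size_t)-1 if it encounters an invalid multibyte sequence.
mbstowcs interprets the source according to the current C locale, so before calling it we need to select a locale that uses UTF-8. The following statement selects the locale configured in the environment, which works only if that locale uses UTF-8 encoding:
setlocale(LC_ALL, "");
Example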
C++
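// C++ program to convert utf8 to wchar_t using mbstowcs
#include <clocale>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // Use the environment's locale, which must use UTF-8 encoding
    setlocale(LC_ALL, "");

    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // The result never has more wide characters than the input has bytes
    wstring wide_str(utf8_str.size(), L'\0');
    size_t converted = mbstowcs(&wide_str[0], utf8_str.c_str(),
                                wide_str.size());
    if (converted == static_cast<size_t>(-1)) {
        cerr << "Invalid multibyte sequence for the current locale" << endl;
        return 1;
    }

    // Drop the unused trailing null characters
    wide_str.resize(converted);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}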
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n) for the converted wide string.
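Instead of sizing the destination from the input's byte count, we can ask the library for the exact length first. The sketch below uses std::mbsrtowcs, the restartable sibling of mbstowcs declared in <cwchar>, because the standard defines its behaviour when the destination is a null pointer: it only counts the wide characters. The function name multibyte_to_wide is hypothetical, and the conversion still depends on the current C locale.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a multibyte string (in the current C
// locale's encoding) to a wide string sized exactly to the result.
// Returns false if the input is not valid in the current locale.
bool multibyte_to_wide(const std::string& input, std::wstring& out) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = input.c_str();

    // Pass 1: a null destination only counts the wide characters needed.
    std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;
    }

    // Pass 2: convert into a buffer with room for the terminating L'\0'.
    std::wstring buffer(needed + 1, L'\0');
    state = std::mbstate_t();
    src = input.c_str();
    std::size_t written =
        std::mbsrtowcs(&buffer[0], &src, buffer.size(), &state);
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }

    // Keep only the converted characters, without the terminator.
    buffer.resize(written);
    out = buffer;
    return true;
}
```

As with the example above, setlocale(LC_ALL, "") (or another call that selects a UTF-8 locale) has to run before this function is used on non-ASCII input. Conversion stops at the first null byte in the input.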
3. Convert UTF-8 characters to Wide Char Using MultiByteToWideChar on Windows
MultiByteToWideChar() is a Windows API function that converts a string from a multibyte character set to a wide character (UTF-16) string. It is part of the Windows SDK and is declared through the windows.h header file.
Example
C++
// C++ program to convert utf8 to wchar_t using
// MultiByteToWideChar
#include <iostream>
#include <string>
#include <windows.h>
using namespace std;

int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // Determine the length of the wide string. Passing the explicit byte
    // count (instead of -1) keeps the terminating null out of the result.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8_str.data(),
                                  static_cast<int>(utf8_str.size()),
                                  nullptr, 0);
    if (len == 0) {
        cerr << "Error in MultiByteToWideChar: " << GetLastError() << endl;
        return 1;
    }

    // Convert the string
    wstring wide_str(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8_str.data(),
                        static_cast<int>(utf8_str.size()),
                        &wide_str[0], len);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n) for the converted wide string.
4. Convert UTF-8 characters to Wide Char Using iconv on Unix-like Systems
iconv is a POSIX-specified interface for converting between character encodings. It is available on Unix-like systems through the iconv.h header file.
Example
C++
// C++ program to convert utf8 to wchar_t using iconv
#include <cstdio>
#include <iconv.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // Open iconv descriptor
    iconv_t conv = iconv_open("WCHAR_T", "UTF-8");
    if (conv == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    // Set up conversion buffers. The output needs at most one wchar_t
    // per input byte, plus one slot for the terminating null.
    size_t in_bytes = utf8_str.size();
    char* in_buf = const_cast<char*>(utf8_str.c_str());
    vector<wchar_t> wide_buf(in_bytes + 1);
    char* out_buf = reinterpret_cast<char*>(wide_buf.data());
    size_t out_bytes = wide_buf.size() * sizeof(wchar_t);

    // Perform conversion
    if (iconv(conv, &in_buf, &in_bytes, &out_buf, &out_bytes)
        == (size_t)-1) {
        perror("iconv");
        iconv_close(conv);
        return 1;
    }

    // Create wide string (the buffer is zero-initialized, so it is
    // already null-terminated)
    wstring wide_str(wide_buf.data());

    // Close iconv descriptor
    iconv_close(conv);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n) for the converted wide string.
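For repeated use, the iconv steps can be wrapped in a function. The following is a minimal sketch under two assumptions: the iconv implementation accepts the "WCHAR_T" encoding name (glibc and GNU libiconv do, other implementations may not), and the function name utf8_to_wide_iconv is hypothetical. It sizes the result from the number of bytes iconv actually wrote instead of relying on a null terminator.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-8 to wchar_t with iconv.
// Returns false if the descriptor cannot be opened or the input is invalid.
bool utf8_to_wide_iconv(const std::string& utf8, std::wstring& out) {
    // Assumption: the implementation knows the "WCHAR_T" encoding name.
    iconv_t conv = iconv_open("WCHAR_T", "UTF-8");
    if (conv == (iconv_t)-1) {
        return false;
    }

    // Each UTF-8 byte yields at most one wchar_t, so the input size
    // (plus one spare slot) is a safe upper bound for the output.
    std::vector<wchar_t> buffer(utf8.size() + 1);
    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(buffer.data());
    const std::size_t total_bytes = buffer.size() * sizeof(wchar_t);
    std::size_t out_left = total_bytes;

    std::size_t rc = iconv(conv, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(conv);
    if (rc == (std::size_t)-1) {
        return false;
    }

    // iconv decrements out_left by the number of bytes it wrote.
    std::size_t produced = (total_bytes - out_left) / sizeof(wchar_t);
    out.assign(buffer.data(), produced);
    return true;
}
```

Closing the descriptor immediately after the single iconv call keeps the cleanup in one place, so neither the success path nor the error path can leak it. On systems where iconv lives in a separate library, the program may need to be linked with -liconv.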