unicode for Debian
==================

This package is the Debian version of unicode, a C++ library for converting
between text encodings (UTF-8, UTF-16, UTF-32, ISO-8859-1, ISO-8859-15) and
for validating UTF data.


Command-line interface (package unicode-tools)
----------------------------------------------

* unicode-recode

  Usage: recode <from-format> <from-file> <to-format> <to-file>
  Format:
      UTF-8       UTF-8
      UTF-16      UTF-16, native endian
      UTF-16LE    UTF-16, little endian
      UTF-16BE    UTF-16, big endian
      UTF-32      UTF-32, native endian
      UTF-32LE    UTF-32, little endian
      UTF-32BE    UTF-32, big endian
      ISO-8859-1  ISO-8859-1 (Latin-1)
      ISO-8859-15 ISO-8859-15 (Latin-9)
  Exit code: 0 if valid, 1 otherwise.

* unicode-validate

  Usage: validate <format> <file>
  Format:
      UTF-8     UTF-8
      UTF-16    UTF-16, big or little endian
      UTF-16LE  UTF-16, little endian
      UTF-16BE  UTF-16, big endian
      UTF-32    UTF-32, big or little endian
      UTF-32LE  UTF-32, little endian
      UTF-32BE  UTF-32, big endian
  Exit code: 0 if valid, 1 otherwise.
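
The LE and BE variants listed above differ only in the order in which the
bytes of each code unit are stored. The following is a minimal sketch of that
difference for UTF-16, written against the standard library only. The names
ByteOrder and utf16_to_bytes_sketch are made up for this illustration and are
not part of unicode-tools or libunicode.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch only.
enum class ByteOrder { Little, Big };

// Serialize UTF-16 code units into bytes in the requested byte order.
std::vector<std::uint8_t> utf16_to_bytes_sketch(const std::u16string& s,
                                                ByteOrder order)
{
  std::vector<std::uint8_t> bytes;
  bytes.reserve(s.size() * 2);
  for (const char16_t unit : s) {
    const auto lo = static_cast<std::uint8_t>(unit & 0xFF);
    const auto hi = static_cast<std::uint8_t>(unit >> 8);
    if (order == ByteOrder::Little) {
      bytes.push_back(lo);  // least significant byte first
      bytes.push_back(hi);
    } else {
      bytes.push_back(hi);  // most significant byte first
      bytes.push_back(lo);
    }
  }
  return bytes;
}
```

For U+00E4 this gives the bytes E4 00 in little endian and 00 E4 in big
endian.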


C++ interface (package libunicode-dev)
--------------------------------------

Example (before C++20):

#include <unicode.h>
...

  std::string utf8_value {u8"äöü"};
  std::u16string utf16_value{unicode::convert<char, char16_t>(utf8_value)};

And for C++20:

  std::u8string utf8_value {u8"äöü"};
  std::u16string utf16_value{unicode::convert<char8_t, char16_t>(utf8_value)};
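
To make the result concrete: "äöü" takes six code units in UTF-8 (C3 A4,
C3 B6, C3 BC) and three in UTF-16 (00E4, 00F6, 00FC). The hand-written sketch
below shows the kind of transformation involved, using only the standard
library. It is an illustration, not the libunicode implementation, and
utf8_to_utf16_sketch is a made-up name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustration only: decode UTF-8 and re-encode as UTF-16.
// Throws on truncated input, bad lead or continuation bytes, overlong
// forms, surrogate code points and values above U+10FFFF.
std::u16string utf8_to_utf16_sketch(const std::string& in)
{
  // Smallest code point that may legally use a sequence of the given length.
  static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const auto b0 = static_cast<unsigned char>(in[i]);
    char32_t cp = 0;
    std::size_t len = 0;
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    if (i + len > in.size())
      throw std::invalid_argument("truncated UTF-8 sequence");

    for (std::size_t k = 1; k < len; ++k) {
      const auto b = static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80)
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (b & 0x3F);
    }

    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      throw std::invalid_argument("invalid code point");

    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points above U+FFFF become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```

The input can be written as "\xC3\xA4\xC3\xB6\xC3\xBC" so that the example
does not depend on the encoding of the source file.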

The following encodings are implicitly deduced from the types:
  * char or char8_t (C++20): UTF-8
  * char16_t: UTF-16
  * char32_t: UTF-32
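
One way to picture this deduction is a compile-time mapping from the width of
the code unit type to an encoding. The sketch below assumes exactly that and
nothing more. Encoding and deduced_encoding are hypothetical names for this
illustration, and libunicode may well implement the deduction differently.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not the libunicode API.
enum class Encoding { UTF_8, UTF_16, UTF_32 };

// Map the width of a code unit type to the UTF encoding it can carry.
template <typename T>
constexpr Encoding deduced_encoding()
{
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4,
                "code unit type must be 8, 16 or 32 bits wide");
  return sizeof(T) == 1 ? Encoding::UTF_8
       : sizeof(T) == 2 ? Encoding::UTF_16
                        : Encoding::UTF_32;
}

static_assert(deduced_encoding<char>() == Encoding::UTF_8, "8 bit");
static_assert(deduced_encoding<char16_t>() == Encoding::UTF_16, "16 bit");
static_assert(deduced_encoding<std::uint16_t>() == Encoding::UTF_16, "16 bit");
static_assert(deduced_encoding<char32_t>() == Encoding::UTF_32, "32 bit");
```

With a size-based mapping like this, wchar_t resolves to UTF-16 where it is
16 bits wide and to UTF-32 where it is 32 bits wide.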

You can specify different container types directly:
  
  std::deque<char> utf8_value {...};
  std::list<wchar_t> wide_value{unicode::convert<std::deque<char>, std::list<wchar_t>>(utf8_value)};

Explicit encoding specification is also possible:

  std::string value {"\xe4\xf6\xfc"}; // "äöü" as ISO-8859-1 bytes
  std::u32string utf32_value{unicode::convert<unicode::ISO_8859_1, unicode::UTF_32>(value)};
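
For ISO-8859-1 the mapping to Unicode is particularly simple: every byte
value is the code point of the same number. The sketch below illustrates this
with the standard library only. It is not the libunicode implementation, and
latin1_to_utf32_sketch is a made-up name. ISO-8859-15 is not covered, since it
replaces a few positions (for example 0xA4 is the euro sign there).

```cpp
#include <string>

// Illustration only: each ISO-8859-1 byte maps to the code point with the
// same numeric value, so conversion to UTF-32 is a widening copy.
std::u32string latin1_to_utf32_sketch(const std::string& in)
{
  std::u32string out;
  out.reserve(in.size());
  for (const char c : in) {
    // Go through unsigned char so that bytes >= 0x80 are not sign-extended.
    out.push_back(static_cast<char32_t>(static_cast<unsigned char>(c)));
  }
  return out;
}
```

The input bytes for "äöü" in ISO-8859-1 are E4 F6 FC, and the output holds
the code points U+00E4, U+00F6 and U+00FC.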

Supported encodings are:

  * unicode::UTF_8
  * unicode::UTF_16
  * unicode::UTF_32
  * unicode::ISO_8859_1
  * unicode::ISO_8859_15

Supported basic types:
  * char
  * char8_t (C++20)
  * wchar_t (UTF-16 on Windows, UTF-32 on Linux)
  * char16_t
  * char32_t
  * uint8_t, int8_t
  * uint16_t, int16_t
  * uint32_t, int32_t
  * in general, any 8-bit, 16-bit or 32-bit integer type, holding
    UTF-8, UTF-16 or UTF-32 code units, respectively.

Supported container types:
  * All std container types that can be iterated (vector, list, deque, array)
  * Source and target containers can be different container types

Validation can be done like this:

  bool valid{unicode::is_valid_utf<char16_t>(utf16_value)};

Or via explicit encoding specification:

  bool valid{unicode::is_valid_utf<unicode::UTF_8>(utf8_value)};
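
What exactly is_valid_utf checks is not spelled out here. As an orientation,
well-formed UTF-16 in the sense of the Unicode Standard requires that every
high surrogate (D800 to DBFF) is followed directly by a low surrogate (DC00
to DFFF) and that no low surrogate stands alone. The following sketch tests
just that, using the standard library only. The function name is made up and
this is not the libunicode implementation.

```cpp
#include <cstddef>
#include <string>

// Illustration only: check that surrogates in a UTF-16 string are paired.
bool is_well_formed_utf16_sketch(const std::u16string& s)
{
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char16_t unit = s[i];
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: a low surrogate must follow immediately.
      if (i + 1 >= s.size())
        return false;
      const char16_t next = s[i + 1];
      if (next < 0xDC00 || next > 0xDFFF)
        return false;
      ++i;  // skip the low surrogate that completes the pair
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return false;  // low surrogate without a preceding high surrogate
    }
  }
  return true;
}
```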


Contact
-------

Reichwein IT <mail@reichwein.it>