Nick Character Sets
In UnrealIRCd you can specify which "character sets" or languages should be allowed in nicknames. You do this in set::allowed-nickchars.
Available UTF8 character sets
UnrealIRCd 4.0.17 and later have experimental support for UTF8 character encoding.
Note that many Services packages do not permit registration of nicknames containing such characters. See also the Important notes section below.
The following languages are available:
Name | Description | Script/Alphabet | Allowed extra characters (other than the default) |
---|---|---|---|
hebrew-utf8 | hebrew characters | Hebrew script | אבגדהוזחטיךכלםמןנסעףפץצקרשת |
latin-utf8 | latin characters | Latin script | ÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÖØÙÚÛÜÝÞßàáâäåæçèéêëìíîïðñòóôöøùúûüýþÿĂăĄąĆćČčĎďĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽž |
french-utf8 | french characters | Latin script | ÀÂÇÈÉÊËÎÏÔÙÛÜàâçèéêëîïôùûüÿ |
slovak-utf8 | slovak characters | Latin script | ÁÄÉÍáäéíóôúýČčĎďĹĺĽľňŔŕŠšŤťŽž |
icelandic-utf8 | icelandic characters | Latin script | ÁÆÍÐÓÖÚÝÞáæíðóöúýþ |
danish-utf8 | danish characters | Latin script | ÅÆØåæø |
swedish-utf8 | swedish characters | Latin script | ÄÅÖäåö |
catalan-utf8 | catalan characters | Latin script | ÀÇÈÉÍÏÒÓÚÜàçèéíïòóú |
italian-utf8 | italian characters | Latin script | ÀÈÉÌÍÒÓÙÚàèéìíòóùú |
spanish-utf8 | spanish characters | Latin script | ÁÉÍÑÓÚÜáéíñóúü |
hungarian-utf8 | hungarian characters | Latin script | ÁÉÍÓÖÚÜáéíóöúüŐőŰű |
czech-utf8 | czech characters | Latin script | ÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž |
romanian-utf8 | romanian characters | Latin script | ÂÎâîĂăŞşŢţ |
swiss-german-utf8 | swiss-german characters | Latin script | ÄÖÜäöü |
german-utf8 | german characters | Latin script | ÄÖÜßäöü |
turkish-utf8 | turkish characters | Latin script | ÇÖÜçöüĞğıŞş |
dutch-utf8 | dutch characters | Latin script | èéëïöü |
polish-utf8 | polish characters | Latin script | ÓóĄąĆćĘęŁłŃńŚśŹźŻż |
latvian-utf8 | latvian characters (5.0.7+) | Latin script | |
estonian-utf8 | estonian characters (5.0.7+) | Latin script | |
lithuanian-utf8 | lithuanian characters (5.0.7+) | Latin script | |
greek-utf8 | greek characters | Greek script | ΆΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστ |
ukrainian-utf8 | ukrainian characters | Cyrillic script | ЄІЇАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЮЯабвгдежзийклмнопрстуфхцчшщьюяєіїҐґ |
russian-utf8 | russian characters | Cyrillic script | ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяё |
cyrillic-utf8 | cyrillic characters | Cyrillic script | ЁЄІЇЎАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёєіїўҐґ |
belarussian-utf8 | belarussian characters | Cyrillic script | ЁІЎАБВГДЕЖЗЙКЛМНОПРСТУФХЦЧШЫЬЭЮЯабвгдежзйклмнопрстуфхцчшыьэюяёіў |
Available non-utf8 character sets
Table of all available "old" character sets (not using UTF8):
Name | Description | Character set / encoding |
---|---|---|
catalan | Catalan characters | iso8859-1 (latin1) |
danish | Danish characters | iso8859-1 (latin1) |
dutch | Dutch characters | iso8859-1 (latin1) |
french | French characters | iso8859-1 (latin1) |
german | German characters | iso8859-1 (latin1) |
swiss-german | Swiss-German characters (no es-zett) | iso8859-1 (latin1) |
icelandic | Icelandic characters | iso8859-1 (latin1) |
italian | Italian characters | iso8859-1 (latin1) |
spanish | Spanish characters | iso8859-1 (latin1) |
swedish | Swedish characters | iso8859-1 (latin1) |
latin1 | catalan, danish, dutch, french, german, swiss-german, spanish, icelandic, italian, swedish | iso8859-1 (latin1) |
hungarian | Hungarian characters | iso8859-2 (latin2), windows-1250 |
polish-iso | Polish characters (note that polish-w1250 is more common!) | iso8859-2 (latin2) |
romanian | Romanian characters | iso8859-2 (latin2), windows-1250, iso8859-16 |
latin2 | hungarian, polish-iso, romanian | iso8859-2 (latin2) |
polish-w1250 | Polish characters, windows variant | windows-1250 |
slovak-w1250 | Slovak characters, windows variant | windows-1250 |
czech-w1250 | Czech characters, windows variant | windows-1250 |
windows-1250 | polish-w1250, slovak-w1250, czech-w1250, hungarian, romanian | windows-1250 |
greek | Greek characters | iso8859-7 |
turkish | Turkish characters | iso8859-9 |
russian-w1251 | Russian characters | windows-1251 |
belarussian-w1251 | Belarussian characters | windows-1251 |
ukrainian-w1251 | Ukrainian characters | windows-1251 |
windows-1251 | russian-w1251, belarussian-w1251, ukrainian-w1251 | windows-1251 |
hebrew | Hebrew characters | iso8859-8-I/windows-1255 |
chinese-simp | Simplified Chinese | Multibyte: GBK/GB2312 |
chinese-trad | Traditional Chinese | Multibyte: GBK |
chinese-ja | Japanese Hiragana/Pinyin | Multibyte: GBK |
chinese | chinese-* | Multibyte: GBK |
gbk | chinese-* | Multibyte: GBK |
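The single-byte sets in the table above only make sense when every client uses the same encoding, because the same byte stands for a different character in each one. The following is a minimal Python sketch of that effect, not part of UnrealIRCd itself. The codec names are Python's own names for three of the encodings listed in the table, and `decode_everywhere` is a hypothetical helper introduced only for this illustration.

```python
# Python codec names for three encodings from the table above.
ENCODINGS = {
    "iso8859-1 (latin1)": "latin-1",
    "iso8859-2 (latin2)": "iso8859_2",
    "windows-1251": "cp1251",
}


def decode_everywhere(raw: bytes) -> dict:
    """Decode the same raw bytes once per encoding and return the results."""
    return {label: raw.decode(codec) for label, codec in ENCODINGS.items()}


# One byte as it might appear in a nickname on the wire.
result = decode_everywhere(b"\xf5")
for label, text in result.items():
    print(f"{label}: {text!r}")
```

A client that assumes latin1 shows this byte as õ, a latin2 client shows ő, and a windows-1251 client shows the Cyrillic letter х, which is the kind of display problem that mixing charsets can produce.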
Important notes
A few notes:
- The following basic nick characters are always allowed/included: a-z A-Z 0-9 [ \ ] ^ _ - { | }
- Nicknames cannot begin with a number, a / (slash) or a - (hyphen)
- Some combinations cannot be handled properly by the IRCd and cause an error. For example, UnrealIRCd will refuse a configuration that combines latin* with chinese-*. Mixing other charsets might cause display problems, and UnrealIRCd will print a warning if you try to mix latin1/latin2/greek/other incompatible groups.
- Most Services do not permit registration of UTF8 nicks
- Casemapping (deciding which lowercase character corresponds to which uppercase character) is done according to US-ASCII. This means that characters like ö and Ö are not recognized as 'the same', so one user can have the nick álpha while another has Álpha at the same time. This is a limitation of the current system and IRCd standards, and people should be aware of it. Work is underway at the IRCv3 working group to solve this. Note that the same limitation already existed for channel names, in which nearly any character has always been allowed and casemapping has also always been performed in US-ASCII.
- There is also no checking for "similar looking" or "identical looking" characters. In particular, if you enable Cyrillic script (e.g. russian-utf8), then characters such as Cyrillic A and Latin A will look the same. This could be abused to impersonate another user by substituting the identical-looking character.
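The last two notes can be illustrated with a short Python sketch. It is not UnrealIRCd code: `ascii_lower` and `same_nick` are hypothetical helpers that only mimic a comparison in which A-Z fold to a-z and every other character is compared as-is, which is an assumption about what "casemapping according to US-ASCII" amounts to.

```python
import unicodedata


def ascii_lower(nick: str) -> str:
    """Lowercase only A-Z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in nick
    )


def same_nick(a: str, b: str) -> bool:
    """Compare two nicks the way a US-ASCII-only casemapping would."""
    return ascii_lower(a) == ascii_lower(b)


# ASCII letters fold together, accented letters do not.
print(same_nick("Alpha", "alpha"))   # True
print(same_nick("Álpha", "álpha"))   # False

# Latin A and Cyrillic A look alike but are different code points.
latin_a = "\u0041"
cyrillic_a = "\u0410"
print(unicodedata.name(latin_a))     # LATIN CAPITAL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC CAPITAL LETTER A
print(same_nick("Anna", cyrillic_a + "nna"))  # False
```

Under this comparison both Álpha and álpha, and both spellings of Anna, count as distinct nicknames, so each pair could be in use by two different people at once.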
Examples
Western languages
For people in Europe and other countries that use Latin-script languages:
set { allowed-nickchars { latin-utf8; }; };
Or, to use the old latin1 characters in Western Europe:
set { allowed-nickchars { latin1; }; };
Chinese language
This allows nicknames to contain both Simplified Chinese and Traditional Chinese characters (GBK encoding):
set { allowed-nickchars { chinese-simp; chinese-trad; }; };