LibreOffice goes BCP 47

This week I reached an important milestone in the major rewrite that I have been working on for about nine months, alongside the daily work of fixing bugs, coding small enhancements and reviewing patches. In current master, LibreOffice is finally able to transparently handle arbitrary (if valid) BCP 47 language tags and fully supports the fo:script and *:rfc-language-tag attributes defined in ODF 1.2.

So what does this mean? It means that you'll be able to get your language in.

It means that languages and writing scripts that were already supported, but only through a kludge that squeezed them into ISO 639 language codes and ISO 3166 country codes, are finally supported using the proper language tags registered with IANA. For example:

ca-ES-valencia Catalan Valencian
The Valencian variant of Catalan previously used the ca-XV kludge, where XV is an ISO 3166 code reserved for private use, which meant it could be used for UI translation purposes but not for document content. This is now stored in ODF as a style:rfc-language-tag='ca-ES-valencia' attribute.
sr-Latn Serbian Latin
Previously the deprecated sh code was used as a kludge to distinguish Serbian Latin from Serbian Cyrillic (sr). Serbian Latin in Serbia, sr-Latn-RS, is now stored in ODF as the attributes fo:language='sr' fo:script='Latn' fo:country='RS'.
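
To make the storage rule in the two examples above concrete, here is a minimal Python sketch. It is not the actual LibreOffice code: the function name odf_language_attributes and the simplified pattern are assumptions made for illustration. A tag that consists only of a language, an optional script and an optional two-letter country goes into the fo:* attributes; anything else is kept whole in style:rfc-language-tag. The sketch does not validate subtags against the IANA registry and ignores any fallback attributes a real writer might add.

```python
import re

# Assumption for illustration: a "simple" tag is language[-Script][-COUNTRY]
# and nothing else. Real BCP 47 parsing handles many more cases.
_SIMPLE_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<country>[A-Za-z]{2}))?$"
)


def odf_language_attributes(tag: str) -> dict[str, str]:
    """Return hypothetical ODF attributes for a BCP 47 language tag."""
    match = _SIMPLE_TAG.match(tag)
    if match is None:
        # Variants, grandfathered tags etc. cannot be expressed with
        # fo:language / fo:script / fo:country, so keep the whole tag.
        return {"style:rfc-language-tag": tag}

    attrs = {"fo:language": match["language"].lower()}
    if match["script"]:
        attrs["fo:script"] = match["script"].title()   # e.g. "Latn"
    if match["country"]:
        attrs["fo:country"] = match["country"].upper()  # e.g. "RS"
    return attrs
```

With this sketch, 'sr-Latn-RS' yields the three fo:* attributes, while 'ca-ES-valencia' falls through to style:rfc-language-tag because of its variant subtag.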

It also means that a tag such as en-GB-oed can be supported, and in fact already is: the corresponding entry has been added to the language list. This is English with Oxford English Dictionary spelling, which is mandatory for UN documents and, as it seems, is also used for EU documents. LibreOffice will be the first free office suite to support spell-checkers with Oxford English Dictionary spelling along with en-GB and en-US spelling at the same time.

Transparently handling arbitrary tags means that when a document contains a language attribution not specifically known to LibreOffice (i.e. one without an entry in the language list), positioning the cursor on such text or selecting it shows the language tag in the status bar and in the language list of the character attributes, so you will not see Unknown or, even worse, nothing or the system locale's language. If a dictionary that handles such a tag is installed, it can be used for spell-checking. Transparently of course also means that the tag is written back to ODF when the document is saved, so the attribution is not lost.
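
The display behaviour described above can be sketched in a few lines of Python. Again this is only an illustration under stated assumptions, not the LibreOffice implementation: the table _KNOWN_LANGUAGES and the function display_name are hypothetical names, and the table holds just three sample entries. A tag with an entry gets its friendly name; any other tag is shown unchanged instead of being replaced by Unknown.

```python
# Hypothetical excerpt of a language list, keyed by lower-cased tag
# because BCP 47 tags compare case-insensitively.
_KNOWN_LANGUAGES = {
    "ca-es-valencia": "Catalan Valencian",
    "en-us": "English (USA)",
    "en-gb-oed": "English, OED spelling",
}


def display_name(tag: str) -> str:
    """Return the list entry for a tag, or the raw tag if there is none."""
    return _KNOWN_LANGUAGES.get(tag.lower(), tag)
```

For a tag without an entry, such as 'de-DE-1901' in this sample table, display_name simply returns the tag itself, which is what ends up in the status bar in the scenario described above.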

The following screenshot shows an example of a document that uses the tag de-DE-1901 to designate German, German variant, traditional orthography:

Screenshot of LibreOffice displaying a BCP 47 language tag.

I'm extremely glad to have this step ready just in time, and of course I'll talk about it at the LibreOffice Conference 2013 in Milano. To get all the details, please join me and attend Getting you language in on Thursday, 26 September at 15:30 in Sala Alfa.

If you are interested in the technical details of BCP 47 language tags, I recommend my bookmarks as a starting point.

Comments

Olivier R. on :

Hi,

Interesting.

Will there be a translation table for these new tags? As is, they look unfriendly, and it won’t be easy to understand what some of them mean.

erAck on :

The raw tags are only displayed if LibreOffice does not have an entry in its mapping table and language list. For the vast majority of language tags in use (and for all that were already used previously) nothing changes, e.g. 'ca-ES-valencia' is displayed as "Catalan Valencian", 'en-US' of course is still "English (USA)" and for 'en-GB-oed' there is an entry "English, OED spelling". Note that 'de-DE-1901' is only an example of a tag that has not been added yet; I'll probably add it because German users might use it quite frequently.

Due to the nature of BCP 47, almost arbitrary combinations of subtags are allowed, plus the grandfathered IANA-registered tags, so it is nearly impossible to have a complete translatable list. However, descriptions could be assembled from the individual subtags listed in https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry. For example, the registry's whole-tag entry for 'de-DE-1901' (of type redundant) is described as "German, German variant, traditional orthography"; if the description were instead assembled from the individual subtags' descriptions, it would be something like "German in Germany, traditional orthography". I have already thought about this and may implement it in the future, but I don't have any details yet.

Anonymous on :

Let me know if you need any help with the details of BCP 47 to help you add support for arbitrary tags. I'd like to be able, just to cite one example, to tag content with the not-so-arbitrary "en-emodeng" for Early Modern English, without having to wait for someone to add it to a list. That's the whole point of BCP 47. I understand I won't get spell checkers etc. without some work in the background, but I should be able to tag content as such.

Tegomo germain on :

Good evening,

I'm working on how to create and integrate a new language in LibreOffice, so that an end user could write a document in that new language.

Note that creating a new dictionary is part of my work, but I also want to create the locale for the new language, so that it appears in the language list of LibreOffice.

How should I proceed?

erAck on :

Please see https://wiki.documentfoundation.org/LibreOffice_Localization_Guide/Adding_a_New_Language_or_Locale
