
LibreOffice 4.3 new language tag feature: adding a BCP 47 language tag

In LibreOffice 4.3 the user will be able to specify an arbitrary valid BCP 47 language tag for text attribution.

In character attribution dialogs the language list box (of the Western text font if CJK or CTL are enabled) is now a combo box with an edit field where the user can specify a valid BCP 47 language tag to define a text language attribute if the language she wants to assign is not available from the selectable list. The input is checked against a copy of the IANA language-subtag-registry (transformed to XML), provided either with the liblangtag package of your OS distribution, or the liblangtag shipped with LibreOffice on systems that do not provide one. The internal registry file distributed with LibreOffice 4.3 consists of data as of 2014-04-10.

The language tag combo box with sga-Ogam entered, a tag for Irish, Old (to 900) written in Ogham script.
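To give a feel for what checking such input involves, here is a minimal Python sketch. It is not how LibreOffice does it: LibreOffice validates through liblangtag against the IANA registry, whereas this sketch only tests whether a tag has the most common well-formed shape (language, optional script, optional region, optional variants). The function name is_well_formed_simple is hypothetical, and extended language subtags, extensions, private use and grandfathered tags such as en-GB-oed are deliberately left out.

```python
import re

# Simplified subset of the RFC 5646 "langtag" grammar (an assumption of this
# sketch): language[-script][-region][-variant...]. No extlang, extensions,
# private-use subtags or grandfathered tags.
_SIMPLE_LANGTAG = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})
    (?:-(?P<script>[A-Za-z]{4}))?
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    """,
    re.VERBOSE,
)


def is_well_formed_simple(tag: str) -> bool:
    """Return True if the tag matches the simplified shape above.

    This checks syntax only. It does not check that the subtags are
    actually registered with IANA, which is what a real validity check does.
    """
    return _SIMPLE_LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("sga-Ogam", "de-DE-1901", "sga-Ogham", "english"):
        print(candidate, is_well_formed_simple(candidate))
```

Note that sga-Ogham passes this check, because Ogham has the shape of a variant subtag even though no such variant is registered. That is exactly the kind of input a syntax check cannot reject and a registry lookup can.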

 

For language tag details please see the For users section on the langtag.net web site.

And now jump to the release notes for more nifty features of LibreOffice 4.3 :-)

 

FOSDEM 2014 How to squeeze a language tag into a Locale

The slides of my FOSDEM 2014 talk today, How to squeeze a language tag into a Locale, are now available as PDF. In the same folder you'll also find the slides of my talk at the LibreOffice Conference in Milano in September 2013, if you're interested in more details of the LibreOffice LanguageTag implementation.

LibreOffice goes BCP 47

This week I reached an important milestone in the major rewrite that I have been working on for about nine months, alongside the daily work of fixing bugs, coding small enhancements and reviewing patches. In current master, LibreOffice is finally able to transparently handle arbitrary (if valid) BCP 47 language tags and fully supports the fo:script and *:rfc-language-tag attributes defined in ODF 1.2.

So what does this mean? It means that you'll be able to get your language in.

It means that already supported languages and writing scripts that so far relied on a kludge to squeeze them into ISO 639 language codes and ISO 3166 country codes alone are finally supported using the proper language tags registered with IANA. For example:

ca-ES-valencia Catalan Valencian
The Valencian variant of Catalan previously used the ca-XV kludge, where XV is an ISO 3166 code reserved for private use. This meant it could be used for UI translation purposes but not for document content. It is now stored in ODF as a style:rfc-language-tag='ca-ES-valencia' attribute.
sr-Latn Serbian Latin
Previously the deprecated sh code was used as a kludge to differentiate Serbian Latin from Serbian Cyrillic (sr). Serbian Latin in Serbia, sr-Latn-RS, is now stored in ODF as the attributes fo:language='sr' fo:script='Latn' fo:country='RS'.
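
The two examples suggest a simple rule of thumb: a tag made up only of language, script and country fits into the fo:* attributes, and anything beyond that needs the full tag in style:rfc-language-tag. The following Python sketch illustrates that rule under simplifying assumptions. It is not the LibreOffice implementation, the function name odf_language_attributes is hypothetical, the tag is assumed to be valid already, and the sketch does not model any additional fallback attributes a real producer might write.

```python
def odf_language_attributes(tag: str) -> dict[str, str]:
    """Map a (presumed valid) BCP 47 tag to ODF language attributes.

    Simplified rule assumed by this sketch: language, a 4-letter script and
    a 2-letter country go into fo:language, fo:script and fo:country.
    If any subtag is left over (for example a variant), the whole tag is
    stored in style:rfc-language-tag instead.
    """
    subtags = tag.split("-")
    attrs = {"fo:language": subtags[0].lower()}
    rest = subtags[1:]

    # Optional script subtag: four ASCII letters, written in title case.
    if rest and len(rest[0]) == 4 and rest[0].isascii() and rest[0].isalpha():
        attrs["fo:script"] = rest.pop(0).title()

    # Optional country subtag: two ASCII letters, written in upper case.
    if rest and len(rest[0]) == 2 and rest[0].isascii() and rest[0].isalpha():
        attrs["fo:country"] = rest.pop(0).upper()

    # Anything left over cannot be expressed with the fo:* attributes.
    if rest:
        return {"style:rfc-language-tag": tag}
    return attrs


if __name__ == "__main__":
    print(odf_language_attributes("sr-Latn-RS"))
    print(odf_language_attributes("ca-ES-valencia"))
```

Run on the two tags above, the sketch yields the three fo:* attributes for sr-Latn-RS and a single style:rfc-language-tag for ca-ES-valencia, matching the attributes described in the examples.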

It also means that the tag en-GB-oed can be supported, and in fact already is: the corresponding entry has been added to the language list. This is English with Oxford English Dictionary spelling, which is mandatory for UN documents and, as it seems, is also used for EU documents. LibreOffice will be the first free office suite to support spell-checkers with Oxford English Dictionary spelling along with en-GB and en-US spelling at the same time.

Transparently handling arbitrary tags means that a document can be read even if it contains language attribution not specifically known to LibreOffice (i.e. the tag does not have an entry in the language list). When you position the cursor on such text or select it, the language tag is shown in the status bar and in the language list of the character attribution dialog, so you will not see Unknown or, even worse, nothing or the system locale's language. If a dictionary that handles such a tag is installed, it can be used for spell-checking. Transparently of course also means that the tag is written back to ODF when the document is saved, so the attribution is not lost.

The following screenshot shows an example of a document that uses the tag de-DE-1901 to designate German, German variant, traditional orthography:

Screenshot of LibreOffice displaying a BCP 47 language tag.

 

I'm extremely glad to have this step ready just in time, and of course I'll talk about it at the LibreOffice Conference 2013 in Milano. To get all the details, please join me and attend Getting your language in on Thursday, 26 September at 15:30 in Sala Alfa.


If you are interested in the technical details of BCP 47 language tags I recommend my bookmarks as a starting point.