
## Locales

### An innocent question

At work I posed an innocent question about displaying large integers (like 12345678) with commas as the thousands separator (like 12,345,678). As I found out, this was quite a loaded question.

My question had to do with implementing it in Python. I couldn't find a built-in method, so I wrote up this awful one-liner:

```
>>> x = 2493085724309857243980
>>> print ",".join(  [str(x/(10**i))[-3:] for i in range(3*10,1,-3) if x/(10**i)>0] + [str(x)[-3:]]  )
2,493,085,724,309,857,243,980
```

It works for numbers with up to ten commas, which is big, but not portable. Nor does it work with floats (in fact, it breaks in a fantastic display of numbers).

I also assumed commas were appropriate. They're not always.
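For comparison, here is a minimal sketch of the same idea without the ten-comma limit, written for Python 3 rather than the Python 2 of the session above. The function name `group_digits` and its `sep` parameter are made up for illustration; it handles integers only and leaves the choice of separator to the caller.

```python
def group_digits(n, sep=","):
    """Return the integer n as a string with sep between groups of three digits."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off three digits at a time from the right-hand end.
    while len(digits) > 3:
        groups.append(digits[-3:])
        digits = digits[:-3]
    groups.append(digits)
    return sign + sep.join(reversed(groups))


print(group_digits(2493085724309857243980))   # 2,493,085,724,309,857,243,980
print(group_digits(1234567, sep="."))          # 1.234.567
```

Newer Pythons (2.7 and 3.1 onward) also accept a `,` in the format specification, as in `format(x, ",")`, but that always inserts a comma regardless of locale.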

### printf

My question came back with this response:

```
Doesn't python have a printf like function that is handled at the
bytecode level? The best (simplest and efficient) way to do it is the
interpreter/compiler level.
```

So I did some research, and found out how to do it. C's printf has a way of separating the thousands with a comma when the `'` (apostrophe) modifier, a POSIX extension, is applied to i, d, f, etc. Except Python's printf parser doesn't understand it!
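The failure is easy to reproduce. The snippet below is a small sketch for a current Python 3 interpreter, where the `%` operator raises `ValueError` on a conversion flag it does not recognise. The helper name `try_apostrophe_flag` is only for illustration.

```python
def try_apostrophe_flag(n):
    """Attempt C-style "%'d" formatting; return the result or the error text."""
    try:
        return "%'d" % n
    except ValueError as err:
        # Python's % operator has no ' flag, so control ends up here.
        return "ValueError: %s" % err


print(try_apostrophe_flag(1234567))
```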

### The real answer

After some research I found that since each country separates their long numbers differently (Europeans might write 1.234.567,89 while Americans write 1,234,567.89 -- that's why the central question of this text is "loaded"), UNIX provides "locales" to tune the standard output of various things. From locale(7):

```
A  locale is a set of language and cultural rules.  These cover aspects
such as language for messages, different  character  sets,  lexigraphic
conventions,  etc.   A program needs to be able to determine its locale
and act accordingly to be portable to different cultures.
```

Back to Python:

```
>>> import locale
>>> locale.format("%d", 3245452, 1)
'3245452'
```

Oops. It seems that the default locale has no thousands separator defined:

```
>>> locale.localeconv()["thousands_sep"]
''
```

... so you have to switch to a locale that does:

```
>>> locale.setlocale(locale.LC_NUMERIC, 'en_US.ISO8859-1')
'en_US.ISO8859-1'
>>> locale.localeconv()["thousands_sep"]
','
```

Now things work:

```
>>> locale.format("%d", 3245452, 1)
'3,245,452'
>>> locale.format("%d", 324545278968968698, 1)
'324,545,278,968,968,698'
```
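In Python 3 the same steps look slightly different: `locale.format` was deprecated and, in recent releases, removed, and `locale.format_string` with `grouping=True` takes its place. The sketch below assumes nothing about which locales are installed. The helper name `format_grouped` and the candidate locale names are illustrative, and it falls back to the `C` locale (no separator) if none of the candidates can be set.

```python
import locale


def format_grouped(n, candidates=("en_US.UTF-8", "en_US.ISO8859-1", "en_US")):
    """Format integer n with the thousands separator of the first usable locale."""
    saved = locale.setlocale(locale.LC_NUMERIC)       # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
                break
            except locale.Error:
                continue                              # not installed; try the next
        else:
            locale.setlocale(locale.LC_NUMERIC, "C")  # nothing usable: no separator
        return locale.format_string("%d", n, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)    # restore the caller's locale


print(format_grouped(3245452))
```

Restoring the previous setting matters because `setlocale` changes the locale for the whole process, not just for the calling function.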

Doing things in C is semantically identical:

```
#include <stdio.h>
#include <locale.h>

int main(void)
{
    int i = 1234567;

    /* Print i with default locale, C */
    printf("%'15d (%s)\n", i, setlocale(LC_NUMERIC, NULL));

    /* Switch locale for numerics, and print i */
    setlocale(LC_NUMERIC, "en_US.iso88591");
    printf("%'15d (%s)\n", i, setlocale(LC_NUMERIC, NULL));

    return 0;
}
```

It outputs:

```
        1234567 (C)
      1,234,567 (en_US.iso88591)
```

### More

On the topic of internationalization, Joel Spolsky has an essay worth reading: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

This is `https://michal.guerquin.com/locales.html`, updated 2004-12-02 01:36 EST

Contact: michalg at domain where domain is gmail.com (more)