[IronPython] IronPython codec names not compatible with CPython

John Machin sjmachin at lexicon.net
Sun Oct 8 19:23:30 PDT 2006

On 8/10/2006 12:54 PM, John Machin wrote:
> CPython recognises both 'gbk' and 'cp936' i.e. unicode('some string', 
> 'gbk') does what you'd expect.
> IronPython 1.0.1 recognises only 'cp936'.
> CPython recognises 'mac_roman', 'mac_greek', etc.
> IronPython doesn't.
> After a [rare] flash of inspiration, I tried 'cp10000', 'cp10006', etc 
> and IronPython recognises these, which CPython doesn't.
> The "differences" document says: """
> IronPython's _codecs module implementation is incomplete.  There are 
> several replace_error/lookup_error handlers that IronPython does not 
> implement.
> """
> It is not apparent whether this is intended to mean that missing error 
> handlers is the *only* known deficiency.
> IronPython Bug #3214 mentions "import encodings" as fixing a 
> LookupError. Well, you learn something new every day:
> 1. CPython permits one to import encodings, but it's not documented 
> AFAICT, and it's *not* necessary in order to use 'gbk', 'mac_roman', etc.
> 2. After import encodings, IronPython recognises 'mac_roman' and 
> 'mac_greek', but still not 'gbk'.
> How much of the above is bug and how much is feature? What is this 
> mysterious encodings module anyway? Does this mean the CPython test 
> suite doesn't cover the above cases? Are the equivalences (mac_roman, 
> cp10000) etc correct and official? Should I just dump all of the above 
> into the IronPython Issue Tracker?

An update: I had appended
to my IronPython site.py.

Removing that: IronPython doesn't have an encodings module ... so why 
does Bug #3214 say to import it?
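
For anyone wanting to repeat the name checks on the CPython side, here is a minimal sketch. It assumes a current CPython 3 rather than the 2.x interpreters discussed in this thread, and the helper name canonical_name is made up for illustration. codecs.lookup and the encodings.aliases.aliases dictionary are real CPython features; the alias table is where CPython records that 'cp936' means 'gbk'. Results under IronPython may well differ.

```python
import codecs
import encodings.aliases


def canonical_name(name):
    """Return the codec's canonical name, or None if the name is not recognised."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Probe the names mentioned in this thread; unrecognised ones show as None.
for name in ("gbk", "cp936", "mac_roman", "mac_greek", "cp10000", "cp10006"):
    print("%-10s -> %r" % (name, canonical_name(name)))

# The alias table maps normalised alias names to codec module names.
print(encodings.aliases.aliases.get("cp936"))
```

Running the same loop under both interpreters gives a quick side-by-side list of which names each one accepts.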

Leaving it in:
unicode('\xf0', 'mac_roman') produces the wrong exception:

     exceptions.SystemError: Object reference not set to an instance of an object.

unicode('\xf0', 'mac_roman', 'replace') produces the same exception.
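
For comparison, the sketch below shows the behaviour one would normally expect from a decoder when it meets an unmapped byte: a UnicodeDecodeError in strict mode, and a U+FFFD substitution with 'replace'. It assumes a current CPython 3 and uses 'ascii' as a stand-in codec, because whether 0xF0 is mapped in mac_roman depends on the version of the mapping table. The function name try_decode is hypothetical.

```python
def try_decode(data, encoding, errors="strict"):
    """Decode bytes and report either the text or the exception class name."""
    try:
        return ("ok", data.decode(encoding, errors))
    except UnicodeDecodeError as exc:
        return ("error", type(exc).__name__)


# Strict mode: the unmapped byte should raise UnicodeDecodeError.
print(try_decode(b"\xf0", "ascii"))
# 'replace' mode: the unmapped byte should become U+FFFD, with no exception.
print(try_decode(b"\xf0", "ascii", "replace"))
```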

And for the curious, the two encodings are not exactly identical:

0xdb: mac_roman u'\xa4', cp10000 u'\u20ac'
0xf0: mac_roman u'\ufffd', cp10000 u'\uf8ff'
(the U+FFFD (REPLACEMENT CHARACTER) is what I stuffed into a DIY kludgy 
workaround; U+F8FF is a private-use code point, so it has no standard 
character assigned)
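
A byte-by-byte comparison like the one above can be sketched as follows. This assumes a current CPython 3 and single-byte codecs only; diff_codecs is a made-up helper name. Since a given interpreter may not know both 'mac_roman' and 'cp10000', the demonstration uses 'latin-1' and 'cp1252' as stand-ins; substitute whichever pair your interpreter recognises.

```python
def diff_codecs(name_a, name_b):
    """List (byte, char_a, char_b) for each byte the two codecs decode differently."""
    differences = []
    for value in range(256):
        raw = bytes([value])
        # 'replace' turns unmapped bytes into U+FFFD instead of raising.
        char_a = raw.decode(name_a, "replace")
        char_b = raw.decode(name_b, "replace")
        if char_a != char_b:
            differences.append((value, char_a, char_b))
    return differences


# Print the differences in the same layout as the table above.
for value, char_a, char_b in diff_codecs("latin-1", "cp1252"):
    print("0x%02x: latin-1 %a, cp1252 %a" % (value, char_a, char_b))
```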

I was going to show the names of the characters, using 
unicodedata.name(), but there's no unicodedata module in IronPython (and 
that's not mentioned in the differences file).
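
Where unicodedata is available (a current CPython 3 is assumed here), the names can be shown with a sketch like this. The helper name describe and the "<no name>" placeholder are made up for illustration; the second argument to unicodedata.name() is a real feature that supplies a default instead of raising ValueError for code points without a name.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME', with a placeholder if it has no name."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<no name>"))


# The four characters from the comparison above.
for ch in ("\xa4", "\u20ac", "\ufffd", "\uf8ff"):
    print(describe(ch))
```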

