# [IronPython] Weird issue with codecs.BOM_UTF8

Michael Foord fuzzyman at voidspace.org.uk
Tue Nov 10 13:31:46 PST 2009

Leonides Saguisag wrote:
> Thank you for taking the time to reply.  Any idea why this would happen in IronPython but not with the standard Python interpreter?  What is weirding me out is that the exact same script behaves differently depending on whether I use IronPython or the standard Python interpreter.
>
Well, if codecs.BOM_UTF8 is set to the empty string (you didn't say if
you have tried this yet?) then it would be due to a bug in IronPython
somewhere - but at least you would know what was causing it.
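
The reason an empty BOM would give exactly this output is that every string starts with the empty string, so the startswith() test can never fail. A minimal sketch for checking this under both interpreters follows; the helper name describe_bom is made up for illustration and is not part of codecs:

```python
import codecs


def describe_bom(bom):
    """Return a short diagnosis of a BOM constant (hypothetical helper)."""
    if len(bom) == 0:
        return "empty: every startswith() test against it will succeed"
    return "non-empty, %d element(s): %r" % (len(bom), bom)


# Any string starts with the empty string, so an empty BOM "matches" every line.
print("no BOM here".startswith(""))

# Shows what this interpreter actually holds in codecs.BOM_UTF8.
print(describe_bom(codecs.BOM_UTF8))
```

If the second line reports "empty" under IronPython, that would explain why both files are reported as having a BOM.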

If it is the empty string then, purely speculating, it could be due to
the way the .NET framework treats the BOM at the start of strings. That
might not be the problem at all, though; it could be caused by something
entirely different.

In .NET it would be more normal to check for the BOM with bytes, as by
the time you have a string you have (usually) decoded already.
IronPython 2.X is a bit odd for the .NET framework in this respect,
since its str type is a .NET (Unicode) string that also has to stand in
for byte data.
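
As a rough sketch of the byte-level check, assuming a Python with bytes literals (2.6 or later; on 2.5 you would have to drop the b prefix), something like the following compares the first three bytes of the file against the UTF-8 BOM. The names UTF8_BOM and has_utf8_bom are my own for illustration. The BOM bytes are spelled out rather than taken from codecs.BOM_UTF8, so the check does not depend on that constant being right:

```python
# The three bytes of the UTF-8 byte order mark, written out explicitly
# (illustrative constant, not taken from the codecs module).
UTF8_BOM = b'\xef\xbb\xbf'


def has_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM bytes."""
    f = open(path, 'rb')  # binary mode: read raw bytes, no decoding
    try:
        head = f.read(len(UTF8_BOM))
    finally:
        f.close()
    return head == UTF8_BOM
```

Called as has_utf8_bom(r'D:\Temp\text_file_with_utf8_bom.txt'), this never looks at decoded text, so it should not matter how the string type treats a leading BOM. I have not tried it under IronPython.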

Michael

> Thanks!
>
> -- Leo
>
> -----Original Message-----
> From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Michael Foord
> Sent: November 10, 2009 13:17
> To: Discussion of IronPython
> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>
> Leonides Saguisag wrote:
>
>> Hi everyone,
>>
>> I am encountering a weird issue with getting codecs.BOM_UTF8 to work correctly.  I am using SharpDevelop 3.1.
>>
>> Here is the test script that I put together:
>>
>>
>> import sys
>> sys.path.append(r'D:\Python25\Lib')
>> import codecs
>>
>> print sys.version
>> myfile = open(r'D:\Temp\text_file_with_utf8_bom.txt', 'r')
>> lines = myfile.readlines()
>> myfile.close()
>> if lines[0].startswith(codecs.BOM_UTF8):
>> 	print ('UTF-8 BOM detected!')
>> else:
>> 	print ('UTF-8 BOM not detected!')
>>
>> myfile = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r')
>> lines = myfile.readlines()
>> myfile.close()
>> if lines[0].startswith(codecs.BOM_UTF8):
>> 	print ('UTF-8 BOM detected!')
>> else:
>> 	print ('UTF-8 BOM not detected!')
>>
>>
>> If I run the executable that I get from SharpDevelop this is what I get:
>> bin\Debug> Test.exe
>> 2.5.0 ()
>> UTF-8 BOM detected!
>> UTF-8 BOM detected!
>>
>>
>> But if I run the same script using the standard python interpreter, this is what I get:
>> bin\Debug> D:\Python25\python.exe ..\..\Program.py
>> 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]
>> UTF-8 BOM detected!
>> UTF-8 BOM not detected!
>>
>>
>> The script works correctly with the standard python interpreter but for some reason is not working right with IronPython.
>>
>> Any ideas what is going wrong?
>>
>>
> I'm not in a position to check right now, but this could happen if codecs.BOM_UTF8 is set to the empty string.
>
> Michael
>
>
>> Thanks!
>>
>> Best regards,
>> -- Leo
>> _______________________________________________
>> Users mailing list
>> Users at lists.ironpython.com
>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>
>>
>
>
> --
> http://www.ironpythoninaction.com/
>

--
http://www.ironpythoninaction.com/