
[Ltru] Re: draft-4645bis-03



On Tue, Dec 11, 2007 at 11:47:46PM +0100,
 Frank Ellermann <nobody at xyzzy.claranet.de> wrote 
 a message of 17 lines which said:

> Internet Drafts need to be US-ASCII.  Just pipe
> it through 4645bis.awk to get the UTF-8 version:
> <http://purl.net/xyzzy/home/ltru/4645bis.awk>

Or ncr2utf8.py, attached :-)

I tested the UTF-8 version of 4645bis and everything is OK. It is much
simpler now :-)

unicodechar = satisfy (\thechar ->
                 let c = ord thechar in
                     c >= 0x21 && c <= 0x10ffff)
       <?> "Character"

#!/usr/bin/python

"""Converts a text file with hexadecimal Numeric Character References
(like &#x153;) to a UTF-8 file."""

import sys
import re

ncr = re.compile("&#x([0-9A-F]+);", re.IGNORECASE)
extension = re.compile(r"^(.*)\.([a-z0-9_-]+)$", re.IGNORECASE)

def convert(thematch):
    codepoint = long(thematch.group(1), 16)
    return unichr(codepoint)

for ifilename in sys.argv[1:]:
    print "Converting %s..." % ifilename
    match = extension.search(ifilename)
    if match:
        ext_ifile = match.group(2)
        ofilename = match.group(1) + "-utf8." + ext_ifile
    else:
        ofilename = ifilename + "-utf8"
    ifile = open(ifilename, "r")
    ofile = open(ofilename, "w")
    data = unicode(ifile.read(), "ascii")
    udata = re.sub(ncr, convert, data)
    ifile.close()
    ofile.write(udata.encode("utf-8"))
    ofile.close()
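
The script above is written for Python 2 (print statement, long, unichr,
unicode). For readers on Python 3, the same substitution could be sketched
roughly as follows. This is only an illustration, not the attached script:
the names ncr_to_text and convert_file are hypothetical, and leaving
surrogate or out-of-range references untouched is an assumption of this
sketch (the original would simply raise an error on them).

```python
import re

# Hexadecimal numeric character references such as &#x153;
NCR = re.compile(r"&#x([0-9A-F]+);", re.IGNORECASE)


def ncr_to_text(text):
    """Replace each hexadecimal NCR in text with the character it names."""
    def convert(match):
        codepoint = int(match.group(1), 16)
        # Assumption of this sketch: references that cannot be written as
        # UTF-8 (surrogates, values above U+10FFFF) are left as they are.
        if codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            return match.group(0)
        return chr(codepoint)
    return NCR.sub(convert, text)


def convert_file(ifilename, ofilename):
    """Read an ASCII file containing NCRs and write the UTF-8 equivalent."""
    with open(ifilename, "r", encoding="ascii") as ifile:
        data = ifile.read()
    with open(ofilename, "w", encoding="utf-8") as ofile:
        ofile.write(ncr_to_text(data))
```

In Python 3, chr() covers the whole range up to U+10FFFF, so references
beyond the Basic Multilingual Plane need no special handling.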
    
    