
[Ltru] two identical singleton extension tags issue



[as a technical contributor]

To find out how difficult it is to implement the
'no two identical singleton extensions' restriction for
well-formedness in RFC 4646, I implemented it in Ruby
last evening and in C this morning.

The C code is below; it took me about an hour. Compared to
a single call to a regular expression engine, the code is
somewhat lengthy, but nothing in it is terribly complicated.
And in C, regular expressions don't come for free: although
there are some good libraries, using one adds quite a bit of
configuration and setup. So overall, I don't see the 'no two
identical singleton extensions' restriction as much of a
burden. You are absolutely free to use the code below; it's
a Christmas present from me to everybody :-). But I haven't
tested it, so it may contain some bugs.

As for Ruby, the check is part of a larger, still unfinished
project. My observations there are:
1) the code for checking for identical singleton extensions
   is shorter than the regular expression needed to check
   the rest of well-formedness;
2) in a slightly larger context (allowing access to the
   various parts of the language tag via object-oriented
   methods), the code required for checking for identical
   singleton extensions is negligible.
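The second observation can be illustrated in C as well: once
a tag has been split into its subtags (which object-oriented
access to the tag provides), the duplicate check shrinks to a
short loop. This is a minimal sketch, assuming the caller has
already split the tag at '-' and checked the rest of
well-formedness; the name count_duplicate_singletons is
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: subtags[0..n-1] holds a language tag that
   has already been split at '-' and is otherwise assumed to be
   well-formed.  Returns how many singleton subtags repeat an
   earlier one (compared case-insensitively).  Everything after a
   private-use 'x' singleton is ignored. */
int count_duplicate_singletons(const char *const subtags[], size_t n)
{
    int seen[UCHAR_MAX + 1] = {0};   /* occurrences per character */
    int dups = 0;
    size_t i;

    assert(subtags != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const char *s = subtags[i];
        int c;

        if (s[0] == '\0' || s[1] != '\0')
            continue;                 /* not a one-character subtag */
        c = tolower((unsigned char) s[0]);
        if (c == 'x')
            break;                    /* private use: anything goes */
        if (seen[c]++)
            dups++;                   /* seen before: a duplicate */
    }
    return dups;
}
```

Indexing the table by unsigned char, rather than mapping
letters and digits to 0..35, avoids assuming that the letters
are contiguous in the execution character set.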

My two data points are, in a way, extremes (a low-level
and a high-level programming language). But based on them,
I don't see the need to remove the current 'no identical
singleton extensions' restriction from well-formedness.

Regards,    Martin.


#include <ctype.h>
#include <stdio.h>

/* function to check that there are no two
   identical singleton extensions in a BCP 47 language tag;
   singletons are compared case-insensitively, and everything
   after a private-use 'x' singleton is ignored.
   returns number of duplicated singletons (0 if none) */
int checkExtensions (const char *tag)
{
    int count[256];
    int i;
    const char *p;
    int state = 1;                    /* 1 = at start of a subtag */
    int error = 0;

    for (i = 0; i < 256; i++)
        count[i] = 0;

    for (p = tag; *p; p++) {
        if (state) {                  /* could be start of singleton */
            if (p[1] == '-' || p[1] == '\0') {
                /* cast avoids undefined behavior for negative chars */
                int c = toupper((unsigned char) *p);
                if (c == 'X')         /* private use ends everything */
                    break;
                count[c]++;           /* singleton found, count it */
            }
            state = 0;
        }
        else if (*p == '-')           /* next character starts a subtag */
            state = 1;
    }

    for (i = 0; i < 256; i++)
        if (count[i] > 1) {
            error += count[i] - 1;
            /* change to your favorite error message or other behavior */
            fprintf (stderr, "Repeated singleton detected: %c.\n", i);
        }
    return error;
}



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru




Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.