Re: [xml] encoding tutorial draft



> I wonder, Igor, whether you are expecting too little of your readers. Of
> course there are programmers who have never given a thought to the general
> problem of data representation in computing, but from time to time every
> programmer is faced with the problem that (s)he needs to know the actual
> representation of data. (This applies not only to character data but also
> to numbers and, of course, to more complex data types.) Not knowing what
> "encoding" is leads into many traps. I think you can expect that most libxml
> users DO know what encoding is and that there are many different ways of
> representing character data, since everybody has walked into one of these
> traps at some point.

Sure, you are right. Still, the fact remains that people have trouble believing that the conversion is necessary. Statements like "I have found a bug, libxml misinterprets my strings when they contain äöü" are all too common.
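The reason libxml appears to "misinterpret" such strings is that the same characters are stored as different bytes: in ISO-8859-1, ä, ö and ü are the single bytes 0xE4, 0xF6 and 0xFC, whereas in UTF-8 they are the pairs 0xC3 0xA4, 0xC3 0xB6 and 0xC3 0xBC. A Latin-1 string handed to an API that expects UTF-8 is therefore simply malformed input. The helper `looks_like_utf8` below is hypothetical (it is not a libxml function) and is only a simplified sketch that checks the lead-byte/continuation-byte structure of UTF-8:

```c
#include <stddef.h>

/* Hypothetical, simplified UTF-8 check: verifies that every lead byte is
   followed by the right number of continuation bytes (10xxxxxx).
   It does NOT reject overlong forms or surrogates, so it is only an
   illustration, not a complete validator. Returns 1 if the structure
   is valid, 0 otherwise. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)                extra = 0;  /* ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) extra = 2;  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) extra = 3;  /* 11110xxx */
        else return 0;                           /* stray continuation or invalid byte */
        if (extra > n - i - 1)
            return 0;                            /* sequence runs past the end */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                        /* expected a 10xxxxxx byte */
        i += extra + 1;
    }
    return 1;
}
```

Applied to the three Latin-1 bytes for "äöü" it returns 0, because 0xE4 announces a three-byte sequence and 0xF6 is not a continuation byte; applied to the six UTF-8 bytes it returns 1.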

I wasn't trying to teach any reader C, only to emphasise the necessity of conversion as strongly as possible.

> And don't forget the people we are talking about are C
> programmers! I emphasize again: C!! Only that they need a small reminder
> perhaps.

C programmers are the target audience, and every C programmer was a beginner once upon a time. Many posts here suggest that the poster knows little beyond the first hello-world lesson. But these beginners in C are also our audience. What better way to learn programming than to write software? Libxml is not reserved for the experienced.

> They will be more grateful if supplied with a simple way of achieving the
> conversion to UTF-8. Maybe the smaller code snippet I made for the FAQ
> should at least be mentioned in the tutorial, since 80% of all uses of
> conversion will be from ISO Latin-1 (or some local Windows codepage that is
> almost the same) and about 10% the other way round (just a guess, not counting
> East Asian programmers, who know the business of encoding quite well :-):
>
> const char *in = "some null terminated iso latin-1 string";
> unsigned char *out;
> int size, temp, out_size, ret;
>
> temp = size = (int)strlen(in)+1; /*terminating null included*/
> out_size = size*2-1; /*each byte needs at most 2, the terminating null just 1*/
> out = malloc((size_t)out_size);
> if (out) { /*convert only if the allocation succeeded*/
>         ret = isolat1ToUTF8(out, &out_size, (const unsigned char *)in, &temp);
>         if (ret < 0 || temp != size) { /*error, or input not fully consumed*/
>                 free(out);
>                 out=NULL;
>         }
> }
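For readers who want to see what such a conversion actually does, the mapping from ISO-8859-1 to UTF-8 is simple enough to write by hand, because every Latin-1 byte value is also its Unicode code point (U+0000 to U+00FF). The sketch below uses two hypothetical helpers, `latin1_to_utf8` and `utf8_to_latin1`, which are not part of libxml (libxml's own `isolat1ToUTF8` and `UTF8Toisolat1` do this job). It also shows why an output buffer of twice the input length plus one byte is always large enough:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libxml): converts a NUL-terminated
   ISO-8859-1 string to a newly allocated NUL-terminated UTF-8 string.
   Returns NULL if malloc fails; the caller frees the result. */
unsigned char *latin1_to_utf8(const unsigned char *in)
{
    size_t len = strlen((const char *)in);
    /* worst case: every byte >= 0x80 becomes two bytes, plus the NUL */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *p = out;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            *p++ = c;                     /* ASCII is unchanged */
        } else {
            *p++ = 0xC0 | (c >> 6);       /* 110000xx */
            *p++ = 0x80 | (c & 0x3F);     /* 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}

/* Hypothetical reverse helper: converts a NUL-terminated UTF-8 string to
   a newly allocated ISO-8859-1 string. Returns NULL if malloc fails, if
   the input is malformed, or if it contains a character above U+00FF,
   which Latin-1 cannot represent. */
unsigned char *utf8_to_latin1(const unsigned char *in)
{
    size_t len = strlen((const char *)in);
    unsigned char *out = malloc(len + 1);   /* output is never longer */
    unsigned char *p = out;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            *p++ = c;
        } else if ((c == 0xC2 || c == 0xC3) && (in[i + 1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF are encoded as 110000xx 10xxxxxx */
            *p++ = (unsigned char)(((c & 0x03) << 6) | (in[i + 1] & 0x3F));
            i++;                          /* skip the continuation byte */
        } else {
            free(out);                    /* malformed or not representable */
            return NULL;
        }
    }
    *p = '\0';
    return out;
}
```

This direct mapping does not hold for Windows-1252, which agrees with ISO-8859-1 except for bytes 0x80 to 0x9F, where it places printable characters such as the euro sign (0x80); text in that codepage would need a lookup table for those bytes.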


Sure, everything that makes sense to someone is worth mentioning. Each person perceives information differently, and only the readers of a tutorial can judge whether it helped them learn something.

The fact remains that libxml is there to process XML, not to transcode data from one format into another.

Ciao
Igor



