[xml] Entity output query

In xmlEncodeEntitiesReentrant() in entities.c, the following
logic appears at line 481 (as of version 2.6.0):

        } else if (*cur >= 0x80) {
            if (((doc != NULL) && (doc->encoding != NULL)) || (html)) {
                /* Simply copy the character */
            } else {
                /* Translate into an entity */

The decision of whether to copy an 8-bit character or turn it
into an entity seems to be:

  If we have an encoding for the output, or if we are generating
  HTML, don't construct an entity.

Thus an 8-bit character is turned into an entity only when there
is no document encoding and we are not generating HTML, which
would seem to be a rare case.
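
To make the condition concrete, here is a minimal sketch of the same
test in isolation. It is not the libxml2 code: struct doc_sketch is a
hypothetical stand-in for the one xmlDoc field the condition reads, and
copies_raw() and check_cases() are names I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the single xmlDoc field the test reads. */
struct doc_sketch {
    const char *encoding;   /* NULL when no encoding is recorded */
};

/* Mirrors the quoted condition: returns 1 when a byte >= 0x80 would be
 * copied unchanged, 0 when it would be translated into an entity. */
static int copies_raw(const struct doc_sketch *doc, int html)
{
    return ((doc != NULL) && (doc->encoding != NULL)) || html;
}

/* Walks all six input combinations; every assert holds. */
static void check_cases(void)
{
    struct doc_sketch with_enc = { "ISO-8859-1" };
    struct doc_sketch no_enc = { NULL };

    assert(copies_raw(&with_enc, 0) == 1);
    assert(copies_raw(&with_enc, 1) == 1);
    assert(copies_raw(&no_enc, 1) == 1);
    assert(copies_raw(NULL, 1) == 1);
    assert(copies_raw(&no_enc, 0) == 0);   /* entity */
    assert(copies_raw(NULL, 0) == 0);      /* entity */
}
```

Only the two cases with html == 0 and no encoding reach the entity
branch; every case with html set copies the raw byte.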

This strikes me as a rather strange way to make the decision;
when I'm converting to HTML, I'd much rather have the entity than
the raw character.  In fact, the else clause knows how to
generate HTML entities, but will never do so, because a true html
flag always selects the first branch.
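
For illustration, the output I would prefer for HTML looks roughly like
what the sketch below produces. This is only a sketch under stated
assumptions: escape_high_bytes() and demo_escape() are hypothetical
helpers, not libxml2 functions, and the input is assumed to be
ISO-8859-1 so that each byte value is its own code point. It emits
decimal character references rather than named entities.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2. Copies `in` to `out`,
 * replacing every byte >= 0x80 with a decimal character reference.
 * Assumes ISO-8859-1 input. Returns 0 on success, -1 if `out` is
 * too small to hold the result and its terminator. */
static int escape_high_bytes(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    for (const unsigned char *cur = (const unsigned char *)in;
         *cur != '\0'; cur++) {
        if (*cur >= 0x80) {
            /* Translate into a numeric character reference */
            int n = snprintf(out + used, outsz - used, "&#%u;",
                             (unsigned)*cur);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;
            used += (size_t)n;
        } else {
            /* Simply copy the character */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = (char)*cur;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}

/* Usage: byte 0xE9 (e-acute in ISO-8859-1) becomes "&#233;". */
static void demo_escape(void)
{
    char buf[32];

    assert(escape_high_bytes("caf\xe9", buf, sizeof buf) == 0);
    assert(strcmp(buf, "caf&#233;") == 0);
}
```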

I don't know what the "right" logic is; I'm definitely curious
under what circumstances it is desirable to _not_ generate
entities in XML/HTML output.

John R. Daily                                        jdaily progeny com
Director of Technology                            Progeny Linux Systems
                    Master of the ephemeral epiphany
