On 06/07/2018 12:54 AM, Eric S. Eberhard wrote:
I know I am the oddball here but -- why use DTDs at all?
I gave reasons above. I am working on a tool. How people use the
tool is not under my control. Maybe we can focus on the opportunity
to improve libxml2 a bit here.
I supply software to a lot of companies (thousands through dealers).
Many exchange millions of XML docs per day. I've used this since it
was libxml. Even have some patches in there. My application is
proprietary (meaning XML to get an order or tell a customer our
availability is simply XML I designed and documented and give to my
customer's customers (via download from a Web page)). Once they get it
working it pretty much always works. They write software to create
orders and send them to us -- it is consistent (I know, not everyone
has this luxury so this may not apply to everyone). So why check them?
I also found that I was getting a gazillion support tickets because
DTDs ... simple things like a date ... seem to escape people -- take
June 7, 2018
In our date fields we will take:
Jun 7 2018
June 7 2018
the above with commas and any case (upper/lower/mixed)
6/7/18
6/7/2018
2018/6/7
20180607
180607
06-07-18
And actually many many more. Anything that is a date goes through this
one routine and if there is any way in the world to extract a date, we
do.
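A minimal sketch of what such a catch-all date routine might look like, written in Python for brevity. The name `parse_lenient_date`, the US month/day ordering for ambiguous numeric dates, and reading two-digit years as 20xx are all assumptions made for illustration; this is not the actual routine described above, which accepts many more forms.

```python
import re
from datetime import date

# Month lookup by three-letter prefix, so "Jun" and "June" both match.
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def _year(y):
    # Assumption: a two-digit year means 20xx.
    return y + 2000 if y < 100 else y


def parse_lenient_date(text):
    """Return a date if one can be extracted from text, else None."""
    # Ignore case and commas, as in the list above.
    s = text.strip().lower().replace(",", " ")
    try:
        # Month-name forms: "jun 7 2018", "june 7 2018"
        m = re.fullmatch(r"([a-z]{3})[a-z]*\.?\s+(\d{1,2})\s+(\d{2,4})", s)
        if m and m.group(1) in MONTHS:
            return date(_year(int(m.group(3))), MONTHS[m.group(1)],
                        int(m.group(2)))
        # Compact digits: YYYYMMDD or YYMMDD
        if re.fullmatch(r"\d{8}", s):
            return date(int(s[:4]), int(s[4:6]), int(s[6:]))
        if re.fullmatch(r"\d{6}", s):
            return date(_year(int(s[:2])), int(s[2:4]), int(s[4:]))
        # Separated digits: year first (2018/6/7) or month/day/year (6/7/18)
        m = re.fullmatch(r"(\d{1,4})[/\-.](\d{1,2})[/\-.](\d{1,4})", s)
        if m:
            a, b, c = (int(g) for g in m.groups())
            if len(m.group(1)) == 4:
                return date(a, b, c)
            return date(_year(c), a, b)
    except ValueError:
        # Looked like a date but was out of range (e.g. month 13).
        pass
    return None
```

The point of the design is that every date field goes through one function, so a newly seen format needs one new branch rather than a schema change.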
Ditto money -- say $1,245.56
We accept:
$1,245.56
1245.56
124556 (decimal is implied at 2 places if no decimal is found)
1,245.56
And many more - same thing, one routine reads it and if we can possibly
get a reasonable number, we do.
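The money rule can be sketched the same way. The function name `parse_lenient_money` is hypothetical, and this version only covers the forms listed above (dollar sign, thousands separators, implied two decimal places when no decimal point is found); it is an illustration, not the real routine.

```python
from decimal import Decimal, InvalidOperation


def parse_lenient_money(text):
    """Return the amount as a Decimal with two places, or None."""
    # Drop the currency sign, thousands separators and stray spaces.
    s = text.strip().replace("$", "").replace(",", "").replace(" ", "")
    if not s:
        return None
    try:
        if "." in s:
            amount = Decimal(s)
        else:
            # No decimal point found: the last two digits are cents.
            amount = Decimal(s) / 100
        return amount.quantize(Decimal("0.01"))
    except InvalidOperation:
        # Not something we can read as a number.
        return None
```

Using `Decimal` rather than a float keeps 1245.56 exact, which matters once the amounts are added up.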
This, in turn, reduced our CONSTANT support tickets for silly things
like a format of something to ZERO. Which I like.
Even sicker -- we ignore case on tags. All of our XML is designed to
not use duplicate names with different cases (a stupid thing to do
anyway -- expecting orderNumber and OrderNumber to both be used, as
different things).
As long as the customer is consistent and the XML is well formed we
scan the tree and compare tags without regard to case. A WHOLE LOT
more support tickets gone.
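For illustration, a case-insensitive tag lookup over a parsed tree could look like the sketch below. It uses Python's standard `xml.etree.ElementTree` rather than libxml2, and the helper names `find_child_ci` and `text_ci` as well as the sample order document are made up for the example.

```python
import xml.etree.ElementTree as ET


def find_child_ci(parent, name):
    """Return the first child whose tag equals name, ignoring case."""
    wanted = name.lower()
    for child in parent:
        # Comments and processing instructions have non-string tags.
        if isinstance(child.tag, str) and child.tag.lower() == wanted:
            return child
    return None


def text_ci(parent, name, default=None):
    """Return the text of the matching child, or default if absent."""
    child = find_child_ci(parent, name)
    return child.text if child is not None else default


# A hypothetical order where the sender used their own capitalisation.
order = ET.fromstring(
    "<Order><ORDERNUMBER>A-17</ORDERNUMBER>"
    "<shipdate>6/7/18</shipdate></Order>")
order_number = text_ci(order, "orderNumber")
ship_date = text_ci(order, "ShipDate")
```

This only works because the vocabulary never uses two names that differ solely by case; the document itself still has to be well formed, since the parser remains case-sensitive about matching start and end tags.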
A lot of the people we deal with are not sophisticated. As the
receiver of XML we decided it was much better to be as flexible as
possible and take what we can if at all possible. After all -- a DTD
can indeed tell you if an address comes in without a city name. And
reject it and usually generate a support ticket. Since we use an
on-line AVS system (more XML), if we have the zip and the address
otherwise matches ... we don't need the city and state ... the AVS
system provides it. And if it fails they will get an error back from
us (from the application) anyway. So why use a DTD to see if the city
or state were sent? A LOT MORE support calls removed.
And, of course, performance without the DTDs is much better.
As a result we are able to give documentation to new customers and
they are able to get it up and running with little to no help. Any
serious errors we cannot fix are clearly explained in the responses BY
THE APPLICATION and not by a DTD.
Being flexible on our end reduces support tickets, which is all I
care about. I would rather code for all the mistakes I can think of
an end user would make (and we add new ones when they crop up) than be
strict and do a lot of support. We don't think DTDs are flexible
enough. And I hate making them :-)
We do offer a page with DTDs they can use manually to check their
document if they like -- or they can send it to our test system. Once
they are running they seem to do just fine.
As programmers it is hard to believe but sometimes it is better for us
to make slightly less efficient code in order to make the human aspect
much more efficient. I once had someone send me a link to a "contest"
which was a convoluted C statement, asking what the result would be.
My response -- "fire the programmer!"
If it takes 100s of competent C programmers to get the right answer
(and only a small percent did) to read a line of code -- it is bad
code. And for people's information, modern computers read ahead and
pre-execute code based on all kinds of weird logic. Simple C code is
easy for them to handle ... but convoluted code ends up stopping the
pre-execution and is actually slower -- it may have fewer lines of
code -- but it will be slower. I see nothing wrong with short clear
clean code with as little craziness as possible. This is the same with
XML -- one can go overboard easily, K.I.S.S. :-)
Not being so strict and no DTDs has had other benefits -- say EDI
(from old IBMs) -- we have a cheap program that maps EDI to XML and
back. So we can handle EDI -- and we don't need new software (after
the conversion). We accept the EDI, convert to XML, run our standard
application, create XML response, which is converted to EDI. The
package we use is low cost and no, it won't work too well with DTDs as
EDI has its own problems.
I could go on but most of you have probably skipped this post by now :-)
E
On 6/6/2018 3:00 PM, Stefan Sauer wrote:
On 05/17/2018 06:01 PM, Stefan Sauer wrote:
On 05/17/2018 04:18 PM, Nick Wellnhofer wrote:
On 16/05/2018 21:51, Stefan Sauer wrote:
So one solution could be another flag to enable this?
Yes, but it would be rather ugly.
In which sense? I guess because it is something that no one should need
to know about or have to care about?
Thanks, reading the code. Need to figure out where we could cache
external subsets and what a suitable key is (ExternalID?).
Note that I'm currently not planning to review and integrate larger
patches from other developers. I only took over some libxml2
maintenance duties because no one else did. So even if you write a
high-quality patch, it might never get merged.
Thanks for making this clear upfront. This is how I ended up becoming
the gtkdoc maintainer :)
Caching external subsets for XIncludes certainly sounds like a nice
feature but I would prefer to find a simpler solution. For example,
can't you just omit the external DTD from included documents?
Yeah, right now, the benefit of having the DTD is that one can validate
fragments. I'll do some research (aka grepping over existing projects)
to see what the doc-type headers in use today look like. If all that
people do is use an entity to inject the version, I'll write a
migration tool.
We have a test that validates the doc, but I think I can change this to
just resolve all xincludes and check through the top-level doctype.
Just to add to this, I am assuming a lot of people follow this book
http://www.sagehill.net/docbookxsl/ModularDoc.html#UsingXinclude
and using a DOCTYPE is part of the examples.
You wrote:
and gtk-doc will replicate this for the fragments (replacing 'book' with
e.g. 'refentry'). This way one can e.g. inject things like a version.
What do you mean by "inject things like a version"? Why exactly do
your included documents have to reference an external DTD?
The documentation consists of a handwritten master doc (type book), that
includes more handwritten parts (e.g. tutorials, guides) and includes
generated reference docs. When gtkdoc generates the reference docs, it
takes the doctype header of the master-doc as a template and
uses that for the generated reference docs. If the master doc has
entities declared, those can be expanded in the reference fragments.
That's the part I will check how widely it is actually used.
Stefan
Another idea is to stop loading external DTDs for XIncludes without an
XPointer expression. This would still change the behavior for some
users but it's much less likely to cause problems.
Change the behaviour, as in we would not catch validation errors?
Too bad that xmlXIncludeParseFile() does not get the parent parserCtx,
in that case we could apply the same flags.
Nick
I definitely don't know enough about the implications here. I was mostly
thinking to see if we can stick a dictionary of <dtd-identifier,
xmlDtdPtr> into the Parser Context and before actually loading a dtd,
check if we did already and reuse. Somehow the dict needs to be stored
in the top-level doc, when parsing is done (do we need the dtds once the
doc has been parsed?). We only free the dtds with the top-level doc. But
I agree, it is not going to be a two liner.
It seems that xmldict only handles keys and values that are strings,
right? So we'll even need our own cache data structure. I'd say it
would need to be on the _xmlXIncludeCtxt level. Global is easier, but
then we can't free it ever :/
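To make the caching idea concrete, here is a language-neutral sketch in Python; it is not libxml2 code and uses no libxml2 API. The class `DtdCache`, its `loader` callback and the `(external_id, system_id)` key are all assumptions for illustration. In libxml2 the equivalent would have to be a C structure owned by the XInclude context, mapping an identifier to an xmlDtdPtr.

```python
class DtdCache:
    """Load each external subset once per include run, then reuse it."""

    def __init__(self, loader):
        # loader(external_id, system_id) does the expensive parse.
        self._loader = loader
        self._cache = {}
        self.loads = 0  # Counts real loads, for demonstration only.

    def get(self, external_id, system_id):
        key = (external_id, system_id)
        if key not in self._cache:
            self._cache[key] = self._loader(external_id, system_id)
            self.loads += 1
        return self._cache[key]

    def clear(self):
        # Tied to the include context's lifetime, so everything can be
        # released together instead of living in a global forever.
        self._cache.clear()


# Stand-in loader: a real one would parse the DTD from disk or network.
def fake_loader(external_id, system_id):
    return {"external_id": external_id, "system_id": system_id}


cache = DtdCache(fake_loader)
first = cache.get("-//OASIS//DTD DocBook XML V4.3//EN", "docbookx.dtd")
second = cache.get("-//OASIS//DTD DocBook XML V4.3//EN", "docbookx.dtd")
```

Scoping the cache to the context rather than making it global is what answers the "we can't free it ever" concern: clearing it is part of tearing the context down.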
Stefan
Stefan
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml
--
Eric S. Eberhard
VICS
2933 W Middle Verde Road
Camp Verde, AZ 86322
928-567-3727 work 928-301-7537 cell
http://www.vicsmba.com/index.html (our work)
http://www.vicsmba.com/ourpics/index.html (fun pictures)