[xslt] Issues when validating language strings

According to the XML Schema standard, the xs:language type is defined as follows (see http://www.w3.org/TR/xmlschema11-2/#language):

<xs:simpleType id="language" name="language">
  <xs:restriction base="xs:token">
    <xs:pattern id="language.pattern" value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"/>
  </xs:restriction>
</xs:simpleType>

According to that definition, if we put 'aa-bb-cc' as the value of xml:lang, or of any custom attribute of type xs:language, it should be accepted as valid.
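
As a quick sanity check of the pattern itself, independent of any XML library, here is a minimal Python sketch. It assumes that Python's re module interprets this particular pattern the same way the XSD regex dialect does, and it ignores the whitespace collapsing that xs:token would apply first. The helper name is_xs_language is made up for illustration.

```python
import re

# Pattern copied verbatim from the xs:language definition quoted above.
LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xs_language(value: str) -> bool:
    """Hypothetical helper: True if value matches the xs:language pattern.

    XSD patterns must match the whole value, so fullmatch is used
    instead of match or search.
    """
    return LANGUAGE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # One, two and three hyphen-separated sections.
    for candidate in ("en-US", "aa-bb-cc", "aa-bb-cc-dd"):
        print(candidate, is_xs_language(candidate))
```

Under those assumptions, the pattern places no limit on the number of hyphen-separated sections, so 'aa-bb-cc' matches just as 'en-US' does.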

However, when using libxslt that is not the case. The definition of the language type in libxslt permits only one hyphen-separated section (like aa-bb, dd-cc, en-US, etc.). It is as if it uses this regex for validation:


Here are the sample files that demonstrate the issue:

--- test.xml ---
<?xml version="1.0" encoding="UTF-8" ?>
<rootel>
  <sample language="aa-bb-cc">custom language</sample>
  <sample language="en-US" xml:lang="aa-bb-cc">English US</sample>
</rootel>

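--- test.xsd ---
<?xml version="1.0"?>
<xs:schema id="rootel" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd" />
  <xs:element name="rootel">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="sample" nillable="true">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="language" form="unqualified" type="xs:language" />
                <xs:attribute ref="xml:lang" use="optional"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>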

Save these files in the same directory and execute the following command to demonstrate the issue:

xmllint --noout --schema test.xsd test.xml

Any thoughts? This looks like a bug to me.

Darko Miletic

