Re: [Scrollkeeper-devel] structure of extracted index page

From: Daniel Veillard <veillard redhat com>
To: Dan Mueth <dan eazel com>
Cc: Daniel Veillard <veillard redhat com>, Mary Dwyer <Mary Dwyer Sun COM>, scrollkeeper-devel lists sourceforge net, gnome-doc-list gnome org
Subject: Re: [Scrollkeeper-devel] structure of extracted index page
Date: Thu, 26 Apr 2001 06:16:39 -0400

On Thu, Apr 26, 2001 at 02:24:47AM -0500, Dan Mueth wrote:
> 
> On Thu, 26 Apr 2001, Daniel Veillard wrote:
> 
> > On Wed, Apr 25, 2001 at 10:44:30PM -0500, Dan Mueth wrote:
> > > The other possibility is that instead of trying to refer to an anchor in
> > > the generated HTML, we try to refer to the position in the XML document.  
> > > I really don't know how this would work exactly, since I am not very
> > > familiar with libxml, but it may be possible.  (DV?)
> > 
> >   I'm afraid I didn't follow the discussion here (sorry!). The best
> > way to get a technical answer from me is to give me a practical example
> > (what's your input, how it's processed, what's the result, why it fails),
> > and then I can use what I know both from the specs and the code to get
> > this answer precisely and quickly.
> 
> Ok.  Let me give this a shot...
> 
> <disclaimer>
> The DTD represented in this email is fictional and any similarity
> with a real DTD is purely coincidental.
> </disclaimer>
> 
> <sect1>
>   <para>
>     This is a sentence.
>   </para>
>   <indexterm>
>     <primary>
>       Sentences
>     </primary>
>   </indexterm>
>   <para>
>     This is another sentence.
>   </para>
> </sect1>
> 
> When this is converted to HTML, we will get an index at the end of the
> document which has a link from an item called "Sentences" to the location
> of the indexterm element above: between the two paragraphs.  Populating
> our document with indexterms yields a helpful index at the end of the
> document :)
> 
> We would like ScrollKeeper to keep an XML data file describing the
> index.  It should list all of the index terms and where they link into the
> document.
> 
> The thing I am not sure about is how we "anchor" the links into the
> document.  If the indexterm had a unique id attribute, we could use
> that.  But our DTD does not require the id attribute be used.
> 
> Is there a nice way we could have an XML representation of the index which
> somehow specifies the anchors for the index term links so that a browser
> (such as the help browser in Nautilus) can link from index terms to
> locations in the XML document?

  If you want to point into the XML, then you need to use XPointer.
If you had an ID on the element, say "sentencedef", then the simple
way to address the subpart is

     #sentencedef
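A bare-name pointer like this selects the element whose ID is "sentencedef". As a rough sketch of that lookup, here is a version using Python's standard xml.etree.ElementTree module (not libxml, and not a real XPointer processor). It assumes the ID is stored in an attribute literally named `id`. The sample document and the function name `resolve_bare_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the example document with an id on the indexterm.
DOC = """<sect1>
  <para>This is a sentence.</para>
  <indexterm id="sentencedef"><primary>Sentences</primary></indexterm>
  <para>This is another sentence.</para>
</sect1>"""


def resolve_bare_name(root, name):
    """Approximate the bare-name pointer '#name'.

    Assumption: the ID is stored in an attribute literally called 'id'.
    A real XPointer processor instead relies on the DTD or schema to know
    which attribute is of type ID; ElementTree does not read DTDs.
    """
    # iter() walks the root and all its descendants in document order.
    for elem in root.iter():
        if elem.get("id") == name:
            return elem
    return None  # no element carries that ID


root = ET.fromstring(DOC)
print(resolve_bare_name(root, "sentencedef").tag)  # prints "indexterm"
```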

If you don't have such an ID, you can use the structured access
method of XPointer; in this case one such pointer could be:

     #xpointer(/sect1[1]/indexterm[1]/primary[1])
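The ScrollKeeper index data file would need one pointer of this form for each index term. The sketch below shows one way to generate them, again in Python with xml.etree.ElementTree rather than libxml, and it assumes the document uses no XML namespaces. The function name `structured_pointers` and the second index term "Paragraphs" are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# The example document, plus a second (hypothetical) index term.
DOC = """<sect1>
  <para>This is a sentence.</para>
  <indexterm><primary>Sentences</primary></indexterm>
  <para>This is another sentence.</para>
  <indexterm><primary>Paragraphs</primary></indexterm>
</sect1>"""


def structured_pointers(root, target_tag="primary"):
    """Yield (term, pointer) for every target_tag element under root.

    Each step is written name[n], where n is the 1-based position among
    siblings with the same name, as in /sect1[1]/indexterm[1]/primary[1].
    Assumes no XML namespaces (ElementTree would report '{uri}name' tags).
    """
    def walk(elem, path):
        if elem.tag == target_tag:
            # Full text content, including any nested elements.
            yield "".join(elem.itertext()).strip(), "#xpointer(%s)" % path
        seen = {}  # how many children of each name we have met so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            yield from walk(child, "%s/%s[%d]" % (path, child.tag, seen[child.tag]))

    # The document element is the only top-level element, hence [1].
    yield from walk(root, "/%s[1]" % root.tag)


for term, pointer in structured_pointers(ET.fromstring(DOC)):
    print(term, pointer)
# Sentences #xpointer(/sect1[1]/indexterm[1]/primary[1])
# Paragraphs #xpointer(/sect1[1]/indexterm[2]/primary[1])
```

These pointers depend on element positions. Inserting another indexterm earlier in the section would shift every later one, which is why IDs remain the more robust anchor.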

Libxml has an XPointer implementation, which can be exercised with its testXPath program:

orchis:~/XML -> cat tst.xml
<sect1>
  <para>
    This is a sentence.
  </para>
  <indexterm>
    <primary>
      Sentences
    </primary>
  </indexterm>
  <para>
    This is another sentence.
  </para>
</sect1>

orchis:~/XML -> ./testXPath -xptr -i tst.xml "xpointer(/sect1[1]/indexterm[1]/primary[1])"
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT primary
orchis:~/XML -> 
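For readers without libxml at hand, the restricted form used above can also be resolved by hand. That form is an absolute path made only of name[n] steps. The following minimal Python sketch does this with xml.etree.ElementTree and returns the same single primary element as the testXPath run. It is not a general XPath or XPointer evaluator, and `resolve_child_steps` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<sect1>
  <para>
    This is a sentence.
  </para>
  <indexterm>
    <primary>
      Sentences
    </primary>
  </indexterm>
  <para>
    This is another sentence.
  </para>
</sect1>"""

# One location step of the form name[n], e.g. "indexterm[1]".
STEP = re.compile(r"([A-Za-z_][\w.-]*)\[(\d+)\]")


def resolve_child_steps(root, pointer):
    """Resolve 'xpointer(/a[i]/b[j]/...)' against root; return a node list."""
    if pointer.startswith("#"):
        pointer = pointer[1:]
    if not (pointer.startswith("xpointer(/") and pointer.endswith(")")):
        raise ValueError("unsupported pointer: %r" % pointer)
    steps = pointer[len("xpointer(/"):-1].split("/")
    siblings = [root]  # the document element is the only top-level node
    node = None
    for step in steps:
        m = STEP.fullmatch(step)
        if m is None:
            raise ValueError("unsupported step: %r" % step)
        name, pos = m.group(1), int(m.group(2))
        matches = [e for e in siblings if e.tag == name]
        if not 1 <= pos <= len(matches):
            return []  # empty node set
        node = matches[pos - 1]
        siblings = list(node)  # the next step looks at this node's children
    return [node]


nodes = resolve_child_steps(ET.fromstring(DOC), "xpointer(/sect1[1]/indexterm[1]/primary[1])")
print(len(nodes), nodes[0].tag)  # prints "1 primary"
```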

  XPointer is not (yet) widely deployed. I just happen to be the co-chair
of the working group defining it at W3C; you can get the spec at
    http://www.w3.org/TR/xptr

and I can of course answer questions about it.

  In practice, getting IDs is better because it's more resilient to changes.
Another improvement would be to have one term per primary tag, like this:

     <primary>sentence</primary>

Then a very resilient XPointer would be:

  #xpointer(//indexterm/primary[. = "sentence"])
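This content-based pointer does not depend on element positions. As an approximation, Python's xml.etree.ElementTree (3.7 or later) supports a small XPath subset that includes the `[.='text']` predicate, so the same query can be tried without libxml. This is ElementTree's limited path language, not XPointer. The nested sect2 in the sample document is hypothetical and only shows that the match works at any depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using one term per primary element.
DOC = """<sect1>
  <para>This is a sentence.</para>
  <indexterm><primary>sentence</primary></indexterm>
  <para>This is another sentence.</para>
  <sect2>
    <indexterm><primary>paragraph</primary></indexterm>
    <indexterm><primary>sentence</primary></indexterm>
  </sect2>
</sect1>"""

root = ET.fromstring(DOC)
# './/' searches all descendants of the root (the XPath '//' part),
# 'indexterm/primary' takes the primary children of each indexterm, and
# [.='sentence'] keeps those whose full text content is exactly "sentence".
hits = root.findall(".//indexterm/primary[.='sentence']")
print(len(hits))  # prints 2
```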

Basically, that XPointer instructs the processor to search for all indexterm
elements in the document, then look at their primary children and extract
the ones whose content is "sentence".

  All those queries are actually XPath expressions (also used in XSLT);
you can get more information at:
   - a generic presentation on XPointer
     http://daniel.veillard.com/Talks/9912XPointer/Overview.html

   - the W3C page on XPointer (at the bottom of the page)
     http://www.w3.org/XML/Linking.html

   - XPointer libxml interfaces:
     http://xmlsoft.org/html/libxml-xpointer.html

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
