tracker r2313 - in trunk: . docs/reference/tracker-indexer



Author: mr
Date: Mon Oct  6 12:11:14 2008
New Revision: 2313
URL: http://svn.gnome.org/viewvc/tracker?rev=2313&view=rev

Log:
	* docs/reference/tracker-indexer/Makefile.am:
	* docs/reference/tracker-indexer/adding-categories-and-properties.sgml:
	* docs/reference/tracker-indexer/tracker-indexer-docs.sgml:
	* docs/reference/tracker-indexer/writing-an-extractor.sgml: Added
	Ivan's live.gnome.org documentation so it is available in the
	reference documentation.


Added:
   trunk/docs/reference/tracker-indexer/adding-categories-and-properties.sgml
   trunk/docs/reference/tracker-indexer/writing-an-extractor.sgml
Modified:
   trunk/ChangeLog
   trunk/docs/reference/tracker-indexer/Makefile.am
   trunk/docs/reference/tracker-indexer/tracker-indexer-docs.sgml

Modified: trunk/docs/reference/tracker-indexer/Makefile.am
==============================================================================
--- trunk/docs/reference/tracker-indexer/Makefile.am	(original)
+++ trunk/docs/reference/tracker-indexer/Makefile.am	Mon Oct  6 12:11:14 2008
@@ -65,9 +65,13 @@
 
 # Extra SGML files that are included by $(DOC_MAIN_SGML_FILE)
 content_files =					\
-	version.xml
-
-expand_content_files =
+	version.xml				\
+	adding-categories-and-properties.sgml	\
+	writing-an-extractor.sgml
+
+expand_content_files =				\
+	adding-categories-and-properties.sgml	\
+	writing-an-extractor.sgml
 
 # Images to copy into HTML directory
 HTML_IMAGES =

Added: trunk/docs/reference/tracker-indexer/adding-categories-and-properties.sgml
==============================================================================
--- (empty file)
+++ trunk/docs/reference/tracker-indexer/adding-categories-and-properties.sgml	Mon Oct  6 12:11:14 2008
@@ -0,0 +1,168 @@
+<chapter id="adding-categories">
+  <chapterinfo>
+    <author>
+      <firstname>Ivan</firstname>
+      <surname>Frade</surname>
+      <affiliation>
+	<address>
+	  <email>ivan.frade@@nokia.com</email>
+	</address>
+      </affiliation>
+    </author>
+  </chapterinfo>
+
+  <title>Adding Categories &amp; Properties</title>
+
+  <para>
+    A category is something which identifies one or more files with
+    particular attributes, for example, Files, Folders, Audio, Video,
+    etc. These are all categories. This will help you understand how
+    to add a category for your 3rd party application to integrate it
+    with Tracker.
+  </para>
+
+  <section id="defining-categories">
+    <title>Defining Categories</title>
+
+    <para>
+      The essential attributes are: 
+      <programlisting>
+[CategoryName]
+DisplayName=CategoryName
+Description=A description to the user
+Parent=Parent category. Usually Files. [1]
+Mimes=Mime types we want to assign to this category (e.g.
+audio/x-playlist)
+MimePrefixes=Prefix to match mimetypes (e.g. audio/ will match all mime types as audio/*)
+HasMetadata=Should tracker extract meta-data from these files?
+HasFullText=Should tracker extract text content from these files?
+HasThumbs=Should tracker generate thumbnails from these files?
+      </programlisting>
+    </para>
+    <note>
+      <para>
+	[1] At the moment we only support two levels of hierarchy.
+      </para>
+    </note>
+
+    <example>
+      <title>A Play List</title>
+      <programlisting>
+[Playlist]
+DisplayName=Playlists
+Description=Music playlists
+Mimes=audio/x-mpegurl
+HasMetadata=true
+HasFullText=false
+HasThumbs=false 
+      </programlisting>
+    </example>
+
+    <para>
+      The flags are set this way because we want to extract meta-data
+      (e.g. the number of songs) but don't want to index the full text
+      or generate thumbnails.
+    </para>
+
+    <para>
+      In this case the properties will have the category name as
+      prefix (we will call them Playlist.X). We can add the
+      PropertyPrefix key in the category description to use a
+      different prefix.
+    </para>
+  </section>
+
+  <section id="defining-properties">
+    <title>Defining Properties</title>
+
+    <para>
+      Properties are defined in a "category.metadata" file.
+    </para>
+
+    <para>
+      Add a file in <filename>data/services/</filename> called
+      CategoryName.metadata (don't forget to update the Makefile.am
+      file in the same directory).
+    </para>
+    
+    <para>
+      In this file add the description of the properties you expect to
+      obtain from the extractor. The mandatory information is:
+    </para>
+
+    <example>
+      <title>Mandatory Properties</title>
+      <programlisting>
+[CategoryName:PropertyName]
+DisplayName=Name to show to the user
+Description=Long description of the property
+DataType=(index, keyword, double, integer, date) [2]
+      </programlisting>
+    </example>
+    <example>
+      <title>Recommended Properties</title>
+      <programlisting>
+Weight=Relevance of this property
+Parent=Parent property, usually from default.metadata
+Embedded=The property is embedded on the file or can/will be set by the user or other applications
+      </programlisting>
+    </example>
+    <note>
+      <para>
+	[2] The "index" and "keyword" types are indexed as text. The
+	difference is that "index" properties are analyzed (split into
+	tokens and stemmed), while keywords are indexed as they are. We
+	can use "Filtered=false" combined with the "index" data type to
+	avoid stemming.
+      </para>
+    </note>
+
+    <para>
+      So, for example, we can add a playlist.metadata file with the
+      following content: 
+    </para>
+	
+    <example>
+      <title>Real Data</title>
+
+      <programlisting>
+[Playlist:TotalLength]
+DisplayName=Length
+Description=Total length of songs in the playlist in seconds
+DataType=integer
+Weight=1
+Filtered=false
+
+[Playlist:Songs]
+DisplayName=Songs
+Description=Number of tracks in the playlist
+DataType=integer
+Weight=1
+Filtered=false
+
+[Playlist:Name]
+DisplayName=Name
+Description=Name describing the playlist
+DataType=index
+Weight=10
+Filtered=false
+Embedded=false
+      </programlisting>
+    </example>
+
+    <para>
+      We have added two embedded properties (the total length and the
+      number of songs) and a non-embedded property, "Name". A media
+      player could ask the user to name the playlist ("Boring
+      music") and save that information in this property.
+    </para>
+  </section>
+
+</chapter>
+
+<!--
+Local variables:
+mode: sgml
+sgml-parent-document: ("tracker-indexer-docs.sgml" "book" "part" "chapter")
+End:
+-->
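
The Mimes and MimePrefixes keys described in the chapter above suggest a
two-step lookup: an exact match against the listed mime types, then a
prefix match. The sketch below illustrates that idea only. The function
name category_matches_mime and its array arguments are hypothetical, and
this is not Tracker's actual matching code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Tracker: returns true when `mime`
 * equals one of the exact `mimes` entries or starts with one of the
 * `prefixes` entries. Both arrays are NULL-terminated and may be NULL. */
static bool
category_matches_mime (const char *const *mimes,
                       const char *const *prefixes,
                       const char        *mime)
{
    size_t i;

    /* Exact matches, as listed in the Mimes key. */
    for (i = 0; mimes != NULL && mimes[i] != NULL; i++) {
        if (strcmp (mimes[i], mime) == 0)
            return true;
    }

    /* Prefix matches, as listed in the MimePrefixes key (e.g. "audio/"). */
    for (i = 0; prefixes != NULL && prefixes[i] != NULL; i++) {
        if (strncmp (prefixes[i], mime, strlen (prefixes[i])) == 0)
            return true;
    }

    return false;
}
```

With Mimes=audio/x-mpegurl and no MimePrefixes, as in the Playlist
example, only that one mime type would land in the category.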

Modified: trunk/docs/reference/tracker-indexer/tracker-indexer-docs.sgml
==============================================================================
--- trunk/docs/reference/tracker-indexer/tracker-indexer-docs.sgml	(original)
+++ trunk/docs/reference/tracker-indexer/tracker-indexer-docs.sgml	Mon Oct  6 12:11:14 2008
@@ -4,6 +4,8 @@
 <!ENTITY tracker-module SYSTEM "xml/tracker-module.xml">
 <!ENTITY tracker-metadata SYSTEM "xml/tracker-metadata.xml">
 <!ENTITY tracker-metadata-utils SYSTEM "xml/tracker-metadata-utils.xml">
+<!ENTITY adding-categories-and-properties SYSTEM "xml/adding-categories-and-properties.sgml">
+<!ENTITY writing-an-extractor SYSTEM "xml/writing-an-extractor.sgml">
 <!ENTITY version SYSTEM "version.xml">
 ]>
 <book id="index">
@@ -12,13 +14,16 @@
     <releaseinfo>for tracker-indexer &version;</releaseinfo>
   </bookinfo>
   
-  <part id="libtracker-common">
+  <part id="tracker-indexer">
     <title>Overview</title>
     <partintro>
       <para>
-	The tracker-indexer is responsible for indexing all content it is given. The indexer is completely modular, this means that there are modules or 'backends' which are used for indexing specific content types. This reference manual attempts to help developers write 3rd party modules to extract content not commonly supported by Tracker.
-      </para>
-      <para>
+	The tracker-indexer is responsible for indexing all content it
+	is given. The indexer is completely modular; this means that
+	there are modules or 'backends' which are used for indexing
+	specific content types. This reference manual attempts to help
+	developers write 3rd party modules to extract content not
+	commonly supported by Tracker. 
       </para>
     </partintro>
   </part>
@@ -33,4 +38,17 @@
       &tracker-metadata;
       &tracker-metadata-utils;
     </chapter>
+
+    <part id="integrating-applications">
+      <title>Integrating Applications</title>
+
+      <para>
+	This part describes how you can add your own application
+	categories and extractors to Tracker to be able to index (and
+	ultimately track) your content.
+      </para>
+
+      &adding-categories-and-properties;
+      &writing-an-extractor;
+    </part>
 </book>

Added: trunk/docs/reference/tracker-indexer/writing-an-extractor.sgml
==============================================================================
--- (empty file)
+++ trunk/docs/reference/tracker-indexer/writing-an-extractor.sgml	Mon Oct  6 12:11:14 2008
@@ -0,0 +1,167 @@
+<chapter id="writing-an-extractor">
+  <chapterinfo>
+    <author>
+      <firstname>Ivan</firstname>
+      <surname>Frade</surname>
+      <affiliation>
+	<address>
+	  <email>ivan.frade@@nokia.com</email>
+	</address>
+      </affiliation>
+    </author>
+  </chapterinfo>
+
+  <title>Writing An Extractor</title>
+
+  <para>
+  </para>
+
+  <section id="how-extractors-work">
+    <title>How Extractors Work</title>
+
+    <para>
+      The API for the extractors is defined
+      in <filename>src/tracker-extract/tracker-extract.h</filename>.
+    </para>
+    <para>
+      The module must define a <structname>TrackerExtractorData</structname>
+      struct that links mime types with extraction functions. This is
+      returned in a function
+      called <function>tracker_get_extractor_data</function>.
+    </para>
+    
+    <para>
+      The extraction function has the following signature:
+      <programlisting>
+void name_of_the_function (const gchar *filename, 
+                           GHashTable  *metadata)
+      </programlisting>
+    </para>
+
+    <para>
+      ANY extracted meta-data MUST be added to the hash table.
+    </para>
+  </section>
+
+  <section id="how-to-add-an-extractor">
+    <title>How To Add An Extractor</title>
+
+    <para>
+      The extractors are typically named
+      as <filename>tracker-extractor-[document-extension].c</filename>
+      and placed in the <filename>src/tracker-extract/</filename>
+      directory.
+    </para>
+
+    <para>
+      A <filename>dummy.c</filename> extractor is provided
+      (<filename>src/tracker-extract/dummy.c</filename>) to use as a
+      base for writing new ones. Copy and complete the implementation of
+      this file to create your extractors. Don't forget to add them in
+      the <filename>Makefile.am</filename> file!
+    </para>
+
+    <para>
+      Your extractor can link with external libraries, but please be
+      careful adding new dependencies in tracker (this will be no
+      problem when the extractors are compiled and packaged out of
+      tracker).
+    </para>
+
+    <para>
+      For a play list, a <filename>tracker-extractor-m3u.c</filename>
+      file would be written. Its extraction function would open the
+      file and count the lines containing a filename to get the
+      number of songs. For each line starting with #EXTINF (which
+      holds the duration and name of a song), it would add the
+      duration to the total length of the playlist.
+    </para>
+  </section>
+
+  <section id="installing-an-extractor">
+    <title>Installing An Extractor</title>
+
+    <para>
+      To install the extractor you need to run:
+      <programlisting>
+	(cd src/tracker-extract; sudo make install)
+      </programlisting>
+    </para>
+
+    <para>
+      You can then test the extractor in a standalone fashion using:
+      <programlisting>
+	echo -e "filename\nmimetype" | tracker-extract 
+      </programlisting>
+    </para>
+      
+    <example>
+      <title>Testing Your Extractor</title>
+      <programlisting>
+	echo -e "/tmp/a.m3u\naudio/x-mpegurl" | tracker-extract
+      </programlisting>
+    </example>
+      
+    <para>
+      Once the extractor is working, install tracker (to put the
+      category and property descriptions in the right place).
+    </para>
+
+    <para>
+      Remove <filename>~/.local/share/tracker/data/common.db</filename>.
+      This database in the user home directory is the cache of
+      categories and properties. Right now we don't support automatic
+      upgrade, so you need to remove it by hand. This will be fixed in
+      future releases.
+    </para>
+  </section>
+
+  <section id="testing-an-extractor">
+    <title>Testing An Extractor</title>
+
+    <para>
+      Restart Tracker. We recommend using the "-v 3" option to log
+      with maximum verbosity to know what is happening. The log is
+      printed on the console, but also in two files:
+      <simplelist>
+	<member><filename>~/.local/share/tracker/trackerd.log</filename></member>
+	<member><filename>~/.local/share/tracker/tracker-indexer.log</filename></member>
+      </simplelist>
+    </para>
+
+    <para>
+      Check these files. You may find messages like "Unrecognized
+      option 'X.Y'", which can give you clues about what is happening,
+      especially if something goes wrong (which should never happen if
+      you follow this tutorial).
+    </para>
+
+    <para>
+      You can check that tracker has read the .service and .metadata
+      files correctly using the <filename>tracker-services</filename>
+      utility. With the -s flag, it asks the daemon for all available
+      categories. With -p it does the same for properties. Check that
+      your new categories are in that list.
+    </para>
+      
+    <para>
+      You should be able to use <filename>tracker-files</filename>
+      with -s [MyCategory] to check which files have been included in
+      your new category.
+    </para>
+
+    <para>
+      You can try to search with <filename>tracker-search</filename>
+      for some words you know are there (e.g. the filename or words in the
+      extracted properties).  
+    </para>
+  </section>
+
+</chapter>
+
+<!--
+Local variables:
+mode: sgml
+sgml-parent-document: ("tracker-indexer-docs.sgml" "book" "part" "chapter")
+End:
+-->
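
The m3u extractor described in "How To Add An Extractor" comes down to
one pass over the playlist text. The sketch below shows only that
counting logic in plain C, working on an in-memory string. PlaylistInfo
and playlist_scan are hypothetical names, not Tracker API. A real
extractor would read the file named by its filename argument and store
the results in the metadata hash table, presumably under the
Playlist:Songs and Playlist:TotalLength properties defined earlier.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a Tracker structure. */
typedef struct {
    int  songs;         /* non-empty lines that do not start with '#' */
    long total_length;  /* sum of the #EXTINF durations, in seconds */
} PlaylistInfo;

/* Hypothetical helper: scans m3u text held in memory, line by line. */
static PlaylistInfo
playlist_scan (const char *text)
{
    PlaylistInfo info = { 0, 0 };
    const char *line = text;

    while (*line != '\0') {
        /* Length of the current line, excluding its terminator. */
        size_t len = strcspn (line, "\r\n");

        if (len >= 8 && strncmp (line, "#EXTINF:", 8) == 0) {
            /* "#EXTINF:<seconds>,<title>"; a value that does not start
             * with a digit (such as -1 for unknown) is skipped. */
            if (isdigit ((unsigned char) line[8]))
                info.total_length += strtol (line + 8, NULL, 10);
        } else if (len > 0 && line[0] != '#') {
            /* Any other non-comment line names a file or URL. */
            info.songs++;
        }

        /* Move past this line and any run of line terminators. */
        line += len;
        line += strspn (line, "\r\n");
    }

    return info;
}
```

Keeping the scan separate from file handling makes it possible to check
the counting on its own before wiring it into an extraction function.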


