[ocrfeeder/user_documentation] WORK IN PROGRESS: User docs



commit 3bf38bf7d8830e64bd0ac12fb5f96323f0a9902e
Author: Joaquim Rocha <jrocha igalia com>
Date:   Sat Dec 11 02:12:09 2010 +0100

    WORK IN PROGRESS: User docs

 help/C/addingfolder.page               |   17 +++++++
 help/C/addingimage.page                |   26 ++++++++++
 help/C/automaticrecognition.page       |   34 +++++++++++++
 help/C/documentgeneration.page         |   27 +++++++++++
 help/C/importingfromscanner.page       |   26 ++++++++++
 help/C/importingpdf.page               |   27 +++++++++++
 help/C/index.page                      |   38 +++++++++++++++
 help/C/legal.xml                       |    9 ++++
 help/C/manualeditionandcorrection.page |   81 ++++++++++++++++++++++++++++++++
 9 files changed, 285 insertions(+), 0 deletions(-)
---
diff --git a/help/C/addingfolder.page b/help/C/addingfolder.page
new file mode 100644
index 0000000..093ba34
--- /dev/null
+++ b/help/C/addingfolder.page
@@ -0,0 +1,17 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="addingfolder">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Adding all the images from a folder</desc>
+</info>
+
+<title>Adding Folder</title>
+
+<p>Sometimes it is useful to add all the images from a given
+folder. To do this in <app>OCRFeeder</app>,
+choose <guiseq><gui>File</gui><gui>Add Folder</gui></guiseq>.</p>
+
+</page>
diff --git a/help/C/addingimage.page b/help/C/addingimage.page
new file mode 100644
index 0000000..f47339d
--- /dev/null
+++ b/help/C/addingimage.page
@@ -0,0 +1,26 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="addingimage">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <desc>Adding an image to be recognized</desc>
+</info>
+
+<title>Adding An Image</title>
+
+<p>Adding an image to OCRFeeder is usually the first step when
+converting a document.</p>
+
+<p>Each image added represents a page in the final document.
+A thumbnail of the image will be shown in the pages area (left
+area of <app>OCRFeeder</app>).</p>
+
+<p>The order of the pages in the final document will be the
+same as the order of the images in the pages area. Pages
+can therefore be reordered by dragging their thumbnails
+within the pages area.</p>
+
+<p>You can add an image by clicking
+<guiseq><gui>File</gui><gui>Add Image</gui></guiseq>.</p>
+</page>
diff --git a/help/C/automaticrecognition.page b/help/C/automaticrecognition.page
new file mode 100644
index 0000000..58f4435
--- /dev/null
+++ b/help/C/automaticrecognition.page
@@ -0,0 +1,34 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="automaticrecognition">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Automatically recognizing an image</desc>
+</info>
+
+<title>Automatic Recognition</title>
+
+<p><app>OCRFeeder</app> tries to detect the contents in a
+document image and perform OCR over them, also distinguishing
+between what is graphics and what is text. To simplify this
+concept, we call it recognition.</p>
+
+<p>After an image is added it can be automatically recognized
+by clicking
+<guiseq><gui>Document</gui><gui>Recognize Document</gui></guiseq>.</p>
+
+<note style="important"><p>Since there are many different document
+layouts out there, the automatic recognition, mainly the page
+segmentation, may turn out not to be accurate for your document. In this
+case, some manual editing of the recognition results might be needed.
+</p></note>
+
+<note style="warning"><p>The automatic recognition performs some complex
+operations and may take some time depending on the size of the image
+and the complexity of the layout.</p>
+<p>The automatic recognition will replace all the content areas
+in the currently selected page.</p></note>
+
+</page>
diff --git a/help/C/documentgeneration.page b/help/C/documentgeneration.page
new file mode 100644
index 0000000..bbd3abe
--- /dev/null
+++ b/help/C/documentgeneration.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="documentgeneration">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="automaticrecognition"/>
+    <link type="seealso" xref="manualeditionandcorrection"/>
+    <desc>Creating an editable document</desc>
+</info>
+
+<title>Document Generation</title>
+
+<p><app>OCRFeeder</app> currently generates two document formats:
+<em>ODT</em> and <em>HTML</em>.</p>
+
+<p>After the recognition and any manual editing have been
+performed, it is possible to generate a document by clicking
+<guiseq><gui>File</gui><gui>Export…</gui></guiseq> and choosing
+the desired document format.</p>
+
+<note style="tip"><p>The HTML export generates a folder in which
+each document page is represented by one HTML file. Each page
+contains links to the previous and next pages. Image content
+areas are stored in a subfolder called <em>images</em>.</p></note>
+
+</page>
diff --git a/help/C/importingfromscanner.page b/help/C/importingfromscanner.page
new file mode 100644
index 0000000..6acf508
--- /dev/null
+++ b/help/C/importingfromscanner.page
@@ -0,0 +1,26 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="importingfromscanner">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Importing from a scanner device</desc>
+</info>
+
+<title>Importing From Scanner</title>
+
+<p>In order to help convert a printed document into
+an editable document, <app>OCRFeeder</app> offers a
+way to import images directly from a scanner device.</p>
+
+<p>To import an image from a scanner device, use the menu
+<guiseq><gui>File</gui><gui>Import Page From Scanner</gui></guiseq>
+or the keyboard shortcut
+<keyseq><key>Ctrl</key><key>Shift</key><key>I</key></keyseq>.</p>
+
+<p>The currently detected scanner device will be used to
+scan the page. If more than one scanner is found, then a dialog
+will be shown with the options to choose from.</p>
+
+</page>
diff --git a/help/C/importingpdf.page b/help/C/importingpdf.page
new file mode 100644
index 0000000..3067340
--- /dev/null
+++ b/help/C/importingpdf.page
@@ -0,0 +1,27 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="importingpdf">
+
+<info>
+    <link type="guide" xref="index#images"/>
+    <link type="seealso" xref="addingimage"/>
+    <desc>Importing PDF documents</desc>
+</info>
+
+<title>Importing PDF</title>
+
+<p>Some documents are nothing more than images placed in a
+PDF document. For cases like this, <app>OCRFeeder</app> can
+still import a PDF document so it can then be converted into
+an editable document.</p>
+
+<p>To import a PDF document, click
+<guiseq><gui>File</gui><gui>Import PDF</gui></guiseq>.</p>
+
+<p>Each PDF page will be converted to an image and placed
+in the pages' area.</p>
+
+<note style="warning"><p>The PDF conversion can be a demanding
+process and take some time for large PDF files.</p></note>
+
+</page>
diff --git a/help/C/index.page b/help/C/index.page
new file mode 100644
index 0000000..8f02a7f
--- /dev/null
+++ b/help/C/index.page
@@ -0,0 +1,38 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide"
+      id="index">
+
+<info>
+    <desc>Help for the <app>OCRFeeder Document Conversion System</app>.</desc>
+    <title type='link'>OCRFeeder Document Conversion System</title>
+    <title type='text'>OCRFeeder Document Conversion System</title>
+    <credit type="author">
+      <name>Joaquim Rocha</name>
+      <email>jrocha igalia com</email>
+    </credit>
+
+    <include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude" />
+</info>
+
+<title>OCRFeeder Document Conversion System</title>
+<p>OCRFeeder is a document layout analysis and optical character recognition system.</p>
+
+<p>OCRFeeder was created to allow users to easily convert document images
+(for example, a PNG image with text) into editable documents (for example,
+an ODT version with that text).</p>
+
+<p>Given a set of images, it automatically outlines their contents, performs OCR and
+distinguishes between graphics and text. It can generate several document formats,
+the main one being ODT.</p>
+
+<p>This guide explains how to configure and use OCRFeeder.</p>
+
+<section id="images" style="2column">
+    <title>Adding Images</title>
+</section>
+
+<section id="recognition" style="2column">
+    <title>Recognition</title>
+</section>
+
+</page>
diff --git a/help/C/legal.xml b/help/C/legal.xml
new file mode 100644
index 0000000..0e59883
--- /dev/null
+++ b/help/C/legal.xml
@@ -0,0 +1,9 @@
+<license xmlns="http://projectmallard.org/1.0/"
+ href="http://creativecommons.org/licenses/by-sa/3.0/us/">
+<p>This work is licensed under a
+<link href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons
+Attribution-Share Alike 3.0 United States License</link>.</p>
+<p>As a special exception, the copyright holders give you permission to copy,
+modify, and distribute the example code contained in this document under the
+terms of your choosing, without restriction.</p>
+</license>
diff --git a/help/C/manualeditionandcorrection.page b/help/C/manualeditionandcorrection.page
new file mode 100644
index 0000000..3e2e5d3
--- /dev/null
+++ b/help/C/manualeditionandcorrection.page
@@ -0,0 +1,81 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="topic"
+      id="manualeditionandcorrection">
+
+<info>
+    <link type="guide" xref="index#recognition"/>
+    <link type="seealso" xref="addingimage"/>
+    <link type="seealso" xref="automaticrecognition"/>
+    <desc>Manual editing and correction of results</desc>
+</info>
+
+<title>Manual Editing</title>
+
+<p>One may want to manually select just a portion of an image to
+be recognized or correct the results of the automatic recognition.
+<app>OCRFeeder</app> lets its users manually edit every aspect of
+a document's contents in an easy way.</p>
+
+<section>
+
+<title>Content Areas</title>
+
+<p>A document's contents are represented by areas, as
+shown in the following image:
+<media type="image" mime="image/png" src="figures/content-areas.png">
+A picture of two content areas with one of them selected.
+</media>
+</p>
+
+<p>The attributes of a selected area are shown and can be changed in
+the right part of the main window, as shown in the following image:
+<media type="image" mime="image/png" src="figures/areas-edition.png" width="100px">A
+picture showing the area editing UI.</media>
+</p>
+
+<p>The following list describes the content areas' attributes:</p>
+<list>
+    <item><p><em>Type</em>: Sets the area type to either image or text.
+             The image type will clip the area from the original page and
+             place it in the generated document. The text type will use the
+             text assigned to the area and represent it as text in the generated
+             document. (Generated ODT documents will have a text box for each
+             area marked as being of the text type.)</p></item>
+    <item><p><em>Clip</em>: Shows the current clip from the original area. This makes
+             it easier for users to check exactly what's within the area.</p></item>
+    <item><p><em>Bounds</em>: Shows the point (X and Y) in the original image where the
+             top left corner of the area is placed as well as the area's width
+             and height.</p></item>
+    <item><p><em>OCR Engine</em>: Lets the user choose an OCR engine and recognize the
+             area's text with it (by pressing the <gui>OCR</gui> button).</p>
+             <note style="warning"><p>Using the OCR engine to recognize the text
+             will directly assign that text to the area and replace the one
+             assigned before.</p></note></item>
+    <item><p><em>Text Area</em>: Represents the text assigned to that area and lets the
+             user edit it. This area is disabled when the area is of the type
+             image.</p></item>
+    <item><p><em>Style Tab</em>: Lets the user choose the font type and size, as well as
+             the text alignment, line and letter spacing.</p></item>
+</list>
+
+<p>The content areas can be selected by clicking on them or by using the menus
+<guiseq><gui>Document</gui><gui>Select Previous Area</gui></guiseq> and
+<guiseq><gui>Document</gui><gui>Select Next Area</gui></guiseq>. There are
+also keyboard shortcuts for these actions:
+<keyseq><key>Ctrl</key><key>Shift</key><key>P</key></keyseq> and
+<keyseq><key>Ctrl</key><key>Shift</key><key>N</key></keyseq>, respectively.</p>
+
+<p>Selecting all areas is also possible using
+<guiseq><gui>Document</gui><gui>Select All Areas</gui></guiseq> or
+<keyseq><key>Ctrl</key><key>Shift</key><key>A</key></keyseq>.</p>
+
+<p>When at least one content area is selected, it is possible to recognize
+their contents automatically or delete them. These actions can be accomplished
+by clicking <guiseq><gui>Document</gui><gui>Recognize Selected Areas</gui></guiseq>
+and <guiseq><gui>Document</gui><gui>Delete Selected Areas</gui></guiseq> (or
+<keyseq><key>Ctrl</key><key>Shift</key><key>Delete</key></keyseq>), respectively.
+</p>
+
+</section>
+
+</page>


