Re: [Gimp-user] approches used for language detection on images ...

From: Liam R E Quin <liam holoweb net>
To: JWein <forums gimpusers com>, gimp-user-list gnome org
Subject: Re: [Gimp-user] approches used for language detection on images ...
Date: Wed, 29 Jan 2020 15:07:10 -0500

On Wed, 2020-01-29 at 13:52 +0100, JWein wrote:

You need (1) feature extraction, finding the writing, (2) OCR of some
sort, to turn pictures of letters into letters, and then (3) the
linguistic analysis.


 Hey Liam:

Thank you, and yes, I could guess the way to go would be through the
steps you outline, but I am pretty sure some other gimp developers have
trodden those paths before and may have some tips to share.


I doubt it.

There _are_ some people who use GIMP to clean up images preparatory to
running OCR on them, or have been in the past, but there are much
better programs for that.

I asked you about text cleansing (cleaning) because it has different
meanings in different contexts; i'm *certainly* not interested in
losing the page apparatus or hyphenation information, although in my
own work i mark them so software can skip them when wanted.

If you're doing an academic study of a book “manifestation” such things
are important, but i had rather use the Text Encoding Initiative as a
model than Michael Hart’s flailing Gutenberg project.

I do the same kinds of things you do


I doubt that, at least from your description, but some of it may be a
language issue in reading the tone of your message. If you are doing
natural language processing and semantic-Web-style text mining your
needs for texts overlap with my personal projects but not so much with
GIMP, which is a bitmap image editor. For example, detecting Greek
words and phrases included in a 30,000-page OCR'd text by analyzing the
page images would interest me (and detecting italics for that matter);
if i ever have a spare few days i plan to try the (then) latest
Tesseract for that.
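
As a rough illustration of the text side of that task (not the page-image
analysis itself), here is a minimal Python sketch that picks Greek words out
of text that has already been OCR'd. The function names `is_greek_letter`
and `greek_words` are hypothetical, and the approach (checking Unicode
character names) is an assumption for illustration, not something taken from
Tesseract or from any existing workflow. It only helps if the OCR engine
emitted Greek characters in the first place.

```python
import unicodedata


def is_greek_letter(ch):
    """True if ch is an alphabetic character whose Unicode name starts with GREEK."""
    if not ch.isalpha():
        return False
    # unicodedata.name() returns the default "" for unnamed code points.
    return unicodedata.name(ch, "").startswith("GREEK")


def greek_words(text):
    """Return whitespace-separated tokens whose letters are all Greek.

    Surrounding punctuation is stripped; combining marks and digits inside
    a token are ignored when deciding whether the token is Greek.
    """
    words = []
    for token in text.split():
        core = token.strip(".,;:!?()[]\"'")
        letters = [c for c in core if c.isalpha()]
        if letters and all(is_greek_letter(c) for c in letters):
            words.append(core)
    return words


if __name__ == "__main__":
    sample = "the \u03bb\u03cc\u03b3\u03bf\u03c2 of the text"
    print(greek_words(sample))
```

Coptic letters that Unicode names with a GREEK prefix would also match, so
a real project would probably want a stricter test.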

-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
