Re: [Gimp-user] approaches used for language detection on images ...



On Tue, 2020-01-28 at 18:21 +0100, JWein wrote:
I worked on corpora research and text cleansing can be done relatively
straightforwardly. The problem is with images, images containing texts,
which language, ...
Could you point me in the right direction? (I am a mathematician, so
Math is not a problem for me at all)
Thank you

You need (1) feature extraction, finding the writing, (2) OCR of some
sort, to turn pictures of letters into letters, and then (3) the
linguistic analysis.
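
As a rough illustration of step (3) only, here is a minimal sketch that
guesses a language by counting common function words. It assumes steps (1)
and (2) have already produced a plain string; the word lists and the name
guess_language are made up for this example, and a real system would more
likely use much larger lists or character n-gram profiles.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch);
# a real system would use far larger lists or trained n-gram profiles.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "des", "un", "une"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # Keep runs of letters only; digits, punctuation and underscores are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword hit at all means there is no evidence for any language.
    return best if scores[best] > 0 else None
```

OCR output is usually noisy, so short or badly recognised strings may
well come back as None or as the wrong language with a counter this
simple.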

However, many images contain metadata in plain text (OK, XML or
whatever) that may include language and location information.
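
For the metadata route, one possible sketch is to look for an embedded
XMP packet (the XML flavour of metadata) in the raw file bytes and
collect any language tags it carries. This assumes the packet is stored
uncompressed and in UTF-8, which is common but not guaranteed; the
function name xmp_languages is hypothetical, and location data (often
kept in binary EXIF fields) is not covered here.

```python
import re


def xmp_languages(data):
    """Return sorted language tags found in an embedded XMP packet, if any."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return []
    end = data.find(b"</x:xmpmeta>", start)
    if end == -1:
        return []
    packet = data[start:end].decode("utf-8", errors="replace")

    # xml:lang attributes mark the language of individual text values.
    tags = set(re.findall(r"""xml:lang=["']([^"']+)["']""", packet))

    # dc:language lists the languages of the resource itself.
    block = re.search(r"<dc:language>(.*?)</dc:language>", packet, re.S)
    if block:
        tags.update(re.findall(r"<rdf:li>([^<]+)</rdf:li>", block.group(1)))

    # "x-default" is a placeholder, not an actual language.
    tags.discard("x-default")
    return sorted(tags)
```

It could be called as xmp_languages(open("photo.jpg", "rb").read()),
where photo.jpg is only a placeholder file name. An empty list means no
packet or no tags were found, not that the image has no text.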

I'm interested in the text cleansing; can you tell me more (off list,
maybe)?

Thank you!

slave liam

-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
with fabulous vintage art and fascinating texts to read.
Click here to have the slave rewarded with extra work.


