On Tue, 2020-01-28 at 18:21 +0100, JWein wrote:
I worked on corpora research and text cleansing can be done
straightforwardly. The problem is with images, images containing
texts, which language, ...
Could you point me in the right direction? (I am a mathematician, so
math is not a problem for me at all.)
Thank you

You need (1) feature extraction, i.e. finding where the writing is in
the image, (2) OCR of some sort, to turn pictures of letters into
letters, and then (3) the linguistic analysis.
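
To give a feel for step (3), here is a minimal sketch of guessing the
language of text that has already come out of OCR. Steps (1) and (2)
are assumed to be done by an existing OCR engine (Tesseract is one
option) and are not shown. The function name guess_language and the
tiny stopword lists are made up for illustration; a real system would
use much larger lists or a character n-gram model.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (an assumption for this sketch);
# real language identification needs far more data than this.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "not"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "ein", "zu", "den"},
    "fr": {"le", "la", "les", "et", "des", "est", "une", "un", "pas", "dans", "que"},
}


def guess_language(text):
    """Return the code of the language whose stopwords occur most often.

    Returns None when no stopword from any list appears in the text.
    """
    # Split into runs of letters (Unicode-aware), ignoring digits and
    # punctuation that OCR output tends to be full of.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    scores = {
        lang: sum(counts[word] for word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Short strings, such as a single word on a sign, will often contain no
stopwords at all, which is why the function can return None instead of
guessing.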

However, many images contain metadata in plain text (OK, XML or
whatever) that may include language and location information.
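
As a rough sketch of the metadata route: XMP metadata is an XML packet
stored inside the image file, so it can be found by searching the raw
bytes. The code below assumes the packet is wrapped in an x:xmpmeta
element and that language information shows up either as dc:language
or as xml:lang attributes; not every file has either. The function
name xmp_language_hints is made up for illustration. Location data
usually sits in EXIF GPS tags, which this sketch does not read.

```python
import re
import xml.etree.ElementTree as ET

# Namespaced names as ElementTree reports them.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DC_LANGUAGE = "{http://purl.org/dc/elements/1.1/}language"


def xmp_language_hints(data):
    """Collect language tags from an XMP packet embedded in image bytes.

    Returns an empty set if no packet is found or it does not parse.
    """
    # Assumption: the packet is delimited by <x:xmpmeta ...> ... </x:xmpmeta>.
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", data, re.DOTALL)
    if match is None:
        return set()
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return set()

    hints = set()
    for elem in root.iter():
        # xml:lang marks the language of a title or description entry;
        # "x-default" is a placeholder, not a real language.
        lang = elem.get(XML_LANG)
        if lang and lang != "x-default":
            hints.add(lang)
        # dc:language normally holds a bag of language codes.
        if elem.tag == DC_LANGUAGE:
            hints.update(t.strip() for t in elem.itertext() if t.strip())
    return hints
```

For a file on disk you would read it in binary mode and pass the bytes
in, e.g. xmp_language_hints(open("photo.jpg", "rb").read()), where
photo.jpg is a placeholder name. Treat the result as a hint only: the
metadata describes the file, not necessarily the text in the picture.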

