Re: [orca-list] Copying Math Expressions from Web pages



Hi,


On Sunday,  9 August 2020, at 20:33, Rastislav Kiss via orca-list <orca-list gnome org> wrote:
Hello,
heh, it seems we are in a somewhat similar position, although coming
from different backgrounds. I have been using Linux as my main operating
system for about 3 months now, and I have problems reading math too.

Sure, it looks like maths accessibility is still a challenge with most
screen readers. On Windows, I used to rely on NVDA's Access8Math add-on
to read maths content. But it seems it could not copy maths expressions
as they are spoken.

I may be wrong, and I hope someone corrects me in that case, but I don't
see a way to do this without some kind of speech logger either.
But if you really want to do it this way, there is an option. I made a
project called Chinfusor some time ago:
https://rastisoftslabs.com/2020/07/22/chinfusor-a-universal-solution-for-reading-texts-in-foreign-alphabets-on-linux/

It's a speech module for Speech Dispatcher, which acts as a middle
layer between clients and speech modules in order to route parts of
text written in various alphabets to specified engines with specified
configuration.

I am now downloading it to my machine. It seems to be a promising
project, not only for maths but also for other content that I come
across on the web where some letters are not voiced well and I just hear
"Chinese letter, Chinese letter".

But never mind that now; the important thing is that it's a speech
module, so it receives all spoken text from Orca. If you want, I can
make a special build for you, which will allow recording incoming speech
requests with a keyboard shortcut and copying the recorded text to the
clipboard.
Thus, you would be able to copy an equation like this:
* Press the shortcut.
* Make Orca speak the equation you want to get.
* Press the shortcut again; the collected text will appear in your
  clipboard.
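
To make the flow concrete, the record-then-copy idea could be sketched in
Python roughly as follows. This is not Chinfusor's actual code: the class
SpeechRecorder, its methods, and the copy_to_clipboard callback are all
hypothetical names chosen for illustration, and the real clipboard call
is deliberately left to the caller.

```python
class SpeechRecorder:
    """Hypothetical recorder sitting between a screen reader and a synthesizer."""

    def __init__(self, copy_to_clipboard):
        # copy_to_clipboard is any callable taking one string (an assumption;
        # a real build would call a desktop clipboard API here).
        self._copy = copy_to_clipboard
        self._recording = False
        self._chunks = []

    def toggle(self):
        """Handle the shortcut: start recording, or stop and copy the text."""
        if not self._recording:
            self._chunks = []
            self._recording = True
            return None
        self._recording = False
        text = " ".join(self._chunks)
        self._copy(text)
        return text

    def on_speech(self, text):
        """Called for every speech request; the text is returned unchanged
        so it can still be passed on to the synthesizer."""
        if self._recording:
            self._chunks.append(text.strip())
        return text


# Usage: a plain list stands in for the clipboard.
clipboard = []
recorder = SpeechRecorder(clipboard.append)
recorder.toggle()                       # first press: start recording
recorder.on_speech("p 1 plus p 2")
recorder.on_speech("over n 1 plus n 2")
recorder.toggle()                       # second press: stop and copy
```

Passing the clipboard function in as a parameter keeps the sketch
independent of any particular desktop environment.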

I think this is what I am looking for when dealing with lots of formulas
on the web. Translators may be the best option, as you point out, but
such a simple approach would be fine for me as a start. In fact, most of
the time, even when I get a formula through speech, I look for its
equivalent in LaTeX if I want to be sure of how it has to be written
down. So for a start, this module promises to be a lifesaver compared
with typing out by hand what the speech says, as I am doing right now.
Others engaged in technical work may also find this module useful for
their productivity, I would like to believe.

Although I would personally most likely go with some kind of translator
to LaTeX or similar format, like:
https://sourceforge.net/projects/xsltml/

Let me check this one out as well. I am interested in compiling as much
info as I can on how maths can be accessed, or better still how we can
script this to do it as end users.


Can you program in JavaScript? If yes, plugins like TamperMonkey can
make copying MathML notation much more flexible. Technically, if you
connect it with the stylesheets above, you should be able to get a LaTeX
form directly. I can't confirm the quality of the stylesheets, as I
haven't used them myself; I just saw some people using them, so they'd
be my first try if I wanted functionality like this.

I have some familiarity with JavaScript as it is used in Node.js
applications. For the browser, it has been a long time since I wrote
any. But I am willing to learn and devote enough time to find out what
needs to be done. Thanks for pointing that out as well.


I personally find the current Orca MathML interpretation quite weak. Not
only does it lack structural mathematical navigation, but even the basic
reading does quite strange things, such as on this page:
https://openstax.org/books/calculus-volume-2/pages/3-6-numerical-integration

I can confirm that as well, because on one page yesterday, I knew that a
formula must have a square root in it, but it just said "operator" and
"end operator". However, I think Orca still does a better job, as its
maths support is passable out of the box, whereas Windows screen readers
may need an add-on to do just that.


Combined with long pauses between individual equation elements, this
makes reading more complex math formulas quite hard.
That's why I'm working on my own MathML parser. I originally planned to
create a presentation markup parser along with a content markup parser,
but considering that even Firefox doesn't support content markup yet,
I'm currently thinking about another plan: splitting the project into
two parts, which would allow me to release it much sooner than
originally expected.

It's not ready yet, but once I finish the tags implementation, I will
need beta testers to check it with various formulas. I want a rock-solid
parser. I had quite bad experiences with JAWS on Windows, where many
formulas simply got the interpreter stuck, driving the CPU to 80 or 90%.
I don't want anything similar on Linux, but that means the program must
be tested thoroughly first. Would you be interested in this?

Sure, I would be interested in testing out this project as it directly
relates to my work. It offers a promise for a better user experience on
*nix platforms. 

Also, if yes, I would like to ask which area of mathematics you use
most. For example, I study artificial intelligence and deep learning,
which are built almost completely on multidimensional calculus. That
could in theory be covered quite easily, as it contains equations rather
than mathematical statements.

My work is in the social sciences, particularly international politics
and relations. I mostly deal with quantitative research material, which
means that my branch of maths is statistics. Recently, though, I have
been trying to dabble in data science, as I see it as glorified
statistics with some computer science thrown in to appeal to the
business world; otherwise I see no major difference from the traditional
statistics we grew up studying. So in short, I am looking forward to

However, abstract algebra, for example, is a different area, which deals
with mathematical proofs and makes rich use of mathematical language,
which means a lot of Unicode symbols unknown to speech synthesizers. I
haven't checked yet exactly how many of them are in the Unicode table,
but I can imagine there are a lot of them. Mapping them all will surely
be a drag and will take somewhat more time than the basic presentation
markup implementation, so if you work in this area, it may require
somewhat more intensive preparation.
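
As a starting point for such a mapping, Python's standard unicodedata
module already knows the official name of every assigned character, so a
first pass could look roughly like this. The function names
describe_symbols and named_in_range are made up for this sketch, and the
official names are only raw material: a speech-friendly wording would
still have to be written by hand.

```python
import unicodedata


def describe_symbols(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 127:
            # Fall back to "UNKNOWN" for characters without an official name.
            name = unicodedata.name(ch, "UNKNOWN")
            rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


def named_in_range(first, last):
    """List the characters in [first, last] that have a Unicode name."""
    return [chr(cp) for cp in range(first, last + 1)
            if unicodedata.name(chr(cp), "")]


# Usage: inspect a short statement, then count the named characters in
# the Mathematical Operators block (U+2200 to U+22FF).
for row in describe_symbols("∀x ∈ G"):
    print(row)   # ('∀', 'U+2200', 'FOR ALL') then ('∈', 'U+2208', 'ELEMENT OF')
print(len(named_in_range(0x2200, 0x22FF)))
```

The count depends on the Unicode version bundled with the Python
interpreter, so it is printed rather than hard-coded.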

Best regards

Rastislav
On Saturday, 8 August 2020 at 13:03 +0200, Ishe Chinyoka wrote:

Hi Rastislav,

Thanks for your helpful suggestions regarding the various technologies
used for mathematical presentation. Actually, I was trying to copy what
Orca was saying. As an example, the formula for the pooled estimate is
presented as:
(p1 + p2)/(n1 + n2)
When I move my cursor to where the expression is, I do not have the
option to copy the formula. I tried pressing the context menu so as to
bring any other options for that formula, but failed. So what I ended up
doing was to literally type into my text editor what Orca was saying.
But I feel that this is an inefficient way to deal with lots of
formulas.

However, I think from what you are saying, I am getting a better idea of
what I have to do: dealing with the HTML source for the page. The only
setback I see in that is when I am participating in some MOOC courses
like at Coursera, where I have to answer a question after making some
calculations. In that situation, I may have no other way to review the
formula in time.

I will have to explore those other options like MathML and LaTeX. The
latter is what I often use for my daily work in preparing some material.
I find that LaTeX is the most accessible format out there when it comes
to maths. My issue right now is reading maths on Linux, which has been
my operating system of choice for the past two months.

Thanks once again.

Ishe




On Saturday,  8 August 2020, at 04:05, Rastislav Kiss via orca-list <orca-list gnome org> wrote:
Hello,
it depends heavily on the particular formula you're viewing.

The common HTML format for math expressions is called MathML. It allows
web pages to contain math formulas in an XML-like form, which is
supported by all major browsers and can be rendered quite easily.
There are two branches of MathML:
* Presentation markup
* Content markup

The first one describes math expressions in a rather visual form. For
example, v^2 is represented as v with superscript 2, and it's not
specified whether the 2 is an index of a vector or a power.
The second one focuses on the meaning of expressions and their
individual parts; the previous example would in this case be specified
explicitly as either a power or an index.
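
As a rough illustration of the difference, the following Python sketch
parses three hand-written MathML fragments for v^2 and reports what each
one actually states. The fragments and the describe() helper are made up
for this example (they are not taken from any particular page), and XML
namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Presentation markup: only says "v with a superscript 2".
PRESENTATION = "<math><msup><mi>v</mi><mn>2</mn></msup></math>"
# Content markup: names the operation explicitly.
CONTENT_POWER = "<math><apply><power/><ci>v</ci><cn>2</cn></apply></math>"
CONTENT_INDEX = ('<math><apply><selector/><ci type="vector">v</ci>'
                 "<cn>2</cn></apply></math>")


def describe(mathml):
    """Say what the first child of <math> tells us about the expression."""
    node = ET.fromstring(mathml)[0]
    if node.tag == "msup":
        return "presentation: base with superscript, meaning not stated"
    if node.tag == "apply":
        # In content markup the first child of <apply> is the operator.
        return f"content: operator <{node[0].tag}>"
    return "unknown"


for fragment in (PRESENTATION, CONTENT_POWER, CONTENT_INDEX):
    print(describe(fragment))
```

The presentation fragment is identical whether the author meant a power
or a vector component, which is exactly the ambiguity described above.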

In practice, both branches are mixed together, so an author can make use
of both the visual and the semantic expressivity of MathML. The exact
way this is done is... kind of complicated; you can read more about it
here if you want:
https://www.w3.org/TR/MathML2/chapter5.html

Spoiler: after the first few sections, it gets superboring. :)

While presentation markup can be translated to correct math notation in
very simple equations, it gets significantly harder with more complex
ones; that's the reason why there are not many MathML to LaTeX
converters around. Such a task needs heuristics, which may or may not
work, depending on the quality of the algorithm and the formula being
processed.
Content markup is ideal for back conversion, as it contains all the
necessary information without distracting elements.
Which markup is used in a particular formula depends heavily on the
software that produced it, and the fact that both can be used together
doesn't help much.

Thus, for copying these equations, I would most likely use mathematical
software that can deal with both markups to give you the best possible
results. I can't recommend any specific one, as I haven't needed this
myself, but there should be a few of them around on the net.

And... if you're lucky, there is a third option. MathML contains a
semantics tag, which can be used to annotate various parts of a math
expression, for example presentation markup with content markup. But it
can also hold non-MathML content, like the LaTeX form of the viewed
equation.
If this is your case, you have won. One program which commonly does this
is Pandoc; it annotates all equations with their LaTeX form, so they can
be copied very easily.
Selecting the equation should do the job; if not, saving the page as txt
could, or in the worst case, examine the HTML code and search for the
annotation tag.
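
For the worst case, searching the HTML for annotation tags can be
automated. Below is a minimal Python sketch using only the standard
library; the names TexAnnotationParser and extract_tex are made up for
this example, and the list of accepted encoding values is an assumption
(as far as I know, Pandoc writes encoding="application/x-tex", but other
tools may label their annotations differently).

```python
from html.parser import HTMLParser


class TexAnnotationParser(HTMLParser):
    """Collect the text of <annotation> elements that carry TeX source."""

    # Assumed set of encoding labels; extend it for other generators.
    TEX_ENCODINGS = {"application/x-tex", "tex", "latex"}

    def __init__(self):
        super().__init__()
        self.formulas = []
        self._buffer = None  # None while outside a TeX annotation

    def handle_starttag(self, tag, attrs):
        if tag == "annotation":
            encoding = (dict(attrs).get("encoding") or "").lower()
            if encoding in self.TEX_ENCODINGS:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "annotation" and self._buffer is not None:
            self.formulas.append("".join(self._buffer).strip())
            self._buffer = None


def extract_tex(html):
    """Return the TeX strings found in <annotation> tags of an HTML page."""
    parser = TexAnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.formulas


# Usage with a hand-written fragment in the style of annotated MathML.
page = ('<p><math><semantics><msup><mi>v</mi><mn>2</mn></msup>'
        '<annotation encoding="application/x-tex">v^2</annotation>'
        '</semantics></math></p>')
print(extract_tex(page))  # ['v^2']
```

If a page has no such annotations, the function simply returns an empty
list, and one of the other approaches is needed.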
Sadly, I haven't seen a common place where equations have this kind of
attachment, but I wasn't really looking for one, so you may find a few
where it is available.

Best regards

Rastislav
On Thursday, 6 August 2020 at 15:19 +0200, Ishe Chinyoka via orca-list wrote:
Hi,


I am finding Orca's handling of maths to my liking, as it is able to
speak all the maths expressions I come across. But I am failing to copy
those formulas from Firefox into another application. How do I
accomplish such a thing as copying a maths expression?

Currently, copying any expression yields the following string at the
place where the expression should be: "[Math Processing Error]".

TIA,



_______________________________________________
orca-list mailing list
orca-list gnome org
https://mail.gnome.org/mailman/listinfo/orca-list
Orca wiki: https://wiki.gnome.org/Projects/Orca
Orca documentation: https://help.gnome.org/users/orca/stable/
GNOME Universal Access guide: 
https://help.gnome.org/users/gnome-help/stable/a11y.html



-- 
---------
I. Chinyoka
The only thing worse than being talked about is not being talked about.
----------
Sent from Mu4E with EMACS 26.3 on Linux

