Re: [orca-list] making Linux a hospitable place for TTS engines like Voxin



Bill Cox, on Mon. 21 Dec. 2020 08:46:29 -0800, wrote:
I don't mean to disparage the current implementation of modules like espeak.c,
which has much improved over the years.  However, it is simply not portable at
the binary level between Linux distros.

Well, yes, sure, that was never meant to be, and it's not usual for
Linux binaries to be portable across distributions.

  Just run ldd on sd_espeak:
[...]

Note that ldd also shows the subdependencies. To see the real direct
dependencies only, use

objdump -x sd_espeak | grep NEED

which on my Debian shows

  NEEDED               libespeak.so.1
  NEEDED               libsndfile.so.1
  NEEDED               libdotconf.so.0
  NEEDED               libglib-2.0.so.0
  NEEDED               libc.so.6
  NEEDED               libltdl.so.7
  NEEDED               libpthread.so.0

The rest that you see are subdependencies of those libraries, not of
sd_espeak itself.
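
To make the direct/indirect distinction concrete, here is a small sketch that pulls only the NEEDED entries out of `objdump -x` output. The function name is made up for illustration. The sample text is the output quoted above; on a real system you would feed it the output of `objdump -x` on the binary instead.

```python
import re

def direct_dependencies(objdump_output):
    """Return the library names listed as NEEDED, in the order they appear."""
    deps = []
    for line in objdump_output.splitlines():
        # Dynamic section entries look like "  NEEDED   libfoo.so.1"
        match = re.match(r"\s*NEEDED\s+(\S+)", line)
        if match:
            deps.append(match.group(1))
    return deps

# Sample taken from the objdump output quoted in this mail.
sample = """\
  NEEDED               libespeak.so.1
  NEEDED               libsndfile.so.1
  NEEDED               libdotconf.so.0
  NEEDED               libglib-2.0.so.0
  NEEDED               libc.so.6
  NEEDED               libltdl.so.7
  NEEDED               libpthread.so.0
"""

for name in direct_dependencies(sample):
    print(name)
```

Anything ldd lists beyond these seven names is pulled in by the libraries themselves.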

Binary portability appears to have been a non-goal in
speech-dispatcher.

Just like in almost all free software projects, since when you have the
source code you can just recompile it to get things working.

  Is there any chance I can contribute code to speech-dispatcher to
fix this?

I started having a look at making it easy to write a speech dispatcher
module that doesn't use dotconf and glib.

Then by extending the protocol to allow server-side audio, that'd
drop libsndfile and libltdl as well, and even libpthread when the
module doesn't need it for itself. We're then left with the actual
synth (libespeak) and libc.  I have pushed what I have so far in the
main-loop branch.  I basically reimplemented the module protocol with
an MIT licence, which made it possible to separate the protocol parsing from
the dotconf parameter management etc. The idea being that proprietary
modules can link against that implementation to make it easy for them to
create a speech-dispatcher module.
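
To give an idea of what "protocol parsing separated from everything else" can look like, here is a rough sketch of a module main loop that needs nothing beyond the standard library. It is not the code in the main-loop branch, and the command names and reply strings are simplified placeholders, not the real speech-dispatcher module protocol. The only synth-specific part is the `speak` callback.

```python
import io

def run_module(inp, out, speak):
    """Read commands line by line from `inp` and write replies to `out`.

    `speak` is a callable that receives the text to synthesize.
    Commands and replies are illustrative placeholders only.
    """
    lines = iter(inp)
    for raw in lines:
        command = raw.strip()
        if command == "SPEAK":
            text = []
            for text_line in lines:
                if text_line.strip() == ".":  # a lone dot ends the message
                    break
                text.append(text_line.rstrip("\n"))
            speak("\n".join(text))
            out.write("OK SPOKEN\n")
        elif command == "QUIT":
            out.write("OK BYE\n")
            break
        else:
            out.write("ERR UNKNOWN COMMAND\n")

# Example session, with in-memory streams standing in for stdin/stdout.
spoken = []
replies = io.StringIO()
run_module(io.StringIO("SPEAK\nhello\nworld\n.\nQUIT\n"), replies, spoken.append)
print(spoken)
print(replies.getvalue(), end="")
```

A proprietary module would then only have to provide its own `speak`, and link against the protocol part.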

Also, link in libraries statically, such as espeak.a,

Most proprietary modules will not allow this with the current code which
is GPL. That's why I started rewriting the basis of modules with an MIT
licence, which will allow such linking.

I think my preferred approach is to start with 1), and migrate eventually to
2).  That way, users who need binary portability can start to benefit in the
near term while the more complex tasks in 2) are implemented.

That can be an interim solution, yes.

Espeak would be simpler, but some of the other engines that don't
use the new module_utils_* code need a major rewrite.

I don't think they would need a complete rewrite, that can probably be
done progressively.

If folks feel binary portability is a nice goal, but not worth the price (e.g.
not being able to use glib),

modules that are shipped with speech-dispatcher don't pose portability
problems, we can let them use glib etc. For those modules that want
portability, we just need to give them an easy way to do so, that's what
I have been working on.

I think we could move the audio queue into the speech-dispatcher
daemon without making things more complex.

It's not quite that obvious: one still has to transfer the audio
from the module to the server. Not something impossible, but that's
still some additional complexity :)
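
As an illustration of that additional complexity, here is one possible way to frame audio chunks on the module side and unframe them on the server side. The header layout (sample rate, channels, bits per sample, payload length) is purely an assumption for this sketch; no such wire format is defined in speech-dispatcher at this point.

```python
import struct

# Hypothetical header: rate (u32), channels (u16), bits (u16), payload bytes (u32)
HEADER = struct.Struct("<IHHI")

def pack_chunk(rate, channels, bits, pcm):
    """Module side: prefix raw PCM bytes with a fixed-size header."""
    return HEADER.pack(rate, channels, bits, len(pcm)) + pcm

def unpack_chunks(buffer):
    """Server side: return (complete chunks, leftover bytes not yet complete)."""
    chunks = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        rate, channels, bits, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if start + length > len(buffer):
            break  # payload not fully received yet
        chunks.append((rate, channels, bits, buffer[start:start + length]))
        offset = start + length
    return chunks, buffer[offset:]

# One complete chunk followed by the first bytes of a second one.
first = pack_chunk(22050, 1, 16, b"\x00\x01" * 4)
stream = first + pack_chunk(22050, 1, 16, b"\x02\x03" * 4)[:5]
chunks, leftover = unpack_chunks(stream)
print(len(chunks), len(leftover))
```

The server has to keep the leftover bytes and retry once more data arrives, which is exactly the kind of bookkeeping that moving the audio queue brings with it.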

I am confused about what pitch range is for.

It is the range of pitch that the synth can use to express prosody. That
can be called "expressiveness", which is independent from the desired
pitch base.
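
A toy model may help here. It assumes a simple linear mapping, where a prosody value between -1 and 1 moves the frequency up or down from the base by at most the range. Real synths compute this differently; the function and the numbers are only for illustration.

```python
def pitch_contour(base_hz, range_hz, prosody):
    """Map prosody values in [-1, 1] to frequencies around base_hz."""
    return [base_hz + range_hz * p for p in prosody]

prosody = [-1.0, 0.0, 0.5, 1.0]

flat = pitch_contour(120.0, 0.0, prosody)     # no range: monotone voice
lively = pitch_contour(120.0, 40.0, prosody)  # same base, more expressive
higher = pitch_contour(200.0, 40.0, prosody)  # higher base, same expressiveness

print(flat)
print(lively)
print(higher)
```

Changing the base shifts the whole contour, while changing the range only changes how far the voice moves around that base.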

I don't think that quite works right now.  For example, the BEGIN message is
not sent to Orca until module_speak returns,

Yes, that's one of the things I plan to fix.
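
One possible shape for such a fix, sketched here under my own assumptions and not necessarily what will end up in the code: the speak call only queues the text and returns at once, and a worker reports BEGIN right when synthesis actually starts. The names `start_worker`, `synthesize` and `report` are hypothetical.

```python
import queue
import threading

def start_worker(synthesize, report):
    """Start a worker thread; return (job queue, thread).

    Putting text on the queue returns immediately. The worker reports
    "BEGIN" just before synthesizing and "END" just after.
    Putting None on the queue asks the worker to exit.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            text = jobs.get()
            if text is None:
                break
            report("BEGIN")
            synthesize(text)
            report("END")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread

events = []
jobs, thread = start_worker(lambda text: events.append("synth:" + text),
                            events.append)
jobs.put("hello")  # returns at once; BEGIN no longer waits for the caller
jobs.put(None)
thread.join()
print(events)
```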

and makes these binaries specific to not only the distro, but the
distro version.

That, however, is a very convincing argument. Making it simple for
vendors to just ship a binary to a known place, whatever the distro and
version, can simplify things a lot for them.

I am relieved to hear you say that.

Well, I don't think I ever saw that argument raised before, which is why
it never showed up as a goal of speech-dispatcher.

A TODO for me is to look into sandboxing these shady binaries from TTS vendors
:)

That could be useful indeed.

I would be interested in the task of making sd_espeak binary portable,

I don't really understand why you would focus on sd_espeak, which is shipped
with speech-dispatcher. I understand that this can be an interesting
testcase, to make sure that things work, though.

Anything I would have forgotten?

Ha!  We will only know what we forgot when we write the code!

Sure :)

But it is better to ask for opinions before starting to write the code, to
avoid mistakes when we can :)

Samuel

