I don't know whether this kind of spelling matters for
accessibility, or whether it is worth any effort right now. I'm just
trying to show that this is, in principle, not a text-to-text
problem. It can only be reduced to text-to-text at the cost
of some loss in possible quality.

If alpha, bravo, charlie spelling is done in the synthesizer, then we
need a way to tell the synthesizer to speak the text in this mode.  We
don't have this yet.

Once this is defined, then I can implement "military spelling" in
eSpeak, and we can update eSpeak's dictionary files for each language
to add the letter names (or it will default to English if they are not

But this will only work with eSpeak, and not other synthesizers, unless
they are also updated to use this new feature.  If not, then a higher
level such as Speech Dispatcher or Orca will still need to provide this

I still think this is a language-dependent, but not
synthesizer-dependent, function.  This probably applies to some other
text normalizations too. It's certainly a higher level than basic
text-to-speech generation, and this higher level can be common across
different synthesizers.  Information need not be lost, because the
higher-level processor can use the SSML <break> tag to indicate pauses.
I am happy to implement military spelling in eSpeak, but there is
little point unless it will be used, and with an agreed SSML tag.
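
As a rough sketch of the higher-level approach discussed above, the
following shows how a layer above the synthesizer might expand text into
letter names and join them with SSML <break> tags. Everything here is an
assumption for illustration: the function name spell_as_ssml, the
LETTER_NAMES table (only a few English entries are filled in), the
fallback to English for languages without a table, and the 200 ms pause
are all hypothetical and not part of eSpeak, Speech Dispatcher, or Orca.

```python
from xml.sax.saxutils import escape

# Hypothetical per-language letter-name tables. Only a few English
# entries are shown; a real table would cover the whole alphabet.
LETTER_NAMES = {
    "en": {"a": "alpha", "b": "bravo", "c": "charlie"},
}


def spell_as_ssml(text, lang="en", pause_ms=200):
    """Expand text into letter names separated by SSML <break> tags.

    Languages without a table fall back to the English names (an
    assumption of this sketch). Characters with no entry are passed
    through unchanged, and whitespace is skipped.
    """
    names = LETTER_NAMES.get(lang, LETTER_NAMES["en"])
    parts = []
    for ch in text:
        if ch.isspace():
            continue
        # Escape so that characters such as '<' or '&' stay valid in SSML.
        parts.append(escape(names.get(ch.lower(), ch)))
    pause = '<break time="%dms"/>' % pause_ms
    return "<speak>" + pause.join(parts) + "</speak>"


if __name__ == "__main__":
    print(spell_as_ssml("cab"))
```

Because the output is ordinary SSML, any synthesizer that honours
<break> could speak it without knowing about the spelling mode, which is
the point of doing this above the synthesizer level.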

