[dasher] AlphabetMap::Add public, requires no duplicate keys, Mandarin rehashes CHAlph

From: Patrick Welche <pwelche src gnome org>
To: commits-list gnome org
Cc:
Subject: [dasher] AlphabetMap::Add public, requires no duplicate keys, Mandarin rehashes CHAlph
Date: Tue, 15 Mar 2011 17:12:42 +0000 (UTC)
commit 92ba944d600b51da890d77b7fc73efc460a796ed
Author: Alan Lawrence <acl33 inf phy cam ac uk>
Date:   Mon Feb 21 12:27:42 2011 +0000

    AlphabetMap::Add public, requires no duplicate keys, Mandarin rehashes CHAlph
    
    no reason to prevent other clients manipulating AlphabetMaps w/ non-const ptr!
    
    Duplicate-key check previously only performed for single-octet chars, now all.
    Fixed tajik, czech, latin, polish + portuguese alphs accordingly.
    TODO, this breaks "Pure katakana" totally. What is rationale behind this alph?
    
    Remove empty groups in CAlphIO (fixes Korean).
    
    This would also break usage of CHAlphabet for Mandarin, as has many duplicates;
    hence MandarinAlphMgr makes own AlphabetMap & list of chars w/ distinct unicode.
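
The rehashing described above can be sketched in isolation. This is only an illustration of the idea, not the Dasher code: SketchMap and Rehashed are hypothetical names, and std::unordered_map stands in for CAlphabetMap's own hash table. It assumes, as in the patch, that Get returns 0 (the "unknown symbol") for a key that was never added, and that Add must never be given a key that is already present.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for CAlphabetMap (not the real class).
class SketchMap {
public:
  // Precondition: key is not yet in the map (mirrors the new DASHER_ASSERT).
  void Add(const std::string &key, int value) {
    assert(m_map.find(key) == m_map.end());
    m_map[key] = value;
  }
  // Returns 0, the "unknown symbol", for keys that were never added.
  int Get(const std::string &key) const {
    auto it = m_map.find(key);
    return it == m_map.end() ? 0 : it->second;
  }
private:
  std::unordered_map<std::string, int> m_map;
};

// Hypothetical helper: hands out exactly one index per distinct text,
// so indices and texts end up in a 1-1 mapping.
struct Rehashed {
  std::vector<std::string> text;  // index -> text
  SketchMap map;                  // text -> index

  Rehashed() { text.push_back(""); }  // element 0 is the unknown symbol

  int Intern(const std::string &t) {
    int target = map.Get(t);
    if (!target) {
      // Text not seen before: allocate the next free index.
      target = static_cast<int>(text.size());
      text.push_back(t);
      map.Add(t, target);
    }
    return target;
  }
};
```

Calling Intern twice with the same text yields the same index, so a source alphabet that lists one character in several groups never triggers the duplicate-key assertion in Add.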

 Data/alphabets/alphabet.Tajik.xml       |    2 +-
 Data/alphabets/alphabet.czech.xml       |    4 +-
 Data/alphabets/alphabet.latin.xml       |    2 +-
 Data/alphabets/alphabet.polish.xml      |    2 +
 Data/alphabets/alphabet.portuguese.xml  |    2 +-
 Src/DasherCore/Alphabet/AlphIO.cpp      |   15 +++++-
 Src/DasherCore/Alphabet/AlphabetMap.cpp |    8 +--
 Src/DasherCore/Alphabet/AlphabetMap.h   |   46 ++++++++---------
 Src/DasherCore/MandarinAlphMgr.cpp      |   81 +++++++++++++++++--------------
 Src/DasherCore/MandarinAlphMgr.h        |   12 ++++-
 10 files changed, 99 insertions(+), 75 deletions(-)
---
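The empty-group removal in CAlphIO::XML_EndElement relies on each new group being pushed onto the front of its parent's child chain, so a group that has just closed is still the head of that chain and can be unlinked in one step. A minimal standalone sketch follows. Group, OpenGroup and CloseGroup are hypothetical names loosely modelled on SGroupInfo and the XML handlers; the prepend-on-open behaviour is an assumption inferred from the "added in reverse order" comment in the diff; cleanup of the surviving groups is omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical group record, loosely modelled on SGroupInfo.
struct Group {
  int iStart = 0, iEnd = 0;   // symbol range [iStart, iEnd)
  Group *pNext = nullptr;     // next sibling
  Group *pChild = nullptr;    // first child
};

// Reverse a sibling chain and return its new head.
Group *Reverse(Group *head) {
  Group *prev = nullptr;
  while (head) {
    Group *next = head->pNext;
    head->pNext = prev;
    prev = head;
    head = next;
  }
  return prev;
}

// Assumed behaviour on an opening tag: prepend the new group to the
// chain of the innermost open group (or to the base chain).
void OpenGroup(std::vector<Group *> &open, Group *&base, int start) {
  Group *&head = open.empty() ? base : open.back()->pChild;
  Group *g = new Group;
  g->iStart = start;
  g->pNext = head;
  head = g;
  open.push_back(g);
}

// On a closing tag: drop the group if it holds no symbols,
// otherwise put its children into iStart/iEnd order.
void CloseGroup(std::vector<Group *> &open, Group *&base, int end) {
  Group *finished = open.back();
  open.pop_back();
  finished->iEnd = end;
  if (finished->iEnd == finished->iStart) {
    // Empty group: it is the head of its parent's chain, so unlink it there.
    Group *&head = open.empty() ? base : open.back()->pChild;
    assert(head == finished);
    head = finished->pNext;
    delete finished;
  } else {
    finished->pChild = Reverse(finished->pChild);
  }
}
```

Popping the finished group before looking up the parent is what makes the head lookup uniform for nested and top-level groups, which matches the reordering of pop_back in the patch.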
diff --git a/Data/alphabets/alphabet.Tajik.xml b/Data/alphabets/alphabet.Tajik.xml
index ab0ccf8..f24dd67 100644
--- a/Data/alphabets/alphabet.Tajik.xml
+++ b/Data/alphabets/alphabet.Tajik.xml
@@ -57,7 +57,7 @@
 <s d="&#x049F;" t="&#x049F;" />
 <s d="&#x049D;" t="&#x049D;" />
 <s d="&#x043B;" t="&#x043B;" />
-<s d="&#x1D2B;" t="&#x1D2B;" />
+<!--ACL duplicated in uppercase also, unicode title is "small capital letter el"<s d="&#x1D2B;" t="&#x1D2B;" />-->
 <s d="&#x04C6;" t="&#x04C6;" />
 <s d="&#x0459;" t="&#x0459;" />
 <s d="&#x0509;" t="&#x0509;" />
diff --git a/Data/alphabets/alphabet.czech.xml b/Data/alphabets/alphabet.czech.xml
index a4c5817..7175a53 100644
--- a/Data/alphabets/alphabet.czech.xml
+++ b/Data/alphabets/alphabet.czech.xml
@@ -139,7 +139,7 @@
 <s  b="106" d="&#x00AB;" t="&#x00AB;" note="French left double quotation mark" />
 <s  b="107" d="&#x201E;" t="&#x201E;" note="German left double quotation mark; Albanian right" />
 <s  b="108" d="&#x201C;" t="&#x201C;" note="left double quotation mark" />
-<s b="109" d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark" />
+<!--<s b="109" d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark, replaced by following" />-->
 <s b="107" d="&quot;" t="&quot;"/>
 <s b="108" d="&#x201D;" t="&#x201D;" note="right double quotation mark" />
 <s b="109" d="&#x201A;" t="&#x201A;" note="German left single quotation mark; Albanian right" />
@@ -279,7 +279,7 @@
 <s b="106" d="&#x00AB;" t="&#x00AB;" note="French left double quotation mark" />
 <s b="107" d="&#x201E;" t="&#x201E;" note="German left double quotation mark; Albanian right" />
 <s b="108" d="&#x201C;" t="&#x201C;" note="left double quotation mark" />
-<s b="109" d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark" />
+<!--<s b="109" d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark, replaced by following" />-->
 <s b="107" d="&quot;" t="&quot;"/>
 <s b="108" d="&#x201D;" t="&#x201D;" note="right double quotation mark" />
 <s b="109" d="&#x201A;" t="&#x201A;" note="German left single quotation mark; Albanian right" />
diff --git a/Data/alphabets/alphabet.latin.xml b/Data/alphabets/alphabet.latin.xml
index 3af2c66..757d493 100644
--- a/Data/alphabets/alphabet.latin.xml
+++ b/Data/alphabets/alphabet.latin.xml
@@ -228,7 +228,7 @@ Any word that can be written using one of the Latin ISO-8859 character sets (ISO
 <s d="&#x00D8;" t="&#x00D8;" note="CAPITAL LETTER O WITH STROKE"/>
 <s d="&#x0152;" t="&#x0152;" note="CAPITAL LIGATURE OE"/>
 <s d="&#x0166;" t="&#x0166;" note="CAPITAL LETTER T WITH STROKE"/>
-<s d="&#x00DE;" t="&#x00DE;" note="CAPITAL LETTER THORN"/>
+<!--ACL: same as small letter <s d="&#x00DE;" t="&#x00DE;" note="CAPITAL LETTER THORN"/>-->
 </group>
 <!-- http://www.fileformat.info/info/unicode/char/ -->
 <group name="Numbers" b="113">
diff --git a/Data/alphabets/alphabet.polish.xml b/Data/alphabets/alphabet.polish.xml
index d340f9d..bf19f05 100644
--- a/Data/alphabets/alphabet.polish.xml
+++ b/Data/alphabets/alphabet.polish.xml
@@ -146,11 +146,13 @@
 
       <s t="Z" d="Z" />
 
+<!-- ACL removing duplicates. Why are these here?
       <s t="Ź" d="Ź" />
 
       <s t="Ż" d="Ż" />
 
       <s t="Z" d="Z" />
+-->
     </group>
 
     <group name="Liczby"   b="113">
diff --git a/Data/alphabets/alphabet.portuguese.xml b/Data/alphabets/alphabet.portuguese.xml
index 39a34a3..02d1370 100644
--- a/Data/alphabets/alphabet.portuguese.xml
+++ b/Data/alphabets/alphabet.portuguese.xml
@@ -130,7 +130,7 @@
 <s d="(" t="("/>
 <s d=")" t=")"/>
 <s d="&#x201C;" t="&#x201C;" note="left double quotation mark" />
-<s d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark" />
+<!--<s d="&#x0022;" t="&#x0022;" note="deprecated vertical double quotation mark, replaced by following" />-->
 <s d="&quot;" t="&quot;"/>
 <s d="&#x201D;" t="&#x201D;" note="right double quotation mark" />
 <s d="&#x2018;" t="&#x2018;" note="left single quotation mark" />
diff --git a/Src/DasherCore/Alphabet/AlphIO.cpp b/Src/DasherCore/Alphabet/AlphIO.cpp
index ac36d7f..3f2f731 100644
--- a/Src/DasherCore/Alphabet/AlphIO.cpp
+++ b/Src/DasherCore/Alphabet/AlphIO.cpp
@@ -670,10 +670,19 @@ void CAlphIO::XML_EndElement(void *userData, const XML_Char *name) {
   }
 
   if(!strcmp(name, "group")) {
-    Me->m_vGroups.back()->iEnd = Me->InputInfo->m_vCharacters.size()+1;
-    //child groups were added (to linked list) in reverse order. Put them in (iStart/iEnd) order...
-    Reverse(Me->m_vGroups.back()->pChild);
+    SGroupInfo *finished = Me->m_vGroups.back();
     Me->m_vGroups.pop_back();
+    finished->iEnd = Me->InputInfo->m_vCharacters.size()+1;
+    if (finished->iEnd == finished->iStart) {
+      //empty group. Delete it now, and elide from sibling chain
+      SGroupInfo *&ptr(Me->m_vGroups.size()==0 ? Me->InputInfo->m_pBaseGroup : Me->m_vGroups.back()->pChild);
+      DASHER_ASSERT(ptr == finished);
+      ptr = finished->pNext;
+      delete finished;
+    } else {
+      //child groups were added (to linked list) in reverse order. Put them in (iStart/iEnd) order...
+      Reverse(finished->pChild);
+    }
     return;
   }
 }
diff --git a/Src/DasherCore/Alphabet/AlphabetMap.cpp b/Src/DasherCore/Alphabet/AlphabetMap.cpp
index f27705e..2017cf6 100644
--- a/Src/DasherCore/Alphabet/AlphabetMap.cpp
+++ b/Src/DasherCore/Alphabet/AlphabetMap.cpp
@@ -181,12 +181,10 @@ void CAlphabetMap::Add(const std::string &Key, symbol Value) {
   }
   Entry *&HashEntry = HashTable[Hash(Key)];
 
-  // Loop through Entries with the correct Hash value.
+  //Loop through Entries with the correct Hash value,
+  // to check the key is not already present
   for(Entry * i = HashEntry; i; i = i->Next) {
-    if(i->Key == Key) {
-      // Add symbol and leave
-	  i->Symbol = Value;
-    }
+    DASHER_ASSERT(i->Key != Key);
   }
 
   // When hash table gets 1/2 full...
diff --git a/Src/DasherCore/Alphabet/AlphabetMap.h b/Src/DasherCore/Alphabet/AlphabetMap.h
index 6bab5de..a35470b 100644
--- a/Src/DasherCore/Alphabet/AlphabetMap.h
+++ b/Src/DasherCore/Alphabet/AlphabetMap.h
@@ -21,25 +21,29 @@
 
 namespace Dasher {
   class CAlphabetMap;
-  class CAlphInfo;
 } 
 
 /// \ingroup Alphabet
 /// \{
 
-/// Class used for fast conversion from training text (i.e. same format as
-/// text output from Dasher...I think, Mandarin / Super-PinYin is probably
-/// an exception and Japanese probably would be too if it worked!) into
-/// Dasher's internal "symbol" indices. One of these is created for the
-/// alphabet (CAlphInfo) currently in use (CAlphInfo is a friend of this
-/// class, to allow creation/setup of the map).
+/// Class used for fast conversion from training text (i.e. catenated
+/// non-display text of symbols; Mandarin / Super-PinYin is a bit more
+/// complicated but still uses one!) into Dasher's internal "symbol" indices.
+/// One of these is created for the alphabet (CAlphInfo) currently in use,
+/// tho there are no restrictions on creation of CAlphabetMaps in other places
+/// (Mandarin!) - or modification, if you have a non-const pointer!
 ///
 /// Ian clearly had reservations about this system, as follows; and I'd add
 /// that much of the fun comes from supporting single unicode characters
-/// which are multiple octets,as we use  std::string (which works in octets)
+/// which are multiple octets, as we use  std::string (which works in octets)
 /// for everything...note that we do *not* support multi-unicode-character
 /// symbols (such as the "asdf" suggested below) except in the case of "\r\n"
 /// for the paragraph symbol.
+///
+/// Note that in 2010 we did indeed tailor this to the alphabet more closely,
+/// fast-casing single-octet characters to avoid using a hash etc. - this makes
+/// many common alphabets substantially faster!
+///
 /// Anyway, Ian writes:
 ///
 /// If I were just using GCC, which comes with the CGI "STL" implementation, I would
@@ -62,21 +66,12 @@ namespace Dasher {
 /// Sorry if this seems really unprofressional.
 /// 
 /// Replacing it might be a good idea. On the other hand it could be customised
-/// to the needs of the alphabet, so that it works faster. For example,
-/// currently if I have a string "asdf", it might be that "a" is checked
-/// then "as" is checked then "asd" is checked. I shouldn't need to keep
-/// rehashing the leading characters. I plan to fix that here. Doing so with
-/// a standard hash_map would be hard.
-/// 
-/// Usage:
-/// CAlphabetMap MyMap(NumberOfEntriesWeExpect); // Can omit NumberOfEntriesWeExpect
-/// MyMap.add("asdf", 15);
-/// symbol i = MyMap.get("asdf") // i=15
-/// symbol j = MyMap.get("fdsa") // j=0
-/// 
+/// to the needs of the alphabet, so that it works faster.
+///
 /// You can't remove items once they are added as Dasher has no need for that.
 /// 
 /// IAM 08/2002
+
 class Dasher::CAlphabetMap {
 
 public:
@@ -104,13 +99,16 @@ public:
   
   void GetSymbols(std::vector<symbol> &Symbols, const std::string &Input) const;
   //SymbolStream *GetSymbols(std::istream &in) const;
-  
-private:
-  friend class CAlphInfo;
+
   CAlphabetMap(unsigned int InitialTableSize = 255);
   void AddParagraphSymbol(symbol Value);
+  
+  ///Add a symbol to the map
+  /// \param Key text of the symbol; must not be present already
+  /// \param Value symbol number to which that text should be mapped
   void Add(const std::string & Key, symbol Value);
-
+  
+private:
   class Entry {
   public:
     Entry(std::string Key, symbol Symbol, Entry * Next)
diff --git a/Src/DasherCore/MandarinAlphMgr.cpp b/Src/DasherCore/MandarinAlphMgr.cpp
index 26a3473..7a1822e 100644
--- a/Src/DasherCore/MandarinAlphMgr.cpp
+++ b/Src/DasherCore/MandarinAlphMgr.cpp
@@ -48,16 +48,21 @@ static char THIS_FILE[] = __FILE__;
 
 CMandarinAlphMgr::CMandarinAlphMgr(CDasherInterfaceBase *pInterface, CNodeCreationManager *pNCManager, const CAlphInfo *pAlphabet, const CAlphabetMap *pAlphMap)
   : CAlphabetManager(pInterface, pNCManager, pAlphabet, pAlphMap),
-    m_pCHAlphabet(pInterface->GetInfo("Chinese 简体中文 (simplified chinese, in pin yin groups)")),
-    m_pCHAlphabetMap(m_pCHAlphabet->MakeMap()),
     m_pConversionsBySymbol(new set<symbol>[GetAlphabet()->GetNumberTextSymbols()+1]) {
-  //the CHAlphabet contains a group for each SPY syllable+tone, with symbols being chinese characters.
-  // Build a map from SPY to set of chinese chars (note, the same chinese unicode can occur in multiple places;
-  // hence, we represent as unicode not CHAlph symbol number)...
-  map<string,set<string> > conversions;
+      
+  //the CHAlphabet contains a group for each SPY syllable+tone, with symbols being chinese characters.      
+  const CAlphInfo *pCHAlphabet = pInterface->GetInfo("Chinese 简体中文 (simplified chinese, in pin yin groups)");
+  //Build a map from SPY group label, to set of chinese chars (represented as start & end of group in pCHAlphabet)
+  map<string,pair<symbol,symbol> > conversions;
+  //Dasher's alphabet format means that space and paragraph can't be put into groups,
+  // so put them into their own group, manually, keyed by _symbol_ display text:
+  if (symbol sp = pCHAlphabet->GetSpaceSymbol())
+    conversions[pCHAlphabet->GetDisplayText(sp)]=pair<symbol,symbol>(sp,sp+1);
+  if (symbol para = pCHAlphabet->GetParagraphSymbol())
+    conversions[pCHAlphabet->GetDisplayText(para)]=pair<symbol,symbol>(para,para+1);
   //Non-recursive traversal of all the groups in the CHAlphabet (we don't care where they are, just to find them)
   vector<const SGroupInfo *> groups;
-  groups.push_back(m_pCHAlphabet->m_pBaseGroup);
+  groups.push_back(pCHAlphabet->m_pBaseGroup);
   while (!groups.empty()) {
     const SGroupInfo *pGroup(groups.back()); groups.pop_back();
     if (pGroup->pNext) groups.push_back(pGroup->pNext);
@@ -65,37 +70,41 @@ CMandarinAlphMgr::CMandarinAlphMgr(CDasherInterfaceBase *pInterface, CNodeCreati
     //process this group. The SPY syll+tone is stored as the label, using a tone mark over the vowel, e.g. &#257; = a1
     // such equivalences are recorded in the xml 'name' attribute of the group, but we don't need that.
     if (pGroup->strLabel.length()) {
-      set<string> &chars(conversions[pGroup->strLabel]);
-      DASHER_ASSERT(chars.empty()); //no previous group with same label
-      for (int ch=pGroup->iStart; ch<pGroup->iEnd; ch++)
-        chars.insert(m_pCHAlphabet->GetText(ch));
+      DASHER_ASSERT(conversions.find(pGroup->strLabel)==conversions.end()); //no previous group with same label
+      conversions[pGroup->strLabel] = pair<symbol,symbol>(pGroup->iStart, pGroup->iEnd);
     }
   }
-  //Dasher's alphabet format means that space and paragraph can't be put into groups,
-  // so the above will skip them. Hence, add them using the _symbol_ display text:
-  if (symbol sp = m_pCHAlphabet->GetSpaceSymbol())
-    conversions[m_pCHAlphabet->GetDisplayText(sp)].insert(m_pCHAlphabet->GetText(sp));
-  if (symbol para = m_pCHAlphabet->GetParagraphSymbol())
-    conversions[m_pCHAlphabet->GetDisplayText(para)].insert(m_pCHAlphabet->GetText(para));
 
   //Now: symbols in the primary (SPY) alphabet are syllable+tone, with the string SPY description
   // (using unicode tone marks, e.g. &#257;) in the display text, matching up with the CHAlphabet groups. 
   // (The SPY symbols are arranged in hierarchical groups according to the numbered-tone version, e.g. "a1";
   // but we don't do anything special with those groups, they are just displayed on screen as any normal alphabet).
   //Punctuation is the same way, i.e. PYAlph symbol w/ displaytext "," maps to the CHAlphabel group w/ label ","
+
+  //When we find a group in pCHAlphabet is needed, we add its symbols to m_CH{text,displayText,AlphabetMap}
+  // _only_ if the same unicode character is not already present; thus m_CHtext etc. will be a 1-1 mapping
+  // between indices and actual chinese unicode characters.
+  m_CHtext.push_back(""); m_CHdisplayText.push_back(""); m_CHcolours.push_back(0); //as usual, element 0 is the "unknown symbol"
   std::vector<symbol> vSyms;
   for (symbol i=1; i<=GetAlphabet()->GetNumberTextSymbols(); i++) {
-    set<string> &convs(conversions[m_pAlphabet->GetDisplayText(i)]);
-    DASHER_ASSERT(!convs.empty());
-    //convert each of these chinese unicode characters into a CHAlphabet symbol...
-    for (set<string>::const_iterator it=convs.begin(); it!=convs.end(); it++) {
-      vSyms.clear();
-      m_pCHAlphabetMap->GetSymbols(vSyms, *it);
-      DASHER_ASSERT(vSyms.size()==1 && vSyms[0]!=0); //i.e. conversion is exactly one chinese symbol
-      DASHER_ASSERT(m_pCHAlphabet->GetText(vSyms[0]) == *it);
-      m_pConversionsBySymbol[i].insert(vSyms[0]);
-      //Also the reverse lookup: (valid/used chinese symbol number) -> (pinyin by which it could be produced)
-      m_PinyinByChinese[vSyms[0]].insert(i);
+    DASHER_ASSERT(conversions.find(m_pAlphabet->GetDisplayText(i))!=conversions.end());
+    pair<symbol,symbol> convs(conversions[m_pAlphabet->GetDisplayText(i)]);
+    //for each chinese unicode character in the group, hash it to ensure same unicode = same index into m_CH{text,displayText,AlphabetMap}
+    for (symbol CHsym=convs.first; CHsym<convs.second; CHsym++) {
+      const string &text(pCHAlphabet->GetText(CHsym));
+      int target=m_CHAlphabetMap.Get(text);
+      if (!target) {
+        //unicode char not seen already, allocate new symbol number
+        target = m_CHtext.size();
+        m_CHtext.push_back(text);
+        m_CHdisplayText.push_back(pCHAlphabet->GetDisplayText(CHsym));
+        m_CHcolours.push_back(pCHAlphabet->GetColour(CHsym));
+        m_CHAlphabetMap.Add(text,target);
+      }
+      DASHER_ASSERT(m_CHtext[m_CHAlphabetMap.Get(text)] == text);
+      m_pConversionsBySymbol[i].insert(target);
+      //Also the reverse lookup: (rehashed chinese symbol number) -> (pinyin by which it could be produced)
+      m_PinyinByChinese[target].insert(i);
     }
   }
   //that leaves m_pConversionsBySymbol as desired.
@@ -108,27 +117,27 @@ CMandarinAlphMgr::~CMandarinAlphMgr() {
 void CMandarinAlphMgr::CreateLanguageModel(CEventHandler *pEventHandler, CSettingsStore *pSettingsStore) {
   //std::cout<<"CHALphabet size "<< pCHAlphabet->GetNumberTextSymbols(); [7603]
   std::cout<<"Setting PPMPY model"<<std::endl;
-  m_pLanguageModel = new CPPMPYLanguageModel(pEventHandler, pSettingsStore, m_pCHAlphabet->GetNumberTextSymbols(), m_pAlphabet->GetNumberTextSymbols());
+  m_pLanguageModel = new CPPMPYLanguageModel(pEventHandler, pSettingsStore, m_CHtext.size()-1, m_pAlphabet->GetNumberTextSymbols());
   //our superclass destructor will call ReleaseContext on the iLearnContext when we are destroyed,
   // so we need to put _something_ in there (even tho we don't use it atm!)...
   m_iLearnContext = m_pLanguageModel->CreateEmptyContext();
 }
 
 CTrainer *CMandarinAlphMgr::GetTrainer() {
-  return new CMandarinTrainer(m_pLanguageModel, m_pAlphabetMap, m_pCHAlphabetMap);
+  return new CMandarinTrainer(m_pLanguageModel, m_pAlphabetMap, &m_CHAlphabetMap);
 }
 
 CAlphabetManager::CAlphNode *CMandarinAlphMgr::GetRoot(CDasherNode *pParent, unsigned int iLower, unsigned int iUpper, bool bEnteredLast, int iOffset) {
 
   int iNewOffset(max(-1,iOffset-1));  
   // Use chinese alphabet, not pinyin...
-  pair<symbol, CLanguageModel::Context> p=GetContextSymbols(pParent, iNewOffset, m_pCHAlphabetMap);
+  pair<symbol, CLanguageModel::Context> p=GetContextSymbols(pParent, iNewOffset, &m_CHAlphabetMap);
 
   CAlphNode *pNewNode;
   if (p.first==0 || !bEnteredLast) {
     pNewNode = new CGroupNode(pParent, iNewOffset, iLower, iUpper, "", 0, this, NULL);
   } else {
-    DASHER_ASSERT(p.first>0 && p.first<=m_pCHAlphabet->GetNumberTextSymbols());
+    DASHER_ASSERT(p.first>0 && p.first<m_CHtext.size());
     pNewNode = new CMandSym(pParent, iNewOffset, iLower, iUpper,  "", this, p.first, 0);
   }
   pNewNode->iContext = p.second;
@@ -137,7 +146,7 @@ CAlphabetManager::CAlphNode *CMandarinAlphMgr::GetRoot(CDasherNode *pParent, uns
 }
 
 int CMandarinAlphMgr::GetCHColour(symbol CHsym, int iOffset) const {
-  int iColour = m_pCHAlphabet->GetColour(CHsym);
+  int iColour = m_CHcolours[CHsym];
   if (iColour==-1) {
     //none specified in alphabet
     static int colourStore[2][3] = {
@@ -218,7 +227,7 @@ CMandarinAlphMgr::CMandSym *CMandarinAlphMgr::CreateCHSymbol(CDasherNode *pParen
   // what's right 
 
   int iNewOffset = pParent->offset()+1;
-  if (m_pCHAlphabet->GetText(iCHsym) == "\r\n") iNewOffset++;
+  if (m_CHtext[iCHsym] == "\r\n") iNewOffset++;
   CMandSym *pNewNode = new CMandSym(pParent, iNewOffset, iLbnd, iHbnd, strGroup, this, iCHsym, iPYparent);
   pNewNode->iContext = m_pLanguageModel->CloneContext(iContext);
   m_pLanguageModel->EnterSymbol(pNewNode->iContext, iCHsym);
@@ -325,7 +334,7 @@ void CMandarinAlphMgr::GetConversions(std::vector<pair<symbol,unsigned int> > &v
 }
 
 CMandarinAlphMgr::CMandSym::CMandSym(CDasherNode *pParent, int iOffset, unsigned int iLbnd, unsigned int iHbnd, const std::string &strGroup, CMandarinAlphMgr *pMgr, symbol iSymbol, symbol pyParent)
-: CSymbolNode(pParent, iOffset, iLbnd, iHbnd, pMgr->GetCHColour(iSymbol,iOffset), strGroup+pMgr->m_pCHAlphabet->GetDisplayText(iSymbol), pMgr, iSymbol), m_pyParent(pyParent) {
+: CSymbolNode(pParent, iOffset, iLbnd, iHbnd, pMgr->GetCHColour(iSymbol,iOffset), strGroup+pMgr->m_CHdisplayText[iSymbol], pMgr, iSymbol), m_pyParent(pyParent) {
 }
 
 CDasherNode *CMandarinAlphMgr::CMandSym::RebuildSymbol(CAlphNode *pParent, unsigned int iLbnd, unsigned int iHbnd, const std::string &strGroup, int iBkgCol, symbol iSymbol) {
@@ -399,5 +408,5 @@ void CMandarinAlphMgr::CMandSym::RebuildForwardsFromAncestor(CAlphNode *pNewNode
 
 const std::string &CMandarinAlphMgr::CMandSym::outputText() {
   //use chinese, not pinyin, alphabet...
-  return mgr()->m_pCHAlphabet->GetText(iSymbol);
+  return mgr()->m_CHtext[iSymbol];
 }
diff --git a/Src/DasherCore/MandarinAlphMgr.h b/Src/DasherCore/MandarinAlphMgr.h
index 31dd4bf..7a35a0b 100644
--- a/Src/DasherCore/MandarinAlphMgr.h
+++ b/Src/DasherCore/MandarinAlphMgr.h
@@ -122,8 +122,16 @@ namespace Dasher {
     /// implements 2-phase colour cycling by low-bit of offset (as GetColour).
     int GetCHColour(symbol CHsym, int iOffset) const;
     
-    const CAlphInfo *m_pCHAlphabet;
-    const CAlphabetMap *m_pCHAlphabetMap;
+    /// Texts (multiple-octet but single unicode chars) for chinese characters - every element unique
+    /// Element 0 is blank, for the "unknown symbol" (easiest to store it)
+    std::vector<std::string> m_CHtext;
+    /// Display texts, as per previous
+    std::vector<std::string> m_CHdisplayText;
+    //colour, as per previous
+    std::vector<int> m_CHcolours;
+    /// Map from unicode char to index into m_CH{text,displayText}
+    CAlphabetMap m_CHAlphabetMap;
+    
     ///Indexed by SPY (syll+tone) alphabet symbol number,
     // the set of CHAlphabet symbols it can be converted to.
     std::set<symbol> *m_pConversionsBySymbol;