[pan2: 14/23] Guess deliberate line breaks



commit 3741e64e23909d60a21cc072105c832d2cb3b522
Author: K. Haley <haleykd users sf net>
Date:   Sat May 15 00:49:38 2010 -0600

    Guess deliberate line breaks
    
    Guess the wrap length of the original message by finding the
    max line length. If the first word on a line could have fit on
    the previous line, assume it was a deliberate line break.

 pan/usenet-utils/text-massager-test.cc |   16 +++++++++-------
 pan/usenet-utils/text-massager.cc      |   18 +++++++++++++++---
 2 files changed, 24 insertions(+), 10 deletions(-)
---
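The heuristic described in the commit message can be sketched outside of
Pan as follows. This is only an illustration under stated assumptions, not
the code from text-massager.cc: SketchLine, guess_wrap_len and
looks_deliberate are hypothetical names, std::string stands in for Pan's
StringView, and std::string::find replaces the g_utf8_strchr call used in
the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Pan's Line: a quote leader ("> ") plus content.
struct SketchLine {
  std::string leader;
  std::string content;
};

// Pass 1: estimate the column the original author wrapped at. Starting
// from wrap_col keeps the estimate from dropping below the configured width.
int guess_wrap_len(const std::vector<SketchLine>& lines, int wrap_col) {
  int max_len = wrap_col;
  for (const SketchLine& l : lines)
    max_len = std::max(max_len,
                       static_cast<int>(l.leader.size() + l.content.size()));
  return max_len;
}

// Pass 2 check: would the first word of `cur` have fit on the previous
// line? If so, the break before `cur` is assumed to be deliberate.
bool looks_deliberate(int prev_content_len, const SketchLine& cur, int max_len) {
  if (prev_content_len == 0 || cur.content.empty())
    return false;

  // Room left on the previous line, minus one column for the joining space.
  const int space =
      max_len - (prev_content_len + static_cast<int>(cur.leader.size())) - 1;
  if (space <= 0)
    return false;

  // The whole line would have fit.
  if (static_cast<int>(cur.content.size()) < space)
    return true;

  // A blank within the first `space` bytes means the first word would have
  // fit. find() returns npos (a huge value) when there is no blank at all.
  return cur.content.find(' ') < static_cast<std::size_t>(space);
}
```

With a guessed wrap length of 31, a previous line of 6 content bytes and a
"> " leader leave 22 columns free, so a following line starting with "Next "
is treated as a deliberate break. A previous line of 25 content bytes leaves
only 3 columns, so a following line starting with "word " is merged as
ordinary wrapping.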
diff --git a/pan/usenet-utils/text-massager-test.cc b/pan/usenet-utils/text-massager-test.cc
index 3013e95..6043383 100644
--- a/pan/usenet-utils/text-massager-test.cc
+++ b/pan/usenet-utils/text-massager-test.cc
@@ -71,11 +71,12 @@ int main (void)
 "Cybe R. Wizard wrote:\n"
 "\n"
 "> Nice to know it works, right, and that's why I\n"
-"> tried it. I ran SETI home under win95 for a\n"
-"> while but on my Pentium 166 it's not really\n"
-"> worth it.  It took upwards of 500 hours to do\n"
-"> one WU running full time in the background. Will\n"
-"> the Linux version do better???\n"
+"> tried it.\n"
+"> I ran SETI home under win95 for a while but on\n"
+"> my Pentium 166 it's not really worth it.  It\n"
+"> took upwards of 500 hours to do one WU running\n"
+"> full time in the background.\n"
+"> Will the Linux version do better???\n"
 "\n"
 "500 hours seems like an awfully long time to me...\n"
 "I'm running setiathome on all my systems, and on\n"
@@ -86,8 +87,8 @@ int main (void)
 "> that came with my Mandrake 7.2 the Galaxies 2.0\n"
 "> screensaver ran VERY slowly.  I had no real hope\n"
 "> that Codeweaver's wine would do any better but\n"
-"> the thing runs FASTER than under win95. I wonder\n"
-"> why that is...\n"
+"> the thing runs FASTER than under win95.\n"
+"> I wonder why that is...\n"
 "\n"
 "Heh, I remember OS/2 running Windows programs\n"
 "faster than windows did :^)\n"
@@ -99,6 +100,7 @@ int main (void)
 "\n"
 "Jan Eric";
    out = tm.fill (in);
+   std::cout<<out<<std::endl;
    check (out == expected_out);
 
    /* wrap real-world 2 */
diff --git a/pan/usenet-utils/text-massager.cc b/pan/usenet-utils/text-massager.cc
index d34d808..10c7cf7 100644
--- a/pan/usenet-utils/text-massager.cc
+++ b/pan/usenet-utils/text-massager.cc
@@ -111,11 +111,17 @@ namespace
    void merge_fixed (paragraphs_t &paragraphs, lines_t &lines, int wrap_col)
    {
      int prev_content_len = 0;
+     int max_len = wrap_col;
      StringView cur_leader;
      std::string cur_content;
 
      for (lines_cit it=lines.begin(), end=lines.end(); it!=end; ++it)
      {
+       const Line& line (*it);
+       max_len = MAX(max_len, line.leader.len + line.content.len);
+     }
+     for (lines_cit it=lines.begin(), end=lines.end(); it!=end; ++it)
+     {
         const Line& line (*it);
         bool paragraph_end = true;
         bool hard_break = false;
@@ -128,9 +134,15 @@ namespace
            paragraph_end = true;
         }
 
-        // we usually don't want to wrap really short lines
-        if (prev_content_len && prev_content_len<(wrap_col/2))
-           paragraph_end = true;
+        // if the first word could have fit on the previous line but
+        // wasn't put there, assume a deliberate line break.
+        if (!paragraph_end && prev_content_len && line.content.len)
+        {
+          int space = max_len - (prev_content_len + line.leader.len) - 1;
+          if ( space > 0 && ((line.content.len < space)
+                              || g_utf8_strchr (line.content.str, space, ' ')) )
+            paragraph_end = true;
+        }
 
         if (paragraph_end) // the new line is a new paragraph, so save old
         {


