Re: [xml] Making Sure Output is XML safe



2008/11/25 Danie van der Walt <dvdwalt foneworx co za>:
Hi Elvis

Here is a code snippet:
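  const char *xml = "<foo />";
  /* The size argument is the string length, without the terminating NUL */
  xmlDoc *doc = xmlReadMemory(xml, (int) strlen(xml), "xml", "UTF-8", 0);
  //xmlDoc *doc = xmlReadMemory(xml, (int) strlen(xml), "xml", NULL, 0);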
  xmlChar *send_sms_str_xml_safe=NULL;
.
.
.
.
.
  if(mysql_real_query(&gMY_DB,g_statement,strlen(g_statement)))
  {
    ReturnXmlError("System Error");
    fprintf(g_fptr,"Sql ERROR |%s|%s|\n",g_statement,mysql_error(&gMY_DB));
    exit(0);
  }
  g_result = mysql_store_result(&gMY_DB);
  if (mysql_num_rows(g_result)>0)
  {
    printf("<?xml version=\"1.0\"?>\n<sms_api>\n");
    //memset(row,0,sizeof(MYSQL_ROW));
    while((g_row = mysql_fetch_row(g_result))!=NULL)
    {
      printf("  <sms>\n");
      printf("   <sms_id>%s</sms_id>\n",g_row[0]);
      printf("   <status_id>%s</status_id>\n",g_row[1]);
      printf("   <status_text>%s</status_text>\n",g_row[7]);
      if(atoi(give_detail)==1)
      {
        send_sms_str_xml_safe = xmlEncodeEntitiesReentrant(doc, BAD_CAST g_row[2]);
        printf("   <short_message>%s</short_message>\n",send_sms_str_xml_safe);
        xmlFree(send_sms_str_xml_safe);  /* libxml2 memory: release with xmlFree() */
        printf("   <source_addr>%s</source_addr>\n",g_row[3]);
        printf("   <destination_addr>%s</destination_addr>\n",g_row[4]);
      }
      printf("   <time_submitted>%s</time_submitted>\n",g_row[5]);
      printf("   <time_processed>%s</time_processed>\n",g_row[6]);
      printf("   <rule>%s</rule>\n",g_row[8]);

      printf("  </sms>\n");

      g_row= NULL;
    }
    mysql_free_result (g_result);
    printf("</sms_api>\n");
  }

..
.
.
.
.
g_row[2] is a text field holding UTF-8 data stored in MySQL.

Thanks Danie, but I'm not going to debug your code for you. I did,
however, make a small test case myself, which works as expected. Maybe
it can help you figure out where you're going wrong:

CREATE DATABASE xml2 CHARACTER SET = utf8;
GRANT ALL ON xml2.* TO 'xml2'@'localhost' IDENTIFIED BY 'xml2';
CREATE TABLE xml2.test (id INT(10) NOT NULL AUTO_INCREMENT,
  text VARCHAR(255) CHARACTER SET utf8, PRIMARY KEY(id));
INSERT INTO xml2.test (text) VALUES ('Hello World™ & and Merry Christmas!');

Code:

#include <stdio.h>
#include <stdlib.h>
#include <mysql.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/entities.h>

int main (int argc, char *argv[])
{
  MYSQL *db;
  MYSQL_RES *result;
  MYSQL_ROW row;
  xmlDoc *doc;
  xmlChar *encoded;
  unsigned long *field_lengths;
  int ret;

  LIBXML_TEST_VERSION

  /* Parse test document (the length excludes the terminating NUL) */
  doc = xmlReadMemory("<xml/>", sizeof("<xml/>") - 1, "xml", "UTF-8", 0);
  if (doc == NULL)
    exit(1);

  /* Initialize MySQL */
  db = mysql_init(NULL);
  if (db == NULL) {
    fprintf(stderr, "Out of memory!\n");
    exit(1);
  }
  /* Use utf8 as the character set for this client connection */
  mysql_options(db, MYSQL_SET_CHARSET_NAME, "utf8");

  /* Connect to MySQL server; keep db so mysql_error() can report a failure */
  if (mysql_real_connect(db, "localhost", "xml2", "xml2", "xml2",
                         3306, NULL, 0) == NULL) {
    fprintf(stderr, "MySQL: %s\n", mysql_error(db));
    exit(1);
  }

  /* Select all rows from table `test` */
  ret = mysql_query(db, "SELECT * FROM test");
  if (ret != 0) {
    fprintf(stderr, "MySQL: %s\n", mysql_error(db));
    exit(1);
  }

  /* Fetch result */
  result = mysql_use_result(db);
  if (result == NULL) {
    fprintf(stderr, "MySQL: %s\n", mysql_error(db));
    exit(1);
  }

  /*
   * Do entity encoding on strings found in the second field
   */
  while ((row = mysql_fetch_row(result))) {
    field_lengths = mysql_fetch_lengths(result);
    if (field_lengths[1] != 0) {
      encoded = xmlEncodeEntitiesReentrant(doc, BAD_CAST row[1]);
      if (encoded != NULL) {
        printf("%s\n", (const char *) encoded);
        xmlFree(encoded);   /* the result is allocated by libxml2 */
      }
    }
  }

  mysql_free_result(result);
  mysql_close(db);
  xmlFreeDoc(doc);
  xmlCleanupParser();

  return(0);
}

$ gcc -Wall -o test test.c `xml2-config --cflags --libs` `mysql_config --cflags --libs`
$ ./test
Hello World&#xE2;&#x201E;&#xA2; &amp; and Merry Christmas!

This was with MySQL Ver 14.14 Distrib 5.1.29-rc and libxml2 2.6.32.

Just make sure that g_row[2] is what you think it is: print it
and examine it. It's basic programming.

Good luck,
Elvis


*********************************************************
Danie van der Walt
FoneWorx
Senior Programmer
Tel : +27112930000
MSN : predetor_me hotmail com
GoogleTalk : predetorlinux gmail com
*********************************************************


Elvis Stansvik wrote:

Hi,

2008/11/25 Danie van der Walt <dvdwalt foneworx co za>:


Hi Elvis

I seem to have run into another problem :(
I have a MySQL database storing messages in UTF-8. When I select from it
using C/C++ and libmysql, I get a warning from libxml saying:
    error : xmlEncodeEntitiesReentrant : input not UTF-8

I'm not sure whether it is the data being returned by MySQL or the
format of the string.


You have to explain a bit better here, or better yet, show the code. In
any case, xmlEncodeEntitiesReentrant() is not lying: if it says the
input is not UTF-8, it is not UTF-8. There's nothing libxml2 can do to
help you with that. You need to pass a NUL-terminated sequence of
UTF-8 bytes to the libxml2 library. Period.



When I store the data as plain Latin-1, it seems to work. The only problem
is that I need to store the information in UTF-8.


What storage are you talking about here? Your MySQL database? In any
case, whether or not your data is stored as UTF-8 in your database, it
must be UTF-8 when it is passed to libxml2.



Have you come across a similar problem?


No, I have never parsed XML coming from a MySQL database with libxml2,
and it has been a long time since I used the MySQL C client library. But I
think that is beside the point. The only thing that matters here is
that you are passing invalid UTF-8 to libxml2, and that will never
work. I really don't think this is a libxml2 question. Just debug your
code and make sure that, at the call to xmlEncodeEntitiesReentrant(),
the string you pass is valid UTF-8 and NUL-terminated.

Good luck,
Elvis





Elvis Stansvik wrote:

I forgot the footnote:

[1] http://xmlsoft.org/html/libxml-entities.html#xmlEncodeEntitiesReentrant

Elvis

2008/11/14 Elvis Stansvik <elvstone gmail com>:


Hi Danie,

2008/11/5 Danie van der Walt <dvdwalt foneworx co za>:


Hi guys

I hope you can help me.
I'm currently using libxml to parse incoming XML, but simply using printf
to generate my reply XML.

I have one variable that may contain characters that are not XML
safe/friendly, like '<' for example.
Is there any way I can pass some text to a function and get XML
"safe/friendly" output that I can use in my app?


Use xmlEncodeEntitiesReentrant() [1] to encode entities in a string. Like
this:

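#include <stdio.h>
#include <string.h>

#include <libxml/parser.h>
#include <libxml/entities.h>

int main (int argc, char *argv[])
{
   LIBXML_TEST_VERSION

   const xmlChar *str = BAD_CAST "string with < and > in it";
   const char *xml = "<foo />";

   /* The size argument is the string length, without the terminating NUL */
   xmlDoc *doc = xmlReadMemory(xml, (int) strlen(xml), "xml", "UTF-8", 0);
   if (doc == NULL)
      return(1);

   /* The result is newly allocated and must be released with xmlFree() */
   xmlChar *safe_str = xmlEncodeEntitiesReentrant(doc, str);

   printf("%s\n", (const char *) safe_str);

   xmlFree(safe_str);
   xmlFreeDoc(doc);
   xmlCleanupParser();

   return(0);
}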

Note that you need to pass it your document pointer as an argument too,
so that it will know about all entities and not just &lt;, &gt;, etc.

Regards,
Elvis



Regards
Danie


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml