[xml] C++ SAX interface, high memory usage.



Hi.

I'm using libxml2 to read a big (250MB) XML file through the SAX interface. The code works, but memory usage grows by about 100 bytes per record read. The file has around 485000 <item xx="yy" zz="pp" ..> tags, so for every 10000 tags read, the application increases memory usage by 1GB. Eventually it reaches a limit where the OS kills the application before the end of the XML file is reached.

The most relevant part of the problem seems to be here:

 void StartElementCallback(void * pData,
                          const xmlChar * name,
                          const xmlChar ** attrs) {

  * ((Data *) pData) = Data(); // Data is a struct with many members and a Data() constructor that zeroes all of them.

  while (NULL != attrs && NULL != attrs[0]) {
    printf("attribute: %s=%s\n",attrs[0],attrs[1]);

    std::ostringstream strStream;

    strStream.str("");
    strStream << attrs[0];
    std::string strAttribute = strStream.str();

    strStream.str("");
    strStream << attrs[1];
    std::string strValue = strStream.str();
   
    ...

    attrs = &attrs[2];
  }
}

My first thought was that strStream was holding a reference to attrs[n], so I tried copying attrs[n] like this:

 void StartElementCallback(void * pData,
                          const xmlChar * name,
                          const xmlChar ** attrs) {

  * ((Data *) pData) = Data(); // Data is a struct with many members and a Data() constructor that zeroes all of them.

  while (NULL != attrs && NULL != attrs[0]) {
    printf("attribute: %s=%s\n",attrs[0],attrs[1]);

    char * pKey = strdup(reinterpret_cast<const char*>(attrs[0]));

    std::string strAttribute = std::string(pKey);
    free(pKey);

    char * pValue = strdup(reinterpret_cast<const char*>(attrs[1]));
    std::string strValue = std::string(pValue);
    free(pValue);

    ...

    attrs = &attrs[2];
  }
}

But the problem persisted. So I tried this:

 void StartElementCallback(void * pData,
                          const xmlChar * name,
                          const xmlChar ** attrs) {

  * ((Data *) pData) = Data(); // Data is a struct with many members and a Data() constructor that zeroes all of them.

  while (NULL != attrs && NULL != attrs[0]) {
    printf("attribute: %s=%s\n",attrs[0],attrs[1]);

    std::string strAttribute = "1";
    std::string strValue = "2";

    ...

    attrs = &attrs[2];
  }
}

With this change, the memory footprint of the program dropped to a constant 2.2MB.
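
The attribute-copying step can also be isolated from the rest of the callback and exercised without the parser. The following is only a minimal sketch under some assumptions: CopyAttributes and AttributeList are hypothetical names that do not appear in the real program, and the xmlChar typedef is repeated here (assuming it matches libxml2's unsigned char definition) so that the snippet builds without the library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumption: same underlying type as libxml2's xmlChar, redeclared so the
// sketch compiles on its own. Drop this line when including the libxml2 headers.
typedef unsigned char xmlChar;

// Hypothetical container for the copied name/value pairs.
typedef std::vector<std::pair<std::string, std::string> > AttributeList;

// Hypothetical helper: copies the NULL-terminated name/value array that a SAX
// startElement callback receives into std::string pairs owned by the caller.
AttributeList CopyAttributes(const xmlChar ** attrs) {
  AttributeList result;

  while (NULL != attrs && NULL != attrs[0]) {
    // std::string makes its own copy of the bytes, so no strdup or
    // ostringstream is needed for the copy itself.
    std::string strAttribute(reinterpret_cast<const char *>(attrs[0]));

    // Guard against a missing value and treat it as an empty string.
    std::string strValue(
        NULL != attrs[1] ? reinterpret_cast<const char *>(attrs[1]) : "");

    result.push_back(std::make_pair(strAttribute, strValue));

    attrs += 2;  // advance to the next name/value pair
  }

  return result;
}
```

Calling this helper in a loop on a fixed array of attributes and watching the memory footprint might help tell whether the growth comes from the copies themselves or from the elided part of the callback.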

Am I doing something wrong? Can you please help me find what the problem is?

Thanks a lot for your time.

Best regards,

Eduardo.

