Hi. I'm using libxml2 to read a large (250MB) XML
file with the SAX approach. The code works, but the memory usage is
increasing at a rate of about 100 bytes per read record. The file has
around 485000 <item xx="yy" zz="pp" ..> tags, so for every 10000
tags read, the application increases memory usage by 1GB. It reaches a
limit where the OS kills the application without reaching the end of the
XML file.
The most relevant part of the problem seems to be here:

    void StartElementCallback(void * pData,
                              const xmlChar * name, const xmlChar ** attrs)
    {
        // Data is a struct with many members and a Data() constructor to zero all of them.
        * ((Data *) pData) = Data();
        while (NULL != attrs && NULL != attrs[0]) {
            printf("attribute: %s=%s\n",attrs[0],attrs[1]);
            std::ostringstream strStream;
            strStream.str("");
            strStream << attrs[0];
            std::string strAttribute = strStream.str();
            strStream.str("");
            strStream << attrs[1];
            std::string strValue = strStream.str();
            ...
            attrs = &attrs[2];
        }
    }

My first thought was that strStream was holding a reference to attrs[n], so I tried copying attrs[n] like this:

    void StartElementCallback(void * pData,
                              const xmlChar * name, const xmlChar ** attrs)
    {
        // Data is a struct with many members and a Data() constructor to zero all of them.
        * ((Data *) pData) = Data();
        while (NULL != attrs && NULL != attrs[0]) {
            printf("attribute: %s=%s\n",attrs[0],attrs[1]);
            char * pKey = strdup(reinterpret_cast<const char*>(attrs[0]));
            std::string strAttribute = std::string(pKey);
            free(pKey);
            char * pValue = strdup(reinterpret_cast<const char*>(attrs[1]));
            std::string strValue = std::string(pValue);
            free(pValue);
            ...
            attrs = &attrs[2];
        }
    }

But the problem persisted. So I tried this:

    void StartElementCallback(void * pData,
                              const xmlChar * name, const xmlChar ** attrs)
    {
        // Data is a struct with many members and a Data() constructor to zero all of them.
        * ((Data *) pData) = Data();
        while (NULL != attrs && NULL != attrs[0]) {
            printf("attribute: %s=%s\n",attrs[0],attrs[1]);
            std::string strAttribute = "1";
            std::string strValue = "2";
            ...
            attrs = &attrs[2];
        }
    }

With this version, the memory footprint of the program dropped to a constant 2.2MB.

Am I doing something wrong? Can you please help me find what the problem is?

Thanks a lot for your time.

Best regards,
Eduardo.
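
P.S. In case it helps anyone reproduce the attribute loop outside my program, here is a minimal standalone sketch of the same walk over the NULL-terminated name/value array, building each std::string directly from the pointer instead of going through a stream or strdup. This is only an illustration: CollectAttributes is a hypothetical helper (it is not part of libxml2 or of my real code), the xmlChar typedef is a stand-in so the sketch builds without the library headers, and the NULL-value guard is my own assumption. I am not claiming it changes the memory behaviour described above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for libxml2's xmlChar (an unsigned char typedef), declared here
// only so the sketch compiles without the library headers.
typedef unsigned char xmlChar;

typedef std::vector<std::pair<std::string, std::string> > AttributeList;

// Hypothetical helper, not part of libxml2: copies the name/value pairs
// that a SAX start-element callback receives in its attrs argument.
// The array is laid out as name, value, name, value, ..., NULL.
AttributeList CollectAttributes(const xmlChar ** attrs)
{
    AttributeList result;
    while (NULL != attrs && NULL != attrs[0]) {
        // Build the strings straight from the pointers; std::string makes
        // its own copy, so no intermediate stream or strdup is involved.
        std::string strAttribute(reinterpret_cast<const char *>(attrs[0]));
        // Assumption: treat a missing value as an empty string.
        std::string strValue(NULL != attrs[1]
                                 ? reinterpret_cast<const char *>(attrs[1])
                                 : "");
        result.push_back(std::make_pair(strAttribute, strValue));
        attrs = &attrs[2]; // advance to the next name/value pair
    }
    return result;
}
```

Inside the callback this would be called as CollectAttributes(attrs), and the returned pairs go out of scope (and release their memory) when the callback returns, unless they are stored somewhere longer-lived.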