Re: [xml] xmlDocDump() on Windows



On 13 Sep 2007, Daniel Corbe stated:
Why would I run into heap corruption issues unless there's something
blatantly wrong with xmlDocDump.

My understanding of HEAP vs STACK memory is that local variables come from
the stack and global variables along with anything malloc() comes off of the
heap.

This is correct from the point of view of the C abstract machine (except
that it doesn't call the stack anything specifically; it's just
automatic allocation). It's incorrect from the point of view of what the
machine actually does, where (on a Linux box and many/most other ELF
systems):

 - stack variables come off the stack. This is generally strictly limited
   in size, and extremely limited on 32-bit threaded environments because
   of the need to fit all the stacks for all the threads into the address
   space.

 - initialized static variables (both global and local) come from the .data
   section of the executable, which is privately mmap()ped and modifiable.
   This is limited in size only by available memory and address space.

 - uninitialized static variables (both global and local) come from the
   .bss section, which is allocated at load time (by the kernel for the
   executable, by the dynamic linker for shared libraries) and filled in
   with zeroes. This too is limited in size only by available memory and
   address space.

 - heap allocations are satisfied on nearly all Unixes from an arena
   maintained by the C library, raised and shrunk on demand via the brk()
   system call. Because this is a single contiguous arena, it can suffer
   from fragmentation and overruns. Most C libraries store housekeeping data
   for a block before the start of that block, so underrunning a block can
   corrupt the arena and crash programs on later malloc() or free() calls.
   This is theoretically limited only by available memory and address space,
   but alignment constraints, housekeeping data, and especially fragmentation
   can reduce its effective size substantially as a program runs. On Linux/
   glibc and some other systems, large malloc() allocations are satisfied
   via mmap() directly from the OS, mostly to reduce heap fragmentation.
   (The definition of `large' is changeable by the application and on modern
   glibc versions varies dynamically). Overruns in these areas might cause
   segfaults but will not corrupt other state or cause later crashes in
   other calls.
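
A minimal sketch of the list above, assuming Linux with glibc on its default
malloc settings. The helper name show_regions() is made up for illustration,
and the 4 MiB figure is just a size comfortably above glibc's initial mmap
threshold. The addresses vary from run to run, so none are shown here.

```c
#include <stdio.h>
#include <stdlib.h>

static int initialized_global = 42;  /* initialized static: .data */
static int zeroed_global;            /* uninitialized static: .bss */

/* Hypothetical helper: print one address per storage class.
 * Returns 0 on success, -1 if an allocation fails. */
int show_regions(FILE *out)
{
    int automatic = 0;                 /* automatic: stack */
    static int initialized_local = 7;  /* .data, despite being local */
    static int zeroed_local;           /* .bss, despite being local */
    void *small = malloc(64);          /* small: normally the brk() arena */
    void *large = malloc(4u << 20);    /* 4 MiB: normally mmap() on glibc */

    if (small == NULL || large == NULL) {
        free(small);
        free(large);
        return -1;
    }

    fprintf(out, "stack        %p\n", (void *)&automatic);
    fprintf(out, ".data global %p\n", (void *)&initialized_global);
    fprintf(out, ".data local  %p\n", (void *)&initialized_local);
    fprintf(out, ".bss global  %p\n", (void *)&zeroed_global);
    fprintf(out, ".bss local   %p\n", (void *)&zeroed_local);
    fprintf(out, "small malloc %p\n", small);
    fprintf(out, "large malloc %p\n", large);

    free(large);
    free(small);
    return 0;
}
```

Calling show_regions(stdout) from main() should typically show the two
static pairs close together, the small block near them in the brk() arena,
and the large block far away in its own mapping.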

(Windows's memory allocation models are profoundly different, and the last time
I had to deal with them was in the Windows 3.1 days, so anything I could say
would be more misleading than useful. If anyone else wants to describe the
Windows model, feel free.)

       Heap memory is essentially limited only by the amount of physical RAM
and virtual memory in your machine

With modern RAM volumes, address space is a more serious constraint on many
applications. I doubt that Windows apps can allocate anything like as much
as 4GB on a 32-bit platform.

                                    whereas your call stack is generally
limited to 1MB per thread by default on Windows.

The amount on Linux 32-bit platforms has varied with time and is
customizable; the default is generally somewhere between four and eight
megabytes, IIRC.

The default stack size in most Linux distributions is unlimited or some very
high number

stack size              (kbytes, -s) 8192
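
The same figure can be read from inside a program. A minimal sketch, assuming
a POSIX system; stack_limit_kb() is a made-up helper name, and the value it
reports is the soft limit that `ulimit -s` shows:

```c
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft stack limit in kilobytes.
 * Returns 0 on success, -1 if getrlimit() fails. When there is no
 * limit, *unlimited is set to 1 and *kb is set to 0. */
int stack_limit_kb(unsigned long long *kb, int *unlimited)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *kb = *unlimited ? 0 : (unsigned long long)rl.rlim_cur / 1024;
    return 0;
}
```

This covers the main thread only; stacks for other threads are sized by the
thread library when each thread is created.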

            so I could easily see how I may have missed a stack issue.
 Issues with the heap tend to be more visible (in the form of crashes) and
obvious (dereferencing null or uninitialized pointers, reading/writing
out-of-bounds, etc)

If this app runs on Linux too, you might want to try to valgrind it and
see if that spots anything. (valgrind is *very* good at detecting
overruns on the heap, although less good at detecting stack
problems.) GCC 4's -fmudflap option might also be useful.
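
To illustrate the kind of heap overrun valgrind catches, here is a minimal
sketch. copy_string() is a made-up helper, not anything from libxml2, and the
code below is the correct version; the comment marks the classic mistake.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails.
 * The caller frees the result. */
char *copy_string(const char *s)
{
    size_t len = strlen(s);

    /* The +1 makes room for the terminating NUL. Allocating only len
     * bytes would make the memcpy() below write one byte past the end
     * of the block: usually harmless-looking at the time, but exactly
     * the sort of overrun that valgrind reports as an invalid write. */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1);
    return copy;
}
```

Running the program as `valgrind ./yourprog` (the program name is a
placeholder) is enough to get such reports; no recompilation is needed,
although building with -g makes the stack traces far more readable.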

-- 
`Some people don't think performance issues are "real bugs", and I think 
such people shouldn't be allowed to program.' --- Linus Torvalds


