Zero Copy / CORBA high performance enablers?



Has anyone looked at implementing zero-copy optimizations in ORBit?
I ask because zero copy is the topic of the OMG High Performance
enablers, and the excerpt below, taken from a web page, describes an
implementation in the FreeBSD 4.0 kernel's TCP/IP stack. I didn't see
anything about Linux supporting zero copy yet, but maybe I just didn't look
far enough through my Google results.

Any comments?

Thanks,

David Haverkamp


Zero-Copy Sockets
  Conventional TCP/IP communication incurs a high cost to copy data between
kernel buffers and user process virtual memory at the socket layer. This
situation has motivated development of techniques to reduce or eliminate
data copying by page remapping between the user process and the kernel when
size and alignment properties allow [6,4,7]. A page remapping scheme should
preserve the copy semantics of the existing socket interface.
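
As a rough illustration of the "size and alignment properties" mentioned above, here is a minimal user-space sketch of how an application might obtain a buffer that starts on a page boundary and spans whole pages. The helper names alloc_page_aligned and is_page_aligned are made up for this example and are not part of the excerpt; posix_memalign and sysconf are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate npages whole pages starting on a page
 * boundary. Returns NULL on failure; release the buffer with free(). */
void *alloc_page_aligned(size_t npages, size_t *len_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (ps <= 0 || npages == 0)
        return NULL;
    if (npages > SIZE_MAX / (size_t)ps)   /* avoid overflow in the length */
        return NULL;

    size_t len = npages * (size_t)ps;
    if (posix_memalign(&buf, (size_t)ps, len) != 0)
        return NULL;
    if (len_out != NULL)
        *len_out = len;
    return buf;
}

/* Hypothetical helper: true when addr lies on a page boundary.
 * page_size must be a nonzero power of two. */
int is_page_aligned(const void *addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return ((uintptr_t)addr & (page_size - 1)) == 0;
}
```

A buffer obtained this way only satisfies the alignment precondition; whether the kernel actually remaps pages instead of copying depends on the kernel and driver support described in the rest of the excerpt.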


In general, zero-copy optimizations assume MTUs matched to the page size of
the endstation hardware and operating system. Ideally, each packet payload
is an even multiple of the page size, and is stored in buffers that
naturally align on page boundaries. On the receive side, the NIC must
separate the headers and payload into separate buffers, leaving the payload
page-aligned. This can be done with special support to recognize TCP/IP
packets on the NIC, or by constructing receive mbuf chains that
optimistically assume that received packets are TCP packets. In Trapeze,
the sending host explicitly separates header and payload portions of each
packet; the Trapeze driver optimistically assumes that data in the first
mbuf of an outgoing chain is header data, and places its data in the
control message. The link layer preserves this separation on the receiving
side.


We implemented zero-copy TCP/IP extensions at the socket layer in the
FreeBSD 4.0 kernel, using code developed by John Dyson for zero-copy I/O
through the read/write system call interface. The zero-copy extensions
require some buffering support in the network driver, but are otherwise
independent of the underlying network, assuming that it supports
sufficiently large MTUs and page-aligned sends and receives. Section 3
reports results from zero-copy TCP experiments on both Trapeze/Myrinet and
Alteon Gigabit Ethernet hardware.


The page remapping occurs in a variant of the uiomove kernel routine, which
directs the movement of data to and from the process virtual memory for all
variants of the I/O read and write system calls. Our zero-copy socket code
is implemented as a new case alongside Dyson's code in uiomoveco, which is
invoked from socket-layer sosend and soreceive when a process requests the
kernel to transfer a page or more of data to or from a page-aligned user
buffer.
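
The condition in the paragraph above (a page or more of data, to or from a page-aligned user buffer) can be modelled in user space roughly as follows. This is only a sketch and is not the uiomoveco code: the names xfer_plan and plan_transfer are hypothetical, and the assumption that a trailing partial page falls back to an ordinary copy is mine, not something the excerpt states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: how many bytes of a transfer could be moved
 * by page remapping and how many would still be copied. */
struct xfer_plan {
    size_t remap_bytes;  /* whole pages eligible for remapping */
    size_t copy_bytes;   /* remainder moved by an ordinary copy */
};

/* Decide how a transfer of len bytes at user_addr would be split.
 * page_size must be a nonzero power of two. */
struct xfer_plan plan_transfer(uintptr_t user_addr, size_t len,
                               size_t page_size)
{
    struct xfer_plan p = { 0, len };   /* default: copy everything */

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* An unaligned buffer or less than one page: no remapping. */
    if ((user_addr & (page_size - 1)) != 0 || len < page_size)
        return p;

    /* Assumption: whole pages are remapped, the tail is copied. */
    p.remap_bytes = len & ~(page_size - 1);
    p.copy_bytes = len - p.remap_bytes;
    return p;
}
```

For example, with 4096-byte pages an aligned request of 8292 bytes would split into 8192 remapped bytes and 100 copied bytes under this model, while the same request at an address one byte past a page boundary would be copied entirely.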


For a zero-copy read, uiomoveco maps kernel buffer pages directly into the
process address space. If the read is from a file, it creates a
copy-on-write mapping to a page in FreeBSD's unified buffer cache; the
copy-on-write preserves the file data in case the user process stores to
the remapped page. In the case of a receiver read from a socket,
copy-on-write is unnecessary because there is no need to retain the kernel
buffer after the read; ordinarily soreceive simply frees the kernel buffers
once the data has been delivered to the user process. The remapping case
instead frees just the mbuf headers and any physical page frames that
previously backed remapped virtual pages in the user buffer. Thus most
receive-side page remappings actually trade page frames between the process
and the kernel buffer pool, preserving equilibrium.
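
To make the "trade" of page frames concrete, here is a toy bookkeeping model, assuming the frames freed from the user buffer go straight back to the kernel buffer pool (a simplification; the excerpt only says they are freed). The names frame_counts and remap_receive are hypothetical. The backed parameter counts how many of the remapped user pages already had a physical frame behind them, which is why the excerpt says "most" remappings are an even trade.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: who currently owns how many physical page frames. */
struct frame_counts {
    size_t kernel_pool;  /* frames held by the kernel buffer pool */
    size_t process;      /* frames backing the user process */
};

/* Model a zero-copy receive of npages pages.
 * backed = number of those user pages that already had a frame. */
void remap_receive(struct frame_counts *fc, size_t npages, size_t backed)
{
    assert(backed <= npages);
    assert(npages <= fc->kernel_pool);
    assert(backed <= fc->process);

    /* Kernel frames holding the payload are mapped into the process. */
    fc->kernel_pool -= npages;
    fc->process += npages;

    /* Frames that previously backed the user buffer are released
     * (assumed here to return to the kernel pool). */
    fc->process -= backed;
    fc->kernel_pool += backed;
}
```

When every page of the user buffer was already backed (backed == npages), both counts end where they started, which is the equilibrium the paragraph describes. If the buffer was never touched (backed == 0), the process gains npages frames at the pool's expense.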


On the send side, copy-on-write is used because the sending process may
overwrite its send buffer once the send is complete. The send-side code
maps each whole page from the user buffer into the kernel address space,
references it with an external mbuf, and marks the page as copy-on-write.
The mbuf chains and their pages are then passed through the TCP/IP stack to
the network driver, which attaches them to outgoing messages as payloads.
When each mbuf is freed on transmit complete, the external free routine
releases the page's copy-on-write mapping. The new socket layer code
handles only anonymous virtual memory pages; we do not support zero-copy
transmission of memory backed by mapped files because this would duplicate
the functionality of the sendfile routine already implemented by David
Greenman.




