Re: mmap and SIGBUS



Pavel Roskin wrote:
> > I like SIGBUS because it can be used for error detection and recovery.
> > (When it is implemented correctly and returns the address in
> > siginfo->si_addr; not all the Linux targets do this properly, in fact
> > i386 was one that did not pass a test of this)
> 
> I believe that it's better if mmap just failed instead of returning
> success and allocating memory that cannot be accessed.  Maybe there is a
> flag that would cause mmap to fail?

It is not possible to implement such a flag.  The failure happens when a
page is read in, and either the remote file server reports an error (at
read time, not open time), or there is an error reading the page from
disk.

So for example you can mmap a file from floppy, view it and remove the
floppy.  Then later as you scroll through the viewed file, you will
receive a SIGBUS.

> > And I like mmap, not just because of the file size thing but because it
> > should provide the fastest parsers and reduce VM page duplication.  (I
> > use it for text parsing, and so does GCC these days).  SIGBUS is an
> > essential ingredient in fast parsers, if they are to recover on
> > problematic filesystems.  Fortunately, recovery from SIGBUS _is_
> > possible when it's required.
> 
> I'm not very experienced in such matters, but I understand that the only
> way to recover after SIGBUS and SIGSEGV is to use longjmp in the signal
> handler.  Otherwise, the same instruction is executed, and if it's
> something like "movl (%edx), %eax", there is no way to prevent it from
> failing over and over again.  Is it correct?

No, it is possible to change the mapping and then return from the signal
handler.  The same instruction can then proceed without raising a signal.

However, in the case of a file read error (due to permissions or hard
errors), you'd probably want to use a longjmp or siglongjmp.  That would
at least prevent a viewer from crashing: it would report a read error
instead.

That is probably the right thing to do for GMC.

Getting the error code is more interesting.  siginfo->si_errno is not
implemented reliably.  So you should re-read the problem page with
read() inside the SIGBUS handler, which will probably return the correct
error code.  If it surprises you by succeeding, you can just return, or
report an unknown error.
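
As a sketch of the re-read, assuming the file was mapped from offset 0
and the handler knows the file descriptor and the base of the mapping:
the helper names and the PROBE_EOF value below are invented for
illustration.  It uses only lseek() and read(), which POSIX lists as
async-signal-safe, and it reads a single byte on the assumption that the
kernel has to fetch the whole page to satisfy it.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_EOF (-1)  /* hypothetical: the file ends before this page */

/* File offset of the page containing fault_addr, for a mapping that
 * starts at file offset 0 and at address base. */
static off_t fault_page_offset(const void *base, const void *fault_addr,
                               long page_size)
{
    uintptr_t delta = (uintptr_t)fault_addr - (uintptr_t)base;

    return (off_t)(delta - delta % (uintptr_t)page_size);
}

/* Re-reads one byte at page_offset.  Returns the errno value of the
 * failure, 0 if the read unexpectedly succeeded, or PROBE_EOF if the
 * file is now shorter than page_offset.  Moves the file offset of fd. */
static int probe_read_error(int fd, off_t page_offset)
{
    int saved_errno = errno;  /* do not disturb the interrupted code */
    int result;
    char byte;
    ssize_t n;

    if (lseek(fd, page_offset, SEEK_SET) == (off_t)-1) {
        result = errno;
    } else {
        do {
            n = read(fd, &byte, 1);
        } while (n < 0 && errno == EINTR);
        result = n < 0 ? errno : (n == 0 ? PROBE_EOF : 0);
    }
    errno = saved_errno;
    return result;
}
```

The handler can store the result in a static variable before it calls
siglongjmp, and the viewer can pass it to strerror() afterwards.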

> I can intercept SIGBUS, but should the viewer refuse to view the file at
> all or should it revert to loading the whole file in memory?

There is no point in loading the whole file.  If you get SIGBUS, it
means you failed to read part of the file, so the viewer should report a
read error.

> Your sentiment about network transparency is very understandable.  Some
> people want their fancy keyboard combinations to work as they do in
> Windows, but the terminfo database doesn't have entries for keys like
> Ctrl-Alt-PgUp.
> Making everyone happy requires either hacks or redesigning very
> fundamental things.

Well, xterm, emacs, ghostview etc. and even netscape & mozilla work just
fine remotely, so I'm disappointed to find Gnome apps have the special
feature of _not_ working remotely.  It is a peculiar weakness, given
that "network" is what the N stands for.

> > Trapping SIGBUS and recovering in a file viewer is important IMO, not
> > least because it is quite easy to do.  You don't even need
> > siginfo->si_addr if you know which file is being parsed for the display
> > at SIGBUS time, and the parsing/display code itself does not need any
> > special code.
> 
> The viewer is written in a highly reentrant manner, but it should be
> possible to set some static variable to the "struct view" containing all
> the information about the current viewer instance before doing "dangerous"
> things, like accessing the mmap()ed memory the first time.

Probably the easiest thing to do is sigsetjmp() as you suggested.

> If SIGBUS can arrive during any memory access, not just the first one,
> then it may be harder to do, but still possible (only the currently
> displayed viewer is likely to be "guilty").

SIGBUS can arrive during any memory access.

> "info libc" (glibc-2.2.4) doesn't mention "siginfo" and "si_addr".  Is it
> Linux-specific?

No, you can find it in lots of modern unix systems.  It seems to be the
standard way of passing information to a signal handler nowadays.
Unfortunately Linux does not implement it properly, though it improves
as the years go by.

The Glibc manual seems to document the "GNU" system, which I have never
seen.  It has features that GNU/Linux does not, and vice versa.

-- Jamie



