[Builder] Highlighting

From: Christian Hergert <christian hergert me>
To: builder-list gnome org
Subject: [Builder] Highlighting
Date: Thu, 26 Mar 2015 12:28:58 -0700

First off, I owe you all some other emails about the progress we've made
and what LibIDE is and will become!

But today, we chat about highlighting.

So it's time to start plumbing highlighting into the source view based
on information from clang. There are a few ways we can do this, and they
all have trade-offs.

First off, I want to say that we won't be replacing the c.language
definition that is part of GtkSourceView. This highlighting will be
strictly additive. We might change our minds in the future, but I don't
think that is very likely.

Once we have a valid translation-unit[1] from clang, we can walk through
the AST to get highlight information. `clang_visitChildren()` seems to
be what most people are using for this. A minimal visitor looks
something like this.

```c
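#include <clang-c/Index.h>

/* highlight() is a placeholder that records a style for a source range;
 * it is not part of libclang. */
static CXChildVisitResult
visit_all_children (CXCursor     cursor,
                    CXCursor     parent,
                    CXClientData user_data)
{
  CXSourceRange range = clang_getCursorExtent (cursor);

  switch (clang_getCursorKind (cursor)) {
  case CXCursor_MacroExpansion:
    highlight (user_data, range, "def:preprocessor");
    break;
  // ...
  default:
    break;
  }

  /* Continue moves on to the next sibling without descending; return
   * CXChildVisit_Recurse instead to walk into this cursor's children. */
  return CXChildVisit_Continue;
}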
```

How we go about doing this will have various memory and performance
impacts. We compile the translation-unit at most 2-3x per second. There
are some tricks in clang to "reparse" an existing translation unit
(which makes it faster), but I've yet to take advantage of this. In
general, we don't build a translation unit until we've had 250msec of
delay after the last key-press.
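
To make the 250msec rule concrete, here is a minimal sketch of that kind
of debounce in plain C. All names here (CompileDebounce and friends) are
hypothetical and not Builder's actual code; the caller is assumed to pass
in a monotonic timestamp in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Quiet period required after the last key-press before compiling. */
#define COMPILE_DELAY_MSEC 250

/* Hypothetical state for this sketch; not a Builder type. */
typedef struct {
  int64_t last_keypress_msec; /* time of the most recent key-press */
  bool    dirty;              /* buffer changed since the last compile */
} CompileDebounce;

/* Record a key-press: mark the buffer dirty and restart the quiet period. */
void
compile_debounce_keypress (CompileDebounce *self,
                           int64_t          now_msec)
{
  assert (self != NULL);

  self->last_keypress_msec = now_msec;
  self->dirty = true;
}

/* Returns true exactly once per burst of typing, as soon as the buffer
 * is dirty and has been quiet for COMPILE_DELAY_MSEC. */
bool
compile_debounce_should_compile (CompileDebounce *self,
                                 int64_t          now_msec)
{
  assert (self != NULL);

  if (!self->dirty)
    return false;

  if (now_msec - self->last_keypress_msec < COMPILE_DELAY_MSEC)
    return false;

  self->dirty = false;
  return true;
}
```

Every key-press pushes the deadline out again, so a fast typist never
triggers a compile mid-word.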

So some options:

1) Generate the highlight information on every compile request, and
   build an index that can be used by the highlight engine[2] to apply
   changes to the buffer.

   Pros: All this work is done in a background thread, and then is
         read-only afterwards. Very convenient for the engine to apply
         changes incrementally to the buffer based on word matches.
   Cons: Can result in invalid matches. For example, a struct field
         named "foo" could result in every word "foo" being
         highlighted. There are probably some tricks we can do to
         mitigate this (like only highlighting foo if it comes after
         . or ->).
         This will also have lots of memory churn, since we are
         rebuilding a lot of strings every time. GStringChunk helps,
         but only so much. (At least it will result in large
         contiguous pages being freed when the compilation unit is
         released.)
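
The ". or ->" mitigation could look roughly like the sketch below. It
is only a heuristic over raw buffer text, and the function name is made
up for illustration; a real engine would work on text iters rather than
a C string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the word starting at text[word_start] is preceded by
 * a member-access operator ("." or "->"), ignoring any whitespace
 * between the operator and the word.
 */
bool
follows_member_access (const char *text,
                       size_t      word_start)
{
  size_t i = word_start;

  assert (text != NULL);

  /* Step back over whitespace sitting between the operator and word. */
  while (i > 0 && isspace ((unsigned char)text[i - 1]))
    i--;

  if (i >= 1 && text[i - 1] == '.')
    return true;

  if (i >= 2 && text[i - 2] == '-' && text[i - 1] == '>')
    return true;

  return false;
}
```

With this, "foo" in `self->foo` would qualify as a field match, while a
bare `foo = 1` or `int foo` would not.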

2) Use the translation-unit to update the file directly, in real-time.
   Pros: Correct information (if the buffer has not changed since the
         compilation request). That is hard to guarantee 100% of the
         time, but it happens often enough that you can "clean up" and
         be correct every couple of seconds.
         This also reduces memory bloat, since we reuse the translation
         unit's strings/cursors directly.
   Cons: It's hard to do this incrementally. That means we waste a lot
         of time walking through the entire buffer.
   Todo: Get timing information to update a buffer in real-time. We can
         start with an example like gtktextview.c, which is in the
         10,000+ lines of code realm. If we can highlight the entire
         buffer in < 1-2msec, it might be viable.

3) Use multiple indexes for project-global and file-local symbols.
   If we track the symbols from headers the project includes (such as
   gtk+, glib, etc) separately from the symbols created in the
   project, we can possibly keep the memory bloat under control. Most of
   the symbols we will see during compilation come from external
   files.
   Pros: Reduced duplication of symbols from global headers.
         I have a semi-working prototype of this.
   Cons: Very complex to implement. Hard to keep up to date. Requires
         fancy locking semantics to deal with concurrent compilations.
         Expensive to determine whether a symbol is from the project or
         system-wide.
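
As a rough illustration of the two-index idea, a lookup might consult
the file-local index first and fall back to the shared project-global
one. The types and function names below are hypothetical, the style
names are only examples, and the sketch ignores the locking problem
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical index types for this sketch. */
typedef struct {
  const char *word;  /* symbol name */
  const char *style; /* style to apply, e.g. "def:function" */
} IndexEntry;

typedef struct {
  const IndexEntry *entries;
  size_t            n_entries;
} SymbolIndex;

/* Linear search; returns the style for word, or NULL if not indexed. */
const char *
symbol_index_lookup (const SymbolIndex *index,
                     const char        *word)
{
  assert (index != NULL && word != NULL);

  for (size_t i = 0; i < index->n_entries; i++)
    {
      if (strcmp (index->entries[i].word, word) == 0)
        return index->entries[i].style;
    }

  return NULL;
}

/* File-local symbols win over project-global ones of the same name. */
const char *
layered_lookup (const SymbolIndex *file_local,
                const SymbolIndex *project_global,
                const char        *word)
{
  const char *style = symbol_index_lookup (file_local, word);

  if (style != NULL)
    return style;

  return symbol_index_lookup (project_global, word);
}
```

The global index would be built once and shared by every buffer, which
is where the memory savings (and the locking headaches) come from.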

4) This is a variant of option #1. If we only index the words that are
   found within the translation unit we are building, we can keep the
   index fairly small. (Probably less than the system PAGE_SIZE).
   The next translation unit compile will pick up additional symbols,
   which we can then go and highlight.
   Pros: Reasonably fast to build the index. Low memory usage.
   Cons: A newly typed gtk_widget_show() won't appear highlighted until
         the next compile, unless you've already used that function
         before. Also, this won't make it easy for us to highlight
         member dereferences specially without extra work (the same
         ./->field name trick).
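
Here is a small sketch of what such a per-translation-unit word index
might look like: collect the words during the compile, sort once, and
treat the result as read-only afterwards. WordIndex and its functions
are invented for this example, and the fixed capacity is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define WORD_INDEX_MAX 256 /* arbitrary cap for this sketch */

/* Hypothetical index type; stores borrowed pointers, copies nothing. */
typedef struct {
  const char *words[WORD_INDEX_MAX];
  size_t      n_words;
  bool        sealed; /* true once sorted and read-only */
} WordIndex;

static int
compare_words (const void *a,
               const void *b)
{
  return strcmp (*(const char *const *)a, *(const char *const *)b);
}

void
word_index_init (WordIndex *self)
{
  assert (self != NULL);
  self->n_words = 0;
  self->sealed = false;
}

/* Adds a word seen in the translation unit; false if the index is full. */
bool
word_index_add (WordIndex  *self,
                const char *word)
{
  assert (self != NULL && word != NULL && !self->sealed);

  if (self->n_words == WORD_INDEX_MAX)
    return false;

  self->words[self->n_words++] = word;
  return true;
}

/* Sorts the index so lookups can use binary search. */
void
word_index_seal (WordIndex *self)
{
  assert (self != NULL);
  qsort (self->words, self->n_words, sizeof self->words[0], compare_words);
  self->sealed = true;
}

bool
word_index_contains (const WordIndex *self,
                     const char      *word)
{
  assert (self != NULL && word != NULL && self->sealed);
  return bsearch (&word, self->words, self->n_words,
                  sizeof self->words[0], compare_words) != NULL;
}
```

A function typed after the last compile is simply missing from the
sealed index, so it stays unhighlighted until the next compile builds a
fresh one.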

Currently, I'm leaning towards #4. I think that is how some other IDEs
work, in that it takes a moment for a function to highlight correctly.
(Whereas with our indexing tricks, it's basically immediate.)

I really would like to be able to highlight fields specially when they
are correct such as the "bar" in foo->bar. However, we will know that it
is wrong from the red squiggle underlines, so perhaps we just make
errors change the syntax color to black and we get the same effect with
much less effort.

Anyhow, all the little cheats and tricks here are what make it go fast.
So I'll probably be experimenting a bit on wip/highlight.

-- Christian


[1] A translation unit is the result of translating source code to an
    Abstract Syntax Tree (AST).
[2] IdeHighlightEngine was something I just put together to do
    incremental updates to the source buffer. It only runs for 1msec
    every 50msecs to ensure we stay interactive.
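
The time-slicing in [2] can be sketched as a loop that stops once its
budget is spent and resumes on the next tick. This is not the actual
IdeHighlightEngine code; the names are hypothetical, and the step clock
exists only so the example behaves deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the current monotonic time in microseconds. */
typedef int64_t (*ClockFunc) (void *clock_data);

/* Hypothetical job state: which lines still need highlighting. */
typedef struct {
  size_t next_line; /* first line not yet highlighted */
  size_t n_lines;   /* total lines in the buffer */
} HighlightJob;

/* Deterministic clock for demonstrations: advances on every read. */
typedef struct {
  int64_t now_usec;
  int64_t step_usec;
} StepClock;

int64_t
step_clock_read (void *data)
{
  StepClock *clock = data;

  clock->now_usec += clock->step_usec;
  return clock->now_usec;
}

/*
 * Processes lines until the job is finished or budget_usec has elapsed.
 * Each call handles at least one pending line, so the job always makes
 * progress. Returns the number of lines processed in this slice.
 */
size_t
highlight_job_run_slice (HighlightJob *job,
                         int64_t       budget_usec,
                         ClockFunc     now,
                         void         *clock_data)
{
  int64_t deadline;
  size_t processed = 0;

  assert (job != NULL && now != NULL);

  deadline = now (clock_data) + budget_usec;

  while (job->next_line < job->n_lines)
    {
      /* A real engine would apply tags to line job->next_line here. */
      job->next_line++;
      processed++;

      if (now (clock_data) >= deadline)
        break;
    }

  return processed;
}
```

A caller would schedule highlight_job_run_slice() with a 1000 usec
budget from a 50msec timer until next_line reaches n_lines.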