[gjs: 1/6] initial commit of heapgraph scripts

From: Philip Chimento <pchimento src gnome org>
To: commits-list gnome org
Cc:
Subject: [gjs: 1/6] initial commit of heapgraph scripts
Date: Mon, 9 Apr 2018 21:16:56 +0000 (UTC)
commit 32b307463a6fb2c62f985417ea8442c86800b0e6
Author: Andy Holmes <andrew g r holmes gmail com>
Date:   Sat Apr 7 21:32:04 2018 -0700

    initial commit of heapgraph scripts

 tools/README.md    | 174 ++++++++++++++
 tools/heapdot.py   | 269 ++++++++++++++++++++++
 tools/heapgraph.py | 659 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1102 insertions(+)
---
diff --git a/tools/README.md b/tools/README.md
new file mode 100644
index 00000000..16aac275
--- /dev/null
+++ b/tools/README.md
@@ -0,0 +1,174 @@
+# gjs-heapgraph
+
+A heap analyzer for Gjs based on https://github.com/amccreight/heapgraph to aid
+in debugging and plugging memory leaks.
+
+## Resource Usage
+
+Be aware that parsing a heap can take a fair amount of RAM depending on the
+heap size and time depending on the amount of target objects and path length.
+
+Examples of approximate memory and time required to build DOT graphs on an
+IvyBridge i7:
+
+| Heap Size | RAM   | Targets | Time        |
+|-----------|-------|---------|-------------|
+| 5MB       | 80MB  | 1500    | 1.5 Minutes |
+| 30MB      | 425MB | 7700    | 40 Minutes  |
+
+## Basic Usage
+
+### Getting a Heap Dump
+
+Currently the only way to dump a heap is from within a script, although in the
+near future it will be possible to send `SIGUSR1` to a GJS process with the env
+variable `GJS_DEBUG_HEAP_OUTPUT` set:
+
+```
+$ GJS_DEBUG_HEAP_OUTPUT=myApp.heap gjs myApp.js
+$ kill -s SIGUSR1 <gjs-pid>
+```
+
+An example of dumping a heap from within a script:
+
+```js
+const System = imports.system;
+
+// Dumping the heap before the "leak" has happened
+System.dumpHeap('/home/user/myApp1.heap.');
+
+// Code presumably resulting in a leak...
+
+// Running the garbage collector before dumping can avoid some false positives
+System.gc();
+
+// Dumping the heap after the "leak" has happened
+System.dumpHeap('/home/user/myApp2.heap.');
+```
+
+### Output
+
+The default output of `./heapgraph.py` is a tiered tree of paths from root to
+rooted objects. If the output is being sent to a terminal (TTY) some minimal
+ANSI styling is used to make the output more readable. Additionally, anything
+that isn't part of the graph will be sent to `stderr` so the output can be
+directed to a file as plain text. Below is a snippet:
+
+```
+$ ./heapgraph.py file.heap Object > file.tree
+Parsing file.heap...done
+Found 343 targets with type "Object"
+
+$ cat file.tree
+├─[vm_stack[1]]─➤ [Object jsobj@0x7fce60683440]
+│ 
+├─[vm_stack[1]]─➤ [Object jsobj@0x7fce606833c0]
+│ 
+├─[exact-Object]─➤ [Object jsobj@0x7fce60683380]
+│ 
+├─[exact-Object]─➤ [GjsGlobal jsobj@0x7fce60680060]
+│ ├─[Debugger]─➤ [Function Debugger jsobj@0x7fce606a4540]
+│ │ ╰─[Object]─➤ [Function Object jsobj@0x7fce606a9cc0]
+│ │   ╰─[prototype]─➤ [Object (nil) jsobj@0x7fce60681160]
+│ │
+...and so on
+```
+
+`heapgraph.py` can also output DOT graphs that can be a useful way to visualize
+the heap graph, especially if you don't know exactly what you're looking for.
+Passing the `--dot-graph` option will output a DOT graph to `<input-file>.dot`
+in the current working directory.
+
+There are a few choices for viewing dot graphs, and many utilities for
+converting them to other formats like PDF, Tex or GraphML. For Gnome desktops
+[`xdot`](https://github.com/jrfonseca/xdot.py) is a nice lightweight
+Python/Cairo viewer available on PyPi and in most distributions.
+
+```sh
+$ ./heapgraph.py --dot-graph /home/user/myApp2.heap Object
+Parsing file.heap...done
+Found 343 targets with type "Object"
+
+$ xdot myApp2.heap.dot
+```
+
+### Excluding Nodes from the Graph
+
+The exclusion switch you are most likely to use is `--diff-heap` which will
+exclude all nodes in the graph common to that heap, allowing you to easily
+see what's not being collected between two states.
+
+```sh
+$ ./heapgraph --diff-heap myApp1.heap myApp2.heap GObject
+```
+
+You can also exclude Gray Roots, WeakMaps, nodes with a heap address or nodes
+with labels containing a string. Because GObject addresses are part of the node
+label, these can be excluded with `--hide-node` as well.
+
+By default the global object (GjsGlobal aka `window`), imports (GjsModule,
+GjsFileImporter), and namespaces (GIRepositoryNamespace) aren't shown in the
+graph since these are less useful and can't be garbage collected anyways.
+
+```sh
+$ ./heapgraph.py --hide-addr 0x7f6ef022c060 \
+                 --hide-node 'self-hosting-global' \
+                 --no-gray-roots \
+                 /home/user/myApp2.heap Object
+$ ./heapgraph.py --hide-node 0x55e93cf5deb0 /home/user/myApp2.heap Object
+```
+
+### Command-Line Arguments
+
+> **NOTE:** Command line arguments are subject to change; Check
+> `./heapgraph.py --help` before running.
+
+```
+usage: heapgraph.py [-h] [--edge | --function | --string] [--count]
+                    [--dot-graph] [--no-addr] [--diff-heap FILE]
+                    [--no-gray-roots] [--no-weak-maps] [--show-global]
+                    [--show-imports] [--hide-addr ADDR] [--hide-node LABEL]
+                    FILE TARGET
+
+Find what is rooting or preventing an object from being collected in a GJS
+heap using a shortest-path breadth-first algorithm.
+
+positional arguments:
+  FILE                  Garbage collector heap from System.dumpHeap()
+  TARGET                Heap address (eg. 0x7fa814054d00) or type prefix (eg.
+                        Array, Object, GObject, Function...)
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --edge, -e            Treat TARGET as a function name
+  --function, -f        Treat TARGET as a function name
+  --string, -s          Treat TARGET as a string literal or String()
+
+Output Options:
+  --count, -c           Only count the matches for TARGET
+  --dot-graph, -d       Output a DOT graph to FILE.dot
+  --no-addr, -na        Don't show addresses
+
+Node/Root Filtering:
+  --diff-heap FILE, -dh FILE
+                        Don't show roots common to the heap FILE
+  --no-gray-roots, -ng  Don't show gray roots (marked to be collected)
+  --no-weak-maps, -nwm  Don't show WeakMaps
+  --show-global, -g     Show the global object (eg. window/GjsGlobal)
+  --show-imports, -i    Show import and module nodes (eg. imports.foo)
+  --hide-addr ADDR, -ha ADDR
+                        Don't show roots with the heap address ADDR
+  --hide-node LABEL, -hn LABEL
+                        Don't show nodes with labels containing LABEL
+```
+
+## See Also
+
+Below are some links to information relevant to SpiderMonkey garbage collection
+and heap parsing:
+
+* [GC.cpp Comments](https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp)
+* [How JavaScript Objects Are 
Implemented](https://www.infoq.com/presentations/javascript-objects-spidermonkey)
+* [Tracing garbage collection](https://en.wikipedia.org/wiki/Tracing_garbage_collection#Tri-color_marking) 
on Wikipedia
+* [SpiderMonkey Memory](https://gitlab.gnome.org/GNOME/gjs/blob/master/doc/SpiderMonkey_Memory.md) via GJS 
Repo
+
diff --git a/tools/heapdot.py b/tools/heapdot.py
new file mode 100644
index 00000000..e9333ce7
--- /dev/null
+++ b/tools/heapdot.py
@@ -0,0 +1,269 @@
+#!/usr/bin/env python3
+
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
+# heapdot.py - DOT Graph output
+
+
+import re
+
+func_regex = re.compile('Function(?: ([^/]+)(?:/([<|\w]+))?)?')
+gobj_regex = re.compile('([^ ]+) (\(nil\)|0x[a-fA-F0-9]+$)')
+
+
+###############################################################################
+# Shape Compression
+###############################################################################
+
+def findi(m, x):
+    if not x in m:
+        m[x] = [x, 0]
+        return m[x]
+    if m[x][0] == x:
+        return m[x]
+    z = findi (m, m[x][0])
+    m[x] = z
+    return z
+
+
+def find(m, x):
+    return findi(m, x)[0]
+
+
+def union(m, rep, x, y):
+    xp = findi (m, x)
+    yp = findi (m, y)
+    if xp == yp:
+        return
+    if xp[1] < yp[1]:
+        rep[yp[0]] = rep.get(xp[0], xp[0])
+        if xp[0] in rep:
+            del rep[xp[0]]
+        m[xp[0]][0] = yp[0]
+    elif xp[1] > yp[1]:
+        m[yp[0]][0] = xp[0]
+    else:
+        m[yp[0]][0] = xp[0]
+        m[xp[0]][1] += 1
+
+
+def compress_shapes(graph, nodes, edges):
+    shape_merge = {}
+    shape_rep = {}
+
+    def canon_node(x):
+        y = find(shape_merge, x)
+        if y in shape_rep:
+            y = shape_rep[y]
+        return y
+
+    for x in nodes:
+        if graph.node_labels.get(x, '') != 'shape':
+            continue
+        if not x in edges:
+            continue
+        for y in edges[x]:
+            if graph.node_labels.get(y, '') != 'shape' and graph.node_labels.get(y, '') != 'base_shape':
+                continue
+            union(shape_merge, shape_rep, y, x)
+            break
+
+    # Remove merged away nodes
+    for x in shape_merge.keys():
+        if canon_node(x) != x:
+            nodes.remove(x)
+
+    # Update the edges for merging
+    new_edges = {}
+    for x, dsts in edges.items():
+        new_dsts = set([])
+        for y in dsts:
+            new_dsts.add(canon_node(y))
+        x = canon_node(x)
+        if x in new_dsts:
+            new_dsts.remove(x)
+        new_edges[x] = new_edges.get(x, set([])).union(new_dsts)
+    edges = new_edges
+
+
+###############################################################################
+# DOT Graph Output
+###############################################################################
+
+dot_graph_paths = []
+
+
+def add_dot_graph_path(path):
+    dot_graph_paths.append(path)
+
+
+def output_dot_file(args, graph, targs, fname):
+
+    # build the set of nodes
+    nodes = set([])
+    for p in dot_graph_paths:
+        for x in p:
+            nodes.add(x)
+
+    # build the edge map
+    edges = {}
+
+    for p in dot_graph_paths:
+        prevNode = None
+        for x in p:
+            if prevNode:
+                edges.setdefault(prevNode, set([])).add(x)
+            prevNode = x
+
+    # Shape Compression
+    compress_shapes(graph, nodes, edges)
+
+
+    # Write out the DOT graph
+    outf = open(fname, 'w')
+    outf.write('digraph {\n')
+
+    # Nodes
+    for addr in nodes:
+        label = graph.node_labels.get(addr, '')
+        color = 'black'
+        style = 'solid'
+        shape = 'rect'
+        native = ''
+
+        if label.endswith('<no private>'):
+            label = label[:-13]
+
+        # Lookup the edge label for this node
+        elabel = ''
+
+        for origin in graph.edge_labels.values():
+            if addr in origin:
+                elabels = origin[addr]
+                elabel = elabels[0]
+                break
+
+
+        # GObject or something else with a native address
+        gm = gobj_regex.match(label)
+
+        if gm:
+            label = gm.group(1)
+            color = 'orange'
+            style = 'bold'
+
+            if not args.no_addr:
+                native = gm.group(2)
+
+            # Some kind of GObject
+            if label.startswith('GObject_'):
+                shape = 'circle'
+
+                if elabel in ['prototype', 'group_proto']:
+                    style += ',dashed'
+            # Another object native to Gjs
+            elif label.startswith('Gjs') or label.startswith('GIR'):
+                shape = 'octagon'
+        elif label.startswith('Function'):
+            fm = func_regex.match(label)
+
+            if fm.group(2) == '<':
+                label = 'Function via {}()'.format(fm.group(1))
+            elif fm.group(2):
+                label = 'Function {} in {}'.format(fm.group(2), fm.group(1))
+            else:
+                if len(label) > 10:
+                    label = label[9:]
+                label += '()'
+
+            color = 'green'
+            style = 'bold,rounded'
+        # A function context
+        elif label == 'Call' or label == 'LexicalEnvironment':
+            color = 'green'
+            style = 'bold,dashed'
+        # A file/script reference
+        elif label.startswith('script'):
+            label = label[7:].split('/')[-1]
+            shape = 'note'
+            color = 'blue'
+        # A WeakMap
+        elif label.startswith('WeakMap'):
+            label = 'WeakMap'
+            style = 'dashed'
+        # Mostly uninteresting objects
+        elif label in ['base_shape', 'object_group', 'type_object']:
+            style = 'dotted'
+            if label == 'base_shape':
+                label = 'shape'
+            elif label == 'type_object':
+                label = 'type'
+
+        # Only mark the target if it's a single match
+        if addr == targs[0] and len(targs) == 1:
+            color = 'red'
+            style = 'bold'
+
+        if args.no_addr:
+            outf.write('  node [label="{0}", color={1}, shape={2}, style="{3}"] q{4};\n'.format(label, 
color, shape, style, addr))
+        else:
+            if native:
+                outf.write('  node [label="{0}\\njsobj@{4}\\nnative@{5}", color={1}, shape={2}, style="{3}"] 
q{4};\n'.format(label, color, shape, style, addr, native))
+            else:
+                outf.write('  node [label="{0}\\njsobj@{4}", color={1}, shape={2}, style="{3}"] 
q{4};\n'.format(label, color, shape, style, addr))
+
+    # Edges (relationships)
+    for origin, destinations in edges.items():
+        for destination in destinations:
+            labels = graph.edge_labels.get(origin, {}).get(destination, [])
+            ll = []
+
+            for l in labels:
+                if len(l) == 2:
+                    l = l[0]
+                if l.startswith('**UNKNOWN SLOT '):
+                    continue
+                ll.append(l)
+
+            label = ''
+            style = 'solid'
+            color = 'black'
+
+            if len(ll) == 1:
+                label = ll[0]
+
+                # Object children
+                if label.startswith('objects['):
+                    label = label[7:]
+                # Array elements
+                elif label.startswith('objectElements['):
+                    label = label[14:]
+                # prototype/constructor function
+                elif label in ['prototype', 'group_proto']:
+                    color = 'orange'
+                    style = 'bold,dashed'
+                # fun_environment
+                elif label == 'fun_environment':
+                    label = ''
+                    color = 'green'
+                    style = 'bold,dashed'
+                elif label == 'script':
+                    label = ''
+                    color = 'blue'
+                # Signals
+                # TODO: better heap label via gi/closure.cpp & gi/object.cpp
+                elif label == 'signal connection':
+                    color = 'red'
+                    style = 'bold,dashed'
+
+                if len(label) > 18:
+                    label = label[:8] + '...' + label[-8:]
+            else:
+                label = ',\\n'.join(ll)
+
+            outf.write('  q{0} -> q{1} [label="{2}", color={3}, style="{4}"];\n'.format(origin, destination, 
label, color, style))
+
+    outf.write('}\n')
+    outf.close()
diff --git a/tools/heapgraph.py b/tools/heapgraph.py
new file mode 100755
index 00000000..4b54b1f5
--- /dev/null
+++ b/tools/heapgraph.py
@@ -0,0 +1,659 @@
+#!/usr/bin/env python3
+
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
+# heapgraph.py - Top-level script for interpreting Garbage Collector heaps
+
+import argparse
+import copy
+from collections import namedtuple
+from collections import deque
+import os
+import re
+import sys
+
+try:
+    from heapdot import output_dot_file
+    from heapdot import add_dot_graph_path
+except ImportError:
+    sys.stderr.write('DOT graph output not available\n')
+
+
+########################################################
+# Command line arguments.
+########################################################
+
+parser = argparse.ArgumentParser(description='Find what is rooting or preventing an object from being 
collected in a GJS heap using a shortest-path breadth-first algorithm.')
+
+parser.add_argument('heap_file', metavar='FILE',
+                    help='Garbage collector heap from System.dumpHeap()')
+
+parser.add_argument('target', metavar='TARGET',
+                    help='Heap address (eg. 0x7fa814054d00) or type prefix (eg. Array, Object, GObject, 
Function...)')
+
+### Target Options
+targ_opts = parser.add_mutually_exclusive_group()
+
+
+targ_opts.add_argument('--edge', '-e', dest='edge_target', action='store_true',
+                       default=False,
+                       help='Treat TARGET as a function name')
+
+targ_opts.add_argument('--function', '-f', dest='func_target', action='store_true',
+                       default=False,
+                       help='Treat TARGET as a function name')
+
+targ_opts.add_argument('--string', '-s', dest='string_target', action='store_true',
+                       default=False,
+                       help='Treat TARGET as a string literal or String()')
+
+### Output Options
+out_opts = parser.add_argument_group('Output Options')
+
+out_opts.add_argument('--count', '-c', dest='count', action='store_true',
+                      default=False,
+                      help='Only count the matches for TARGET')
+
+out_opts.add_argument('--dot-graph', '-d', dest='dot_graph',
+                      action='store_true', default=False,
+                      help='Output a DOT graph to FILE.dot')
+
+out_opts.add_argument('--no-addr', '-na', dest='no_addr',
+                      action='store_true', default=False,
+                      help='Don\'t show addresses')
+
+### Node and Root Filtering
+filt_opts = parser.add_argument_group('Node/Root Filtering')
+
+filt_opts.add_argument('--diff-heap', '-dh', dest='diff_heap', action='store',
+                      metavar='FILE',
+                      help='Don\'t show roots common to the heap FILE')
+
+filt_opts.add_argument('--no-gray-roots', '-ng', dest='no_gray_roots',
+                       action='store_true', default=False,
+                       help='Don\'t show gray roots (marked to be collected)')
+
+filt_opts.add_argument('--no-weak-maps', '-nwm', dest='no_weak_maps',
+                       action='store_true', default=False,
+                       help='Don\'t show WeakMaps')
+
+filt_opts.add_argument('--show-global', '-g', dest='show_global',
+                       action='store_true', default=False,
+                       help='Show the global object (eg. window/GjsGlobal)')
+
+filt_opts.add_argument('--show-imports', '-i', dest='show_imports',
+                       action='store_true', default=False,
+                       help='Show import and module nodes (eg. imports.foo)')
+
+filt_opts.add_argument('--hide-addr', '-ha', dest='hide_addrs', action='append',
+                       metavar='ADDR', default=[],
+                       help='Don\'t show roots with the heap address ADDR')
+
+filt_opts.add_argument('--hide-node', '-hn', dest='hide_nodes', action='append',
+                       metavar='LABEL', default=['GIRepositoryNamespace', 'GjsFileImporter', 'GjsGlobal', 
'GjsModule'],
+                       help='Don\'t show nodes with labels containing LABEL')
+
+
+###############################################################################
+# Heap Patterns
+###############################################################################
+
+GraphAttribs = namedtuple('GraphAttribs', 'edge_labels node_labels roots root_labels weakMapEntries')
+WeakMapEntry = namedtuple('WeakMapEntry', 'weakMap key keyDelegate value')
+
+
+addr_regex = re.compile('[A-F0-9]+$|0x[a-f0-9]+$')
+node_regex = re.compile ('((?:0x)?[a-fA-F0-9]+) (?:(B|G|W) )?([^\r\n]*)\r?$')
+edge_regex = re.compile ('> ((?:0x)?[a-fA-F0-9]+) (?:(B|G|W) )?([^\r\n]*)\r?$')
+wme_regex = re.compile ('WeakMapEntry map=([a-zA-Z0-9]+|\(nil\)) key=([a-zA-Z0-9]+|\(nil\)) 
keyDelegate=([a-zA-Z0-9]+|\(nil\)) value=([a-zA-Z0-9]+)\r?$')
+
+func_regex = re.compile('Function(?: ([^/]+)(?:/([<|\w]+))?)?')
+gobj_regex = re.compile('([^ ]+) (0x[a-fA-F0-9]+$)')
+
+###############################################################################
+# Heap Parsing
+###############################################################################
+
+def parse_roots(fobj):
+    """Parse the roots portion of a garbage collector heap."""
+
+    roots = {}
+    root_labels = {}
+    weakMapEntries = []
+
+    for line in fobj:
+        node = node_regex.match(line)
+
+        if node:
+            addr = node.group(1)
+            color = node.group(2)
+            label = node.group(3)
+
+            # Only overwrite an existing root with a black root.
+            if addr not in roots or color == 'B':
+                roots[addr] = (color == 'B')
+                # It would be classier to save all the root labels, though then
+                # we have to worry about gray vs black.
+                root_labels[addr] = label
+        else:
+            wme = wme_regex.match(line)
+
+            if wme:
+                weakMapEntries.append(WeakMapEntry(weakMap=wme.group(1),
+                                                   key=wme.group(2),
+                                                   keyDelegate=wme.group(3),
+                                                   value=wme.group(4)))
+            # Skip comments, arenas, compartments and zones
+            elif line[0] == '#':
+                continue
+            # Marks the end of the roots section
+            elif line[:10] == '==========':
+                break
+            else:
+                sys.stderr.write('Error: unknown line {}\n'.format(line))
+                exit(-1)
+
+    return [roots, root_labels, weakMapEntries]
+
+
+def parse_graph(fobj):
+    """Parse the node and edges of a garbage collector heap."""
+
+    edges = {}
+    edge_labels = {}
+    node_labels = {}
+
+    def addNode (addr, node_label):
+        edges[addr] = {}
+        edge_labels[addr] = {}
+
+        if node_label != '':
+            node_labels[addr] = node_label
+
+    def addEdge(source, target, edge_label):
+        edges[source][target] = edges[source].get(target, 0) + 1
+
+        if edge_label != '':
+            edge_labels[source].setdefault(target, []).append(edge_label)
+
+    node_addr = None
+
+    for line in fobj:
+        e = edge_regex.match(line)
+
+        if e:
+            if node_addr not in args.hide_addrs:
+                addEdge(node_addr, e.group(1), e.group(3))
+        else:
+            node = node_regex.match(line)
+
+            if node:
+                node_addr = node.group(1)
+                node_color = node.group(2)
+                node_label = node.group(3)
+
+                # Use this opportunity to map hide_nodes to addresses
+                for hide_node in args.hide_nodes:
+                    if hide_node in node_label:
+                        args.hide_addrs.append(node_addr)
+                        break
+                else:
+                    addNode(node_addr, node_label)
+            # Skip comments, arenas, compartments and zones
+            elif line[0] == '#':
+                continue
+            else:
+                sys.stderr.write('Error: Unknown line: {}\n'.format(line[:-1]))
+
+    # yar, should pass the root crud in and wedge it in here, or somewhere
+    return [edges, edge_labels, node_labels]
+
+
+def parse_heap(fname):
+    """Parse a garbage collector heap."""
+
+    try:
+        fobj = open(fname, 'r')
+    except:
+        sys.stderr.write('Error opening file {}\n'.format(fname))
+        exit(-1)
+
+    [roots, root_labels, weakMapEntries] = parse_roots(fobj)
+    [edges, edge_labels, node_labels] = parse_graph(fobj)
+    fobj.close()
+
+    graph = GraphAttribs(edge_labels=edge_labels, node_labels=node_labels,
+                         roots=roots, root_labels=root_labels,
+                         weakMapEntries=weakMapEntries)
+
+    return (edges, graph)
+
+
+def find_nodes(fname):
+    """Parse a garbage collector heap and return a list of node addresses."""
+
+    addrs = [];
+
+    try:
+        fobj = open(fname, 'r')
+        sys.stderr.write('Parsing {0}...'.format(fname))
+    except:
+        sys.stderr.write('Error opening file {}\n'.format(fname))
+        exit(-1)
+
+    # Whizz past the roots
+    for line in fobj:
+        if '==========\n' in line:
+            break
+
+    for line in fobj:
+        node = node_regex.match(line)
+
+        if node:
+            addrs.append(node.group(1))
+
+    fobj.close()
+
+    sys.stderr.write('done\n')
+    sys.stderr.flush()
+
+    return addrs
+
+
+
+# Some applications may not care about multiple edges.
+# They can instead use a single graph, which is represented as a map
+# from a source node to a set of its destinations.
+def to_single_graph(edges):
+    single_graph = {}
+
+    for origin, destinations in edges.items():
+        d = set([])
+        for destination, distance in destinations.items():
+            d.add(destination)
+        single_graph[origin] = d
+
+    return single_graph
+
+
+def load_graph(fname):
+    sys.stderr.write('Parsing {0}...'.format(fname))
+    (edges, graph) = parse_heap(fname)
+    edges = to_single_graph(edges)
+    sys.stderr.write('done\n')
+
+    sys.stderr.flush()
+
+    return (edges, graph)
+
+
+###############################################################################
+# Path printing
+###############################################################################
+
+tree_graph_paths = {}
+
+
+class style:
+    DIM = '\033[30m'
+    BOLD = '\033[1m'
+    ITALIC = '\033[3m'
+    UNDERLINE = '\033[4m'
+    END = '\033[0m'
+
+
+def get_edge_label(graph, origin, destination):
+    elabel = lambda l: l[0] if len(l) == 2 else l
+    labels = graph.edge_labels.get(origin, {}).get(destination, [])
+
+    if len(labels) == 1:
+        label = labels[0]
+
+        if label == 'signal connection':
+            return 'GSignal'
+        else:
+            return label
+    elif len(labels) > 1:
+        return ', '.join([elabel(l) for l in labels])
+    else:
+        return ''
+
+
+def get_node_label(graph, addr):
+    label = graph.node_labels[addr]
+
+    if label.endswith(' <no private>'):
+        label = label[:-13]
+
+    if label.startswith('Function '):
+        fm = func_regex.match(label)
+
+        if fm.group(2) == '<':
+            return 'Function via {}'.format(fm.group(1))
+        elif fm.group(2):
+            return 'Function {} in {}'.format(fm.group(2), fm.group(1))
+        else:
+            return label
+    if label.startswith('script'):
+        label = label[7:].split('/')[-1]
+    elif label.startswith('WeakMap'):
+        label = 'WeakMap'
+    elif label == 'base_shape':
+        label = 'shape'
+    elif label == 'type_object':
+        label = 'type'
+
+    return label[:50]
+
+
+def get_root_label(graph, root):
+    # This won't work on Windows.
+    #label = re.sub(r'0x[0-9a-f]{8}', '*', graph.root_labels[root])
+    label = graph.root_labels[root]
+    return '(ROOT) {}'.format(label)
+
+
+def output_tree_graph(graph, data, base='', parent=''):
+    while data:
+        addr, children = data.popitem()
+
+        # Labels
+        if parent:
+            edge = get_edge_label(graph, parent, addr)
+        else:
+            edge = graph.root_labels[addr]
+
+        node = get_node_label(graph, addr)
+        has_native = gobj_regex.match(node)
+
+        # Color/Style
+        if os.isatty(1):
+            if parent:
+                edge = style.DIM + edge + style.END
+            else:
+                edge = style.ITALIC + edge + style.END
+
+            orig = style.UNDERLINE + 'jsobj@' + addr + style.END
+
+            if has_native:
+                node = style.BOLD + has_native.group(1) + style.END
+                orig += ' ' + style.UNDERLINE + 'native@' + has_native.group(2) + style.END
+            else:
+                node = style.BOLD + node + style.END
+        else:
+            orig = 'jsobj@' + addr
+
+            if has_native:
+                node = has_native.group(1)
+                orig += ' native@' + has_native.group(2)
+
+        # Print the path segment
+        if args.no_addr:
+            if data:
+                print('{0}├─[{1}]─➤ [{2}]'.format(base, edge, node))
+            else:
+                print('{0}╰─[{1}]─➤ [{2}]'.format(base, edge, node))
+        else:
+            if data:
+                print('{0}├─[{1}]─➤ [{2} {3}]'.format(base, edge, node, orig))
+            else:
+                print('{0}╰─[{1}]─➤ [{2} {3}]'.format(base, edge, node, orig))
+
+        # Print child segments
+        if children:
+            if data:
+                output_tree_graph(graph, children, base + '│ ', addr)
+            else:
+                output_tree_graph(graph, children, base + '  ', addr)
+        else:
+            if data:
+                print(base + '│ ')
+            else:
+                print(base + '  ')
+
+
+def add_tree_graph_path(owner, path):
+    o = owner.setdefault(path.pop(0), {})
+    if path:
+        add_tree_graph_path(o, path)
+
+
+def add_path(args, graph, path):
+    if args.dot_graph:
+        add_dot_graph_path(path)
+    else:
+        add_tree_graph_path(tree_graph_paths, path)
+
+
+###############################################################################
+# Breadth-first shortest path finding.
+###############################################################################
+
+def find_roots_bfs(args, edges, graph, target):
+    workList = deque()
+    distances = {}
+
+    def traverseWeakMapEntry(dist, k, m, v, label):
+        if not k in distances or not m in distances:
+            # Haven't found either the key or map yet.
+            return
+
+        if distances[k][0] > dist or distances[m][0] > dist:
+            # Either the key or the weak map is farther away, so we
+            # must wait for the farther one before processing it.
+            return
+
+        if v in distances:
+            return
+
+        distances[v] = (dist + 1, k, m, label)
+        workList.append(v)
+
+
+    # For now, ignore keyDelegates.
+    weakData = {}
+    for wme in graph.weakMapEntries:
+        weakData.setdefault(wme.weakMap, set([])).add(wme)
+        weakData.setdefault(wme.key, set([])).add(wme)
+        if wme.keyDelegate != '0x0':
+            weakData.setdefault(wme.keyDelegate, set([])).add(wme)
+
+    # Unlike JavaScript objects, GObjects can be "rooted" by their refcount so
+    # we have to use a fake root (repetitively)
+    startObject = 'FAKE START OBJECT'
+    rootEdges = set([])
+
+    for addr, isBlack in graph.roots.items():
+        if isBlack or not args.no_gray_roots:
+            rootEdges.add(addr)
+
+    #FIXME:
+    edges[startObject] = rootEdges
+    distances[startObject] = (-1, None)
+    workList.append(startObject)
+
+    # Search the graph.
+    while workList:
+        origin = workList.popleft()
+        dist = distances[origin][0]
+
+        # Found the target, stop digging
+        if origin == target:
+            continue
+
+        # origin does not point to any other nodes.
+        if not origin in edges:
+            continue
+
+        for destination in edges[origin]:
+            if destination not in distances:
+                distances[destination] = (dist + 1, origin)
+                workList.append(destination)
+
+        if origin in weakData:
+            for wme in weakData[origin]:
+                traverseWeakMapEntry(dist, wme.key, wme.weakMap, wme.value,
+                                     "value in WeakMap " + wme.weakMap)
+                traverseWeakMapEntry(dist, wme.keyDelegate, wme.weakMap, wme.key,
+                                     "key delegate in WeakMap " + wme.weakMap)
+
+
+    # Print out the paths by unwinding backwards to generate a path,
+    # then print the path. Accumulate any weak maps found during this
+    # process into the printWorkList queue, and print out what keeps
+    # them alive. Only print out why each map is alive once.
+    printWorkList = deque()
+    printWorkList.append(target)
+    printedThings = set([target])
+
+    while printWorkList:
+        p = printWorkList.popleft()
+        path = []
+        while p in distances:
+            path.append(p)
+            dist = distances[p]
+            if len(dist) == 2:
+                [_, p] = dist
+            else:
+                # The weak map key is probably more interesting,
+                # so follow it, and worry about the weak map later.
+                [_, k, m, label] = dist
+
+                graph.edge_labels[k].setdefault(p, []).append(label)
+                p = k
+                if not m in printedThings and not args.no_weak_maps:
+                    printWorkList.append(m)
+                    printedThings.add(m)
+
+        if path:
+            path.pop()
+            path.reverse()
+            add_path(args, graph, path)
+
+
+########################################################
+# Target selection
+########################################################
+
+def target_edge(graph, target):
+    targets = []
+
+    for origin, destinations in graph.edge_labels.items():
+        for destination in destinations:
+            if target in graph.edge_labels[origin][destination]:
+                targets.append(destination)
+
+    sys.stderr.write('Found {} objects with edge label of {}\n'.format(len(targets), target))
+    return targets
+
+
+def target_func(graph, target):
+    targets = []
+
+    for addr, label in graph.node_labels.items():
+        if not label[:9] == 'Function ':
+            continue
+
+        if label[9:] == target:
+            targets.append(addr)
+
+    sys.stderr.write('Found {} functions named "{}"\n'.format(len(targets), target))
+    return targets
+
+
+def target_gobject(graph, target):
+    targets = []
+
+    for addr, label in graph.node_labels.items():
+        if label.endswith(target):
+            targets.append(addr)
+
+    sys.stderr.write('Found GObject with address of {}\n'.format(target))
+    return targets
+
+
+def target_string(graph, target):
+    targets = []
+
+    for addr, label in graph.node_labels.items():
+        if label[:7] == 'string ' and target in label[7:]:
+            targets.append(addr)
+        elif label[:10] == 'substring ' and target in label[10:]:
+            targets.append(addr)
+
+    sys.stderr.write('Found {} strings containing "{}"\n'.format(len(targets), target))
+    return targets
+
+
+def target_type(graph, target):
+    targets = []
+
+    for addr in edges.keys():
+        if graph.node_labels.get(addr, '')[0:len(args.target)] == args.target:
+            targets.append(addr)
+
+    sys.stderr.write('Found {} targets with type "{}"\n'.format(len(targets), target))
+    return targets
+
+
+def select_targets(args, edges, graph):
+    if args.edge_target:
+        return target_edge(graph, args.target)
+    elif args.func_target:
+        return target_func(graph, args.target)
+    elif args.string_target:
+        return target_string(graph, args.target)
+    # If args.target seems like an address search for a JS Object, then GObject
+    elif addr_regex.match(args.target):
+        if args.target in edges:
+            sys.stderr.write('Found object with address "{}"\n'.format(args.target))
+            return [args.target]
+        else:
+            return target_gobject(graph, args.target)
+
+    # Fallback to looking for JavaScript objects by class name
+    return target_type(graph, args.target)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args()
+
+    # Node and Root Filtering
+    if args.show_global:
+        args.hide_nodes.remove('GjsGlobal')
+    if args.show_imports:
+        args.hide_nodes.remove('GjsFileImporter')
+        args.hide_nodes.remove('GjsModule')
+        args.hide_nodes.remove('GIRepositoryNamespace')
+
+    # Make sure we don't filter an explicit target
+    if args.target in args.hide_nodes:
+        args.hide_nodes.remove(args.target)
+
+    # Heap diffing; we do these addrs separately due to the sheer amount
+    diff_addrs = []
+
+    if args.diff_heap:
+        diff_addrs = find_nodes(args.diff_heap)
+
+    # Load the graph
+    (edges, graph) = load_graph(args.heap_file)
+    targets = select_targets(args, edges, graph)
+
+    if len(targets) == 0:
+        sys.stderr.write('No targets found for "{}".\n'.format(args.target))
+        sys.exit(-1)
+    elif args.count:
+        sys.exit(-1);
+
+    for addr in targets:
+        if addr in edges and addr not in diff_addrs:
+            find_roots_bfs(args, edges, graph, addr)
+
+    if args.dot_graph:
+        output_dot_file(args, graph, targets, args.heap_file + ".dot")
+    else:
+        output_tree_graph(graph, tree_graph_paths)
+
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]