[tracker/wip/carlosg/serialize-api: 27/36] libtracker-data: Hint best tracker_triples access strategy for DESCRIBE

From: Carlos Garnacho <carlosg src gnome org>
To: commits-list gnome org
Cc:
Subject: [tracker/wip/carlosg/serialize-api: 27/36] libtracker-data: Hint best tracker_triples access strategy for DESCRIBE
Date: Mon, 10 Jan 2022 10:58:35 +0000 (UTC)

commit bfeebe2b121763cf988f70c5a90e9544c392eeac
Author: Carlos Garnacho <carlosg gnome org>
Date:   Wed Nov 24 10:45:14 2021 +0100

    libtracker-data: Hint best tracker_triples access strategy for DESCRIBE
    
    Recent SQLite versions changed which index is picked as the "best" one
    when accessing the tracker_triples table, so the planner gave up
    filtering on the "subject" column even when it could have done so.
    
    Commit 4a08ea0f9b tried to address this by making the tracker_triples
    table provide an estimate of the cost of accessing the table with the
    different constraints, which fixed some cases, such as
    "tracker info <uri>" being able to filter by the given subject.
    
    But this in turn broke other situations where the tracker_triples table
    is matched against a large number of items. It turns out that the SQLite
    query planner doesn't distinguish between:
    
      SELECT * FROM triples_table WHERE subject IN ($small_set)
    
    and:
    
      SELECT * FROM triples_table WHERE subject IN ($large_set)
    
    There is a point past which constraining by "subject" no longer makes
    sense, and it becomes faster to query the whole triples_table at once
    than to access it with many individual "subject" matches.
    
    This is most noticeable on DESCRIBE queries due to the way the
    tracker_triples table is joined, which makes them slow on large
    datasets.
    
    Since we can't tweak the tracker_triples xBestIndex method to pick
    the best behavior here, and the current tracker_triples behavior is
    desirable in most other places, opt for tweaking the SQLite query
    planning in the DESCRIBE query string, so that the subject is made
    ineligible for lookups in the tracker_triples table there.
    
    This brings the "best" of both worlds for DESCRIBE queries, so the
    fastest access method is used depending on whether it is asked to
    describe a fixed set of resources or a generic query.
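
The choice described above can be sketched in isolation as follows. This is a minimal illustration, not the code in tracker-sparql.c: build_describe_filter() and its parameters are hypothetical names, and the real translator appends the fragments piecewise through _append_string(). The only part taken from the commit is the rule itself: prefix "subject" with a unary plus when the DESCRIBE query contains variables, so SQLite does not use that term to constrain an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Tracker): writes the filter used on the
 * triples table for a DESCRIBE query into buf.
 *
 * has_variables: true when the DESCRIBE query names variables, so a WHERE
 *                pattern may yield many subjects.
 * subjects:      the already-translated contents of the IN (...) list.
 *
 * Returns the snprintf() result, i.e. the length the full string needs.
 */
static int
build_describe_filter (char       *buf,
                       size_t      len,
                       bool        has_variables,
                       const char *subjects)
{
	assert (buf != NULL && subjects != NULL);

	/* The unary plus leaves the value of "subject" unchanged, but SQLite
	 * no longer treats the term as usable for an index lookup.
	 */
	return snprintf (buf, len,
	                 "object IS NOT NULL AND %ssubject IN (%s)",
	                 has_variables ? "+" : "",
	                 subjects);
}
```

With has_variables set and "?s" as the subject list, the helper produces "object IS NOT NULL AND +subject IN (?s)"; with a fixed resource list the plus sign is omitted and the subject constraint stays eligible for lookups.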

 src/libtracker-data/tracker-sparql.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
---
diff --git a/src/libtracker-data/tracker-sparql.c b/src/libtracker-data/tracker-sparql.c
index e3c0694df..225554dc8 100644
--- a/src/libtracker-data/tracker-sparql.c
+++ b/src/libtracker-data/tracker-sparql.c
@@ -3360,6 +3360,7 @@ translate_DescribeQuery (TrackerSparql  *sparql,
        TrackerVariable *variable;
        TrackerBinding *binding;
        GList *resources = NULL, *l;
+       gboolean has_variables = FALSE;
        gboolean glob = FALSE;
 
        /* DescribeQuery ::= 'DESCRIBE' ( VarOrIri+ | '*' ) DatasetClause* WhereClause? SolutionModifier
@@ -3383,7 +3384,7 @@ translate_DescribeQuery (TrackerSparql  *sparql,
                _append_string (sparql, "WHERE ");
        }
        _append_string (sparql,
-                       "object IS NOT NULL AND subject IN (");
+                       "object IS NOT NULL ");
 
        if (_accept (sparql, RULE_TYPE_LITERAL, LITERAL_GLOB)) {
                glob = TRUE;
@@ -3408,6 +3409,7 @@ translate_DescribeQuery (TrackerSparql  *sparql,
 
                                variable = tracker_token_get_variable (&resource);
                                binding = tracker_variable_binding_new (variable, NULL, NULL);
+                               has_variables = TRUE;
                        }
 
                        tracker_binding_set_data_type (binding, TRACKER_PROPERTY_TYPE_RESOURCE);
@@ -3418,6 +3420,25 @@ translate_DescribeQuery (TrackerSparql  *sparql,
                tracker_sparql_pop_context (sparql, FALSE);
        }
 
+       if (has_variables) {
+               /* If we have variables, we will likely have a WHERE clause
+                * that will return a moderately large set of results.
+                *
+                * Since the turning point is not that far where it is faster
+                * to query everything from the triples table and filter later
+                * than querying row by row, we soon want the former if there
+                * is a WHERE pattern.
+                *
+                * To hint this to SQLite query planner, use the unary plus
+                * operator to disqualify the term from constraining an index,
+                * (https://www.sqlite.org/optoverview.html#disqualifying_where_clause_terms_using_unary_)
+                * which is exactly what we are meaning to do here.
+                */
+               _append_string (sparql, "AND +subject IN (");
+       } else {
+               _append_string (sparql, "AND subject IN (");
+       }
+
        while (_check_in_rule (sparql, NAMED_RULE_DatasetClause))
                _call_rule (sparql, NAMED_RULE_DatasetClause, error);