[tracker/wip/carlosg/serialize-api: 27/36] libtracker-data: Hint best tracker_triples access strategy for DESCRIBE
- From: Carlos Garnacho <carlosg src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [tracker/wip/carlosg/serialize-api: 27/36] libtracker-data: Hint best tracker_triples access strategy for DESCRIBE
- Date: Mon, 10 Jan 2022 10:58:35 +0000 (UTC)
commit bfeebe2b121763cf988f70c5a90e9544c392eeac
Author: Carlos Garnacho <carlosg gnome org>
Date: Wed Nov 24 10:45:14 2021 +0100
libtracker-data: Hint best tracker_triples access strategy for DESCRIBE
With recent SQLite versions, there have been changes that affected the
"best" index picked when accessing the tracker_triples table, causing it
to give up filtering on the "subject" column even when it could do that.
Commit 4a08ea0f9b tried to address this by making the tracker_triples
table provide an estimation of the cost of accessing the table with the
different constraints. This fixed some cases, e.g. it let "tracker info
<uri>" filter by the given subject.
But this in turn broke other situations where the tracker_triples table
is matched against a large number of items. It turns out that the SQLite
query planner does not distinguish between:

    SELECT * FROM triples_table WHERE subject IN ($small_set)

and:

    SELECT * FROM triples_table WHERE subject IN ($large_set)

There is a point where using the "subject" constraint no longer makes
sense: rather than looking up the triples_table once for each of the
many "subject" matches, it becomes faster to query everything at once.
This is most noticeable in DESCRIBE queries due to the way the
tracker_triples table is joined there, which makes them slow on large
datasets.
Since we can't tweak the tracker_triples xBestIndex method to pick
the best behavior here, and the current tracker_triples behavior is
desirable in most other places, opt for tweaking the SQLite query
planning in the DESCRIBE query string instead, so the subject is made
ineligible for index lookups in the tracker_triples table when the
DESCRIBE query has variables.
This brings the best of both worlds to DESCRIBE queries: the fastest
access method is picked depending on whether the query describes a fixed
set of resources or the results of a WHERE pattern.
src/libtracker-data/tracker-sparql.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
---
diff --git a/src/libtracker-data/tracker-sparql.c b/src/libtracker-data/tracker-sparql.c
index e3c0694df..225554dc8 100644
--- a/src/libtracker-data/tracker-sparql.c
+++ b/src/libtracker-data/tracker-sparql.c
@@ -3360,6 +3360,7 @@ translate_DescribeQuery (TrackerSparql *sparql,
TrackerVariable *variable;
TrackerBinding *binding;
GList *resources = NULL, *l;
+ gboolean has_variables = FALSE;
gboolean glob = FALSE;
/* DescribeQuery ::= 'DESCRIBE' ( VarOrIri+ | '*' ) DatasetClause* WhereClause? SolutionModifier
@@ -3383,7 +3384,7 @@ translate_DescribeQuery (TrackerSparql *sparql,
_append_string (sparql, "WHERE ");
}
_append_string (sparql,
- "object IS NOT NULL AND subject IN (");
+ "object IS NOT NULL ");
if (_accept (sparql, RULE_TYPE_LITERAL, LITERAL_GLOB)) {
glob = TRUE;
@@ -3408,6 +3409,7 @@ translate_DescribeQuery (TrackerSparql *sparql,
variable = tracker_token_get_variable (&resource);
binding = tracker_variable_binding_new (variable, NULL, NULL);
+ has_variables = TRUE;
}
tracker_binding_set_data_type (binding, TRACKER_PROPERTY_TYPE_RESOURCE);
@@ -3418,6 +3420,25 @@ translate_DescribeQuery (TrackerSparql *sparql,
tracker_sparql_pop_context (sparql, FALSE);
}
+ if (has_variables) {
+ /* If we have variables, we will likely have a WHERE clause
+ * that will return a moderately large set of results.
+ *
+ * Since the turning point is not that far where it is faster
+ * to query everything from the triples table and filter later
+ * than querying row by row, we soon want the former if there
+ * is a WHERE pattern.
+ *
+ * To hint this to SQLite query planner, use the unary plus
+ * operator to disqualify the term from constraining an index,
+ * (https://www.sqlite.org/optoverview.html#disqualifying_where_clause_terms_using_unary_)
+ * which is exactly what we are meaning to do here.
+ */
+ _append_string (sparql, "AND +subject IN (");
+ } else {
+ _append_string (sparql, "AND subject IN (");
+ }
+
while (_check_in_rule (sparql, NAMED_RULE_DatasetClause))
_call_rule (sparql, NAMED_RULE_DatasetClause, error);