On Wed, 2016-12-21 at 14:29 +0100, Carlos Garnacho wrote: Hello Carlos,
Personally think some sort of reference counting will be needed for anonymous nodes references by different graphs..It's not as much reference counting as it is a "weak pointer" what we want here, no? i.e. we want to know whether the reference count dropped to 0 anywhere else, and drop the last bit of info in that case.
That's correct. Full reference counting isn't strictly necessary, just what a weak pointer provides should be sufficient. Reference counting in a column like Resource table would be a possible implementation.
And that seems precisely the info we can obtain from foreign keys with RESTRICT action, deleting the parent key would fail unless it's no longer referenced.
Right.
I also don't think it's a problem to rid ourselves of orphaned anonymous nodes. Without a graph to be owned by, they can't be referenced other than by their uuid anyway.Right, good that you confirm :). I've been playing further with foreign keys, and I think there are two possible ways to properly maintain the Resource table: 1) Doing immediate deletes, requires: a) all rowid columns in class tables to set a foreign key on Resource.ID with ON DELETE CASCADE b) all property columns of type "resource" to set a foreign key on Resource.ID with ON DELETE SET NULL c) Triggering deletion on the Resource table when a resource is to be deleted.
ok
2) Doing deferred deletes, requires: a) all rowid columns in class tables to set a foreign key on Resource.ID with ON DELETE RESTRICT b) all property columns of type "resource" to set a foreign key on Resource.ID with ON DELETE RESTRICT c) running a maintenance task that tries to harvest no longer used resources, which past our initial heuristics (eg. checking it's not in rdfs:Resource table) would still fail if referenced somewhere.
ok
(FWIW, I'm omitting *:graph-id columns management)
ok
#1 allows this broken node harvesting to happen immediately, while #2 still relies on the user being educated enough for the harvesting to be effective. It might be an option though to use ON DELETE CASCADE on 2.b so the references pointing to this node are cut, and we get the good effects of #1 when nodes are harvested. But both fail pretty bad on step b... there's many checks going on in child tables each time a Resource table row is deleted, and those happen on columns that don't have an index. sqlite docs [1] recommend that an index on the child table column(s) holding the foreign key is created. Given our current ontology, that means adding some 140 extra indexes, as each one requires at least an extra page [2], that already means ~0.5MB extra on a database created from cold start, roughly 1/6th of the current size.
ok
Without indexes, the required checks get quite noticeable esp. on mass deletes (eg. tracker reset -f ~/ ), not as much for spare delete ops. And I guess this is where #2 wins, it at least could be made to split the heavy ops in non-noticeable chunks without cramming everything with indexes, at the expense of giving up on timing promises. I need testing with further nodes though, as I'd expect things to get linearly worse with data, both in size with the additional indexes and performance w/o indexes. I kinda expected we'd have to pick our poison here though :).
Might be that just counting references and incrementing them in a column in the Resource table is cheaper than letting SQLite do all the extra required indexes? But right. Without testing, it's hard to know. And if SQLite does it for us with CASCADING Delete and RESTRICT: that's less code for us write. Philip
Attachment:
signature.asc
Description: This is a digitally signed message part