Re: [RFC] use reflinks to dedup files with the same content



On Tue, Jan 30, 2018, at 11:41 AM, Giuseppe Scrivano wrote:
> Hi,
>
> I'd like to take advantage of reflinks where possible so that
> deduplication can be achieved also with files that differ only for their
> xattrs.

Right.  One other use case that came up when we were chatting
about this earlier: if we know we have reflinks, then in
the build system case (including e.g. rpm-ostree client-side layering)
we can stop using rofiles-fuse and do reflinks instead.  This wouldn't
be too hard to implement; we'd just need a way to return a "reflinks
worked" bit to users of ostree_repo_checkout() or so.  Or maybe
an API to test beforehand?  That way they can know to fall back to
rofiles-fuse.
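
To make the "reflinks worked" bit and the "test beforehand" idea concrete,
here is a rough sketch in Python rather than C.  It is not libostree code:
try_reflink(), reflinks_supported() and checkout_file() are hypothetical
names, and the plain copy is only a stand-in for whatever fallback the
caller really wants (rofiles-fuse in the build system case).  The only
real interface used is the Linux FICLONE ioctl.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409


def try_reflink(src_path, dst_path):
    """Try to make dst_path a reflink clone of src_path.

    Returns True if the clone worked, False if the filesystem refused
    (for example EOPNOTSUPP, EXDEV, EINVAL or ENOTTY).
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            return False


def reflinks_supported(directory):
    """Hypothetical 'test beforehand' probe: clone a scratch file."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        src = os.path.join(scratch, "src")
        dst = os.path.join(scratch, "dst")
        with open(src, "wb") as f:
            f.write(b"probe")
        return try_reflink(src, dst)


def checkout_file(src_path, dst_path):
    """Check out one file and report the 'reflinks worked' bit.

    The copy below is an assumption made for this sketch; a real
    caller could instead switch to rofiles-fuse when this is False.
    """
    if try_reflink(src_path, dst_path):
        return True
    shutil.copyfile(src_path, dst_path)
    return False
```

A caller would either probe once with reflinks_supported(repo_dir) before
deciding how to check out, or just look at the value returned by
checkout_file() and fall back when it is False.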

> The use case I have in mind is that we will be able to dedup files
> coming from a container image that are already present in the ostree
> repository but with a different SELinux label.

Right, though... how often does that happen?  If we apply the same
SELinux labeling policy on import of containers as we use for the
host, we'll get this for free, right?  BTW this is also
https://github.com/flatpak/flatpak/issues/927
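
For anyone following along, a toy illustration of why a label difference
defeats checksum-based dedup today, and what a separate content checksum
would buy us.  This is only a sketch: object_checksum() is a simplified
stand-in that mixes xattrs into the hash, not the real libostree object
serialization, and all names and label values are made up for the example.

```python
import hashlib
from collections import defaultdict


def object_checksum(content, xattrs):
    """Simplified stand-in: hash the xattrs together with the content."""
    h = hashlib.sha256()
    for name in sorted(xattrs):
        h.update(name.encode() + b"\0" + xattrs[name] + b"\0")
    h.update(content)
    return h.hexdigest()


def content_checksum(content):
    """Hash of the file content only, ignoring all metadata."""
    return hashlib.sha256(content).hexdigest()


def group_by_content(files):
    """Map content checksum -> file names that could share extents."""
    groups = defaultdict(list)
    for name, (content, _xattrs) in files.items():
        groups[content_checksum(content)].append(name)
    return dict(groups)


# Same bytes, different (made-up) SELinux labels.
files = {
    "host/libfoo.so": (b"ELF...", {"security.selinux": b"lib_t"}),
    "container/libfoo.so": (b"ELF...", {"security.selinux": b"container_file_t"}),
}
```

Here the two entries get different object checksums, so they are stored
twice, but group_by_content(files) puts them in one group, which is
exactly the set of files a reflink could share.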
 
> What do you think?  How to store the "content checksum"?

This gets into the fact that our "exploded files" model for local
metadata isn't great.  We should have something like git's packfiles
just for metadata.

I guess a big question here is: do we want to try to add the content
checksum to *upstream* metadata?  I could even imagine doing
this by default for OCI images we build as part of Project Atomic.
Though of course that gets into a discussion there about new metadata
formats and fixing the OCI checksum stability problem.

As far as storing the content checksum we compute locally... IMO a
feature of libostree today is that the storage is so simple.  It's easy to
understand and maintain.  As soon as we involve indexed packfiles or
sqlite or whatever, things get less simple.  I think we'd need to do some
research in this area.
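
Just to show what the "keep it simple" end of the spectrum could look
like, here is a sketch of a flat-file index: one tiny file per content
checksum, holding the checksum of an object with that content.  This is
purely hypothetical; the layout and the names index_path(), record() and
lookup() are invented for illustration and nothing like this exists in
libostree.  It also has the same "exploded files" cost mentioned above.

```python
import os
import tempfile


def index_path(index_dir, content_csum):
    """Fan out on the first two hex digits to keep directories small."""
    return os.path.join(index_dir, content_csum[:2], content_csum[2:])


def record(index_dir, content_csum, object_csum):
    """Remember that object_csum has the content content_csum."""
    path = index_path(index_dir, content_csum)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temporary file and rename, so readers never see a
    # partially written entry.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(object_csum + "\n")
    os.replace(tmp, path)


def lookup(index_dir, content_csum):
    """Return a known object checksum for this content, or None."""
    try:
        with open(index_path(index_dir, content_csum)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None
```

On import, one would call lookup() with the content checksum of the
incoming file and, on a hit, reflink from the existing object instead of
writing the data again.  Whether one small file per entry scales is part
of the research I mean.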

