json-meta:// store #9551

Draft
wants to merge 1 commit into
base: master
from


Ericson2314
Member

@Ericson2314 Ericson2314 commented Dec 6, 2023

Motivation

Way back in c1a07f9 Nix switched to using SQLite, and for good reason: it makes a lot of operations much faster. But it is still useful to have a simple text file metadata alternative for a few niche use-cases:

  • Broadcasting: if one wants to serve a store over NFS to a very large number of consumers in a pub-sub manner, it is disadvantageous for all writes to modify the database file. Separate files per separate "rows" avoid synchronization.

  • IPFS, the jankier way: No SQLite, no .nar is a quick and dirty way to get a filesystem store representation that a tool like IPFS could mirror pretty well. Of course, I am personally fond of a deeper integration where we do things like

    • make references actually point to other valid path info objects
    • make valid path info objects actually point to the data they own

    by persisting the JSON into IPFS's native JSON representation rather than just JSON-in-a-file, but I suppose it is good to not let the perfect be the enemy of the good.

  • High Security Stores: It is hard to audit the contents of a SQLite database. Even if we can be sure that the SQL program only reads out good data, since it is an opaque binary format there may still be opportunities for steganography. More broadly, database performance is in fundamental tension with restricting to normal forms. A store that has everything in plain text is easy to hand-audit, and therefore better suited to be a secure (albeit slow) store for various purposes.

Separate from the feature itself, I also think this is a good exercise to disentangle a store being "local" from a store using SQLite. Having a second tiny implementation ensures we don't start "overfitting" to SQLite in various ways, e.g. by encouraging us to factor out the parts of LocalStore that don't have to do with SQLite.

This is a little toy store that stores ValidPathInfos and Realisations in JSON format in a separate directory. This is an on-disk format that is very easy to work with (easier than narinfo line format, I think).
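To make the "one JSON file per row" idea concrete, here is a minimal sketch in Python. It is not the format this PR writes: the directory layout, the field names (`narHash`, `narSize`, `references`), and the helper names `write_path_info` / `read_path_info` are all assumptions made up for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_path_info(meta_dir: Path, base_name: str, info: dict) -> Path:
    """Write one store object's metadata to its own JSON file.

    Hypothetical layout: <meta_dir>/<base_name>.json. The data is written
    to a temporary file first and then renamed into place, so readers do
    not observe a half-written file.
    """
    meta_dir.mkdir(parents=True, exist_ok=True)
    target = meta_dir / f"{base_name}.json"
    fd, tmp_name = tempfile.mkstemp(dir=str(meta_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        # Sorted keys and fixed indentation give a stable, diffable form.
        json.dump(info, f, sort_keys=True, indent=2)
    os.replace(tmp_name, target)
    return target


def read_path_info(meta_dir: Path, base_name: str) -> dict:
    """Read the metadata for one store object back from its JSON file."""
    return json.loads((meta_dir / f"{base_name}.json").read_text())
```

Because each store object gets its own file, adding an object never touches a file that another object owns, which is the property the broadcasting use case above relies on.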

Context

TODO tests, but this is a general problem for more store implementations that we need to solve properly once and for all. E.g. #9429 has the same issue.

Priorities

Add 👍 to pull requests you find important.

CC @RaitoBezarius @flokli @ryantm @danielfullmer

@github-actions github-actions bot added the store label Dec 6, 2023
@edolstra
Member

edolstra commented Dec 6, 2023

Before Nix used SQLite for the Nix store (and after it used Berkeley DB), it used flat files to store metadata (b0e92f6). Apart from performance and disk space overhead, the main problem was ensuring transactional semantics for the referrers mapping (especially needed for garbage collection). But that might not be a problem for some use cases.

High Security Stores: It is hard to audit the contents of a SQLite database.

I'm not convinced by this argument, since SQLite is one of the most-used and best-tested pieces of software out there. Certainly better tested than an ad hoc metadata store, even if it contains JSON.

@Ericson2314
Member Author

The main problem was ensuring transactional semantics for the referrers mapping (especially needed for garbage collection)

Yes, I didn't implement any of that, on purpose. References are just stored "forward" as part of the valid path info JSON. This makes adding new store objects easy / well isolated, and everything else a pain in the ass :). Very intentional!
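A rough sketch of what "everything else is a pain" means in practice: with only forward references on disk, the referrers mapping has to be rebuilt by scanning every metadata file. The layout (one `<name>.json` per store object with a `references` list) and the function name `compute_referrers` are assumptions for illustration, not what the PR implements.

```python
import json
from collections import defaultdict
from pathlib import Path


def compute_referrers(meta_dir: Path) -> dict[str, set[str]]:
    """Derive the reverse (referrers) mapping from forward references.

    Reads every JSON file in meta_dir, so the cost is linear in the size
    of the store, and nothing here is transactional.
    """
    referrers: dict[str, set[str]] = defaultdict(set)
    for meta_file in sorted(meta_dir.glob("*.json")):
        info = json.loads(meta_file.read_text())
        path = meta_file.stem  # hypothetical: file name identifies the object
        for ref in info.get("references", []):
            referrers[ref].add(path)
    return dict(referrers)
```

A garbage collector would need something like this full scan (plus some locking story) before it could decide that a path is unreferenced, which is the cost being traded for isolated writes.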

I'm not convinced by this argument, since SQLite is one of the most-used and best-tested pieces of software out there. Certainly better tested than an ad hoc metadata store, even if it contains JSON.

Yes, no hate against SQLite, it is fantastic software. My argument is not that SQLite could be better, but that a higher performance database of that sort must unavoidably sacrifice having a nice normal form. This however does have a normal form (packaged json, in order, assuming we didn't screw anything up). That has some nice properties SQLite cannot have.

(An in-kernel SQLite, where it was impossible to see the underlying bytes but just use the abstract interface, would also help. Maybe someday we'll have Nix on SQL-supporting mainframes and can do things that way. But this also just kicks the can down to "what is the best way to have secure on-disk representations of kernel data structures", which is a question the likes of dm-verity, https://github.com/project-machine/puzzlefs, squash-fs, etc. are all trying to answer.)

@roberth
Member

roberth commented Dec 6, 2023

  • We have 20 SQLiteStmt instances in libstore, so factoring out a persistence layer does not seem far fetched.

  • When you're already doing network file systems, having a proper database server might not be a bad idea?

@Ericson2314
Member Author

factoring out a persistence layer does not seem far fetched.

I am hoping to get there little by little via the various "shuffle around the Store hierarchy" projects I have in flight. :)

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-12-08-nix-team-meeting-minutes-110/36721/1

@arianvp
Member

arianvp commented Dec 11, 2023

This sounds similar to me to the "narinfo without nar" store idea that has been floated around. What is the benefit of this over that?

@Ericson2314
Member Author

Seems pretty much the same? I don't like the narinfo format and would rather use JSON, but that's small potatoes.

@arianvp
Member

arianvp commented Dec 14, 2023

Cool. I guess a "benefit" of reusing .narinfo is that you could have an S3 bucket that is both a binary cache store and a file-store by both having a <hash-a>.nar file and a <hash-a>/ directory accompanied by a <hash-a>.narinfo file. Then you could use that bucket either as a substitutor or as a store (e.g. by using Mountpoint for S3).

@Ericson2314
Member Author

Ericson2314 commented Dec 14, 2023

@arianvp But I would like to have binary caches also use the JSON format :). We can upload both to binary caches for backwards compat.

For example, at some point I need to propose what I think is a stronger version of the narHash field for checking not just the file system object closure of a single store object, but an entire store object closure. This would involve giving every reference a hash, so we have a "store path -> hash" map for the references. With the current narinfo format I would be just making up a new syntax, with JSON it's already provided for me.

@edolstra
Member

Yes, we should allow .narinfo files to be JSON. I.e. if the first character is {, then parse it as a JSON object.
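A minimal sketch of that dispatch, assuming the legacy format is the usual `Key: value` lines (with keys such as `Sig` allowed to repeat). The function name `parse_narinfo` and the returned shapes are hypothetical; normalizing the two branches into one structure is left out on purpose.

```python
import json


def parse_narinfo(text: str) -> dict:
    """Parse a .narinfo body as JSON if it starts with '{', else as lines."""
    if text.startswith("{"):
        return json.loads(text)

    # Legacy branch: collect values per key, keeping repeats in order.
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: values such as "sha256:..." keep theirs.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed narinfo line: {line!r}")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

Since no legacy key line can begin with `{`, checking the first character is enough to tell the two formats apart on the read side.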

@Ericson2314
Member Author

Ericson2314 commented Dec 14, 2023

That works too, if we are OK with suddenly cutting off old Nix from new objects :)

(or rather the read side can be flexible like that, but the write side can do two files. Postel's law type stuff.)
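Sketching the write side of that idea: emit the old line format for existing clients and a JSON file next to it for new ones. The `.json` file name, the helper names, and the `dict[str, list[str]]` shape are assumptions for illustration only; nothing here reflects an agreed-upon layout.

```python
import json
from pathlib import Path


def to_legacy_lines(fields: dict[str, list[str]]) -> str:
    """Render fields as 'Key: value' lines, one line per value."""
    return "".join(
        f"{key}: {value}\n" for key, values in fields.items() for value in values
    )


def write_both(cache_dir: Path, hash_part: str, fields: dict[str, list[str]]):
    """Write the same metadata twice: legacy .narinfo and hypothetical .json."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    legacy = cache_dir / f"{hash_part}.narinfo"
    modern = cache_dir / f"{hash_part}.json"
    legacy.write_text(to_legacy_lines(fields))
    modern.write_text(json.dumps(fields, sort_keys=True))
    return legacy, modern
```

Old Nix keeps reading `<hash>.narinfo` unchanged, while a newer client can prefer the JSON file and fall back to the legacy one.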

@roberth
Member

roberth commented Dec 14, 2023

"narinfo" is not descriptive. It contains a bit of nar "info", such as the hash, but the rest is store object info, such as name and references, binary cache info such as file location, and realisation info such as deriver and signatures (if that's not its own category).

Query behavior can be modified in the nix-cache-info file, so client behavior can be guided if needed - e.g. to make it query only one of the two possible files.

we should allow .narinfo files to be JSON.

This can't be the main mechanism, because it's not compatible with existing Nix versions. We should have a transition period where both formats are available. A new file extension helps with that.

Wouldn't hurt to have docs (EDIT: of narinfo and the binary cache protocol at the very least) before considering any of this, so everyone can have a good understanding of the domain.

EDIT: Consider HTTP Accept:, although the protocol should function quite well with a simple server.

@Ericson2314
Member Author

Ericson2314 commented Dec 14, 2023

draft #9348 adds some docs
