How to model data in XTDB? Example of a social network data model.

[Diagram: vertices and edges]

In this example we have persons who can be members of groups and who can have relationships with one another, such as friendship or following. Person A is a friend of Person B and Person C. Of course, while the graph is small there is no problem representing the data, but as the complexity increases I think it's necessary to find the best way to do it.
The second way could be a little different: create a document for every relationship.
This way each Person has a vector of edge ids. The problem remains that I have to write a new version of both Person documents (the two friends).
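To make the second model concrete, here is a minimal sketch in Python rather than XTDB. Plain dicts stand in for documents (the `xt/id` key mirrors `:xt/id`), and the hypothetical `put`/`latest` helpers mimic an immutable store by appending a new version instead of overwriting one. Attribute names such as `person/friendships` are illustrative only. The sketch shows the cost described above: each new friendship writes one edge document plus a new version of both Person documents.

```python
from itertools import count

# Hypothetical stand-in for an immutable document store (not XTDB's API):
# put() appends a new version of a document instead of overwriting it.
history: dict[str, list[dict]] = {}
_edge_ids = count(1)


def put(doc: dict) -> None:
    history.setdefault(doc["xt/id"], []).append(doc)


def latest(doc_id: str) -> dict:
    return history[doc_id][-1]


def add_friendship(a: str, b: str) -> str:
    """Second model: one document per relationship, plus a vector of
    relationship ids on each Person (attribute names are illustrative)."""
    edge_id = f"friendship-{next(_edge_ids)}"
    put({"xt/id": edge_id, "friendship/persons": [a, b]})
    for person_id in (a, b):
        person = latest(person_id)
        # Both Person documents need a new version carrying the extra edge id.
        put({**person, "person/friendships": person["person/friendships"] + [edge_id]})
    return edge_id


for pid in ("person-a", "person-b", "person-c"):
    put({"xt/id": pid, "person/friendships": []})

add_friendship("person-a", "person-b")
add_friendship("person-a", "person-c")

# Versions written per Person document
print({pid: len(history[pid]) for pid in ("person-a", "person-b", "person-c")})
# {'person-a': 3, 'person-b': 2, 'person-c': 2}
```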
This way I have a document representing a Person, linked to a document containing the list of friendships, which links to every friendship node (representing an edge, if this were a graph). To add a new friendship, I have to create a friendship node and update the two friendship lists owned by the two friends, but I don't have to update the two Person documents. Is one of these models the best way to represent this kind of data in XTDB, or is there a different way? Thanks a lot
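The same kind of sketch for the third model, again in Python with hypothetical helpers and attribute names rather than XTDB's API. Each Person points at a friendship-list document, so adding a friendship writes one friendship node and a new version of both lists, while each Person document keeps a single version. `friends_of` shows the price on the read side: three hops to get from a Person to their friends.

```python
from itertools import count

# Hypothetical stand-in for an immutable document store (not XTDB's API).
history: dict[str, list[dict]] = {}
_edge_ids = count(1)


def put(doc: dict) -> None:
    history.setdefault(doc["xt/id"], []).append(doc)


def latest(doc_id: str) -> dict:
    return history[doc_id][-1]


def add_person(pid: str) -> None:
    # The Person only points at a separate friendship-list document.
    list_id = f"{pid}-friends"
    put({"xt/id": pid, "person/friend-list": list_id})
    put({"xt/id": list_id, "friend-list/friendships": []})


def add_friendship(a: str, b: str) -> str:
    """Third model: new friendship node + new version of both lists;
    the Person documents themselves are not rewritten."""
    edge_id = f"friendship-{next(_edge_ids)}"
    put({"xt/id": edge_id, "friendship/persons": [a, b]})
    for pid in (a, b):
        friend_list = latest(latest(pid)["person/friend-list"])
        put({**friend_list,
             "friend-list/friendships": friend_list["friend-list/friendships"] + [edge_id]})
    return edge_id


def friends_of(pid: str) -> list[str]:
    # Three hops: Person -> friendship list -> friendship node -> other Person.
    friend_list = latest(latest(pid)["person/friend-list"])
    return [other
            for edge_id in friend_list["friend-list/friendships"]
            for other in latest(edge_id)["friendship/persons"]
            if other != pid]


for pid in ("person-a", "person-b", "person-c"):
    add_person(pid)
add_friendship("person-a", "person-b")
add_friendship("person-a", "person-c")

print(len(history["person-a"]), len(history["person-a-friends"]))  # 1 3
print(friends_of("person-a"))  # ['person-b', 'person-c']
```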
Hi @franz65,

We have an article which discusses the philosophy of "records" at a very high level. We have 3 or 4 more of these articles pending, but I haven't found the days required to sit down and write them yet. 😅

https://xtdb.com/articles/strength-of-the-record.html

The most hand-wavy answer to your question would be "don't nest too deeply" or, flipped around, "try to keep your documents reasonably flat." These are rules of thumb, but xtdb 1.x strongly supports this suggestion by not indexing beyond the root keys of any given document. Deeply nested documents are permitted, but not recommended.

The "Strength of the Record" article references a post by Sarah Mei, assertively titled "Why You Should Never Use MongoDB": http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Sarah's article addresses your question even more directly. If you start nesting "friendships" within the documents themselves, you have easy access to the data, but no way to traverse the graph. With that in mind, the first solution seems sub-optimal.

The second solution is probably the route you want to take, though it comes in different flavours. You do not need to model edges explicitly in xtdb and, initially, it might save you some grief not to do so. Instead, you can track a vector of friend ids on any given Person directly. Assuming you are querying with EDN Datalog, traversing these relationships is then quite easy (and direct).

Modeling relationships (such as friendships) as documents is only necessary if you want to add (meta-)data to the edges themselves. "Friendship date" seems a bit contrived, but you might want to add weights to edges to alter the "shape" of the graph. If you are confident you want this extra data on edges, it makes sense to record them as documents, as in your second example.

The third solution feels like excessive denormalization.
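Going back to the flavour of the second solution that keeps a vector of friend ids directly on each Person, here is a minimal sketch in Python rather than Clojure. Plain dicts stand in for documents, `person/friends` is a hypothetical attribute, and the toy `index` only mimics the behaviour described above: root keys are indexed and, as I understand XTDB 1.x, each element of a top-level vector becomes its own entry. `friends_of_friends` does roughly what the Datalog clause pair `[a :person/friends f] [f :person/friends fof]` would do. None of this is XTDB's API.

```python
# Hypothetical Person documents: friend ids live directly on each Person.
people = [
    {"xt/id": "person-a", "person/friends": ["person-b", "person-c"]},
    {"xt/id": "person-b", "person/friends": ["person-a", "person-d"]},
    {"xt/id": "person-c", "person/friends": ["person-a"]},
    {"xt/id": "person-d", "person/friends": ["person-b"]},
]


def index(docs):
    """Build (entity, attribute, value) triples from root keys only.
    Each element of a top-level list becomes its own triple."""
    triples = set()
    for doc in docs:
        for attr, value in doc.items():
            for v in (value if isinstance(value, list) else [value]):
                triples.add((doc["xt/id"], attr, v))
    return triples


def friends(triples, pid):
    # Like the clause [pid :person/friends f]
    return {v for (e, a, v) in triples if e == pid and a == "person/friends"}


def friends_of_friends(triples, pid):
    # Like [pid :person/friends f] [f :person/friends fof],
    # excluding pid itself and its direct friends.
    direct = friends(triples, pid)
    return {fof for f in direct for fof in friends(triples, f)} - direct - {pid}


triples = index(people)
print(sorted(friends(triples, "person-a")))             # ['person-b', 'person-c']
print(sorted(friends_of_friends(triples, "person-a")))  # ['person-d']
```

The trade-off of this shape is that every new friendship writes a new version of both Person documents.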
Unless you really want to avoid updating Person records when friendships change (and I would question, "why?"), adding another layer of documents might just make all your queries unnecessarily awkward. In the past, we've recommended this kind of pseudo-denormalization for examples like "Likes" or "Clicks." If a user creates a Post and the Post is represented as its own record, you don't want to update the Post 100,000 times if it receives 100,000 Likes. In that case, a separate document to track Likes makes sense. But friendships/relationships are unlikely to accumulate at that order of magnitude and, especially initially, it may be much easier to record them as fields on your Person records.

Caveat Regarding Software Life Cycles

I keep saying "initially." There is the very real possibility that someone reading this discussion thinks to herself "but I'm migrating a legacy database" or "I already know what my records look like." Fair enough. None of these are hard and fast rules. XTDB can even be used as an exploratory data store for unstructured data, which pretty quickly breaks the "don't nest too deeply" rule of thumb. In a throw-away exploratory or analytics db, it might make complete sense to do …
and then start playing around with it, teasing it into pieces, querying over the pieces, and reassembling them as you go. The earlier you are in the life cycle of your project, the more carefree I would encourage you to be.

It's been hard for me to break out of my old habits, personally. Part of my brain still wants to set up my schema migrations and schema-on-write tools before I put my first record in the database. Of course, that doesn't make a lot of sense if I don't know what my data shapes look like yet.

Similarly, once you've broken out of the early phases of a project, you may want to return to your existing data and migrate it into a new shape, perhaps a more complicated one, where a collection of "Friendships" is separate from a "Person." Thanks to the absence of schema-on-write, xtdb encourages you to store the data you have rather than the data you think you'll need. Thanks to immutable records, you can always access your old data based on older (simpler) models. If and when you decide to migrate to a new schema, your old data is still there, saved forever in …

Hope that makes sense, and I hope it helps! Have fun playing with xtdb. (And I do mean that literally: try to play before you build. XT is a lot of fun! 😃)
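To illustrate the point about migrating to a new shape while the old data stays reachable, here is a small Python sketch. It is a stand-in, not XTDB: the hypothetical `put` assigns an increasing transaction number, and `as_of` returns the latest version written at or before a given transaction, loosely like querying an older database value. The hypothetical `migrate_friendships` moves `person/friends` vectors off Person documents into separate Friendship documents; the pre-migration view still shows the old, simpler shape.

```python
from itertools import count
from typing import Optional

# Hypothetical stand-in for an immutable store with transaction-time reads
# (not XTDB's API): each put() gets an increasing transaction number.
_tx = count(1)
history: dict[str, list[tuple[int, dict]]] = {}


def put(doc: dict) -> int:
    tx = next(_tx)
    history.setdefault(doc["xt/id"], []).append((tx, doc))
    return tx


def as_of(doc_id: str, tx: int) -> Optional[dict]:
    """Latest version of doc_id written at or before transaction tx."""
    versions = [doc for (t, doc) in history.get(doc_id, []) if t <= tx]
    return versions[-1] if versions else None


def migrate_friendships(person_ids: list[str], tx: int) -> int:
    """Move friend vectors off Person documents into Friendship documents.
    Returns the last transaction number written."""
    last, seen = tx, set()
    for pid in person_ids:
        person = as_of(pid, tx)
        for friend in person.get("person/friends", []):
            pair = tuple(sorted((pid, friend)))
            if pair not in seen:  # write each undirected friendship once
                seen.add(pair)
                last = put({"xt/id": "friendship-" + "-".join(pair),
                            "friendship/persons": list(pair)})
        # New, slimmer version of the Person; the old version is kept.
        last = put({k: v for k, v in person.items() if k != "person/friends"})
    return last


put({"xt/id": "person-a", "person/friends": ["person-b", "person-c"]})
put({"xt/id": "person-b", "person/friends": ["person-a"]})
before = put({"xt/id": "person-c", "person/friends": ["person-a"]})
after = migrate_friendships(["person-a", "person-b", "person-c"], before)

print(as_of("person-a", before))  # {'xt/id': 'person-a', 'person/friends': ['person-b', 'person-c']}
print(as_of("person-a", after))   # {'xt/id': 'person-a'}
```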
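@deobald thanks for the thorough explanation; it helps a lot 👍
But according to: …
So shouldn't a simple query on the relationship nodes (knowing …) work?

```clojure
;; Find every x stored as :friend/2 on a relationship document
;; whose :friend/1 is Person A.
'{:find [x]
  :where [[i :friend/1 "uuid of person A"]
          [i :friend/2 x]]}
```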
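One thing worth noting about that query: as written, it only finds friendships in which Person A happens to be stored under `:friend/1`. The Python sketch here is an in-memory stand-in, not XTDB (the documents and function names are hypothetical); it shows the gap and the fix of matching Person A in either position. In XTDB's Datalog the equivalent would presumably be an `or-join` over the two orders, or writing each friendship in both directions; treat that as a suggestion to check against the query docs.

```python
# Hypothetical relationship documents shaped like the query above: each
# friendship stores one person under friend/1 and the other under friend/2.
friendships = [
    {"xt/id": "f1", "friend/1": "person-a", "friend/2": "person-b"},
    {"xt/id": "f2", "friend/1": "person-c", "friend/2": "person-a"},  # A stored second
]


def directed_friends(person):
    """Mirrors the two clauses above: match friend/1, return friend/2."""
    return {f["friend/2"] for f in friendships if f["friend/1"] == person}


def undirected_friends(person):
    """Matches the person in either position, like an or-join over both orders."""
    return ({f["friend/2"] for f in friendships if f["friend/1"] == person}
            | {f["friend/1"] for f in friendships if f["friend/2"] == person})


print(directed_friends("person-a"))            # {'person-b'}
print(sorted(undirected_friends("person-a")))  # ['person-b', 'person-c']
```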
Thanks a lot @deobald.
I had read the articles you cited; that's why I had already decided not to use the first solution. I proposed it only because I think it can still be useful for a small dataset.
As you supposed, I was going in the direction of the second solution, but I wanted some confirmation from someone more experienced in the field.
I have used relational databases and a little bit of graph databases, but I'm totally new to document databases and to xtdb.
Thanks again for your answer.