Is Firebase Realtime Database a good fit for my project?

Isaac Kriegman

Sep 12, 2022, 12:25:50 PM
to Firebase Google Group

I’m building a web application that will allow users to collaborate to build a giant graph together (with a front end UI like this, https://reactflow.dev/).  I’m considering using the Realtime Database as the main database so that users can immediately see changes that other users are making to the part of the graph they are currently viewing/editing (hopefully, this will prevent many conflicting edits).

For the most part, the graph will be open to the world for anyone with a verified email address to edit (in the spirit of Wikipedia) and so will be vulnerable to vandalism.  For that reason, it's critical that everything be carefully versioned, and that the database be resilient to malicious intentional corruption.

The way I’m planning to structure this in the Realtime Database is to have a list of graph nodes.  Each node will contain a list of the nodes it is connected to.  Web clients will have permissions to create and update nodes in this node list.

Whenever a web client creates or updates a node in the node list, it will trigger a database function that will record the latest version of the node in a separate list of node versions.  This node version list will not be writable by the web app clients, only readable.  This way the version information cannot be maliciously corrupted, and it will always be possible to roll back malicious changes to the nodes themselves using the version information.

From my reading of the Realtime Database documentation, it seems like using triggers to call functions to record version information should be fairly straightforward (but I’m not entirely sure).

On the other hand based on reading the documentation, it’s not clear to me that enforcing database validation will be straightforward. Whenever a link is created between two nodes, there needs to be an update to BOTH nodes updating their lists of nodes they are linked to.  Whenever a link is deleted between two nodes, BOTH nodes must also be updated to remove the link to the other.  Is it possible to write validation rules that would enforce the requirement that both nodes are added or deleted to their respective node link lists simultaneously?
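For context, the Realtime Database accepts a single multi-path update that is applied atomically, so a well-behaved client can at least write both sides of a link together. The sketch below only builds the payload for such an update. The `nodes/{id}/links/{otherId}` layout is a hypothetical one chosen for illustration, and this alone does not stop a malicious client from writing just one side.

```typescript
// Hypothetical layout (an assumption): nodes/{nodeId}/links/{otherNodeId} = true
type MultiPathUpdate = Record<string, true | null>;

// Build one update object that touches both nodes. An object like this can be
// passed to a single multi-path update so both paths change together.
function linkUpdate(a: string, b: string): MultiPathUpdate {
  return {
    [`nodes/${a}/links/${b}`]: true,
    [`nodes/${b}/links/${a}`]: true,
  };
}

// Writing null to a path deletes it, so unlinking mirrors linking.
function unlinkUpdate(a: string, b: string): MultiPathUpdate {
  return {
    [`nodes/${a}/links/${b}`]: null,
    [`nodes/${b}/links/${a}`]: null,
  };
}
```

Whether rules can reject a one-sided write is a separate question, which the replies below discuss.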

Overall, does this structure make sense?  Any obvious problems with the way I'm thinking about the implementation?  Does it sound like Realtime Database will be a good fit for this project?


Nathan

Sep 22, 2022, 11:44:08 PM
to Firebase Google Group
This is a very interesting problem! I'm not an expert on Firebase, but have a background in security engineering, so this piqued my interest.

I do not believe it is possible to enforce, using validation rules alone, that both related nodes are updated. You would need to hide the update behind a cloud function call (e.g., calling a function connectNodes). That has a certain conceptual clarity to it, but it's slower, and the costs could add up quickly. Connections would take anywhere from one to a few seconds to make, so you'd probably need to maintain client-side state of the change to make it appear instant while the actual data change gets written.
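A minimal sketch of the client-side state mentioned above, assuming edges are identified by a string key. The class and method names are hypothetical, and the server call itself (for example the connectNodes function) is left out; the code only tracks which edges the UI should draw while a call is in flight.

```typescript
// Tracks edges the server has accepted and edges still waiting for a reply.
class OptimisticEdges {
  private confirmed: Record<string, true> = {};
  private pending: Record<string, true> = {};

  // Show the edge immediately while the server call is in flight.
  addPending(key: string): void {
    this.pending[key] = true;
  }

  // Server accepted the change: promote it to confirmed state.
  confirm(key: string): void {
    if (this.pending[key]) {
      delete this.pending[key];
      this.confirmed[key] = true;
    }
  }

  // Server rejected the change: drop it so the UI rolls back.
  reject(key: string): void {
    delete this.pending[key];
  }

  // Everything the UI should render right now, in sorted order.
  visible(): string[] {
    return Object.keys(this.confirmed).concat(Object.keys(this.pending)).sort();
  }
}
```

The UI would call `addPending` before invoking the function, then `confirm` or `reject` when the call returns.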

As for version info, using Cloud Functions triggers is probably the cleanest and most secure approach. You can listen to changes in the DB and record the delta as an event (so you can reconstruct the state from a stream of changes), or you can just copy the old state of the object for each change (easier to roll back without replaying changes, but more data usage). This will result in a cloud function invocation on each change, so be aware this could end up being pricey. Alternatively, having version info reported by the client COULD work (as well-behaved clients would create accurate historical records to roll back to), but you'd need to filter out bad or missing records from Bad Actors, which would be quite difficult (and potentially impossible depending on the use cases/architecture).
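The "copy the old state" option could look roughly like the sketch below. It is an in-memory model only: the `NodeState` fields and function names are assumptions for illustration, and in practice the body of `recordOldState` would run inside a trigger that receives the node's state before the change.

```typescript
// Hypothetical node shape; the real fields are up to the application.
interface NodeState { label: string; links: string[]; }

// One history entry per change; `before` is null when the node was just created.
interface OldStateEntry { nodeId: string; before: NodeState | null; }

// Append a copy of the state the node had before the change.
function recordOldState(
  history: OldStateEntry[],
  nodeId: string,
  before: NodeState | null
): OldStateEntry[] {
  const copy = before === null
    ? null
    : { label: before.label, links: before.links.slice() };
  return history.concat([{ nodeId: nodeId, before: copy }]);
}

// To undo the latest change, look up what the node looked like before it.
// Returns undefined when the node has no recorded history at all.
function stateBeforeLastChange(
  history: OldStateEntry[],
  nodeId: string
): NodeState | null | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].nodeId === nodeId) return history[i].before;
  }
  return undefined;
}
```

Rolling back needs no replay: the most recent entry for a node already holds the state to restore, at the cost of storing a full copy per change.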

Going back to enforcing mirrored changes to both nodes when making a connection, that's honestly a really difficult problem. Instead of having connections be attributes on the nodes themselves (essentially pointing to each other), you could have a relations object that tracks the relations between all nodes, so adding or removing a connection would involve changing this master list of node relations. The client would read that, along with the individual node information, to build the graph. That way, each connection has a single source of truth (and you could specify bidirectional or unidirectional connections as well). Many-to-one would be represented by multiple connection entries. Not quite sure of the best way to represent that in RTDB, as it doesn't deal with arrays super well. Maybe an object, with an id being [fromnode-tonode-direction], like "node1-node7-bidirectional": true as an object entry?

Alternatively, if you do want the connections to live on the nodes, you could just build in tolerance when interpreting the data. If a connection exists on each node, that means there's a bidirectional connection. If it only exists on one, it's a unidirectional connection. Alternatively, you could have the connection be represented on both nodes (with directionality included in the entry), and you can just discard a connection if you don't see it on both, so a Bad Actor who wrote data to only one node wouldn't cause any issues.
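The "discard a connection if you don't see it on both" reading can be sketched as a small filter over the node data. The `LinkLists` shape (node ID mapped to the IDs it claims to link to) is an assumption made for the example.

```typescript
// Hypothetical shape: each node ID maps to the IDs it claims to link to.
type LinkLists = Record<string, string[]>;

// Keep a connection only when both endpoints list each other.
function mutualEdges(nodes: LinkLists): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  const seen: Record<string, true> = {};
  for (const a of Object.keys(nodes)) {
    for (const b of nodes[a] || []) {
      const back = nodes[b] || [];
      const key = a + "|" + b;
      // Requiring a < b emits each undirected pair once; `seen` skips
      // repeated entries inside a single node's list.
      if (a < b && back.indexOf(a) !== -1 && !seen[key]) {
        seen[key] = true;
        edges.push([a, b]);
      }
    }
  }
  return edges;
}
```

A one-sided entry written by a bad actor simply never shows up in the result, although it still occupies space in the database until it is cleaned up.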

Personally, I'd go with a master list of relations; it seems easiest to manage.

If you ever care to share this project's website, I'd be quite interested in seeing it!

Andreas B

Sep 23, 2022, 6:00:55 AM
to Firebase Google Group
I agree with Nathan, separating edge and node data seems like a good idea to allow better validation. However, it is not completely obvious how one would prevent both keys "node1-node7-bidirectional" and "node7-node1-bidirectional" from existing at the same time. If both are allowed to exist, this could lead to problems later, if one but not the other gets deleted.
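One possible way around the duplicate-key problem for undirected edges, not proposed in the thread itself, is to always put the two IDs in sorted order so that each pair has exactly one legal key. The sketch below assumes ":" as the separator and that node IDs never contain it; both are illustrative choices.

```typescript
// Separator is an assumption; node IDs are assumed not to contain it.
const PAIR_SEP = ":";

// Build the single canonical key for an undirected pair of nodes.
function undirectedEdgeKey(a: string, b: string): string {
  if (a === b) throw new Error("self-loops are not modelled in this sketch");
  return a < b ? a + PAIR_SEP + b : b + PAIR_SEP + a;
}

// A validation step could reject any key that is not in canonical order.
function isCanonicalKey(key: string): boolean {
  const parts = key.split(PAIR_SEP);
  return parts.length === 2 && parts[0] < parts[1];
}
```

With this convention, `undirectedEdgeKey("node7", "node1")` and `undirectedEdgeKey("node1", "node7")` produce the same key, so the mirrored entry cannot exist.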

Instead, what about implementing this graph as directed right from the start?

edges: // contains all edge data
  "nodeID359:nodeID1701": // edge IDs are created by concatenating node IDs in order (from, to); ":" is used as the separator because RTDB keys cannot contain "."
    fromNode = nodeID359
    toNode = nodeID1701
nodes: // contains all node data
  nodeID359: // these could just be random IDs generated with push()
    ...
  nodeID1701:
    ...

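To retrieve the edges going out from a certain node, you could use orderByKey() together with startAt() and endAt() to select the keys that begin with that node's ID (equalTo() would only match a single exact key): https://firebase.google.com/docs/database/android/lists-of-data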
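As an in-memory model of that lookup, the sketch below selects the edge keys that begin with a given node ID. It assumes ":" as the separator between the two IDs (Realtime Database keys cannot contain "."), and the function name is hypothetical. Against the real database the same effect would come from a key-ordered range query rather than from filtering on the client.

```typescript
interface StoredEdge { fromNode: string; toNode: string; }

// Separator between the two node IDs in an edge key (an assumption).
const EDGE_KEY_SEP = ":";

// Return the keys of all edges that start at nodeId.
function outgoingEdgeKeys(edges: Record<string, StoredEdge>, nodeId: string): string[] {
  // Including the separator in the prefix keeps "nodeID35" from
  // matching edges that belong to "nodeID359".
  const prefix = nodeId + EDGE_KEY_SEP;
  return Object.keys(edges)
    .filter((key) => key.indexOf(prefix) === 0)
    .sort();
}
```

Incoming edges cannot be found this way, since the key only sorts by the "from" side; that direction needs a separate index or a query on the toNode field.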

Validation would then include:
- checking that all edge data has a fromNode and a toNode, and nothing else
- that the concatenated values of fromNode and toNode are the same as the edge ID we're writing.
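The two checks above can be modelled as a plain function. This is not security-rules syntax, only a sketch of the logic the rules would need to express, and it assumes ":" as the separator between the two node IDs.

```typescript
// Returns true when the edge has exactly the fields fromNode and toNode
// (both strings) and its key equals "<fromNode>:<toNode>".
function isValidEdge(key: string, data: Record<string, unknown>): boolean {
  const fields = Object.keys(data).sort();
  const onlyExpectedFields =
    fields.length === 2 && fields[0] === "fromNode" && fields[1] === "toNode";
  if (!onlyExpectedFields) return false;

  const from = data["fromNode"];
  const to = data["toNode"];
  if (typeof from !== "string" || typeof to !== "string") return false;

  // The key must be derivable from the data, so it cannot point elsewhere.
  return key === from + ":" + to;
}
```

A further check that both referenced nodes actually exist would fit naturally here, but it is left out because the thread does not specify it.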

Isaac Kriegman

Sep 27, 2022, 2:03:56 PM
to Firebase Google Group
Thanks for your suggestions.  I'm starting to think that maybe Firestore would be a better fit, because it allows indexing on fields, which would make it easier to traverse the graph in both directions without keeping a reference in both nodes connected by an edge.  

Also, I hadn't been considering the cost of cloud functions.  I'd assumed the cost of a cloud function doing version housekeeping would be negligible because the computation involved would be negligible.  I'll look into how cloud function pricing works.

Thanks for your suggestions!  Still a little unclear to me whether even Firestore would be a proper fit for this project, or whether I really need a full server.

Isaac Kriegman

Sep 27, 2022, 3:57:08 PM
to Firebase Google Group
I think this is the right pricing page for Firestore functions:


Seems like the price for this type of function (merely recording some version information in the Firestore database whenever a document is updated) would be pretty negligible.  Would probably need many thousands of active users before I even need to graduate from the free tier.  But, if I'm misunderstanding how the pricing works, please someone let me know!
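A rough way to sanity-check that estimate is to count invocations per month and compare against whatever free quota the current pricing page lists. Every number in the sketch below is a placeholder supplied by the caller, not a quoted price, and invocation count is only one of the billed dimensions (compute time and networking are metered separately).

```typescript
// One invocation per edit, since the trigger fires on every document change.
function monthlyInvocations(
  activeUsers: number,
  editsPerUserPerDay: number,
  daysPerMonth: number = 30
): number {
  return activeUsers * editsPerUserPerDay * daysPerMonth;
}

// Invocations above the free quota; freeQuota must come from the pricing page.
function billableInvocations(total: number, freeQuota: number): number {
  return Math.max(0, total - freeQuota);
}

// Example with made-up inputs: 5,000 users x 10 edits/day x 30 days.
const exampleTotal = monthlyInvocations(5000, 10); // 1,500,000
// With a hypothetical quota of 2,000,000 per month, nothing is billable.
const exampleBillable = billableInvocations(exampleTotal, 2000000); // 0
```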

JP Ventura

Sep 27, 2022, 5:34:04 PM
to fireba...@googlegroups.com
TL;DR

I will mark each point +1, 0, or -1 depending on whether, IMHO, Firebase Realtime Database will help, make no difference, or work against your product.

CHECKLIST
  • Realtime two-way databinding
  • Information graph structuring
  • Authenticated and authorized fine control
  • Node audit trail

On Mon, Sep 12, 2022 at 1:25 PM Isaac Kriegman <zackr...@gmail.com> wrote:

I’m building a web application that will allow users to collaborate to build a giant graph together (with a front end UI like this, https://reactflow.dev/).  I’m considering using the Realtime Database as the main database so that users can immediately see changes that other users are making to the part of the graph they are currently viewing/editing (hopefully, this will prevent many conflicting edits).

+1 

For the most part, the graph will be open to the world for anyone with a verified email address to edit (in the spirit of Wikipedia) and so will be vulnerable to vandalism.  For that reason, it's critical that everything be carefully versioned, and that the database be resilient to malicious intentional corruption.

-1 

The way I’m planning to structure this in the Realtime Database is to have a list of graph nodes.  Each node will contain a list of the nodes it is connected to.  Web clients will have permissions to create and update nodes in this node list.

Whenever a web client creates or updates a node in the node list, it will trigger a database function that will record the latest version of the node in a separate list of node versions.  This node version list will not be writable by the web app clients, only readable.  This way the version information cannot be maliciously corrupted, and it will always be possible to roll back malicious changes to the nodes themselves using the version information.

-1 

From my reading of the Realtime Database documentation, it seems like using triggers to call functions to record version information should be fairly straight forward (but I’m not entirely sure). 

+1 

 

On the other hand based on reading the documentation, it’s not clear to me that enforcing database validation will be straightforward. Whenever a link is created between two nodes, there needs to be an update to BOTH nodes updating their lists of nodes they are linked to.  Whenever a link is deleted between two nodes, BOTH nodes must also be updated to remove the link to the other.  Is it possible to write validation rules that would enforce the requirement that both nodes are added or deleted to their respective node link lists simultaneously?

-1 

Overall, does this structure make sense?  Any obvious problems with the way I'm thinking about the implementation?  Does it sound like Realtime Database will be a good fit for this project?

NO


--
JP Ventura
Software Architect - University of Campinas
https://br.linkedin.com/in/jpventura

JP Ventura

Sep 27, 2022, 6:08:47 PM
to fireba...@googlegroups.com
TL;DR

I will assume the scenario where suddenly you have 100K customers (an excellent problem 😉)

It would be better if you double-check the features your application requires. As far as I could tell, they are:
  1. Realtime two-way databinding
  2. Information graph structuring
  3. Authenticated and authorized fine control
  4. Node audit trail
Please let us know if the features listed above cover your questions.

DESCRIPTION

REALTIME TWO-WAY DATABINDING


In the near future, basic database operations will become cumbersome, and it is always tempting to create bloated nodes with thousands of listeners (or many lambdas to keep smaller nodes that will be bound in real time).

I strongly suggest you use RxJS (or), since it really simplifies data manipulation (in both Firebase and Firestore):
  • Chaining monad operations becomes trivial
  • Subscribers and Observables are a natural fit for listeners

INFORMATION GRAPH STRUCTURING

Take a look at Structure your Data in Firebase and 6 Rules of Thumb for MongoDB Schema. There is no silver bullet, so no one but you, Christopher, Moritz, and John will come up with the best solution for your scenario.

AUTHENTICATED AND AUTHORIZED FINE CONTROL

If I have the swing vote, username/password is never an option. I always force the users to use some social network authentication, otherwise phasing out of Firebase Auth becomes quite tricky.

NODE AUDIT TRAIL

This is the feature in your platform that is not clear to me. It seems like you are trying to code a git log (or an audit trail) relying purely on Firebase.

It is possible (and I worked on a similar project some years ago), but you will be coding something that comes out-of-the-box if you use Datomic.

I always wished Google Cloud provided a chronological database like Datomic as a feature on Google Firebase Datastore.

Isaac Kriegman

Sep 28, 2022, 9:57:14 AM
to Firebase Google Group
Thanks.  I think that Firestore's ability to index on fields will make it so I don't need two way references.  Instead, I can keep references in one direction and use an indexed query to go in the other direction.  This will make it easier to enforce data validity and prevent malicious users from corrupting the database.
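The one-directional idea can be modelled in memory as below. Each edge is stored once, and both directions are answered by filtering on a field. The `from` and `to` field names are assumptions, and in Firestore each filter would presumably become an equality query on that field, backed by an index.

```typescript
// One record per edge; the reference is stored in one direction only.
interface DirectedEdge { from: string; to: string; }

// Forward traversal: nodes that nodeId points to.
function neighborsOut(edges: DirectedEdge[], nodeId: string): string[] {
  return edges.filter((e) => e.from === nodeId).map((e) => e.to);
}

// Reverse traversal: nodes that point to nodeId, found by querying the
// same edge records on the other field instead of a mirrored copy.
function neighborsIn(edges: DirectedEdge[], nodeId: string): string[] {
  return edges.filter((e) => e.to === nodeId).map((e) => e.from);
}
```

Because there is no second copy of the link to keep in sync, a single write fully creates or removes an edge.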

And, based on the cloud function pricing I looked at, I THINK (but am not entirely sure) that recording version information will be inexpensive as well.

Isaac Kriegman

Oct 7, 2022, 1:20:58 PM
to Firebase Google Group
Update:  it looks like Firestore security rules can probably do it all.  They are very flexible.  I think they can even force clients to maintain an accurate version history whenever the client adds, updates, or deletes nodes, so I won't need to use cloud functions for that after all.  I'll probably still need to use cloud functions for some parts of my project, but it looks like maintaining database integrity and versioning can be done purely with security rules.
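The post does not show the rules themselves, so the following is only a model of the kind of condition such rules might enforce, written as a plain function rather than in rules syntax. The field names and the "version must increase by exactly one" policy are assumptions for illustration.

```typescript
// Hypothetical document shapes.
interface VersionedNode { label: string; version: number; }
interface HistoryEntry { nodeId: string; version: number; label: string; }

// Accept a node write only if the version advances by one and the same
// write also supplies a history entry that matches the new state.
function isValidVersionedWrite(
  nodeId: string,
  before: VersionedNode | null,
  after: VersionedNode,
  entry: HistoryEntry | null
): boolean {
  const expectedVersion = before === null ? 1 : before.version + 1;
  if (after.version !== expectedVersion) return false;
  if (entry === null) return false;
  return (
    entry.nodeId === nodeId &&
    entry.version === after.version &&
    entry.label === after.label
  );
}
```

For this to work in practice, the node and its history entry would have to be written together in one batch, and the history collection would need to reject updates and deletes so past entries stay immutable.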