Hello,
I am looking for advice on efficiently constructing a forest of tree-structures where each object/row has the following format:
class Node(Base):
id = Column(Integer, primary_key=True)
parent_id = Column(Integer, ForeignKey('parent.id'))
parent = relationship('Node', back_populates='children')
children = relationship('Node')
The primary database target is Postgres but I would also like to stay compatible with sqlite if feasible.
At the moment, my sample table is relatively small (less than 500 rows) so I experimented with the following approach:
- Fetch all rows
- Create a list of all root nodes (those without a parent_id)
- Recursively attempt to insert all rows into the forest/trees
I foresee the following issues:
- It is an expensive query.
- It will not scale well.
- I don't foresee any easy way to fetch multiple trees, or multiple sub-trees, without running this entire query+algorithm.
- I have to convert from SQL Alchemy objects to dictionaries, in order to manually populate the "children" lists.
Here are the types of queries I hope to more efficiently perform:
- Select the entire "forest". Basically, `select all` with preloading.
- Select a whole tree based on a child node multiple levels deep.
- Select a sub-tree where a given node will become the root.
- Select multiple trees based on multiple child odes.
I am trying to avoid double-linked or other complex data-structures where managing the state becomes more complicated and error-prone.
If anyone can give me pointers or recommendations, I would appreciate it.
Thanks!