When this was done last, to what extent were small vs. large structures tested? My assumption would be that the array approach would be far more performant for smaller databases (which I'd also guess would cover the majority of Blockly projects), but that the tree structure (depending on implementation) could be far more performant on very large projects. The other aspect I'm wondering about is what critical paths need optimizing for both of these categories of use cases.
The TypeScript migration seems very unlikely to me to affect this. A very quick comparison of the last JS version and the compiled JS of the latest tip-of-tree TS version of the file are almost identical, but particularly the data structure handling and searching look basically the same.
As far as I can tell (I'm far from an expert here), there are three main operations exposed by the database:
- Tracking connections (adding/removing) -- O(n) worst case if there's a full array copy needed or if the first element is removed, but adding should be typically O(log n) for the index search.
- Retrieving neighbors -- somewhere between O(n) and O(log n) depending on how large the radius is (since it could conceivably iterate the entire database).
- Searching for closest -- same as retrieving neighbors -- somewhere between O(n) and O(log n) depending on radius.
What's interesting is looking at the callsites here:
- Moving a block would presumably result in a bunch of connection movements which would then introduce a bunch of add/removes in the database. I can imagine that for block dragging in particular this is happening a LOT and could potentially be slow if there's a very large database. That being said, the database wouldn't be growing so adding and removing should very much stay O(log n). I'd be surprised at a BST improving this performance.
- Neighbor retrieval is used as part of bumping blocks. I actually suspect this is a major likelihood of performance since it can cascade pretty significantly: bumping a block recursively bumps its neighbors, and each block bump will fetch all neighbors for all connections (if I'm following the logic correctly), and each fetch is probably O(log n). This gets tricky because a more complex bump (i.e. with a lot more blocks) would scale against the total number of connections in the workspace. This could be a lot of operations.
- Searching for closest has some operations (particularly with new keyboard navigation behaviors) but these don't seem like they'd scale poorly since they're very much one-off and specific to a block rather than a group of blocks or many updates like the other two.
Given that the original post was on dragging my bet is on retrieving neighbors being the bottleneck here. However, given that the search should already be O(log n) I'm surprised that AVL scaling actually makes a difference.
I can't speak for the broader team here but I think filing an issue on the issue tracker along with your suggested fix could be useful to look at even if we might not have capacity to try and fix this in the near term.
Ben