Hi Nick,
Thanks for reaching out, and great question. There is a lot of room for improvement in how Modin handles `np.where` and other NumPy functions. With the recent updates to NumPy, we can implement native versions of these and have them run just as fast. The slowdown you're seeing comes from Modin converting the object to a NumPy array and then back to a distributed Series. That round trip is slow because we are effectively collecting all of the data, merging it, running the operation (`np.where`), and then re-splitting the result. Each of these steps has a high overhead, which causes the runtime to explode. Feel free to reach out with any other questions!
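To make the round trip concrete, here is a rough sketch in plain NumPy. It is only a toy model: the list of chunks stands in for a distributed Series, and the names `partitions`, `where_via_round_trip`, and `where_per_partition` are made up for illustration and are not part of Modin's API or internals.

```python
import numpy as np

# Toy stand-in for a distributed Series: a list of NumPy chunks.
# This is an assumption for illustration, not Modin's real layout.
partitions = np.array_split(np.arange(12), 4)


def where_via_round_trip(parts, threshold):
    """Collect every chunk, run np.where once, then split again."""
    merged = np.concatenate(parts)                    # collect and merge
    result = np.where(merged > threshold, merged, 0)  # the actual operation
    return np.array_split(result, len(parts))         # re-split the result


def where_per_partition(parts, threshold):
    """Run np.where on each chunk separately, with no collect or re-split."""
    return [np.where(p > threshold, p, 0) for p in parts]


round_trip = where_via_round_trip(partitions, 5)
per_partition = where_per_partition(partitions, 5)
```

Both functions produce the same values here. The difference is that the first one has to gather all of the data in one place before doing any work, which is where the overhead comes from once the chunks live in separate workers.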
Devin