The first step in any posit square root routine is to return NaR if the MSB is a 1 (which takes care of both a NaR input and all negative inputs), and then return 0 if the input is 0.
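Those entry checks can be sketched as follows. This is a minimal illustration, not SoftPosit's actual code; the `posit16` typedef and function name are my own. The bit patterns are from the posit format itself: NaR is the sign bit alone (0x8000 for 16 bits) and zero is all zeros, so a single MSB test covers NaR and every negative input at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit posit container; NaR is the bit pattern 0x8000
   (sign bit set, all else zero) and zero is the all-zeros pattern. */
typedef uint16_t posit16;
#define P16_NAR  ((posit16)0x8000)
#define P16_ZERO ((posit16)0x0000)

/* Entry checks for a posit16 square root: any input with the MSB set
   is either NaR itself or a negative number, and the square root of a
   negative posit is NaR by definition; sqrt(0) is 0. Returns 1 and
   writes the result if a special case was handled, 0 otherwise. */
int p16_sqrt_special(posit16 x, posit16 *out) {
    if (x & 0x8000) { *out = P16_NAR;  return 1; }  /* NaR or negative */
    if (x == P16_ZERO) { *out = P16_ZERO; return 1; } /* sqrt(0) = 0 */
    return 0;
}
```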
The next step is to decode the scaling factor from the regime and exponent, and reduce the argument to either [1, 2) or [2, 4), producing the scale factor of the square root along the way. I do not see how you can land on the edge of a regime. Mitch, can you provide an example where this happens, ideally in low precision so it's easier to read?
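Here is a sketch of that decode-and-reduce step for a positive posit16 with es = 1 (so the total scale is 2k + e for regime value k and exponent bit e). The field walk follows the posit format, but the function name, struct, and Q2.30 significand format are my choices, not SoftPosit's internals. The key trick is the parity of the scale: square root halves the power of two, so if the scale is odd we fold the odd bit into the significand, moving it from [1, 2) to [2, 4) and leaving an even scale that halves exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced form: value = sig * 2^(2*scale) before the square root,
   with sig a Q2.30 fixed-point significand in [1, 4). */
typedef struct { int scale; uint32_t sig; } reduced_t;

/* Decode a positive, nonzero posit16 (es = 1): NaR, zero, and negative
   inputs are assumed to have been handled already. */
reduced_t p16_reduce(uint16_t x) {
    int k, e;
    uint16_t bits = (uint16_t)(x << 1);      /* drop the sign bit */
    if (bits & 0x8000) {                     /* run of m ones: k = m - 1 */
        k = 0;
        while (bits & 0x8000) { k++; bits <<= 1; }
        k -= 1;
    } else {                                 /* run of m zeros: k = -m */
        k = 0;
        while (!(bits & 0x8000)) { k--; bits <<= 1; }
    }
    bits <<= 1;                              /* skip the regime's terminating bit */
    e = (bits >> 15) & 1;                    /* es = 1: a single exponent bit */
    bits <<= 1;
    int scale = 2 * k + e;                   /* total power-of-2 scaling */
    uint32_t sig = 0x40000000u | ((uint32_t)bits << 14); /* 1.f in Q2.30 */
    reduced_t r;
    if (scale & 1) { sig <<= 1; scale -= 1; }  /* odd scale: fold into [2, 4) */
    r.scale = scale / 2;                       /* sqrt halves the even scale */
    r.sig = sig;
    return r;
}
```

For example, posit16 0x6000 encodes 4.0 (k = 1, e = 0, scale 2), which reduces to significand 1.0 with square-root scale 1; 0x5000 encodes 2.0 (scale 1, odd), which folds to significand 2.0 in [2, 4) with square-root scale 0.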
Three iterations of Newton-Raphson from the piecewise-linear starting approximation in SoftPosit should provide about 70 correct bits, more than enough to round correctly to 60 bits. Correct rounding is usually accomplished with the Tuckerman test, or "Tuckerman rounding," in the case where the result lands on the tie point between two representable values; it requires two multiplies and comparison tests. SoftPosit instead uses a technique John Hauser developed for SoftFloat: keep the errors all positive or all negative after each Newton-Raphson iteration, so that only one multiply and one comparison are needed to break the tie.
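The iteration scheme can be demonstrated in double arithmetic (a real posit library would do this in fixed point): a crude piecewise-linear seed for 1/sqrt(x) on the reduced interval, then Newton-Raphson steps that roughly double the number of correct bits each time. The two seed segments and their coefficients below are illustrative stand-ins, not SoftPosit's actual table.

```c
#include <assert.h>
#include <math.h>

/* Newton-Raphson square root via the reciprocal square root, seeded by
   a two-segment piecewise-linear guess for 1/sqrt(x) on [1, 4). The
   seed is good to roughly 4 bits; each iteration approximately squares
   the relative error, so three iterations reach ~30 bits here (a real
   implementation with a finer seed table reaches correspondingly more). */
static double nr_sqrt(double x, int iters) {
    /* Illustrative linear segments for 1/sqrt(x) on [1,2) and [2,4). */
    double y = (x < 2.0) ? 1.29 - 0.29 * x : 0.90 - 0.10 * x;
    for (int i = 0; i < iters; i++)
        y = y * (1.5 - 0.5 * x * y * y);   /* NR step for f(y) = 1/y^2 - x */
    return x * y;                          /* sqrt(x) = x * (1/sqrt(x)) */
}
```

Using the reciprocal square root avoids a division inside the loop; the final multiply by x recovers sqrt(x).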
It is not necessary to have 4 more bits than the precision you are rounding to… but the more bits you have, the smaller the chance that you land on a tie case and have to invoke the Tuckerman test, which improves speed. While Cerlane wrote almost all of SoftPosit, I did supply the square root routines for her library, and I'm pretty sure that for 32-bit posits I got 31 bits of accuracy after two Newton-Raphson iterations, just 3 bits more than the 28 bits needed in general. And it works.
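For concreteness, here is the Tuckerman test in a simple integer model of my own (not SoftPosit code). A candidate R is the round-to-nearest square root of X exactly when (R - 1/2)^2 < X <= (R + 1/2)^2; for integers that simplifies to R(R-1) < X <= R(R+1), which is the "two multiplies and comparison tests" mentioned above. The same idea applies to fixed-point significands with the candidate offset by one ulp.

```c
#include <assert.h>
#include <stdint.h>

/* Integer model of the Tuckerman test: R = round(sqrt(X)) exactly when
   (R - 1/2)^2 < X <= (R + 1/2)^2, i.e. R*(R-1) < X <= R*(R+1) for
   integer X and R. Two multiplies, two comparisons. */
int tuckerman_ok(uint64_t X, uint64_t R) {
    return R * (R - 1) < X && X <= R * (R + 1);
}
```

For example, sqrt(10) ≈ 3.16 rounds to 3, and indeed 3·2 = 6 < 10 ≤ 12 = 3·4 passes while the candidate 4 fails; the boundary case X = 12 still correctly accepts R = 3 and X = 13 accepts R = 4.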