I believe you're right about the problem with clipping the large D8 raster. After clipping, there is likely no "0" cell left to indicate a cell with no downslope neighbour, and I believe (with no Rust experience!) that "FindMainStem" relies on that condition.
I'm assuming you're extracting streams with the same D8FA threshold? The clip likely causes some edge issues, so the re-generated D8 may now have multiple "0" cells to start from in your subwatershed. Sorry I can't help more on that - it could be a number of reasons.
One potential solution is to clip the larger D8, D8FA, and streams rasters by the subwatershed boundary. Assuming your subwatershed has one outlet and that the outlet has the highest flow accumulation, you could reclassify the clipped D8FA so that its maximum value becomes 0 and everything else becomes 1. Multiplying this by the clipped D8 then makes sure the D8 value at your outlet point is 0. There is no need to re-run D8 or D8FA or re-extract streams this way.
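As a rough sketch of that reclassify-and-multiply step, assuming the clipped D8 and D8FA rasters have already been read into NumPy arrays of the same shape: the function name `force_outlet_to_zero` and the nodata value of -32768 are my own assumptions for illustration, not part of any tool, so adjust them to your data.

```python
import numpy as np


def force_outlet_to_zero(d8, d8fa, nodata=-32768.0):
    """Return a copy of the D8 pointer with the max-accumulation cell set to 0.

    Hypothetical helper: assumes one outlet, located at the cell with the
    highest flow accumulation, and the same nodata value in both rasters.
    """
    d8 = np.asarray(d8, dtype=float)
    d8fa = np.asarray(d8fa, dtype=float)
    valid = (d8 != nodata) & (d8fa != nodata)
    if not valid.any():
        raise ValueError("no valid cells in the clipped rasters")

    max_fa = d8fa[valid].max()
    # Reclassify D8FA: 0 at the maximum, 1 everywhere else (nodata included).
    outlet_mask = np.where(valid & (d8fa == max_fa), 0.0, 1.0)
    # Multiplying leaves every cell unchanged except the outlet, which becomes 0.
    return d8 * outlet_mask


nd = -32768.0
d8 = np.array([[2.0, 4.0, nd], [1.0, 8.0, 16.0], [nd, 64.0, 32.0]])
d8fa = np.array([[1.0, 2.0, nd], [1.0, 9.0, 3.0], [nd, 1.0, 1.0]])
fixed = force_outlet_to_zero(d8, d8fa, nodata=nd)
```

If several cells tie for the maximum accumulation, all of them are set to 0, so it is worth checking that only one cell changed before running FindMainStem.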
Another potential solution would be to expand your subwatershed boundary by ~xx number of cells or distance. Then you can clip the DEM to the expanded boundary, run D8, run D8FA, extract streams, run the FindMainStem tool, and finally clip the results by the original subwatershed boundary. This might replicate the results of the larger DEM procedure, because the edge effects you are seeing would be pushed out to the edge of the expanded boundary rather than eliminated.
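To illustrate only the "expand by a number of cells" part, here is a minimal sketch that grows a boolean subwatershed mask outward using plain NumPy. The name `expand_mask` and the `n_cells` parameter are hypothetical, and how many cells to use is left open, as above; in practice a GIS buffer on the boundary polygon would do the same job.

```python
import numpy as np


def expand_mask(mask, n_cells):
    """Grow a boolean mask outward by n_cells, counting all 8 neighbours.

    Hypothetical helper: the mask cannot grow past the array edges, so the
    input array must already be larger than the subwatershed.
    """
    out = np.asarray(mask, dtype=bool).copy()
    rows, cols = out.shape
    for _ in range(n_cells):
        # Pad with False so shifted windows never wrap around the edges.
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the 3x3 neighbourhood of every cell.
        for dr in (0, 1, 2):
            for dc in (0, 1, 2):
                grown |= padded[dr:dr + rows, dc:dc + cols]
        out = grown
    return out


mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
grown_one = expand_mask(mask, 1)
grown_two = expand_mask(mask, 2)
```

The expanded mask would be used to clip the DEM before running D8, D8FA, stream extraction and FindMainStem, and the original mask to clip the outputs afterwards.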