The goal of the task is to estimate a few probabilities that we use to derive the two measures. In general, the more trials you have to base a probability on, the better (the larger your denominator, the more resolution you have). So, if you have only 10 possible lures on which to respond "similar," you have a coarse estimate (0.1, 0.2, ...), while if you have 20 you have a finer one (0.05, 0.1, 0.15, ...). The truth of the person's ability, however, isn't changed by how you're estimating it. If the truth is a value of 0.14 and you have 10 trials, you'll come up with 0.1; if you have 20 trials, you'll come up with 0.15 (factoring out basic noise), so you'll be more accurate with more trials. But toss in a touch of noise, run this on 100 people who all have a true value of 0.14, and average the results from both the short and long sets, and you'll end up with estimates of ~0.14 in both (the short set will have something like 60 people reading out at 0.1 and 40 people reading out at 0.2, averaging out to your 0.14).
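If it helps to see that averaging-out in action, here's a minimal sketch of the 100-person example in Python (not my actual scripts; the 0.14 true rate and the 10 vs. 20 lure counts are just the numbers from the example above):

    # Simulate 100 subjects whose true P("similar" | lure) = 0.14,
    # estimated from either 10 or 20 lure trials.
    import numpy as np

    rng = np.random.default_rng(1)
    true_p = 0.14        # true probability of responding "similar" to a lure
    n_subjects = 100

    for n_lures in (10, 20):
        # Each subject's estimate is (# "similar" responses) / (# lures),
        # so it can only fall on a grid with steps of 1/n_lures.
        hits = rng.binomial(n_lures, true_p, size=n_subjects)
        estimates = hits / n_lures
        print(f"{n_lures} lures: grid step = {1/n_lures:.2f}, "
              f"group mean = {estimates.mean():.3f}, SD = {estimates.std(ddof=1):.3f}")

Individual estimates are coarser with 10 lures than with 20, but the group means both land right around 0.14.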
All that is a long way, but hopefully a valuable way, of getting to the idea that adding extra trials won't change the actual LDI number you get (until you get down to very, very small sample sizes where working memory becomes more of an issue). We ran this with 16-64 items per condition in one of the papers. More trials just reduce your quantization error, but they do so at the cost of a longer experiment, and at some point you hit diminishing returns. FWIW, I've done a bit of analysis on that and have some scripts if you really want to simulate / look at the effects. But don't worry about mixing the set sizes -- just pay attention to counterbalancing the actual stimulus sets used.
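As a rough illustration of the diminishing-returns point (again just a sketch, not my actual analysis): the spread of a per-subject estimate shrinks like sqrt(p*(1-p)/n), so each doubling of the trial count buys you less. The 16-64 range below mirrors the item counts mentioned above; 128 is included only to show the trend.

    # How much does adding trials tighten the estimate of a true rate of 0.14?
    import math

    true_p = 0.14
    for n_lures in (16, 32, 64, 128):
        se = math.sqrt(true_p * (1 - true_p) / n_lures)
        print(f"{n_lures:4d} lures -> expected SD of the estimate ~ {se:.3f}")

Going from 16 to 64 lures roughly halves the noise on the estimate; going from 64 to 128 gains you much less, at the cost of a much longer session.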
Craig