Hi Maoting,
Thanks for your interest. Yes, building all 16 samples together, grouped by experimental group, is the way to go here. It ensures the LSV definitions (LSV ID strings, the junctions in the splice graph, the ordering of junctions in the output) are the same across the downstream outputs (when they make it into the outputs, see below) and are thus more easily comparable.
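For concreteness, a grouped build might look something like the sketch below. The group and sample names, paths, and genome are placeholders, and the exact options should be checked against the MAJIQ documentation for your version:

    # settings.ini (sample names are the BAM file basenames; group names are hypothetical)
    [info]
    bamdirs=/path/to/bams
    genome=hg38
    strandness=reverse

    [experiments]
    groupA=A1,A2,A3,A4
    groupB=B1,B2,B3,B4
    groupC=C1,C2,C3,C4
    groupD=D1,D2,D3,D4

    # one build over all 16 samples
    majiq build annotation.gff3 -c settings.ini -o build_all -j 8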
To decide which LSVs are in the data and which junctions belong to each LSV, filtering heuristics are applied at the build stage (is the LSV/junction reliably detected in at least one group?) and at the quantification stage (is the LSV quantifiable in the group being quantified for PSI, or in both groups for dPSI or HET?). The reliability thresholds only have to be met in at least one group for an LSV or de novo junction to become part of the splicegraph. If an LSV was not reliably detected in some other group, it will not appear in the output of quantifications involving that group, even though it still exists in the splicegraph with the same junctions, exon boundaries, etc.
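If you want to tune what "reliably detected" means at the build stage, the relevant knobs are the read and position thresholds. The values below are just examples; run majiq build --help on your install to see the actual defaults:

    # require >= 5 reads spread over >= 3 genomic positions for a junction to
    # count as reliable in a given experiment (example values only)
    majiq build annotation.gff3 -c settings.ini -o build_all -j 8 \
        --minreads 5 --minpos 3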
In the dPSI quantification step, the quantifiability filters must be met in both groups being compared for the LSV to make it into the output. You could run PSI on each experimental group separately if you want quantifications for every LSV in that group, or to capture any that are unique to a specific group. In my experience the heuristics and filters help clean out low-coverage events, but if you really want to detect and quantify everything, even with low read evidence, you can consider relaxing these filters or lowering the minimum number of experiments in which an LSV must be reliably detected at the build stage. By default this is 51% of the samples in a group (so 3 out of 4 here), and you could change it to 2 or even 1 sample if you want to be maximally permissive.
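For example, sticking with the hypothetical names from the build sketch above (flags may differ slightly between MAJIQ versions, so double check against majiq psi --help and majiq deltapsi --help):

    # PSI per group, so LSVs unique to one group still get quantified there
    majiq psi build_all/A1.majiq build_all/A2.majiq build_all/A3.majiq build_all/A4.majiq \
        -o psi_groupA -n groupA -j 8

    # dPSI between two groups; only LSVs quantifiable in both make it into the output
    majiq deltapsi -grp1 build_all/A*.majiq -grp2 build_all/B*.majiq \
        -o dpsi_A_vs_B -n groupA groupB -j 8

    # more permissive build: accept an LSV/junction if it passes the reliability
    # thresholds in just 2 (or 1) samples of a group instead of the default fraction
    majiq build annotation.gff3 -c settings.ini -o build_permissive -j 8 --min-experiments 2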
Hope this helps, and feel free to follow up if anything is unclear.
All the best,
Matt Gazzara