Hello Vachan,
What you are describing is very similar to Hesthaven's framework of building matrix operators and applying them to your degrees of freedom. The main advantage of this operator form is that you have common mass (if linear elements), differentiation, flux, and boundary operators for all your elements, since the operators are defined on the reference element. Therefore, using a MeshWorker loop to go through every single element to build those matrices would result in building redundant local matrices and storing them in a global matrix. That can be very expensive memory-wise.
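To make the "one reference operator for all elements" idea concrete, here is a minimal sketch in plain C++ (no deal.II). It assumes a 1D mesh with linear (P1) nodal elements and discontinuous storage of the solution, cell after cell. The names DenseMatrix, reference_differentiation_p1 and apply_on_all_cells are made up for this illustration and are not deal.II classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Small row-major dense matrix, sized for one reference element.
struct DenseMatrix {
  std::size_t rows, cols;
  std::vector<double> a;
  DenseMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
  double &operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Nodal differentiation matrix for P1 Lagrange elements on the reference
// interval [-1, 1]: the two basis functions have slopes -1/2 and +1/2.
DenseMatrix reference_differentiation_p1() {
  DenseMatrix D(2, 2);
  D(0, 0) = -0.5; D(0, 1) = 0.5;
  D(1, 0) = -0.5; D(1, 1) = 0.5;
  return D;
}

// Apply the single reference operator D to the local DoFs of every cell.
// u stores D.cols values per cell, cell after cell; h[c] is the length of
// cell c, so 2 / h[c] is the Jacobian factor d(xi)/dx of the affine map.
std::vector<double> apply_on_all_cells(const DenseMatrix &D,
                                       const std::vector<double> &u,
                                       const std::vector<double> &h) {
  assert(D.rows == D.cols);
  const std::size_t n = D.cols;
  assert(u.size() == n * h.size());
  std::vector<double> du(u.size(), 0.0);
  for (std::size_t c = 0; c < h.size(); ++c) {
    const double scale = 2.0 / h[c];
    for (std::size_t i = 0; i < n; ++i) {
      double sum = 0.0;
      for (std::size_t j = 0; j < n; ++j)
        sum += D(i, j) * u[c * n + j];
      du[c * n + i] = scale * sum;
    }
  }
  return du;
}
```

Only one 2x2 matrix is stored no matter how many cells there are; the per-cell geometry enters only through the scalar 2 / h[c]. Storing a separate local matrix per cell inside a global matrix would repeat the same numbers over and over.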
Yes, you could use MeshWorker to compute only your normal fluxes.
So, it's feasible, but I wouldn't use the MeshWorker loop to build those matrices. You are welcome to look at step-33 to see how you can loop through your elements and then build your different mass matrices. Then use a similar loop to apply your interpolation, differentiation, and lifting matrices to the local solution to assemble your right-hand side.
Yes, if you build those operators, you will save a significant amount of time for explicit time stepping, since you will basically have concatenated multiple operators (interpolation, differentiation, integration, etc.) into a single matrix-vector product.
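A rough sketch of that concatenation, again in plain C++ with hypothetical helper names (mat_mat and mat_vec are mine, not from any library): the product of the operators is formed once before time stepping, and each step then needs a single matrix-vector product instead of one per operator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Dense product C = A * B, done once before time stepping starts.
Matrix mat_mat(const Matrix &A, const Matrix &B) {
  assert(!A.empty() && !B.empty() && A[0].size() == B.size());
  Matrix C(A.size(), Vector(B[0].size(), 0.0));
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t k = 0; k < B.size(); ++k)
      for (std::size_t j = 0; j < B[0].size(); ++j)
        C[i][j] += A[i][k] * B[k][j];
  return C;
}

// Dense matrix-vector product y = A * x, done at every time step.
Vector mat_vec(const Matrix &A, const Vector &x) {
  assert(!A.empty() && A[0].size() == x.size());
  Vector y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += A[i][j] * x[j];
  return y;
}
```

With A and B standing in for, say, an integration and an interpolation operator, mat_vec(mat_mat(A, B), x) gives the same result as mat_vec(A, mat_vec(B, x)), but mat_mat(A, B) is computed only once and reused at every step.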
I think I know what you are referring to with that sparse flux matrix. If it's a local matrix, you would want to store the lifting operator, which is the mass inverse times your sparse flux matrix, and which is not necessarily a sparse matrix (unless the mass matrix is diagonal). Either way, I wouldn't worry too much about that operation taking a long time. Your other operations on the volume take much longer, and if you decide to go with nonlinear fluxes, the time it takes to apply this lifted flux vector is negligible by comparison. deal.II has a SparseMatrix class, in case you want to use it, and you can provide the SparsityPattern if you want.
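To illustrate why the lifting operator is generally denser than the flux matrix, here is a small plain C++ sketch. It assumes a local P1 element on [-1, 1] whose consistent mass matrix is [[2/3, 1/3], [1/3, 2/3]] and a flux matrix with a single entry at the right face node. The functions lifting_operator and count_nonzeros are hypothetical helpers written for this example, and the Gauss-Jordan solve is only meant for tiny local matrices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Return L = M^{-1} F via Gauss-Jordan elimination with partial pivoting.
// M and F are taken by value because they are overwritten during the solve.
Matrix lifting_operator(Matrix M, Matrix F) {
  const std::size_t n = M.size();
  assert(n > 0 && F.size() == n);
  const std::size_t m = F[0].size();
  for (std::size_t k = 0; k < n; ++k) {
    // Pick the largest pivot in column k and swap it into row k.
    std::size_t p = k;
    for (std::size_t i = k + 1; i < n; ++i)
      if (std::abs(M[i][k]) > std::abs(M[p][k])) p = i;
    assert(std::abs(M[p][k]) > 0.0);  // M must be invertible
    std::swap(M[k], M[p]);
    std::swap(F[k], F[p]);
    // Normalise the pivot row.
    const double pivot = M[k][k];
    for (std::size_t j = 0; j < n; ++j) M[k][j] /= pivot;
    for (std::size_t j = 0; j < m; ++j) F[k][j] /= pivot;
    // Eliminate column k from every other row.
    for (std::size_t i = 0; i < n; ++i) {
      if (i == k) continue;
      const double f = M[i][k];
      for (std::size_t j = 0; j < n; ++j) M[i][j] -= f * M[k][j];
      for (std::size_t j = 0; j < m; ++j) F[i][j] -= f * F[k][j];
    }
  }
  return F;  // M is now the identity, so F holds M^{-1} F
}

// Count entries whose magnitude exceeds a small tolerance.
std::size_t count_nonzeros(const Matrix &A, double tol = 1e-14) {
  std::size_t nnz = 0;
  for (const auto &row : A)
    for (double v : row)
      if (std::abs(v) > tol) ++nnz;
  return nnz;
}
```

For the mass and flux matrices assumed above, F has one nonzero while M^{-1} F = [[0, -1], [0, 2]] has two: the mass inverse spreads the face contribution to both nodes. With a diagonal mass matrix the lifted matrix keeps the sparsity of F.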
If you were planning on building multiple global matrices, then yes, your approach using SparseMatrix makes sense; just be careful about memory. If you only plan on doing linear advection, your problems should be simple enough that you wouldn't care about memory or computational time. I haven't seen people build those global matrices, but it should be faster than assembling in every loop. It's the old trade-off between memory and computation.
Doug