"Screens" are the physical display devices that are connected to your computer. You have one screen.
"Surfaces" are virtual and unlimited. Conceptually, they correspond to the things that you're projecting onto, and you can have as many as you want. So each leg, the cyc, the cape, etc. can all be different "surfaces." You make surfaces in the video settings. You can use "shutters" (drag from the edge of the video proxy in the surface settings) to make rectangular surfaces within the throw of your projector, and/or use "masks," which are arbitrary b&w image files that block out part of the projector's throw to narrow it down to a certain shape. Masks can have soft edges as well. (Note that with the current implementation, a mask on any surface that's in use will block out video on all other surfaces as well. I think this might be changing in a future update …?)
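If you'd rather generate a soft-edged mask than paint one, here is a rough sketch in Python (standard library only). It is not anything QLab provides: the function names, the rectangle coordinates and the feather width are all made up for illustration, and it assumes white lets video through and black blocks it, so invert the values if your setup behaves the other way.

```python
import struct
import zlib


def soft_rect_mask(width, height, left, top, right, bottom, feather):
    """Return rows of 0-255 grey values: white inside the rectangle,
    black outside, with a linear ramp `feather` pixels wide at the edge."""
    rows = []
    for y in range(height):
        row = bytearray()
        for x in range(width):
            # Distance from the nearest rectangle edge (negative = outside).
            d = min(x - left, right - x, y - top, bottom - y)
            if feather > 0:
                level = max(0.0, min(1.0, d / feather))
            else:
                level = 1.0 if d >= 0 else 0.0
            row.append(round(255 * level))
        rows.append(bytes(row))
    return rows


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    body = kind + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))


def encode_gray_png(rows):
    """Encode equal-length rows of grey bytes as an 8-bit greyscale PNG."""
    height = len(rows)
    width = len(rows[0])
    # Bit depth 8, colour type 0 (greyscale), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )


def save_mask(path, rows):
    """Write the mask to `path` as a PNG file."""
    with open(path, "wb") as f:
        f.write(encode_gray_png(rows))
```

Something like `save_mask("cape_mask.png", soft_rect_mask(1024, 768, 200, 150, 800, 600, 40))` would give you a feathered rectangle to start from. For an irregular shape like the cape you'd still want to draw the mask in an image editor; this only shows what a soft edge amounts to in the file.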
So, you have:
— legs: totally doable as surfaces in QLab
— cyc: easy, just make a surface
— flat: easy, just make a surface
— "about 20 boxes that are being held by actors": uh … with QLab? … not so easy. I would not attempt this with QLab. Now you're into the realm of Isadora or MadMapper etc. You probably want to do something like put infrared LEDs on the boxes and track their movement in real time, right? QLab will NOT do that for you.
— two walls made out of boxes: easy, make them surfaces
— a cape: pretty easy to do as a surface with a custom mask, unless the actor moves a lot
— actors' bodies: depending on how much they move and how precise you need to be, this is also iffy as far as QLab goes. Maybe doable? But see the note on boxes above.
Does this help?