I think there are only two things that will help set you straight:
1. AudioKit works on a pull model. Nodes only generate samples when something further down the chain requests them. So the output must be set to the mic (or to a node downstream of it) so that the mic can deliver something, even if it's delivering zeroes.
2. If you want access to those samples for something other than the output, you use a tap. Taps can grab the samples off almost any node (in theory, though it seems to work best if you tap a mixer node). Then you can do other things with those samples. In AudioKit, we use them for plotting, amplitude analysis, and FFT. You'd probably write your own tap.
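To make those two points concrete, here's a rough sketch. Note that it uses Apple's AVAudioEngine directly rather than AudioKit's own classes, so treat it as an illustration of the idea and not as AudioKit's API. The names `rmsAmplitude` and `startMicTap` are just made up for this example, and on iOS you'd also need to configure the audio session and have microphone permission, which I've left out.

```swift
import AVFoundation

// Hypothetical helper: root-mean-square amplitude of a block of samples.
func rmsAmplitude(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

// Hypothetical setup function: routes the mic to the output (muted) and taps it.
// Keep a reference to the returned engine, or the audio stops.
func startMicTap() throws -> AVAudioEngine {
    let engine = AVAudioEngine()
    let mic = engine.inputNode
    let mixer = AVAudioMixerNode()
    engine.attach(mixer)

    // Pull model: the mic is connected through to the output so that
    // something downstream is requesting samples from it.
    let format = mic.outputFormat(forBus: 0)
    engine.connect(mic, to: mixer, format: format)
    engine.connect(mixer, to: engine.mainMixerNode, format: format)

    // Mute the final stage so the mic is not played back through the speaker.
    // The tap sits upstream of this, on our own mixer.
    engine.mainMixerNode.outputVolume = 0

    // Tap: copy samples off the mixer and use them for something else.
    mixer.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, _ in
        guard let channels = buffer.floatChannelData else { return }
        let count = Int(buffer.frameLength)
        let samples = Array(UnsafeBufferPointer(start: channels[0], count: count))
        print("RMS amplitude:", rmsAmplitude(samples))
    }

    try engine.start()
    return engine
}
```

The tap goes on a mixer because, as mentioned above, that seems to be the most reliable place to put one. Inside the tap block you'd swap the RMS calculation for whatever processing you actually need.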
That's only if you're still set on using AudioKit. EZAudio seems like a fine solution for what you're doing, though, so you might want to look at it more closely. It's no longer being maintained, but that only happened recently, so I think it's still fairly fresh.
Aure