Although ENMeval only accepts raster stacks as input, one of its first internal steps is to extract the values at the occurrence and background points to a data frame. It then runs maxent on that data frame, just as you want to do. We built it that way because you're right, it is faster. So we could eventually add an option for user inputs to already be in data frame ("samples with data") format, but the time savings would be very marginal (you'd only avoid the time for one extraction of data from the grids). Also, it wouldn't work with any of the automated spatial data partitioning methods (the block or checkerboard routines).
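To make that internal step concrete, here is a rough sketch of the same idea written directly with the raster and dismo packages. It is not ENMeval's actual source code. It uses the example data that ships with dismo rather than your data, and it assumes maxent.jar has been placed in dismo's java folder. The categorical "biome" layer is left out only to keep the sketch short.

```r
library(raster)
library(dismo)

# Example grids and occurrences shipped with dismo (stand-ins for your own data).
files <- list.files(system.file("ex", package = "dismo"),
                    pattern = "grd$", full.names = TRUE)
files <- files[basename(files) != "biome.grd"]  # skip the categorical layer
env <- stack(files)
occ <- read.csv(system.file("ex/bradypus.csv", package = "dismo"))[, 2:3]
bg  <- randomPoints(env[[1]], n = 1000)

# One extraction per point set gives the "samples with data" table.
swd  <- as.data.frame(rbind(raster::extract(env, occ),
                            raster::extract(env, bg)))
pres <- c(rep(1, nrow(occ)), rep(0, nrow(bg)))  # 1 = presence, 0 = background

# Drop points that fall on NA cells so every row is complete.
keep <- complete.cases(swd)
swd  <- swd[keep, ]
pres <- pres[keep]

# Fit from the table alone; no further reads from the grids are needed.
jar <- file.path(system.file("java", package = "dismo"), "maxent.jar")
if (file.exists(jar)) {
  mod <- maxent(x = swd, p = pres)
}
```

The only work a data frame input option would save is the two extract() calls, which is why the gain would be small.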
As for controlling the background, this is entirely possible with the current version of ENMeval. You only need to provide the coordinates of your background points as the "bg.coords" argument of the ENMevaluate function (see the help documentation). You can also provide the groups to use as evaluation bins (if you choose the "user-defined" partitioning method), or use one of the other 5 automated methods to partition your (user-supplied) background points for model evaluation.
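A minimal sketch of both options follows, again using dismo's example data in place of yours. Apart from bg.coords, the argument names used here (occ, env, method, occ.grp, bg.grp, RMvalues, fc) and the method values "block" and "user" are my recollection of the interface in the version discussed in this thread, so please treat them as assumptions and check ?ENMevaluate for your installed version. The random background and the random four-bin grouping are purely illustrative.

```r
library(ENMeval)
library(raster)
library(dismo)

# Example grids and occurrences shipped with dismo (stand-ins for your own data).
files <- list.files(system.file("ex", package = "dismo"),
                    pattern = "grd$", full.names = TRUE)
files <- files[basename(files) != "biome.grd"]  # skip the categorical layer
env <- stack(files)
occ <- read.csv(system.file("ex/bradypus.csv", package = "dismo"))[, 2:3]

# Background points that you choose yourself (random here, for illustration).
bg <- as.data.frame(randomPoints(env[[1]], n = 1000))
names(bg) <- names(occ)  # keep column names consistent with the occurrences

# Hypothetical user-defined evaluation bins: four random groups.
occ.grp <- sample(rep(1:4, length.out = nrow(occ)))
bg.grp  <- sample(rep(1:4, length.out = nrow(bg)))

jar <- file.path(system.file("java", package = "dismo"), "maxent.jar")
if (file.exists(jar)) {
  # Option 1: your background, partitioned by an automated method.
  res.block <- ENMevaluate(occ = occ, env = env, bg.coords = bg,
                           method = "block",
                           RMvalues = c(1, 2), fc = c("L", "LQ"))

  # Option 2: your background and your own evaluation bins.
  res.user <- ENMevaluate(occ = occ, env = env, bg.coords = bg,
                          occ.grp = occ.grp, bg.grp = bg.grp,
                          method = "user",
                          RMvalues = c(1, 2), fc = c("L", "LQ"))
}
```

The small RMvalues and fc settings just keep the run short; swap in whatever range of settings you actually want to compare.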
On a tangential note, Jamie and I are working on a new version of ENMeval that can use parallel processing, which should speed things up a lot. We'll post to this list as soon as it is available (hopefully within 2 months).
I hope this is helpful. Let me know if I can clarify anything or if you have more questions.
Bob