Apologies for the late response. The default value of the "replicates" argument in the Java implementation of Maxent is 1, meaning no cross-validation is done and the model is evaluated on the training data only. In other words, the whole dataset is tested on itself, which should give higher performance values than cross-validation would. Setting "replicates" to a value higher than 1 switches to cross-validation by default, with records assigned to folds at random. Looking at the documentation, it does seem that the model returned by the Java software is an average of these folds.
In ENMeval, all evaluation, including random cross-validation, is handled by the package rather than by the Maxent software. ENMeval does not return an average of the folds; it returns the model built on the full dataset. The idea is to use all the data for the final prediction and to use the cross-validation results only to choose model settings (i.e., feature classes and regularization multipliers). Averaging random cross-validation folds might make sense as a way to produce a "stable" prediction, but each fold's model is always built on an incomplete dataset, so the competing philosophy is to build the final model and predictions from the full dataset.
For block cross-validation, which uses spatial or temporal (or other) blocks, averaging the folds makes even less sense. Each model is incomplete on purpose, so that we get better estimates of how well the model transfers to new conditions. We should not expect an average of purposefully incomplete models to produce a more accurate prediction than a model built with the full dataset. This is why the block cross-validation results should be used to choose settings only.
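To make the "tune with cross-validation, then refit on everything" idea concrete, here is a minimal sketch in Python. It is not ENMeval or Maxent code: a toy one-parameter ridge regression stands in for the model, the penalty `lam` loosely plays the role of a regularization multiplier, and every function name and the simulated data are made up for illustration.

```python
import random

def fit_ridge(xs, ys, lam):
    """Slope of y ~ b*x (no intercept) with an L2 penalty lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(b, xs, ys):
    """Mean squared error of the slope b on the given records."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def random_folds(n, k, seed=0):
    """Randomly assign record indices 0..n-1 to k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(xs, ys, lam, folds):
    """Average held-out error across folds for one candidate setting."""
    errs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Each fold model sees an incomplete dataset on purpose.
        b = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errs.append(mse(b, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(errs) / len(errs)

def tune_then_refit(xs, ys, lams, k=4, seed=0):
    """Use CV only to choose lam, then fit the final model on ALL records."""
    folds = random_folds(len(xs), k, seed)
    scores = {lam: cv_error(xs, ys, lam, folds) for lam in lams}
    best = min(scores, key=scores.get)
    # The fold models are discarded; nothing is averaged.
    return best, fit_ridge(xs, ys, best), scores

# Simulated toy data (an assumption for the demo, not real occurrence data).
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
lams = [0.0, 0.1, 1.0, 10.0]

best_lam, final_slope, scores = tune_then_refit(xs, ys, lams)
print(best_lam, final_slope)
```

The fold models exist only to score each candidate setting. The returned model is a single fit to the full dataset with the winning setting, which is the philosophy described above. Swapping `random_folds` for a function that groups records into spatial or temporal blocks would give the block version of the same workflow.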
Hope this helps.
-Jamie