OK, I don't fully understand what you've done, but you have two options:
Firstly (regardless of which path you follow below), prepare your enviro grids so that they are whatever size you want for model-fitting. They all need to have identical extents. (I'm assuming here that you're not using SWD format for model-fitting, and that you are instead letting Maxent sample the background for you from your grids and extract predictor values at your occurrence localities.) The extent of your grids will influence your background sample: background is sampled only from cells that have data in all predictor grids. In other words, if any predictor is NA at a particular cell, that cell will not be permitted in the background sample. This is the key idea behind masking, which I'll mention below.
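To make the "data in all predictor grids" rule concrete, here is a minimal sketch in plain Python. It is not how Maxent itself is implemented. The grid names, the toy values, and the use of -9999 as the NA placeholder are all assumptions for illustration. The check only compares rows and columns; for real grids you would also want to compare the corner coordinates and cell size in the file headers.

```python
# NODATA is an assumed placeholder for NA cells (hypothetical value for this sketch).
NODATA = -9999

def same_shape(grids):
    """Return True if every grid has the same number of rows and columns."""
    shapes = {(len(g), len(g[0])) for g in grids.values()}
    return len(shapes) == 1

def background_eligible(grids):
    """Return a grid of booleans: True only where every predictor has data."""
    if not same_shape(grids):
        raise ValueError("predictor grids differ in size")
    layers = list(grids.values())
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [
        [all(layer[r][c] != NODATA for layer in layers) for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical toy predictors: 2 rows x 3 columns each.
predictors = {
    "temperature": [[12.1, 13.0, NODATA], [11.5, 12.2, 12.9]],
    "rainfall": [[800, NODATA, 650], [790, 810, 700]],
}

eligible = background_eligible(predictors)
for row in eligible:
    print(row)
```

A cell drops out as soon as a single predictor is NA there, which is exactly why a mask grid (one more layer with NA where you don't want background) works.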
Now, you can either:
(1) Create a mask grid, which will determine both (a) where background can be sampled from (during model-fitting), and (b) where predictions should be made (during model-projection). Note that the mask grid used during model-fitting must be identical in extent to the predictors used during model-fitting, and that the mask grid used during model-projection must be identical in extent to the predictors used for projection. Mask grids should have some constant value (e.g. 1) at cells that you want included in your background (model-fitting) or that you want to project to (model-projection). Cells that you don't want included in background, or that you don't want to project to, should be set to NA. You can create two different mask grids for this purpose. For example, if you want background to be drawn from the entire larger initial extent of your predictor grids, set all cells of the model-fitting mask grid to 1. If you want to project to just a smaller sub-region of your enviro grids, set every cell of the projection mask grid that you don't want to project to, to NA; those cells will be NA in the projection grid that Maxent generates. There are a couple of reasons why you might want to do things this way (rather than clipping all the enviro grids for projection). It can save you grid-processing time (you already have the predictor grids at the larger extent), but more importantly, sometimes you want to restrict your predictors to non-rectangular bounds, which clipping can't quite achieve.
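As a rough sketch of the two mask grids described in option (1), the plain-Python snippet below builds a model-fitting mask that is 1 everywhere and a projection mask that is 1 only inside a non-rectangular region. I'm assuming your grids are ESRI ASCII files with -9999 as the NA value; the grid size, corner coordinates, cell size, and the triangular region are made up for illustration, and the function names are hypothetical helpers.

```python
NODATA = -9999  # assumed NA placeholder; use whatever your predictor grids use

def make_mask(n_rows, n_cols, keep):
    """Build a mask: 1 where keep(row, col) is True, NODATA elsewhere."""
    return [
        [1 if keep(r, c) else NODATA for c in range(n_cols)]
        for r in range(n_rows)
    ]

def to_ascii_grid(rows, xllcorner, yllcorner, cellsize):
    """Serialise a grid as ESRI ASCII text with the usual six-line header."""
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {NODATA}",
    ]
    body = [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(header + body) + "\n"

# Model-fitting mask: every cell is 1, so background can come from the whole extent.
fit_mask = make_mask(4, 5, lambda r, c: True)

# Projection mask: keep only a hypothetical triangular (non-rectangular) region.
proj_mask = make_mask(4, 5, lambda r, c: c <= r)

# The header values here must match those of the predictor grids the mask goes with.
text = to_ascii_grid(proj_mask, xllcorner=0.0, yllcorner=0.0, cellsize=0.5)
print(text)
```

The point of the second mask is that the kept region can be any shape you like, which a rectangular clip can't give you.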
or...
(2) Forget about a mask. Clip all your enviro grids to the extent that you want to project to, and point Maxent to the folder that contains these clipped grids. They must all be identical in extent. (Note that if it makes sense for your background to also be restricted to this clipped region, then you should provide these clipped grids during model-fitting as well, instead of the grids with larger extent.)
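For option (2), the thing to get right is that every grid is clipped with the same window, so they all end up with identical extents. Here is a minimal plain-Python sketch of that idea, assuming ESRI ASCII-style headers where row 0 is the northern edge of the grid. The header values, grid names, and window are hypothetical, and in practice you would normally do this step in your GIS or raster package of choice.

```python
def clip_grid(rows, header, row_start, row_stop, col_start, col_stop):
    """Clip a grid to rows[row_start:row_stop] and columns[col_start:col_stop].

    Returns the clipped rows and a header updated for the new extent.
    """
    clipped = [row[col_start:col_stop] for row in rows[row_start:row_stop]]
    cell = header["cellsize"]
    new_header = dict(header)
    new_header["ncols"] = col_stop - col_start
    new_header["nrows"] = row_stop - row_start
    # Columns dropped on the left shift the lower-left corner east.
    new_header["xllcorner"] = header["xllcorner"] + col_start * cell
    # Rows dropped at the bottom shift the lower-left corner north.
    new_header["yllcorner"] = header["yllcorner"] + (header["nrows"] - row_stop) * cell
    return clipped, new_header

# Hypothetical shared header and toy predictor grids (3 rows x 4 columns).
header = {"ncols": 4, "nrows": 3, "xllcorner": 10.0, "yllcorner": 20.0, "cellsize": 1.0}
grids = {
    "temperature": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    "rainfall": [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]],
}

# One window, applied to every grid, so the clipped extents stay identical.
window = dict(row_start=0, row_stop=2, col_start=1, col_stop=3)
clipped = {name: clip_grid(rows, header, **window) for name, rows in grids.items()}

for name, (rows, hdr) in clipped.items():
    print(name, rows, hdr)
```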