Hi,
yes, if the model is setup correctly, the intensity estimates are in "length units squared" for whatever length units the mesh is defined in. So if you have a projection CRS with units=km the result is in "per sq km".
The tricky part can be for line transect sampling, where one needs to ensure that the transect width is handled correctly. In our examples, we use an explicit cross-transect distance dimension which takes care of that in the likelihood construction.
For ordinary "fully observed over a polygon" observations no special handling is needed.
For your prediction question, it depends on how you set it up; if you just evaluate at points km apart, the units are unchanged. There is no automatic aggregation conversion done; the plain code doesn't treat grid cells as cells, but rather as points; multiplying with the cell area is a basic approximation of actual aggregation; as long as the intensity is still smooth within each cell it's an ok approximation.
With inlabru predict() calls, it's is more doable to do more accurate "integrate over cells" prediction, by using finer integration schemes, via fm_int(fm_subdivide(mesh, ...), ...). This is needed in cases where covariates have finer resolution than the mesh.
I would strongly encourage you to use the inlabru interface instead of plain inla and prediction stacks. See the several Point Process examples at
https://inlabru-org.github.io/inlabru/ which includes e.g. how to do posterior evaluation of total region counts etc.
Your code will both become much shorter and easier to maintain, but will also be benefit from the improvements we've done over the years in how point process data is handled.
Finn