Of course it is possible, but for each image you need to know its geometric model and its orientation parameters (interior and exterior). These are the expressions that let you relate image space (pixel coordinates measured in the image) with object space, and vice versa (terrain coordinates, for example LiDAR points, with the coordinates at which each point appears in each of the images you have).
For conventional frame cameras you can use the collinearity model (together with the interior orientation or self-calibration parameters); for other camera types (fisheye, panoramic, etc.) you must use their specific models.
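As a minimal sketch of the collinearity model for a frame camera (assuming an ideal camera with no distortion terms, and my own choice of variable names: `X0` for the projection center, `R` for the rotation from object to camera frame, `f` for the focal length, `x0`/`y0` for the principal point):

```python
import numpy as np

def collinearity_project(X, X0, R, f, x0=0.0, y0=0.0):
    """Project a ground point X into image coordinates.

    X  : ground point, shape (3,)
    X0 : projection center (exterior orientation), shape (3,)
    R  : 3x3 rotation matrix, object frame -> camera frame
    f  : focal length (same units as x, y)
    x0, y0 : principal point offsets (interior orientation)
    """
    # Transform the ground point into the camera frame
    d = R @ (np.asarray(X, float) - np.asarray(X0, float))
    # Collinearity equations: perspective division by the depth component
    x = x0 - f * d[0] / d[2]
    y = y0 - f * d[1] / d[2]
    return np.array([x, y])
```

A real camera would add self-calibration terms (radial and tangential distortion) on top of these equations.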
If you know the orientation parameters of the images (and their geometric model), and the object appears in at least two of them, the problem reduces to a ray intersection: each ray starts at the projection center of an image and passes through the corresponding (homologous) image point, and the intersection of the rays gives the ground point XYZ.
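The ray intersection can be sketched as a small least-squares problem: in practice the two rays will not meet exactly, so a common choice is the midpoint of the closest points on each ray. This is an illustrative sketch with assumed inputs (projection centers `c1`, `c2` and ray directions `d1`, `d2` already computed from the orientation parameters):

```python
import numpy as np

def intersect_rays(c1, d1, c2, d2):
    """Return the point closest to both rays (midpoint method).

    c1, c2 : projection centers of the two images, shape (3,)
    d1, d2 : ray directions through the homologous points, shape (3,)
    """
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    d1 = np.asarray(d1, float) / np.linalg.norm(d1)
    d2 = np.asarray(d2, float) / np.linalg.norm(d2)
    # Solve for distances t1, t2 along each ray minimizing
    # |(c1 + t1*d1) - (c2 + t2*d2)| in the least-squares sense
    A = np.column_stack([d1, -d2])
    b = c2 - c1
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    p1 = c1 + t[0] * d1   # closest point on ray 1
    p2 = c2 + t[1] * d2   # closest point on ray 2
    return (p1 + p2) / 2  # ground point XYZ
```

With more than two images you would stack one such ray per image and solve the joint least-squares system, which also gives a residual you can use as a quality check.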
If you need more detail, do not hesitate to contact me.
Jorge