Q1. If both refer to energies per atom, why is the difference between the two very large?
Answer: The Ti and O energies from our ænet paper (N. Artrith and A. Urban, Comput. Mater. Sci. 114 (2016) 135-150) were obtained from calculations using the Quantum ESPRESSO (QE) package and ultrasoft pseudopotentials. The Materials Project contains data from VASP calculations with PAW pseudopotentials. The absolute total energy from a pseudopotential DFT calculation depends on the pseudopotentials used and has no physical meaning on its own; only energy differences computed within the same setup are meaningful. The values from QE and VASP therefore cannot be expected to be the same.
Q2. Since the "generate.in" file has links to all the different training structures, which structure's energy per atom are we supposed to use for the TYPES field?
Answer: The total energy can vary strongly with the composition. To make the ANN training easier, the 'atomic energies' are subtracted from the total energies before training is started. There is no unique way to define the 'atomic energies' in the TYPES section.
One choice is the energies of isolated (free) atoms in vacuum, which is what we used in the TiO2 paper that you referenced (as explained in the documentation: http://ann.atomistic.net/documentation/#input-file-example-generate-in-for-tio-sub-2-sub). With this choice, the ANN potential is trained on the cohesive energy, i.e., the difference between the compound energy and the energy of the free atoms. Another choice for the 'atomic energy' is the energy of the elements in their reference states (e.g., elemental Ti metal and elemental O2 gas), normalized per atom. With this choice, the training target is the formation energy.
You can decide which values you want to use for the ‘atomic energies’ in the input for ‘generate.x’, which gives you some additional flexibility (i.e., the atomic energies don’t necessarily have to be the energy of free atoms in vacuum or the energies of the elements in their reference states).
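To illustrate how the choice of 'atomic energies' changes the training target, here is a minimal Python sketch of the subtraction described above. It is not ænet code: the function name `training_target` and all numerical values are placeholders made up for this example, not results from QE, VASP, or the paper.

```python
from collections import Counter


def training_target(total_energy, species, atomic_energies):
    """Subtract the per-type reference energies from a structure's total energy.

    total_energy    -- total energy of the structure (any consistent unit)
    species         -- list of element symbols, one entry per atom
    atomic_energies -- dict mapping element symbol to its reference energy
    """
    counts = Counter(species)
    reference = sum(n * atomic_energies[element] for element, n in counts.items())
    return total_energy - reference


# Hypothetical TiO2 formula unit; all energies are placeholder numbers.
species = ["Ti", "O", "O"]
total_energy = -30.0

free_atom_energies = {"Ti": -10.0, "O": -5.0}   # isolated atoms in vacuum
elemental_energies = {"Ti": -12.0, "O": -6.0}   # reference states, per atom

# Free-atom references give a cohesive-energy-like target.
cohesive = training_target(total_energy, species, free_atom_energies)

# Elemental reference states give a formation-energy-like target.
formation = training_target(total_energy, species, elemental_energies)

print("target with free-atom references:", cohesive)
print("target with elemental references:", formation)
```

The same total energy yields two different training targets. Either choice works as long as the same 'atomic energies' are used consistently for every structure in the data set.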
Q3. And if I want to obtain energies for different elements (like Mn, Ni, etc.), which database reference do you suggest?
Answer: When constructing an ANN potential, it is important that all energies in the reference data set are compatible. Please use first-principles (e.g., DFT) datasets obtained from calculations with the same approach, the same software package, and using the same input parameters (e.g., basis sets, accuracies, etc.). I would recommend using a database for a method to which you have access, so that you can extend the database by yourself for your own applications.
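One way to guard against mixing incompatible data is to store the calculation settings with every entry and compare them before training. The sketch below assumes a hypothetical record layout (dictionaries with the keys `code`, `functional`, `pseudopotential`, and `cutoff`); neither ænet nor any database prescribes these field names, and the entries are made up for illustration.

```python
def inconsistent_entries(records, keys=("code", "functional", "pseudopotential", "cutoff")):
    """Return the indices of records whose settings differ from the first record."""
    if not records:
        return []
    reference = tuple(records[0].get(key) for key in keys)
    return [
        index
        for index, record in enumerate(records)
        if tuple(record.get(key) for key in keys) != reference
    ]


# Hypothetical metadata for three reference calculations.
records = [
    {"code": "QE", "functional": "PBE", "pseudopotential": "ultrasoft", "cutoff": 40},
    {"code": "QE", "functional": "PBE", "pseudopotential": "ultrasoft", "cutoff": 40},
    {"code": "VASP", "functional": "PBE", "pseudopotential": "PAW", "cutoff": 520},
]

mismatched = inconsistent_entries(records)
print("entries that do not match the first record:", mismatched)
```

Entries flagged this way should be recomputed with the common settings rather than combined with the rest of the data set.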