What the first paper actually shows is that, among the 5 European languages studied, those where the verb comes after the object also place the auxiliary verb after the verb. The authors then generalize this to the entire parse tree in all languages.
That generalization does not hold. Spanish adverbs modifying adjectives violate the final-over-final condition. For example, "rio grande" literally means "river big" (head final), but "very big river" is "rio muy grande" (head initial), not "rio grande muy".
The prior for natural language is the limited capacity of short-term memory, about 7 words. This limits the complexity and depth of sentences to those where a head and its modifier both fit into memory, with low-frequency words persisting longer.
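As a rough illustration of the memory constraint, here is a minimal sketch that treats short-term memory as a fixed window of about 7 words and checks whether each head and its modifier fall inside it. The names `MEMORY_SPAN`, `dependency_span`, and `fits_in_memory` are hypothetical, the fixed window is a simplifying assumption, and the sketch does not model low-frequency words persisting longer.

```python
# Hypothetical sketch: model short-term memory as a fixed window of words.
MEMORY_SPAN = 7  # approximate capacity in words (an assumption, not a measured value)


def dependency_span(head_index, modifier_index):
    """Number of words held at once, counting both the head and the modifier."""
    return abs(head_index - modifier_index) + 1


def fits_in_memory(dependencies, capacity=MEMORY_SPAN):
    """True if every (head, modifier) index pair fits inside the window."""
    return all(dependency_span(h, m) <= capacity for h, m in dependencies)


words = ["rio", "muy", "grande"]
# (head, modifier) word positions: "grande" modifies "rio", "muy" modifies "grande".
deps = [(0, 2), (2, 1)]
print(fits_in_memory(deps))
```

Under this simplification, a sentence is acceptable only if no head is separated from its modifier by more words than the window holds.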