It looks like the old demoroniser Perl script might be the way to go:
https://www.fourmilab.ch/webtools/demoroniser/
A funny clip from the website:
//--clip
A little detective work revealed that, as is usually the case when you
encounter something shoddy in the vicinity of a computer, Microsoft
incompetence and gratuitous incompatibility were to blame. Western
language HTML documents are written in the ISO 8859-1 Latin-1 character
set, with a specified set of escapes for special characters. Blithely
ignoring this prescription, as usual, Microsoft use their own
"extension" to Latin-1, in which a variety of characters which do not
appear in Latin-1 are inserted in the range 0x82 through 0x95--this
having the merit of being incompatible with both Latin-1 and Unicode,
which reserve this region for additional control characters.
These characters include open and close single and double quotes, em
and en dashes, an ellipsis and a variety of other things you've been
dying for, such as a capital Y umlaut and a florin symbol. Well, okay,
you say, if Microsoft want to have their own little incompatible
character set, why not? Because it doesn't stop there--in their
inimitable fashion (who would want to?)--they aggressively pollute the
Web pages of unknowing and innocent victims worldwide with these
characters, with the result that the owners of these pages look like
semi-literate morons when their pages are viewed on non-Microsoft
platforms (or on Microsoft platforms, for that matter, if the user has
selected as the browser's font one of the many TrueType fonts which do
not include the incompatible Microsoft characters).
You see, "state of the art" Microsoft Office applications sport a nifty
feature called "smart quotes." (Rule of thumb--every time Microsoft use
the word "smart," be on the lookout for something dumb).
//--clip
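
For anyone who just wants the gist of what such a cleanup does, here is a rough Python sketch of the idea. It is not the demoroniser's actual logic: the function name `demoronise`, the table `CP1252_TO_ASCII`, and the ASCII stand-ins are my own guesses for illustration. It assumes the input is Latin-1 text with stray Windows-1252 bytes mixed in. The byte values are the standard Windows-1252 assignments, and I included the dashes at 0x96 and 0x97 even though they sit just past the range mentioned in the clip.

```python
# Hypothetical table: Windows-1252 "smart" bytes mapped to plain ASCII
# stand-ins. The replacements are illustrative choices, not the ones the
# real demoroniser script uses.
CP1252_TO_ASCII = {
    0x82: ",",    # low single quote
    0x83: "f",    # florin sign
    0x84: ",,",   # low double quote
    0x85: "...",  # ellipsis
    0x8B: "<",    # single left angle quote
    0x91: "'",    # open single quote
    0x92: "'",    # close single quote
    0x93: '"',    # open double quote
    0x94: '"',    # close double quote
    0x95: "*",    # bullet
    0x96: "-",    # en dash
    0x97: "--",   # em dash
    0x9B: ">",    # single right angle quote
}


def demoronise(data: bytes) -> str:
    """Replace the mapped Windows-1252 bytes; decode the rest as Latin-1."""
    out = []
    for b in data:
        if b in CP1252_TO_ASCII:
            out.append(CP1252_TO_ASCII[b])
        else:
            # Latin-1 maps every byte value 0-255 to a code point,
            # so this decode never fails.
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)


if __name__ == "__main__":
    sample = b"\x93smart quotes\x94 \x97 done\x85"
    print(demoronise(sample))
```

Working on raw bytes rather than decoded text sidesteps the question of which encoding the file claims to be in, which is usually the unreliable part.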