I have some fairly large XML files (the largest is around 30 MB, and there are as many as 40 per directory).
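They are declared as UTF-8, but unfortunately contain some byte sequences that are not valid UTF-8, which should be replaced with the correct UTF-8 encoding of the intended character or with a character reference.
(For example, an em-dash should become the numeric reference &#8212; or a literal — character. Note that a named entity such as &mdash; is only valid in XML if the DTD declares it.)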
Does anybody have experience with treating giant XML files as streams, operating on them (ideally in the manner described), and writing the corrected version of the file to disk?
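
To make the goal concrete, here is a rough sketch in Python of the kind of streaming pass described above. It rests on an assumption that may not hold for these files: that the invalid bytes are Windows-1252 characters (a common source of stray em-dashes and curly quotes). The names repair_utf8 and cp1252_fallback are made up for illustration. The file is treated as a byte stream rather than parsed as XML, so only one chunk is held in memory at a time.

```python
import codecs


def _cp1252_fallback(exc):
    """Error handler: reinterpret invalid UTF-8 bytes as Windows-1252.

    ASSUMPTION: the stray bytes really are Windows-1252. Bytes that are
    undefined in that code page come out as U+FFFD.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return bad.decode("cp1252", errors="replace"), exc.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, repairing invalid UTF-8 chunk by chunk."""
    # The incremental decoder buffers a multi-byte sequence that is split
    # across two chunks, so valid characters are never misread as errors.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="cp1252_fallback")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            # An empty chunk means end of file; final=True flushes the buffer.
            text = decoder.decode(chunk, final=not chunk)
            dst.write(text.encode("utf-8"))
            if not chunk:
                break
```

Calling repair_utf8("in.xml", "out.xml") for each file in a directory would write corrected copies. Writing literal UTF-8 characters rather than numeric references keeps the pass independent of XML structure; emitting references instead would need care not to touch bytes inside markup.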
Thanks!