"xa0" problem in texts


Kadir Kaderoglu

Dec 7, 2023, 8:28:36 AM
to AntWordProfiler-Discussion
I have a data set from a specific television series.
The transcripts are in SubRip (.srt) text format.
First, I open them in a word processor.
Next, I clean up the data: I check the words underlined in red by the word processor, get rid of the timestamps, and save each file in plain text (.txt) format. Then I run AntWordProfiler on the files. The problem is that the results contain words that apparently start with "xa0" (please see the attached file), and some other words that start with "xe" and "xf". However, when I look at the original transcripts, I cannot find any words that start with "xa0", "xe", or "xf".
What is the cause of this problem?
How can I eliminate it? 
It would be fantastic if you could help me out with this.
Captura de pantalla 2023-12-04 182249.png
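The cleanup steps described above (dropping the timestamps and saving each transcript as plain text) could also be scripted. The following is a minimal Python sketch, assuming standard SubRip layout (a cue number, a timing line such as `00:01:02,345 --> 00:01:04,567`, then the subtitle text) and UTF-8 input. The function names `srt_to_text` and `convert_file` are hypothetical, and this is not what Word or AntFileConverter does internally. It also swaps non-breaking spaces (U+00A0) for ordinary spaces while it works.

```python
import re
from pathlib import Path

# Matches SubRip timing lines such as "00:01:02,345 --> 00:01:04,567"
# (a "." before the milliseconds is also accepted).
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop blank lines, cue numbers and timing lines; keep subtitle text."""
    kept = []
    for line in srt.splitlines():
        stripped = line.strip()
        # Assumption: a line made only of digits is a cue number.
        if not stripped or stripped.isdigit() or TIMESTAMP.match(stripped):
            continue
        # Replace non-breaking spaces (U+00A0) with ordinary spaces.
        kept.append(stripped.replace("\u00a0", " "))
    return "\n".join(kept) + "\n"


def convert_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: convert one .srt file to a UTF-8 .txt file."""
    # "utf-8-sig" also removes a byte-order mark if the .srt file has one.
    text = src.read_text(encoding="utf-8-sig")
    dst.write_text(srt_to_text(text), encoding="utf-8")
```

For example, `convert_file(Path("episode01.srt"), Path("episode01.txt"))` (hypothetical file names) would write only the dialogue lines. One caveat of this simple design: a subtitle line consisting only of a number (e.g. someone saying "42") would be dropped along with the cue numbers.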

daxmt...@gmail.com

Dec 7, 2023, 7:03:52 PM
to AntWordProfiler-Discussion
Hi Kadir,

I think "xa0" is the code for a non-breaking space (perhaps an artefact left over from when the .srt files were created). If you do a search and replace in Word (searching for ^s and replacing it with a regular space), that might fix it. I imagine "xe" and "xf" are codes for other special characters, but offhand I'm not sure what those are. Better yet, I wonder whether running your Word document through AntFileConverter might get rid of those special characters and save you a manual search and replace.
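To find out which characters "xe" and "xf" stand for, one option is to list every non-ASCII character in the cleaned .txt files along with its code point and Unicode name. This rests on an assumption: the tokens look like hexadecimal escapes (`\xa0` is U+00A0 NO-BREAK SPACE, and `\xe0` to `\xff` are U+00E0 to U+00FF, which include accented letters such as é and ñ), but that has not been confirmed from AntWordProfiler itself. The function name `non_ascii_report` is hypothetical.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> list[tuple[str, str, str, int]]:
    """Return (character, code point, Unicode name, count) for each
    non-ASCII character in text, most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    report = []
    for ch, n in counts.most_common():
        # unicodedata.name() raises ValueError without a default,
        # so fall back to a placeholder for unnamed code points.
        name = unicodedata.name(ch, "<unnamed>")
        report.append((ch, f"U+{ord(ch):04X}", name, n))
    return report
```

For a file such as `episode01.txt` (a hypothetical name), `non_ascii_report(Path("episode01.txt").read_text(encoding="utf-8"))` (with `from pathlib import Path`) would show, for instance, whether the "xe…" words are really words containing "é" (U+00E9).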

Don't know if this helps or not.

Take care,
Dax

Kadir Kaderoglu

Dec 11, 2023, 4:30:00 AM
to AntWordProfiler-Discussion
Hi,

I followed the steps you recommended, and all the "xa0" characters have disappeared.

Thank you so much! 

Best,
Kadir