character encoding for phonetic transcription

Christoph Ruehlemann

Nov 22, 2016, 3:55:19 AM
to corplin...@googlegroups.com
Hi all,

Does anybody know how to read in a file that contains phonetic transcriptions in such a way that the transcription symbols are kept?

Thanks a lot
Chris

Matías Guzmán Naranjo

Nov 22, 2016, 4:04:25 AM
to corplin...@googlegroups.com
Dear Chris, what kind of file is it? R (under Linux) should be able to read any UTF-8 file without problems.

--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to corpling-with-r+unsubscribe@googlegroups.com.
To post to this group, send email to corpling-with-r@googlegroups.com.
Visit this group at https://groups.google.com/group/corpling-with-r.
For more options, visit https://groups.google.com/d/optout.
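A minimal sketch of Matías's point in base R: it writes a few IPA strings (taken from the transcription posted later in this thread) to a temporary UTF-8 file and reads them back. The characters are written as `\u` escapes so the script itself stays ASCII. `readLines(..., encoding = "UTF-8")` only marks the strings as UTF-8 and does not re-encode them, so the bytes survive whatever the session locale is; whether they also *display* correctly is a separate question (locale and font).

```r
# Three IPA strings from the thread: "ðeɪ", "θɪŋ", "həʊp"
ipa <- enc2utf8(c("\u00f0e\u026a", "\u03b8\u026a\u014b", "h\u0259\u028ap"))

# Write the UTF-8 bytes unchanged (useBytes = TRUE skips re-encoding)
path <- tempfile(fileext = ".txt")
writeLines(ipa, path, useBytes = TRUE)

# Read back, declaring that the file's bytes are UTF-8
lines <- readLines(path, encoding = "UTF-8")

print(Encoding(lines))  # non-ASCII strings are marked "UTF-8"
print(nchar(lines))     # counts characters, not bytes
```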

Hardie, Andrew

Nov 22, 2016, 4:10:47 AM
to corplin...@googlegroups.com

An important factor here is how old the files are. Back in the days when Unicode support was weak, phonetic transcription was often done in Word (or the like) using customised 8-bit fonts, such as Times New Roman Phonetic or the pre-Unicode fonts from SIL. These fonts used capitals, punctuation, and the top half of the character space for the non-ASCII phonetic characters, so the text became character soup when viewed without the font, e.g. when saved as plain text.

best
Andrew.
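Following Andrew's point, a quick first diagnostic is to test whether the file is UTF-8 at all. A minimal sketch with base R's `validUTF8()` (available since R 3.3.0): a line containing bytes from the upper half of an 8-bit code page fails the test. The byte values below are invented for illustration. Note the limits: text typed in an old custom font that used only capitals and punctuation is plain ASCII, so it passes the test and still shows the wrong symbols, because the mapping to phonetic characters lived in the font and no encoding setting can restore it.

```r
# Hypothetical legacy line: ASCII "k", one upper-half byte (0xE6), ASCII "t",
# as an old custom 8-bit phonetic font might have stored it
legacy <- rawToChar(as.raw(c(0x6B, 0xE6, 0x74)))

# The same word stored as genuine Unicode IPA ("kæt"), converted to UTF-8
unicode <- enc2utf8("k\u00e6t")

# validUTF8() checks the raw bytes of each string
check <- validUTF8(c(legacy, unicode))
print(check)  # FALSE for the legacy line, TRUE for the Unicode one
```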

Christoph Ruehlemann

Nov 22, 2016, 4:33:34 AM
to corplin...@googlegroups.com
This is the output from an online conversion tool http://lingorado.com/ipa/


əʊ ænd ˈɛvrɪθɪŋ kʌmz æt wʌns dʌz nt ɪt

erm hiː wɒnts tuː kʌm əˈlɒŋ ænd siː juː

ɒv kɔːs hiː l biː ˈkʌmɪŋ ʌp təˈmɒrəʊ fɔː

nəʊ əʊ wɛl lɛts həʊp hiː l gɛt ˈbɛtə

ðeɪ wɒnt tuː nəʊ wɒt ˈspəʊkən ˈɪŋglɪʃ ɪz laɪk

juː riː nɒt wɛlʃ ˈspiːkɪŋ æt ɔːl ɑː juː

Matías Guzmán Naranjo

Nov 22, 2016, 4:36:49 AM
to corplin...@googlegroups.com
This works just fine with the standard reading functions (again, under Linux; no idea about Windows).

Christoph Ruehlemann

Nov 22, 2016, 4:42:32 AM
to corplin...@googlegroups.com
I'm using read.table with the arguments header=T, quote="", sep="\t", fill=T, but it doesn't work: the transcription symbols are not rendered correctly.

Matías Guzmán Naranjo

Nov 22, 2016, 4:46:18 AM
to corplin...@googlegroups.com
Try adding the argument
fileEncoding = "UTF-8"
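A sketch of this suggestion combined with the arguments Chris listed (header, quote, sep, fill) in a `read.table()` call, on a small made-up tab-separated file. One caveat, going by the R documentation: `fileEncoding` re-encodes the input into the session's native encoding, so the IPA symbols only survive if the session itself runs in a UTF-8 locale. In other locales, `encoding = "UTF-8"`, which merely marks the strings without converting them, is the safer choice. The helper `read_ipa_table()` is hypothetical and simply picks between the two.

```r
# Hypothetical helper: read a tab-separated UTF-8 file with the thread's options
read_ipa_table <- function(path) {
  utf8_session <- isTRUE(l10n_info()[["UTF-8"]])
  if (utf8_session) {
    # Re-encode from UTF-8 into the native encoding (also UTF-8 here)
    read.table(path, header = TRUE, quote = "", sep = "\t", fill = TRUE,
               fileEncoding = "UTF-8", stringsAsFactors = FALSE)
  } else {
    # Keep the bytes as they are and only mark the strings as UTF-8
    read.table(path, header = TRUE, quote = "", sep = "\t", fill = TRUE,
               encoding = "UTF-8", stringsAsFactors = FALSE)
  }
}

# Made-up two-row file: "ðeɪ wɒnt" and "həʊp"
path <- tempfile(fileext = ".tsv")
rows <- enc2utf8(c("id\tipa",
                   "1\t\u00f0e\u026a w\u0252nt",
                   "2\th\u0259\u028ap"))
writeLines(rows, path, useBytes = TRUE)

d <- read_ipa_table(path)
print(d$ipa)
```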

Hardie, Andrew

Nov 22, 2016, 4:54:05 AM
to corplin...@googlegroups.com

Alternative explanation: perhaps the font your terminal uses does not contain glyphs for the IPA block of Unicode?
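One way to tell a font problem from an encoding problem, sketched in base R: look at the code points instead of the rendered glyphs. If `utf8ToInt()` returns the expected values, the data were read correctly and only the display font lacks the glyphs. IPA letters mostly sit in the IPA Extensions block (U+0250 to U+02AF), while stress and length marks such as ˈ and ː are in Spacing Modifier Letters (U+02B0 to U+02FF), so a font may cover one block but not the other. The sample word comes from the thread; in practice you would take a value from the data you read in.

```r
# Sample word from the thread: "wɛlʃ"
word <- enc2utf8("w\u025bl\u0283")

# Code points of each character, independent of how the console draws them
cp <- utf8ToInt(word)
codes <- sprintf("U+%04X", cp)
print(codes)

# Flag characters in the IPA Extensions block (U+0250..U+02AF)
in_ipa_block <- cp >= 0x0250 & cp <= 0x02AF
print(in_ipa_block)
```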

Christoph Ruehlemann

Nov 22, 2016, 8:03:32 AM
to corplin...@googlegroups.com
Solution:

Sys.getlocale() # check whether the session is using a UTF-8 locale

# If not UTF-8:

system("locale -a") # list installed locales (Linux/macOS) and pick a UTF-8 one, in my case "de_DE.UTF-8"

Sys.setlocale("LC_ALL", "de_DE.UTF-8") # use the locale name found above

Thanks for everybody's help!
Chris
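A slightly more defensive version of the solution above, as a sketch: it changes the locale only when the session is not already UTF-8, and chooses a UTF-8 locale automatically from the output of `locale -a`. `pick_utf8_locale()` is a hypothetical helper, and its preference order (de_DE, then en_US, then C) is an assumption. `locale -a` exists on Linux and macOS but not on Windows, so the call is wrapped in `tryCatch()`. Changing the locale does not repair strings already in memory, so the file should be read in again afterwards.

```r
# Hypothetical helper: choose a UTF-8 locale from a vector of locale names
pick_utf8_locale <- function(locales, preferred = c("de_DE", "en_US", "C")) {
  # Keep names whose codeset is UTF-8, spelled ".UTF-8" (macOS) or ".utf8" (glibc)
  utf8 <- locales[grepl("\\.utf-?8$", locales, ignore.case = TRUE)]
  if (length(utf8) == 0) return(NA_character_)
  # Language/territory part before the codeset, e.g. "de_DE"
  base <- sub("\\.[^.]*$", "", utf8)
  hit <- match(preferred, base)
  hit <- hit[!is.na(hit)]
  # First preferred locale that is installed, otherwise any UTF-8 locale
  if (length(hit) > 0) utf8[hit[1]] else utf8[1]
}

if (!isTRUE(l10n_info()[["UTF-8"]])) {
  # "locale -a" lists installed locales on Linux/macOS; treat failure as "none"
  available <- tryCatch(system("locale -a", intern = TRUE),
                        error = function(e) character(0),
                        warning = function(w) character(0))
  loc <- pick_utf8_locale(available)
  if (!is.na(loc)) Sys.setlocale("LC_ALL", loc)
}
Sys.getlocale()
```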