Using Python for a demonstration in historical linguistics

Dax Bloom

Nov 5, 2010, 10:17:21 PM

Hello,

In the framework of a project on evolutionary linguistics I wish to
have a program to process words and simulate the effect of sound
shift, for instance following the Rask's-Grimm's rule. I look to have
python take a dictionary file or a string input and replace the
consonants in it with the Grimm rule equivalent. For example:
bʰ → b → p → f
dʰ → d → t → θ
gʰ → g → k → x
gʷʰ → gʷ → kʷ → xʷ
If the dictionary file has the word "Abe" I want the program to
replace the letter b with f forming the word "Afe" and write the
result in a tabular file. How easy is it to find the python functions
to do that?

Best regards,

Dax Bloom

Chris Rebert

Nov 5, 2010, 10:36:49 PM

to Dax Bloom, pytho...@python.org

Tabular files:
http://docs.python.org/library/csv.html

Character substitution:
(a) http://docs.python.org/library/string.html#string.maketrans and
http://docs.python.org/library/stdtypes.html#str.translate
(b) http://docs.python.org/library/stdtypes.html#str.replace
In either case, learn about dicts:
http://docs.python.org/library/stdtypes.html#dict

Cheers,
Chris
--
http://blog.rebertia.com

MRAB

Nov 5, 2010, 10:37:06 PM

to pytho...@python.org

Very. :-)

I'd build a dict of each rule:

bʰ → b
b → p

etc, and then use the re module to perform the replacements in one
pass, looking up the new sound for each match.
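
A minimal sketch of this idea, written for Python 3 and covering only a handful of the rules (the names GRIMM_RULES and shift are made up for this example). Longer keys are tried first so that bʰ is matched before b, and every match is looked up exactly once, so a sound is never shifted twice:

```python
import re

# Illustrative subset of the rules; each sound maps to its immediate successor.
GRIMM_RULES = {
    "bʰ": "b", "dʰ": "d", "gʰ": "g",
    "b": "p", "d": "t", "g": "k",
    "p": "f", "t": "θ", "k": "x",
}

# Try longer keys first so that "bʰ" wins over plain "b".
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(GRIMM_RULES, key=len, reverse=True))
)

def shift(text):
    """Apply every rule once, in a single pass over the text."""
    return _pattern.sub(lambda m: GRIMM_RULES[m.group()], text)

print(shift("bʰrāter"))  # brāθer
```

Note that with one-step rules like these, "Abe" comes out as "Ape"; to get "Afe" the dict would have to map b straight to f.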

Peter Otten

Nov 6, 2010, 6:09:34 AM

Dax Bloom wrote:

>>> s = """
... In the framework of a project on evolutionary linguistics I wish to
... have a program to process words and simulate the effect of sound
... shift, for instance following the Rask's-Grimm's rule. I look to have
... python take a dictionary file or a string input and replace the
... consonants in it with the Grimm rule equivalent. For example:
... """
>>> rules = ["bpf", ("d", "t", "th"), "gkx"]
>>> for rule in rules:
...     rule = rule[::-1] # go back in time
...     for i in range(len(rule)-1):
...         s = s.replace(rule[i], rule[i+1])
...
>>> print s

In de brameworg ob a brojecd on evoludionary linguisdics I wish do
have a brogram do brocess words and simulade de ebbecd ob sound
shibd, bor insdance bollowing de Rasg's-Grimm's rule. I loog do have
bydon dage a dicdionary bile or a sdring inbud and reblace de
consonands in id wid de Grimm rule equivalend. For egamble:

;)

If you are using nonascii characters like θ you should use unicode instead
of str. Basically this means writing string constants as u"..." instead of
"..." and opening your files with

import codecs
f = codecs.open(filename, encoding="utf-8")

instead of

f = open(filename)
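
For illustration, a small sketch that writes a UTF-8 word list and reads it back. It uses io.open, which accepts an encoding argument on Python 2.6+ as well as on Python 3 (where the built-in open is the same function). The file name and the sample words are made up for the example:

```python
import io
import os
import tempfile

words = [u"Abe", u"Afe", u"θ"]  # hypothetical sample data

# Throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

# Write the words one per line, encoded as UTF-8.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\n".join(words))

# Read them back; the text is decoded to unicode on the way in.
with io.open(path, encoding="utf-8") as f:
    loaded = f.read().splitlines()

print(loaded == words)  # True
```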

Peter

Steven D'Aprano

Nov 6, 2010, 6:33:28 AM

On Sat, 06 Nov 2010 11:09:34 +0100, Peter Otten wrote:

> If you are using nonascii characters like θ you should use unicode
> instead of str. Basically this means writing string constants as u"..."
> instead of "..."

Or using Python 3.1 instead of 2.x.

--
Steven

garabik-ne...@kassiopeia.juls.savba.sk

Nov 6, 2010, 6:49:28 AM

Dax Bloom <bloo...@gmail.com> wrote:
...

> I look to have
> python take a dictionary file or a string input and replace the
> consonants in it with the Grimm rule equivalent.

...

> How easy is it to find the python functions
> to do that?
>

http://code.activestate.com/recipes/81330-single-pass-multiple-replace/

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Peter Otten

Nov 6, 2010, 7:18:39 AM

Peter Otten wrote:

> >>> s = """
> ... In the framework of a project on evolutionary linguistics I wish to
> ... have a program to process words and simulate the effect of sound
> ... shift, for instance following the Rask's-Grimm's rule. I look to have
> ... python take a dictionary file or a string input and replace the
> ... consonants in it with the Grimm rule equivalent. For example:
> ... """
> >>> rules = ["bpf", ("d", "t", "th"), "gkx"]
> >>> for rule in rules:
> ...     rule = rule[::-1] # go back in time
> ...     for i in range(len(rule)-1):
> ...         s = s.replace(rule[i], rule[i+1])
> ...

Warning: this simple-minded approach somewhat limits the possible rules.
For example, it fails for

a --> b
b --> a

>>> "abba".replace("a", "b").replace("b", "a")
'aaaa'

while unicode.translate() can deal with it:

>>> u"abba".translate({ord(u"a"): u"b", ord(u"b"): u"a"})
u'baab'

Or, if you are using Python 3.x as Steven suggested:

>>> "abba".translate({ord("a"): "b", ord("b"): "a"})
'baab'
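
Applied to the original question, the same idea might look like the sketch below (Python 3). The table covers only single-character sounds, because translate() works on one code point at a time; a two-character sound such as bʰ would need the re module suggested earlier in the thread. Because all replacements happen at once, b becomes p but is not carried on to f:

```python
# Each sound moves one step along its chain, all at the same time.
table = str.maketrans({"b": "p", "p": "f",
                       "d": "t", "t": "θ",
                       "g": "k", "k": "x"})

print("Abe".translate(table))    # Ape
print("pater".translate(table))  # faθer
```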

Peter

Vlastimil Brom

Nov 6, 2010, 7:41:09 AM

to Dax Bloom, pytho...@python.org

2010/11/6 Dax Bloom <bloo...@gmail.com>:


Hi,
I guess the most difficult part would be selecting appropriate
words to apply the simple rules to (in order not to get "problems"
with Verner's Law or other special rules).
You also normally wouldn't want to chain the changes as above,
but rather keep them separate:
bʰ → b; p → f (i.e. *bʰrāter- > ... brother and not *p-... (at least
without the High German consonant shift)).
Of course, there are also vowel changes to be dealt with and many more
peculiarities ...

As for the implementation, I guess the simplest way might be to use
regular expression replacements - re.sub(...) with a replacement function
that looks up the appropriate results in a dictionary.
Maybe something along these lines:

########################################

# -*- coding: utf-8 -*-
import re

Rask_Grimm_re = ur"[bdgptk]ʰ?"
Rask_Grimm_dct = {u"b":u"p", u"bʰ": u"b", u"t": u"þ", } # ...

def repl_fn(m):
    # look up the matched sound; leave it unchanged if no rule is defined
    return Rask_Grimm_dct.get(m.group(), m.group())

ie_txt = u" bʰrāter ... "
almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD

########################################

bʰrāter ... >> brāþer ...

hth,
vbr

Dax Bloom

Nov 26, 2010, 11:54:06 PM

On Nov 6, 6:41 am, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
> 2010/11/6 Dax Bloom <bloom....@gmail.com>:

Hello,

Thanks to every one of you for the prompt responses. Resuming the thread
of November 5 on evolutionary linguistics: is there a way to refer to
a sub-category of text, such as vowels or consonants? If not, is there a
way to optimize the code by creating these sub-categories?
I would need to arrange the substitution rules into groups, because there
might be many more than the ones I mentioned in the Rask-Grimm example.
I would like each substitution to produce a new entry, rather than all
substitutions resulting in a single entry. I want to work in two steps
(or ‘passes’), applying the rules of group 2 to the results of the
rules of group 1.

I understand that a dynamic analysis system with adjustable rules could
be particularly useful for the study of phonology. In this branch of
linguistics, parts of a word such as the nucleus or the codas are
tagged with abbreviatory notations that explain ‘phonological
processes’ with schemas. Historical mutations of language such as
metathesis, prothesis, anaptyxis or fusional assimilation could be
included among the substitution rules that we discussed. Applying
phonological processes might require replacing certain letters with
Greek notation. What function could tag syllables, the word nucleus and
the codas? How easy is it to bridge this with a more visual environment
where schematic analysis can be displayed with highlights and notations,
as in the phonology textbooks?

To outline the goals of the program:
1) Arranging the substitution rules into groups
2) Applying substitutions to string input following a “multiple pass,
multiple replace” logic
3) Returning a string for each substitution
4) Making the program environment visual

When quoting parts of code, could you please specify where to insert them
in the code and what the variables mean?

Best wishes,

Dax Bloom

Nov 27, 2010, 1:58:04 AM

On Nov 6, 6:18 am, Peter Otten <__pete...@web.de> wrote:
> Peter Otten wrote:

Hi Peter,

I read your interesting replies 20 days ago and, after several exams
and a university semester, I would like to address your answers to my
post more fully. However, could you please clarify some of the code
inputs that you suggested and in what order to insert them in the
script?

> > >>> s = """
> > ... In the framework of a project on evolutionary linguistics I wish to
> > ... have a program to process words and simulate the effect of sound
> > ... shift, for instance following the Rask's-Grimm's rule. I look to have
> > ... python take a dictionary file or a string input and replace the
> > ... consonants in it with the Grimm rule equivalent. For example:
> > ... """
> > >>> rules = ["bpf", ("d", "t", "th"), "gkx"]
> > >>> for rule in rules:
> > ...     rule = rule[::-1] # go back in time
> > ...     for i in range(len(rule)-1):
> > ...         s = s.replace(rule[i], rule[i+1])
> > ...

Best regards,

Dax Bloom

Vlastimil Brom

Nov 27, 2010, 8:25:05 AM

to pytho...@python.org, Dax Bloom

2010/11/27 Dax Bloom <bloo...@gmail.com>:

> On Nov 6, 6:41 am, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
>> 2010/11/6 Dax Bloom <bloom....@gmail.com>:

>> ...

>> Rask_Grimm_re = ur"[bdgptk]ʰ?"
>> Rask_Grimm_dct = {u"b":u"p", u"bʰ": u"b", u"t": u"þ", } # ...
>>
>> def repl_fn(m):
>>     return Rask_Grimm_dct.get(m.group(), m.group())
>>
>> ie_txt = u" bʰrāter ... "
>> almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
>> print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD
>>
>> ########################################
>>
>> bʰrāter ... >> brāþer ...
>>
>> hth,
>> vbr

> ...
> Hello Vlastimil,
>
> Could you please explain what the variables %s and % mean and how to
> implement this part of the code in a working python program? I can't
> fully appreciate Peter's quote on rules
>
>
> Best regards,
>
> Dax Bloom
>
Hi, the part you mention is called string interpolation;
the last line

print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD

is equivalent to the simple string concatenation:
print ie_txt + u" >> " + almost_germ_txt
see:
http://docs.python.org/library/stdtypes.html#string-formatting-operations

The values of the tuple (or possibly a dict or another mapping) given
after the modulo operator % are inserted at the respective positions
(here %s) of the preceding string (or unicode);
some more advanced adjustments or conversions are also possible here,
but they aren't needed in this simple case.

(There is also another string formatting mechanism in the newer
versions of python
http://docs.python.org/library/string.html#formatstrings
which may be more suitable for more complex tasks.)
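
A short sketch of the three spellings side by side, written for Python 3 (where print is a function and the u prefix is optional); the values are simply the ones from the sample above:

```python
ie_txt = "bʰrāter"
almost_germ_txt = "brāþer"

# %-interpolation: each %s is filled from the tuple, in order.
by_interpolation = "%s >> %s" % (ie_txt, almost_germ_txt)

# Plain concatenation with +.
by_concatenation = ie_txt + " >> " + almost_germ_txt

# The newer str.format mechanism with numbered fields.
by_format = "{0} >> {1}".format(ie_txt, almost_germ_txt)

# All three build the same string.
print(by_interpolation)
print(by_interpolation == by_concatenation == by_format)  # True
```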

The implementation depends on the rest of your program and on the
input/output of the data you wish to have (to be able to print
output with rather non-trivial characters, you will need a unicode
enabled console; Idle is a basic one available with python).
Otherwise the sample is self-contained and should be runnable as is;
you can add other needed items to Rask_Grimm_dct and all substrings
matching Rask_Grimm_re will be replaced in one pass.
You can also add a series of such replacements (an re pattern and a dict
of ie: germ pairs), of course only for context-free changes.
On the other hand, I have no simple idea how to deal with Verner's Law
and the like (even if you passed the accents in the PIE forms), well,
besides a lexicographic approach, where you would have to identify the
word stems to decide the changes to be applied.
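
One way such a series of replacements could be arranged is sketched below, in Python 3. Each group is a plain dict applied in a single pass, the result of one group is fed to the next, and every intermediate form is kept. The function names and the second rule group are invented purely for illustration and make no claim to linguistic completeness:

```python
import re

def apply_group(rules, text):
    """Apply one group of context-free rules in a single pass."""
    keys = sorted(rules, key=len, reverse=True)  # longest match first
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group()], text)

def apply_groups(groups, text):
    """Return the input followed by the result after each group."""
    stages = [text]
    for rules in groups:
        stages.append(apply_group(rules, stages[-1]))
    return stages

group_1 = {"bʰ": "b", "b": "p", "t": "þ"}  # consonants, as in the sample above
group_2 = {"ā": "ō"}                        # illustrative vowel rule

print(" > ".join(apply_groups([group_1, group_2], "bʰrāter")))
# bʰrāter > brāþer > brōþer
```

Keeping the stages in a list gives one string per pass, which could then be written out as a row of a tabular file.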

hth,
vbr