[R] Calculating sum of letter values

Rory.W...@rbs.com

Nov 24, 2008, 9:57:57 AM
to r-h...@r-project.org
Hi all

If I have a string, say "ABCDA", and I want to convert this to the sum of the letter values, e.g.

A -> 1
B -> 2

etc, so "ABCDA" = 1+2+3+4+1 = 11

Is there an elegant way to do this? Trying something like

which(LETTERS %in% unlist(strsplit("ABCDA", "")))
is not quite correct, as it does not count repeated characters. I guess what I need is some kind of lookup table?
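
A minimal sketch of why the `%in%` attempt drops repeated characters, and how `match()` acts as the lookup table asked about. The variable name `chars` is only illustrative:

```r
chars <- strsplit("ABCDA", "")[[1]]   # "A" "B" "C" "D" "A"

# %in% asks, for each element of LETTERS, whether it occurs in chars,
# so every letter is reported at most once.
which(LETTERS %in% chars)             # 1 2 3 4

# match() asks, for each element of chars, where it sits in LETTERS,
# so repeated letters are kept.
match(chars, LETTERS)                 # 1 2 3 4 1
sum(match(chars, LETTERS))            # 11
```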

Cheers
Rory

Rory Winston
RBS Global Banking & Markets
280 Bishopsgate, London, EC2M 4RB
Office: +44 20 7085 4476

***********************************************************************************
The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB.
Authorised and regulated by the Financial Services Authority

This e-mail message is confidential and for use by the=2...{{dropped:25}}

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Marc Schwartz

Nov 24, 2008, 10:08:48 AM
to Rory.W...@rbs.com, r-h...@r-project.org
on 11/24/2008 08:57 AM Rory.W...@rbs.com wrote:
> Hi all
>
> If I have a string, say "ABCDA", and I want to convert this to the sum of the letter values, e.g.
>
> A -> 1
> B -> 2
>
> etc, so "ABCDA" = 1+2+3+4+1 = 11
>
> Is there an elegant way to do this? Trying something like
>
> which(LETTERS %in% unlist(strsplit("ABCDA", "")))
> is not quite correct, as it does not count repeated characters. I guess what I need is some kind of lookup table?
>
> Cheers
> Rory


> sum(as.numeric(factor(unlist(strsplit("ABCDA", "")))))
[1] 11


Convert the letters to factors, after splitting the vector, which then
enables the use of the underlying numeric codes:

> as.numeric(factor(unlist(strsplit("ABCDA", ""))))
[1] 1 2 3 4 1

HTH,

Marc Schwartz

Rory.W...@rbs.com

Nov 24, 2008, 10:14:53 AM
to marc_s...@comcast.net, r-h...@r-project.org
Hi Marc

Thanks, that's almost exactly what I need...there's just a slight difference with my requirement, in that I am looking for the actual index value in the alphabetical sequence, so that instead of:

as.numeric(factor(unlist(strsplit("XYZ",""))))
[1] 1 2 3

I would expect to see

[1] 24 25 26

I have got it to work in a fairly non-elegant manner, using the following code:

sum ( unlist(lapply(strsplit("TESTING",""), function(x) match(x,LETTERS) )) )

And over a list of names, this becomes:

lapply(namelist, function(Z) { sum ( unlist(lapply(strsplit(Z,""), function(x) match(x,LETTERS) )) ) } )

But this is kind of ugly....

Rory Winston
RBS Global Banking & Markets


Gabor Grothendieck

Nov 24, 2008, 10:17:12 AM
to Rory.W...@rbs.com, r-h...@r-project.org
Here are a couple of solutions.

The first matches each character against LETTERS, returning the
position of the match in LETTERS. strsplit returns a list, of which
we want the first element; we then sum the matched positions.

The second applies function(x) match(x, LETTERS), which is
specified in formula notation, to each letter and combines the
results using sum.

sum(match(strsplit(s, "")[[1]], LETTERS))

library(gsubfn)
strapply(s, ".", ~ match(x, LETTERS), simplify = sum)
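
The thread does not define `s`, so the following is only a sketch that assumes `s` is a single upper-case string and walks through the first (base R) solution step by step:

```r
s <- "TESTING"                 # assumed example input; s is not defined in the thread

parts <- strsplit(s, "")       # a list with one character vector per input string
chars <- parts[[1]]            # "T" "E" "S" "T" "I" "N" "G"
pos   <- match(chars, LETTERS) # 20 5 19 20 9 14 7
sum(pos)                       # 94
```

The `[[1]]` step matters because `strsplit` always returns a list, even for a single input string.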

Jagat....@wellsfargo.com

Nov 24, 2008, 10:22:24 AM
to Rory.W...@rbs.com, marc_s...@comcast.net, r-h...@r-project.org
You can use Marc's code by giving levels to the factor, e.g.

as.numeric(factor(unlist(strsplit("ABCDAXYZ", "")), levels=LETTERS))


Richard...@hsl.gov.uk

Nov 24, 2008, 10:24:21 AM
to Rory.W...@rbs.com, marc_s...@comcast.net, r-h...@r-project.org, r-help-...@r-project.org
> Thanks, that's almost exactly what I need...there's just a slight
> difference with my requirement, in that I am looking for the actual
> index value in the alphabetical sequence, so that instead of:
>
> as.numeric(factor(unlist(strsplit("XYZ",""))))
> [1] 1 2 3
>
> I would expect to see
>
> [1] 24 25 26

A minor modification of Marc's solution works in this case:

as.numeric(factor(unlist(strsplit("XYZ", "")), levels=LETTERS))
# [1] 24 25 26

Regards,
Richie.

Mathematical Sciences Unit
HSL

------------------------------------------------------------------------
ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

Marc Schwartz

Nov 24, 2008, 10:23:59 AM
to Rory.W...@rbs.com, r-h...@r-project.org
Yep, my error...it should be:

> as.numeric(factor(unlist(strsplit("ABCDA", "")), levels = LETTERS))
[1] 1 2 3 4 1

> as.numeric(factor(unlist(strsplit("XYZ", "")), levels = LETTERS))
[1] 24 25 26

The step that I missed was setting the factor levels to the full set of
LETTERS.
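
A small sketch of why the levels matter. By default `factor()` takes its levels from the sorted unique values that are present, so the integer codes are relative to the input rather than to the alphabet. The variable `x` and the lower-case check are illustrative additions, not part of the thread:

```r
x <- c("E", "C", "X")

# Default levels: only the values present, in sorted order.
levels(factor(x))                          # "C" "E" "X"
as.integer(factor(x))                      # 2 1 3

# Explicit levels: the codes are positions in LETTERS.
as.integer(factor(x, levels = LETTERS))    # 5 3 24

# Anything that is not one of the levels (here a lower-case letter) becomes NA.
as.integer(factor("x", levels = LETTERS))  # NA
```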

HTH,

Marc


Stefan Evert

Nov 24, 2008, 10:27:49 AM
to R-help Mailing List

>
> Thanks, that's almost exactly what I need...there's just a slight
> difference with my requirement, in that I am looking for the actual
> index value in the alphabetical sequence, so that instead of:
>
> as.numeric(factor(unlist(strsplit("XYZ",""))))
> [1] 1 2 3
>
> I would expect to see
>
> [1] 24 25 26
>

How about this?

as.numeric(factor(unlist(strsplit("ECX", "")), levels=LETTERS))


Best regards,
Stefan Evert

[ stefan...@uos.de | http://purl.org/stefan.evert ]

Berwin A Turlach

Nov 24, 2008, 10:28:31 AM
to Rory.W...@rbs.com, r-h...@r-project.org
G'day Rory,

On Mon, 24 Nov 2008 14:57:57 +0000
<Rory.W...@rbs.com> wrote:

> If I have a string, say "ABCDA", and I want to convert this to the
> sum of the letter values, e.g.
>
> A -> 1
> B -> 2
>
> etc, so "ABCDA" = 1+2+3+4+1 = 11
>

> Is there an elegant way to do this? [...]

R> sum(as.numeric(factor(unlist(strsplit("ABCDA","")), levels=LETTERS)))
[1] 11
R> sum(as.numeric(factor(unlist(strsplit("ABCEA","")), levels=LETTERS)))
[1] 12

HTH.

Best wishes,

Berwin

=========================== Full address =============================
Berwin A Turlach Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability +65 6515 6650 (self)
Faculty of Science FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7 e-mail: sta...@nus.edu.sg
Singapore 117546 http://www.stat.nus.edu.sg/~statba

Rory.W...@rbs.com

Nov 24, 2008, 10:26:41 AM
to marc_s...@comcast.net, r-h...@r-project.org
Thanks a lot for the solutions everyone...really appreciated.

Cheers
Rory


William Dunlap

Nov 24, 2008, 2:06:38 PM
to Rory.W...@rbs.com, R help
Rory Winston wrote:
> I have got it to work in a fairly non-elegant manner, using the following code:
>
> sum ( unlist(lapply(strsplit("TESTING",""), function(x) match(x,LETTERS) )) )
>
> And over a list of names, this becomes:
>
> lapply(namelist, function(Z) { sum ( unlist(lapply(strsplit(Z,""), function(x) match(x,LETTERS) )) ) } )
>
> But this is kind of ugly....
>
> Rory Winston
> RBS Global Banking & Markets
> Office: +44 20 7085 4476

Do you mean that the nested lapply calls are kind of ugly? You don't
need them. I think the following does the same as what you wrote:

f1 <- function(namelist) lapply(strsplit(namelist, ""), function(x) sum(match(x, LETTERS)))

where your code as a function would be

f0 <- function(namelist) lapply(namelist, function(Z) { sum(unlist(lapply(strsplit(Z, ""), function(x) match(x, LETTERS)))) })

(Since f0() and f1() return lists of scalar integers, it might make more
sense to call unlist() on their outputs before returning them.)
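
One way to get a plain vector directly, sketched here with `vapply()` instead of `lapply()` followed by `unlist()`. The name `f1_vec` is hypothetical, and the sketch assumes `namelist` is a character vector of upper-case names:

```r
# Hypothetical variant of f1(): returns an integer vector, one sum per name.
f1_vec <- function(namelist) {
  vapply(strsplit(namelist, ""),
         function(x) sum(match(x, LETTERS)),
         integer(1))
}

f1_vec(c("ABCDA", "XYZ", "TESTING"))  # 11 75 94
```

`vapply()` also checks that every result is a single integer, which `unlist()` would not do.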

Another approach is to use a named vector of values, with the characters
as names, to map characters to values, such as in

f2 <- function(namelist) {
    values <- c(seq_along(LETTERS), seq_along(letters), 0L, 0L, 0L)
    names(values) <- c(LETTERS, letters, " ", "-", ".")
    lapply(strsplit(namelist, ""),
           function(characters, values) sum(values[characters]),
           values)
}

E.g.,
> f2(c("Mary Jean", "Maryjean", "Mary-Jean", "MARYJEAN"))
[[1]]
[1] 87

[[2]]
[1] 87

[[3]]
[1] 87

[[4]]
[1] 87

That approach lets you map several characters to the same value, and the
values are not restricted to the small positive integers
1:length(possibleCharacters).
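
As a rough illustration of that flexibility, here is a sketch with made-up values (the numbers and the name `score` are hypothetical, chosen only for the example). It also shows what happens when a character has no entry in the table:

```r
# Hypothetical lookup table: several letters share a value, and the
# values are not alphabet positions.
values <- c(A = 1, E = 1, I = 1, O = 1, U = 1, Q = 10, Z = 10)

score <- function(word, values) {
  chars <- strsplit(word, "")[[1]]
  sum(values[chars])       # NA if any character is missing from the table
}

score("ZOO", values)       # 12
score("ZOOM", values)      # NA, because "M" has no entry
```

Indexing a named vector by a name it does not contain yields NA, so the table needs an entry for every character that can occur, as f2() does for space, hyphen and period.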

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
