You can extract the desired number with some regular expression wizardry. You can use either STREGEX or IDL_String::Extract; both work with the same regex.
The regular expression I found that works is '.*([0-9][a-z][0-9]?)(\.\([0-9][A-Z]\*\))?$'. Let's break this down and explain what it is doing.
There are 4 components to the expression:
- .*
- ([0-9][a-z][0-9]?)
- (\.\([0-9][A-Z]\*\))?
- $
The first allows any number of characters to precede the rest. This handles the case of multiple pieces separated by dots in your strings.
The second will match any "number/letter" or "number/letter/number" string. Some of your examples were only 2 characters, number/letter, while others were 3 number/letter/number, so I made the third character optional with the ?. If you need upper case support too, then you can replace [a-z] with [a-zA-Z]. This entire subexpression is surrounded in parentheses, so that it can be extracted.
The third will match any ".(number/letter*)" string. I had to escape the dot, since that normally means any character. It looks like the * strings were always surrounded in parentheses, so I added them, escaping each with \ to make it mean just that parenthesis. I also had to escape the *, since that is normally a cardinality modifier meaning 0 or more occurrences of the previous character. In this case, I made the character uppercase only, but if you need both upper and lower case, then replace [A-Z] with [a-zA-Z]. This subexpression is also surrounded in parentheses, so that it can be made optional by the ?. It will also be extracted, but we don't need that string.
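As a quick check of why the escapes matter, here is a minimal sketch that runs just the starred piece through STREGEX with and without the backslashes. The sample string '(5P*)' comes from the examples in the question; the variable name s is only for illustration.

```idl
s = '(5P*)'

; Unescaped: the parentheses only group, and * means "zero or more
; uppercase letters", so the literal ( ) and * are never matched.
print, STREGEX(s, '([0-9][A-Z]*)', /EXTRACT)      ; expected: 5P

; Escaped: \( \) and \* are literal characters, so the whole piece matches.
print, STREGEX(s, '\([0-9][A-Z]\*\)', /EXTRACT)   ; expected: (5P*)
```

IDL string literals have no escape sequences of their own, so the backslashes reach the regex engine exactly as typed.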
The fourth is $, which anchors the match to the end of the string, so the string needs to end with either the second subexpression alone or the second followed by the third.
For strings that do not have a starred substring, the third subexpression will not be present, and we will extract only the second subexpression, which is what we want.
For strings that do have the starred substring, the third subexpression will be present, and we will extract both the second and third subexpressions; the second is still the one you want.
In both cases, STREGEX(/EXTRACT, /SUBEXPR) will return the full match first, and the parenthesized subexpression(s) after. The one you want is the first parenthesized subexpression (the second component in the list above), which is the second element returned by STREGEX. Once you have that subexpression, you can just pass it into FIX() to convert to an int. It will stop when it sees the first non-digit, which will yield the number preceding that letter.
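To make the return value concrete, here is a small sketch that prints all three elements for one string without the starred suffix and one with it. The pattern is the one explained above; the variable names and the bracketed print format are just my choices for readability.

```idl
pattern = '.*([0-9][a-z][0-9]?)(\.\([0-9][A-Z]\*\))?$'

; No starred suffix: the optional group matches nothing.
res = STREGEX('3d6.4s2', pattern, /EXTRACT, /SUBEXPR)
print, res, FORMAT='(3("[", A, "] "))'
; expected elements: res[0] = '3d6.4s2', res[1] = '4s2', res[2] = ''

; Starred suffix present: it lands in the last element and can be ignored.
res = STREGEX('3d6.(5D).4s.3p.(5P*)', pattern, /EXTRACT, /SUBEXPR)
print, res, FORMAT='(3("[", A, "] "))'
; expected elements: res[0] = whole string, res[1] = '3p', res[2] = '.(5P*)'
```

In both cases res[1] is the piece to hand to FIX().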
Here is the whole code (the commented-out line shows how to use IDL_String::Extract instead of STREGEX, if you prefer):
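function get_last_number_from_string, str
  compile_opt idl2
  ; res[0] is the full match, res[1] the number/letter piece, res[2] the optional starred piece.
  res = STREGEX(str, '.*([0-9][a-z][0-9]?)(\.\([0-9][A-Z]\*\))?$', /EXTRACT, /SUBEXPR)
  ; res = str.Extract('.*([0-9][a-z][0-9]?)(\.\([0-9][A-Z]\*\))?$', /SUBEXPR)
  ; STREGEX returns empty strings when nothing matches, so test the full match as well.
  if (N_Elements(res) lt 2 || res[0] eq '') then begin
    Message, 'No match found.'
  endif
  ; FIX stops at the first non-digit, leaving the leading number.
  return, FIX(res[1])
end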
pro newsgroup_read_number_from_string
  compile_opt idl2
  lines = ['3d6.4s2', $
           '7s6.(4F).3p1', $
           '3t6.6d3', $
           '2l6.(5G).4s2', $
           '3d6.(5D).4s.3p.(5P*)', $
           '3d7.(2G).2s', $
           '3d6.(5D).4s.4p.(3P*)']
  foreach l, lines do begin
    print, l, get_last_number_from_string(l)
  endforeach
end
Brian Griglak
IDL Tech Lead