converting DNA sequences to numbers or letters & numbers (puzzle)

Adam Funk

Apr 7, 2015, 4:00:05 PM
I'm trying to figure out a "mystery" geocache whose puzzle consists of
a sequence of the characters A, T, C, & G. I expect to get some
numbers or numbers & a few letters to get coördinates of the form
<N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
Just <### ###> for the decimals of minutes would suffice, actually.

Is there any standard or semi-standard way of representing DNA as
numbers? I found this (page 2)
<http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
but it seems to be a toy educational example & gave gibberish for the
sequence I'm looking at.


--
Men, there is no sacrifice greater than someone else's.
--- Skipper

Jerry Friedman

Apr 7, 2015, 4:54:05 PM
On Tuesday, April 7, 2015 at 2:00:05 PM UTC-6, Adam Funk wrote:
> I'm trying to figure out a "mystery" geocache whose puzzle consists of
> a sequence of the characters A, T, C, & G. I expect to get some
> numbers or numbers & a few letters to get coördinates of the form
> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> Just <### ###> for the decimals of minutes would suffice, actually.
>
> Is there any standard or semi-standard way of representing DNA as
> numbers? I found this (page 2)
> <http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
> but it seems to be a toy educational example & gave gibberish for the
> sequence I'm looking at.

I don't know of any standards. Is the number of bases a multiple of 3?
If so, you might try converting them to the one-letter codes for the
amino acids.
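
For anyone who wants to try this offline, here is a minimal Python sketch of the codon lookup. It assumes the standard genetic code with `*` marking a stop codon. The names `CODON_TABLE` and `translate` are made up for illustration, and the example sequence is invented, not the one from the puzzle.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA (length a multiple of 3) into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Invented sequence: ATG -> M, GCC -> A, TAA -> stop
print(translate("ATGGCCTAA"))
```

If the output looks like gibberish, that only rules out this particular reading frame and direction, not the codon idea as a whole.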

--
Jerry Friedman

Adam Funk

Apr 9, 2015, 7:30:07 AM
Aha, you mean this?

<https://en.wikipedia.org/wiki/DNA_codon_table>

Great suggestion!-- but unfortunately I still get gibberish out.


--
"It is the role of librarians to keep government running in difficult
times," replied Dramoren. "Librarians are the last line of defence
against chaos." (McMullen 2001)

J. J. Lodder

Apr 9, 2015, 3:27:58 PM
Adam Funk <a24...@ducksburg.com> wrote:

> I'm trying to figure out a "mystery" geocache whose puzzle consists of
> a sequence of the characters A, T, C, & G. I expect to get some
> numbers or numbers & a few letters to get coördinates of the form
> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> Just <### ###> for the decimals of minutes would suffice, actually.
>
> Is there any standard or semi-standard way of representing DNA as
> numbers? I found this (page 2)
> <http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
> but it seems to be a toy educational example & gave gibberish for the
> sequence I'm looking at.

There are four letters, hence 16 pairs,
which suggests coding in hexadecimal.
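
A rough Python sketch of the pair-to-hexadecimal idea follows. The numbering A=0, C=1, G=2, T=3 is purely an assumption (alphabetical order); nothing in the thread says which of the possible orderings a puzzle setter would pick. The name `pairs_to_hex` is hypothetical.

```python
# Assumption: bases are numbered alphabetically (A=0, C=1, G=2, T=3).
# Any other ordering of the four bases is equally defensible.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pairs_to_hex(dna):
    """Read DNA two bases at a time and emit one hexadecimal digit per pair."""
    dna = dna.upper()
    if len(dna) % 2 != 0:
        raise ValueError("sequence length is not a multiple of 2")
    digits = []
    for i in range(0, len(dna), 2):
        # The first base is the high-order base-4 digit, the second the low-order one.
        value = 4 * BASE_VALUE[dna[i]] + BASE_VALUE[dna[i + 1]]
        digits.append(format(value, "X"))
    return "".join(digits)


print(pairs_to_hex("ACGT"))  # AC -> 1, GT -> B
```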

Jan

Jerry Friedman

Apr 10, 2015, 12:56:11 AM
On 4/9/15 5:18 AM, Adam Funk wrote:
> On 2015-04-07, Jerry Friedman wrote:
>
>> On Tuesday, April 7, 2015 at 2:00:05 PM UTC-6, Adam Funk wrote:
>>> I'm trying to figure out a "mystery" geocache whose puzzle consists of
>>> a sequence of the characters A, T, C, & G. I expect to get some
>>> numbers or numbers & a few letters to get coördinates of the form
>>> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
>>> Just <### ###> for the decimals of minutes would suffice, actually.
>>>
>>> Is there any standard or semi-standard way of representing DNA as
>>> numbers? I found this (page 2)
>>> <http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
>>> but it seems to be a toy educational example & gave gibberish for the
>>> sequence I'm looking at.
>>
>> I don't know of any standards. Is the number of bases a multiple of 3?
>> If so, you might try converting them to the one-letter codes for the
>> amino acids.
>
> Aha, you mean this?
>
> <https://en.wikipedia.org/wiki/DNA_codon_table>

Yep.

> Great suggestion!-- but unfortunately I still get gibberish out.

Oh, well. You could try reading backwards and with the bases reversed,
A<>T and C<>G. It would be a strong hint if some such reading ended
with a stop codon.
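
The alternative readings can be generated mechanically. Below is a small Python sketch, assuming the three standard stop codons (TAA, TAG, TGA); the function names and the example sequence are invented for illustration.

```python
# Swap A<>T and C<>G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readings(dna):
    """Return the four obvious readings of a DNA string."""
    dna = dna.upper()
    complement = dna.translate(COMPLEMENT)
    return {
        "forward": dna,
        "reverse": dna[::-1],
        "complement": complement,
        "reverse complement": complement[::-1],
    }


def ends_with_stop(dna):
    """True if the sequence splits into whole codons and the last one is a stop."""
    return len(dna) % 3 == 0 and dna[-3:] in STOP_CODONS


# Invented sequence, chosen so that only the reverse complement ends in a stop.
for name, seq in readings("TTAGGCCAT").items():
    print(name, seq, ends_with_stop(seq))
```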

In addition to Jan's suggestion, a way of encoding a numerical message
in such a string is by the positions of one of the letters. For
instance, AGTTCGTGAACTCCCA could code for 1, 9, 10, 16, the positions of
the A's.
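
As a quick check of that example, a Python sketch (the function name `positions` is made up here) that lists the 1-based positions of a chosen base:

```python
def positions(dna, base):
    """Return the 1-based positions at which `base` occurs in `dna`."""
    return [i for i, b in enumerate(dna.upper(), start=1) if b == base]


print(positions("AGTTCGTGAACTCCCA", "A"))  # [1, 9, 10, 16]
```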

But a lot depends on how much work a puzzle setter might expect you to do.
--
Jerry Friedman

Adam Funk

Apr 13, 2015, 9:30:05 AM
I'll take a look at that, but I'd be surprised at that level of
indirection.


--
I take no pleasure in being right in my dark predictions about the
fate of our military intervention in the heart of the Muslim world. It
is immensely depressing to me. Nobody likes to be betting against the
Home team, no matter how hopeless they are. --- Hunter S Thompson

Adam Funk

Apr 13, 2015, 9:30:05 AM
On 2015-04-10, Jerry Friedman wrote:

> On 4/9/15 5:18 AM, Adam Funk wrote:
>> On 2015-04-07, Jerry Friedman wrote:

>>> I don't know of any standards. Is the number of bases a multiple of 3?
>>> If so, you might try converting them to the one-letter codes for the
>>> amino acids.
>>
>> Aha, you mean this?
>>
>> <https://en.wikipedia.org/wiki/DNA_codon_table>
>
> Yep.
>
>> Great suggestion!-- but unfortunately I still get gibberish out.
>
> Oh, well. You could try reading backwards and with the bases reversed,
> A<>T and C<>G. It would be a strong hint if some such reading ended
> with a stop codon.

Well, I found these 2 links for generating even more versions, but I'm
still not getting it. I tried using `tr [ATCG] [TAGC]` too. I
haven't tried reversing the DNA string yet.

<http://db.systemsbiology.net:8080/proteomicsToolkit/DNATranslator.html>
<http://web.expasy.org/translate/>


> In addition to Jan's suggestion, a way of encoding a numerical message
> in such a string is by the positions of one of the letters. For
> instance, AGTTCGTGAACTCCCA could code for 1, 9, 10, 16, the positions of
> the A's.
>
> But a lot depends on how much work a puzzle setter might expect you to do.

I dunno. I googled 'geocaching dna puzzle' & found evidence of a few
where the codons make Roman numerals. Maybe I should try squinting at
the amino acid lists with my reading glasses off!

(I solved one a few weeks ago using the periodic table, e.g., "Fe.CoN"
means "26.277".)
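
That periodic-table decoding is easy to sketch in Python. The table below is deliberately partial (only a handful of elements), and the tokenising rule (an uppercase letter optionally followed by one lowercase letter) is an assumption about how such clues are written.

```python
import re

# Partial table of atomic numbers, just enough for the example.
ATOMIC_NUMBER = {
    "H": 1, "He": 2, "Li": 3, "Be": 4, "B": 5, "C": 6, "N": 7, "O": 8,
    "F": 9, "Ne": 10, "Fe": 26, "Co": 27, "Ni": 28, "Cu": 29, "Zn": 30,
}


def decode_elements(text):
    """Replace each element symbol with its atomic number; keep other characters."""
    out = []
    # Prefer two-letter symbols ("Co") over one-letter ones ("C" followed by "o").
    for token in re.findall(r"[A-Z][a-z]?|[^A-Za-z]", text):
        out.append(str(ATOMIC_NUMBER[token]) if token in ATOMIC_NUMBER else token)
    return "".join(out)


print(decode_elements("Fe.CoN"))  # 26.277
```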


--
"Mrs CJ and I avoid clichés like the plague."

Jerry Friedman

Apr 13, 2015, 10:12:48 AM
On 4/13/15 7:26 AM, Adam Funk wrote:
> On 2015-04-09, J. J. Lodder wrote:
>
>> Adam Funk <a24...@ducksburg.com> wrote:
>>
>>> I'm trying to figure out a "mystery" geocache whose puzzle consists of
>>> a sequence of the characters A, T, C, & G. I expect to get some
>>> numbers or numbers & a few letters to get coördinates of the form
>>> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
>>> Just <### ###> for the decimals of minutes would suffice, actually.
>>>
>>> Is there any standard or semi-standard way of representing DNA as
>>> numbers? I found this (page 2)
>>> <http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
>>> but it seems to be a toy educational example & gave gibberish for the
>>> sequence I'm looking at.
>>
>> There are four letters, hence 16 pairs,
>> which suggests coding in hexadecimal.
>
> I'll take a look at that, but I'd be surprised at that level of
> indirection.

Anyway, isn't that the same as converting each of the bases to two bits (in
any of the 24 possible ways)?

--
Jerry Friedman

Adam Funk

Apr 13, 2015, 10:45:05 AM
Yes. But I'd expect this kind of a puzzle to have a "sort of
deterministic" solution, i.e.: there is one correct way to decode the
clue, & the coördinates are there once you hit on it (as with the
atomic numbers). OTOH, this approach involves writing a computer
program to generate 24 different but equally valid decoding schemes, &
seeing which one produces coördinates in the vicinity of the given
waypoint. That's why I was asking about a "standard or semi-standard
way of representing DNA as numbers".
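
For what it's worth, the brute-force program is short. This Python sketch enumerates the 24 ways of assigning the two-bit codes 00, 01, 10, 11 to the four bases; the function name and the example sequence are invented, and turning the resulting bit strings into candidate coördinates is left open because the thread doesn't settle on a scheme.

```python
from itertools import permutations


def all_two_bit_decodings(dna):
    """Yield (mapping, bit_string) for each of the 24 base-to-two-bit assignments."""
    dna = dna.upper()
    codes = ("00", "01", "10", "11")
    for order in permutations("ACGT"):
        mapping = dict(zip(order, codes))
        yield mapping, "".join(mapping[b] for b in dna)


# Invented sequence, not the puzzle's.
decodings = list(all_two_bit_decodings("AGTTCGTG"))
print(len(decodings))  # 24
for mapping, bits in decodings[:3]:
    print(mapping, bits)
```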


--
svn ci -m 'come back make, all is forgiven!' build.xml

Iain Archer

Apr 18, 2015, 11:26:04 AM
Adam Funk <a24...@ducksburg.com> wrote on Mon, 13 Apr 2015 at 14:26:00:
>I dunno. I googled 'geocaching dna puzzle' & found evidence of a few
>where the codons make Roman numerals. Maybe I should try squinting at
>the amino acid lists with my reading glasses off!
>
>(I solved one a few weeks ago using the periodic table, e.g., "Fe.CoN"
>means "26.277".)

How long is the given sequence?
--
Iain Archer

Dr. HotSalt

Apr 20, 2015, 5:15:47 AM
On Monday, April 13, 2015 at 7:45:05 AM UTC-7, Adam Funk wrote:
> On 2015-04-13, Jerry Friedman wrote:
>
> > On 4/13/15 7:26 AM, Adam Funk wrote:
> >> On 2015-04-09, J. J. Lodder wrote:
> >>
> >>> Adam Funk <a24...@ducksburg.com> wrote:
> >>>
> >>>> I'm trying to figure out a "mystery" geocache whose puzzle consists of
> >>>> a sequence of the characters A, T, C, & G. I expect to get some
> >>>> numbers or numbers & a few letters to get coördinates of the form
> >>>> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> >>>> Just <### ###> for the decimals of minutes would suffice, actually.
> >>>>
> >>>> Is there any standard or semi-standard way of representing DNA as
> >>>> numbers? I found this (page 2)
> >>>> <http://www.mrsec.psu.edu/education/nano-activities/dna/dnas_secret_code/dnas_secret_code.pdf>
> >>>> but it seems to be a toy educational example & gave gibberish for the
> >>>> sequence I'm looking at.

Yeah, I suppose lat and long in a mix of numbers and upper and lower case letters isn't very useful.

> >>> There are four letters, hence 16 pairs,
> >>> which suggests coding in hexadecimal.
> >>
> >> I'll take a look at that, but I'd be surprised at that level of
> >> indirection.

Do you know anything of the encoder's inclinations and areas of interest?

> > Anyway, isn't that the same as converting each of the bases to two bits (in
> > any of the 24 possible ways)?

Sixty-four permutations of three in the table on page two, but you really need a system that only yields ten.

How do you combine four elements into only ten arrangements?
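
One possible answer, offered only as a guess: if you take the bases two at a time and ignore the order within each pair, there are exactly ten distinct pairs. The Python sketch below numbers them 0 to 9 in alphabetical order; both the unordered-pair idea and that numbering are assumptions, and nothing here says any puzzle actually works this way.

```python
from itertools import combinations_with_replacement

# Unordered pairs of bases, allowing repeats: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
PAIRS = ["".join(p) for p in combinations_with_replacement("ACGT", 2)]
DIGIT_FOR_PAIR = {pair: str(d) for d, pair in enumerate(PAIRS)}


def decode_unordered_pairs(dna):
    """Map each two-base chunk to a digit, treating e.g. GA and AG as the same pair."""
    dna = dna.upper()
    if len(dna) % 2 != 0:
        raise ValueError("sequence length is not a multiple of 2")
    return "".join(
        DIGIT_FOR_PAIR["".join(sorted(dna[i:i + 2]))]
        for i in range(0, len(dna), 2)
    )


print(len(PAIRS))  # 10
print(decode_unordered_pairs("GAAG"))  # both chunks are the pair AG
```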

> Yes. But I'd expect this kind of a puzzle to have a "sort of
> deterministic" solution, i.e.: there is one correct way to decode the
> clue, & the coördinates are there once you hit on it (as with the
> atomic numbers). OTOH, this approach involves writing a computer
> program to generate 24 different but equally valid decoding schemes, &
> seeing which one produces coördinates in the vicinity of the given
> waypoint. That's why I was asking about a "standard or semi-standard
> way of representing DNA as numbers".

Well, would the coder have any specialist knowledge in that area? He or she might consider that standard to be something "everybody knows".

How far apart are waypoints, usually? If one coordinate where you found the geocache is (coincidentally) 26.277, then you can guess the likely range of the same coordinate for the next. That will give you a quick and dirty go/no-go indication for any scheme: just see if the first candidate decoded number is in or near that range (assuming things like kayaking are allowed for). If so, keep using that algorithm; if not, skip to the next.
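
A go/no-go test like that could be sketched in Python as below. It uses a flat-earth approximation that is only reasonable over a few miles; the 2.0-mile default threshold, the function names, and the coordinates are all illustrative assumptions.

```python
import math


def to_degrees(degrees, minutes):
    """Convert degrees plus decimal minutes to decimal degrees."""
    return degrees + minutes / 60.0


def approx_miles(lat1, lon1, lat2, lon2):
    """Approximate distance in statute miles between two nearby points (decimal degrees)."""
    earth_radius_miles = 3958.8
    mean_lat = math.radians((lat1 + lat2) / 2)
    # East-west distance shrinks with the cosine of the latitude.
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return earth_radius_miles * math.hypot(dx, dy)


def plausible(published, candidate, limit_miles=2.0):
    """True if the candidate point is within limit_miles of the published one."""
    return approx_miles(*published, *candidate) <= limit_miles


# Invented coordinates; west longitudes are negative.
published = (to_degrees(52, 30.000), -to_degrees(0, 20.000))
near = (to_degrees(52, 30.500), -to_degrees(0, 20.500))
far = (to_degrees(52, 35.000), -to_degrees(0, 20.000))
print(plausible(published, near), plausible(published, far))
```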

You know, in actual DNA the sequence given will be paired with a different sequence; A with T, C with G, etc. Is the coder fond of two-step procedures?

Sorry, that's the best I can do right now.


Dr. HotSalt

Jack Campin

Apr 20, 2015, 5:48:36 AM
> I'm trying to figure out a "mystery" geocache whose puzzle consists
> of a sequence of the characters A, T, C, & G. I expect to get some
> numbers or numbers & a few letters to get coördinates of the form
> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> Just <### ###> for the decimals of minutes would suffice, actually.

I use the BarFly application to process the ABC textual musical
notation. Its developer was a molecular biologist. One of his
demos was a stretch of rat DNA auto-converted into ABC (with
the composer labelled as "God"). It sounded like the world's
dullest waltz.

Maybe yours encodes a tune like "Staten Island", "Rocky Top" or
"Sourwood Mountain"?

-----------------------------------------------------------------------------
e m a i l : j a c k @ c a m p i n . m e . u k
Jack Campin, 11 Third Street, Newtongrange, Midlothian EH22 4PU, Scotland
mobile 07800 739 557 <http://www.campin.me.uk> Twitter: JackCampin

Dr. HotSalt

Apr 20, 2015, 7:16:13 AM
On Monday, April 20, 2015 at 2:48:36 AM UTC-7, Jack Campin wrote:
> > I'm trying to figure out a "mystery" geocache whose puzzle consists
> > of a sequence of the characters A, T, C, & G. I expect to get some
> > numbers or numbers & a few letters to get coördinates of the form
> > <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> > Just <### ###> for the decimals of minutes would suffice, actually.
>
> I use the BarFly application to process the ABC textual musical
> notation. Its developer was a molecular biologist. One of his
> demos was a stretch of rat DNA auto-converted into ABC (with
> the composer labelled as "God"). It sounded like the world's
> dullest waltz.

Cool, I didn't know about this. I found these:

http://whozoo.org/mac/Music/samples.htm

This one:

http://whozoo.org/mac/Music/CaveFish.mp3

is (like most of the examples) insipid, vapid, and (if I have the BrE usage right) twee as all hell.

Play it twice as fast on a fuzzed-out Stratocaster and it would rawk.

> Maybe yours encodes a tune like "Staten Island", "Rocky Top" or
> "Sourwood Mountain"?

I think those are too vague for geocaching.

It could be the literal DNA of an organism but it would have to be unique in some way; the world's largest fungus in Oregon, a huge cluster of cloned trees in (I forget- Virginia or somewhere close), a famous old oak or whatever. Too much to hope for but I don't know what kind of weirdo the coder is.

So, how many characters in the sequence? For that matter, just share it; my puzzle bump itches terribly now.


Dr. HotSalt

Adam Funk

Apr 20, 2015, 9:45:04 AM
72 characters (A, T, C, & G). One of the results I got out of
DNATranslator consisted of 6 amino acid letter sequences separated by
5 stop codons (plus a stop at the end), so I thought maybe the numbers
of letters were meaningful, but when I replaced the decimals of
minutes with them, I got a point too far away. (The description of
the cache suggests the real coördinates are in the same park as the
listed ones. Geocaching waypoints are generally given in the format
N/S DD° MM.mmm E/W DDD° MM.mmm, & sometimes the puzzles or clues in
intermediate waypoints just give you the 6 mmm mmm digits to
substitute in.)
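
The "count the letters between stops" idea can be sketched in Python as follows. It assumes the standard genetic code and a single reading frame; the function names and the example sequence are made up, and the real puzzle sequence is deliberately not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA (length a multiple of 3) into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def segment_lengths(dna):
    """Lengths of the amino acid runs between stop codons (trailing stops ignored)."""
    protein = translate(dna).rstrip("*")
    return [len(segment) for segment in protein.split("*")]


# Invented sequence: translates to "MA*G*", so the run lengths are 2 and 1.
print(segment_lengths("ATGGCCTAAGGGTGA"))
```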

Sorry I'm not providing more information, such as a link to the puzzle
itself, but I've only been seriously geocaching for a little while &
I'm not sure it would be ethical. I appreciate all the suggestions,
though.


--
The generation of random numbers is too important to be left to
chance. [Robert R. Coveyou]

Adam Funk

Sep 15, 2015, 5:15:06 AM
On 2015-04-07, Adam Funk wrote:

> I'm trying to figure out a "mystery" geocache whose puzzle consists of
> a sequence of the characters A, T, C, & G. I expect to get some
> numbers or numbers & a few letters to get coördinates of the form
> <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> Just <### ###> for the decimals of minutes would suffice, actually.

I finally figured this out with a hint from someone else who'd solved
it. It turns out that you split the sequence into groups of 3, treat
each group of 3 as one symbol, & do cryptanalysis on that. You can
assume certain parts of the plaintext because the published
coördinates have to be within 2 miles of the real ones (usually they
are closer than 1 mile). It also helped that in this puzzle the same
symbol was used for spaces, decimal points, & degree symbols. So the
72 character sequence is really 24 symbols & maps to (for example;
these aren't the real numbers):

"N 52 3# ### W 000 2# ###"

where '#' are the unknowns. Then I worked out that the symbols for
the numbers were in alphabetical order aligned with numerical order.
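
Here is a rough Python sketch of that approach, using an invented key and invented plaintext rather than the real puzzle. It splits the sequence into 3-base symbols, reads off whatever the crib fixes, and then narrows each unknown digit symbol under one reading of the ordering rule: sorting the digit symbols alphabetically puts the digits in increasing order. Everything named `HYPOTHETICAL_KEY`, `crib_mapping`, or `candidate_digits` is illustrative only.

```python
def split_codons(dna):
    """Split a DNA string into consecutive 3-base symbols."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def crib_mapping(codons, template, unknown="#"):
    """Map each symbol to the template character at its position, skipping unknowns."""
    if len(codons) != len(template):
        raise ValueError("template and symbol count differ")
    mapping = {}
    for codon, ch in zip(codons, template):
        if ch == unknown:
            continue
        if mapping.setdefault(codon, ch) != ch:
            raise ValueError(f"{codon} maps to both {mapping[codon]!r} and {ch!r}")
    return mapping


def candidate_digits(codon, mapping):
    """Digits a symbol could stand for, given the known digit symbols and the ordering rule."""
    known = {c: int(ch) for c, ch in mapping.items() if ch.isdigit()}
    if codon in known:
        return [known[codon]]
    low = max((d for c, d in known.items() if c < codon), default=-1)
    high = min((d for c, d in known.items() if c > codon), default=10)
    return list(range(low + 1, high))


# Invented key: one symbol for every separator, digits in alphabetical symbol order.
HYPOTHETICAL_KEY = {
    " ": "TTT", "N": "GGG", "W": "CCC",
    "0": "AAA", "1": "AAC", "2": "AAG", "3": "AAT", "4": "ACA",
    "5": "ACC", "6": "ACG", "7": "ACT", "8": "AGA", "9": "AGC",
}
plaintext = "N 52 34 567 W 000 21 890"  # invented, not the real coordinates
dna = "".join(HYPOTHETICAL_KEY[ch] for ch in plaintext)

codons = split_codons(dna)
known = crib_mapping(codons, "N 52 3# ### W 000 2# ###")
for codon in sorted(set(codons) - set(known)):
    print(codon, candidate_digits(codon, known))
```

In this toy case the crib pins two of the unknown symbols to a single digit each, and the remaining four to the range 6 to 9; since those four symbols must themselves be in increasing order, that is enough to finish by hand.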


--
"Mandrake, have you never wondered why I drink only distilled water,
or rain water, and only pure grain alcohol?" [Dr Strangelove]

snide...@gmail.com

Sep 15, 2015, 2:01:31 PM
On Tuesday, September 15, 2015 at 2:15:06 AM UTC-7, Adam Funk wrote:
> On 2015-04-07, Adam Funk wrote:
>
> > I'm trying to figure out a "mystery" geocache whose puzzle consists of
> > a sequence of the characters A, T, C, & G. I expect to get some
> > numbers or numbers & a few letters to get coördinates of the form
> > <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
> > Just <### ###> for the decimals of minutes would suffice, actually.
>
> I finally figured this out with a hint from someone else who'd solved
> it. It turns out that you split the sequence into groups of 3, treat
> each group of 3 as one symbol,

That makes sense to me; aren't codons in DNA a 3-letter (er, base) sequence?

> & do cryptanalysis on that. You can
> assume certain parts of the plaintext because the published
> coördinates have to be within 2 miles of the real ones (usually they
> are closer than 1 mile). It also helped that in this puzzle the same
> symbol was used for spaces, decimal points, & degree symbols. So the
> 72 character sequence is really 24 symbols & maps to (for example;
> these aren't the real numbers):
>
> "N 52 3# ### W 000 2# ###"
>
> where '#' are the unknowns. Then I worked out that the symbols for
> the numbers were in alphabetical order aligned with numerical order.

Like the weather report that always began "Heil Hitler!"

Congrats! I'd still be scratching my head, and trying to figure out
which pub the author had been at.

/dps

Adam Funk

Sep 16, 2015, 7:00:08 AM
On 2015-09-15, snide...@gmail.com wrote:

> On Tuesday, September 15, 2015 at 2:15:06 AM UTC-7, Adam Funk wrote:
>> On 2015-04-07, Adam Funk wrote:
>>
>> > I'm trying to figure out a "mystery" geocache whose puzzle consists of
>> > a sequence of the characters A, T, C, & G. I expect to get some
>> > numbers or numbers & a few letters to get coördinates of the form
>> > <N##°##.###' W###°##.##'> (degrees, minutes, & decimals of minutes).
>> > Just <### ###> for the decimals of minutes would suffice, actually.
>>
>> I finally figured this out with a hint from someone else who'd solved
>> it. It turns out that you split the sequence into groups of 3, treat
>> each group of 3 as one symbol,
>
> That makes sense to me; aren't codons in DNA a 3-letter (er, base) sequence?

Yes. Originally I'd taken the DNA too literally & tried running the
whole sequence through various codon translators on the WWW, hoping to
get some Roman numerals or other letters interpretable as numbers out
of the amino acid codes.

>> & do cryptanalysis on that. You can
>> assume certain parts of the plaintext because the published
>> coördinates have to be within 2 miles of the real ones (usually they
>> are closer than 1 mile). It also helped that in this puzzle the same
>> symbol was used for spaces, decimal points, & degree symbols. So the
>> 72 character sequence is really 24 symbols & maps to (for example;
>> these aren't the real numbers):
>>
>> "N 52 3# ### W 000 2# ###"
>>
>> where '#' are the unknowns. Then I worked out that the symbols for
>> the numbers were in alphabetical order aligned with numerical order.
>
> Like the weather report that always began "Heil Hitler!"

Yes, I mean "Ja wohl!"


--
The internet is quite simply a glorious place. Where else can you find
bootlegged music and films, questionable women, deep seated xenophobia
and amusing cats all together in the same place? [Tom Belshaw]