translating Python to Assembler

ov...@thepond.com

Jan 22, 2008, 5:24:24 PM

My expertise, if any, is in assembler. I'm trying to understand Python
scripts and modules by examining them after they have been
disassembled in a Windows environment.

I'm wondering if a Python symbols file is available. In the Windows
environment, a symbol file normally has a PDB extension. It's a little
unfortunate that Python also uses PDB for its debugger. Google, for
whatever reason, won't accept queries with dots, hyphens, etc., in the
query line. For example a Google for "python.pdb" returns +python
+pdb, so I get a ridiculous number of returns referring to the python
debugger. I have mentioned this to Google several times, but I guess
logic isn't one of their strong points. :-)

John Machin

Jan 22, 2008, 5:51:36 PM

On Jan 23, 9:24 am, o...@thepond.com wrote:
> My expertise, if any, is in assembler. I'm trying to understand Python
> scripts and modules by examining them after they have been
> disassembled in a Windows environment.
>

DB "Wrong way. Go back. Read the tutorials."
RET

James Matthews

Jan 22, 2008, 7:31:51 PM

to Wim Vander Schelden, pytho...@python.org

The reason you were finding the Python debugger when looking for PDB
files is that pdb is the Python DeBugger! Also, why would you be looking
for a PDB file when you can read the C source?

On Jan 22, 2008 11:55 PM, Wim Vander Schelden <w...@fixnum.org> wrote:
> Python modules and scripts are normally not even compiled; if they have
> been, it's probably just the Python interpreter packaged with the scripts
> and resources.
>
> My advice, if you want to learn Python, is to just read a book about it
> or read online resources. Learning Python from assembler is kind of...
> strange.
>
> Not only are you skipping several generations of programming languages,
> spanning a period of 40 years, but the approach to programming in Python
> is so fundamentally different from assembler programming that there is
> simply no reason to start looking at it from this perspective.
>
> I truly hope you enjoy the world of high-end programming languages, but
> treat them as such. Looking at them in a low-level representation or from
> a low-level perspective doesn't bear much fruit.
>
> Kind regards,
>
> Wim

> > --
> > http://mail.python.org/mailman/listinfo/python-list


Luis Zarrabeitia

Jan 22, 2008, 8:28:39 PM

to pytho...@python.org

I second Wim's opinion. Learn Python as a high-level language; you won't regret it.

About google, I'll give you a little gtip:

> > For example a Google for "python.pdb" returns +python
> > +pdb, so I get a ridiculous number of returns referring to the python
> > debugger. I have mentioned this to Google several times, but I guess
> > logic isn't one of their strong points. :-)

Instead of searching 'python.pdb' try the query "filetype:pdb python", or even
"python pdb" (quoted). The first one should give you files with a pdb extension
and python in the name or contents, and the second one (quoted) should return
pages with both words together, separated at most by commas, spaces, dots,
slashes, etc.

However... one of the results of the second query is this thread in Google
Groups... not a good sign.

--
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie

Quoting Wim Vander Schelden <w...@fixnum.org>:

> Python modules and scripts are normally not even compiled; if they have
> been, it's probably just the Python interpreter packaged with the scripts
> and resources.
>
> My advice, if you want to learn Python, is to just read a book about it
> or read online resources. Learning Python from assembler is kind of...
> strange.
>
> Not only are you skipping several generations of programming languages,
> spanning a period of 40 years, but the approach to programming in Python
> is so fundamentally different from assembler programming that there is
> simply no reason to start looking at it from this perspective.
>
> I truly hope you enjoy the world of high-end programming languages, but
> treat them as such. Looking at them in a low-level representation or from
> a low-level perspective doesn't bear much fruit.
>
> Kind regards,
>
> Wim
>
> On 1/22/08, ov...@thepond.com <ov...@thepond.com> wrote:
> >


--
"Al mundo nuevo corresponde la Universidad nueva"
UNIVERSIDAD DE LA HABANA
280 aniversario

Grant Edwards

Jan 22, 2008, 11:58:02 PM

On 2008-01-22, ov...@thepond.com <ov...@thepond.com> wrote:

> My expertise, if any, is in assembler. I'm trying to
> understand Python scripts and modules by examining them after
> they have been disassembled in a Windows environment.

You can't disassemble them, since they aren't ever converted
to assembler and assembled. Python is compiled into bytecode
for a virtual machine (either the Java VM or the Python VM or
the .NET VM).

> I'm wondering if a Python symbols file is available.

You're way off track.

> In the Windows environment, a symbol file normally has a PDB
> extension. It's a little unfortunate that Python also uses PDB
> for its debugger. Google, for whatever reason, wont accept
> queries with dots, hyphens, etc., in the query line. For
> example a Google for "python.pdb" returns +python +pdb, so I
> get a ridiculous number of returns referring to the python
> debugger. I have mentioned this to Google several times, but I
> guess logic isn't one of their strong points. :-)

Trying to find assembly language stuff to look at is futile.
Python doesn't get compiled into assembly language.

If you want to learn Python, then read a book on Python.

--
Grant Edwards grante Yow! I am NOT a nut....
at
visi.com

Steven D'Aprano

Jan 23, 2008, 12:50:44 AM

On Wed, 23 Jan 2008 04:58:02 +0000, Grant Edwards wrote:

> On 2008-01-22, ov...@thepond.com <ov...@thepond.com> wrote:
>
>> My expertise, if any, is in assembler. I'm trying to understand Python
>> scripts and modules by examining them after they have been disassembled
>> in a Windows environment.
>
> You can't disassemble them, since they aren't ever converted to
> assembler and assembled. Python is compiled into bytecode for a virtual
> machine (either the Java VM or the Python VM or the .NET VM).

There is the Python disassembler, dis, which disassembles the bytecode
into something which might as well be "assembler" *cough* for the virtual
machine.

--
Steven

Christian Heimes

Jan 23, 2008, 2:49:20 AM

to pytho...@python.org

Wim Vander Schelden wrote:
> Python modules and scripts are normally not even compiled; if they have
> been, it's probably just the Python interpreter packaged with the scripts
> and resources.

No, that is not correct. Python code is compiled to Python byte code and
executed inside a virtual machine just like Java or C#. It's even
possible to write code with Python assembly and compile the Python
assembly into byte code.

You most certainly meant: Python code is not compiled into machine code.
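
A minimal sketch of that compilation step, assuming a current CPython 3
interpreter (the source string and the "<example>" file name are made up for
illustration). The built-in compile() returns a code object whose co_code
attribute holds the byte code as plain bytes; no machine code is produced:

```python
# Hypothetical two-line program, used only for illustration.
source = "x = 1 + 2\nprint(x)\n"

# compile() turns source text into a code object for the Python VM.
code = compile(source, "<example>", "exec")

print(type(code).__name__)          # the object type is "code"
print(type(code.co_code).__name__)  # the byte code itself is a bytes object
print(code.co_names)                # names the byte code refers to
print(len(code.co_code), "bytes of byte code")
```

The byte values in co_code differ between CPython versions, which is one
reason they are normally inspected with the dis module rather than by hand.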

Christian

Bjoern Schliessmann

Jan 23, 2008, 8:02:39 AM

ov...@thepond.com wrote:

> My expertise, if any, is in assembler. I'm trying to understand
> Python scripts and modules by examining them after they have been
> disassembled in a Windows environment.

IMHO, that approach doesn't make sense for understanding scripts or
modules (unless you have some kind of super brain, because
Python is _very_ high level). It only makes sense if you want to
understand the Python compiler/interpreter you use.

For compilers that output machine code directly this *may* make
sense (but for more complex programs it will become very
difficult).

If you'd like to get a "low level" look into how things are done in
Python, try the dis module. Using dis.dis, you can look at
disassembled Python byte code.
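
For illustration, a small sketch of dis.dis applied to a made-up function
(add is hypothetical). The opcode names printed vary between CPython
versions, so no output is shown here:

```python
import dis


def add(a, b):
    """Hypothetical example function to disassemble."""
    return a + b


# Print a human-readable listing of the function's byte code.
dis.dis(add)

# The same instructions are also available programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

The listing shows loads, an addition and a return for the Python virtual
machine, not x86 instructions.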

Regards,

Björn

--
BOFH excuse #251:

Processes running slowly due to weak power supply

Bjoern Schliessmann

Jan 23, 2008, 8:04:15 AM

Grant Edwards wrote:

> Trying to find assembly language stuff to look at is futile.
> Python doesn't get compiled into assembly language.

So, how do processors execute Python scripts? :)

> If you want to learn Python, then read a book on Python.

ACK.

Regards,

Björn

--
BOFH excuse #198:

Post-it Note Sludge leaked into the monitor.

Christian Heimes

Jan 23, 2008, 8:21:11 AM

to pytho...@python.org

Wim Vander Schelden wrote:
> I didn't know that python uses a VM, I thought it still used an
> interpreter! You learn something new every day :)

Still? I don't think Python ever used a different model. Most modern
languages are using an interpreted byte code approach:

http://en.wikipedia.org/wiki/Interpreted_language#Languages_usually_compiled_to_a_virtual_machine_code

IMHO .NET/C# is missing from the list.

Christian

Tim Roberts

Jan 24, 2008, 3:02:06 AM

Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com> wrote:

>Grant Edwards wrote:
>
>> Trying to find assembly language stuff to look at is futile.
>> Python doesn't get compiled into assembly language.
>
>So, how do processors execute Python scripts? :)

Is that a rhetorical question? Grant is quite correct; Python scripts (in
the canonical CPython) are NOT compiled into assembly language. Scripts
are compiled to an intermediate language. Processors execute Python
scripts when the interpreter, written in a high-level language and compiled
to assembly, interprets the intermediate language created by the Python
"compiler".
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
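
The interpretation step described in this post can be sketched with a toy
stack machine. Everything below is hypothetical: the three opcodes and the
run() function are invented for illustration and are not CPython's real
instruction set or evaluation loop. The point is only that the program is
data walked by a loop, and the loop (not the program) is what exists as
machine code:

```python
# Hypothetical three-instruction "byte code"; not CPython's real opcodes.
PUSH, ADD, PRINT = 0, 1, 2


def run(program):
    """Interpret a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    output = []
    for opcode, arg in program:
        if opcode == PUSH:
            stack.append(arg)
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == PRINT:
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return output


# The "compiled" form of: print(1 + 2)
program = [(PUSH, 1), (PUSH, 2), (ADD, None), (PRINT, None)]
result = run(program)
print(result)
```

The program list is never translated into processor instructions; it is only
read by run(), which is the part a native debugger would see.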

Bjoern Schliessmann

Jan 24, 2008, 10:14:28 AM

Tim Roberts wrote:
> Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com>

>> So, how do processors execute Python scripts? :)
>
> Is that a rhetorical question?

A little bit.

> Grant is quite correct; Python scripts (in the canonical CPython)
> are NOT compiled into assembly language. Scripts are compiled to
> an intermediate language. Processors execute Python scripts when
> the interpreter, written in a high-level language and compiled to
> assembly, interprets the intermediate language created by the
> Python "compiler".

So in the end, the program defined in the Python script _is_
compiled to the CPU's language. But never mind, it depends on how
you define "compile" in the end.

Regards,

Björn

--
BOFH excuse #225:

It's those computer people in X {city of world}. They keep stuffing
things up.

Carl Banks

Jan 24, 2008, 12:17:17 PM

On Jan 24, 10:14 am, Bjoern Schliessmann <usenet-
mail-0306.20.chr0n...@spamgourmet.com> wrote:

> Tim Roberts wrote:
> > Grant is quite correct; Python scripts (in the canonical CPython)
> > are NOT compiled into assembly language. Scripts are compiled to
> > an intermediate language. Processors execute Python scripts when
> > the interpreter, written in a high-level language and compiled to
> > assembly, interprets the intermediate language created by the
> > Python "compiler".
>
> So in the end, the program defined in the Python script _is_
> compiled to the CPU's language.

I would say it's compiled to an intermediate language ("bytecode"),
and then that intermediate language is interpreted.

> But never mind, it depends on how
> you define "compile" in the end.

If you define "compile" as "interpret", yeah.

Carl Banks

Chris Mellon

Jan 24, 2008, 12:26:59 PM

to pytho...@python.org

On Jan 24, 2008 9:14 AM, Bjoern Schliessmann

<usenet-mail-03...@spamgourmet.com> wrote:
> Tim Roberts wrote:
> > Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com>
>
> >> So, how do processors execute Python scripts? :)
> >
> > Is that a rhetorical question?
>
> A little bit.
>
> > Grant is quite correct; Python scripts (in the canonical CPython)
> > are NOT compiled into assembly language. Scripts are compiled to
> > an intermediate language. Processors execute Python scripts when
> > the interpreter, written in a high-level language and compiled to
> > assembly, interprets the intermediate language created by the
> > Python "compiler".
>
> So in the end, the program defined in the Python script _is_
> compiled to the CPU's language. But never mind, it depends on how
> you define "compile" in the end.
>

This is true if and only if you would agree that PowerPoint
presentations, Word documents, and PNG images are likewise compiled to
machine code.

Torsten Bronger

Jan 24, 2008, 12:26:37 PM

Hi there!

Carl Banks writes:

> On Jan 24, 10:14 am, Bjoern Schliessmann <usenet-
> mail-0306.20.chr0n...@spamgourmet.com> wrote:
>

>> [...]

>>
>> But never mind, it depends on how you define "compile" in the
>> end.
>
> If you define "compile" as "interpret", yeah.

Well, it is just-in-time-compiled command by command. :o)

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: bro...@jabber.org
(See http://ime.webhop.org for further contact info.)

Bruno Desthuilliers

Jan 25, 2008, 8:05:09 AM

Christian Heimes wrote:

> Wim Vander Schelden wrote:
>> Python modules and scripts are normally not even compiled; if they have
>> been, it's probably just the Python interpreter packaged with the scripts
>> and resources.
>
> No, that is not correct. Python code is compiled to Python byte code and
> executed inside a virtual machine just like Java or C#.

I'm surprised you've not been flamed to death by now - last time I
happened to write a pretty similar thing, I got a couple of nut cases
accusing me of being a liar trying to spread FUD about the inner workings
of the Java and Python VMs, and even some usually sensible regulars
jumping in to label what I said as "misleading"...

Paul Boddie

Jan 25, 2008, 9:45:45 AM

On 25 Jan, 14:05, Bruno Desthuilliers <bruno.
42.desthuilli...@wtf.websiteburo.oops.com> wrote:
> Christian Heimes wrote:

>
> > No, that is not correct. Python code is compiled to Python byte code and
> executed inside a virtual machine just like Java or C#.
>
> I'm surprised you've not been flamed to death by now - last time I
> happened to write a pretty similar thing, I got a couple of nut cases
> accusing me of being a liar trying to spread FUD about the inner workings
> of the Java and Python VMs, and even some usually sensible regulars
> jumping in to label what I said as "misleading"...

Well, it is important to make distinctions when people are wondering,
"If Python is 'so slow' and yet everyone tells me that the way it is
executed is 'just like Java', where does the difference in performance
come from?" Your responses seemed to focus more on waving that issue
away and leaving the whole topic in the realm of mystery. The result:
"Python is just like Java apparently, but it's slower and I don't know
why."

It's true in one sense that the statement "Python modules and scripts
are normally not even compiled" is incorrect, since modules at least
are compiled to another representation. However, if we grant the
author of that statement the benefit of his ambiguity, we can also
grant his statement a degree of tolerance by observing that modules
are not compiled to native code, which is the only experience some
people have with compilation and its results.

As was pointed out in that previous discussion, CPython instructions
are arguably less well-suited than Java instructions for translation
to CPU instructions. This alone should make people wonder about how
close CPython and the more prominent Java virtual machines are, as
well as the considerations which led the Java virtual machine
architecture to be designed in the way that it was.

Paul

ov...@thepond.com

Jan 25, 2008, 8:10:26 PM

Intel processors can only process machine language, which is
essentially binary 1's and 0's. All a processor understands is
voltages, either 0 Volts or 5 volts on older systems, or 3.3 volts and
less on newer systems. Generally, a positive voltage is a logical 1
and 0 volts is a logical 0. There's no way for a processor to
understand any higher level language, even assembler, since it is
written with hexadecimal codes and basic instructions like MOV, JMP,
etc. The assembler compiler can convert an assembler file to a binary
executable, which the processor can understand.

If you look at the Python interpreter, Python.exe, or Pythonw, the
Windows interface, or the Python24.dll, the main library for python,
you will see they are normal 32 bit PE files. That means they are
stored on disk in codes of 1's and 0's, and decompile into assembler.
You can't decompile them into Python directly, although I'm sure
someone is trying. No compiled file can be decompiled into its
original format easily or automatically, although there are
decompilers that will convert them to a reasonable assembler
decompilation.

If a Python script was understood directly by the processor, no
interpreter would be necessary. Ask yourself what the interpreter is
doing. It's taking the scripting language and converting to the
language of the operating system. However, it's the processor that
dictates how programs are written, not the OS. That's why a Linux OS
will run on an Intel machine, as a Windows OS does. Both Linux and
Windows compile down to binary files, which are essentially 1's and
0's arranged in codes that are meaningful to the processor.

Once a python py file is compiled into a pyc file, I can disassemble
it into assembler. Assembler is nothing but codes, which are
combinations of 1's and 0's. You can't read a pyc file in a hex
editor, but you can read it in a disassembler. It doesn't make a lot
of sense to me right now, but if I was trying to trace through it with
a debugger, the debugger would disassemble it into assembler, not
python.

ajaksu

Jan 25, 2008, 8:36:06 PM

On Jan 25, 11:10 pm, o...@thepond.com wrote:
> Once a python py file is compiled into a pyc file, I can disassemble
> it into assembler. Assembler is nothing but codes, which are
> combinations of 1's and 0's. You can't read a pyc file in a hex
> editor, but you can read it in a disassembler. It doesn't make a lot
> of sense to me right now, but if I was trying to trace through it with
> a debugger, the debugger would disassemble it into assembler, not
> python.

Please, tell me you're kidding...

ajaksu

Jan 25, 2008, 8:44:07 PM

On Jan 25, 11:36 pm, ajaksu <aja...@gmail.com> wrote:
> On Jan 25, 11:10 pm, o...@thepond.com wrote:

[...]

Gaah, is this what's going on?

ajaksu@Belkar:~$ cat error.txt
This is not assembler...

ajaksu@Belkar:~$ ndisasm error.txt
00000000  54                push sp
00000001  686973            push word 0x7369
00000004  206973            and [bx+di+0x73],ch
00000007  206E6F            and [bp+0x6f],ch
0000000A  7420              jz 0x2c
0000000C  61                popa
0000000D  7373              jnc 0x82
0000000F  656D              gs insw
00000011  626C65            bound bp,[si+0x65]
00000014  722E              jc 0x44
00000016  2E                db 0x2E
00000017  2E                db 0x2E
00000018  0A                db 0x0A
:/

Steven D'Aprano

Jan 25, 2008, 10:09:05 PM

On Wed, 23 Jan 2008 08:49:20 +0100, Christian Heimes wrote:

> It's even
> possible to write code with Python assembly and compile the Python
> assembly into byte code.

Really? How do you do that?

I thought it might be compile(), but apparently not.

--
Steven

Chris Mellon

Jan 25, 2008, 10:33:22 PM

to pytho...@python.org

On Jan 25, 2008 9:09 PM, Steven D'Aprano

There are tools for it in the undocumented compiler.pyassem module.
You have to pretty much know what you're doing already to use it - I
spent a fun (but unproductive) week figuring out how to use it and
generated customized bytecode for certain list comps. Malformed
hand-generated bytecode stuffed into code objects is one of the few
ways I know of to crash the interpreter without resorting to calling C
code, too.

Christian Heimes

Jan 25, 2008, 10:54:37 PM

to pytho...@python.org

Paul Boddie wrote:
> Well, it is important to make distinctions when people are wondering,
> "If Python is 'so slow' and yet everyone tells me that the way it is
> executed is 'just like Java', where does the difference in performance
> come from?" Your responses seemed to focus more on waving that issue
> away and leaving the whole topic in the realm of mystery. The result:
> "Python is just like Java apparently, but it's slower and I don't know
> why."

Short answer: Python doesn't have a Just In Time (JIT) compiler. While
Java's JIT optimizes the code at run time, Python executes the byte code
without additional optimizations.

Christian

Grant Edwards

Jan 26, 2008, 12:02:43 AM

On 2008-01-26, ov...@thepond.com <ov...@thepond.com> wrote:

> Once a python py file is compiled into a pyc file, I can disassemble
> it into assembler.

No you can't. It's not native machine code. It's byte code
for a virtual machine.

> Assembler is nothing but codes, which are combinations of 1's
> and 0's. You can't read a pyc file in a hex editor, but you
> can read it in a disassembler.

NO YOU CAN'T.

> It doesn't make a lot of sense to me right now,

That's because IT'S NOT MACHINE CODE.

> but if I was trying to trace through it with a debugger,

That wouldn't work.

> the debugger would disassemble it into assembler,
> not python.

You can "disassemble" random bitstreams into assembler. That
doesn't make it a useful thing to do.

[Honestly, I think you're just trolling.]

--
Grant Edwards grante Yow! Yow! Is this sexual
at intercourse yet?? Is it,
visi.com huh, is it??

Grant Edwards

Jan 26, 2008, 12:03:06 AM

I think we've been trolled. Nobody could be that stubbornly
ignorant.

--
Grant Edwards grante Yow! This is PLEASANT!
at
visi.com

Marc 'BlackJack' Rintsch

Jan 26, 2008, 8:36:14 AM

Maybe `bytecodehacks`_ + `psyco`, or PyPy!?

.. _bytecodehacks: http://sourceforge.net/projects/bytecodehacks/

Ciao,
Marc 'BlackJack' Rintsch

Bjoern Schliessmann

Jan 26, 2008, 8:47:50 AM

ov...@thepond.com wrote:

> Intel processors can only process machine language[...] There's no

> way for a processor to understand any higher level language, even
> assembler, since it is written with hexadecimal codes and basic
> instructions like MOV, JMP, etc. The assembler compiler can
> convert an assembler file to a binary executable, which the
> processor can understand.

This may be true, but I think it's not bad to assume that machine
language and assembler are "almost the same" in this context, since
the translation between them is non-ambiguous (It's
just "recoding"; this is not the case with HLLs).

> Both Linux and Windows compile down to binary files, which are
> essentially 1's and 0's arranged in codes that are meaningful to
> the processor.

(Not really -- object code files are composed of header data and
different segments, data and code, and only the code segments are
really meaningful to the processor.)

> Once a python py file is compiled into a pyc file, I can
> disassemble it into assembler.

But you _do_ know that pyc files are Python byte code, and that you could
only disassemble them directly to Python byte code?

> Assembler is nothing but codes, which are combinations of 1's and
> 0's.

No, assembly language source is readable text like this (gcc):

.LCFI4:
        movl    $0, %eax
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret

Machine language is binary codes, yes.

> You can't read a pyc file in a hex editor,

By definition, you can read every file in a hex editor ...

> but you can read it in a disassembler. It doesn't make a lot of
> sense to me right now, but if I was trying to trace through it
> with a debugger, the debugger would disassemble it into
> assembler, not python.

Not at all. Again: It's Python byte code. Try experimenting with
pdb.

Regards,

Björn

--
BOFH excuse #340:

We'll fix that in the next (upgrade, update, patch release, service
pack).

Jeroen Ruigrok van der Werven

Jan 26, 2008, 8:48:47 AM

to pytho...@python.org

-On [20080125 14:07], Bruno Desthuilliers (bruno.42.de...@wtf.websiteburo.oops.com) wrote:
>I'm surprised you've not been flamed to death by now - last time I
>happened to write a pretty similar thing, I got a couple of nut cases
>accusing me of being a liar trying to spread FUD about the inner workings
>of the Java and Python VMs, and even some usually sensible regulars
>jumping in to label what I said as "misleading"...

I think your attitude in responding did not help much, Bruno, if you want an
honest answer. And now you are using 'nut case'. What's with you using ad
hominems so readily?

Just an observation from the peanut gallery. :)

--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーンラウフロックヴァンデルウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
We have met the enemy and they are ours...

ov...@thepond.com

Jan 27, 2008, 3:55:15 AM

On Fri, 25 Jan 2008 17:36:06 -0800 (PST), ajaksu <aja...@gmail.com>
wrote:

hehe...which part am I kidding about? The explanation was for someone
who thought python scripts were translated directly by the processor.
I had no idea how much he knew, so I kept it basic (no pun intended).

Or...do you disagree with what I'm saying? You didn't say much. I have
already disassembled a pyc file as a binary file. Maybe I was using
the term assembler too broadly. A binary compiled from an assembler
source would look similar in parts to what I disassembled.

That's not the point, however. I'm trying to say that a processor
cannot read a Python script, and since the Python interpreter as
stored on disk is essentially an assembler file, any Python script
must sooner or later be converted to assembler form in order to be
read by its own interpreter. Whatever is typed in a Python script must
be converted to binary code.

ov...@thepond.com

Jan 27, 2008, 3:58:01 AM

On Fri, 25 Jan 2008 17:44:07 -0800 (PST), ajaksu <aja...@gmail.com>
wrote:

not sure what you're saying. Sure looks like assembler to me. Take the
'54 push sp'. The 54 is an assembler opcode for push and the sp is
the stack pointer, on which it is operating.

thebjorn

Jan 27, 2008, 4:07:38 AM

go troll somewhere else (you obviously don't know anything about
assembler and don't want to learn anything about Python).

-- bjorn

Steven D'Aprano

Jan 27, 2008, 4:44:49 AM

Deary deary me...

Have a close look again at the actual contents of the file:

$ cat error.txt
This is not assembler...

If you run the text "This is not assembler..." through a disassembler, it
will obediently disassemble the bytes "This is not assembler..." into a
bunch of assembler opcodes. Unfortunately, although the individual
opcodes are "assembly", the whole set of them together is nonsense.
You'll see that it is nonsense the moment you try to execute the supposed
assembly code.
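
To see where the "opcodes" in that listing come from, here is a small
sketch that prints the ASCII byte values of the text file's contents. The
variable names are made up for illustration; the text is the one from
error.txt, with its trailing newline:

```python
# The exact contents of error.txt, including the final newline.
text = "This is not assembler...\n"
raw = text.encode("ascii")

# Show each byte as two hex digits, the way a hex editor would.
hex_bytes = " ".join(f"{byte:02x}" for byte in raw)
print(hex_bytes)
```

The first byte, 54, is simply the letter "T", which ndisasm reports as
"push sp"; the next three, 68 69 73, are "his", which it reports as
"push word 0x7369". The disassembler decodes whatever bytes it is given.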

It would be a fascinating exercise to try to generate a set of bytes
which could be interpreted as both valid assembly code *and* valid
English text simultaneously. For interest, here you will find one quine
(program which prints its own source code) which is simultaneously valid
in C and TCL, and another which is valid in C and Lisp:

http://www.uwm.edu/~chruska/recursive/selfish.html

--
Steven

Bjoern Schliessmann

Jan 27, 2008, 5:37:50 AM

ov...@thepond.com wrote:

> hehe...which part am I kidding about? The explanation was for
> someone who thought python scripts were translated directly by the
> processor.

Who might this have been? Surely not Tim.

> I have already disassembled a pyc file as a binary file.

Have you? How's it look?

> Maybe I was using the term assembler too broadly. A binary
> compiled from an assembler source would look similar in parts to
> what I disassembled.

What is this supposed to mean?

> That's not the point, however. I'm trying to say that a processor
> cannot read a Python script, and since the Python interpreter as
> stored on disk is essentially an assembler file,

It isn't; it's an executable.

> any Python script must sooner or later be converted to
> assembler form in order to be read by its own interpreter.

This "assembler form" is commonly referred to as "Python byte code".

> Whatever is typed in a Python script must be converted to binary
> code.

That, however, is true, though blurred.

Regards,

Björn

--
BOFH excuse #120:

we just switched to FDDI.

ov...@thepond.com

Jan 27, 2008, 5:55:20 AM

On Sat, 26 Jan 2008 14:47:50 +0100, Bjoern Schliessmann
<usenet-mail-03...@spamgourmet.com> wrote:

>ov...@thepond.com wrote:
>
>> Intel processors can only process machine language[...] There's no
>> way for a processor to understand any higher level language, even
>> assembler, since it is written with hexadecimal codes and basic
>> instructions like MOV, JMP, etc. The assembler compiler can
>> convert an assembler file to a binary executable, which the
>> processor can understand.
>
>This may be true, but I think it's not bad to assume that machine
>language and assembler are "almost the same" in this context, since
>the translation between them is non-ambiguous (It's
>just "recoding"; this is not the case with HLLs).

I have no problem with your explanation. It's nearly impossible to
program in machine code, which is all 1's and 0's. Assembler makes it
infinitely easier by converting the machine 1's and 0's to their
hexadecimal equivalent and assigning an opcode name to them, like
PUSH, MOV, CALL, etc.

Still, the older machine-programmable processors used switches to set
the 1's and 0's. Or, the machine code was fed in on perforated cards
or tapes that were read. The computer read the switches, cards or
tapes, and set voltages according to what it scanned.

The difference is that machine code can be read directly, whereas
assembler has to be compiled in order to convert the opcodes to binary
data.

>
>> Both Linux and Windows compile down to binary files, which are
>> essentially 1's and 0's arranged in codes that are meaningful to
>> the processor.
>
>(Not really -- object code files are composed of header data and
>different segments, data and code, and only the code segments are
>really meaningful to the processor.)

I agree that the code segments, and the data, are all that's
meaningful to the processor. There are a few others, like interrupts
that affect the processor directly.

I understand what you're saying but I'm referring to an executable file
ready to be loaded into memory. It's stored on disk in a series of 1's
and 0's. As you say, there are also control codes on disk to separate
each byte along with CRC codes, timing codes, etc. However, that is
all stripped off by the hard drive electronics.

The actual file on disk is in a certain format that only the operating
system understands. But once the code is read in, it goes into memory
locations which hold individual arrays of bits. Each memory location
holds a precise number of bits corresponding to the particular code it
represents. For example, the ret instruction you mention below is
represented by hex C3 (0xC3), which represents the bits 11000011.
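
The hex-to-bits arithmetic in that example can be checked with a couple of
lines of Python (RET_OPCODE is just an illustrative name for the 0xC3 value
mentioned above):

```python
RET_OPCODE = 0xC3  # value quoted in the post for the x86 ret instruction

# Format the value as eight binary digits, padding with leading zeros.
bits = format(RET_OPCODE, "08b")
print(bits)

# Number of distinct values that fit in eight bits.
print(2 ** 8)
```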

That's a machine code, since starting at 00000000 to 11111111, you
have 256 different codes available. When those 1's and 0's are
converted to voltages, the computer can analyze them and set circuits
in action which will bring about the desired operation. Since Linux is
written in C, it must convert down to machine code, just as Windows
must.

>
>> Once a python py file is compiled into a pyc file, I can
>> disassemble it into assembler.
>
>But you _do_ know that pyc files are Python byte code, and you could
>only directly disassemble them to Python byte code directly?

that's the part I did not understand, so thanks for pointing that out.
What I disassembled did not make sense. I was looking for assembler
code, but I do understand a little bit about how the interpreter reads
them.

For example, from os.py, here's part of the script:

# Note: more names are added to __all__ later.
__all__ = ["altsep", "curdir", "pardir", "sep", "pathsep", "linesep",
"defpath", "name", "path", "devnull"]

here's the disassembly from os.pyc:

00000C04 06 00 00 00 dd 6
00000C08 61 6C 74 73 65 70 74 db 'altsept'
00000C0F 06 00 00 00 dd 6
00000C13 63 75 72 64 69 72 74 db 'curdirt'
00000C1A 06 00 00 00 dd 6
00000C1E 70 61 72 64 69 72 74 db 'pardirt'
00000C25 03 00 00 00 dd 3
00000C29 73 65 70 db 'sep'
00000C2C 74 07 00 00 dd 774h
00000C30 00 db 0
00000C31 70 61 74 68 73 65 70 db 'pathsep'
00000C38 74 07 00 00 dd 774h
00000C3C 00 db 0
00000C3D 6C 69 6E 65 73 65 70 db 'linesep'
00000C44 74 07 00 00 dd 774h
00000C48 00 db 0
00000C49 64 65 66 70 61 74 68 db 'defpath'
00000C50 74 04 00 00 dd offset unk_474
00000C54 00 db 0
00000C55 6E 61 6D 65 db 'name'
00000C59 74 04 00 00 dd offset unk_474
00000C5D 00 db 0
00000C5E 70 61 74 68 db 'path'
00000C62 74 07 00 00 dd 774h
00000C66 00 db 0
00000C67 64 65 76 6E 75 6C 6C db 'devnull'

you can see all the ASCII names in the disassembly like altsep,
curdir, etc. I'm not clear as to why they are all terminated with 0x74
= t, or if that's my poor interpretation. Some ASCII strings don't use
a 0 terminator. The point is that all the ASCII strings have numbers
between them which mean something to the interpreter. Also, they are
at a particular address. The interpreter has to know where to find
them.
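
A hedged aside on the 0x74 bytes: in the marshal format used by Python 2 .pyc files, an interned string appears to be written as a one-byte type code 't' (0x74), then a 4-byte little-endian length, then the characters. If so, each 't' introduces the *next* string rather than terminating the previous one. The sketch below hand-parses bytes copied from the dump under that assumption. `parse_interned` is a made-up helper, not a library function, and the leading 't' at offset 0xC03 is assumed, since the dump starts one byte later.

```python
import struct

def parse_interned(data):
    """Parse consecutive records of the form b't' + uint32 length + bytes.

    Assumption: this mirrors how Python 2's marshal writes interned strings.
    """
    names = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1] != b"t":
            raise ValueError("unexpected type code at offset %d" % pos)
        # 4-byte little-endian length follows the type code.
        (length,) = struct.unpack_from("<I", data, pos + 1)
        start = pos + 5
        names.append(data[start:start + length].decode("ascii"))
        pos = start + length
    return names

# Bytes from the dump (0xC04 onward), plus the assumed leading 't' at 0xC03.
dump = bytes.fromhex(
    "74 06000000 616C74736570"      # 't', 6, 'altsep'
    "74 06000000 637572646972"      # 't', 6, 'curdir'
    "74 06000000 706172646972"      # 't', 6, 'pardir'
    "74 03000000 736570"            # 't', 3, 'sep'
    "74 07000000 70617468736570"    # 't', 7, 'pathsep'
)
names = parse_interned(dump)
print(names)
```

Read this way, the "774h" values in the listing are just a 't' followed by the first bytes of a length field, which the x86 disassembler has grouped into a dword.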

The script is essentially gone. I'd like to know how to read the pyc
files, but that's getting away from my point that there is a link
between python scripts and assembler. At this point, I admit the code
above is NOT assembler, but sooner or later it will be converted to
machine code by the interpreter and the OS and that can be
disassembled as assembler.

I realize this is a complicated process and I can understand people
thinking I'm full of beans. Python needs an OS like Windows or Linux
to interface it to the processor. And all a processor can understand
is machine code.

>
>> Assembler is nothing but codes, which are combinations of 1's and
>> 0's.
>
>No, assembly language source is readable text like this (gcc):
>
>.LCFI4:
> movl $0, %eax
> popl %ecx
> popl %ebp
> leal -4(%ecx), %esp
> ret
>

Yes, the source is readable like that, but the compiled binary is not.
A disassembly shows both the source and the opcodes. The ret
statement above is a mneumonic for hex C3 in assembler. You have left
out the opcodes. Here's another example of assembler which is
disassembled from python.exe:

1D001250 FF 74 24 04 push [esp+arg_0]
1D001254 E8 D1 FF FF FF call 1D00122A
1D001259 F7 D8 neg eax
1D00125B 1B C0 sbb eax, eax
1D00125D F7 D8 neg eax
1D00125F 59 pop ecx
1D001260 48 dec eax
1D001261 C3 retn

the first column is obviously the address in memory. The second column
are opcodes, and the third column are mneumonics, English words
attached to the codes to give them meaning. The second and third
column mean the same thing.

A single opcode instruction like 59 = pop ecx and 48 = dec eax, are
self-explanatory. 59 is hexadecimal for binary 01011001, which is a
binary code. When a processor receives that binary as voltages, it is
wired to pop the top of the stack into the ecx register.

The second instruction, call 1D00122A is not as straight forward. it
is made up of two parts: E8 = the opcode for CALL and the rest 'D1 FF
FF FF' is the opcode operator, or the data which the call is
referencing. In this case it's an address in memory that holds the
next instruction being called. It is written backward, however, which
is convention in certain assemblers. D1 FF FF FF actually means FF FF
FF D1.

This instruction uses F's because the operand is a negative number,
telling the processor to jump back. The signed number FFFFFFD1 = -2F. A
call counts from the address of the next instruction, which is 1D001259,
and 1D001259 - 2F = 1D00122A, the address being called.
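
The arithmetic for a relative call can be checked with a short sketch. It assumes the usual IA-32 rule for the 5-byte `E8` form: the operand is a signed 32-bit little-endian displacement added to the address of the *next* instruction. The helper name `call_rel32_target` is made up for illustration.

```python
import struct

def call_rel32_target(instr_addr, instr_bytes):
    """Return (target, displacement) for a 5-byte x86 `call rel32` (opcode E8)."""
    if len(instr_bytes) != 5 or instr_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte E8 call")
    # The operand is a signed 32-bit little-endian displacement.
    (disp,) = struct.unpack("<i", instr_bytes[1:])
    # The displacement is relative to the instruction that follows the call.
    next_instr = instr_addr + len(instr_bytes)
    return next_instr + disp, disp

target, disp = call_rel32_target(0x1D001254, bytes.fromhex("E8D1FFFFFF"))
print(hex(disp), hex(target))
```

With the bytes from the listing, the displacement comes out as -0x2f and the target as 0x1d00122a, matching the address the disassembler printed.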

As you can see, it's all done with binary codes. The English
statements are purely for the convenience of the programmer. If you
look at the Intel definitions for assembler instructions, it lists both
the opcodes and the mneumonics.

I would agree with what you said earlier, that there is a similarity
between machine code and assembler. You can actually write in machine
code, but it is often entered in hexadecimal, requiring a hex to
binary interpreter. In that case, the similarity to compiled assembler
is quite close.

>Machine language is binary codes, yes.
>
>> You can't read a pyc file in a hex editor,
>

if I knew what the intervening numbers meant I could. :-)

>By definition, you can read every file in a hex editor ...
>
>> but you can read it in a disassembler. It doesn't make a lot of
>> sense to me right now, but if I was trying to trace through it
>> with a debugger, the debugger would disassemble it into
>> assembler, not python.
>
>Not at all. Again: It's Python byte code. Try experimenting with
>pdb.

I will eventually...thanks for reply.

ov...@thepond.com

Jan 27, 2008, 6:23:20 AM

>>
>> >ajaksu@Belkar:~$ ndisasm error.txt
>> >00000000 54 push sp
>> >00000001 686973 push word 0x7369
>> >00000004 206973 and [bx+di+0x73],ch
>> >00000007 206E6F and [bp+0x6f],ch
>> >0000000A 7420 jz 0x2c
>> >0000000C 61 popa
>> >0000000D 7373 jnc 0x82
>> >0000000F 656D gs insw
>> >00000011 626C65 bound bp,[si+0x65]
>> >00000014 722E jc 0x44
>> >00000016 2E db 0x2E
>> >00000017 2E db 0x2E
>> >00000018 0A db 0x0A
>>
>> >:/
>>
>> not sure what you're saying. Sure looks like assembler to me. Take the
>> '54 push sp'. The 54 is an assembler opcode for push and the sp is
>> the stack pointer, on which it is operating.
>
>go troll somewhere else (you obviously don't know anything about
>assembler and don't want to learn anything about Python).
>
>-- bjorn

before you start mouthing off, maybe you should learn assembler. If
you're really serious, go to the Intel site and get it from the horse's
mouth. The Intel manual on assembler lists the mneumonics as well as
the opcodes for each instruction. It's not called the Intel Machine
Code and Assembler Language Manual. It's the bible on assembly
language, written by Intel.

If you're not so serious, here's a URL explaining it, along with an
excerpt from the article:

http://en.wikipedia.org/wiki/X86_assembly_language

Each x86 assembly instruction is represented by a mnemonic, which in
turn directly translates to a series of bytes which represent that
instruction, called an opcode. For example, the NOP instruction
translates to 0x90 and the HLT instruction translates to 0xF4. Some
opcodes have no mnemonics named after them and are undocumented.
However processors in the x86-family may interpret undocumented
opcodes differently and hence might render a program useless. In some
cases, invalid opcodes also generate processor exceptions.

As far as this line from your code above:

00000001 686973 push word 0x7369

68 of 686973 is the opcode for PUSH. Go on, look it up. The 6973 is
obviously the word address, 0x7369. Or, do you think that's
coincidence?

Don't fucking tell me about assembler, you asshole. I can read
disassembled code in my sleep.

ov...@thepond.com

Jan 27, 2008, 6:53:25 AM

>
>> That's not the point, however. I'm trying to say that a processor
>> cannot read a Python script, and since the Python interpreter as
>> stored on disk is essentially an assembler file,
>
>It isn't; it's an executable.

I appreciated the intelligent response I received from you earlier,
now we're splitting hairs. :-) Assembler, like any other higher
level language is written as a source file and is compiled to a
binary. An executable is one form of a binary, as is a dll. When you
view the disassembly of a binary, there is a distinct difference
between C, C++, Delphi, Visual Basic, DOS, or even between the
different file types like PE, NE, MZ, etc. But they all decompile to
assembler.

While they are in the binary format, they are exactly that...binary.
Who would want to interpret a long string of 1's and 0's. Binaries are
not stored in hexadecimal on disk nor are they in hexadecimal in
memory. But, all the 1's and 0's are in codes when they are
instructions or ASCII strings. No other high level language has the
one to one relationship that assembler has to machine code, the actual
language of the computer.

Disassemblers can easily convert a binary to assembler due to the one
to one relationship between them. That can't be said for any other
higher level language. Converting back to C or Python would be a
nightmare, although it's becoming a reality. Converting a compiled
binary back to hexadecimal is basically a matter of converting the
binary to hexadecimal, as in a hex editor. There are exceptions to
that, of course, especially with compound assembler statements that
use extensions to differentiate between registers.

>
>> any Python script must be sooner or later be converted to
>> assembler form in order to be read by its own interpreter.
>
>This "assembler form" is commonly referred to as "Python byte code".
>

thanks for pointing that out. It led me to this page:

http://docs.python.org/lib/module-dis.html

where it is explained that the opcodes are in Include/opcode.h. I'll
take a look at that.

The light goes on. From opcode.h:

#define PRINT_NEWLINE_TO 74

All the ASCII strings end with 0x74 in the disassembly. I have noted
that Python uses a newline as a line feed/carriage return. Now I'm
getting it. It could all be disassembled with a hex editor, but a
disassembler is better for getting things in order.

OK. So the pyc files use those defs...that's cool.
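
One caution about that match, offered as a quick check rather than a verdict: opcode.h writes its opcode numbers in decimal, while the disassembly shows bytes in hexadecimal. The snippet below only compares the two notations. The variable names are illustrative.

```python
PRINT_NEWLINE_TO = 74        # value as written in opcode.h (decimal)
byte_in_dump = 0x74          # byte seen next to each string in the hex dump

same_value = PRINT_NEWLINE_TO == byte_in_dump
print(same_value)                                     # decimal 74 vs decimal 116
print(hex(PRINT_NEWLINE_TO), chr(PRINT_NEWLINE_TO))   # 74 decimal as hex and ASCII
print(byte_in_dump, chr(byte_in_dump))                # 0x74 as decimal and ASCII
```

Decimal 74 is 0x4a (ASCII 'J'), whereas 0x74 is decimal 116 (ASCII 't'), so the byte in the dump is probably not PRINT_NEWLINE_TO.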

Marc 'BlackJack' Rintsch

Jan 27, 2008, 7:40:27 AM

On Sun, 27 Jan 2008 11:23:20 +0000, over wrote:

> Don't fucking tell me about assembler, you asshole. I can read
> disassembled code in my sleep.

Yes you can read it, but obviously you don't understand it.

Ciao,
Marc 'BlackJack' Rintsch

Marc 'BlackJack' Rintsch

Jan 27, 2008, 7:51:47 AM

On Sun, 27 Jan 2008 10:55:20 +0000, over wrote:

> On Sat, 26 Jan 2008 14:47:50 +0100, Bjoern Schliessmann
> <usenet-mail-03...@spamgourmet.com> wrote:
>
> The script is essentially gone. I'd like to know how to read the pyc
> files, but that's getting away from my point that there is a link
> between python scripts and assembler. At this point, I admit the code
> above is NOT assembler, but sooner or later it will be converted to
> machine code by the interpreter and the OS and that can be
> disassembled as assembler.

No it will not be converted to assembler. The byte code is *interpreted*
by Python, not compiled to assembler. If you want to know how this
happens get the C source code of the interpreter and don't waste your time
with disassembling `python.exe`. C is much easier to read and there are
useful comments too.

Ciao,
Marc 'BlackJack' Rintsch

Steven D'Aprano

Jan 27, 2008, 8:41:54 AM

On Sun, 27 Jan 2008 10:55:20 +0000, over wrote:

> I can understand people thinking I'm full of beans.

Oh no, not full of beans. Full of something, but not beans.

Everything you have written about assembly, machine code, compilers,
Linux, Python and so forth has been a confused mish-mash of half-truths,
distortions, vaguely correct factoids and complete nonsense.

I'm starting to wonder if it is possible for somebody to be
simultaneously so self-assured and so ignorant, or if we're being trolled.

--
Steven

Marc 'BlackJack' Rintsch

Jan 27, 2008, 10:54:21 AM

I recently learned that this is called the Dunning-Kruger effect:

The Dunning-Kruger effect is the phenomenon wherein people who have
little knowledge think that they know more than others who have much
more knowledge.

[…]

The phenomenon was demonstrated in a series of experiments performed by
Justin Kruger and David Dunning, then both of Cornell University. Their
results were published in the Journal of Personality and Social
Psychology in December 1999.

http://en.wikipedia.org/wiki/Dunning-Kruger_effect

See, there's almost always a rational explanation. ;-)

Ciao,
Marc 'BlackJack' Rintsch

Grant Edwards

Jan 27, 2008, 10:56:25 AM

On 2008-01-27, ov...@thepond.com <ov...@thepond.com> wrote:

> Whatever is typed in a Python script must be converted to
> binary code.

Python scripts _are_ in a binary code when they start out.

--
Grant Edwards                   grante             Yow! What UNIVERSE is
                                  at               this, please??
                               visi.com

Grant Edwards

Jan 27, 2008, 11:19:06 AM

On 2008-01-27, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:

>> I'm starting to wonder if it is possible for somebody to be
>> simultaneously so self-assured and so ignorant, or if we're
>> being trolled.
>
> I recently learned that this is called the Dunning-Kruger effect:
>
> The Dunning-Kruger effect is the phenomenon wherein people who have
> little knowledge think that they know more than others who have much
> more knowledge.
>
> […]
>
> The phenomenon was demonstrated in a series of experiments performed by
> Justin Kruger and David Dunning, then both of Cornell University. Their
> results were published in the Journal of Personality and Social
> Psychology in December 1999.

I remember reading that paper about a year ago and it sure
seemed to explain the behavior of a number of people I've known.
Not only is it possible to be simultaneously self-assured and
ignorant, that appears to be the normal way that the human mind
works.

... must restist ... urge... to mention... Bush...

Damn.

--
Grant Edwards                   grante             Yow! You can't hurt
                                  at               me!! I have an ASSUMABLE
                               visi.com            MORTGAGE!!

Wildemar Wildenburger

Jan 27, 2008, 12:21:39 PM

Grant Edwards wrote:
> On 2008-01-27, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> The Dunning-Kruger effect is the phenomenon wherein people who have
>> little knowledge think that they know more than others who have much
>> more knowledge.

>> [snip]
> [snip as well]

> ... must restist ... urge... to mention... Bush...
>

Well, I think that G.W. Bush knows perfectly well that he is not really
up to the task. I still suspect that it never really was his decision to
become president, if you follow me.

/W
(What do I care, he's not my president after all ... although, in a way
... YYAAAARRRGGGGHHHH!)

John Machin

Jan 27, 2008, 5:20:03 PM

What was originally posted was:

"""
ajaksu@Belkar:~$ cat error.txt
This is not assembler...

ajaksu@Belkar:~$ ndisasm error.txt

00000000 54 push sp
00000001 686973 push word 0x7369
00000004 206973 and [bx+di+0x73],ch

[snip]
"""

Read it again -- he's "disassembled" the text "This is not
assembler..."

54 -> "T"
686973 -> "his"
206973 -> " is"

but you say "68 of 686973 is the opcode for PUSH. Go on, look it up.

The 6973 is obviously the word address, 0x7369. Or, do you think
that's coincidence?"

You are a genius of a kind encountered only very rarely. Care to share
with us your decryption of the Voynich manuscript?
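
The byte-to-character mapping above can be reproduced directly. This is a small sketch using only standard `bytes` methods. The hex chunks are the ones ndisasm printed, and the full string is the content of error.txt as shown by `cat`, assuming a trailing newline.

```python
# Hex chunks exactly as ndisasm grouped them into "instructions".
chunks = ["54", "686973", "206973"]
decoded = [bytes.fromhex(c).decode("ascii") for c in chunks]
print(decoded)

# Going the other way: the text file's bytes, shown as hex.
text = "This is not assembler...\n"
hex_view = text.encode("ascii").hex()
print(hex_view)
```

The "operand" 0x7369 in `push word 0x7369` is simply the letters "is" read as a little-endian 16-bit number.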

ajaksu

Jan 27, 2008, 5:58:23 PM

This message got huge :/

Sorry for being so cryptic and unhelpful. I now believe that you're
laboring under a (quite deep) misunderstanding and wish to make things
clear for both of us :)

What I did above was:
1- create a file called "error.txt" that contains the string "This is
not assembler..."
2- show the contents of the file ("cat" being the relevant command)
3- run the NetWideDisassembler (ndisasm) on error.txt
4- watch as it "disassembled" the text file (in fact, "assembling" the
code above reconstructs part of the string!)
5- conclude that you were misguided by this behavior of
disassemblers, for AFAIK .pyc files contain Python
"opcodes" (bytecode), that in no way I can think of could be parsed by
a generic disassembler
6- form a belief that you were trying to understand meaningless
"assembler" like the above (that would have no bearing on what Python
does!)

Now, it seems that we're in flaming mode and that is unfortunate,
because I do believe in your expertise. In part, because my father was
a systems analyst for IBM mainframes and knows (a huge) lot about
informatics. However, I've seen him, due to simple misunderstandings
like this, building a complex scenario to explain his troubles with
MSWord. I believe this is what's happening here, so I suggest that we
take a step back and stop calling names.

Given that you're in the uncomfortable place of the "troll assigned by
votes" outsider in this issue, let me expose some relevant data. The
people you're pissed off with (and vice-versa) are very competent and
knowledgeable Python (and other languages) programmers, very kind to
newcomers and notably helpful (as you might find out lurking in this
newsgroup or reading the archives). They spend time and energy helping
people to solve problems and understand the language. Seriously, they
know about assembler (a lot more than I do) and how Python works. And
they know and respect each other.

Now, your attitude and faith in your own assumptions (of which,
"the .pyc contains assembler" in particular) were both rude and upsetting.
This doesn't mean that you're not an assembler expert (I believe you
are). But it seemed like you were trying to teach us how Python works,
and that was considered offensive, especially due to your words.

OTOH, my responses were cryptic, unhelpful and smelled of "mob
thinking", while Steven D'Aprano and others showed a lot more
patience and willingness to help. So please forgive me and please PAY
ATTENTION to those trying to HELP and make things clearer to you.

As a simple example of my own Dunning-Kruger effect, I was sure I'd
get errors on trying to "assemble" the output of the disassembling,
but it does roundtrip part of the string and I was baffled. I'd guess
you know why, I have no idea. The 0x74 finding was also curious, you
are indeed getting part of the binary format of bytecode, but (AFAICT)
you won't find real assembler there.

In summary, you can show us what you know and put your knowledge
(instead of what you got wrong and how you upset people) in focus. Try
to set things right. Believe me, this here community packs an uncommon
amount of greatness and openness.

HTH,
Daniel

Bjoern Schliessmann

Jan 28, 2008, 9:35:20 AM

ov...@thepond.com wrote:
> On Sat, 26 Jan 2008 14:47:50 +0100, Bjoern Schliessmann

>> This may be true, but I think it's not bad to assume that machine

>> language and assembler are "almost the same" in this context,
>> since the translation between them is non-ambiguous (It's
>> just "recoding"; this is not the case with HLLs).
>
> I have no problem with your explanation. It's nearly impossible to
> program in machine code, which is all 1's and 0's.

Not really; it's "voltage" or "no voltage" at different signal lines
in the processor. The dual system is just one representation you
could choose. More common (and practical) are hexadecimal or octal.

> the difference is that machine code can be read directly, whereas
> assembler has to be compiled in order to convert the opcodes to
> binary data.

As I said before, IMHO this "compilation" is trivial compared to HLL
compilation, since it's just a translation from opcodes to numbers
and labels to addresses, respectively.

HLL compilers do much more; they translate high-level control
structures to low-level implementation (which is ambiguous). Often,
optimisation is employed, which may, e.g., cause a loop to be
unrolled (so that it vanishes in assembly).
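
To illustrate how mechanical the mnemonic-to-number step is, here is a toy sketch: a lookup table for a handful of one-byte 32-bit x86 instructions, usable in both directions. It is nowhere near a real assembler (no operands, labels, or multi-byte encodings), and `ENCODINGS`, `assemble` and `disassemble` are names invented for this example.

```python
# Toy table: a few one-byte IA-32 instructions that take no immediate operand.
ENCODINGS = {
    "nop": 0x90,
    "hlt": 0xF4,
    "ret": 0xC3,
    "pop ecx": 0x59,
    "dec eax": 0x48,   # 32-bit mode encoding
}
DECODINGS = {byte: text for text, byte in ENCODINGS.items()}

def assemble(lines):
    """Translate each mnemonic line to its single machine-code byte."""
    return bytes(ENCODINGS[line.strip().lower()] for line in lines)

def disassemble(code):
    """Translate each byte back to its mnemonic (toy table only)."""
    return [DECODINGS[b] for b in code]

source = ["pop ecx", "dec eax", "ret"]
machine_code = assemble(source)
print(machine_code.hex())
print(disassemble(machine_code))
```

Because the mapping is a plain table, the round trip is lossless, which is not true of the translation a HLL compiler performs.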

>> (Not really -- object code files are composed of header data and
>> different segments, data and code, and only the code segments are
>> really meaningful to the processor.)
>
> I agree that the code segments, and the data, are all that's
> meaningful to the processor. There are a few others, like
> interrupts that affect the processor directly.

Interrupts and segments are orthogonal, don't you think?

> I understand what you're saying but I'm referring to an executable
> file ready to be loaded into memory.

Obviously not, since I was referring to such a file, too. Try
reading about "real" executable formats like ELF.

> It's stored on disk in a series of 1's and 0's.

No, it's stored using a complex chain of magnetic fields. You _can_
interpret it as dual numbers, yes. But it's impractical and the
choice is up to the viewer.

> The actual file on disk is in a certain format that only the
> operating system understands. But once the code is read in, it
> goes into memory locations which hold individual arrays of bits.

I agree. (Before, you wrote differently:

> Both Linux and Windows compile down to binary files, which are
> essentially 1's and 0's arranged in codes that are meaningful to
> the processor.

E. g. the ELF header and data segments mean nothing of sense to the
processor itself.)

> That's a machine code, since starting at 00000000 to 11111111, you
> have 256 different codes available.

I'm afraid it's not that simple. IA-32 opcodes, for example, are
complex bit sequences and don't always have the same length.
Primary opcodes consist of up to three bytes in this architecture.

With some RISC CPUs, there is a machine instruction length
limitation of e. g. one word. But the IA-32 doesn't have this
limitation.

>> But you _do_ know that pyc files are Python byte code, and you
>> could only directly disassemble them to Python byte code
>> directly?
>
> that's the part I did not understand, so thanks for pointing that
> out. What I disassembled did not make sense. I was looking for
> assembler code, but I do understand a little bit about how the
> interpreter reads them.
>
> For example, from os.py, here's part of the script:
>
> # Note: more names are added to __all__ later.
> __all__ = ["altsep", "curdir", "pardir", "sep", "pathsep",
> "linesep",
> "defpath", "name", "path", "devnull"]
>
> here's the disassembly from os.pyc:

... which is completely pointless because this is no IA-32 code
segment which the processor could execute, but a custom data file
format. I'd rather try this, for example:

>>> import dis
>>> def increment(i):
...     i += 1
...     return i
...
>>> dis.dis(increment)
  2           0 LOAD_FAST                0 (i)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_FAST               0 (i)

  3          10 LOAD_FAST                0 (i)
             13 RETURN_VALUE
>>>

The Python VM, though, is stack-based, not register-based like most
CPUs. That's why the opcodes are quite different.
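
For readers on a current Python 3, a comparable experiment might look like the sketch below. It uses `dis.get_instructions` from the standard library. The exact opcode names and offsets vary between Python versions, so the output will not necessarily match the 2008 listing.

```python
import dis

def increment(i):
    i += 1
    return i

# Collect the opcode names the compiler produced for this function.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

Whatever the version, the list should show values being loaded onto a stack, combined, and stored back, with no register names in sight.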

> The script is essentially gone. I'd like to know how to read the
> pyc files, but that's getting away from my point that there is a
> link between python scripts and assembler. At this point, I admit
> the code above is NOT assembler, but sooner or later it will be
> converted to machine code by the interpreter and the OS and that
> can be disassembled as assembler.

Yes. But the interpreter doesn't convert the entire file to machine
language. It reads one instruction after another and, amongst other
things, outputs corresponding machine code which "does" what's
intended by the byte code instruction.

> Python needs an OS like Windows or Linux to interface it to the
> processor.

Not really. The CPython executable contains machine code directly
executable by the host processor. The OS just

* provides routines for accessing peripherals and allocating memory,
* makes it possible that multiple programs can run side by side,
* and loads the executable and sets it up in memory for execution.

> Yes, the source is readable like that, but the compiled binary is
> not.

For a machine, it is. The translation is 1:1, trivial.

> A disassembly shows both the source and the opcodes.

The output I posted was directly from the GNU C compiler (compiled
from an empty "main" function). I got it by using a parameter that
tells the compiler to leave out the last step of generating machine
code from assembly, and save the source.

A "disassembly" is the other way round. The hexadecimal
representation of the source in the leftmost columns is completely
redundant and practically irrelevant for a human being.

> The second column are opcodes,

Not only. It's machine code instructions, i. e. opcodes and
operands.

> and the third column are mneumonics, English words attached to the
> codes to give them meaning.

They're mn_e_monics, and they're not really english (what kind of
english words would RET, JLE or CMP be?).

> The second and third column mean the same thing.

Not at all! They're the operands and can be memory addresses,
registers or fixed values.

> A single opcode instruction like 59 = pop ecx and 48 = dec eax,
> are self-explanatory.

It's a machine instruction which consists of the opcode POP and the
operand ECX.

> The second instruction, call 1D00122A is not as straight forward.
> it is made up of two parts: E8 = the opcode for CALL and the rest
> 'D1 FF FF FF' is the opcode operator

I'm afraid not -- it's the operand.

> I would agree with what you said earlier, that there is a
> similarity between machine code and assembler.

Is there, actually? :)

> You can actually write in machine code, but it is often entered in
> hexadecimal, requiring a hex to binary interpreter.

IMHO, this makes no sense. For example, the memory contents
represented by binary 10000 and 0x10 are exactly the same. Thus, it
doesn't matter at all how you enter or view it, and it's completely
up to the user. The CPU understands both *exactly* the same way,
since they are the same: voltage levels at signal lines.
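
A tiny check of that point, using Python's integer literals: 0x10 is sixteen, which in binary is 10000 and in octal is 20. All three literals denote the same value, and only the notation differs. The variable names are illustrative.

```python
value_from_binary = 0b10000
value_from_hex = 0x10
value_from_octal = 0o20

all_equal = value_from_binary == value_from_hex == value_from_octal == 16
print(all_equal)

# The same holds for an opcode byte: 0xC3 (ret) written out as eight bits.
ret_bits = format(0xC3, "08b")
print(ret_bits)
```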

>>> You can't read a pyc file in a hex editor,
>
> if I knew what the intervening numbers meant I could. :-)

(*You* wrote the above. Please don't drop quoting headers if you
quote this deep.)

Regards,

Björn

--
BOFH excuse #11:

magnetic interference from money/credit cards

Bjoern Schliessmann

Jan 28, 2008, 10:00:25 AM

ov...@thepond.com wrote:
> Bjoern Schliessmann wrote:
>> ov...@thepond.com wrote:

>>> That's not the point, however. I'm trying to say that a
>>> processor cannot read a Python script, and since the Python
>>> interpreter as stored on disk is essentially an assembler file,
>>
>> It isn't; it's an executable.
>
> I appreciated the intelligent response I received from you
> earlier, now we're splitting hairs. :-)

Not at all. Assembly source is ASCII text. An executable commonly
consists of a binary header (which contains various information
=> man elf) as well as code and data segments. Normally, you're only
guaranteed to find machine language inside the code segments.

> Assembler, like any other higher level language

Assembler is _no_ high level language, though there are some
assembly languages striving to resemble HLLs.

http://webster.cs.ucr.edu/AsmTools/HLA/index.html

> is written as a source file and is compiled to a binary.

BMPs are binaries, too. Assembly code is compiled to object code
files.

> An executable is one form of a binary, as is a dll. When you view
> the disassembly of a binary, there is a distinct difference
> between C, C++, Delphi, Visual Basic, DOS,

I don't think so. How a HLL source is translated to machine code
depends on the compiler, and there are cross compilers.

> or even between the different file types like PE, NE, MZ, etc.

Yes.

> But they all decompile to assembler.

No. They all _contain_ code segments (which contain machine code),
but also different data.

> While they are in the binary format, they are exactly
> that...binary.

http://en.wikipedia.org/wiki/Binary_data

> Who would want to interpret a long string of 1's and 0's. Binaries
> are not stored in hexadecimal on disk nor are they in hexadecimal
> in memory. But, all the 1's and 0's are in codes when they are
> instructions or ASCII strings.

No -- they're voltages or magnetic fields. (I never saw "0"s or "1"s
in a memory chip or on a hard disk.) The representation of this
data is up to the viewing human being to choose.

> No other high level language has the one to one relationship that
> assembler has to machine code, the actual language of the
> computer.

Yes. That's why Assembly language is not "high level", but "low
level".

> All the ASCII strings end with 0x74 in the disassembly.

*sigh*

> I have noted that Python uses a newline as a line feed/carriage
> return.

(The means of line separation is not chosen just like this by Python
users. It's convention depending on the OS and the application.)

> Now I'm getting it. It could all be disassembled with a hex
> editor, but a disassembler is better for getting things in order.

Argl. A hex editor just displays a binary file as hexadecimal
numbers, it does _not_ disassemble.

"Disassembly" refers to _interpreting_ a file as machine
instructions of one particular architecture. This, of course, only
makes sense if this binary file actually contains machine
instructions that make sense, not if it's really a picture or a
sound file.

Regards,

Björn

--
BOFH excuse #130:

new management

Bruno Desthuilliers

Jan 28, 2008, 10:43:26 AM

Paul Boddie a écrit :
> On 25 Jan, 14:05, Bruno Desthuilliers <bruno.
> 42.desthuilli...@wtf.websiteburo.oops.com> wrote:
>> Christian Heimes a écrit :
>>
>>> No, that is not correct. Python code is compiled to Python byte code and
>>> execute inside a virtual machine just like Java or C#.

>> I'm surprised you've not been flamed to death by now - last time I
>> happened to write a pretty similar thing, I got a couple nut case
>> accusing me of being a liar trying to spread FUD about Java vs Python
>> respective VMs inner working, and even some usually sensible regulars
>> jumping in to label my saying as "misleading"...
>

> Well, it is important to make distinctions when people are wondering,
> "If Python is 'so slow' and yet everyone tells me that the way it is
> executed is 'just like Java', where does the difference in performance
> come from?" Your responses seemed to focus more on waving that issue
> away and leaving the whole topic in the realm of mystery. The result:
> "Python is just like Java apparently, but it's slower and I don't know
> why."

I'm afraid you didn't read the whole post :

"""
So while CPython may possibly be too slow for your application (it can
indeed be somewhat slow for some tasks), the reasons are elsewhere
(hint: how can a compiler safely optimize anything in a language so
dynamic that even the class of an object can be changed at runtime ?) ."""

I may agree this might not have been stated explicitly enough, but this
was about JIT optimizing compilers. Also, a couple posts later - FWIW,
to answer the OP's "how come it's slower if it's similar to Java"
question :

"""
Java's declarative static typing allows aggressive just-in-time
optimizations - which is not the case in Python due to its highly
dynamic nature.
"""

Bruno Desthuilliers

Jan 28, 2008, 11:03:28 AM

Jeroen Ruigrok van der Werven a écrit :

> -On [20080125 14:07], Bruno Desthuilliers (bruno.42.de...@wtf.websiteburo.oops.com) wrote:
>> I'm surprised you've not been flamed to death by now - last time I
>> happened to write a pretty similar thing, I got a couple nut case
>> accusing me of being a liar trying to spread FUD about Java vs Python
>> respective VMs inner working, and even some usually sensible regulars
>> jumping in to label my saying as "misleading"...
>
> I think your attitude in responding did not help much Bruno, if you want a
> honest answer.

Possibly, yes. Note that being personally insulted for stating
something both technically correct *and* (as is the case here) commonly
stated here doesn't help either.

Grant Edwards

Jan 28, 2008, 12:15:31 PM

>> The script is essentially gone. I'd like to know how to read
>> the pyc files, but that's getting away from my point that
>> there is a link between python scripts and assembler. At this
>> point, I admit the code above is NOT assembler, but sooner or
>> later it will be converted to machine code by the interpreter

No it won't. In any of the "normal" implementations, bytecodes
are not converted to machine code by the interpreter. Rather,
the interpreter simulates a machine that runs the byte codes.

>> and the OS and that can be disassembled as assembler.

No it can't. The result of feeding bytecodes to the VM isn't
output of machine code. It's changes in state of _data_
structures that are independent of the processor's instruction
set.
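
As a rough illustration that bytecode is just data handed to the VM, the standard dis module can list it. This is a sketch, assuming CPython 3; add_one is a made-up example function, and the exact opcode names differ between CPython versions, so no output is shown:

```python
import dis


def add_one(x):
    return x + 1


# co_code is an ordinary bytes object: input for the VM, not CPU instructions.
raw = add_one.__code__.co_code
print(type(raw), len(raw))

# The opcode names are defined by CPython, not by the processor.
names = [ins.opname for ins in dis.get_instructions(add_one)]
print(names)
```

Feeding those bytes to an x86 disassembler would produce nonsense, because they were never x86 instructions in the first place.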

> Yes. But the interpreter doesn't convert the entire file to machine
> language. It reads one instruction after another and, amongst other
> things, outputs corresponding machine code which "does" what's
> intended by the byte code instruction.

No, it doesn't output corresponding machine code (that's what
some Java JIT implementations do, but I'm not aware of any
Python implementations that do that). The virtual machine
interpreter just does the action specified by the bytecode.

--
Grant Edwards grante Yow! Nipples, dimples,
at knuckles, NICKLES,
visi.com wrinkles, pimples!!

Bjoern Schliessmann

Jan 28, 2008, 3:27:15 PM

Grant Edwards wrote:
> No, it doesn't output corresponding machine code (that's what
> some Java JIT implementations do, but I'm not aware of any
> Python implementations that do that). The virtual machine
> interpreter just does the action specified by the bytecode.

By "outputs corresponding machine code" I meant "feeds corresponding
machine code to the CPU" to make the analogy clearer. Which can
mean a function call.

Regards,

Björn

--
BOFH excuse #325:

Your processor does not develop enough heat.

Grant Edwards

Jan 28, 2008, 3:49:35 PM

On 2008-01-28, Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com> wrote:
> Grant Edwards wrote:
>> No, it doesn't output corresponding machine code (that's what
>> some Java JIT implementations do, but I'm not aware of any
>> Python implementations that do that). The virtual machine
>> interpreter just does the action specified by the bytecode.
>
> By "outputs corresponding machine code" I meant "feeds corresponding
> machine code to the CPU" to make the analogy clearer. Which can
> mean a function call.

OK, but I think you're reaching a little. :) It's pretty hard
for me to think of a program as something that's "feeding
machine code to the CPU".

In my mind, the VM is a program that's reading data from one
source (the bytecode files) and performing operations on a
second set of data (in-memory structures representing Python
objects) based on what is found in that first set of data.
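
That mental model can be sketched as a toy stack machine. This is an illustration only: the opcodes PUSH, ADD and MUL and the function run are invented here and are far simpler than CPython's real evaluation loop.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs using a value stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


# The "bytecode" is plain data; the loop above only updates the stack.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # (2 + 3) * 4 = 20
```

Nothing in run emits machine code. The only machine code involved is that of the interpreter itself, and it is the same whatever program is being interpreted.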

--
Grant Edwards grante Yow! Is it clean in other
at dimensions?
visi.com

Albert van der Horst

Feb 3, 2008, 5:57:56 PM

In article <qs0lp39tl2e7tr3lj...@4ax.com>,
<ov...@thepond.com> wrote:

<SNIP>

>
>Once a python py file is compiled into a pyc file, I can disassemble
>it into assembler. Assembler is nothing but codes, which are
>combinations of 1's and 0's. You can't read a pyc file in a hex
>editor, but you can read it in a disassembler. It doesn't make a lot
>of sense to me right now, but if I was trying to trace through it with
>a debugger, the debugger would disassemble it into assembler, not
>python.

You know that python byte code is portable across architectures.

So you are disassembling using an Intel disassembler?
How can that make sense if you are on a SUN work station with a
non-Intel processor?
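
For anyone who wants to read a pyc file the intended way, here is a hedged sketch using only the standard library. It assumes CPython 3.7 or later, where the pyc header is 16 bytes (older versions use a shorter header), and the file name hello.py is made up:

```python
import dis
import importlib.util
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("answer = 6 * 7\n")

    # Compile to a .pyc at an explicit path and read it back as raw bytes.
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    with open(pyc, "rb") as f:
        data = f.read()

# The first 4 bytes identify the bytecode version, not a CPU.
magic = data[:4]

# Assumption: 16-byte header (CPython 3.7+), then a marshalled code object.
code = marshal.loads(data[16:])

dis.dis(code)  # Python's own disassembler prints the VM instructions

namespace = {}
exec(code, namespace)  # the VM runs the code object directly
print(namespace["answer"])  # 42
```

The magic number depends on the Python version, not on the processor, which is why a pyc file can be used on an Intel box and a SPARC box alike, provided the interpreter versions match.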

Groetjes Albert

--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

thebjorn

Feb 11, 2008, 8:45:46 AM

On Jan 27, 12:23 pm, o...@thepond.com wrote:
me:

> >go troll somewhere else (you obviously don't know anything about
> >assembler and don't want to learn anything about Python).
>
> >-- bjorn
>
> before you start mouthing off, maybe you should learn assembler.

I suppose I shouldn't feed the trolls... but what the heck ;-P I
could of course try to be helpful, but I don't think I have the skillz
needed.

I might know a thing or two about assembly though, I started out on
the Commodore 64, then I wrote TSR programs (both .com and .exe ;-)
for my IBM AT, and I wrote a compiler for a scheme-like functional
language (with SML-like syntax) that targeted the Motorola 68040
(which was inside my NeXTstation...).

[snip]

> Don't fucking tell me about assembler, you asshole. I can read
> disassembled code in my sleep.

Watch the language, fucktard. Perhaps you should try _writing_
something in assembly for a change? How about linking up a "hello
world" executable? You seem too clueless to be for real though, so my
original advice stands.

-- bjorn