Weird bug (AKA undocumented feature) with file / url object passing

TL

Sep 2, 2012, 8:49:00 PM
to pyth...@googlegroups.com
For some reason, Pythonect seems to automatically invoke _.read() on file and URL objects passed through '->', without being asked to do so.

>>> print open("file.txt")
<MainProcess:Thread-1> : <open file 'file.txt', mode 'r' at 0x018D8A70>

>>> open("file.txt") -> print
<MainProcess:Thread-2> : foobarbaz

>>> print __import__("urllib2").urlopen("http://localhost/draft/")
<MainProcess:Thread-1> : <addinfourl at 33303088 whose fp = <socket._fileobject object at 0x018D8D30>>

>>> __import__("urllib2").urlopen("http://localhost/draft/") -> print
<MainProcess:Thread-2> : foobarbaz

Note: I'm running this on Windows via the hacky eval script, so this might be the cause.

TL

Sep 2, 2012, 9:48:47 PM
to pyth...@googlegroups.com
More weird behaviour... I'm assuming the output is somehow being passed through _.read().split(), causing the interpreter to handle each line with a different thread.
This is some trippy stuff, so get a cup of coffee / your favorite programming beverage before diving into it.

Test case:
www/draft/index.php
line1
line2
line3

>>> "http://localhost/draft" -> urllib2.urlopen -> print type(_)
<MainProcess:Thread-2> : <type 'str'>
<MainProcess:Thread-3> : <type 'str'>
<MainProcess:Thread-4> : <type 'str'>

>>> "http://localhost/draft" -> urllib2.urlopen -> print
<MainProcess:Thread-2> : line1

<MainProcess:Thread-3> : line2

<MainProcess:Thread-4> : line3

>>> urllib2.urlopen("http://localhost/draft") -> type
Exception in thread Thread-2:
Traceback (most recent call last):
  File "*snip*\Python27\lib\threading.py", line 552, in __bootstrap_inner
    self.run()
  File "*snip*\Python27\lib\threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build\bdist.win32\egg\pythonect\internal\eval.py", line 106, in __run
    object_or_objects = python.eval(atom, globals_, locals_)
  File "<string>", line 1
    'line1
         ^
SyntaxError: EOL while scanning string literal

Exception in thread Thread-3:
Traceback (most recent call last):
  File "*snip*\Python27\lib\threading.py", line 552, in __bootstrap_inner
    self.run()
  File "*snip*\Python27\lib\threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build\bdist.win32\egg\pythonect\internal\eval.py", line 106, in __run
    object_or_objects = python.eval(atom, globals_, locals_)
  File "<string>", line 1
    'line2
         ^
SyntaxError: EOL while scanning string literal

>>> urllib2.urlopen("http://google.com") -> print
*Same as previous.*

>>> "http://localhost/draft/" -> urllib2.urlopen(_).read
Exception in thread Thread-1:
Traceback (most recent call last):
  File "*snip*\Python27\lib\threading.py", line 552, in __bootstrap_inner
    self.run()
  File "*snip*\Python27\lib\threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build\bdist.win32\egg\pythonect\internal\eval.py", line 341, in __run
    __run(expression[1:], copy.copy(globals_), copy.copy(locals_), return_value_queue, True)
  File "build\bdist.win32\egg\pythonect\internal\eval.py", line 259, in __run
    output = output(input)
  File "*snip*\Python27\lib\socket.py", line 373, in read
    left = size - buf_len
TypeError: unsupported operand type(s) for -: 'str' and 'int'

Itzik Kotler

Sep 3, 2012, 3:48:59 AM
to pyth...@googlegroups.com
Pythonect tries to iterate over every atom in the expression by default.

Having said that, there is a "blacklist" that can be used to prevent it from iterating over specific object types. For example, a Python str is iterable but is not iterated by default.

That's why "Hello, world" -> print prints "Hello, world" and not "H", "e", "l", "l", ...

If the latter is wanted, it's simple enough to bypass the blacklist by explicitly calling iter(), i.e. iter("Hello, world") -> print will print "H", "e", "l", "l", ...

Now, it appears that a file object (as returned by open()) is iterable, and that is why the behaviour you describe occurs.

The question is whether this behaviour is wanted. If not, file will be added to the blacklist; if so, the exceptions below should be examined and fixed.

It's open for debate. Any suggestions? Should file objects be iterated by default or not?
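
The rule described above can be sketched in plain Python. This is only an illustration, not Pythonect's actual internals: `expand` and `NOT_ITERATED` are hypothetical names, the code is Python 3, and `io.StringIO` stands in for a real file.

```python
import io

# Hypothetical blacklist: iterable types that should still travel as one value.
NOT_ITERATED = (str, bytes)

def expand(value, blacklist=NOT_ITERATED):
    """Return the list of items a value would fan out into."""
    if isinstance(value, blacklist):
        return [value]
    try:
        return list(value)  # anything iterable is split into its items
    except TypeError:
        return [value]      # non-iterables are forwarded unchanged

print(expand("Hello, world"))                  # the whole string, as one item
print(expand(iter("Hi")))                      # explicit iter() bypasses the blacklist
print(expand(io.StringIO("line1\nline2\n")))   # file-like object: one item per line
```

Under this rule a file object fans out simply because nothing puts it on the blacklist, which would match the behaviour reported at the top of the thread.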

nir izraeli

Sep 3, 2012, 7:17:12 AM
to pyth...@googlegroups.com

An iterable file means it's iterable char by char. It'll allow better control for later stages: grouping all chars up to the first newline, for example, could execute before the entire file was read.
It would also ease blocking the reads when the entire file is too big to read at once.

TL

Sep 3, 2012, 8:12:29 AM
to pyth...@googlegroups.com
Wow, I never actually noticed 'file' was an iterable object.
I don't think it should be blacklisted (specifically because many modules define their own object classes and it's impossible to blacklist everything).

Itzik Kotler

Sep 3, 2012, 8:32:06 AM
to pyth...@googlegroups.com
Would you say that this behaviour should also apply to socket.socket()?

Guy Adini

Sep 3, 2012, 8:43:57 AM
to pyth...@googlegroups.com, pyth...@googlegroups.com
I really think that you want to whitelist lists and tuples.

I'll write more strongly about this when I have a keyboard :)

BTW - the file iterator iterates over lines:
for line in open(fn):
    ...
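
A short Python 3 sketch of that point, using `io.StringIO` as a stand-in for `open(fn)` (an assumption made so the example runs without a file on disk). Each item is a full line, trailing newline included, which would be consistent with the blank lines between the printed results earlier in the thread.

```python
import io

# io.StringIO stands in for open("file.txt"); both iterate line by line.
fake_file = io.StringIO("line1\nline2\nline3\n")

lines = [line for line in fake_file]
print(lines)

# Each item keeps its trailing newline, so strip it if that matters.
stripped = [line.rstrip("\n") for line in lines]
print(stripped)
```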


Sent from my iPhone

Guy Adini

Sep 3, 2012, 10:43:24 AM
to pyth...@googlegroups.com
Now that I have a keyboard, here comes the rant:

A lot of the suggestions over the last couple of days propose implicit behavior.
Implicit behavior is bad. It increases the amount of knowledge required to use a language.
It goes against PEP 20, the Zen of Python. It makes code harder to debug.

I thought of whitelisting originally, but the white list will also be pretty complex: 
list, tuple, iter(*), generator expressions, and probably other stuff.

There are two nice and explicit solutions that I can think of:
  1. Use -*> to iterate through any iterable (as TL proposed), 
  2. Blacklist anything that inherits from object. I think that should cover all of our current problem cases.

I like how people are passionate about syntax :-)

Itzik Kotler

Sep 3, 2012, 11:13:31 AM
to pyth...@googlegroups.com
'->' and '|' stand for the async data-forward operator and the sync data-forward operator, respectively.

I have 3 main comments regarding '-*>':

If we're making '-*>', we also need '*|', otherwise we're reducing the functionality (sync data-forward), and in my opinion '*|' is not very intuitive.

If we're making '-*>', we need to remove '|' (as it's now a single-thread application and there's no meaning to async/sync), and as such we are creating an inconsistency between multi-thread and single-thread applications.

If we're making '-*>', we're making it impossible to scale from single to multi-thread at runtime. Right now, if a function returns a single item or escaped multiple items, it's single-thread; if it returns multiple items, it's multi-thread.

This is because it's not up to the operator to decide. If we make it the operator's decision, we will then need a way to query or modify the operator at runtime to give the same ability.
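
To make the runtime-scaling point concrete, here is a rough Python 3 sketch in which the return value, not the operator, decides how many threads are used. `expand` and `forward` are hypothetical helpers written for this illustration; they are not taken from Pythonect.

```python
import threading

def expand(value):
    """Hypothetical rule: str stays whole, other iterables fan out."""
    if isinstance(value, str):
        return [value]
    try:
        return list(value)
    except TypeError:
        return [value]

def forward(value, func):
    """Apply func to each item of value, one thread per item."""
    items = expand(value)
    results = [None] * len(items)

    def worker(index, item):
        results[index] = func(item)

    threads = [threading.Thread(target=worker, args=(i, item))
               for i, item in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(forward(7, hex))         # one item, so one thread
print(forward(range(3), hex))  # three items, so three threads
```

The caller writes the same `forward(...)` call in both cases; only what the previous stage returned changes the degree of parallelism.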

Open for debate.

nir izraeli

Sep 3, 2012, 11:31:01 AM
to pyth...@googlegroups.com

An operator is not the way to go in either case.
It should be a wrapper function. I agree that different behavior based on type is wrong. Forcing each usage of a string object to be wrapped is also weird and will cause a lot of coding errors (since although a string is iterable, it is mostly used as a single object), which are worse than implicit behavior.

Guy Adini

Sep 3, 2012, 1:09:47 PM
to pyth...@googlegroups.com
What about the "blacklist everything that extends object"?
Will that solve our problems?

nir izraeli

Sep 3, 2012, 1:17:29 PM
to pyth...@googlegroups.com

is there anything that doesn't extend object in python? :-)

Itzik Kotler

Sep 3, 2012, 1:20:05 PM
to pyth...@googlegroups.com
Are you suggesting blacklisting all Python objects with the exception of list and tuple?

Guy Adini

Sep 3, 2012, 1:24:03 PM
to pyth...@googlegroups.com
@nir: 
Sure - there are old style classes.
But unfortunately, you're right - everything of relevance extends object (turns out
you can check it with object.__subclasses__(), which basically returns "Everything" :)).

@Itzik:
No, I'm stuck with this line of thought.
I was expecting everything that is "basic" not to be a new style class, but this isn't the case.

Intuitively, I want not to split up anything that is a "real class".
But it looks like there's no clear definition of what a real class is.


nir izraeli

Sep 3, 2012, 1:29:02 PM
to pyth...@googlegroups.com
both list and tuple objects are instances of the object class.
new style classes made everything an object, which is extremely cool :D

Itzik:
>>> isinstance(list(), object)
True
>>> isinstance(tuple(), object)
True

I suggest implementing the "intuitive" default behavior for each object type, and supporting (or even favoring) explicit declarations using either object-level operators (a different type of operator that operates on iterable objects, as opposed to creating more data-flow operators) or wrapper functions.

Itzik Kotler

Sep 3, 2012, 1:29:57 PM
to pyth...@googlegroups.com
Pythonect currently differentiates between return values (the result of a function call or expression evaluation) and literals (e.g. list, tuple) that appear in the expression.

For example, it's possible to define that file won't be iterated as a literal, but will be iterable if returned by a function.

nir izraeli

Sep 3, 2012, 1:31:12 PM
to pyth...@googlegroups.com
Won't it make everything even more complex?..

Guy Adini

Sep 3, 2012, 1:32:30 PM
to pyth...@googlegroups.com
It will, IMO.


Itzik Kotler

Sep 3, 2012, 1:41:44 PM
to pyth...@googlegroups.com
I don't know. It is currently the case with str, and no one has yet complained about the lack of consistency (why str is not iterated by default, while list is).

But the language doesn't have many users, so I can't say for sure it's acceptable to all.

Let's explore Nir's idea.

Let's say:

[1,2,3] -> print

Will print: [1,2,3]

While

*[1,2,3] -> print

Will print 1,2,3

(C-style pointer dereference syntax.)

The problem I see here is:

A function won't be able to scale on the fly (i.e. MapReduce) because it won't have any way to flag to the Pythonect interpreter that it's multi-thread-iterable.

It will either have to include a 'return *' statement, which is incompatible with Python.

Or, it will have to return the return value wrapped in a Pythonect object (i.e. PythonectIterableObject) that will be recognized by the interpreter, but then, already-existing Python code won't benefit from Pythonect.
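
The wrapper option and its drawback can be sketched in Python 3 as follows. `PythonectIterableObject` is the name floated above, but the class body, `expand`, and both example functions are assumptions made up for this illustration.

```python
class PythonectIterableObject:
    """Hypothetical marker meaning 'split my contents across threads'."""
    def __init__(self, items):
        self.items = list(items)

def expand(value):
    """Fan out only values that were explicitly wrapped."""
    if isinstance(value, PythonectIterableObject):
        return value.items
    return [value]

def legacy_function():
    # Existing Python code knows nothing about the wrapper.
    return [1, 2, 3]

def pythonect_aware_function():
    return PythonectIterableObject([1, 2, 3])

print(expand(legacy_function()))           # the list is forwarded as one value
print(expand(pythonect_aware_function()))  # the wrapped list is fanned out
```

Only code written with the wrapper in mind gets parallelised, which is the "already-existing Python code won't benefit" concern.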


nir izraeli

Sep 3, 2012, 1:46:23 PM
to pyth...@googlegroups.com

What's wrong with *(map (hex, range (...))) ?

nir izraeli

Sep 3, 2012, 1:47:57 PM
to pyth...@googlegroups.com

got it, nvm...

nir izraeli

Sep 3, 2012, 2:25:40 PM
to pyth...@googlegroups.com
What about adding a '>-' operator,
to implement a multi-threaded reduce-like function on all the inputs?
Is there a simpler way to implement it using existing features? Am I missing something?

In case I wanted to do something like:
*range(32) -> hex >- list to implement map()'s behavior in parallel
or maybe:
*range(4) -> _*2 >- reduce(pow, _) to implement reduce()'s behavior.

Itzik Kotler

Sep 3, 2012, 2:55:55 PM
to pyth...@googlegroups.com
Good point.

Right now, the language makes use of Python's __iter__() to map items to different threads/processes.

In the (near) future, there will be a need for a way to reduce the items back into a single flow.

I think '>-' will be a great candidate for a reduce syntax.

Functionality-wise, I imagine such a thing being backed by a Queue that all threads/processes will write into.

Once all the threads/processes are done, the Queue will group the return values and forward them to the next function.
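
A minimal Python 3 sketch of that Queue-backed idea, assuming threads rather than processes. `map_then_collect` is a hypothetical helper name, and tagging each result with its index is an extra assumption added here so the grouped values come out in input order.

```python
import queue
import threading

def map_then_collect(func, items):
    """Run func on each item in its own thread, then group the results."""
    results = queue.Queue()

    def worker(index, item):
        # Tag each result with its index so the input order can be restored.
        results.put((index, func(item)))

    threads = [threading.Thread(target=worker, args=(i, item))
               for i, item in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread is done

    collected = []
    while not results.empty():
        collected.append(results.get())
    return [value for _, value in sorted(collected)]

print(map_then_collect(hex, range(4)))
print(sum(map_then_collect(lambda x: x * 2, range(4))))
```

The grouped list plays the role of what '>-' would hand to the next function, for example `list` or `sum`.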

Alternatively, you can use expression substitution. i.e.: sum(`range(1,3) -> _**_`)

nir izraeli

Sep 3, 2012, 3:03:08 PM
to pyth...@googlegroups.com

just keep in mind that most of the reduction process can also run in parallel... the queue can be popped every time there're two or more items, pushing back the reduced value
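
The pop-two, push-one scheme can be sketched in Python 3 like this. It is a single-threaded simulation with a hypothetical `pairwise_reduce` helper; in a real implementation each combining step could be handed to a worker thread.

```python
from collections import deque
import operator

def pairwise_reduce(func, items):
    """Reduce by repeatedly combining two queued items into one."""
    pending = deque(items)
    if not pending:
        raise ValueError("nothing to reduce")
    while len(pending) >= 2:
        left = pending.popleft()
        right = pending.popleft()
        # In a threaded version each of these calls could run concurrently.
        pending.append(func(left, right))
    return pending.pop()

print(pairwise_reduce(operator.add, [1, 2, 3, 4, 5]))
print(pairwise_reduce(max, [3, 9, 2]))
```

Because items are combined in whatever order they sit in the queue, this only matches a left-to-right reduce() when the function is associative and commutative (add, max, and so on); something like pow would likely give different results.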

Itzik Kotler

Sep 3, 2012, 3:31:12 PM
to pyth...@googlegroups.com
True.
