RFC: Multiple Data Stream in One Data Flow

Itzik Kotler

Sep 2, 2012, 10:07:33 AM
to pyth...@googlegroups.com
Hi all,

Taking a very simple dataflow oriented task such as: process a text file, search and replace certain patterns, save it (to the same file):

    open('foobar.txt','r') -> _.read() -> _.replace('Zoom', 'Zuboom') -> open('foobar.txt', 'w').write(_)

But what if the filename is not fixed? How will the latter open() know which file to open and write into?

I have drafted 4 implementations (some already-existing, some not) that make it work, feel free to comment, as well as to suggest new implementations:

1. Using Variables

    Use variables to store objects, so they can be accessed directly later in the flow:

    f = open('foobar.txt', 'rw') -> f.read() -> _.replace('Zoom', 'Zuboom') -> f.write(_)

    Note: This method is already working today.

    Also, to maintain brevity and readability, a new C-style pointer dereference syntax will be introduced to dereference a variable assignment:

    *[f = open('foobar.txt', 'rw')].read() -> _.replace('Zoom', 'Zuboom') -> f.write(_)
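For comparison, plain Python 3.8+ has a close analogue of this "assign and use in one expression" idea: the assignment expression (`:=`). The following is a minimal sketch, not Pythonect code. It assumes mode `'r+'` (read and write on one handle) and adds the `seek()`/`truncate()` calls that reusing a single handle requires; the temporary file is only there to make the sketch self-contained.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('Zoom in, Zoom out')

# Bind the file object to f and use it within the same expression.
data = (f := open(path, 'r+')).read().replace('Zoom', 'Zuboom')
f.seek(0)      # rewind: after read() the position is at end of file
f.write(data)
f.truncate()   # discard leftover bytes in case the new text is shorter
f.close()

with open(path) as check:
    result = check.read()
os.remove(path)
```

Without the rewind, the write would land after the original contents instead of replacing them, which is something any single-handle variant of the flow would have to account for.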

2. Using Metadata

    Metadata will contain the latest filename opened on the flow, so passing as a function argument is made possible (using Python String Formatting Syntax, or otherwise):

    glob.glob('*.TXT') -> open(_, 'r').read() -> _.replace('Zoom', 'Zuboom') -> open(%(FILENAME)s, 'w').write(_)

    Alternatively (or in addition), metadata will contain the returned file object, so accessing it directly is made possible:

    glob.glob('*.TXT') -> open(_, 'r').read() -> _.replace('Zoom', 'Zuboom') -> %(FILEOBJECT).write(_)
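The filename-in-metadata idea can be approximated in plain Python today. This is a rough sketch under stated assumptions: `metadata`, `tracked_open()` and `replace_in_file()` are hypothetical names invented for illustration, and a module-level dict stands in for whatever per-flow storage an implementation would really use.

```python
import glob
import os
import tempfile

metadata = {}  # hypothetical per-flow metadata store


def tracked_open(filename, mode='r'):
    """Open a file and remember its name as the latest one on the flow."""
    metadata['FILENAME'] = filename
    return open(filename, mode)


def replace_in_file(filename, old, new):
    with tracked_open(filename) as f:
        data = f.read().replace(old, new)
    # The write stage takes the name from metadata, not from an argument.
    with open(metadata['FILENAME'], 'w') as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmpdir:
    for name in ('a.TXT', 'b.TXT'):
        with open(os.path.join(tmpdir, name), 'w') as f:
            f.write('Zoom ' + name)

    for filename in glob.glob(os.path.join(tmpdir, '*.TXT')):
        replace_in_file(filename, 'Zoom', 'Zuboom')

    results = {}
    for name in ('a.TXT', 'b.TXT'):
        with open(os.path.join(tmpdir, name)) as f:
            results[name] = f.read()
```

Because only the latest filename is kept, a flow that opens a second file before writing would overwrite the entry, which is the implicitness trade-off of this approach.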

3. Using Stack

    A LIFO stack that will be made available per flow to store items on the flow, so accessing them later will be made possible:

    push(open('foobar.txt','rw')) -> _.read() -> _.replace('Zoom', 'Zuboom') -> pop().write(_)

    Note: This method already works today, using a list() as a stack. The only difference is introducing push() and pop() as official keywords.

4. Using Advanced Data Structures

    Wrap open() and related functions in a wrapper that returns an advanced data structure able to contain multiple data types, e.g.:


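    def FileReader(filename, modes):
        # Bundle the filename together with the file's contents.
        return {'FILENAME': filename, 'DATA': open(filename, modes).read()}


    def FileWriter(context):
        # Write the (possibly modified) data back to the original file.
        open(context['FILENAME'], 'w').write(context['DATA'])


    def ReplaceData(context, old, new):
        # 'from' is a reserved word in Python, so the parameters are named old/new.
        return dict(context, DATA=context['DATA'].replace(old, new))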


    Then:

    FileReader('foobar.txt', 'r') -> ReplaceData(_, 'Zoom', 'Zuboom') -> FileWriter
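One way to make such a context less dictionary-shaped is a small immutable record. The following is only a sketch of that variation, not part of the proposal: `FileContext`, `read_file()`, `replace_data()` and `write_file()` are hypothetical names, and plain nested function calls stand in for the `->` operator.

```python
import os
import tempfile
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileContext:
    """Carries the filename alongside the data as it moves through the flow."""
    filename: str
    data: str


def read_file(filename):
    with open(filename, 'r') as f:
        return FileContext(filename, f.read())


def replace_data(ctx, old, new):
    # Return a new context; the filename travels along unchanged.
    return replace(ctx, data=ctx.data.replace(old, new))


def write_file(ctx):
    with open(ctx.filename, 'w') as f:
        f.write(ctx.data)
    return ctx


fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('Zoom')

final = write_file(replace_data(read_file(path), 'Zoom', 'Zuboom'))

with open(path) as check:
    result = check.read()
os.remove(path)
```

Every stage receives and returns the whole context, so the last stage needs no extra arguments to find the file.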


Regards,
Itzik Kotler

nir izraeli

Sep 2, 2012, 1:31:06 PM
to pyth...@googlegroups.com
What about the following syntax:
"file.txt" -> {"FILE_HANDLE": open(_)} -> _.read() -> _.replace("a", "b") -> pop("FILE_HANDLE").write(_)

That way you get a dictionary instead of a stack, so position is not relevant, and it's very readable.
The object used as the _ for the next operation might be marked by prefixing it with an "_" or something.
Obviously the dict can be wrapped with a push() or extend() or something, and a peek function should also exist (to use an object more than once).
The syntax is more explicit than using vars (and vars are a bit strange when writing a data-flow language), and that way you can also implicitly free them.
On the other hand, using vars is simpler, so it might be preferred.

 - Nir
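A minimal plain-Python sketch of such a named store, with push, peek and pop, could look like this. The `FlowStore` class and its method names are assumptions made for illustration, and an in-memory `io.StringIO` stands in for the real file handle.

```python
import io


class FlowStore:
    """Hypothetical per-flow store of named objects."""

    def __init__(self):
        self._items = {}

    def push(self, mapping):
        """Add every name/object pair from mapping to the store."""
        self._items.update(mapping)

    def peek(self, name):
        """Return the object without removing it (for repeated use)."""
        return self._items[name]

    def pop(self, name):
        """Return the object and free its slot in the store."""
        return self._items.pop(name)

    def __contains__(self, name):
        return name in self._items


store = FlowStore()
store.push({"FILE_HANDLE": io.StringIO("a cat")})

# peek() to read, pop() for the final use so the entry is released.
data = store.peek("FILE_HANDLE").read().replace("a", "b")
handle = store.pop("FILE_HANDLE")
handle.seek(0)
handle.write(data)
result = handle.getvalue()
```

Using `pop()` for the last access is what gives the implicit freeing mentioned above: once popped, the name is gone from the store.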
 


Guy Adini

Sep 2, 2012, 2:59:27 PM
to pyth...@googlegroups.com
My 2 cents:

  1. _{1} or _<1>: I really like both.
  2. Nir's suggestion - I like the syntax a lot. I don't see why we need pop though. We could just use _{"FILE_HANDLE"} to access these string-formatting like variables, and _<j> to access the j'th underscore.
  3. Variables and dereferencing: I don't like anything that has variables in it. It doesn't feel right, at least to me.
  4. Metadata, advanced data structures: nice, but too implicit for my taste (you need to really know a lot to know what's being referenced. Might make other people's code unreadable).
    These might be nice as an addition to some other method though, IMHO.


nir izraeli

Sep 2, 2012, 3:28:43 PM
to pyth...@googlegroups.com
_{"FILE_HANDLE"} should still allow using only the "_", and then your syntax will be confused with using _ as a dict.
You could use only <> to access the older pushed variables, and forcing the pushed dict to have only string keys would let you use numbers to access by position (though that's more prone to errors, so you might not want to allow it).
So these are valid:
{"a": blah.geta()}  -> {"0": blah.get1()}
while this isn't:
{0: blah.get0()}

<"0"> will give blah.get1(),
<"a"> and <0> will both give blah.geta(), since geta() was pushed first and is in the zeroth position in the list.

 - Nir
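The lookup rule described here (string keys by name, integers by push order) can be sketched in plain Python as follows. `OrderedFlowStore` and its methods are hypothetical names, and the sketch relies on Python 3.7+ dicts preserving insertion order.

```python
class OrderedFlowStore:
    """Hypothetical store: string keys by name, integers by push position."""

    def __init__(self):
        self._items = {}  # dicts keep insertion order in Python 3.7+

    def push(self, mapping):
        # Only string keys are allowed, so integers can mean "position".
        for key in mapping:
            if not isinstance(key, str):
                raise TypeError("keys must be strings, got %r" % (key,))
        self._items.update(mapping)

    def get(self, key):
        if isinstance(key, int):
            return list(self._items.values())[key]  # access by position
        return self._items[key]                     # access by name


store = OrderedFlowStore()
store.push({"a": "result of geta()"})
store.push({"0": "result of get1()"})

by_name = store.get("a")       # pushed first, found by name
by_position = store.get(0)     # same object, found by position
by_string_zero = store.get("0")

try:
    store.push({0: "result of get0()"})
    rejected = False
except TypeError:
    rejected = True
```

The string `"0"` and the integer `0` resolve to different entries, which illustrates why positional access is the more error-prone half of the rule.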


Guy Adini

Sep 2, 2012, 3:43:56 PM
to pyth...@googlegroups.com
Sounds good to me.
Even though _{"bla"} isn't a dict syntax - but I agree that it looks too similar.


Itzik Kotler

Sep 2, 2012, 3:46:04 PM
to pyth...@googlegroups.com
Interesting.

Right now, a literal dict() is treated as a switch statement without fall-through, i.e.

1 -> {1: 'One', 2: 'Two'} -> print

will print "One".

Having said that, I have made an exception for the special case where a dict() is a flow starter, i.e.

{'foobar': 'foobar'} -> print

Will print "{'foobar': 'foobar'}".
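The two behaviours just described can be mimicked in plain Python. This is a rough sketch only: `dict_stage()` and its `starts_flow` flag are hypothetical, and raising `KeyError` on a missing key is an assumption of the sketch, not a statement about what Pythonect does.

```python
def dict_stage(mapping, incoming=None, starts_flow=False):
    """Mimic a literal dict appearing in a flow.

    At the start of a flow the dict is passed on as data; anywhere else it
    acts as a switch that selects the entry matching the incoming value.
    """
    if starts_flow:
        return mapping
    return mapping[incoming]  # assumption: an unmatched value raises KeyError


# 1 -> {1: 'One', 2: 'Two'} -> print
print(dict_stage({1: 'One', 2: 'Two'}, incoming=1))

# {'foobar': 'foobar'} -> print
print(dict_stage({'foobar': 'foobar'}, starts_flow=True))
```

The same literal therefore means two different things depending on its position in the flow, which is the ambiguity the workarounds listed further down try to avoid.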

Adjusting it to fit your example will result in:

{"FILE_HANDLE": open('file.txt')} -> _.read() -> _.replace("a", "b") -> pop("FILE_HANDLE").write(_)

But for a glob.glob() case it won't work, as the glob.glob() output will be regarded as an input for the switch.

There are a few ways around it:

1. Replace '{}' with '<>', an imaginary dict per flow (or per program).

2. Replace '{}' with '%()' syntax (e.g. "open('foobar.txt', 'r') -> %(FILEOBJECT) -> ... -> %(FILEOBJECT).close()")

3. Use functions like var() to store/get data, but that's just an alternative syntax for variable assignment.
