iterselect

Views: 333

Paolo Valleri

Mar 26, 2015, 11:23:05 AM
to web2py-d...@googlegroups.com
Hi all, 
I've implemented an iterator version for select called iterselect.
It basically calls the new iterparse() which parses and yields one row at a time.

I've refactored and split the current parse() in order to re-use the common parts between parse() and iterparse(); the new version passes all the current tests, I'll write more later on.

let me know any comment you might have.

 Paolo

Niphlod

Mar 26, 2015, 4:38:30 PM
to web2py-d...@googlegroups.com
good until here, but........
the "real deal" of "iterselecting" is not to parse a row at a time, it's to fetch one at a time too... i.e. fetchone() instead of fetchall().
"iterselecting" should bring the possibility of fetching millions of rows while loading only one into memory at a time, in a "streaming" fashion. "Exporters" such as as_csv() could then be created and used without killing the process.

Returning an "IterRows()" is kind of needed too, because with iterselect it shouldn't be possible to fetch the "previous row" ("generator-style").
The total length of the IterRows() should be cursor.rowcount, so you can know beforehand how many items you are iterating.

Paolo Valleri

Mar 26, 2015, 5:22:08 PM
to web2py-d...@googlegroups.com
Hi niphlod,
In theory the main advantage of iterselect is its lower memory footprint: only one row at a time is parsed and, in theory, "consumed" by the application. It is a basic iterator, so in this first version, if you want to know the total length or access the previous row, you have to use the common select.

Regarding fetchone, do you know any driver that supports such a streaming operation? Otherwise I don't think that running many "select-one-record" queries instead of one "select-all-records" is always better; it is a trade-off of many factors, at least operation time, network latency, network bandwidth and so on. Instead I'd fetch a predefined number of rows, and that number should also depend on the number of fields selected. Anyway, this can be a second step.

 Paolo

--
-- mail from:GoogleGroups "web2py-developers" mailing list
make speech: web2py-d...@googlegroups.com
unsubscribe: web2py-develop...@googlegroups.com
details : http://groups.google.com/group/web2py-developers
the project: http://code.google.com/p/web2py/
official : http://www.web2py.com/
---
You received this message because you are subscribed to the Google Groups "web2py-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py-develop...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Niphlod

Mar 26, 2015, 5:33:08 PM
to web2py-d...@googlegroups.com

On Thursday, March 26, 2015 at 10:22:08 PM UTC+1, Paolo Valleri wrote:
Hi niphlod,
In theory the main advantage of iterselect is to lower the memory footprint, only one row at a time is parsed and in theory 'consumed' by the application. It is a basic iterator and so, in this first version if you want to know the total length or access previous row, then you have to use common select.

Exactly. Try to fetch 1 million rows without parsing them: it still burns an insane amount of memory.
The process - as it is right now - works more or less like this:
- ask the database a million rows
- fetchall() stores in RAM a million-long list of tuples
- parse() goes through each row and accumulates records into the Rows object <-- 2x or more memory occupied at this point
- parse() finishes, the fetchall()ed resultset is - hopefully - garbage collected
- you're still holding in memory a million-long list of Row objects, which takes 1.x the raw amount of memory (if the 1-million "raw" set occupies, e.g., 500MB of RAM, the million-long Rows object is surely in the 600-700MB range)

iterselect should, instead:
- ask the database a million rows
- yield (fetchone()) a row
- yield (parse()) a Row <<-- "fetchoned" is garbage collected
- your code does its stuff with it, like turning it into a csv "segment" and sending it to the client
- hopefully the Row is garbage collected
- the database sends the last line
and in all of this your memory stays as low as "one row" needs
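The second flow can be sketched outside pyDAL, using the stdlib sqlite3 driver and a hypothetical parse_row standing in for the real parsing step (this is an illustration of the idea, not pyDAL's actual code):

```python
import sqlite3

def parse_row(raw, colnames):
    # Hypothetical stand-in for pyDAL's per-row parsing step.
    return dict(zip(colnames, raw))

def iterselect_sketch(cursor, sql):
    """Yield one parsed row at a time; only one raw tuple is alive at once."""
    cursor.execute(sql)
    colnames = [d[0] for d in cursor.description]
    while True:
        raw = cursor.fetchone()          # one tuple from the driver
        if raw is None:                  # resultset exhausted
            break
        yield parse_row(raw, colnames)   # the previous tuple becomes collectable

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER, name TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [(i, "row%d" % i) for i in range(5)])
for row in iterselect_sketch(db.cursor(), "SELECT id, name FROM t"):
    print(row["name"])
```

At no point does a list of all rows exist: memory stays at "one row" regardless of the resultset size.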
 

Regarding fetchone, do you know any driver that support such streaming operation ?

it's on the DBAPI spec.
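Indeed, fetchone() is part of the PEP 249 cursor interface, so every conforming driver exposes it; a minimal illustration with the stdlib sqlite3 driver:

```python
import sqlite3  # any DB-API 2.0 (PEP 249) driver exposes the same cursor API

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE n (v INTEGER)")
cur.executemany("INSERT INTO n VALUES (?)", [(i,) for i in range(3)])

# fetchone() returns the next row as a tuple, or None when the set is exhausted.
cur.execute("SELECT v FROM n ORDER BY v")
values = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    values.append(row[0])
print(values)  # [0, 1, 2]
```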

Massimo DiPierro

Mar 26, 2015, 6:01:31 PM
to web2py-d...@googlegroups.com

I agree. We need this and it is not too hard.

--

Massimo DiPierro

Mar 26, 2015, 6:27:47 PM
to web2py-d...@googlegroups.com

To be more precise: it should paginate the select, loop over parse, and yield records, fetching the next page as needed.

Michele Comitini

Mar 26, 2015, 6:31:44 PM
to web2py-developers
One good step to start with would be having lazy Rows as the result of a select(). Instantiation of Row is really heavy, both memory- and cpu-wise.

Massimo DiPierro

Mar 26, 2015, 6:35:15 PM
to web2py-d...@googlegroups.com

I disagree. This will only make things slower, because if you are fetching rows you are planning to use them.

Niphlod

Mar 26, 2015, 6:37:07 PM
to web2py-d...@googlegroups.com
I meant that as a "path to follow", and surely I have a pretty large road-to-happiness on how DAL should ultimately behave :P I'm sure a few iterations on the code would finally lead to the "perfect way to handle everything".

<offtopic>
Did anyone ever try to parse a 7GB xml file in the usual way (parse())? And then find out about iterparse()? Those were happy times: 20MB of used memory instead of 9GB.
</offtopic>

It goes without saying that iterselect, being a generator, can only go "forwards": the only "nice addition" is that the length is pre-known, as the adapter is so kind as to return that information beforehand.
iterselect will enable users to do heavy stuff on dbs, at the cost of not being able to cycle the resultset backwards, fetch the previous record, or cycle the resultset multiple times.... but that goes along with big statements and with the impossibility of supporting different behaviours in the code.
In the "big data world", scenarios like filtering through records with some heavy routines in python and storing just the rows we'd like in a simple list are becoming a popular demand.... storing the list for later usage would still be the responsibility of the application code (or of "helper methods" on the "IterRows" object, but that's another point).


Of course, implementing something that fetches one million raw rows and "evaluates" each tuple into a Row object lazily, only when needed, is surely a step forward compared to instantiating a million Row (Storage()) objects beforehand.

<dreams_for_the_future>
Then, we (and I've planned it for the future) should speed up the parse() process even more - expect it a zillion times faster - because parse() is really adapter-dependent, and we have only one parse() for them all. If you profile a large query, most of the time is spent isinstance()ing the field, and in 95% of the cases it's not needed at all.
Every adapter's docs publish a columntype --> pythontype mapping that should be used as a blueprint to alter DAL's mapping of column types to python entities. Let's take psycopg2 as a reference.
Right now we assume every adapter is really dumb and we do lots of unnecessary checks. For every column. For every row.

E.g. psycopg2 for a datetime field returns a datetime object. Right now we do:
- value: is it unicode? nope!
- type: is it a custom_type ? nope!
- type: it's not a str ? nope!
- type: it's a str, let's mix and match some regexes.....is the type blablabla ?
- type: it's something we can parse with a parse_something() function!
- it's parse_datetime()!
- value: is it a datetime, really ?
- ok, let's leave it untouched
</dreams_for_the_future>
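The blueprint idea above can be sketched as a per-adapter parser registry (all names here are hypothetical, not pyDAL's actual internals): an adapter whose driver already returns native Python objects maps column types to an identity function and skips every isinstance() check, while a "dumb" adapter maps them to real converters.

```python
import datetime

identity = lambda value: value

# Hypothetical registry for a smart driver (e.g. psycopg2 already returns
# datetime.datetime for datetime columns): no conversion needed at all.
SMART_ADAPTER_PARSERS = {
    "integer":  identity,
    "datetime": identity,
}

# Hypothetical registry for a dumb driver that hands back strings.
DUMB_ADAPTER_PARSERS = {
    "integer":  int,
    "datetime": lambda v: datetime.datetime.strptime(v, "%Y-%m-%d %H:%M:%S"),
}

def parse_value(parsers, column_type, value):
    # One dict lookup per column instead of a chain of isinstance() checks.
    return parsers[column_type](value)

now = datetime.datetime(2015, 3, 26, 23, 37, 7)
assert parse_value(SMART_ADAPTER_PARSERS, "datetime", now) is now
assert parse_value(DUMB_ADAPTER_PARSERS, "datetime", "2015-03-26 23:37:07") == now
```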

Niphlod

Mar 26, 2015, 6:40:12 PM
to web2py-d...@googlegroups.com


On Thursday, March 26, 2015 at 11:27:47 PM UTC+1, Massimo Di Pierro wrote:

To be more precise. It should paginate select and loop parse and yield records. Fetch next page as needed.

Not sure I understand.... I don't want to issue 40 queries of 1k rows each for a 40k recordset. Paginating through a complex orderby is totally inefficient for lots of backends, AND, more importantly, it would break isolation.

Michele Comitini

Mar 26, 2015, 7:13:58 PM
to web2py-developers
If you make a select with executesql you go 100x faster than going through the standard DAL select().

One can do:

a_set = db(db.millionrows_table)

#fast
db.executesql(a_set.select())

#slow
a_set.select()

the profiler leaves no doubt about where the problem is: instantiation of Row




--

Michele Comitini

Mar 26, 2015, 7:14:43 PM
to web2py-developers
oops  sorry! read:

#fast
db.executesql(a_set._select())
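The profiler's point is easy to reproduce with a rough, self-contained micro-benchmark (Row here is a minimal dict-based stand-in for pyDAL's class, not the real one):

```python
import timeit

class Row(dict):
    """Minimal stand-in for pyDAL's Row: a dict with attribute access."""
    __getattr__ = dict.__getitem__

raw = [(i, "name%d" % i) for i in range(100000)]  # what the driver hands back

def as_tuples():
    # roughly what db.executesql(a_set._select()) gives you: raw tuples
    return list(raw)

def as_rows():
    # roughly what select() builds: one Row object per record
    return [Row(id=r[0], name=r[1]) for r in raw]

t_raw = timeit.timeit(as_tuples, number=5)
t_rows = timeit.timeit(as_rows, number=5)
print("raw: %.3fs  Row: %.3fs" % (t_raw, t_rows))
```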

Massimo DiPierro

Mar 26, 2015, 7:34:18 PM
to web2py-d...@googlegroups.com
I think there are two issues:
1) speed: problem is instantiation of rows
2) memory: the problem is fetching all rows at once

iterating over the results of executesql solves 2.
the only way to solve 1 is to not instantiate the rows (and we should have an option to do that).

doing it lazily only adds extra logic: if one is going to instantiate them anyway, one does not solve the speed problem. One should not fetch rows one is not planning to use, and we should not encourage that. There is value in not instantiating rows and looping over the raw data; I know people who do that for speed.

Massimo

Niphlod

Mar 26, 2015, 7:59:13 PM
to web2py-d...@googlegroups.com
uhm... we are making this very difficult to understand properly by naming things in not-so-uniquely-identifiable ways, and we're digressing a bit from the "iterselect" approach.

recap:

1) speed: instantiation of rows
2) memory: having the entire resultset in memory

iterating over the resultset fetched by the cursor (and over the single Row, and over the representation, and so on..... optionally even further, yielding the response in the wsgi interface....... until the final result is served to the client) solves 2.

pointing at "executesql" would be right if iterating through what db.executesql() returns were lazy, but it isn't: right now it fetchall()s, which translates to filling memory for no reason on a "generator-like" resultset.

---------- end_of_iterselect_closely_related_things ---------------

speed can be solved in various ways (the simplest of all being lazying some things), but it's not going to be really faster for every scenario, just skippier for some of them, e.g. Rows[3] == Rows[5]

Michele Comitini

Mar 26, 2015, 8:30:11 PM
to web2py-developers
I suppose that a Rows object with all proxy/lazy elements is way, way smaller than one full of Row instances, with a large benefit in terms of RAM.
Rows could be the equivalent of a tuple with the Row class and the list of record tuples. To avoid doubling records, one should have the record tuples as proxies to the dbapi driver record instances, but I deem this last step difficult to implement, due to DBAPI limitations.
 
<Rows> = (<Row>, <[[]]>)
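The proxy/lazy idea could look roughly like this (a hypothetical LazyRows and a minimal dict-based Row, not actual pyDAL code): the object keeps the driver's raw tuples and only builds a Row when an element is actually accessed.

```python
class Row(dict):
    """Minimal stand-in for pyDAL's Row: a dict with attribute access."""
    __getattr__ = dict.__getitem__

class LazyRows(object):
    """Hypothetical lazy Rows: holds raw tuples, builds Rows on demand."""
    def __init__(self, colnames, records):
        self.colnames = colnames
        self.records = records            # raw tuples, exactly as fetched

    def __len__(self):
        return len(self.records)          # no Row instantiated for this

    def __getitem__(self, i):
        # a Row object is created only when this element is accessed
        return Row(zip(self.colnames, self.records[i]))

rows = LazyRows(["id", "name"], [(1, "web2py"), (2, "pydal")])
assert len(rows) == 2            # no Row has been instantiated yet
assert rows[1].name == "pydal"   # a Row is created only here
```

Note the trade-off Massimo raised: if you end up accessing every element anyway, laziness buys nothing on speed; it only helps when most rows are never touched.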

Michele Comitini

Mar 26, 2015, 8:32:32 PM
to web2py-developers
I should have written:

To avoid doubling records *in RAM* one should have the record tuples as proxies to the dbapi driver record instances

Paolo Valleri

Mar 27, 2015, 3:51:31 AM
to web2py-d...@googlegroups.com
Hi all,
@michele, Row should be a basic container; we should delve into the details to understand why it is so slow according to your figures.
A first step can be a revised (and backward-compatible) version that instantiates only one Row per db_row; at the moment, for each db_row we instantiate two or more Row objects (depending on the number of tables involved).
However, if your query involves only one table, the class Rows has the method __getitem__ to translate
<Row {'t0': {'id': 1L, 'name': 'web2py'}}>
into
<Row {'id': 1L, 'name': 'web2py'}>
We could try to avoid the creation of the two Row objects. This can be a first speed up.
Mind that Rows takes a compact parameter, but parse() instantiates it as True; the user can still set it to False.
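A tiny sketch of that compact behaviour (Row here is a minimal dict stand-in and compact_getitem a hypothetical helper, not pyDAL code): with one table selected, the outer per-record Row is unwrapped so the user writes row.field instead of row.table.field.

```python
class Row(dict):
    """Minimal stand-in for pyDAL's Row: a dict with attribute access."""
    __getattr__ = dict.__getitem__

# What parse() builds for a single-table select: an outer Row per record
# wrapping one inner Row per involved table.
record = Row(t0=Row(id=1, name="web2py"))

def compact_getitem(row, compact=True):
    # With compact=True (the default) and exactly one table involved,
    # return the inner Row directly instead of the wrapper.
    keys = list(row.keys())
    if compact and len(keys) == 1:
        return row[keys[0]]
    return row

assert compact_getitem(record).name == "web2py"                     # row.field
assert compact_getitem(record, compact=False).t0.name == "web2py"   # row.table.field
```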

In addition, Row is very similar to Storage. A while ago Storage got some performance improvements; I don't think Row was involved. @Massimo, what do you think about making Row a subclass of Storage? (Mind that Storage is part of web2py.)

Regarding iterselect, I'll see how to integrate the fetchone method of the DBAPI, in this case, I don't think it is simple to cache the result.

 Paolo

Michele Comitini

Mar 27, 2015, 5:23:37 AM
to web2py-developers

I agree Row should be reimplemented as a Storage (dict) subclass if it isn't already; that would make it thinner.
Figures are easy to replicate: populate() a table with 100K rows and do a select with DAL and without.

Paolo Valleri

Mar 27, 2015, 5:41:51 AM
to web2py-d...@googlegroups.com
Ok, but Storage is defined in web2py. How should we use it in pydal?

I've updated iterselect to use fetchone; it is still based on the yield keyword. Please have a look at it / test it.

If the network latency is high, could a further option for iterselect be the use of fetchmany instead of fetchone? (https://www.python.org/dev/peps/pep-0249/#fetchmany)
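A fetchmany-based variant could look like this sketch (plain DB-API against sqlite3; the helper name and batching policy are assumptions, not the branch's actual code): rows are still yielded one at a time to the caller, but fetched from the driver in batches, trading a little memory for fewer driver round-trips.

```python
import sqlite3

def iterselect_many(cursor, sql, batch_size=100):
    """Yield rows one at a time while fetching from the driver in batches."""
    cursor.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)  # PEP 249 cursor method
        if not batch:                         # empty list: set exhausted
            break
        for raw in batch:
            yield raw

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER)")
db.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])
total = sum(row[0] for row in iterselect_many(db.cursor(), "SELECT id FROM t", 100))
print(total)
```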

 Paolo

Niphlod

Mar 27, 2015, 6:43:18 AM
to web2py-d...@googlegroups.com


On Friday, March 27, 2015 at 10:41:51 AM UTC+1, Paolo Valleri wrote:
Regarding iterselect, I'll see how to integrate the fetchone method of the DBAPI, in this case, I don't think it is simple to cache the result.

You can't cache a generator. caching an iterselect should raise an error.
 

If the network latency is high, a further option for iterselect could be the use of fetchmany instead of fetchone ?(https://www.python.org/dev/peps/pep-0249/#fetchmany).

network latency doesn't matter. the connection is always kept open.

Paolo Valleri

Mar 30, 2015, 2:25:10 AM
to web2py-d...@googlegroups.com
Has anyone tested it with a lot of rows?
@massimo, if you agree I'll post a PR after the new release

 Paolo

--

Massimo DiPierro

Mar 30, 2015, 2:28:27 AM
to web2py-d...@googlegroups.com
what would be in the PR exactly?

Paolo Valleri

Mar 30, 2015, 2:32:55 AM
to web2py-d...@googlegroups.com
Basically this branch https://github.com/ilvalle/pydal/tree/refactor-parse
It introduces iterselect() to lazy fetch (and parse) rows from the database

 Paolo

Niphlod

Mar 30, 2015, 3:14:32 PM
to web2py-d...@googlegroups.com
I'm going to test it in a little bit. Reading the diff, it seems to lack the "IterRows" discussed earlier, but let's see how it behaves.

Niphlod

Mar 30, 2015, 6:30:12 PM
to web2py-d...@googlegroups.com
usual testbed, *buntu 12.04, python 2.7.8 64 bit.
test table with 2 string fields, with random data, 655360 records, 129 MB of raw postgresql data.

for row in db(db.table.id>0).select():
    rtn = row

vs

for row in db(db.table.id>0).iterselect():
    rtn = row

speed: 56 sec "select()", 49 sec "iterselect()".
That's pretty much expected: creating half a zillion Row objects is not going to save you from any speed troubles, but we already know that's probably for another topic.
The speed difference is probably due to the fact that python has to build one more list for "select()" than for "iterselect()", and of course addressing memory (even if my rig is on the "high range" of hardware) costs time.

memory consumption: 1.6GB "select()", 89MB "iterselect()".
I'd say the path is laid down, and as it is right now it's an enormous +1, at least from my POV. Thanks to @paolo for finally taking a long-lived chance-for-improvement suggestion and making it happen.

Massimo DiPierro

Mar 30, 2015, 6:38:04 PM
to web2py-d...@googlegroups.com
OK. This makes sense. We should go for it. We can improve it later in order to work on speed.

Niphlod

Mar 30, 2015, 6:51:30 PM
to web2py-d...@googlegroups.com


On Tuesday, March 31, 2015 at 12:38:04 AM UTC+2, Massimo Di Pierro wrote:
OK. This makes sense. We should go for it. We can improve it later in order to work on speed.

IMHO, apart from marking it clearly as EXPERIMENTAL, we still need a different "resultset holder" object instead of a pure generator.

Massimo DiPierro

Mar 30, 2015, 6:52:53 PM
to web2py-d...@googlegroups.com
why not a generator? perhaps the resultset object should be iterable?

Niphlod

Mar 30, 2015, 7:04:38 PM
to web2py-d...@googlegroups.com
my main concern, although not really a limit per se, is that a generator has no length.

Working with a generator when you expect something like a resultset isn't entirely newbie-friendly.....
you're forced to do a .next() to see if there are results in it, then either save the result somewhere (or you lose it), and wrap it in a try:except in case you face StopIteration right away.... see my point?

we can have a "masking" object that just has the length (information readily available in the cursor) and that behaves as a generator when you iterate over it (and, since we're there, also handy __nonzero__ and first(); that should be perfectly doable).

Massimo DiPierro

Mar 30, 2015, 7:39:41 PM
to web2py-d...@googlegroups.com
This is easy. Why don’t we do it now? I can do it after the merge, unless you want to do it.

Paolo Valleri

Mar 31, 2015, 2:54:36 AM
to web2py-d...@googlegroups.com
Hi Niphlod, and thanks for the test.

- the parse method actually creates 2 or more Row objects for each record: 1 Row as the container + 1 Row for each involved table. However, if you select one table, only the internal Row is returned to the user and the other one is removed (unless compact is False, which is more than rare). In your test, for half a million db records you basically created 1 million Row objects (half a million Rows for nothing). Given that, I'll try to work on another patch to create only one Row object per record iff only one table is selected (if compact is set to False, a new Row will be created on the fly). This could be a speed up for both select() and iterselect().

- I wrote an iterselect that uses fetchmany(); it takes a fetchmany parameter. Would you mind testing it with different values (10, 100, 1000)? https://github.com/ilvalle/pydal/tree/refactor-parse-fmany

 Paolo

Niphlod

Mar 31, 2015, 3:36:06 AM
to web2py-d...@googlegroups.com
aside from the fact that I'm a good tester (pun intended) ....... did anyone else test it ? it's not that difficult.


PS: fetchmany exists, but what's the rationale behind fetchmany if we have fetchone ?

Leonel Câmara

Mar 31, 2015, 10:03:21 AM
to web2py-d...@googlegroups.com
Looks good to me. There's something that annoys me about _parse as I think it repeats a lot of work for every row, but it's not an iterselect/iterparser specific problem.

Paolo Valleri

Apr 4, 2015, 5:49:50 AM
to web2py-d...@googlegroups.com
Rows has the parameter 'compact'; if it's set to False, the user has to write
row.table.field
even if only one table is involved in the select.

The current implementation of iterselect doesn't have any "compact" parameter.
If only one table is involved in the iterselect, the user has to write:
row.field

Do we want to support it in iterselect? Something like
db(db.table).iterselect(compact=False) ?



 Paolo

2015-03-31 16:03 GMT+02:00 Leonel Câmara <leonel...@gmail.com>:
Looks good to me. There's something that annoys me about _parse as I think it repeats a lot of work for every row, but it's not an iterselect/iterparser specific problem.

--

Leonel Câmara

Apr 4, 2015, 6:49:57 AM
to web2py-d...@googlegroups.com
I think we will want to support more than just compact. I think we should return an IterRows object, a new class that would implement this iterator and would have many of the other Rows methods namely I would like it to have render implemented. I do think the current implementation is ok for now and we can work on its limitations in the future.

Paolo Valleri

Apr 8, 2015, 7:32:03 AM
to web2py-d...@googlegroups.com
@niphlod, I've worked on IterRows in order to get the length of the rows in the iterator.
Unfortunately, not all adapters correctly update cursor.rowcount (https://www.python.org/dev/peps/pep-0249/#rowcount).
I've tested the following drivers: sqlite3, pypyodbc, pg8000 & psycopg2. Only pg8000 & psycopg2 update cursor.rowcount; the others return -1.
Do you think it is still worth having something like IterRows?
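For reference, the behaviour is easy to probe with whatever driver is at hand; PEP 249 explicitly allows -1 when the count cannot be determined, and the stdlib sqlite3 driver does exactly that for SELECTs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT id FROM t")
# sqlite3 does not maintain rowcount for SELECTs: the length of the
# resultset is simply not known up front.
print(cur.rowcount)  # -1
```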

 Paolo

2015-04-04 12:49 GMT+02:00 Leonel Câmara <leonel...@gmail.com>:
I think we will want to support more than just compact. I think we should return an IterRows object, a new class that would implement this iterator and would have many of the other Rows methods namely I would like it to have render implemented. I do think the current implementation is ok for now and we can work on its limitations in the future.

--

Niphlod

Apr 8, 2015, 12:14:08 PM
to web2py-d...@googlegroups.com
groan....... having the length beforehand was a no-brainer implementation-wise....... still, wrapping the generator in something that doesn't raise an exception when it's empty, or being able to call, e.g., first(), will be more pleasant than a pure generator (also __nonzero__)

Paolo Valleri

Apr 10, 2015, 1:31:46 AM
to web2py-d...@googlegroups.com
@niphlod, @all, have a look at https://github.com/ilvalle/pydal/tree/iterRows
iterselect now returns an instance of IterRows. For the time being, IterRows is basically an empty class with no particular methods.
Which methods do you foresee for IterRows?
I can either publish IterRows as-is so you can directly add the methods you are interested in, or I can try to extend it as well.

 Paolo

Niphlod

Apr 10, 2015, 2:48:46 AM
to web2py-d...@googlegroups.com
unfortunately, if I had time to dedicate you'd have seen it sooner....

first()
__iter__
__getitem__
__nonzero__

Paolo Valleri

Apr 10, 2015, 3:37:53 AM
to web2py-d...@googlegroups.com
__getitem__ is not so common for an iterator. How would you implement it?
Let's take an IterRows of 100 items.
If you do IterRows[50], what happens to items 0 up to 49? Should I drop them, or do I have to save them somewhere?

Regarding first(): if I do
IterRows.first()
for e in IterRows:
   # loop
does the loop start from the first or from the second element in the list?

Regarding __nonzero__, I'll pre-fetch the first element and reuse it as the first element in the loop.

 Paolo

Leonel Câmara

Apr 10, 2015, 5:12:40 AM
to web2py-d...@googlegroups.com
I would say go ahead and publish the simplified version. I would like to add a few methods myself.

Niphlod

Apr 10, 2015, 5:17:43 AM
to web2py-d...@googlegroups.com


On Friday, April 10, 2015 at 9:37:53 AM UTC+2, Paolo Valleri wrote:
__item__ is not so common for an iterator. How should you implement it ?
lets take an IterRows of 100items.
if you do IterRows[50], what happen to the items from 0 up to 49? Should I drop them or I've to save them somewhere.

drop them. IterRows[50] returns the 50th element. IterRows[49] is doomed (and should raise an exception). IterRows[75] still works.
 

regarding first(). if I do:
IterRows.first()
for e in IterRows:
   # loop
The loop starts from the first or from the second element in the list?

from the first. When someone calls first(), that one row is stored somewhere in IterRows ("_head"?) and returned. When someone does a loop, the first element is the first fetchone()d row if there's no _head, or _head if it's already there.
 

regarding __nonzero__, I'll pre-fetch the first element, and reuse it as first element in the loop.

Nope. If someone does an iterselect and doesn't use "if resultset" there's no need to pre-fetch anything.
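Putting these rules together, a minimal IterRows sketch could look like this (hypothetical code, not the branch's actual implementation): first() caches the head without losing it for iteration, truth testing uses first(), and __getitem__ can only move forwards, "dooming" everything behind it.

```python
class IterRows(object):
    """Sketch of the discussed wrapper around a row generator:
    forward-only, with a cached head so first() and "if rows:" work."""

    def __init__(self, generator):
        self._gen = generator
        self._head = None        # row cached by first() / truth-testing
        self._next_index = 0     # index of the next row the generator yields

    def first(self):
        if self._head is None and self._next_index == 0:
            self._head = next(self._gen, None)   # fetch only the head
            self._next_index = 1
        return self._head

    def __bool__(self):          # "if resultset:" without a try/except
        return self.first() is not None
    __nonzero__ = __bool__       # python 2 spelling

    def __iter__(self):
        if self._head is not None:               # loop still starts at row 0
            head, self._head = self._head, None
            yield head
        for row in self._gen:
            self._next_index += 1
            yield row

    def __getitem__(self, i):
        if self._head is not None and i == 0:
            return self._head
        if i < self._next_index:                 # already consumed: doomed
            raise IndexError("forward-only: row %d was already consumed" % i)
        row = None
        while self._next_index <= i:
            row = next(self._gen)                # StopIteration past the end
            self._next_index += 1
        return row

rows = IterRows(iter([{"id": i} for i in range(5)]))
assert bool(rows)                  # pre-fetches only the head
assert rows.first() == {"id": 0}   # returns the cached head
assert rows[3] == {"id": 3}        # forward jump: rows 1-2 are dropped
# rows[2] would now raise IndexError: the generator cannot go backwards
```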

Paolo Valleri

Apr 10, 2015, 1:27:54 PM
to web2py-d...@googlegroups.com
https://github.com/web2py/pydal/pull/134 contains the mentioned methods. Let me know any further suggestion you might have.

 Paolo

Leonel Câmara

Apr 10, 2015, 1:49:45 PM
to web2py-d...@googlegroups.com
What's the use case for this weird __getitem__?

Niphlod

Apr 13, 2015, 3:11:38 PM
to web2py-d...@googlegroups.com
to more closely resemble a list.

Leonel Câmara

Apr 13, 2015, 4:15:00 PM
to web2py-d...@googlegroups.com
I don't think that's a good reason. The name clearly indicates it's a generator; people can just make it a list if they want to. No generator ever implements __getitem__. Even __nonzero__, which I consider useful, sort of annoys me, as it introduces the overhead of checking for the head at every step and forces the implementation of a custom __iter__.
  
The reason I wanted an IterRows class was so that it could implement some methods that are useful in Rows like render, as_dict, as_json, etc. Not to make it look like a list. Render would be a prime example of where IterRows would shine since you usually aren't interested in the original Rows if you are rendering.

BTW: the code for Rows.render is quite hard to read thanks to the stupid pep8 line-width limit. I'm all for pep8, but the 80-char line-width limit is idiotic if it makes things harder for a human to parse.

Niphlod

Apr 13, 2015, 4:33:48 PM
to web2py-d...@googlegroups.com


On Monday, April 13, 2015 at 10:15:00 PM UTC+2, Leonel Câmara wrote:
I don't think that's a good reason. The name clearly indicates it's a generator, people can just make it a list if they want to, no generator ever implements a __getitem__. Even __nonzero__ which I consider useful sort of annoys me as it introduces the overhead of checking for the head to every step and forces the implementation of a custom __iter__.  

people (me too) are used to doing

if result:
   blablabla...

with a pure generator, you'd be forced to wrap each loop in a try:except StopIteration. I don't think the "overhead" of inspecting _head is so much that you'd rather clutter your app's code... or am I missing some way to check it in python with a concise syntax?

 
  
The reason I wanted an IterRows class was so that it could implement some methods that are useful in Rows like render, as_dict, as_json, etc. Not to make it look like a list. Render would be a prime example of where IterRows would shine since you usually aren't interested in the original Rows if you are rendering.


me too. Grid's exporters would enormously benefit from iterselect, and I already made a PR that reduces the calls to the database for referenced records. That doesn't prevent anything: a generator with a __getitem__ is NOT a list, even if you can still do result[5]. The point is that IF you do result[5], you won't be able to do result[4]. But why obscure the fact that you CAN slice, with a simple object that masks the raw generator?
 
BTW: The code for Rows.render is quite hard to read thanks to the stupid pep 8 line width limit. I'm all for pep8 but the 80 char line width limit is idiotic if it's going to make things harder for a human to parse.

Preaching to the choir; I'm all for NOT following everything pep8 dictates. Still, there are users complaining about their IDE "marking code as red because it's not pep8-compatible" who send PRs that get merged...... not sure why.
It's not the first time either that I complain about unreadable code, which I like to call "let's win a contest to use the fewest LOC possible" (see parse_as_rest() as an example): that goes quite in the opposite direction, but as a general rule I prefer readable code over pep8 or "fewer LOC" any time.


Paolo Valleri

Apr 14, 2015, 3:14:45 AM
to web2py-d...@googlegroups.com
Currently IterRows contains a few methods; I've implemented the ones proposed by niphlod. I like __nonzero__. I agree that __getitem__ isn't so common, but surely niphlod has his own use cases, and it isn't a big deal to have it for someone else too.

Regarding as_list, as_dict etc., I'd wait for https://github.com/web2py/pydal/pull/129 to be merged.

Mind that the current iterselect doesn't support old-style virtual fields. Given that, if you want to use it in place of select, you have to guarantee backward compatibility. However, it isn't difficult to support old-style virtual fields; just let me know. In addition, with iterselect you aren't able to cache the query.

 Paolo



Giovanni Barillari

Apr 28, 2015, 7:18:33 AM
to web2py-d...@googlegroups.com
@paolo can you post a "status update" about this? Next steps?

/Giovanni


Paolo Valleri

Apr 28, 2015, 7:41:50 AM
to web2py-d...@googlegroups.com
The master branch contains IterRows and iterselect. Everyone can use and test them.
Possible next steps in pydal:
- introduction of as_dict, as_list, etc. (can Serializable be used?)
- use iterselect in place of select (only if cache is None)

web2py:
- use iterselect in grid/smartgrid (however, this requires adding support for old-style virtual fields to iterselect). @Massimo, what do you think? Is it worth it?

 Paolo

--