Problem with lists


theiostream

Oct 31, 2011, 10:43:17 PM
to web.py
Hi,

After messing around with the lists I get from a db.select, I keep
getting an exception saying "already passed [depending-int]" every time
I try to compare against the result in an if statement.

Taking a quick look at web.py's code, I can see the exception is thrown
by the framework itself, although I don't really understand the reason
for it. Is there a fix?

This is my code:


db = web.database(dbn='postgres', user='postgres', pw='', db='mydb')

def GET(self, person):
    global db
    p = db.select(person)

    i = p[len(p)-1].rname
    if i == None: return 'Error.'

    return p[len(p)-1].rname

andrei

Nov 6, 2011, 6:22:50 AM
to web.py
db.select returns web.py's flavored iterator; to get a list from it,
wrap the result in list():

p = list(db.select(person))
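
Applied to the original handler, the fix might look something like this
(a sketch; `person` is assumed to be a valid table name, as in the
original post):

p = list(db.select(person))
last = p[-1]  # p is now a plain list, so indexing works
if last.rname is None:
    return 'Error.'
return last.rname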

Anand Chitipothu

Nov 6, 2011, 12:19:33 PM
to we...@googlegroups.com
2011/11/6 andrei <andr...@gmail.com>:

> db.select returns web.py's flavored iterator; to get a list from it,
> wrap the result in list():
>
> p = list(db.select(person))

You can even do this:

p = db.select(person).list()

I find it more convenient than wrapping it in list(...).
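
(Under the hood, db.select returns web.py's IterBetter wrapper; list()
is a method on that wrapper that consumes the iterator into a regular
list.)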

Anand

Dragan Espenschied

Nov 6, 2011, 3:47:55 PM
to we...@googlegroups.com
> db.select returns web.py's flavored iterator,

By the way, what is the background of this feature?
I do not understand what the advantages of these iterators are over lists.
Could somebody knowledgeable explain?

I think I am missing out on something. :)

Thanks,
Dragan

Justin Davis

Nov 10, 2011, 12:39:05 AM
to web.py
The rationale is to use the results from the database cursor directly
and incrementally, so you don't need to load all the results into
memory before acting on them. For instance, let's say you have a table
with 100,000 rows in it, and you need to return all of the results as
JSON in a single call (a contrived example, but bear with me). Let's
say your handler did this:

def GET(self):
    web.header('Content', 'text/json')
    rows = db.select('foo').list()
    return json.dumps(rows)

What would happen is this: web.py's database wrapper would query your
database for all the rows and load them all into memory, very likely
causing your program to run out of memory. On Linux, it would probably
be killed by the kernel's OOM killer after thrashing around in your
swap space for a while (I don't really know how other systems would
behave). And even if a table of 100,000 rows doesn't do that, there is
some table size at which this will run out of memory.

If instead we use the database cursor to read rows one at a time and
feed them back to the client as soon as we read them, we only hold one
row in memory at a time. Consider this handler:

def GET(self):
    web.header('Content', 'text/json')
    rows = db.select('foo')
    yield '['
    for row in rows:
        yield json.dumps(row) + ', '
    yield ']'

(Yes, it's not technically correct JSON due to the trailing comma in
the output, but let's pretend I handle that...) Now for each row in the
table 'foo', we read in a row, dump it to JSON, and flush it. We never
load the entire database table, so it uses substantially less memory in
the Python process.

And that's pretty much the main reason the database layer returns an
iterator. There are some other, less obvious reasons -- for instance,
by reading in all the results at once you put a brief but substantial
load on the database server, which probably isn't necessary, since it
can likely serve query results faster than Python can process them.

Is it worth it? Frankly, I don't think it should be the default setting
in web.py, because of the confusion that springs up as a result. It's
immediately confusing to new users and only really makes sense for
advanced users familiar with databases and dealing with heavy load.

Hope that helps!
Justin

Dragan Espenschied

Nov 11, 2011, 2:11:44 PM
to we...@googlegroups.com
Thank you very much Justin, now I understand it. It is indeed a good
solution to a real problem.

However, I think most websites do "paged" database queries anyway, because too
much database thrashing also leads to too much browser thrashing :)
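
For what it's worth, a paged query with web.py's db.select might look
something like this -- a sketch, with the table name, ordering column,
and page-parameter handling all made up:

page = int(web.input(page=0).page)  # hypothetical ?page=N query parameter
page_size = 20
rows = db.select('foo', order='id',
                 limit=page_size, offset=page * page_size).list()

Only one page of rows is pulled per request, so consuming the iterator
with .list() is harmless here.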

You are right, this default behaviour caused some confusion for me,
especially when I needed to use a record twice inside one page. And I
think that case is more likely than somebody querying for a million
records and expecting it to work.

Bests,
Dragan

--
http://noobz.cc/
http://digitalfolklore.org/
http://contemporary-home-computing.org/1tb/

theiostream

Nov 23, 2011, 5:30:11 PM
to web.py
Thanks, I ended up finding out more about how iterators behave and why
they're there during my absence here. Thanks for the support anyway!

~ theiostream.


Andre Smit

Jan 6, 2012, 4:12:32 PM
to we...@googlegroups.com
This has me stumped. I'm trying to apply the function as posted while
getting rid of the trailing comma, to ensure valid JSON. The following
doesn't work:
 
def GET(self):
    web.header('Content', 'text/json')
    rows = db.select('foo')
    lrows = len(rows)
    yield '['
    for i, row in enumerate(rows):
        if i < lrows:
            yield json.dumps(row) + ', '
        else:
            yield json.dumps(row)
    yield ']'
Any suggestions on how to handle the trailing comma?
TIA
 

Tom

Jan 7, 2012, 3:21:10 AM
to web.py
Should it be

if i < lrows - 1 ?

[1, 2, 3] has a length of 3, but the last comma would come after i = 1,
since counting starts at 0.

-tom

Branko Vukelic

Jan 7, 2012, 10:06:45 AM
to we...@googlegroups.com
On Fri, 2012-01-06 at 13:12 -0800, Andre Smit wrote:
> This has me stumped. I'm trying to apply the function as posted while
> getting rid of the trailing comma, to ensure valid JSON. The following
> doesn't work:
>
> def GET(self):
>     web.header('Content', 'text/json')
>     rows = db.select('foo')
>     lrows = len(rows)
>     yield '['
>     for i, row in enumerate(rows):
>         if i < lrows:
>             yield json.dumps(row) + ', '
>         else:
>             yield json.dumps(row)
>     yield ']'

Try this:

# I think this is the correct header
web.header('Content-type', 'application/json')

# First convert rows into a list of JSON dumps
json_rows = [json.dumps(row) for row in db.select('foo')]

# Then just join them.
return '[%s]' % ', '.join(json_rows)


Branko Vukelic

Jan 7, 2012, 10:11:46 AM
to we...@googlegroups.com
On Sat, 2012-01-07 at 16:06 +0100, Branko Vukelic wrote:
> Try this:
>
> # I think this is the correct header
> web.header('Content-type', 'application/json')
>
> # First convert rows into a list of JSON dumps
> json_rows = [json.dumps(row) for row in db.select('foo')]
>
> # Then just join them.
> return '[%s]' % ', '.join(json_rows)

Have you tried dumping the whole list? Does it work? Like this:

# Dumping a list of rows as JSON
return json.dumps([row for row in db.select('foo')])

Let me know if this works, I'm very curious.

Andre Smit

Jan 10, 2012, 10:18:27 AM
to we...@googlegroups.com
I had difficulty debugging my CGI script, so I used logging, which
indicated: IterBetter instance has no attribute '__len__'

Suggested fixes are:

len(list(rows))

or

d = db.query("SELECT COUNT(*) AS count FROM mytable")
print d[0].count

both of which IMHO defeat the benefit of IterBetter. My hack was to
include dummy data as the last JSON element, as in:

class HMA:
    def GET(self):
        rows = db.select('TX_CIT_HMA', order='SM_HMA')
        web.header('Content-Type', 'application/json')
        yield '{identifier: "SM_HMA", items: ['

        for row in rows:
            yield json.dumps(row) + ','

        yield '{"SM_HMA": "None", "TX_HMA": "None"}]}'

Not the best fix, I admit.

Andre Smit

Jan 10, 2012, 10:19:28 AM
to we...@googlegroups.com
Yes, this will work, but it defeats the benefit of IterBetter and yield.

Sasha Hart

Jan 10, 2012, 10:46:30 AM
to we...@googlegroups.com
list(rows) forcibly consumes the iterator and loads all the data into
memory at once, which also defeats the purpose of having an iterator.
If you don't care about query time, getting the count from the db will
work. Otherwise, my previous suggestion on how to deal with the
trailing comma without len should work - you lose the trailing comma
without reading everything into memory or doing an additional query.
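
For reference, one way to do that -- a sketch, not necessarily the
exact code from that earlier suggestion -- is to emit the separator
before each row except the first, so no row count (and no len()) is
needed:

def GET(self):
    web.header('Content-Type', 'application/json')
    rows = db.select('foo')  # table name assumed, as in the earlier examples
    yield '['
    for i, row in enumerate(rows):
        # Yield the comma before each row after the first: the output
        # has no trailing comma and stays valid JSON, while rows are
        # still streamed one at a time.
        if i > 0:
            yield ', '
        yield json.dumps(row)
    yield ']'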

On Tue, Jan 10, 2012 at 9:19 AM, Andre Smit <freev...@gmail.com> wrote:
> Yes, this will work, but it defeats the benefit of IterBetter and yield.

