Unable to Import Large Decks

Netta J.

Dec 7, 2008, 3:57:24 PM
to mnemosyne-proj-users
(I have searched the discussions, but wasn't able to find what I'm
looking for. I apologize if I have overlooked something.)

I downloaded a deck of cards (10000+ I believe) and have tried to
import it into Mnemosyne, but when I click OK for it to load, the
screen freezes, forcing me to close the program and start again. The
file size is about 6.0 MB. I'm not sure if it is the file size or
something I'm not doing correctly. Is there a way to correct this
problem?

Thanks in advance.

Oisín Mac Fhearaí

Dec 7, 2008, 7:35:27 PM
to mnemosyne-...@googlegroups.com

Have you tried just waiting longer for it to complete the import?
10000 cards is quite a lot.

MacB

Dec 7, 2008, 7:40:03 PM
to mnemosyne-proj-users
Yes, I had the same problem with a deck of 7500.

It took ages to load the deck into Mnemosyne.

And the reverse: when I tried to delete the entire 7500-card deck, it
took even longer.

MacB



Peter Bienstman

Dec 8, 2008, 1:40:03 AM
to mnemosyne-...@googlegroups.com, MacB
Wait and it should finish eventually. The current algorithm is not very
efficient.

Peter
--
------------------------------------------------
Peter Bienstman
Ghent University, Dept. of Information Technology
Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
tel: +32 9 264 34 46, fax: +32 9 264 35 93
WWW: http://photonics.intec.UGent.be
email: Peter.B...@UGent.be
------------------------------------------------

duncan

Dec 9, 2008, 2:14:58 AM
to mnemosyne-proj-users


On Dec 8, 1:40 am, Peter Bienstman <Peter.Bienst...@ugent.be> wrote:
> Wait and it should finish eventually. The current algorithm is not very
> efficient.

I'll take a look at this. If it is normal for imports to stall on such
small input, I think the algorithm used must be O(n^2) or worse. 10000
cards is not a lot. Importing 10000 independent items should be
bottlenecked by the time it takes to read them from disk, i.e. it
should take a small fraction of a second, even for datasets many times
that size. Without even looking at the code I am almost sure that the
import algorithm, as it stands, must perform a scan of all previously
imported items for each imported item... that would explain the O(n^2)
behaviour. If that's not the case, something similar must be going on.
At any rate the import _should_ be linear in the number of items, but
it seems to be quadratic or worse, maybe even exponential given your
report.
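
To illustrate the pattern I suspect (a purely hypothetical sketch, not
Mnemosyne's actual code):

# Hypothetical sketch of the suspected pattern, not Mnemosyne's code.

def import_items_quadratic(new_items, existing_items):
    # Scanning the whole list of existing items for every new item:
    # n items * O(n) scan each = O(n^2) overall.
    for item in new_items:
        if item not in existing_items:  # O(n) list scan
            existing_items.append(item)

def import_items_linear(new_items, existing_items):
    # Keeping a set of keys alongside the list makes each membership
    # test O(1), so the whole import becomes O(n).
    seen = set(existing_items)
    for item in new_items:
        if item not in seen:            # O(1) hash lookup
            seen.add(item)
            existing_items.append(item)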

Never let it be said that I lit a candle when I could have cursed the
darkness; I revel in infamy. But in this case I will have a look at
the code, as I revel in correct algorithms as much as I revel in the
dark arts... It ought to be possible to import items in time linear
in the number of items to be imported. I'm all for taking certain
shortcuts, but... this is not the place to explain why Python's
deficiencies make things like O(n^2) algorithms inevitable when
naive users program in Python.

Peter Bienstman

Dec 9, 2008, 2:53:43 AM
to mnemosyne-...@googlegroups.com
A big speedup could already be gained by switching from the old XML
parser that was included with Python to one of the more modern variants
that comes with the Python library (lxml I think, but I don't remember
exactly).
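
A minimal sketch of the streaming style in question, assuming
cElementTree (in Python 2.5's standard library); the tag names below
are invented and may not match Mnemosyne's real file format:

# Sketch only: "item", "Q" and "A" are assumed tag names.
from xml.etree.cElementTree import iterparse

def read_items(path):
    items = []
    # iterparse streams the file instead of building a full DOM tree,
    # so memory stays flat and parsing is a single linear pass.
    for event, elem in iterparse(path):
        if elem.tag == "item":
            items.append((elem.findtext("Q"), elem.findtext("A")))
            elem.clear()  # free the element once its data is copied
    return items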

Patches welcome!

Peter

duncan

Dec 9, 2008, 8:00:13 AM
to mnemosyne-proj-users


On Dec 9, 2:53 am, Peter Bienstman <Peter.Bienst...@ugent.be> wrote:
> A big speedup could already be gained by switching from the old XML
> parser that was included with Python to one of the more modern variants
> that comes with the Python library (lxml I think, but I don't remember
> exactly).
>
> Patches welcome!

I'll patch this. I wouldn't have cracked on it as hard as I did if I
weren't planning on it, actually. I tend to think you should only be
really critical of (free) software you are willing to help improve. I
do have a question, though. It does sound like the increase in time
with the number of items is greater than linear, and if that's the case
there is probably only one spot that needs to be fixed. If this is
inherent in the XML parser you're using, I would consider it a bug in
that library, and I would not bother to look for a cause in your code;
I'd just switch in a better library. Do you know that your bottleneck
is in the XML parser?

Thanks
Duncan

Peter Bienstman

Dec 9, 2008, 8:12:06 AM
to mnemosyne-...@googlegroups.com
On Tuesday 09 December 2008 14:00:13 duncan wrote:

> Do you know that your bottleneck is in the XML parser?

Not really, no, although I do know that the current parser we use is the
slowest of the lot.

Cheers,

Peter

Oisín Mac Fhearaí

Dec 9, 2008, 8:44:21 AM
to mnemosyne-...@googlegroups.com
2008/12/9 duncan <dunca...@yahoo.com>:

> It does sound like the increase in time
> with the number of items is greater than linear, and if that's the case
> there is probably only one spot that needs to be fixed. If this is
> inherent in the XML parser you're using, I would consider it a bug in
> that library, and I would not bother to look for a cause in your code;
> I'd just switch in a better library. Do you know that your bottleneck
> is in the XML parser?

I think this assumption should be tested rather than... well, assumed
:) O(n^2) isn't all that bad really; in this case, it could very well
just be that the XML parsing bottleneck is a relatively large constant
factor (usually ignored in big-O notation, but it could be the limiting
factor here).

You could write a script that generates random decks of 100, 500,
1000, 5000, 25000 cards etc., and time an import for each, to get a
quick estimate of the rate at which the runtime increases, rather than
spending possibly unnecessary time looking at the algorithm first. I
would if I knew any Python. :)
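
Something along those lines might look like this (a rough sketch;
import_file() is a placeholder for whatever entry point Mnemosyne
actually exposes):

import time

def make_deck(path, n):
    # Generate a synthetic deck of n question/answer pairs.
    f = open(path, "w")
    f.write("<mnemosyne>\n")
    for i in range(n):
        f.write("<item><Q>q%d</Q><A>a%d</A></item>\n" % (i, i))
    f.write("</mnemosyne>\n")
    f.close()

for n in (100, 500, 1000, 5000, 25000):
    make_deck("deck_%d.xml" % n, n)
    start = time.time()
    import_file("deck_%d.xml" % n)  # placeholder for the real import
    print "%d cards: %.2f s" % (n, time.time() - start)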

Oisín

duncan

Dec 9, 2008, 11:52:43 AM
to mnemosyne-proj-users


On Dec 9, 8:44 am, "Oisín Mac Fhearaí" <denpasho...@gmail.com> wrote:

> I think this assumption should be tested rather than... well, assumed
> :) O(n^2) isn't all that bad really; in this case, it could very well
> just be that the XML parsing bottleneck is a relatively large constant
> factor (usually ignored in big-O notation, but it could be the limiting
> factor here).
>
> You could write a script that generates random decks of 100, 500,
> 1000, 5000, 25000 cards etc., and time an import for each, to get a
> quick estimate of the rate at which the runtime increases, rather than
> spending possibly unnecessary time looking at the algorithm first. I
> would if I knew any Python. :)


I'm torn here; you might have made a very funny deadpan joke, one that
I appreciate. Or you might have missed the point. That's the problem
with deadpan humor... it's hard to tell when people are joking ;).

At any rate (assuming that you meant it as a joke), you're correct
when you point out that it is impossible, in the general case, to
empirically determine the (big-O, say) running time of an algorithm by
observing its performance: no matter what n you choose, it is possible
that a constant factor will dominate over any finite range. But it is
often the case that you can make a good guess at what is going wrong
when what should be a fast operation is very slow. If you see that it
is fast for a small n but very slow for a somewhat larger n, the
algorithm is quite often simply not linear.

Anyway, if you're not joking, there is no reason to test the algorithm
with successively greater numbers of cards when you know that it
shouldn't take more than a second to import a million cards. If it
takes minutes to do what should happen in less than a second, you
should just go figure out why that is the case.

Oisín Mac Fhearaí

Dec 9, 2008, 12:21:20 PM
to mnemosyne-...@googlegroups.com
2008/12/9 duncan <dunca...@yahoo.com>:

> But it is often the case that you can make a good guess at what is
> going wrong when what should be a fast operation is very slow. If you
> see that it is fast for a small n but very slow for a somewhat larger
> n, the algorithm is quite often simply not linear.

This is why I suggested testing for different values of n; by
examining the rate at which the runtime increases, you can quickly get
an idea of whether it is linear or not. Informed by this heuristic,
you could decide whether to examine the constant factor first (the XML
processing library bottleneck) if growth is linear, or the rest of the
algorithm (the scheduler?) if it is not.
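
Two of those timings are already enough for a crude estimate of the
growth exponent (a sketch; a slope near 1 suggests linear, near 2
quadratic):

from math import log

def growth_exponent(n1, t1, n2, t2):
    # Fit t = c * n^k through two measurements and return k.
    return log(t2 / t1) / log(float(n2) / n1)

# e.g. if 1000 cards take 2 s and 5000 cards take 50 s:
# growth_exponent(1000, 2.0, 5000, 50.0) -> ~2.0, i.e. quadratic.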

> Anyway, if you're not joking, there is no reason to test the algorithm
> with successively greater numbers of cards when you know that it
> shouldn't take more than a second to import a million cards. If it
> takes minutes to do what should happen in less than a second you
> should just go figure out why that is the case.

I wasn't joking, and hopefully the above explanation clarifies what I
meant. If it takes minutes to do what should happen in less than a
second, yes you should go figure out why... but not spend 3 hours
looking at the scheduler if it turns out that it's caused by the XML
parsing library, nor spend 3 hours replacing the XML library with a
faster one if it turns out to be a polynomial or worse runtime in the
scheduler. If a (relatively) quick test might save you going down the
wrong path looking for optimisations, then it's worth doing.

Oisín

Łukasz Różycki

Dec 9, 2008, 2:22:56 PM
to mnemosyne-...@googlegroups.com
2008/12/9 duncan <dunca...@yahoo.com>:

> Without even looking at the code I am almost sure that the
> import algorithm, as it stands, must perform a scan of all previously
> imported items for each imported item... that would explain the O(n^2)
> behaviour. If that's not the case, something similar must be going on.

Well, maybe the option "Check for duplicates when adding new cards" is
the answer. However, it doesn't explain why deleting cards takes so
much time.

Peter Bienstman

Dec 9, 2008, 2:57:59 PM
to mnemosyne-...@googlegroups.com
On Tuesday 09 December 2008 20:22:56 Łukasz Różycki wrote:
> However, it doesn't explain why deleting cards takes so much
> time.

That is because the listview widget is so slow. Switching to model-view should
be the answer there.
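
A minimal model-view sketch (PyQt4-style, hypothetical, not
Mnemosyne's actual UI code); with a model, clearing n cards is one
reset that repaints the view once, instead of n per-item widget
removals:

from PyQt4 import QtCore

class CardListModel(QtCore.QAbstractListModel):
    def __init__(self, cards, parent=None):
        QtCore.QAbstractListModel.__init__(self, parent)
        self.cards = cards

    def rowCount(self, parent=QtCore.QModelIndex()):
        return len(self.cards)

    def data(self, index, role=QtCore.Qt.DisplayRole):
        if role == QtCore.Qt.DisplayRole:
            return QtCore.QVariant(self.cards[index.row()])
        return QtCore.QVariant()

    def remove_all(self):
        # One model reset tells every attached view to redraw once.
        self.cards = []
        self.reset()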

Peter

Hugh Chen

Dec 15, 2008, 8:14:44 PM
to mnemosyne-proj-users
I once imported a deck of 73,000 cards and it took all night, so I had
to leave my computer on the whole time.