mnemosyne research data validation


normunds

Feb 6, 2011, 10:41:44 AM
to mnemosyne-proj-users
I wonder whether a consistent interpretation of the data Mnemosyne
gathers is possible. Has anybody analysed it? What assumptions do you
make, and how do you filter "wrong data" out from the "regular" data?
Or, if you don't, how do you try to account for the presence of
unexpected use patterns?

By "wrong data" I mean recall data that does not come from "normal
use" of Mnemosyne. I feel I'm creating two kinds of "wrong data"
myself (so I've disabled the log upload :-). One kind is:
1) I take a course and use the associated wordlists, then
2) I take another course and create new wordlists.
3) Some of the words overlap. So either
a) I continue using the old lists as well, so some words get
acquired twice as quickly because they appear in two lists, or
b) I switch to the new lists completely; still, some words get
learned pretty quickly as they were already part of my known vocabulary.
4) Some time later I repeat the first course and so activate the
corresponding wordlist. Now the words that I have reinforced through
the second course stay miraculously learned after a pause of many
months; some that were not present in the second wordlist fall back to
level 1, even if I pick them up again pretty quickly afterwards.

Well, I guess this first type could be summed up as the influence of
"duplicate entries". I have a lot of them, mostly because I keep the
wordlists matching the courses.

The other type: I have struggled with inverse entries. I would prefer
to study roughly in the "Assimil" style, where you go through a first,
"passive" wave and start the second, "active" wave after you have
completed half of the course in "passive mode". Translating this to
inverse entries, I would not like to learn them until much later, in
the "second wave", maybe weeks or months after the first
introduction.

However, with Mnemosyne I do not have much choice: if I create 3-sided
cards and import the file, I get them both ways. So up till now my
method to "deal with it" has been to mark all inverse entries as "well
learned". I imagine this kind of approach definitely screws up the
statistics, and it is a bother anyway. And I have little control over
when I start learning an inverse entry, as for some time I just keep
pushing them into the future :-)

About the only alternative for me is to create two two-sided sets, and
import the "inverse set" later when I want to start using them. I'm
about to try this now.

But my question was rather about how much irregular use patterns of
this kind could invalidate the overall data.

I imagine even picking up a Mnemosyne deck again after a few years'
break can be unpredictable. I could have learned the language well in
the meantime by other means and now pull out Mnemosyne to "check
myself", or conversely I could have done nothing (maybe I have been
studying another language) and my recall is catastrophic. So what can
I infer about long-term memory if I do not know which of these
patterns has been followed? I can probably filter out all cases of
long disuse of cards and use the data sequences only while the deck is
active, that is, in at least weekly use.
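
The "at least weekly use" filter could be sketched roughly as below.
This is only an illustration: the function name, the 7-day threshold
and the sample dates are assumptions made up for the example, and the
input is a plain list of review dates, not the actual Mnemosyne log
format.

```python
from datetime import date, timedelta

def active_segments(review_days, max_gap_days=7):
    """Split review dates into runs with no gap longer than max_gap_days.

    review_days: iterable of datetime.date (hypothetical input, not the
    real Mnemosyne log format).
    """
    segments = []
    for day in sorted(review_days):
        # Extend the current run if the gap to its last review is small enough.
        if segments and day - segments[-1][-1] <= timedelta(days=max_gap_days):
            segments[-1].append(day)
        else:
            # Otherwise a long pause occurred: start a new run.
            segments.append([day])
    return segments

# Made-up sample: three reviews in January, then a break until June.
days = [date(2010, 1, 1), date(2010, 1, 4), date(2010, 1, 9),
        date(2010, 6, 1), date(2010, 6, 3)]
segments = active_segments(days)
```

Each run could then be analysed on its own, with the first review
after a long pause treated separately or dropped.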

Peter Bienstman

Feb 7, 2011, 3:45:28 AM
to mnemosyne-...@googlegroups.com
On Sunday, February 06, 2011 04:41:44 pm normunds wrote:
> I wonder whether a consistent interpretation of the data Mnemosyne
> gathers is possible. Has anybody analysed it?

Not really, Mnemosyne 2.0 is my priority now.

> What assumptions do you
> make, and how do you filter "wrong data" out from the "regular" data?
> Or, if you don't, how do you try to account for the presence of
> unexpected use patterns?

It's an enormous dataset, with many thousands of users. The assumption is that
any 'particularities' will be just noise and overshadowed by 'regular'
entries.

Cheers,

Peter

normunds

Feb 8, 2011, 5:57:37 AM
to mnemosyne-proj-users
OK, that could of course be the assumption. I more or less expected
that. Anybody who uses Mnemosyne regularly, adds all the words he is
using, and has only some small random influence on particular items
from other methods is "regular", and everybody else creates
noise...

I just thought it's pretty hard to assess the noise level, as checking
on myself I found that I alone am creating 3 types of "noise". But
then again, it is probably possible to identify what kind of noise
is generated by a particular "misuse" pattern and try to assess its
level or filter it out. And datasets with big interruptions can be
included or excluded to see how they modify the picture. Good luck,
and I hope this data plan brings some interesting results.
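
Including or excluding the interrupted data could look roughly like
this. It is a minimal sketch under stated assumptions: each review is
a hypothetical (interval_days, recalled) pair, the 365-day cutoff is
arbitrary, and the sample numbers are invented purely to show the
mechanics.

```python
def retention_rate(reviews):
    """Fraction of reviews where the item was recalled, or None if empty."""
    if not reviews:
        return None
    return sum(1 for _, recalled in reviews if recalled) / len(reviews)

def compare_with_and_without_breaks(reviews, max_interval_days=365):
    """Return (rate over all reviews, rate without long-interval reviews)."""
    kept = [r for r in reviews if r[0] <= max_interval_days]
    return retention_rate(reviews), retention_rate(kept)

# Invented sample: (days since the previous review of the card, recalled?)
sample = [(3, True), (10, True), (30, False), (700, False), (900, True)]
all_rate, filtered_rate = compare_with_and_without_breaks(sample)
```

A large difference between the two rates would suggest that the long
interruptions are more than just noise for that dataset.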

In fact I think that information about an item being a duplicate of
other items could be included in the upload as well. Of course there
are still ways to introduce them without getting "caught"... I keep
different courses in different databases, so duplicates would never
show up as such :-/
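
Spotting such duplicates across separate wordlists could be sketched
as follows. This assumes the lists are available as plain question
strings keyed by a deck name; the names, the normalization rule and
the sample words are all made up for the example and do not reflect
how Mnemosyne stores or uploads cards.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse whitespace so trivially different entries match."""
    return " ".join(text.casefold().split())

def find_duplicates(decks):
    """Map each question found in more than one deck to the decks holding it.

    decks: dict of deck name -> list of question strings (hypothetical layout).
    """
    seen = defaultdict(set)
    for deck_name, questions in decks.items():
        for question in questions:
            seen[normalize(question)].add(deck_name)
    return {q: sorted(names) for q, names in seen.items() if len(names) > 1}

# Made-up sample wordlists for two courses.
decks = {
    "course_1": ["das Haus", "der Hund"],
    "course_2": ["Das  Haus", "die Katze"],
}
duplicates = find_duplicates(decks)
```

Items flagged this way could be marked in the data or left out of an
analysis, though it only catches entries whose text matches after
normalization.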