On 09/11/10 09:40, Aiman Kennan wrote:
> Hello,
>
> First I would like to thank you all for this wonderful program. I am new
> to it but I have some questions.
>
> 1) Can we set a limit on the number of results returned (e.g. LIMIT 200 in MySQL)?
When do you want to limit the number of records: when importing?
When analysing? One option is to use the Filter button and to filter by
sofa_id <= 200. Is that what you need?
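For anyone comparing the two approaches, here is a minimal sketch using Python's built-in sqlite3 module and a made-up table called survey. It assumes sofa_id is an auto-incrementing integer key, as in SOFA tables. The point it shows: LIMIT 200 always returns up to 200 rows, whereas a sofa_id <= 200 filter returns fewer if some rows have been deleted and the ids have gaps.

```python
import sqlite3

# In-memory database with a hypothetical table; sofa_id is an
# auto-incrementing integer primary key.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE survey (sofa_id INTEGER PRIMARY KEY AUTOINCREMENT, age INTEGER)"
)
cur.executemany(
    "INSERT INTO survey (age) VALUES (?)", [(20 + i % 50,) for i in range(1000)]
)

# Simulate deleted rows so the ids are no longer contiguous.
cur.execute("DELETE FROM survey WHERE sofa_id IN (10, 20, 30)")

# LIMIT-style: the first 200 rows, whatever their ids.
n_limit = len(
    cur.execute("SELECT * FROM survey ORDER BY sofa_id LIMIT 200").fetchall()
)

# Filter-style: only rows whose sofa_id is 200 or less.
n_filter = len(cur.execute("SELECT * FROM survey WHERE sofa_id <= 200").fetchall())

print(n_limit, n_filter)  # 200 197
con.close()
```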
> 2) Can we remove the limit of 5000 cells?
Yes - it is open source interpreted code so you can do lots of things as
required :-) On line 630 of report_table.py you can edit (carefully)
the following:
script_lst.append(u"max_cells = 5000")
NB no commas e.g. 5000 not 5,000.
NB It may take a while to display if there are lots and lots of cells.
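Why no commas? Assuming the max_cells line ends up being run as Python (as the script_lst name suggests), Python reads 5,000 as a tuple, not a number. A quick check in plain Python, no SOFA needed:

```python
# Evaluate candidate versions of the edited line in isolated namespaces.
good = {}
exec("max_cells = 5000", good)
print(good["max_cells"])  # 5000

bad = {}
exec("max_cells = 5,000", bad)
print(bad["max_cells"])  # (5, 0): a tuple, not an int

# Underscores (Python 3.6+) are a safe way to group digits.
grouped = {}
exec("max_cells = 20_000", grouped)
print(grouped["max_cells"])  # 20000
```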
All the best, Grant
> Regards,
>
> Aiman Kennan
>
That's a great idea and I have added it to the list of features I'd like
to add in a future version. The logical place to access it would
probably be the filter dialog. Implementing it would take a bit of
thought because of the likely massive performance differences between
different strategies. Making some sort of temporary table in the
database we are working with (e.g. MySQL*) seems a potentially good
strategy. It could also be possible to combine filters, e.g. I
might want to look at gender = 1 for a 5% sample. It would make sense
to apply the sampling filter first (resulting in a one-off temporary
table-making process) and then apply any filters onto that for specific
analyses, just as we would off a normal, unsampled table.
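A minimal sketch of the sample-first, filter-afterwards idea, using SQLite via Python's sqlite3 module rather than MySQL. The table and column names (survey, gender, survey_sample) are made up for illustration, and this is not how SOFA itself does it. Note that SQLite spells the random function RANDOM() where MySQL uses RAND().

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical source table with a gender column coded 1/2.
cur.execute("CREATE TABLE survey (sofa_id INTEGER PRIMARY KEY, gender INTEGER)")
cur.executemany(
    "INSERT INTO survey (gender) VALUES (?)", [(1 + i % 2,) for i in range(2000)]
)

# Step 1 (one-off): build a temporary table holding a 5% random sample.
total = cur.execute("SELECT COUNT(*) FROM survey").fetchone()[0]
sample_size = max(1, round(total * 0.05))  # 100 rows here
cur.execute(
    "CREATE TEMP TABLE survey_sample AS "
    f"SELECT * FROM survey ORDER BY RANDOM() LIMIT {int(sample_size)}"
)

# Step 2 (repeatable): apply ordinary analysis filters to the sample.
n_sample = cur.execute("SELECT COUNT(*) FROM survey_sample").fetchone()[0]
n_gender1 = cur.execute(
    "SELECT COUNT(*) FROM survey_sample WHERE gender = 1"
).fetchone()[0]
print(n_sample, n_gender1)  # 100, then a count that varies from run to run
```

ORDER BY RANDOM() has to sort the whole table once, which is one source of the performance differences mentioned above; every later filter only touches the small sample.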
All the best, Grant
* Note to self: Making a temporary table in SQLite could also work
(see http://www.sqlite.org/tempfiles.html) but SOFA would need to reset
the dbe, con, cur etc. information whenever that table was selected.
Could be trivial but it would pay to be careful.
In the meantime, the best strategy might be to make such a table
yourself and then run SOFA on top of that. It looks quite simple:
> ORDER BY RAND() combined with LIMIT is useful for selecting a random
> sample from a set of rows:
>
> mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
>     -> ORDER BY RAND() LIMIT 1000;
>
> RAND() is not meant to be a perfect random generator. It is a fast way
> to generate random numbers on demand that is portable between
> platforms for the same MySQL version.
>
> http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html
It may be possible to implement this with a number of different types of
database:
http://www.carlj.ca/2007/12/16/selecting-random-records-with-sql/
Is that useful?
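The usual random-sample idioms differ by engine: MySQL uses RAND(), SQLite and PostgreSQL use RANDOM(), and SQL Server typically uses TOP n with ORDER BY NEWID(). Here is a minimal sketch of choosing the right form per engine. The SAMPLE_TEMPLATES dict and random_sample_sql() helper are hypothetical, not part of SOFA, and only the SQLite form is actually executed here.

```python
import sqlite3

# Hypothetical mapping from database engine to a "random sample of n rows"
# query. Only the SQL idioms are real; the helper itself is illustrative.
SAMPLE_TEMPLATES = {
    "mysql": "SELECT * FROM {table} ORDER BY RAND() LIMIT {n}",
    "sqlite": "SELECT * FROM {table} ORDER BY RANDOM() LIMIT {n}",
    "postgresql": "SELECT * FROM {table} ORDER BY RANDOM() LIMIT {n}",
    "mssql": "SELECT TOP {n} * FROM {table} ORDER BY NEWID()",
}


def random_sample_sql(dbe, table, n):
    """Return SQL selecting n random rows from table for engine dbe.

    table must be a trusted, already-quoted identifier; n is coerced to int.
    """
    return SAMPLE_TEMPLATES[dbe].format(table=table, n=int(n))


# Try the SQLite version against a throwaway in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(50)])
rows = con.execute(random_sample_sql("sqlite", "t", 10)).fetchall()
print(len(rows))  # 10
print(random_sample_sql("mssql", "t", 10))
# SELECT TOP 10 * FROM t ORDER BY NEWID()
```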
In any version implemented in SOFA I would need to think very hard about
performance issues and perform lots of tests. My instinct is towards
letting the SQL database engine do the heavy lifting for which it is
optimised.
All the best, Grant
On 16/11/10 04:56, spock wrote:
Hi Charles,
The relevant line is
MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 5_000
I think it is on line 265 of conf.py in version 1.5.4.
Change the 5_000 to, e.g., 20_000.
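Editing the line by hand is simplest. If you would rather script it, here is a minimal sketch. set_max_cells() is a hypothetical helper, not part of SOFA. It assumes the line starts at the beginning of a line in conf.py and only changes the value after "else" (the non-debug limit). Back up conf.py first.

```python
import re


def set_max_cells(conf_text, new_max):
    """Return conf_text with the non-debug MAX_CELLS_IN_REPORT_TABLE value
    replaced by new_max, written with underscores (e.g. 20_000)."""
    pattern = re.compile(
        r"^(MAX_CELLS_IN_REPORT_TABLE\s*=.*?\belse\s+)[\d_]+",
        re.MULTILINE,
    )
    # format(20000, "_") gives "20_000", a valid Python int literal.
    new_text, n_subs = pattern.subn(rf"\g<1>{new_max:_}", conf_text)
    if n_subs != 1:
        raise ValueError(f"expected exactly one match, found {n_subs}")
    return new_text


line = "MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 5_000\n"
print(set_max_cells(line, 20000), end="")
# MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 20_000
```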
All the best,
Grant
--
---
You received this message because you are subscribed to the Google Groups "sofastatistics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sofastatistic...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sofastatistics/1db841bb-4256-4215-a381-6041c9d35cebn%40googlegroups.com.
I'll have to look on a Windows machine and get back to you. In
the meantime, perhaps you could use a find function to look for
'setup_sofastats.py' (a very distinctive file name). If you find
it, 'conf.py' is in the same folder.
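If the system search is awkward, a short Python script can do the same search. This is a minimal sketch; find_conf_py() is a hypothetical helper. It walks a folder tree and reports conf.py wherever it sits next to setup_sofastats.py. os.walk silently skips folders it is not allowed to read, and searching a whole drive can take a while.

```python
import os


def find_conf_py(root, marker="setup_sofastats.py"):
    """Return paths to conf.py in every folder under root that also holds
    the marker file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker in filenames and "conf.py" in filenames:
            hits.append(os.path.join(dirpath, "conf.py"))
    return hits


# Example on Windows (may be slow on a large drive):
# print(find_conf_py("C:\\"))
```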