Limit on SOFA


Aiman Kennan

Nov 8, 2010, 3:40:13 PM
to sofastatistics
Hello,

First I would like to thank you all for this wonderful program. I am new
to it and have some questions.

1) Can we set a limit on the number of results returned (e.g. LIMIT 200 in MySQL)?
2) Can we remove the limit of 5000 cells?

Regards,

Aiman Kennan

Grant Paton-Simpson

Nov 8, 2010, 3:57:40 PM
to sofasta...@googlegroups.com
Hi Aiman,

On 09/11/10 09:40, Aiman Kennan wrote:
> Hello,
>
> First I would like to thank you all for this wonderful program, I new
> to it but I have some questions.
>
> 1) Can we set a limit to return results (ex in mysql LIMIT 200)

When are you wanting to limit the number of records - when importing?
When analysing? One option is to use the Filter button and to filter by
sofa_id <= 200. Is that what you need?
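
A minimal sketch of why that filter behaves like LIMIT 200, assuming sofa_id is a sequential integer starting at 1 with no gaps (treat that as an assumption about your table). The table name `demo` and its `val` column are made up for illustration, and SQLite stands in for MySQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical table: sofa_id is assumed to be a gap-free sequence 1..1000
cur.execute("CREATE TABLE demo (sofa_id INTEGER PRIMARY KEY, val REAL)")
cur.executemany(
    "INSERT INTO demo (sofa_id, val) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1, 1001)],
)

# What the filter amounts to
filtered = cur.execute(
    "SELECT sofa_id FROM demo WHERE sofa_id <= 200 ORDER BY sofa_id"
).fetchall()

# What LIMIT 200 would return for the same ordering
limited = cur.execute(
    "SELECT sofa_id FROM demo ORDER BY sofa_id LIMIT 200"
).fetchall()

print(len(filtered), filtered == limited)
```

If rows have been deleted and the ids have gaps, the filter would return fewer than 200 rows, so the two are only equivalent under the assumption above.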


> 2) Can we remove the limit of 5000 cells

Yes - it is open source interpreted code so you can do lots of things as
required :-) On line 630 of report_table.py you can edit (carefully)
the following:

script_lst.append(u"max_cells = 5000")

NB no commas e.g. 5000 not 5,000.

NB It may take a while to display if there are lots and lots of cells.


All the best, Grant

Aiman Kennan

Nov 8, 2010, 4:08:01 PM
to sofasta...@googlegroups.com
Thank you Grant,

You answered my question.

Regards,

Aiman Kennan

Nov 8, 2010, 4:58:20 PM
to sofasta...@googlegroups.com
Hello Grant,

Can we host this amazing program as a web application over Apache?

thanks

Grant Paton-Simpson

Nov 8, 2010, 8:36:13 PM
to sofasta...@googlegroups.com
Hi Aiman,

Technically, lots of things are possible with the appropriate code modifications.  What sort of thing are you thinking of?  SOFA Statistics has two layers - the GUI layer and the analysis/output layer.  The GUI layer uses the wxPython desktop toolkit and would not work over the web.  But a web server could presumably run the Python analysis/output scripts.  The question would be how to let people upload their data so that the web service could analyse it.

Offering SOFA as a service over the internet is certainly something we're looking at ourselves in the longer term, although at the moment our focus is on the desktop application.

NB: SOFA Statistics is released under the AGPL3 so as far as I understand it any modified source code would also need to be made available under the AGPL3. In particular, you'd need to look at clause 13 of the AGPL, which would apply if users were interacting with SOFA remotely through a computer network.

Please feel free to ask any specific technical questions which may help you.


All the best, Grant

spock

Nov 15, 2010, 10:56:22 AM
to sofastatistics
Hi,

Sofa rocks!

Instead of a limit on import or an arbitrary report run quantity, how
feasible is it to implement a user-definable sampling percentage
(e.g. 5%, 10%, 20%...50%) that's random rather than every nth row?

So, for example, I have a 1.3m row analysis file in MySQL that I need
to run SOFA tests against. The user would have the option to run
reports/tests against either the entire file or a sample of it at
report run time.

Please advise if this is possible within the existing filtering
functionality.
Thanks

Grant Paton-Simpson

Nov 15, 2010, 12:13:05 PM
to sofasta...@googlegroups.com
Hi spock,

That's a great idea and I have added it to the list of features I'd like
to add in a future version. The logical place to access it would
probably be the filter dialog. Implementing it would take a bit of
thought because of the massive performance differences likely between
different strategies. Making some sort of temporary table seems a
potentially good strategy, possibly into the database we are working
with e.g. MySQL*. It could also be possible to combine filters e.g. I
might want to look at gender = 1 for a 5% sample. It would make sense
to apply the sampling filter first (resulting in a one off temporary
table-making process) and then apply any filters onto that for specific
analyses just as we would off a normal, unsampled table.
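
The sample-first, filter-second idea could look roughly like the sketch below. This is only an illustration under stated assumptions, not how SOFA does or will do it: the function `make_sample_table`, the table names `people` and `tmp_sample`, and the `gender` column are all hypothetical, and SQLite (with RANDOM()) stands in for the user's own database. Table names are interpolated directly, so they must come from trusted code, not user input.

```python
import sqlite3

def make_sample_table(con, src_tbl, pct, sample_tbl="tmp_sample"):
    """One-off step: copy a random pct% of src_tbl into a temporary table."""
    cur = con.cursor()
    (n_rows,) = cur.execute(f"SELECT COUNT(*) FROM {src_tbl}").fetchone()
    n_sample = max(1, round(n_rows * pct / 100))
    cur.execute(f"DROP TABLE IF EXISTS {sample_tbl}")
    # The database engine does the random ordering and the copy
    cur.execute(
        f"CREATE TEMPORARY TABLE {sample_tbl} AS "
        f"SELECT * FROM {src_tbl} ORDER BY RANDOM() LIMIT {int(n_sample)}"
    )
    con.commit()
    return n_sample

# Hypothetical demo data: 1000 rows, gender alternating between 1 and 2
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (sofa_id INTEGER PRIMARY KEY, gender INTEGER)")
con.executemany(
    "INSERT INTO people (sofa_id, gender) VALUES (?, ?)",
    [(i, 1 + i % 2) for i in range(1, 1001)],
)

n_sample = make_sample_table(con, "people", pct=5)

# Ordinary filters are then applied to the sample, exactly as for a normal table
(n_in_sample,) = con.execute("SELECT COUNT(*) FROM tmp_sample").fetchone()
(n_gender_1,) = con.execute(
    "SELECT COUNT(*) FROM tmp_sample WHERE gender = 1"
).fetchone()
print(n_sample, n_in_sample, n_gender_1)
```

The sampling cost is paid once when the temporary table is built; every later analysis only scans the much smaller sample.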


All the best, Grant

* Note to self: Making a temporary table into SQLite could also work
(see http://www.sqlite.org/tempfiles.html) but SOFA would need to reset
the dbe, con, cur etc information whenever that table was selected.
Could be trivial but it would pay to be careful.

Grant Paton-Simpson

Nov 15, 2010, 12:27:40 PM
to sofasta...@googlegroups.com
Hi spock,

In the meantime, the best strategy might be to make such a table
yourself and then run SOFA on top of that. It looks quite simple:

> ORDER BY RAND() combined with LIMIT is useful for selecting a random
> sample from a set of rows:
>
> mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
>     -> ORDER BY RAND() LIMIT 1000;
>
> RAND() is not meant to be a perfect random generator. It is a fast way
> to generate random numbers on demand that is portable between
> platforms for the same MySQL version.
>
> http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html

It may be possible to implement this with a number of different types of
database:

http://www.carlj.ca/2007/12/16/selecting-random-records-with-sql/

Is that useful?

In any version implemented in SOFA I would need to think very hard about
performance issues and perform lots of tests. My instinct is towards
letting the SQL database engine do the heavy lifting for which it is
optimised.
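
As a rough sketch of what "letting the engine do it" might involve, the random-ordering syntax differs between databases, so the query would have to be built per engine. The function name `random_sample_sql` and the engine labels below are hypothetical, not SOFA identifiers; the SQL itself uses the standard random functions of each engine (RAND() in MySQL, RANDOM() in SQLite and PostgreSQL, NEWID() with TOP in SQL Server).

```python
def random_sample_sql(dbe, tbl, n):
    """Build a query returning n randomly chosen rows from tbl.

    dbe is a hypothetical engine label; tbl must be a trusted identifier
    because it is interpolated directly into the SQL.
    """
    n = int(n)
    if dbe == "mysql":
        return f"SELECT * FROM {tbl} ORDER BY RAND() LIMIT {n}"
    if dbe in ("sqlite", "postgresql"):
        return f"SELECT * FROM {tbl} ORDER BY RANDOM() LIMIT {n}"
    if dbe == "mssql":
        return f"SELECT TOP {n} * FROM {tbl} ORDER BY NEWID()"
    raise ValueError(f"No random-sample syntax known for engine '{dbe}'")

print(random_sample_sql("mysql", "table1", 1000))
```

One caveat on performance: ordering by a random value makes the engine generate and sort a random key for every row, so on a table with over a million rows it is worth timing this before relying on it.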


All the best, Grant


Charles Botha

Jun 7, 2022, 3:58:23 AM
to sofastatistics
Hi Grant, 

Thanks for this reply. I'd also like to do this, but I am running SOFA Statistics 1.5.4, and I think this may have changed since then.

I can't seem to find that line (line 630 of report_table.py) in this version...

Thanks,
Charles.

Grant Paton-Simpson

Jun 7, 2022, 7:18:01 AM
to sofasta...@googlegroups.com

Hi Charles,

The relevant line is

MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 5_000

I think it is on line 265 of conf.py in version 1.5.4.

Change the 5_000 to a larger value, e.g. 20_000.
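
If you would rather script the change than edit by hand, here is a minimal sketch. It assumes the line in your copy of conf.py looks exactly like the one quoted above; the helper name `raise_cell_limit` is made up for illustration, and it works on the file's text so you can check the result before writing anything back.

```python
import re

def raise_cell_limit(src, new_limit):
    """Return src with the non-debug MAX_CELLS_IN_REPORT_TABLE value replaced.

    Assumes a line of the form:
        MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 5_000
    """
    pattern = re.compile(
        r"^(\s*MAX_CELLS_IN_REPORT_TABLE\s*=.*\belse\s+)\d[\d_]*",
        flags=re.MULTILINE,
    )
    # Keep the underscore digit grouping, and no commas (20_000, not 20,000)
    new_src, n_subs = pattern.subn(
        lambda m: m.group(1) + f"{new_limit:_}", src
    )
    if n_subs != 1:
        raise ValueError(f"Expected exactly one matching line, found {n_subs}")
    return new_src

line = "MAX_CELLS_IN_REPORT_TABLE = 100_000 if debug else 5_000\n"
print(raise_cell_limit(line, 20_000), end="")
```

Keep a backup of conf.py before overwriting it, and expect large tables to take longer to display.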

All the best,
Grant


Charles Botha

Jun 7, 2022, 8:15:31 AM
to sofastatistics
Hi Grant,

Thanks so much for the quick reply & the help, it's really appreciated!

I can't seem to find the file conf.py in my install (version 1.5.4 running on Windows 10 x64).

Not sure whether I'm looking in the wrong place or something...

Cheers,

Charles.

Grant Paton-Simpson

Jun 7, 2022, 3:29:24 PM
to sofasta...@googlegroups.com

I'll have to look on a Windows machine and get back to you. In the meantime, perhaps you could use a find function to look for 'setup_sofastats.py' (a very distinctive file name). If you find it, 'conf.py' is in the same folder.
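
If the file manager's search comes up empty, the same search can be sketched in a few lines of Python. The function name `find_conf_candidates` is made up, and which folder to start from is up to you; I don't know where the Windows installer puts the files.

```python
import os

def find_conf_candidates(root, marker="setup_sofastats.py"):
    """Walk root and return paths of conf.py files that sit next to marker."""
    hits = []
    # os.walk skips folders it cannot read rather than raising an error
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker in filenames and "conf.py" in filenames:
            hits.append(os.path.join(dirpath, "conf.py"))
    return hits

# Example call (the starting folder is only a guess, adjust as needed):
# print(find_conf_candidates(r"C:\Program Files (x86)"))
```

An empty result may simply mean the Python sources are not shipped as separate files in that install.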

Charles Botha

Jun 13, 2022, 8:53:53 AM
to sofastatistics
Hi Grant,

Thanks for coming back to me, much appreciated!

I had a look and unfortunately wasn't able to find that file, but no stress, it's not an urgent issue. I can wait, and will either try installing on Linux and testing there or wait for the next release.
