Parameterised test with csv containing high number of records


Akshay Maldhure

Feb 3, 2020, 6:48:24 AM
to lemoncheesecake
I'm trying to run parameterised tests by feeding the test data from a csv file as per the instructions at http://docs.lemoncheesecake.io/en/latest/parametrized.html.

The problem is that my csv file has more than 200000 records and, unfortunately, I need to run my test on all of them. When I try to do that, my lcc test stalls during execution and nothing happens for a long time. I confirmed that the issue is indeed due to the high number of records: when I reduced the number of records, my test worked.

This might not be related to lcc, but could be an inherent limitation with the csv module. However, any help would be highly appreciated.

Akshay Maldhure

Feb 3, 2020, 7:05:17 AM
to lemoncheesecake
One observation: if I write a simple test that reads the same huge csv file with the csv.DictReader class and prints the contents of the first column, it works without any issues.

Could there be a problem with the way the csv.DictReader class is being used and supplied to the parameterised test?


@lcc.test("simple csv read test")
def verify_read_csv(self):
    with open(huge_csv_file_path, 'r') as csv_file:
        dict_reader = csv.DictReader(csv_file)
        for row in dict_reader:
            print(row['inventoryNumber'])

Nicolas Delon

Feb 3, 2020, 7:16:41 AM
to lemoncheesecake
Hello,

The issue does not come from the CSV reading but from lemoncheesecake itself. For instance, the following test suite will take minutes to load:
@lcc.suite("suite")
class suite:
    @lcc.test("test")
    @lcc.parametrized({"value": i} for i in range(200000))
    def test(self, value):
        lcc.log_info("value: %d" % value)


A test suite with 200,000 tests is more than huge. Are they "real" tests? I'm wondering about the testing strategy here.

Best regards.

Akshay Maldhure

Feb 3, 2020, 7:37:10 AM
to lemoncheesecake
Actually, I intend to perform some operations on each of the records from the csv more quickly by using lcc's parameterised tests with the --threads option.

I generate this csv every time from a REST API's response (this API returns more than 200000 records at once), and the API does not currently support pagination.

Akshay Maldhure

Feb 3, 2020, 7:37:56 AM
to lemoncheesecake
So if there are no plans to fix or improve this in lcc, I'll have to trim the number of rows I write to the csv down to 10000 or so.
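
As a rough illustration of that fallback, here is a minimal sketch that caps the number of rows written to the csv. The function name write_limited_csv, the sample records, and the default limit are hypothetical; only the inventoryNumber column comes from this thread.

```python
import csv
import io
from itertools import islice


def write_limited_csv(records, out_file, fieldnames, limit=10000):
    """Write at most `limit` records to out_file and return how many were written.

    `records` is any iterable of dicts (hypothetical stand-in for the API response).
    """
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    # islice stops after `limit` items without consuming the rest of the iterable.
    for record in islice(records, limit):
        writer.writerow(record)
        count += 1
    return count


# Usage with fake data: 25 records available, only 10 written.
records = ({"inventoryNumber": str(i)} for i in range(25))
out = io.StringIO()
written = write_limited_csv(records, out, ["inventoryNumber"], limit=10)
print(written)
```

In real use, out_file would be a file opened with newline='' as the csv module documentation recommends.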

Nicolas Delon

Feb 3, 2020, 7:54:54 AM
to lemoncheesecake
This seems to me a rather "devious" use of a test framework, because it is not really about tests but about task parallelism.
I would rather develop something specific using threads, asyncio, etc.
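
A minimal sketch of the thread-based approach, assuming the per-record work is I/O-bound. The function names process_record and process_csv, the worker count, and the sample data are hypothetical; only csv.DictReader and the inventoryNumber column come from this thread.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def process_record(row):
    # Hypothetical per-record operation; replace with the real work
    # (for example an HTTP call or a database lookup).
    return row["inventoryNumber"].strip().upper()


def process_csv(csv_file, max_workers=8):
    """Process every row of an open csv file in a thread pool."""
    reader = csv.DictReader(csv_file)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map submits all rows up front and yields results
        # in the same order as the input rows.
        return list(executor.map(process_record, reader))


# Usage with an in-memory csv instead of the real 200000-row file.
sample = io.StringIO("inventoryNumber\nab-1\ncd-2\n")
results = process_csv(sample)
print(results)
```

Threads mainly help when each operation waits on I/O; for CPU-bound work, a process pool would likely be a better fit.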

In lemoncheesecake, I have already seen something that really slows down project loading. If there are only a few spots to optimize, with low impact on the code, I'll try to bring some performance improvements.

Nicolas.

Akshay Maldhure

Feb 3, 2020, 8:06:36 AM
to lemoncheesecake
Well, that sounds fair enough. Thanks for your input.

Nicolas Delon

Feb 9, 2020, 4:07:45 PM
to lemoncheesecake
Hello,

I just released lemoncheesecake 1.4.2. It fixes the exponential growth in test running time when dealing with very large test suites.
However, you will probably encounter other issues related to the very large report it will generate as output.
Some hints:
  • by default, lemoncheesecake saves the report data to disk on test failures; it would be a better choice to save it only once at the end of the tests using "--save-report at_end_of_tests"
  • with a report data file (report/report.js) that will probably take hundreds of MB or even several GB, I'm not sure that a browser will be able to render the HTML report; you will probably have better luck viewing particular tests using the command "lcc report --path somesuite.sometest"
Best regards,

Nicolas.

Akshay Maldhure

Feb 10, 2020, 8:28:41 PM
to lemoncheesecake
Thanks a lot Nicolas for the updates. Noted.