Python and Excel

116 views
Skip to first unread message

Mike Barry

unread,
Nov 23, 2022, 7:09:42 AM11/23/22
to Excel-DNA
Hello, I am new to the group and an experienced user of direct to the customer RAW DNA results and |I have also used Python as an extension of my work with DNA profiles, Missing parent phasing, data cleaning, Whole family Phasing, Kit optimisations against other family members, Template re alignment for different company and generation results.  I am now looking into making a template for a super kit, using various generation results from 23andme , Ancestry and FTDNA, and as most will know the approx 1.1M row limitation is the main limitation given that each kit can have between 650K row to 900K+ rows, so a chromosome by chromosome break down seems to be the only solution at present. I have also sent a request to Microsoft for a method to Taylor the excel sheet size so that it would be possible to trade columns for rows so that a long thin sheet could be created and saved.  At present I do not use VBA or C+ but have learned Python and made some DNA tools which are graphic screens / sub windows with function buttons so the user can control the process step by step from choosing the sub director to load a .cvs file from to processing like phasing and finally to saving a ,csv file naming it and putting in a sub director of my choice.   Is there any scope to incorporate Python into this project or could it be added?  I am also struggling at the moment to get an over view of what a Excel-DNA sheet might be capable of, and if it would be more efficient use of my time, rather than making more Python tools?

Kedar Kulkarni

unread,
Nov 25, 2022, 4:49:07 PM11/25/22
to Excel-DNA
I am really doubtful that there would be python specific support from excel dna. You can try checking https://www.pyxll.com/ or https://www.xlwings.org/ or something similar that suits your requirement as they are useful to work with python scripts.  (i have used it a bit while studying python).

Just my opinion - I would ideally not recommend adding a million rows of data in one or more worksheets as the workbook size would be large and could go out of memory or make excel very slow. Just use it as an output for your charts or to show a screenful part of your in-memory data once. 

Mike Barry

unread,
Nov 26, 2022, 7:35:00 AM11/26/22
to Excel-DNA
Thank you for your reply, and URL suggestions. The file size can extend to hundreds on MB, just to get three data sets onto a sheet each, then three or more sheets with approx.. a third of the data from each of the other sheets to be handled separately to keep within the 1M+ row limitations, plus another sheet for each one third for the rationalisation of the rows to form the template, finally each third optimised exported to a csv file probably with in the order of 1.5M+ rows.  I'm thinking that using Python alone would remove these limitations although I'm not sure how to progress, after getting all the data in one data-frame, extracting and deleting the duplicate rows, and optimising when one or more locations is a no call so as to get a rsult in as many rows as possible, and the ultimate when there is dissimilar results at the same location dealing with this situation, I think I have some more research to do!!

Russel Webber

unread,
Dec 22, 2022, 12:49:25 PM12/22/22
to Excel-DNA
I used Excel DNA to create an Excel addin that supports Python. Have a look at https://www.xlslim.com

I expose the same Excel Object model as used in VBA code, this should be useful for building templates. This document should help https://russelwebber.github.io/xlslim-docs/html/user/vba_replacement.html

As mentioned, pandas dataframes may be a better way to manipulate the raw data, and then Excel can be used as the presentation layer. xlSlim fully supports dataframes, see https://russelwebber.github.io/xlslim-docs/html/user/dataframes.html and https://russelwebber.github.io/xlslim-docs/html/samples/yfinance.html as examples.

Hope that helps,
Russel
Reply all
Reply to author
Forward
0 new messages