new light weight excel reader/writer - pylightxl

325 views
Skip to first unread message

Viktor Kis

unread,
Jan 3, 2020, 6:54:08 AM1/3/20
to python-excel
Hello everyone,

I recently published my first open source library called pylightxl. It is a light weight excel reader (soon to be writer) that was written from bottom up with zero dependencies (only standard lib libraries used are: zipfile, re, os and sys). For a single source file (single version) it supports python 2.7 and 3+ so you don’t have to worry about compatibilities. It is 100% test driven development aimed to be as pythonic as possible without any bells or whistles; it just reads cell data and returns easy to work with lists and iterators for rows/cols.

 

I would be honored if you guys could also share it with your audience on python-excel.org! It has been a great pleasure to be able to contribute back to the python community and I have received so many great feedback in processes of developing this library.

 

Please take a look, hope you like it as well:

 

Docs: https://pylightxl.readthedocs.io

GitHub: https://github.com/PydPiper/pylightxl

Pypi: https://pypi.org/project/pylightxl/

 

Best,

Viktor Kis


Harald Armin Massa[legacy]

unread,
Jan 3, 2020, 7:22:49 AM1/3/20
to python...@googlegroups.com
Viktor,

thank you so much for your work and sharing it. 

Could I humbly ask you about this design choice:

"Worksheet data size is consistent for each row/col. Any data that is empty will return a ‘’  "

I read this as "empty cells will be returned as String with length zero aka empty string"

What make you choose empty string instead of None?

I am asking this because Oracle made a similar design choice (empty string equals NULL, (with NULL being the SQL-similiar to None)).
A design choice that was not shared by the SQL-Standard committee and which continues to hunt
developers forever.

Cheers

Harald

--
You received this message because you are subscribed to the Google Groups "python-excel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-excel...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/python-excel/d39eb609-d82a-4582-88db-90aca3a66642%40googlegroups.com.


--
LightningTalkMan
a brand of GHUM GmbH
Spielberger Straße 49
70435 Stuttgart
0173/9409607

Viktor Kis

unread,
Jan 3, 2020, 7:48:17 AM1/3/20
to python-excel
Hi Harald,

Thank you so much for taking a look at the library. I've been overwhelmed by all the positive and constructive feedback from all sources, I love it, and it speaks highly of the python community. Now for your question; I must say I was not aware that the SQL standard committee preferred to use Null/None for empty cell data, but from a user standpoint (which is where I was/is trying to tailor the API) I have many hours of pain dealing with pandas numpy NaN, trying to iterate over data that's expected to be some form of int, float, str type. Now on the excel side of things, a cell equal to an empty cell actually returns 0, which was my first hard decision because I wanted to make the API as excel-like friendly as possible but returning 0 for a cell that has nothing in it posed too many issue when trying to view the data (ex: no distinction between no-data versus measured 0 value data). Therefore an empty string became the next solution, which in python works out great for if statements as well 

for cell in db.ws('Sheet1').row(1):
    if cell:
        print(cell)

This is actually a really great topic, let me know what else you think about it.

Best,
Viktor


On Friday, January 3, 2020 at 4:22:49 AM UTC-8, Harald Armin Massa wrote:
Viktor,

thank you so much for your work and sharing it. 

Could I humbly ask you about this design choice:

"Worksheet data size is consistent for each row/col. Any data that is empty will return a ‘’  "

I read this as "empty cells will be returned as String with length zero aka empty string"

What make you choose empty string instead of None?

I am asking this because Oracle made a similar design choice (empty string equals NULL, (with NULL being the SQL-similiar to None)).
A design choice that was not shared by the SQL-Standard committee and which continues to hunt
developers forever.

Cheers

Harald

On Fri, 3 Jan 2020 at 12:54, Viktor Kis <vkisf...@gmail.com> wrote:
Hello everyone,

I recently published my first open source library called pylightxl. It is a light weight excel reader (soon to be writer) that was written from bottom up with zero dependencies (only standard lib libraries used are: zipfile, re, os and sys). For a single source file (single version) it supports python 2.7 and 3+ so you don’t have to worry about compatibilities. It is 100% test driven development aimed to be as pythonic as possible without any bells or whistles; it just reads cell data and returns easy to work with lists and iterators for rows/cols.

 

I would be honored if you guys could also share it with your audience on python-excel.org! It has been a great pleasure to be able to contribute back to the python community and I have received so many great feedback in processes of developing this library.

 

Please take a look, hope you like it as well:

 

Docs: https://pylightxl.readthedocs.io

GitHub: https://github.com/PydPiper/pylightxl

Pypi: https://pypi.org/project/pylightxl/

 

Best,

Viktor Kis


--
You received this message because you are subscribed to the Google Groups "python-excel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python...@googlegroups.com.

Harald Armin Massa[legacy]

unread,
Jan 3, 2020, 10:12:11 AM1/3/20
to python...@googlegroups.com
Hi Viktor,

thanks for explaining your way of reasoning!

I must say I was not aware that the SQL standard committee preferred to use Null/None for empty cell data,

to be exact: The SQL standard committee does not have any opinion about Excel cell data. 
The SQL standard defines a value NULL, which means "we have no entry for this field", i.e. empty field.
NULL brings its own set of troubles, as it means "we do not know". So NULL is neither true nor false, comparing with it
is impossible beyond "is NULL" or "not is NULL"

but from a user standpoint (which is where I was/is trying to tailor the API) I have many hours of pain dealing with pandas numpy NaN, trying to iterate over data that's expected to be some form of int, float, str type.

Yes, I can imagine. Been doing database development for decades, and still stumble upon that bad NULL thingy.
And the database-not-to-be-named removed that pain for at least String-Types by returning an empty string - 
making the pain go away at least sometimes.
 
Now on the excel side of things, a cell equal to an empty cell actually returns 0, which was my first hard decision because I wanted to make the API as excel-like friendly as possible but returning 0 for a cell that has nothing in it posed too many issue when trying to view the data (ex: no distinction between no-data versus measured 0 value data). Therefore an empty string became the next solution, which in python works out great for if statements as well 

Clearly understandable and reasoned way of deciding, thank you very much for sharing.

My question: as returning 0 for no-data became an issue immediately, are you sure returning empty-string for no data 
will be fine for the future?

My decision would have been to escalate that decision to the apis user and return None.
If she thinks "empty string" is good, that is a list comprehension away, the same with 0.

Thanks for explaining your reasoning and engaging in this discussion. None, NULL, NIL and their siblings will be
forever painful as the unknown...

chears

Harald




 
To unsubscribe from this group and stop receiving emails from it, send an email to python-excel...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/python-excel/ca0521c6-baa1-4c2d-ad34-ef9838aab0f5%40googlegroups.com.

John Yeung

unread,
Jan 3, 2020, 11:06:56 AM1/3/20
to python-excel
Viktor,

Thank you for your work, and for sharing it with us!

Regarding how to handle empty cells, I would say the decision to use
the empty string is definitely a reasonable choice, and has a very
powerful precedent: It's what xlrd does, and that package has been the
gold standard for reading Excel workbooks in Python ever since it was
introduced.

Excel users are used to an empty cell being equivalent to both zero
and the empty string, depending on context. When added to a numeric
cell, it's zero; but when concatenated with a text cell, it's the
empty string. So you definitely cannot win every time when doing this
in Python. (Perhaps you can in PHP or JavaScript.)

One thing that xlrd has, which allows you to distinguish between an
empty cell and a text cell containing an empty string, is that it
gives you the "Excel type" of each cell (if you want it), and the
"empty cell" type is different from the "text cell" type.

John Y.

Viktor Kis

unread,
Jan 3, 2020, 12:20:59 PM1/3/20
to python-excel
Hey Harald,

The api is still new and I am all ears for improvements. Your note:

My question: as returning 0 for no-data became an issue immediately, are you sure returning empty-string for no data 
will be fine for the future?

My decision would have been to escalate that decision to the apis user and return None.
If she thinks "empty string" is good, that is a list comprehension away, the same with 0.

actually got me thinking that it might be best to have the user define their own empty-cell values instead of forcing them to use "" the empty string. By default the API would be "", with the option to overwrite it.  This is actually really easy to implement (many thanks to Microsoft folks who put the excel xml together) because only filled out cell values are stored in the sheet#.xml files, therefore I made my api return "" if no cell value was fed. It's actually a very beautiful way to use less memory and no allocate large ranges.

The only draw back I see is inconsistency between code bases. The whole cookie cutter force versus malleable of an api 

Best,
Viktor

Viktor Kis

unread,
Jan 3, 2020, 12:32:32 PM1/3/20
to python-excel
Hi John,

Really appreciate you taking a look at the lib! Your note on:

One thing that xlrd has, which allows you to distinguish between an
empty cell and a text cell containing an empty string, is that it
gives you the "Excel type" of each cell (if you want it), and the
"empty cell" type is different from the "text cell" type.

can also be fixed by allowing users to pick their own empty cell value, I hope. See my note above to Harald. Although the idea of making each cell value have attributes crossed my mind, I decided to keep it simple since all the API cares about is cell values (no formats, etc).

My next project is to get this writer up and running as soon as I can so people can use it. I was able to figure out how to write an excel file from scratch but the hard part will be to modify existing files (inject data into xml) without an errors.

Best,
Viktor 

python...@raf.org

unread,
Jan 5, 2020, 5:30:55 PM1/5/20
to python...@googlegroups.com
Viktor Kis wrote:

> Hey Harald,
>
> The api is still new and I am all ears for improvements. Your note:

Hi,

I haven't looked at it yet, so this might be unhelpful, but I'd recommend
supporting the idea of the client deciding whether they want to read cell
values as stored or cell values as displayed to a user of Excel (i.e. the
value as modified by the format). For some use cases, the value as seen by
a human is far more important than the stored value (which for example may
have many more digits than the displayed value).

cheers,
raf

John Yeung

unread,
Jan 5, 2020, 8:21:17 PM1/5/20
to python-excel
On Sun, Jan 5, 2020 at 5:30 PM <python...@raf.org> wrote:
>
> I haven't looked at it yet, so this might be unhelpful, but I'd recommend
> supporting the idea of the client deciding whether they want to read cell
> values as stored or cell values as displayed to a user of Excel (i.e. the
> value as modified by the format). For some use cases, the value as seen by
> a human is far more important than the stored value (which for example may
> have many more digits than the displayed value).

It certainly would be useful, but it's also not that easy to build
full support for arbitrary formatted values. Maybe it wouldn't be
unreasonably difficult to support a limited set of common formats. But
that might still be beyond the scope of this project.

John Y.

Viktor Kis

unread,
Jan 5, 2020, 8:43:52 PM1/5/20
to python-excel
Thanks for the input Raf and John,

I am still knee deep in the writer function for now, but I have briefly looked at what it takes to scrape the formats (contained in the styles.xml file) a bit more difficult to read because the sheet.xml references a style number that is not in the style.xml (go figure).. it is actually only matched up by <xf> order tags (within <cellXfs> tag). Now I see that how numbers are formatted; ex:

<numFmt formatCode=""$"#,##0.00" numFmtId="164"/>
<numFmt formatCode="0.000" numFmtId="165"/>

but as John pointed out there may be more variations here that I care to reverse engineer. Perhaps adding the unit flags and number of decimals may be a good start. That should be too bad to parse out of these.

Viktor Kis

unread,
Apr 23, 2020, 11:38:07 PM4/23/20
to python-excel
Hello everyone,

Just following up on the status of pylighxl being added to the listed python-excel tools. Please let me know if there are additional information needed for it to be added to the list.

Again here are it's references:

Since the last email, pylightxl was featured on pythonbytes and on pycon 2020 excel poster.

Best,
Viktor Kis
Reply all
Reply to author
Forward
0 new messages