New Developer Woes

Daniel

Apr 2, 2020, 3:55:26 PM
to pytables-dev
Hello,

I am strongly considering developing on PyTables in order to resolve issue #48 (support variable length strings). I set up my development environment for PyTables (on Windows 10), but a couple of things seem off to me...

1. Upon compiling the .pyx modules, the built .pyd files were added to the tables/ package. However, they are NOT ignored by .gitignore - shouldn't they be? In addition, a Users/ directory was created, which is also NOT ignored.
2. I ran several thousand of the test cases, and many times I ran into errors like "`SomeBaseTestCase` object has no attribute 'start'" (or 'close', or 'reopen'). Are these actual bugs in the test cases, or am I missing something?

Thank you,

Daniel Raimi-Zlatic

PS. This is my first time developing for a Python library with such a rigorous code base / merge procedure, so I might have a lot of dumb questions like this - I could only find partial information on the Wiki.

Daniel

Apr 2, 2020, 5:02:43 PM
to pytables-dev
Well! I took a stab at implementing a solution to #48, and the fix would go so deep into the code that it would take too long for me to implement. So feel free to leave this thread unanswered; I am giving up.

Antonio Valentino

Apr 3, 2020, 2:09:25 AM
to pytabl...@googlegroups.com
Dear Daniel,

On 02/04/20 23:02, Daniel wrote:
> Well! I took a stab at even implementing a solution to #48 and the fix
> would be very deep into the code that it is going to take too long for me
> to implement a solution.... So, feel free to leave this thread unanswered,
> I am giving up.

I'm really sorry that you gave up; a contribution on issue #48 would be
very much appreciated indeed.

I quickly re-read the comments in the issue and it is not totally clear
to me which use case you are interested in:

* reading HDF5 files generated by other software?
* writing your own data with VL size?
* why exactly do you need VL strings?
  I mean, in which case can the data not be arranged to use one of
  the existing ?Array types already provided by PyTables?

Also, I'm interested in knowing more about your analysis.
Where exactly did you get stuck?
Why did you decide it needs too much effort?
What would you need to complete the task with a reasonable amount of effort?

I don't know what exactly you have in mind but probably VL strings used
in attributes or VLArray could be a good starting point for your purpose.

Other comments inline below.


> On Thursday, April 2, 2020 at 3:55:26 PM UTC-4, Daniel wrote:
>>
>> Hello,
>>
>> I am strongly considering developing on PyTables in order to resolve issue
>> #48 (support variable length strings). I set up my development environment
>> for PyTables (on Windows 10), but a couple of things seem off to me...
>>
>> 1. Upon compiling the .pyx modules, the built .pyd files were added to the
>> tables/ package. However, they are NOT ignored with .gitignore - shouldn't
>> these be ignored? In addition, a Users/ directory was also created, which
>> is also NOT ignored....

Personally I do not develop on Windows, and I see that our .gitignore
file is quite minimal.
I would say that you are right; please feel free to open a PR for that or
just let me know exactly which patterns you want to add.

>> 2. I ran several thousand of the test cases, and many times I ran into
>> errors like `SomeBaseTestCase` object has no attribute 'start' or 'close'
>> or 'reopen'. Are these actual bugs in the test cases or am I missing
>> something...

I need more details for that.
One comment: I do not remember well, but I'm not sure that our test
suite works correctly with pytest.
You should use one of the ways recommended in the docs to run the tests.

>> Thank you,
>>
>> Daniel Raimi-Zlatic
>>
>> PS. This is my first time developing for a python library with this
>> rigorous of a code base / merge procedure, so I might have a lot of dumb
>> questions like this - I could only find partial information on the Wiki.

Please feel free to ask; this mailing list is the right place.
Also, suggestions/PRs to improve our Wiki/docs would be very much appreciated.


kind regards

--
Antonio Valentino

Tom Kooij

Apr 3, 2020, 2:29:29 AM
to pytabl...@googlegroups.com
Hello Daniel,

I'm actually more familiar with Windows development. You are certainly right about the .gitignore issues. I tend to ignore them in my workflow.
(Developing Python with compiled C extensions is actually *much* (much!) easier on Unix/Mac.)

From your comments I realise it would be helpful for newcomers to fix the Windows .gitignore and create a .BAT file that replaces `make clean`.
As Antonio already pointed out, it is a pity you gave up. We'll be glad to help.

Would you be willing to contribute a "Windows development" PR? (.gitignore, perhaps a batch file; I can clean up my own mess and contribute that.)


Best regards,
Tom Kooij


On Thu, 2 Apr 2020 at 21:55, Daniel <daniel.ra...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "pytables-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pytables-dev/2ada9fac-052c-4cac-8fea-174319e4c28c%40googlegroups.com.

Daniel

Apr 3, 2020, 10:12:42 AM
to pytables-dev
Hmmm. You guys are a bit more supportive than I thought! Maybe I won't give up quite yet. I would appreciate a point in the right direction.

To respond to your comments:

1. I am pretty committed to working on Windows - I don't have space on my hard drive for a Linux VM, and the setup required to make a virtual machine on a server with an environment similar to my Windows setup seems daunting...
2. I can certainly update the .gitignore to ignore the built files - FYI I am using the Visual Studio C++ compiler (I don't remember the version) to build the .pyd files. Also, how bad would it be to include this change in the PR to fix issue #48?
3. I am not that familiar with make beyond being able to run premade make.bat files - I've also only dabbled in writing .bat scripts. I would be pretty inefficient at that, so I would appreciate your help, Tom.
4. I just started the "python -m tables.tests.test_all --heavy" command; it might take a while to finish. I had previously been running a PyCharm Unittests configuration for everything in the tables/tests directory (and for individual test modules). I've included a screenshot of a sample of the errors I get when I run the unit tests with that method.
5. My particular issue is that some external software sends data out with VL string arrays - I have no control over the data format - that I want to read in with PyTables. I've attached a screenshot from an HDF viewer I use to peek at HDF data (see the Data Type field and Max Dimension Size(s) field). This data type means that the "/Sensor/serial_number/" node is treated as an UnImplemented node.
6. After a couple of hours looking at the code / setting up my environment, I am stuck on three things: (a) figuring out where the class associated with a leaf of a specific datatype / number of dimensions / max dimension size(s) gets assigned to that leaf - it seems to be somewhere in the Cython hdf5extension; (b) I still have no idea which class represents VL string arrays, or whether I'd need to make a new one; and (c) I don't have any leads on how to implement a solution.
7. I decided it was too much effort because I gave myself two hours to peruse the code and set up my environment so that I could figure out the answers to the questions in 6, and I could not figure out anything. Moreover, I ran into the basic Windows trouble I described, which made it seem even harder to get started. Note, I do have inelegant workarounds for my problem, so my 2-hour rule was a judgement call to see if the right solution (adding a feature to PyTables) was worth it.
8. If I am to continue, I would appreciate some detailed help (e.g. where the VLArray and attributes live, which methods I need to review, etc.) on the questions I have in bullet 6, and some Windows support - not to seem too needy, but I think Tom is right that not having Windows support can be scary to newcomers.
9. As far as improving docs goes, I don't think anything was unclear, and I think it would be too much effort to provide enough detail for me to answer the questions in bullet 6. However, one thing I did notice was that the page on setting up the MSC compiler seemed outdated.

After all that, I hope I didn't come off as blaming you guys for me not understanding how to solve issue #48. I really meant that it seemed too much for me to handle coming into a new library. I have only used pytables for parsing one file type, so I don't have a nearly complete understanding of all the features this library includes.

Thank you,

Daniel


Daniel

Apr 3, 2020, 10:17:55 AM
to pytables-dev
I realize I forgot the attachments (now included)
unitest-errors.png
unittest-configurations.png
VLStringArray.png

Antonio Valentino

Apr 4, 2020, 5:06:38 AM
to pytabl...@googlegroups.com
Dear Daniel,

On 03/04/20 16:12, Daniel wrote:
> Hmmm. You guys are a bit more supportive than I thought! Maybe I won't give
> up quite yet. I would appreciate a point in the right direction.

Happy to hear it :)

> To respond to your comments:
>
> 1. I am pretty committed to working on Windows - I don't have space on my
> hard drive for a linux VM and the setup required to make a virtual machine
> on a server with a similar environment to my Windows setup seems menacing...

OK

> 2. I can certainly update the .gitignore to ignore the built files - FYI I
> am using Visual Studio Cpp compiler (I don't remember the version) to
> make the .pyd files.

OK, I have just pushed an updated .gitignore which also includes .pyd.
I don't know about the "Users" folder you mentioned in the previous post.
Please feel free to add it yourself if you think it is a good idea.

> Also, how bad would it be to include this change in
> the PR to fix issue #48?

Well, it is probably not the best way to do things, but IMHO it is fine
as long as you add the changes to the .gitignore in a separate commit.
I don't know what other PyTables developers think about it.

> 3. I am not that familiar with make besides just being able to run make.bat
> files that are premade - I've also only dabbled in making .bat scripts. I
> would be pretty inefficient at that, so I would appreciate your help, Tom.

An example of make.bat can be found here:
https://github.com/PyTables/PyTables/blob/v3.4.4/doc/make.bat

Having a make.bat is surely useful but not mandatory, so it is OK if you
don't feel it is something you can do.

As Tom said having a quick way to clean the source tree is handy.
Maybe you can use

$> git clean -dfx

as an alternative to a make.bat, but be careful with it because it
removes all files and directories that are not in git.
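
For a narrower cleanup than `git clean -dfx`, a small Python script is one
option. The following is only a minimal sketch: the function name
`clean_build_artifacts` and the pattern list are assumptions made for
illustration and are not part of PyTables. It defaults to a dry run so the
list of matches can be reviewed before anything is deleted.

```python
from pathlib import Path

# Assumed artifact patterns; adjust to whatever your build actually produces.
BUILD_PATTERNS = ("*.pyd", "*.pyc")


def clean_build_artifacts(root, patterns=BUILD_PATTERNS, dry_run=True):
    """Find files under `root` matching `patterns` and return them sorted.

    With dry_run=True (the default) nothing is deleted; with dry_run=False
    every matched file is removed.
    """
    root = Path(root)
    matched = sorted(
        {p for pattern in patterns for p in root.rglob(pattern) if p.is_file()}
    )
    if not dry_run:
        for path in matched:
            path.unlink()
    return matched


if __name__ == "__main__":
    # Hypothetical usage from the repository root: list what would be removed.
    for path in clean_build_artifacts("tables"):
        print("would remove", path)
```

Unlike `git clean -dfx`, this only touches files matching the listed
patterns, so untracked source files are left alone.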

> 4. I just started the "python -m tables.tests.test_all --heavy" command, it
> might take a while to finish - I had previously been running a PyCharm
> Unittests configuration for everything in tables/tests directory (and for
> individual test modules). I've included a screenshot of a sample of the
> errors I get when I run the unit tests with that method.

OK, using "--heavy" may take a very long time, and IMHO it is not
necessary to run it systematically after each commit.
I personally run it just before a delivery.

Regarding PyCharm, I have just run some tests and I can replicate your
problem.
The point is that the PyTables test suite is not very "test discovery"
friendly.
Fortunately I have found a workaround.
In the PyCharm unittest configuration you should set

Module name: tables.tests.test_<MODULENAME>.suite

The final ".suite" does the trick because it only selects test classes
that are actually meant to be test cases, and not the base classes that
other test cases inherit from.
I hope the wording is not too confusing :)


> 5. My particular issue is that some external software sends data out with
> VL string arrays - I have no control over the data format - that I want to
> read in with PyTables. I've attached a screenshot from an HDF viewer I use
> to peek at HDF data (see the Data Type field and Max Dimension Size(s)
> field). This data type means that the "/Sensor/serial_number/" node is
> treated as an UnImplemented node.

OK, this is a use case for which, IMHO, we cannot suggest any workaround
for the unavailability of VL strings.


> 6. After a couple hours looking at the code / setting up my environment, I
> am getting stuck in 1. figuring out where the class description associated
> to a leaf with a specific datatype/no of dimensions/max dimension Size(s)
> gets attributed to each leaf - it seems to be somewhere in the cython
> hdf5extension. 2. I still have no idea which class represents VL String
> arrays, or if I'd need to make a new one, and 3. I don't have any leads on how
> to implement a solution.

OK, some starting points to look at could be:

* RootGroup._g_load_child [1]
* Group._g_get_child_leaf_class [2]
* utilsextension.which_class [3]

[1]
https://github.com/PyTables/PyTables/blob/08510d5cb620132d6b99ae27b168baa50a2bd13d/tables/group.py#L1166
[2]
https://github.com/PyTables/PyTables/blob/08510d5cb620132d6b99ae27b168baa50a2bd13d/tables/group.py#L301
[3]
https://github.com/PyTables/PyTables/blob/master/tables/utilsextension.pyx#L753

[CUT]

> 9. As far as improving docs goes, I don't think anything was unclear and I
> think it would be too much effort to provide enough detail for me to answer
> the questions in bullet 6. However, one thing I did notice was the setup of
> the MSC compiler page seemed outdated.

Yes, we have a warning at the top of the page stating that it is outdated.
Unfortunately I do not develop for Python on Windows, but maybe you
can provide an updated procedure to set up a working dev environment, or
at least some links to pages explaining how to do it and/or a list of
updated tools.

> After all that, I hope I didn't come off as blaming you guys for me not
> understanding how to solve issue #48. I really meant that it seemed too
> much for me to handle coming into a new library. I have only used pytables
> for parsing one file type, so I don't have a nearly complete understanding
> of all the features this library includes.

Please don't worry. The PyTables source code is neither small nor
easy, at least at first look.

I hope I have provided useful hints.
Please let me know if you have other questions.


kind regards
antonio

--
Antonio Valentino

Tom Kooij

Apr 5, 2020, 7:18:44 AM
to pytabl...@googlegroups.com
I know about the strange "Users/" folder that is created (by the MS compiler) when compiling Cython modules. For some reason that compiler tries to store something in the actual "Users/" folder (where user profiles are stored). I'd rather not only add Users/ to .gitignore but also fix the problem.

Anyway, I'll have a look to see if I can fix that, so that developing on Windows is a bit easier.

Regards,

Tom Kooij

