caching config data in the sync files


tom....@overclocked.net

Dec 13, 2014, 4:18:51 AM
to lightsh...@googlegroups.com
I've been working on a way to store the configuration in the sync files to avoid the index errors reported in issue #57.
I have a fix that is working right now. It stores 7 variables in a numpy array:
    current_hardware = np.array([[hc.GPIOLEN],
                                 [sample_rate],
                                 [_MIN_FREQUENCY],
                                 [_MAX_FREQUENCY],
                                 [_CUSTOM_CHANNEL_MAPPING],
                                 [_CUSTOM_CHANNEL_FREQUENCIES],
                                 [CHUNK_SIZE],
                                 [-1]], dtype=object)


Currently we add the std and mean to the top of cache_matrix, pull them out and delete them from cache_matrix, and everything is saved to a nicely formatted text file. My fix removes std and mean from cache_matrix and stores them in their own arrays.  So now the sync file has 4 separate arrays in it, each accessible by key just like a dict():
>>> ca = np.load(".OhComeAllYeFaitfull.mp3.sync.npz")
>>> ca.files
['std', 'cache_matrix', 'cached_hardware', 'mean']
>>> s = ca['std']
>>> c = ca['cache_matrix']
>>> ch = ca['cached_hardware']
>>> m = ca['mean']
>>> ch
array([[8],
       [44100],
       [20.0],
       [15000.0],
       [0],
       [0],
       [2048],
       [-1]], dtype=object)
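For anyone who wants to try the layout shown in the session above, here is a minimal sketch of writing and reading one sync file that holds several named arrays, using np.savez and np.load. The helper names save_sync and load_sync, the file name, and the sample values are made up for illustration and are not taken from the project. The hardware array is stored as plain floats here because newer NumPy releases refuse to load dtype=object arrays unless allow_pickle=True is passed to np.load.

```python
import os
import tempfile

import numpy as np


def save_sync(path, cache_matrix, mean, std, cached_hardware):
    """Write the four arrays into one .npz file, each under its own key."""
    np.savez(path, cache_matrix=cache_matrix, mean=mean, std=std,
             cached_hardware=cached_hardware)


def load_sync(path):
    """Read the arrays back into a plain dict keyed by array name."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Illustrative values only: 100 cached chunks for 8 channels.
cache_matrix = np.random.rand(100, 8)
mean = cache_matrix.mean(axis=0)
std = cache_matrix.std(axis=0)
# Hypothetical field order: gpiolen, sample rate, min freq, max freq, chunk size.
cached_hardware = np.array([8, 44100, 20.0, 15000.0, 2048], dtype=float)

sync_path = os.path.join(tempfile.mkdtemp(), ".my_song.mp3.sync.npz")
save_sync(sync_path, cache_matrix, mean, std, cached_hardware)
loaded = load_sync(sync_path)
print(sorted(loaded))
```

Adding a new array later (a per-song sequence, say) would only mean passing one more keyword to np.savez; files written earlier simply would not contain that key.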


Only the sync file is in the numpy npz format, which is a binary format.  That would mean no more opening a text file to edit it by hand. We could write an editor; a simple curses-based editor wouldn't be too hard to do.
And there are other advantages, to name a few:
  1. The hardware config is a separate array, as are the std, mean, and cache matrix.
  2. The sync file size is reduced 60% - 70%. These files are uncompressed, and load time is a little quicker.
  3. There is room for other arrays in there: per-song configs, per-song sequences. If we get a sequencer going, that could be added to the sync file with minimum effort.
I could also just write everything out in a structured text file. It's a little more tedious if we want to add to it later, but it would have the same end results.
I need everyone's input on this.  As it is now, I could have per-song configs going in a few days. I just need to refactor configuration_manager.py to stop constantly reloading all the configs; then we could add in the methods to make the changes and they would stick.

So which direction should I go on this?  Both will require about the same amount of work. Using the binary format is less work to implement (only because I did it already), but making it user friendly will be extra work in making an editor.  The text file method will be only marginally harder to implement, but more work to expand upon going forward.

Feedback?,  Input?,  Rude comments? :)

Todd Giles

Dec 14, 2014, 11:52:02 PM
to lightsh...@googlegroups.com
I really like the direction you are taking this, and using the binary format seems perfectly fine to me.  We can provide tools to make it human readable / editable.  I also like that it allows for really easy expansion of what we might add to the sync file over time in a way that wouldn't break older versions, etc...

Also - adding in per-song overrides will enable quite a bit of flexibility and customization for all sorts of things... looking forward to it!  Thanks for taking this on!

tom slick

Dec 15, 2014, 12:08:49 AM
to lightsh...@googlegroups.com
Cool, thank you.  I would still like some feedback from some of the others too, since this will affect their work as well.
And for the moment, in the example that I posted, is there anything else that would affect the cached fft matrix?
array([[8],        # length of the gpio pins
       [44100],    # sample rate
       [20.0],     # min frequency
       [15000.0],  # max frequency
       [0],        # custom channel mapping
       [0],        # custom channel frequency
       [2048],     # chunk size
       [-1]], dtype=object)  # an extra field that I was using to trigger using the current config or the cached config, which worked


--
http://www.lightshowpi.com/
---
You received this message because you are subscribed to a topic in the Google Groups "LightshowPi Developers" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lightshowpi-dev/ttOr-DMWEJU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lightshowpi-d...@googlegroups.com.
To post to this group, send email to lightsh...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lightshowpi-dev/59b5b83e-a8b5-45cb-9d6f-c1fa36b542ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Todd Giles

Dec 15, 2014, 12:10:40 AM
to lightsh...@googlegroups.com
Have you thought through storing per-song configuration as well as the hardware configuration that the fft matrix was based on?  It seems to me they are slightly different beasts, although one would impact the other (i.e. when you change a per-song configuration that affects the hardware, the fft matrix would need to be regenerated as it would be based off a different hardware configuration).

One idea for an initial "editor" for the more human readable parts, e.g. the per-song configuration, would be to have a command line tool that takes a text configuration file and stores it in the sync file.  Although I'm wondering now if the per-song overrides portion of this file might be better off in their own file so it is readily human editable.  This would mean there are potentially two files for each song (three actually if you count the song file), the sync file which would be a binary file containing items that are not normally going to be human edited (they could be, but not typically), and then a text file that is simply a python config file with overrides applied to the configuration but only for that specific file.  e.g.:

my_song.mp3     ---> The music file
.my_song.sync.npz     ----> The sync file as you've defined it, created upon initial play - or via tools script ahead of time
.my_song.cfg         ----> A text file in the exact same format as the defaults.cfg with overrides used only for this song

Thoughts?

Todd Giles

Dec 15, 2014, 12:12:25 AM
to lightsh...@googlegroups.com
This idea of having a my_song.cfg is the initial idea I had a year ago. It would be trivial to implement, in that you could leverage all the existing code that loads the configuration file, with the simple addition of loading this file as the last override. It should be a one-line change, I think.
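A rough sketch of the "load the per-song file as the last override" idea, written against Python 3's standard configparser rather than the project's own configuration_manager.py. The function names, the [audio_processing] section, and the option names are assumptions for illustration; only the .my_song.cfg naming comes from the messages above.

```python
import configparser
import os
import tempfile


def per_song_config_path(song_path):
    """Map <dir>/my_song.mp3 to <dir>/.my_song.cfg (naming from the thread)."""
    directory, filename = os.path.split(song_path)
    stem = os.path.splitext(filename)[0]
    return os.path.join(directory, "." + stem + ".cfg")


def load_config(defaults_path, song_path):
    """Read the defaults first, then the per-song file so its values win."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so a song without
    # an override file simply gets the defaults.
    parser.read([defaults_path, per_song_config_path(song_path)])
    return parser


# Small demonstration with throwaway files (hypothetical section/options).
workdir = tempfile.mkdtemp()
defaults = os.path.join(workdir, "defaults.cfg")
song = os.path.join(workdir, "my_song.mp3")
with open(defaults, "w") as handle:
    handle.write("[audio_processing]\nmin_frequency = 20\nmax_frequency = 15000\n")
with open(per_song_config_path(song), "w") as handle:
    handle.write("[audio_processing]\nmax_frequency = 8000\n")

config = load_config(defaults, song)
print(config.get("audio_processing", "max_frequency"))
```

Options that the per-song file does not mention keep their default values, which is what makes the override file short enough to edit by hand.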

Todd Giles

Dec 15, 2014, 12:17:40 AM
to lightsh...@googlegroups.com
Tom - wondering if it would be better to store the data that is sent to the fft.calculate_levels routine (namely chunk_size, sample_rate, frequency_limits, gpiolen, channels=2) instead of the underlying configuration options.  I can see the potential configuration options that affect fft matrix calculation changing over time, and it would be nice to de-couple the configuration from the sync cache itself.  In particular, I think it would make good sense to have the creation / loading of what you've called "current_hardware" as part of the fft module (could make an fft class perhaps that does this) to encapsulate it.  Then when loading the sync file you'd have the fft module / class determine if anything has changed that would require re-caching the fft data.
Does that make sense?
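One way the encapsulation described above could look, as a sketch only: a small class that holds the arguments passed to fft.calculate_levels, flattens them into an array for the sync file, and decides whether a cached array still matches. The class name FftParams and its methods are hypothetical and do not exist in the project.

```python
import numpy as np


class FftParams:
    """Hypothetical holder for the inputs that determine the cached matrix."""

    def __init__(self, chunk_size, sample_rate, frequency_limits, gpiolen,
                 channels=2):
        self.chunk_size = chunk_size
        self.sample_rate = sample_rate
        self.frequency_limits = frequency_limits  # list of (low, high) pairs
        self.gpiolen = gpiolen
        self.channels = channels

    def to_array(self):
        """Flatten everything into one float array suitable for np.savez."""
        scalars = [self.chunk_size, self.sample_rate, self.gpiolen,
                   self.channels]
        limits = np.ravel(self.frequency_limits)
        return np.concatenate([scalars, limits]).astype(float)

    def matches(self, cached_array):
        """True when the cached array came from identical parameters."""
        cached = np.asarray(cached_array, dtype=float)
        return np.array_equal(self.to_array(), cached)


limits = [(20.0, 500.0), (500.0, 15000.0)]
params = FftParams(2048, 44100, limits, gpiolen=2)
cached = params.to_array()            # what would be stored in the sync file

changed = FftParams(2048, 44100, [(20.0, 400.0), (400.0, 15000.0)], gpiolen=2)
print(params.matches(cached), changed.matches(cached))
```

With something like this, the code that loads a sync file never needs to know which configuration options exist; it only asks whether the stored array matches.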

Tom Enos

Dec 15, 2014, 12:24:20 AM
to lightsh...@googlegroups.com
That was what I was thinking at first, and it's still an option.  What I did for testing was to generate a sync file on my desktop, but before I did that I read the config files and then updated the read-in data with new values.  It worked great and took about 10 seconds to generate new sync files for 3 songs.  Then I uploaded them to the pi.  I did not change the configs on the pi, and it played the new sync files with the new values.  All I changed was the custom channel mapping and the frequencies, but it played them with the new settings instead of the ones in the configs.

Tom Enos

Dec 15, 2014, 1:10:08 AM
to lightsh...@googlegroups.com
That is about what I did. I took the values of CHUNK_SIZE as defined, sample_rate = musicfile.getframerate(), etc., all from memory, and when I reload them I compare them the same way against the values stored in memory.
    current_hardware = np.array([hc.GPIOLEN,
                                 sample_rate,
                                 _MIN_FREQUENCY,
                                 _MAX_FREQUENCY,
                                 _CUSTOM_CHANNEL_MAPPING,
                                 _CUSTOM_CHANNEL_FREQUENCIES,
                                 CHUNK_SIZE,
                                 -1], dtype=object)


The channels I did not include because in play_song() it was never passed to the fft module.  But everything came from memory, just before it would have been sent to the fft module.     
As for removing the config from the sync files, that's not a problem; everything is in its own separate array.  But it was my understanding that the cached matrix was derived from those specific settings, and if any of them changed the matrix was no longer valid.  I'm a good programmer (not bragging, I know what I can do), but I do not know that much about how the fft calculations work. I am starting to understand (slowly), so I still have a lot of holes in my knowledge about fft, and you may have to talk me through a few points as they come up.

Back to the point.

Is the matrix only good for the input values listed in the config array, or is there a way to adapt them for a different setup?

And the config array is a separate array; it is just stored in the same file.  Same with the std and mean.  I no longer stack them on the cache_matrix and then remove them from the matrix.  They are separate arrays stored in the same file.  But like I said, it is only one way to do this. If you have a way you're looking at for the future, then walk me through it; I could do it a different way too. I just like to code, and it doesn't matter much if it's my idea or someone else's. Someone else's makes it a challenge, and that is the fun part for me.  That is one of the reasons I wanted to have this discussion.  I'm one person working on a small part of the project, and it needs to fit the project now and in the future, along with the work others are doing.


chrisausey

Dec 15, 2014, 11:51:20 AM
to lightsh...@googlegroups.com
Going to have to digest this a bit. Like you, Tom, I have some learning to do regarding the FFT calculations; that has always been Todd's brainchild. My initial thought when this issue was brought to light was to have a separate config file per song, possibly something named my_song.overrides.cfg. Users could define certain pins, frequencies, channel mappings, etc. that would be used for the song being played. Those configs would generate the sync file for that song much like is done now. I was always a fan of keeping the sync file as simple as possible, containing only the data calculations needed. That way the data could always be easily edited, or imported into programs that would know how to interact with the data (i.e. if someone wrote a driver to make our data file recognizable by already available sync software such as vixen).  In my opinion the sync file should only contain the data from the FFT calculations, but I am not the only dev here and, being a team player, I'll go with whatever the majority chooses.

Tackling the issue at hand of detecting changes that would require sync files to be removed: my initial thought was to have those parameters defined somewhere, then have the program grab all those parameters, somehow create a hash based on them and their values, and store that in state.cfg or possibly somewhere else.  On future runs the program would generate that hash and check it against the stored hash; if it has changed, delete the sync file, and if not, continue.
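The hash idea could be sketched roughly as follows, using only the standard library. The function names and the parameter dict are hypothetical; where the fingerprint would be stored (state.cfg or elsewhere) is left open, as it is in the message above.

```python
import hashlib
import json


def config_fingerprint(params):
    """Return a stable SHA-256 hex digest for a dict of parameters."""
    # sort_keys makes the digest independent of dict insertion order.
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sync_file_is_stale(params, stored_fingerprint):
    """True when the parameters no longer match the stored fingerprint."""
    return config_fingerprint(params) != stored_fingerprint


# Illustrative parameters only.
params = {"gpiolen": 8, "sample_rate": 44100, "min_frequency": 20.0,
          "max_frequency": 15000.0, "chunk_size": 2048}
stored = config_fingerprint(params)          # saved when the sync file is made

edited = dict(params, max_frequency=12000.0)
print(sync_file_is_stale(params, stored), sync_file_is_stale(edited, stored))
```

A fingerprint says only that something changed, not what; storing the values themselves makes the mismatch easier to inspect.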

Those were just my initial thoughts, not totally thought through, but I just wanted to put them out there. I do like your ideas, they are very thorough and creative, and I am very open to proceeding in that direction if that's the best thing for the future of the program. I just wanted to throw my two cents in to get some more brainstorming going.

P.S. Thanks for all your work Tom, I don't know where you find the time, but I wish I had it :)

Todd Giles

Dec 15, 2014, 12:33:41 PM
to lightsh...@googlegroups.com
On Sun Dec 14 2014 at 11:10:10 PM Tom Enos <tom....@overclocked.net> wrote:

Is the matrix only good for the input values listed in the config array, or is there a way to adapt then for different setup?      

What do you mean by "config array" here?  The currently cached matrix is only good for the input parameters to fft.calculate_levels.  We could cache it at a bit higher level (e.g. the raw FFT output before calculating the level for each frequency bin), which would allow us to adapt more readily for pretty much any setup we throw at it, but would still require a "binning" process for each frequency bin, and calculation of mean / std.dev. / etc... which does take time, so not sure it would be worth doing that.
        

I feel that having the per-song overrides be their own easily editable text file makes more sense. Other than that, I like the idea of storing the sync file data in a binary format.  We can of course make an editor for the parts of it that we feel would be human "editable" in the future.
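To make the "cache at a higher level" option from this message concrete, here is a small sketch that caches a raw power spectrum per chunk and does the binning as a separate, repeatable step. It is not the project's fft.calculate_levels; the function names, the Hanning window, and the log scaling are assumptions chosen for illustration.

```python
import numpy as np


def raw_power_spectrum(chunk):
    """Power spectrum of one audio chunk: the 'raw FFT output' to cache."""
    window = np.hanning(len(chunk))
    return np.abs(np.fft.rfft(chunk * window)) ** 2


def bin_levels(power, sample_rate, chunk_size, frequency_limits):
    """Sum the cached power into one level per (low, high) frequency band."""
    freqs = np.fft.rfftfreq(chunk_size, d=1.0 / sample_rate)
    levels = []
    for low, high in frequency_limits:
        in_band = (freqs >= low) & (freqs < high)
        # Small offset avoids log10(0) for an empty or silent band.
        levels.append(np.log10(power[in_band].sum() + 1e-12))
    return np.array(levels)


sample_rate, chunk_size = 44100, 2048
t = np.arange(chunk_size) / sample_rate
chunk = np.sin(2 * np.pi * 1000.0 * t)          # 1 kHz test tone
power = raw_power_spectrum(chunk)               # cached once per chunk

# Re-binning for a different channel layout needs no new FFT.
three_bands = bin_levels(power, sample_rate, chunk_size,
                         [(20, 500), (500, 2000), (2000, 15000)])
two_bands = bin_levels(power, sample_rate, chunk_size,
                       [(20, 2000), (2000, 15000)])
print(np.argmax(three_bands), np.argmax(two_bands))
```

The trade-off is the one raised above: the cache adapts to any band layout, but the binning and the mean / std calculation still have to run whenever the layout changes.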
 

tom slick

Dec 15, 2014, 6:00:14 PM
to lightsh...@googlegroups.com
A separate per-song config was my first thought too.  Storing it in the sync file was also just a thought; it was convenient and easy with only one file and no mods to configuration_manager.py.  But it does look like a separate per-song config file would be easier to work with going forward.  It still requires saving more data in the sync files to identify the config that was used to calculate the matrix, and it would still need to check whether it should use that config or re-cache the data.  I also thought about using a hash at one point to check for a match, but then you still need to generate the hash and then check it, which is why I went with storing the data directly and then checking that.  Checking 2 arrays for equality is simple regardless of what the arrays hold: (array1 == array2).all()
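A short sketch of that equality check, with one hedge: when the two arrays have different shapes (for example, a sync file written before a field was added), np.array_equal returns False instead of broadcasting or raising, so it may be a safer spelling than (array1 == array2).all(). The helper name configs_match and the sample values are illustrative only.

```python
import numpy as np


def configs_match(current, cached):
    """Compare two config arrays; a shape mismatch counts as 'different'."""
    return bool(np.array_equal(np.asarray(current), np.asarray(cached)))


current = np.array([8, 44100, 20.0, 15000.0, 2048])
same = np.array([8, 44100, 20.0, 15000.0, 2048])
other_rate = np.array([8, 48000, 20.0, 15000.0, 2048])
older_layout = np.array([8, 44100, 20.0, 15000.0])   # one field fewer

print(configs_match(current, same))
print(configs_match(current, other_rate))
print(configs_match(current, older_layout))
```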



Tom Enos

Dec 15, 2014, 7:13:44 PM
to lightsh...@googlegroups.com


On Monday, December 15, 2014 9:33:41 AM UTC-8, Todd Giles wrote:
On Sun Dec 14 2014 at 11:10:10 PM Tom Enos <tom....@overclocked.net> wrote:

Is the matrix only good for the input values listed in the config array, or is there a way to adapt then for different setup?      

What do you mean by "config array" here?  The currently cached matrix is only good for the input parameters to fft.calculate_levels.  We could cache it at a bit higher level (e.g. the raw FFT output before calculating the level for each frequency bin), which would allow us to adapt more readily for pretty much any setup we throw at it, but would still require a "binning" process for each frequency bin, and calculation of mean / std.dev. / etc... which does take time, so not sure it would be worth doing that.
        
By "config array" I mean the hardware configuration used to generate the FFT matrix for that song.

I think I understand  what you are saying about the raw FFT output, but if I am not mistaken and as you state, that would slow things down between songs, is that correct?   


I feel that having the per-song overrides be it's own easily editable text file makes more sense, other than that - I like the idea of storing the sync file data in a binary format.  We can of course make an editor for the parts of it that we feel would be human "editable" in the future.
I have been convinced this is the way to go.  As I said, I'm flexible. I think that adding a function to the config manager to read in a new config is required, along with some kind of tag in the cache file that lets synchronized_lights know it should use the new config.  It still requires that the hardware config used to generate the matrix be stored, so that doesn't change; it should still check that the config matches, whether it's the current config or the per-song config.

Also, as a side note, I mentioned that the configs kept getting reloaded because of all the different imports.  I was sort of right.  check_sms.py, commands.py, fft.py, prepostshow.py, hardware_controller.py and synchronized_lights.py all import the config manager in some way (fft.py imported hardware_controller.py, which imported configuration_manager.py; now fft.py does not import anything other than numpy), but it was not as big of a problem as I thought.  check_sms.py was the one doing most of it: it restarts every 15 seconds, reloading the configs each time and causing commands.py to do the same. We might just want to run a loop with time.sleep in check_sms.py's main() instead of restarting it every 15 seconds in start_lights_and_music. That did not affect hardware_controller or synchronized_lights, though.

But hardware_controller and synchronized_lights both load different copies of configuration_manager, and the per-song configs are affected by that.  Some of the options that we want to add to per-song configs are controlled in hardware_controller (always on, always off, inverted channels), so we would have to make hardware_controller reload the new values too, OR we could stop synchronized_lights from loading configuration_manager and just use the copy loaded by hardware_controller.
For example, synchronized_lights loads the config on line 85:
    _CONFIG = cm.CONFIG
If we only import hardware_controller and not configuration_manager, we could just do this:
    _CONFIG = hc.cm.CONFIG
Now we are only using one copy of configuration_manager, and changes made in that copy affect both.
The copy I am using has these changes and it is running fine. I also changed that same copy to pass synchronized_lights' copy of hardware_controller to prepostshow, so it no longer reloads the configs there either, and it still works from the command line for testing with no change to the way the user would call it.

With all of this I see the configs reload only at the beginning of each song.

chrisausey

Dec 16, 2014, 10:01:49 AM
to lightsh...@googlegroups.com
I agree, saving as arrays does make sense and is just as effective as generating a hash and checking against it. I'm still not sure about saving anything other than the sync data in the sync file. Would it make sense to save it to its own separate config file? Like I said, I'm on the fence about it and still trying to wrap my brain around it all.

Todd Giles

Dec 16, 2014, 11:39:22 AM
to lightsh...@googlegroups.com
I believe we should definitely save the data used to check that the sync file is still valid in the sync file itself, for internal consistency of that file.


Tom Enos

Dec 23, 2014, 6:20:53 AM
to lightsh...@googlegroups.com

I have it working with separate per-song configs.  Give me a little time to test it and I'll put it in my repo.  It also has overrides for pwm channels and a sync file generator option as part of synchronized_lights.py that will cache files for a single file or an entire playlist. It also gets rid of that nasty cpu spike at the start of check_sms. I am still working on a way for check_sms and synchronized_lights to share the same playlist data without reloading and saving it all the time, but that is looking like an OOP approach (which I am also working on).

I really need to stop working on several issues at the same time.  But ADHD has its advantages.

And I think I might have GPU_FFT working in a few weeks.

And has anybody looked at the python-twisted module as a possible solution for the network socket interface?

Okay, for the ADHD thing I'm only half that bad.  Well, maybe 2/3 that bad.  Okay, 9/10 that bad :)

Chris Usey

Dec 23, 2014, 8:15:29 AM
to lightsh...@googlegroups.com
What did you find as the cause for the SMS spike ?

Chris Usey

Tom Enos

Dec 23, 2014, 3:57:18 PM
to lightsh...@googlegroups.com
It's a combination of Google Voice and the startup of the script itself.
If you change bin/check_sms to only start it once,
then change py/check_sms.py to loop and pause,
you still get a cpu spike but it is not as bad.  There was a large spike as the script started, bash starting python (I saw the cpu usage jump up 30-40% for a split second then drop), then the script running its code (cpu usage 10-20% higher for about 2-3 seconds).
Putting main in a loop stopped the big jump, but it still goes 10-20% higher every 15 seconds.
To try to cut this down even more I also moved a few things out of main and put them in the global namespace:
moved logging.basicConfig to global, as it only needs to be done once
moved setting playlist_path to global
moved the call to get unknown_command_response to global
commented out cm.set_songs(songs), as it doesn't do anything anyway
renamed main and created a new main with only the argparser in it, and passed the args to the renamed main so that the parser only runs once.  Pretty much anything that only needed to be done once was moved out.

So the renamed main does 5 things + logging, no calls to the configuration manager at all
        # Load playlist from file, notifying users of any of their requests that have now played
        # Parse and act on any new sms messages
        # Update playlist with latest votes
        # Delete all messages now that we've processed them
        # sleeps for 15 seconds before doing it all over again
So with all that it went from an increase of 10-20% to an increase of only 9-18% while not sleeping.
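The restructuring described above (one-time setup outside the loop, argument parsing done once, then a loop that sleeps) might look something like this sketch. It is not the actual check_sms.py: the function names, the --playlist and --interval options, and the max_cycles and sleep hooks (added only so the loop can be exercised in a test) are all assumptions.

```python
import argparse
import logging
import time

# One-time setup lives at module level so it is not repeated every cycle.
logging.basicConfig(level=logging.INFO)


def check_once(playlist_path):
    """Placeholder for one cycle: load playlist, handle messages, save votes."""
    logging.info("checking messages for %s", playlist_path)


def poll(playlist_path, interval=15, max_cycles=None, sleep=time.sleep):
    """Run check_once repeatedly, sleeping between cycles.

    With max_cycles=None this loops forever, as a long-running script would.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        check_once(playlist_path)
        cycles += 1
        sleep(interval)
    return cycles


def main(argv=None, max_cycles=None, sleep=time.sleep):
    """Parse the arguments once, then hand them to the polling loop."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--playlist", default="playlist.txt")
    parser.add_argument("--interval", type=float, default=15)
    args = parser.parse_args(argv)
    return poll(args.playlist, args.interval, max_cycles, sleep)
```

Calling main() with no arguments would run indefinitely; passing max_cycles=3 and a no-op sleep makes it return after three cycles, which is handy for measuring the cost of a single cycle.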

The next thing I did was to use a smaller playlist.  My playlist has 70 songs in it.
So I made a playlist with 3 songs, and now the biggest spike from check_sms is 7%.

So it looks like the size of the playlist has something to do with it, but just checking the messages is a big chunk as well.