On 05.02.16 23.26, Lynn McGuire wrote:
> 2. many of the DataItem instances are exactly alike since they are
> snapshots of a user's workspace
You really perform a deep copy to take a snapshot of your data, and you
do this quite often?
No wonder you run out of memory.
I have similar requirements: an application that takes a snapshot of
the entire database on each user action to ensure data consistency. The
snapshot is nothing but a unique UTC time stamp. Each object instance
has a time stamp too. If an object is newer than the snapshot, an older
copy is used instead.
For this to work, foreign keys never refer directly to the target
instance. They are just small key objects.
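A minimal sketch of that time stamp scheme, to make the lookup concrete.
All names here (VersionedStore, Key, Timestamp) are hypothetical, and a
std::string stands in for the real object payload; this is not my actual
application code.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
using Timestamp = std::uint64_t;  // stands in for a unique UTC time stamp
using Key = int;                  // small key object instead of a pointer

class VersionedStore {
public:
    // Store a new version of the object identified by `key`.
    void put(Key key, Timestamp when, std::string value) {
        versions_[std::make_pair(key, when)] = std::move(value);
    }

    // Return the newest version of `key` that is not newer than
    // `snapshot`, or nullptr if no such version exists.
    const std::string* get(Key key, Timestamp snapshot) const {
        // First entry ordered after (key, snapshot).
        auto it = versions_.upper_bound(std::make_pair(key, snapshot));
        if (it == versions_.begin()) return nullptr;
        --it;  // candidate: last entry <= (key, snapshot)
        if (it->first.first != key) return nullptr;  // belongs to another key
        return &it->second;
    }

private:
    // Ordered by key first, then by time stamp.
    std::map<std::pair<Key, Timestamp>, std::string> versions_;
};
```

Taking a snapshot then costs one time stamp, and readers resolve every
key through get() with that time stamp instead of following pointers.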
> 3. all kinds of data: strings, integers, doubles, string arrays, double
> arrays, integer array, strings larger than 300 characters are compressed
> using zlib
The data in your items looks like you are trying to implement
polymorphism with something like a custom union. Switching to C++
polymorphism could save memory. It looks a bit over-engineered to store
an #Int in some hundred bytes.
> 4. DataItems are stored in a hierarchical object system using a primary
> key in DataGroup objects
So you have two levels. The DataGroup and the Key.
> 5. not sure what you are asking
> 6. no database backend
> 7. no concurrency (yet)
You probably never want to implement multi-threading with this data model.
> 8. when the storage used is 1.5 GB, the memory leakage is 10 MB (observed)
> 8a. the lifetime of the objects is controlled by the user by opening a
> file or closing a file
> 9. I think that it is DataItems but will not know for sure until
> completion of the current deduplication project
A memory profiler should give you the answer.
> Here is part of the declaration for the DataItem and DesValue classes. There are no member variables in the ObjPtr class.
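sizeof(ObjPtr) will not be zero, because C++ forbids zero-sized objects.
As an empty base class it usually adds nothing to DataItem, though,
unless it declares virtual functions.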
> class DataItem : public ObjPtr
> {
> private:
>
> int datatype; // Either #Int, #Real, #String or #Enumerated
Polymorphism?
> int vectorFlag; // Flag indicating value contains an Array.
Using one field for all flags rather than individual integers saves memory.
> int descriptorName; // name of Corresponding DataDescriptor
Redundant? You have DataDescriptor * below.
> // DataGroup * owner; // The DataGroup instance to which this
> item belongs
Is this not always part of caller context?
> std::vector <DataGroup *> owners; // The DataGroup instance(s) to
> which this item belongs
? - Either this or the above is redundant.
If you do not need the uplinks for your business logic, a reference
counter might be enough.
> DesValue * inputValue; // DesValue containing permanent input value
> DesValue * scratchValue; // DesValue containing scratch input value
Do not clutter the data store with intermediate data used for editing.
Use subclassing or shadow instances for this purpose. I guess only a few
instances have meaningful differences between inputValue and scratchValue.
By the way, what about the memory consumption of the DesValue objects?
They are bulky too.
> int writeTag; // a Long representing the object for purposes of
> reading/writing
No idea. Maybe this is also better off as part of a mutable subclass.
> int unitsClass; // nil or the symbol of the class
> std::string unitsArgs; // a coded string of disallowed units
What about the storage of the strings behind this?
> std::map <int, std::vector <int> > dependentsListMap;
And this one could grow really large.
The nested structure is really not recommended. At least std::multimap
should be better.
Depending on the number of entries, std::map is not the best choice. It
allocates a node for each entry. If the number of entries is quite
small or if it changes rarely, a sorted vector performs better.
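A rough sketch of the sorted vector alternative, assuming the map goes
from an int key to a list of int dependents as in the declaration above.
DependentsList and its member names are made up for illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical flat replacement for std::map<int, std::vector<int>>.
using Entry = std::pair<int, int>;  // (key, dependent)

class DependentsList {
public:
    // Insert while keeping the vector sorted by (key, dependent).
    void add(int key, int dependent) {
        Entry e(key, dependent);
        auto pos = std::upper_bound(entries_.begin(), entries_.end(), e);
        entries_.insert(pos, e);
    }

    // Collect all dependents of `key` using a binary search.
    std::vector<int> dependentsOf(int key) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), key,
            [](const Entry& e, int k) { return e.first < k; });
        std::vector<int> out;
        for (; it != entries_.end() && it->first == key; ++it)
            out.push_back(it->second);
        return out;
    }

private:
    std::vector<Entry> entries_;  // one contiguous allocation
};
```

Note that this version returns the dependents of a key in sorted order,
not in insertion order, and that inserts are O(n). It pays off only if
the list is small or rarely changes.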
> DataDescriptor * myDataDescriptor;
See above.
> BOOL scratchChangedComVector; // if the scratch value was changed
See flags above.
> virtual int isDataItem () { return true; };
Your class has virtual functions anyway, so using subclasses for the
different types is straightforward.
> class DesValue : public ObjPtr
> {
> public:
>
> int datatype; // Either #Int, #Real, or #String.
Again hand-made polymorphism.
> int vectorFlag; // Flag indicating value contains an Array.
> int optionListName; // name of the option list item
> int * intValue; // Either nil, an Int, a Real, a String, or an
> Array thereof.
> double * doubleValue;
> std::string * stringValue;
> std::vector <int> * intArrayValue;
> std::vector <double> * doubleArrayValue;
> std::vector <std::string> * stringArrayValue;
> unsigned char * compressedData;
> unsigned long compressedDataLength;
> std::vector <unsigned long> uncompressedStringLengths;
Really bad design. If there are many objects of this type, then you know
where your problem is.
Use subclasses with only one of the fields each and avoid the additional
allocations for the referenced value objects.
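Roughly what I mean, as a sketch only. Value, IntValue, StringValue and
IntArrayValue are invented names, and FatValue merely mirrors a few of
the quoted fields so the sizes can be compared.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: one virtual interface, no payload.
struct Value {
    virtual ~Value() = default;
    virtual std::string toString() const = 0;
};

// Each subclass stores exactly one kind of payload, by value.
struct IntValue final : Value {
    int value;
    explicit IntValue(int v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
};

struct StringValue final : Value {
    std::string value;
    explicit StringValue(std::string v) : value(std::move(v)) {}
    std::string toString() const override { return value; }
};

struct IntArrayValue final : Value {
    std::vector<int> values;
    explicit IntArrayValue(std::vector<int> v) : values(std::move(v)) {}
    std::string toString() const override {
        std::string out;
        for (int v : values) out += std::to_string(v) + " ";
        return out;
    }
};

// For comparison: a cut-down version of the "one pointer per possible
// type" layout. Every instance pays for all pointers, plus a separate
// heap allocation for whichever one is actually used.
struct FatValue {
    int datatype;
    int* intValue;
    double* doubleValue;
    std::string* stringValue;
    std::vector<int>* intArrayValue;
    std::vector<double>* doubleArrayValue;
    std::vector<std::string>* stringArrayValue;
};
```

An owner would hold a std::unique_ptr<Value> and never needs the
datatype field, since the dynamic type carries that information.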
> int isTouched; // Flag indicating if value, stringValue, or units
> have been modified since this DesValue was created. Set to true by
> setValue, setString, setUnits, and convertUnits.
> int isSetFlag; // Flag indicating whether the contents of the
> DesValue is defined or undefined. If isSet is false, getValue returns
> nil despite the contents of value, while getString and getUnits return
> the empty string despite the contents of stringValue and units.
Join all flags into one field.
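For example, something along these lines. The enumerator names are
borrowed from the quoted members, but the Flags class itself is only an
illustrative sketch.

```cpp
#include <cstdint>

// One bit per flag instead of one int per flag (names are illustrative).
enum Flag : std::uint8_t {
    VectorFlag = 1u << 0,  // value contains an array
    Touched    = 1u << 1,  // modified since creation
    IsSet      = 1u << 2   // contents are defined
};

class Flags {
public:
    // Set or clear a single flag.
    void set(Flag f, bool on = true) {
        bits_ = static_cast<std::uint8_t>(on ? (bits_ | f) : (bits_ & ~f));
    }

    // Query a single flag.
    bool test(Flag f) const { return (bits_ & f) != 0; }

private:
    std::uint8_t bits_ = 0;  // room for eight flags in one byte
};
```

Three ints shrink to one byte this way, although padding may eat part
of the gain unless the byte sits next to other small members.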
> int unitsValue; // current string value index in $UnitsList
> (single or top)
> int unitsValue2; // current string value index in $UnitsList
> (bottom)
> std::string errorMessage; // message about last conversion of
> string to value
This again looks like transient data that should not be part of the
persistent data model.
> std::string unitsArgs; // a coded string of disallowed units
Adjusting the data model would probably save at least 50% of memory even
without deduplication. The DesValue objects in particular are
unnecessarily bulky. If you also stop creating excessive deep copies of
the structures, you should be fine. No need for deduplication as far as
I can see. Copy-on-write (COW) should be sufficient.
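A minimal single-threaded COW sketch, assuming std::shared_ptr is
available to you. Cow, read(), write() and sharesWith() are hypothetical
names, and the use_count() check is not safe with concurrent writers.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write wrapper: copies share one payload until
// one of them is written to.
template <typename T>
class Cow {
public:
    explicit Cow(T value) : data_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies.
    const T& read() const { return *data_; }

    // Write access detaches from other holders first, if there are any.
    T& write() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<T>(*data_);
        return *data_;
    }

    // True while both wrappers still point at the same payload.
    bool sharesWith(const Cow& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<T> data_;
};
```

A snapshot is then just a copy of the Cow handles. Only the items the
user actually edits afterwards get duplicated.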
It looks like you are trying to reinvent Excel. Well, considering the
speed, its design is probably similar. ;-)
Marcel