OK with still being a rookie, I haven't yet used up my allocation of
'stupid questions', so here comes one.
So I have a large number of rows to insert into Caché. I defined a class
for this and called it from C# using ADO.NET. I generated a wrapper
class from Caché for .NET, instantiated a new object, set the values
and called .Save() on it. I managed to get in about 500k rows in
15 minutes. I tried to speed this up by having a static method that
would take in the parameters and create the Caché object itself
using %New.
However, I have just discovered the power of globals, and it seems a
lot faster for me to call a static method with the parameters and just
insert them into the globals myself. I have the globals structured with
subscripts, of course.
1) So, is this the best way? That is question 1.
2) If I need to load data from the globals based on a query, what is
the best and fastest way to do this?
3) If I want to use a class to cast the globals back to the class: I
heard there is a 'Storage' definition on a class that lets you map
globals back to properties in a class. How do I do this?
4) And last but not least: I have a float in a class, and when I
generate the .NET wrapper classes it makes this a double. So when I
set the value to, say, 1.1234 it comes in as 1.1238076352 - one of the
problems of going to a double. How do I overcome this, considering
that I use financial data and want it to come in as 1.1234 and nothing
else?
Sorry for the long list but any help is appreciated ... rgds M
--
List: http://groups.google.com/group/intersystems-public-cache
Devcon: March 21 - 24, Orlando, Fl. Register before March 1 for $300 early-bird discount.
As I have never used .NET with Caché, I can only answer some of your
questions.
> However, I have just discovered the power of globals, and it seems a
> lot faster for me to call a static method with the parameters and just
> insert them into the globals myself. I have the globals structured with
> subscripts, of course.
In the scenario you described, it's better not to use direct global
access. Let Caché do the metadata handling for you. That means if you
described your project with classes, use SQL or object access. There are
very rare scenarios where you would use direct global access.
Imagine that, years after you closed your project, someone defines an index
in your class. Direct global access won't update this index. That's
not what you want.
> 2) If I need to load data from the globals based on a query what is
> the best and fastest way to do this.
Try inserting the data via SQL; it is nearly as fast as direct global
access.
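As an illustration, an embedded-SQL insert wrapped in a class method might look like this (the Test.SharePrice class and its columns are made up for the example):

```objectscript
ClassMethod InsertPrice(Company As %String, TradeDate As %Date, High As %Float, Low As %Float) As %Integer
{
    // Embedded SQL maintains indexes and other class metadata for you,
    // unlike direct global sets.
    &SQL(INSERT INTO Test.SharePrice (Company, TradeDate, High, Low)
         VALUES (:Company, :TradeDate, :High, :Low))
    // SQLCODE is 0 on success
    Quit SQLCODE
}
```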
Regards
Alex
On Feb 20, 11:51 pm, Mighty Abhabelle <mightyabhabe...@gmail.com>
wrote:
> hi all,
> So I have a large number of rows to insert into Caché. I defined a class
> for this and called it from C# using ADO.NET. I generated a wrapper
> class from Caché for .NET, instantiated a new object, set the values
> and called .Save() on it. I managed to get in about 500k rows in
> 15 minutes. I tried to speed this up by having a static method that
> would take in the parameters and create the Caché object itself
> using %New.
When you insert a large number of rows/instances into Caché via a block
of code, the process starts slowing down because the indexes have to be
updated on every insert; that is what slows the process down.
HTH
Regards
Sukesh Hoogan
Bombay, India
- Enterprise Resource Planning
- Business Intelligence
- Financial Accounting
- Offshore Development
^SharePriceD(Company,Date,Hour,Min), and within each minute I would store a
record that has the high and low for that minute and the volume
traded. So if I need to pull out a certain day, for example, I can use
the subscripts. However, when I put these details into a class I get a
global that just has one big record block in it, like:
^SharePriceD(1) = $lb("","IBM","2010-01-03","22","21",1.4301,1.4305,"1762763")
It just has one big list. If I take this approach I will end up with a
huge global with loads of records at the one level. I was thinking of
creating a class called Company, then SharePriceDate, and letting a
PriceInfo class inherit from them to see whether this would set the
globals up any better. At the moment I can't see the advantage
of Caché over SQL Server when all the data is stored as rows... or am
I still missing something? I thought the way the data is stored in
globals was the secret.
OK, so yes, the point that was made not to intermix globals and SQL
makes sense. I was just looking for speed, I suppose, but then want to
use the nice classes later. I was going to have a class that would simply
provide methods to interact with the globals... so CreateSharePrice
might take in the data and add the info to a global, and when I query
it by calling a method like GetShareInfoForCompany(CompanyName, Date)
it would take the data from the globals and return it to the
caller... but then again I'm not sure I can get any gains.
So I would be happy, I suppose, to continue with the classes if someone
could tell me why my globals aren't structured when I call the save on
the .NET-generated class, and I end up with one big global of rows.
thanks Mighty
Sounds to me like you want to optimise for throughput... welcome to the
dark side.
Basically you have a large number of records you want to insert as
fast as possible into the database.
Will you have people "logged in" using this area of your application
during the insert?
If not, you can always insert your data and then purge and rebuild your
indexes. In that case it doesn't matter that you use direct global
access.
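A sketch of that purge-and-rebuild step, assuming a hypothetical persistent class Test.SharePrice (the %PurgeIndices and %BuildIndices methods are inherited from %Persistent):

```objectscript
 // drop all existing index entries for the extent
 Do ##class(Test.SharePrice).%PurgeIndices()
 // ... do the fast direct-global (or batched) inserts here ...
 // regenerate every index in a single pass at the end
 Do ##class(Test.SharePrice).%BuildIndices()
```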
You could look at dividing up the problem. One of the bottlenecks is
going to be the database connection.
Carve up your insert data and load it via multiple C# processes (not
threads) to push the data in.
You could look at batch loading an array of your insert objects via a
static method, for example passing 100 objects across at a time. When
you do this you are aggregating a series of global set commands,
so you can look at turning off sorting. Basically, when Caché inserts
data into globals, every entry is sorted. You can use the temp database
to queue up a series of set operations and then have the system do the
sort and commit at the end.
Have a look at the functions $SORTBEGIN and $SORTEND.
With these you can easily double throughput. Be careful not to fill your
temp database with too large a dataset; I would pre-expand the
temp database before running the processing.
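A minimal sketch of the pattern, with made-up data; the second argument of $SORTEND (1) tells Caché to complete the operation and commit the buffered sets:

```objectscript
 // start buffering sets to ^SharePriceD in the temp database
 Set rc=$SORTBEGIN(^SharePriceD)
 For i=1:1:100000 {
     Set ^SharePriceD(i)=$ListBuild("IBM","2010-01-03",1.4301,1.4305)
 }
 // sort the buffered nodes and write them to the real global in order
 Set rc=$SORTEND(^SharePriceD,1)
```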
If your data is stored in lists instead of global sub-nodes, this
decreases the number of global sets.
You can set a list node directly, e.g.:
Set ^myGlobal(id)=$ListBuild("abc",123,345)
This used to be about 5 times faster than creating and saving an
average object.
If you have a very complex object with, say, 100 fields and are only
inserting the necessary fields, e.g. 20, you can define a new Caché
persistent class that layers over the same storage as the big object
using a custom storage strategy.
This will get you faster inserts.
If you are bulk loading your data from another database, e.g. MS SQL or
Oracle, have a look at the SQL Gateway and pulling the data from within
Caché, i.e. it may be possible to get the data without a C# layer at all.
With regards to custom storage: when you have a persistent class open
in Studio, from the menu choose:
View -> View Storage
This unhides a block of XML at the bottom of the class. If you have no
fear, just tweak the XML to remap where and what you want to store.
After unhiding the storage definition it should also be visible
in the class Inspector. Select Storage and then Default.
Click the StorageMap (...) button and this opens up the custom storage
wizard. This should give some idea of what is allowed and what is
possible.
For a financial application, if you are always using float you could
look at projecting the property as a string and sub-classing the
projection with a new property of type float:
public float MyNumber
{
    get
    {
        // parse the string-projected inner value back to a float;
        // result is left at 0 if the stored string is not a valid number
        float result;
        float.TryParse(inner.myvar, out result);
        return result;
    }
    set
    {
        inner.myvar = value.ToString();
    }
}
On the Caché side, during loading you can put a + sign in front of a
variable to turn it into a number when you store it.
As you'll notice, there are two basic types in Caché globals: strings
and numbers.
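For example (a throwaway global name used purely for illustration):

```objectscript
 Set val="1.1234"      // arrives from the client as a string
 Set ^Price(1)=+val    // unary + stores the numeric value 1.1234
```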
Just out of interest, is the property defined as type %Float in the
Caché class definition? I notice that %Float's XSDTYPE is double;
I wonder if this is a bug.
You can override this in your class's property definition:
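For example, assuming a property named Price (XSDTYPE is a class parameter of the %Float datatype):

```objectscript
Property Price As %Float(XSDTYPE = "float");
```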
Also, if you use obj.i%PropertyName access instead of obj.PropertyName,
Caché will not bother with the validation code that is part of the %Float
datatype class, which speeds up the %Save method.
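For example, inside a method of the class itself (the i% instance-variable syntax is only valid within the class's own methods; the Price property is an assumption):

```objectscript
Method SetPriceFast(val As %Float)
{
    // writes the instance variable directly, skipping the generated
    // property-set method and its datatype validation
    Set i%Price=val
}
```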
On a side note, I was wondering whether there is any interest in a new
C# projection that saves you having to pass a database connection
across with every method call, i.e. a static connection factory
that looks in the app.config connection strings for your Caché
connection; passing the connection everywhere clutters up the C#
implementation code.
Something where you can add code attributes on the Caché class side:
* to project methods as constructors
* re-badge methods with longer / meaningful names
* re-badge methods to look like overloads
* project documentation into the generated C# stubs
* something that implements destructors (~) and IDisposable, to
support dereferencing in garbage collection and using statements
respectively.
Maybe you would like to work in the opposite direction, with C#
attributes to decorate classes and a tool that builds the Caché class
definitions for you.
Hope this helps.
Cheers
Alex
I think that by default the storage strategy is to use a list for the
data.
This results in a single operation when reading the object's data from
disk or saving it.
You could change this for various reasons.
* A list can only be up to 32K and you may want to customise the
storage if you have a class with many string properties that would
exceed this limit.
* Maybe there is a part of a class that changes often but you don't
want to re-save the entire object on each update.
* Maybe you wish to ensure that all leaf nodes are always under 8K for
good ECP scaling.
You may decide on a Data Access Layer within cache. For example you
may be building an API and wish to expose your data via objects,
however you don't want every Tom, Dick and Harry creating, deleting
and modifying your data.
Do you really want %New, %OpenId, %DeleteId to be projected?
In this case what you can provide is all access to your data via
%Registered types.
ClassMethods and Registered types are projected to C# but data can
only be modified via your tightly controlled object interface. Ie: You
map between:
* external registered types and internal persistent types
* or external registered types and internal globals.
* or a hybrid of the two.
I have some sample classes for you to consider from your earlier
questions:
Class Test.City Extends %Persistent
{
Property Name As %String;
Property Longitude As %Float;
Property Latitude As %Float;
}
Class Test.Address Extends %Persistent
{
Property StreetName As %String;
Property City As Test.City;
Property PostCode As %String;
}
Class Test.Person Extends %Persistent
{
Property Forename As %String;
Property Address As Test.Address;
ClassMethod GetPeopleInCity(City As %String) As %Library.ListOfObjects
{
    // create a list to return instances of Person
    set ret=##class(%ListOfObjects).%New()
    // declare a cursor for the query - look, no joins :)
    // would normally want to clean up the City variable to prevent
    // SQL injection issues etc.
    &SQL(DECLARE PeopleInCity CURSOR FOR
        SELECT %ID
        FROM Test.Person p
        WHERE p.Address->City->Name = :City)
    &SQL(OPEN PeopleInCity)
    if (SQLCODE=0) {  // if the cursor opens ok
        // for each record get the id
        For {
            &SQL(FETCH PeopleInCity INTO :id)
            Quit:SQLCODE'=0
            // open each person with no concurrency locking
            do ret.Insert(##class(Test.Person).%OpenId(id,0))
        }
    }
    // close the cursor when done
    &SQL(CLOSE PeopleInCity)
    // return the list of people matching the criteria
    quit ret
}
}
/// create some test data
set obj=##class(Test.Person).%New()
set obj.Forename="John"
set obj.Address=##class(Test.Address).%New()
set obj.Address.StreetName="Regent Street"
set obj.Address.City=##class(Test.City).%New()
set obj.Address.City.Name="London"
set obj.Address.PostCode="WC1 5ER"
do obj.%Save()
do obj.%Close()
/// try out the new method
set mylist=##class(Test.Person).GetPeopleInCity("London")
set key=""
set myperson=mylist.GetNext(.key)
write myperson.Forename
write myperson.Address.City.Name
/// note that because of lazy loading the City object is not swizzled
/// from disk until I try to reference it.
Caché classes are going to give you four big gains over SQL Server:
1) It doesn't have to be relational. If you are a C# specialist and
model objects, your storage need not be any different. E.g. a Person
instance contains an Address reference, which contains a City reference,
which has a Name.
2) You can use object SQL in a similar way to LINQ on objects in C#,
i.e. no joins necessary.
For example the GetPeopleInCity method: the query runs internally
in Caché, and what is returned to C# is an actual list of Person
objects, not a flat resultset. If only it were a List<Test.Person>...
3) Inheritance - a common pattern with the relational model is using a
base table and then joining "regional" or specialised tables onto
it.
In Caché this isn't necessary - you just inherit from the base class. For
example you could have a type Employee that extends Person but has an
additional property of "work address".
4) When you subclass, e.g. with Employee, you can override methods from
the Person base to alter their behaviour. I don't recall other
databases having polymorphic stored procedures...
Hope this helps
Cheers
Alex
PS... nice find rtweed... wish I had it a week ago LOL, but it's great.
On Feb 21, 11:55 pm, Alex Woodhead <alexatwoodh...@googlemail.com>
wrote: