Need to import data from Excel (.xls) or (.csv) to cache

1,302 views
Skip to first unread message

Ricky

unread,
Mar 13, 2016, 12:21:38 PM3/13/16
to Caché, Ensemble, DeepSee
I have a requirement to read excel sheet or .csv file and fetch the data from some of the columns and store it in the cache database. Could you please help me in implementing this task. I would be happy if you could provide me some sample code.

Thanks

Dmitry Maslennikov

unread,
Mar 13, 2016, 12:46:15 PM3/13/16
to Caché, Ensemble, DeepSee
Hello Ricky,

XLS and CSV files are too different in a format. While XLS it is a binary format, even newest XLSX still binary, which you should repack to get plain xml.
CSV much easier, it is just a text file, which you can read line by line. But in this case also possible difference in delimiters, some time it may be comma, or even just tab letter.
And some data here could be quoted.
    set delimeter=","
    Set stream=##class(%Stream.FileCharacter).%New()
    Set sc=stream.LinkToFile("c:\myfile.csv")
    #;first line usually should contain names for columns
    Set headLine=$listfromstring(stream.ReadLine(), delimeter)
    While 'stream.AtEnd {
        Set line=$listfromstring(stream.ReadLine(), delimeter)
        
        #; here you can proceed this data, by $listget(line, 1) for first column or $listget(line, 2) for second one
        
    }

On Sun, Mar 13, 2016 at 5:21 PM Ricky <prabh...@gmail.com> wrote:
I have a requirement to read excel sheet or .csv file and fetch the data from some of the columns and store it in the cache database. Could you please help me in implementing this task. I would be happy if you could provide me some sample code.

Thanks

--
--
Caché, Ensemble, DeepSee

---
You received this message because you are subscribed to the Google Groups "Caché, Ensemble, DeepSee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to intersystems-publi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Dmitry Maslennikov
Lead Programmer at LETOGRAF
Skype: DAiMor

Brendan Bannon

unread,
Mar 13, 2016, 5:59:24 PM3/13/16
to intersystems...@googlegroups.com

It is also possible to define the XLS as a table in Excel and then use ODBC to query it.

 

To query the data, you will need to define a name for the data in Excel.  The way to do this

differs between Excel versions, but you will need to choose "Name", and then

"Define" from either the Insert Menu or the button in the top left of the worksheet grid.

 Enter a name, and choose the cells containing the data to create a table.  The top cell of each

column will then become the column name for your query.

 

If you have problems getting this setup the WRC should be able to help you.

 

Brendan

Mikko Laitamäki

unread,
Mar 14, 2016, 8:22:23 AM3/14/16
to intersystems...@googlegroups.com
Dmitry's solution is simple and works fine in most of the times,
but is not working if your data contains commas or newlines etc as
https://tools.ietf.org/html/rfc4180 describes.

If you need full support I'dont know any implementation for that in mumps,
but for java and javascript or dotNet there is plenty to choose from.


Mikko
> <mailto:intersystems-publi...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.
>
> --
> *Dmitry Maslennikov*
> Lead Programmer at LETOGRAF <http://letograf.com/>
> Skype: DAiMor <http://skypedaimor.blogspot.com/>
> about.me/dmitry.maslennikov
>
> <http://about.me/dmitry.maslennikov>
> --
> --
> Caché, Ensemble, DeepSee
>
> ---
> You received this message because you are subscribed to the Google
> Groups "Caché, Ensemble, DeepSee" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to intersystems-publi...@googlegroups.com
> <mailto:intersystems-publi...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.


--
Mikko Laitamäki
mikko.l...@mediware.fi 040-7772109

Tom Fitzgibbon

unread,
Mar 14, 2016, 9:25:37 AM3/14/16
to intersystems...@googlegroups.com
You can generate CSV files from Excel spreadsheets via program calls (lots of free utilities available with output options), then pick the CSV apart using Cache. Fairly easy. We've taken apart just about all file formats over the years in Cache.

Tom Fitzgibbon | Multidata | 212-967-6700 x537



To unsubscribe from this group and stop receiving emails from it, send an email to intersystems-publi...@googlegroups.com.

Jamie Prevo

unread,
Mar 14, 2016, 12:30:08 PM3/14/16
to intersystems...@googlegroups.com

If you need to automate this somehow, I would recommend a VB script to convert your excel file to a tab delimited file (commas cause problems) and then import the tab delimited file into the cache database.

Prasad Kari

unread,
Mar 14, 2016, 12:30:08 PM3/14/16
to intersystems...@googlegroups.com
It can be easily done from import wizard( Cache Management Portal/ SQL page)

HTH,
PK

On Mar 13, 2016, at 9:21 AM, Ricky <prabh...@gmail.com> wrote:

I have a requirement to read excel sheet or .csv file and fetch the data from some of the columns and store it in the cache database. Could you please help me in implementing this task. I would be happy if you could provide me some sample code.

Thanks

--

Peterm

unread,
Apr 27, 2016, 7:46:41 AM4/27/16
to Caché, Ensemble, DeepSee
Have you considered using the cache Activate wizard to expose the native Excel methods to cache?
 
You can then open an instance of the Excel application in cache and read the data directly. For example if you import the Excel Activate classes into a package called Office you can do things like:
 
set oExcel=##class(Office.Excel.Application.%New()
set oWorkbook=oExcel.Workbooks.Open("c:\MyFile.xls")
set oWorkSheet=oExcel.Workbook.Worksheets.ItemGet(1)  ; for 1st worksheet or you can use a name here for anamed worksheet
set oWorksheet=oWorkSheet.Become("Office.Excel.uWorksheet")  ;this is required to access the properties and methods of the uWorksheet class
 
you can then access the cloumns and rows of the worksheet you can get either the text ie displayed value of a cell or the underlying Excel value

an...@fiestahms.com

unread,
Apr 27, 2016, 8:21:53 AM4/27/16
to intersystems...@googlegroups.com
Hi Peter,
Just a query, do these classes created from Activate Wizard work on 64 bit windows as well?  On version 2010 it does not.
Regards
Anil
--

Roberto

unread,
Apr 27, 2016, 8:49:56 PM4/27/16
to Caché, Ensemble, DeepSee
I use CSV myself but I set up Excel to use the vertical pipe symbol as the separator since you don't normally see that in normal business data.

You set the character that you want as the separator in control panel in Windows. I'm pretty sure there's a way to do it also in Mac.

-Roberto

Cristiano Silva

unread,
Apr 28, 2016, 12:53:01 PM4/28/16
to Caché, Ensemble, DeepSee
Hi anil,

To use Acitvate x64 you need the Caché 2014.1 or higher.

Regards.
Cristiano José da Silva.
To unsubscribe from this group and stop receiving emails from it, send an email to intersystems-public-cache+unsub...@googlegroups.com.

Joel Solon

unread,
Apr 28, 2016, 4:08:31 PM4/28/16
to intersystems...@googlegroups.com
Is there a reason you can’t use the Data Import Wizard? You can use it interactively in the SQL section of the Management Portal, or programmatically.
 

--
--
Caché, Ensemble, DeepSee

---
You received this message because you are subscribed to the Google Groups "Caché, Ensemble, DeepSee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to intersystems-publi...@googlegroups.com.

DollarQ

unread,
May 1, 2016, 7:03:40 PM5/1/16
to Caché, Ensemble, DeepSee
Do you have examples of doing the import and export programmatically? I would like to see a working example if you have one.

Tony

unread,
May 2, 2016, 9:50:09 AM5/2/16
to Caché, Ensemble, DeepSee
Hello Ricky

Did you know a Microsoft Excel spreadsheet that has a 'xlsx' file extension is actually a compressed folder structure, comprised of a collection of XML files?  If you change the 'xlsx' extension to 'zip' you will be able view the data content, see example below.  You can't do this for earlier versions of Excel (i.e. 'xls'), as this is a proprietary binary format.

C:\Users\Tony\Desktop\Sample_XL_SSheet_XML.zip
|   [Content_Types].xml
|
+---_rels
|       .rels
|
\---xl
    |   styles.xml
    |   workbook.xml
    |
    +---worksheets
    |       sheet1.xml
    |
    \---_rels
            workbook.xml.rels

You can read more about Office Open XML, also known as OpenXML or OOXML, in the following web link.

http://www.officeopenxml.com/
- WordprocessingML (docx)
- SpreadsheetML (xlsx)
- PresentationML (pptx)
- DrawingML

If you're interested, the schemas for OOXML can be found on the following web site.

https://svn.cesecore.eu/svn/signserver/vendor/openxml4j/current/src/schemas/

You can use the 'XML Schema Wizard' add-in within Caché Studio to create the necessary class definitions.  I would recommend you download all these schemas onto your workstation first, as the 'sml-sheet.xsd' schema for example references other schemas using a relative path (i.e. all in the same directory level).

Once you've extracted the contents into a folder structure, you can bring the data into Caché using the following code.

Set reader = ##class(%XML.Reader).%New()
Set sc=reader.OpenFile("C:\Users\Tony\Desktop\[your unzipped xlsx file]\xl\worksheets\sheet1.xml")
Do reader.Correlate("worksheet","OOXML.SML.Worksheet")
Write reader.Next(.worksheet,.sc)
Do worksheet.XMLExport(,",indent")

Unfortunately, I do not have any sample code for you to use, but perhaps someone may be willing to help.

Tony

PS: I'm actually doing this process the other way, creating an 'XLSX' file from a global array work file (merging in standard template data), see below.

^OOXML("Templates",some_key,"OOXML.Types")
^OOXML("Templates",some_key,"OOXML.Types","Default",1,"ContentType") = application/vnd.openxmlformats-package.relationships+xml
^OOXML("Templates",some_key,"OOXML.Types","Default",1,"Extension") = rels
^OOXML("Templates",some_key,"OOXML.Types","Default",2,"ContentType") = application/xml
^OOXML("Templates",some_key,"OOXML.Types","Default",2,"Extension") = xml
^OOXML("Templates",some_key,"OOXML.Types","Override",1,"ContentType") = application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml
^OOXML("Templates",some_key,"OOXML.Types","Override",1,"PartName") = /xl/workbook.xml
^OOXML("Templates",some_key,"OOXML.Types","Override",2,"ContentType") = application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml
^OOXML("Templates",some_key,"OOXML.Types","Override",2,"PartName") = /xl/worksheets/sheet1.xml
^OOXML("Templates",some_key,"OOXML.Types","Override",3,"ContentType") = application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml
^OOXML("Templates",some_key,"OOXML.Types","Override",3,"PartName") = /xl/styles.xml

Method CreateContentTypes() As %Status
{
    // populate 'OOXML.Types' object
    S name="OOXML.Types"
    I '$D(^||OOXML(%ID,"Class",name)) Q $$$OK
    S sc=..BuildObject(name, $NA(^||OOXML(%ID,"Class",name)), .types)
   
    // create directory path and write '[Content_Types].xml' file
    S filepath=..PathName
    I '##class(%File).CreateDirectoryChain(filepath) Q $$$ERROR($$$GeneralError,"Cannot create directory chain")
    S file=##class(%XML.FileCharacterStream).%New()
    S file.Filename=filepath_"/[Content_Types].xml"
    S sc=file.WriteLine("<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>") I $$$ISERR(sc) Q sc
    S ns=##class(%XML.Namespaces).%New()
    S sc=types.XMLExportToStream(file,"Types",..ExportFormat,ns) I $$$ISERR(sc) Q sc
    S sc=file.%Save() I $$$ISERR(sc) Q sc
   
    // increment file counter
    S ..Count=..Count+1   
    Q $$$OK
}

ClassMethod BuildObject(ClassName As %String, Array As %String, ByRef Object As %RegisteredObject = "") As %Status
{
    // instantiate class definition and create new object
    S cdef=##class(%Dictionary.ClassDefinition).%OpenId(ClassName,,.sc)
    I $$$ISERR(sc) Q sc
    S Object=$ZOBJCLASSMETHOD(ClassName,"%New")
   
    // walk though properties in class
    S count = cdef.Properties.Count()
    F i=1:1:count {
        S prop=cdef.Properties.GetAt(i)       
        S data=$D(@Array@(prop.Name))    // get data for property
        I 'data CONTINUE        // no data defined
        I data=1 S $ZOBJPROPERTY(Object,prop.Name)=@Array@(prop.Name) CONTINUE
        I prop.Collection="" {
            S sc=..BuildObject(prop.Type, $NA(@Array@(prop.Name)), .ObjectSub)
            S $ZOBJPROPERTY(Object,prop.Name)=ObjectSub
        } ELSE {
            S key="" F  {
                S key=$O(@Array@(prop.Name,key)) I key="" Q
                S sc=..BuildObject(prop.Type, $NA(@Array@(prop.Name,key)), .ObjectSub)
                S ObjectColl=$ZOBJPROPERTY(Object,prop.Name)
                I prop.Collection="list" S sc=$ZOBJMETHOD(ObjectColl,"InsertAt",ObjectSub,key)
                I prop.Collection="array" S sc=$ZOBJMETHOD(ObjectColl,"SetAt",ObjectSub,key)
            }
        }
    }
   
    // all done
    Q $$$OK

}

On Sunday, 13 March 2016 16:21:38 UTC, Ricky wrote:

Aleksandr Kalinin

unread,
May 18, 2017, 1:41:48 AM5/18/17
to Caché, Ensemble, DeepSee
Hello Piter! What dll do I need to expose if I have MS Office 2010 installed? I was searching in list of Active Wizard something with "excel" word, but there is nothing..

среда, 27 апреля 2016 г., 14:46:41 UTC+3 пользователь Peterm написал:
Reply all
Reply to author
Forward
0 new messages