OPeNDAP & iRODS


Pauline Mak

Jun 8, 2010, 12:23:04 AM
to irod...@googlegroups.com
Hi all,

Has there been any progress on OPeNDAP and iRODS integration? I'm
particularly interested in this for the THREDDS Data Server.

Currently, we have data managed by iRODS, but due to the umask we have
to run a cron script that changes the file permissions so that the
user running TDS can read these files from disk. With millions of files
under those directories, it

The alternative was to run an iRODSFUSE mount with TDS running on top.
This would remove the need for the cron job.

The contents of the directories are likely to change as well, and I can
see that there is a warning in client/fuse in the iRODS source:

"2) When a collection is mounted using irodsFs, users should not use
iCommands such as iput, irm, icp, etc to change the content of the
collection because the FUSE implementation seems to cache the attributes
of the contents of the collection."

We also have processes on our servers using icommands such as icp and
imv for dealing with contents of those directories.

Note that we will only offer read access to TDS, so could this still
work if, say, we re-mounted those collections using iRODSFUSE once a day?

Cheers,

-Pauline.

--
Pauline Mak

Assistant Manager, ARCS Data Services
Ph: +61 3 6226 7518
Mob: +61 411 638 196
Email: pauli...@arcs.org.au
Jabber: pauli...@arcs.org.au
Calendar: http://tinyurl.com/pmak-arcs-calendar
http://www.arcs.org.au/

TPAC
Email: pauli...@utas.edu.au
http://www.tpac.org.au/

Bing Zhu

Jun 8, 2010, 2:22:50 AM
to Pauline Mak, irod...@googlegroups.com
Hello Pauline,

Reagan, Raja, and I just exchanged emails today regarding the integration of
iRODS and OPeNDAP.

My initial feeling is that it would be a better solution to develop an
iRODS driver for the OPeNDAP file system. What you describe here is quite
different from what I had in mind. Could you draw a diagram? It would
definitely help me understand your ideas.

Sincerely,
Bing Zhu
DICE Team

--
"iRODS: the Integrated Rule-Oriented Data-management System; A community
driven, open source, data grid software solution" https://www.irods.org

iROD-Chat: http://groups.google.com/group/iROD-Chat

Arcot Rajasekar

Jun 8, 2010, 8:42:29 AM
to irod...@googlegroups.com, Reagan Moore

Bing
Can you also check with the OOI people? (Matt Rodriguez on irods-chat, who discussed the EC2 on iRODS thread, will be of help.) Getting a diagram from them would also help. I will also check with folks here and see what they need in terms of an OPeNDAP service through iRODS.

thanks
raja

PS: Once we have some idea, we can send a note to irods-chat on the design and request comments before starting any development.



Gareth....@csiro.au

Jun 8, 2010, 6:36:16 AM
to irod...@googlegroups.com
> -----Original Message-----
> From: irod...@googlegroups.com [mailto:irod...@googlegroups.com] On
> Behalf Of Bing Zhu
-snip-

> Hello Pauline,
>
> Reagan, Raja, and I just exchanged emails today regarding the integration of
> iRODS and OPeNDAP.
>
> My initial feeling is it would be a better solution if we can develop an
> iRODS driver for OPeNDAP file system. What you described here is quite
> different from what I am thinking. Would you draw a diagram? It will
> definitely help me understand your ideas.

Hi Bing,

OPeNDAP would not be a (file-system) resource underlying iRODS; the situation is the other way around, and presumably a common one. We want to run a service (OPeNDAP) which serves files (and subsets, transformations and aggregations of files) and have iRODS provide the underlying storage (with all the associated benefits of managing that storage).

In our current setup, OPeNDAP only serves data from files in particular collections. We make sure those collections have a physical replica on a particular server/resource with a filesystem backend, then set up the OPeNDAP service to serve the files directly out of the local filesystem. We have to reset permissions so that the Unix user running the service (not rods; maybe jetty) can read the files to be served, since by default only the rods user has access.
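The permission reset described above can be sketched as a small walk over a replica directory that ORs in read bits (and, for directories, search bits) for group and other. This is a minimal illustration, not the actual ARCS cron script; it assumes the service user (e.g. jetty or Tomcat) gets access through the group/other bits rather than ACLs, and any path you pass it is hypothetical.

```python
import os
import stat

# Read bits a non-owner (e.g. the user running TDS) needs on files.
FILE_BITS = stat.S_IRGRP | stat.S_IROTH
# Directories additionally need the search (x) bit to be traversed.
DIR_BITS = FILE_BITS | stat.S_IXGRP | stat.S_IXOTH


def _ensure_bits(path: str, bits: int) -> bool:
    """OR `bits` into the permission mode of `path`; return True if changed."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return False  # leave symlinks alone
    mode = stat.S_IMODE(st.st_mode)
    if (mode | bits) == mode:
        return False  # already readable: skip the chmod call
    os.chmod(path, mode | bits)
    return True


def grant_read(root: str) -> int:
    """Make everything under `root` readable by group and other.

    Returns the number of files/directories whose mode was changed.
    """
    changed = int(_ensure_bits(root, DIR_BITS))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            changed += _ensure_bits(os.path.join(dirpath, name), DIR_BITS)
        for name in filenames:
            changed += _ensure_bits(os.path.join(dirpath, name), FILE_BITS)
    return changed
```

Usage would be something like `grant_read("/path/to/vault/collection")` (placeholder path). Skipping entries whose mode is already correct means repeated cron runs over millions of files mostly cost one `lstat` per entry rather than a `chmod`.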

One alternative setup would be to use FUSE to make the data available to the server/service (which would then not need to be on the resource server), but we are worried about robustness and performance. Before pursuing this, Pauline is asking whether there are other developments pending, and would like you to explain the notes on iRODS FUSE.

Regards,

Gareth (ARCS data team)

PS. Any web server/service that is based on (data) files in a filesystem could potentially use a similar model, though iRODS response/latency might be too slow for many such applications. In general, adapting each such service to support an iRODS API for getting files will not be feasible (or maintainable).

Arcot Rajasekar

Jun 8, 2010, 12:54:26 PM
to irod...@googlegroups.com
Hi Gareth
What we are looking at is providing a "logical naming" framework for data served through OPeNDAP.
There are three ways to go with it.
1. Have OPeNDAP clients use the iRODS API when getting files. The cheaper way
is to expose iRODS collections as a FUSE mount. Then a client can access
data files that are spread across multiple sites without knowing their locations and
login requirements. Access control, metadata-based discovery, replication, collection organization
and other iRODS features will enhance the OPeNDAP data experience.
2. Have an OPeNDAP server at the back end and just serve the files from there, accessing
them using the OPeNDAP protocol. This is good for get and put of files that are stored in
OPeNDAP servers. It provides functionality similar to (1), but one needs to
download and upload files using iRODS. This may be useful in a super/cluster/cloud-computing setting.
3. Wrap the libdap functions as micro-services and expose them through the iRODS client (as was done
for HDF5). This will enable one to perform DAP subsetting/analysis operations on the
server side without pulling the raw files over the network. One can run these servers on good
computing facilities and use them to perform production runs in a workflow environment, with results
being sent back to the client (or stored in iRODS or an OPeNDAP server) for visualization. What we get
out of this approach is that one can handle large data (and large numbers of data files) without having to
move them to the client. Also, the workflow framework can be used for distributed analysis.

The questions are: what types of scenarios are relevant to the user community? How can one enhance their experience in using OPeNDAP in an iRODS setting? What is the buy-in for a scientist to use this approach? These are the questions we are looking for answers to in order to make informed design decisions. We hope the iRODS user community can help us in this design process. Also, if there are other computational models than those enumerated above, feel free to add them to the mix.


thanks
raja


mw...@diceresearch.org

Jun 8, 2010, 1:42:40 PM
to irod...@googlegroups.com
 
Hello Pauline and Gareth,

Taking everything into consideration, it may not be appropriate to use irodsFs (FUSE)
for your task because of performance and caching issues.

So, what do we need to do to get iRODS to run under OPeNDAP? Is it just a matter
of doing chmod on all the physical files in the vault? In irodsctl.pl there are two parameters,
$DefFileMode and $DefDirMode, that can be used to set the default mode of files
and directories in the vault.

Another way to speed up read access to files in iRODS may be to mount the vault
directory using imcoll. Read access to these files using iCommands should be much
faster because interaction with the iCAT is minimal.

Mike



Bing Zhu

Jun 8, 2010, 6:14:52 PM
to Gareth....@csiro.au, irod...@googlegroups.com

Hello Pauline and Gareth,

 

Your email clarified my questions about your expectation of integration between iRODS and OPENDAP.

In addition to Mike’s suggestions, here are some integration options for files in local storage on the OPeNDAP server side, which is what you are using FUSE for.

 

Option #1. Use FUSE as described in your mail.

 

Option #2. Use ‘imcoll’ as suggested by Mike.

 

Option #3. Use ‘file registration’. Files in the local storage of an OPeNDAP server can be registered into iRODS, for example with a periodic Shell/Perl script. In iRODS, a data replication micro-service can be deployed as a rule that monitors the collection for registered OPeNDAP files; once a new OPeNDAP file is registered into iRODS, the monitoring rule makes replicas (within iRODS) as required. Note that this approach allows OPeNDAP and iRODS to remain two independent ecosystems. There is no performance issue at all, since the OPeNDAP server deals only with its local storage.
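The periodic registration script in Option #3 could look roughly like the sketch below. It only plans the work: it walks a local directory and builds `ireg <physicalPath> <iRODSPath>` command lines for files whose logical path is not yet known to iRODS. The argument order is the basic `ireg` form, but check `ireg -h` on your version for resource and other options. The `already_registered` set is an assumption (in practice it might come from parsing `ils -r` output), the collection path in the test is a placeholder, and the replication rule on the iRODS side is not shown.

```python
import os
from pathlib import PurePosixPath
from typing import List, Set


def plan_registrations(local_root: str, irods_root: str,
                       already_registered: Set[str]) -> List[List[str]]:
    """Return `ireg` command lines for files not yet registered in iRODS.

    local_root: directory the OPeNDAP server serves from (local disk).
    irods_root: collection that mirrors it (hypothetical path).
    already_registered: logical paths already known to iRODS (assumed input).
    """
    base = os.path.abspath(local_root)
    commands = []
    for dirpath, dirnames, filenames in os.walk(base):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            physical = os.path.join(dirpath, name)  # absolute, since base is
            rel_parts = os.path.relpath(physical, base).split(os.sep)
            logical = str(PurePosixPath(irods_root, *rel_parts))
            if logical not in already_registered:
                commands.append(["ireg", physical, logical])
    return commands
```

Printing `shlex.join(cmd)` for each entry gives a dry run; passing each entry to `subprocess.run(cmd, check=True)` would perform the registration. Keeping planning and execution separate makes the cron job easy to inspect before it touches the catalogue.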

 

Option #4. Implement an iRODS storage module for OPeNDAP. This approach amounts to intercepting I/O in the OPeNDAP server so that it performs I/O operations directly against iRODS. Examples of such implementations are the iRODS/SRB storage modules for Fedora and DSpace. In the Fedora case, a Java interface makes it easy to replace the local storage module with an iRODS plug-in module without changing Fedora code. This approach may require some changes to the OPeNDAP server code, at least to the configuration part.
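The plug-in idea in Option #4 boils down to the server reading through an interface instead of calling the filesystem directly. The sketch below shows that shape in Python; `Storage`, `LocalStorage`, `InMemoryStorage` and `read_range` are hypothetical names, and `InMemoryStorage` merely stands in for an iRODS-backed module (a real one would wrap an iRODS client library), so that swapping backends does not touch the server's read path.

```python
import io
import os
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict


class Storage(ABC):
    """Hypothetical storage interface the data server reads through."""

    @abstractmethod
    def open_read(self, path: str) -> BinaryIO:
        """Open the object at `path` for binary reading."""


class LocalStorage(Storage):
    """Today's setup: files on the server's local filesystem."""

    def __init__(self, root: str) -> None:
        self.root = root

    def open_read(self, path: str) -> BinaryIO:
        return open(os.path.join(self.root, path.lstrip("/")), "rb")


class InMemoryStorage(Storage):
    """Stand-in for an iRODS-backed module; a real one would call an iRODS client."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects

    def open_read(self, path: str) -> BinaryIO:
        return io.BytesIO(self.objects[path])


def read_range(storage: Storage, path: str, offset: int, length: int) -> bytes:
    """Server-side byte-range read; depends only on the Storage interface."""
    with storage.open_read(path) as f:
        f.seek(offset)
        return f.read(length)
```

The point of the design is that only the object passed to `read_range` changes between deployments, which is also why, as Pauline notes, each OPeNDAP server implementation would need its own such seam.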

 

Finally, I noticed that OPeNDAP uses URLs. Would it be useful if you could register a collection of OPeNDAP URLs of interest into iRODS, which would then automatically replicate the data inside iRODS? For this, iRODS would need a driver able to access the data from OPeNDAP servers.

 

Let me know.

 

-Bing

 

 

 



Pauline Mak

Jun 8, 2010, 7:31:44 PM
to irod...@googlegroups.com
Hi all,

> Option #4. Implement iRODS storage module for OPeNDAP. The approach is
> like doing I/O intercept in an OPeNDAP server that will do I/O
> operations directly with iRODS. Examples of such implementation are
> iRODS/SRB storage modules for Fedora and DSpace. In Fedora case, a Java
> interface allows easily an iRODS plug-in module to be used to replace
> local storage module without a need to change Fedora code. This approach
> may require some changes in the OPeNDAP server code, at least, the
> configuration part.
>

Please be aware that there's more than one implementation of the OPeNDAP
server: there's Hyrax (from opendap.org), THREDDS Data Server (from
Unidata) and PyDAP (Roberto De Almeida). So it's not just one server to
modify...

>
>
> Finally, I noticed that OPeNDAP uses URL. Would it be great that if you
> can register a collection of OPeNDAP URLs of your interests into iRODS
> that will automatically replicate data inside iRODS? For this, iRODS
> need to build a driver to be able to access the data from OPeNDAP servers.
>

While this is certainly possible, I'm not sure that's the
right use case. Most folks using OPeNDAP would like the data to sit at
the server end. The point of OPeNDAP is that one can always get just the
small chunk of data one is most interested in, without needing to
download the entire file.
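To make the "small chunk" point concrete: a DAP2 request carries a constraint expression whose hyperslab syntax is `var[start:stride:stop]`, with an inclusive stop. The helper below builds such a URL and counts how many values it selects. The server URL and variable name in the test are placeholders, and `.ascii` is just one DAP2 response suffix (`.dods` returns binary).

```python
from math import prod
from typing import List, Tuple
from urllib.parse import quote

Slice = Tuple[int, int, int]  # (start, stride, stop), stop inclusive as in DAP2


def dap2_subset_url(dataset_url: str, var: str, slices: List[Slice],
                    suffix: str = ".ascii", encode: bool = False) -> str:
    """Build a DAP2 request for a hyperslab of one variable."""
    constraint = var + "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
    if encode:  # some servers reject raw '[' and ']' in URLs
        constraint = quote(constraint, safe=":,")
    return f"{dataset_url}{suffix}?{constraint}"


def hyperslab_size(slices: List[Slice]) -> int:
    """Number of values the hyperslab selects (product over dimensions)."""
    return prod((b - a) // s + 1 for a, s, b in slices)
```

For example, `[(0, 1, 9), (10, 2, 20)]` selects 10 x 6 = 60 values, however large the underlying file is.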

My use case is this: as OPeNDAP is a read-only protocol, I am using iRODS to
manage the "upload" of files to the OPeNDAP servers. This allows end
users to upload/modify files at will. Traditionally, the sysadmin must
put files into a specific place where OPeNDAP/TDS can read them, which can
be a bit of a *nightmare* if you have 6 different sites to look after.
This works well in our case, as there is a TDS on each of our resource
nodes. All we needed to do was set a few rules based on directory
names, and files end up on the right server to be used by TDS.

There are advantages to using iRODSFuse (or similar) over the local file
system. For example, right now some of our backend storage is down
for maintenance.

OPeNDAP also has a standardised way of handling metadata in
self-describing file formats, such as NetCDF and HDF. TDS, at least,
supports up to 20 different file formats. It provides an abstraction over
different file types by communicating with clients using the DAP protocol.
In addition to this, TDS also offers GIS services like Web Coverage
Service and Web Map Service for the right kind of data. TDS also
offers aggregation (and I suspect Hyrax does too), where multiple
files can be merged together into a single logical volume. For example, a
lot of model data is stored in daily files spanning many hundreds of
years. Using aggregation, these just look like one OPeNDAP URL.

So you would lose some pretty funky features if you downloaded the files
into iRODS. (Am I making any sense?)
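Aggregation of daily files can be pictured as an index mapping: one logical time axis over many files, where a request for global step t is redirected to (file, step within that file). The class below is a conceptual sketch only, not how TDS or NcML actually implement aggregation; the file names and step counts in the test are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class TimeAggregation:
    """One logical time axis over several files (conceptual sketch)."""

    def __init__(self, files: List[Tuple[str, int]]) -> None:
        # files: (file name, number of time steps in that file), in time order
        self.names = [name for name, _ in files]
        counts = [n for _, n in files]
        # starts[i] is the global index of the first time step stored in file i
        self.starts = [0] + list(accumulate(counts))[:-1]
        self.length = sum(counts)

    def _file_index(self, t: int) -> int:
        if not 0 <= t < self.length:
            raise IndexError(f"time index {t} outside 0..{self.length - 1}")
        return bisect_right(self.starts, t) - 1

    def locate(self, t: int) -> Tuple[str, int]:
        """Map global time step t to (file name, step within that file)."""
        i = self._file_index(t)
        return self.names[i], t - self.starts[i]

    def files_for(self, start: int, stop: int) -> List[str]:
        """Files a request for steps start..stop (inclusive) has to open."""
        return self.names[self._file_index(start):self._file_index(stop) + 1]
```

A subset request spanning a few steps then touches only the one or two files that hold them, which is what makes a single URL over centuries of daily files practical.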

Personally, I would like to see loosely coupled systems... It means
both OPeNDAP servers and iRODS can be developed independently of each
other. All that is required is to expose iRODS as a local file system
(which then becomes the "API" between the two), and we can take advantage
of both systems.

Sorry about the long email... If it helps, I would love to have a chat
on Skype.

Arcot Rajasekar

Jun 9, 2010, 8:30:53 AM
to irod...@googlegroups.com

Pauline
Thanks for an interesting discussion and for putting more meat into the type of architecture that will support the needs of users. This discussion will help us move forward. I think we will come back to you and others with more questions, and possibly a strawman design, and see how it can help in a more user-friendly manner.

thanks
raja


Howard Lander

Jun 11, 2010, 11:07:34 AM
to iROD-Chat
I've just started reading this discussion and think it's pretty
relevant to some work we are thinking of doing at RENCI. Does
anyone have much of an idea how the THREDDS Data Server is
architected? I'm wondering whether, if iRODS somehow implemented the DAP
protocol, that would be enough to allow THREDDS to serve/combine/
subset files resident in iRODS.

Raja: Do you think this is interesting enough to try to set up a call
with John Caron at UNIDATA? I know him slightly from a course I took
out there last summer.

Howard


Mike Conway

Jun 11, 2010, 8:20:34 PM
to irod...@googlegroups.com
Hey Howard,


In terms of frameworks, TDS is a Java server running in Tomcat, using Spring and Spring MVC. This might map nicely onto where Jargon is going, as well as onto the various web and web service interfaces being developed, which also use Spring.

In terms of how Jargon might plug in underneath the server as a layer that communicates with iRODS, that would take a good deal more study. As someone pointed out, there are other examples where iRODS can plug into a layer of such a server, such as the Fedora repository via low-level storage or Akubra. I too am curious how that model might apply, as there is a very wide range of services where similar arrangements could be made (e.g. Apache ServiceMix).

Cheers,
Mike C



Mike Conway
DICE Center
Jargon, Java, Interface Developer

------------------------------------------------

Google voice/video: Michael....@gmail.com

Skype: michael.c.conway





Gareth Williams

Jun 14, 2010, 7:58:16 PM
to irod...@googlegroups.com
Hi Mike,

On Wed, Jun 9, 2010 at 3:42 AM, <mw...@diceresearch.org> wrote:
>
> Hello Pauline and Gareth,
> Taking everything into consideration, it may not be appropriate to use
> iordsFs (FUSE)
> for your task because of performance and caching issues.

It would help us make decisions about what to do in the short and long
term if someone would explain potential irodsFs caching issues. Can
you do that or suggest who could?

cheers,

Gareth

mw...@diceresearch.org

Jun 15, 2010, 12:51:10 PM
to irod...@googlegroups.com
Hello Gareth,

Most UNIX commands use the 'stat' system call a lot, sometimes
calling 'stat' on the same file over and over. Accessing iRODS files
through FUSE can be extremely slow because each time 'stat'
is called, a remote call to the server is made. To get around this,
irodsFs caches the stat of files, on the assumption that all modifications
of files (create, delete and change) are done through irodsFs. For example,
if you have a file xy in irodsFs and do the following:
%ls -l xy
-rw-r----- 1 mwan mwan 732 2010-06-15 09:37 xy
(the stat of xy will be cached)
%irm -f xy
(rm the file outside of irodsFs)
%ls -l xy
-rw-r----- 1 mwan mwan 732 2010-06-15 09:37 xy
(Although the file is gone, ls still shows it)

Basically, if you make changes to files outside of irodsFs, irodsFs can be confused.
The stat cache is kept for 10 minutes before it becomes stale and discarded.
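The behaviour described above can be modelled as a time-to-live cache in front of an expensive remote lookup. This is a toy model, not the irodsFs code: the 600 s TTL comes from the 10 minutes quoted above, the injectable clock exists only to make the staleness visible in a test, and not caching misses is a choice of this sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple

StatInfo = dict  # e.g. {"size": 732}


class StatCache:
    """Toy model of a client-side stat cache with a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], Optional[StatInfo]],
                 ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch    # the expensive remote call to the iRODS server
        self.ttl = ttl        # 600 s, i.e. the 10 minutes quoted above
        self.clock = clock
        self.remote_calls = 0
        self._entries: Dict[str, Tuple[float, StatInfo]] = {}

    def stat(self, path: str) -> Optional[StatInfo]:
        now = self.clock()
        cached = self._entries.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # answered locally, even if the file changed remotely
        self.remote_calls += 1
        info = self.fetch(path)
        if info is None:
            self._entries.pop(path, None)  # this sketch does not cache misses
        else:
            self._entries[path] = (now, info)
        return info

    def invalidate(self, path: str) -> None:
        """What a write/unlink *through* the mount would do; irm bypasses this."""
        self._entries.pop(path, None)
```

Replaying the `ls` / `irm` / `ls` sequence against this model reproduces the stale listing until the entry ages past the TTL, which is the failure window Gareth's tests below would probe.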

Hope this helps.

Mike

Gareth Williams

Jun 16, 2010, 12:04:48 AM
to irod...@googlegroups.com
Thanks Mike!

That's what I guessed, but it is far better to have concrete
information like this than to rely on my guess. We can target some
tests for particular use cases based on this info. For OPeNDAP, we'd
mostly expect very static stat info, but it would be good to check what
sort of failure we get if, say, we replace a file with a smaller/larger
one using iCommands and then access it through FUSE. I expect the result
to depend on how the file is opened, and thus on the application
accessing the file.

regards,

Gareth

Tiffany Mathews

Jun 2, 2014, 11:30:28 AM
to irod...@googlegroups.com
I was wondering whether, since this posting, anything has been found on using iRODS with OPeNDAP to enable data in iRODS to be subsetted. This is something we are looking into doing with atmospheric science data. Any help would be greatly appreciated!

Arcot Rajasekar

Jun 6, 2014, 8:54:22 AM
to irod...@googlegroups.com

Hi Tiffany

  I am not sure. Have you looked at the web page at: 

     https://wiki.irods.org/index.php/NETCDF

  Step 12 in the Example section shows some subsetting. Not sure if that is the type you are looking for:

   here is what it says

  "12) subsetting. "inc --noattr" shows the 4 dimensions. The subsetting syntax: dimName[start%stride%end] where 'start' and 'end'  are the starting and ending indices of the dimension array. A stride of 1 means all points from start-end. A stride of 2 means  every other points."
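Going only by the syntax quoted above, a spec such as `dimName[start%stride%end]` can be parsed and expanded into the selected indices as follows. Treating `end` as inclusive is an assumption based on "all points from start-end"; the dimension names in the test are hypothetical, and this is not the micro-service's own parser.

```python
import re
from typing import List, Tuple

# dimName[start%stride%end], e.g. time[0%2%8]
_SUBSET = re.compile(r"(\w+)\[(\d+)%(\d+)%(\d+)\]")


def parse_subset(spec: str) -> Tuple[str, List[int]]:
    """Return (dimension name, selected indices) for one subset spec."""
    m = _SUBSET.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"expected dimName[start%stride%end], got {spec!r}")
    name = m.group(1)
    start, stride, end = (int(g) for g in m.group(2, 3, 4))
    if stride < 1 or end < start:
        raise ValueError(f"need stride >= 1 and end >= start in {spec!r}")
    # 'end' treated as inclusive ("all points from start-end")
    return name, list(range(start, end + 1, stride))
```

With a stride of 2, `time[0%2%8]` selects every other point, matching the description above.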

 

thanks

raja

 


 



---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.