Has there been any progress on OPeNDAP and iRODS integration? I'm
particularly interested in this for the THREDDS Data Server.
Currently, we have data managed by iRODS, but due to the umask, we have
to run a cron script that will change the file permission to allow the
user running TDS to read these files from disk. With millions of files
under those directories, it
The alternative was to run an iRODSFUSE mount with TDS running on top.
This would remove the need for the cron job.
Contents of the directories are likely to be changed as well, and I can
see that there is a warning in the client/fuse under the iRODS source:
"2) When a collection is mounted using irodsFs, users should not use
iCommands such as iput, irm, icp, etc to change the content of the
collection because the FUSE implementation seems to cache the attributes
of the contents of the collection."
We also have processes on our servers using icommands such as icp and
imv for dealing with contents of those directories.
Note that we will only offer read access to TDS, so could this still
work, if say, we re-mounted those collections using iRODSFUSE once a day?
Cheers,
-Pauline.
--
Pauline Mak
Assistant Manager, ARCS Data Services
Ph: +61 3 6226 7518
Mob: +61 411 638 196
Email: pauli...@arcs.org.au
Jabber: pauli...@arcs.org.au
Calendar: http://tinyurl.com/pmak-arcs-calendar
http://www.arcs.org.au/
TPAC
Email: pauli...@utas.edu.au
http://www.tpac.org.au/
Reagan, Raja, and I just exchanged emails today regarding the integration of
iRODS and OPeNDAP.
My initial feeling is it would be a better solution if we can develop an
iRODS driver for OPeNDAP file system. What you described here is quite
different from what I am thinking. Would you draw a diagram? It will
definitely help me understand your ideas.
Sincerely,
Bing Zhu
DICE Team
"iRODS: the Integrated Rule-Oriented Data-management System; A community
driven, open source, data grid software solution" https://www.irods.org
iROD-Chat: http://groups.google.com/group/iROD-Chat
thanks
raja
PS: Once we have some idea, we can send a note to the irods-chat on the design and request comments before starting any development.
________________________________________
From: irod...@googlegroups.com [irod...@googlegroups.com] on behalf of Bing Zhu [bz...@diceresearch.org]
Sent: Tuesday, June 08, 2010 2:22 AM
To: 'Pauline Mak'
Cc: irod...@googlegroups.com
Subject: RE: [iROD-Chat:4253] OPeNDAP & iRODS
Hi Bing,
OPeNDAP would not be a (file-system) resource underlying iRODS - the situation is the other way around and is presumably a common one. We want to run a service (OPeNDAP) which serves files (and subsets, transformation and aggregation of files) and have iRODS provide the underlying storage (with all the associated benefits of managing that storage).
In our current setup, OPeNDAP is only serving data based on files in particular collections. We make sure those particular collections have a physical replica on a particular server/resource with a filesystem as the backend, then we set up the OPeNDAP service to serve the files directly out of the local filesystem. We have to reset permissions so that the unix user running the service (not rods - maybe jetty) can read the files to be served as by default only the rods user has access permission.
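The permission reset described above can be sketched roughly as follows. This is only an illustration, not the actual ARCS cron script: the function names `add_bits` and `make_readable` are hypothetical, and it assumes a POSIX filesystem and that read access for group/other is what the service user needs.

```python
import os
import stat

# Bits to add: read for group/other on files, plus search (x) on directories.
FILE_BITS = stat.S_IRGRP | stat.S_IROTH
DIR_BITS = FILE_BITS | stat.S_IXGRP | stat.S_IXOTH


def add_bits(path, bits):
    """Add permission bits to path; return True if the mode was changed."""
    if os.path.islink(path):
        return False  # leave symlinks alone in this sketch
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & bits == bits:
        return False  # already readable, so skip the chmod call
    os.chmod(path, mode | bits)
    return True


def make_readable(root):
    """Walk root and make every file and directory readable by group/other.

    Returns the number of entries whose mode had to be changed.
    """
    changed = int(add_bits(root, DIR_BITS))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            changed += add_bits(os.path.join(dirpath, name), DIR_BITS)
        for name in filenames:
            changed += add_bits(os.path.join(dirpath, name), FILE_BITS)
    return changed
```

Skipping entries that already have the right bits avoids a chmod call per file, but the walk itself still has to stat every entry, which is where the cost comes from with millions of files.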
One alternative setup would be to use fuse to make the data available to the server/service (which would not then need to be on the resource server), but we are worried about robustness and performance. Before pursuing this, Pauline is asking if there are other developments pending and for you to please explain the notes on iRODS fuse.
Regards,
Gareth (ARCS data team)
PS. Any web server/service that is based on (data) files in a filesystem could potentially use a similar model, though iRODS response/latency might be too slow for many such applications. In general, developing such a service to support an iRODS API to get files will not be feasible (or maintainable).
The question is: what types of scenarios are relevant to the user community? How can one enhance their experience in using OPeNDAP under iRODS settings? What is the buy-in for a scientist to use this approach? These are the questions we are looking for answers to in making informed design decisions. Hope the iRODS user community can help us in this design process. Also, if there are other computational models than those enumerated above, feel free to add them to the mix.
thanks
raja
________________________________________
From: irod...@googlegroups.com [irod...@googlegroups.com] on behalf of Gareth....@csiro.au [Gareth....@csiro.au]
Sent: Tuesday, June 08, 2010 6:36 AM
To: irod...@googlegroups.com
Subject: RE: [iROD-Chat:4254] OPeNDAP & iRODS
Hello Pauline and Gareth,
Your email clarified my questions about what you expect from an integration between iRODS and OPeNDAP.
In addition to Mike’s suggestions, I have put together some options for integrating with files held in local storage on the OPeNDAP server side, which is what you use FUSE for.
Option #1. Use FUSE as described in your mail.
Option #2. Use ‘imcoll’ as suggested by Mike.
Option #3. Use ‘file registration’. For files in the local storage of an OPeNDAP server, you can register them into iRODS. This can be done using a periodic Shell/Perl script. In iRODS, a data replication micro-service can be deployed as a rule to monitor the collection for registered OPeNDAP files; once a new OPeNDAP file is registered into iRODS, the monitoring rule will make replicas (within iRODS) as required. Notice that this approach allows OPeNDAP and iRODS to stay as two independent ecosystems. There is no performance issue at all, since the OPeNDAP server deals with its local storage only.
Option #4. Implement an iRODS storage module for OPeNDAP. The approach is like doing an I/O intercept in an OPeNDAP server that will do I/O operations directly with iRODS. Examples of such implementations are the iRODS/SRB storage modules for Fedora and DSpace. In the Fedora case, a Java interface allows an iRODS plug-in module to easily replace the local storage module without a need to change Fedora code. This approach may require some changes in the OPeNDAP server code, at least in the configuration part.
Finally, I noticed that OPeNDAP uses URLs. Would it be useful if you could register a collection of OPeNDAP URLs of interest into iRODS, which would then automatically replicate the data inside iRODS? For this, iRODS would need a driver able to access the data from OPeNDAP servers.
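As a rough sketch of the periodic registration script in Option #3 (written in Python rather than Shell/Perl), the function below only plans the `ireg` calls and does not run them. The function name `plan_registrations`, the `already_registered` set and the example collection path are hypothetical; the basic `ireg physicalFilePath irodsPath` form is assumed to fit your iRODS version, and any resource options are left out.

```python
import os
import posixpath


def plan_registrations(local_root, irods_collection, already_registered):
    """Return one ['ireg', physical, logical] list per unregistered file.

    already_registered is a set of absolute physical paths that an earlier
    run has handled (how that set is persisted is left open here).
    """
    local_root = os.path.abspath(local_root)
    commands = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            physical = os.path.join(dirpath, name)
            if physical in already_registered:
                continue
            # Mirror the local directory layout under the iRODS collection.
            rel = os.path.relpath(physical, local_root).replace(os.sep, "/")
            logical = posixpath.join(irods_collection, rel)
            commands.append(["ireg", physical, logical])
    return sorted(commands)
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` from a cron job, with the physical path added to the registered set once the call succeeds.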
Let me know.
-Bing
From: irod...@googlegroups.com [mailto:irod...@googlegroups.com] On Behalf Of mw...@diceresearch.org
Sent: Tuesday, June 08, 2010 10:43 AM
To: irod...@googlegroups.com
> Option #4. Implement iRODS storage module for OPeNDAP. The approach is
> like doing I/O intercept in an OPeNDAP server that will do I/O
> operations directly with iRODS. Examples of such implementation are
> iRODS/SRB storage modules for Fedora and DSpace. In Fedora case, a Java
> interface allows easily an iRODS plug-in module to be used to replace
> local storage module without a need to change Fedora code. This approach
> may require some changes in the OPeNDAP server code, at least, the
> configuration part.
>
Please be aware that there's more than one implementation of the OPeNDAP
server: there's Hyrax (from opendap.org), THREDDS Data Server (from
Unidata) and PyDAP (Roberto De Almeida). So it's not just one server to
modify...
>
>
> Finally, I noticed that OPeNDAP uses URL. Would it be great that if you
> can register a collection of OPeNDAP URLs of your interests into iRODS
> that will automatically replicate data inside iRODS? For this, iRODS
> need to build a driver to be able to access the data from OPeNDAP servers.
>
While I think this is certainly possible, I'm not sure that's the
right use case. Most folks using OPeNDAP would like the data to sit at
the server end. The point of OPeNDAP is that one can always get the
small chunk of data they're most interested in, without the need to
download the entire file.
My use case is: as OPeNDAP is a read-only protocol, I am using iRODS to
manage the "upload" of files to the OPeNDAP servers. This allows end
users to upload/modify files at will. Traditionally, the sys admin must
put files into a specific place where OPeNDAP/TDS can read it, which can
be a bit of a *nightmare* if you have 6 different sites to look after.
This works well in our case, as there is a TDS on each of our resource
nodes. All we needed to do was to set a few rules based on directory
names, and files will end up on the right server to be used by TDS.
There are advantages to using iRODSFuse (or similar) over a local file
system. As is the case right now, some of our backend storage is down
for maintenance.
OPeNDAP also has a standardised way of handling metadata in
self-describing file formats, such as NetCDF and HDF. TDS, at least,
supports up to 20 different file formats. It provides an abstraction over
different file types by communicating with clients using the DAP protocol.
In addition to this, TDS also offers GIS services like Web Coverage
Service and Web Map Service for the right kind of data. TDS also
offers aggregation (and I suspect Hyrax would do so too), where multiple
files can be merged together into a single logical volume. For example, a
lot of model data is stored in daily files, spanning many hundreds of
years. Using aggregation, these would just look like one OPeNDAP URL.
So you would lose some pretty funky features if you downloaded the files
into iRODS. (Am I making any sense?)
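To illustrate the aggregation idea with daily files, here is a minimal sketch of how a position on the combined time axis could be mapped back to a particular file. It is not how TDS or Hyrax implement aggregation; the file names, the per-file step counts and the function names are made up for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(lengths):
    """Cumulative number of time steps at the end of each file."""
    return list(accumulate(lengths))


def locate(files, offsets, index):
    """Map an index on the aggregated time axis to (file, local index)."""
    if not offsets or not 0 <= index < offsets[-1]:
        raise IndexError("time index outside the aggregated axis")
    i = bisect_right(offsets, index)        # first file whose end is past index
    start = offsets[i - 1] if i else 0      # global index where that file begins
    return files[i], index - start


# Hypothetical daily files: two with 24 time steps, one with 12.
files = ["day1.nc", "day2.nc", "day3.nc"]
offsets = build_offsets([24, 24, 12])
```

With this layout a client sees one time axis of 60 steps, and a request for step 24 is served from the first record of the second file.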
Personally, I would like to see loosely coupled systems... It means
both OPeNDAP servers and iRODS can be developed independently from each
other. All that is required is to expose iRODS as a local file system
(and therefore, is the "API" between the two) then we can take advantage
of both systems.
Sorry about the long email... If it helps, I would love to have a chat
on Skype.
thanks
raja
________________________________________
From: irod...@googlegroups.com [irod...@googlegroups.com] on behalf of Pauline Mak [pauli...@arcs.org.au]
Sent: Tuesday, June 08, 2010 7:31 PM
To: irod...@googlegroups.com
Subject: Re: [iROD-Chat:4258] OPeNDAP & iRODS
On Wed, Jun 9, 2010 at 3:42 AM, <mw...@diceresearch.org> wrote:
>
> Hello Pauline and Gareth,
> Taking everything into consideration, it may not be appropriate to use
> iordsFs (FUSE)
> for your task because of performance and caching issues.
It would help us make decisions about what to do in the short and long
term if someone would explain potential irodsFs caching issues. Can
you do that or suggest who could?
cheers,
Gareth
That's what I guessed, but it is far better to have concrete
information like this than to rely on my guess. We can target some
tests for particular use cases based on this info. For opendap, we'd
mostly expect very static stat info but it would be good to check what
sort of failure we get say if we replace a file with a smaller/larger
one with icommands and access it with FUSE. I'm expecting the result
to depend on how the file is opened and thus on the application
accessing the file.
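One way such a test could be sketched: after replacing a file with icommands, compare the size that stat reports through the mount with the number of bytes that can actually be read. The function name `check_size_consistency` is hypothetical, and the path passed in is assumed to be a file under the irodsFs mount point.

```python
import os


def check_size_consistency(path, chunk=1 << 20):
    """Compare the size stat reports with the bytes actually readable.

    Returns (reported, actual, consistent). A mismatch would suggest the
    attributes seen through the mount are stale.
    """
    reported = os.stat(path).st_size
    actual = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk)
            if not block:
                break
            actual += len(block)
    return reported, actual, reported == actual
```

Running this before and after an `iput -f` of a smaller or larger file would show whether the stat information or the data itself lags behind. It reads sequentially only, so an application that seeks or memory-maps the file may behave differently.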
regards,
Gareth
Hi Tiffany
I am not sure. Have you looked at the web page at:
https://wiki.irods.org/index.php/NETCDF
Step 12 in the Example section shows some subsetting. Not sure if that is the type you are looking at:
here is what it says
"12) subsetting. "inc --noattr" shows the 4 dimensions. The subsetting syntax: dimName[start%stride%end] where 'start' and 'end' are the starting and ending indices of the dimension array. A stride of 1 means all points from start-end. A stride of 2 means every other points."
thanks
raja
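To make the dimName[start%stride%end] syntax quoted above concrete, here is a small sketch that parses one such specification and lists the indices it would select. It is not the parser used by the iRODS NETCDF tools: the function names are hypothetical, and the sketch assumes that 'end' is inclusive and that indices are non-negative integers.

```python
import re

# One dimension spec, e.g. "time[0%2%6]".
_SPEC = re.compile(r"^(\w+)\[(\d+)%(\d+)%(\d+)\]$")


def parse_subset(spec):
    """Split 'dimName[start%stride%end]' into (name, start, stride, end)."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError("not of the form dimName[start%stride%end]: " + spec)
    name = match.group(1)
    start, stride, end = (int(g) for g in match.groups()[1:])
    if stride < 1 or end < start:
        raise ValueError("need stride >= 1 and end >= start: " + spec)
    return name, start, stride, end


def selected_indices(spec):
    """Return (name, indices), treating 'end' as inclusive (an assumption)."""
    name, start, stride, end = parse_subset(spec)
    return name, list(range(start, end + 1, stride))
```

Under these assumptions, "time[0%2%6]" selects every other point (indices 0, 2, 4 and 6), while a stride of 1 selects every point from start to end.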