How to implement Amazon S3 storage?


Cahir King

Nov 9, 2015, 10:20:00 AM
to dcm4che
Hello,
  We have implemented dcm4chee on an Amazon EC2 instance.  We would now like to archive images older than one month to an Amazon S3 bucket.

I have reviewed the documentation here:
Using Amazon S3 for NEARLINE storage - dcm4chee-2.x - Confluence

However, we are not clear on exactly how to build dcm4chee from SVN. Can anyone please provide step-by-step instructions? Is there any way to simply add an "S3 module" to an existing dcm4chee archive?

Or has anyone got a better way of setting up nearline storage to archive to S3 or Amazon Glacier?

Thank you.

Rob McLear

Nov 10, 2015, 9:46:35 AM
to dcm4che
I have been using s3fs for years, and it has been very stable for me. It simply mounts your S3 bucket as a file system, so you can create a nearline storage location with the mount point as your target.
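For anyone who wants a concrete starting point, here is a minimal sketch; the bucket name and mount point are placeholders, and option syntax varies between s3fs versions:

    # assumes AWS credentials in ~/.passwd-s3fs (ACCESS_KEY_ID:SECRET_ACCESS_KEY)
    mkdir -p /mnt/s3
    s3fs <nameofs3bucket> /mnt/s3 -o allow_other
    # then add /mnt/s3 as a file system in dcm4chee's NEARLINE storage group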

-Rob

Scott McCrimmon

Nov 11, 2015, 12:08:46 PM
to dcm4che

You can definitely get nearline archiving working on an existing DCM4CHEE archive, but it’s a bit of a challenge. Here’s a digest of my notes:


1. There is no binary available, so you need to check out the source and build it yourself. The URL is: https://svn.code.sf.net/p/dcm4che/svn/dcm4chee/sandbox/dcm4chee-hsm-cloud/

2. To get the source and some of the dependencies, install an SVN client such as TortoiseSVN for Windows.

3. Use the SVN client to check out the project from the above URL. If you use TortoiseSVN, just right-click a folder in Windows Explorer, choose SVN Checkout…, paste in the URL, and click OK.
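If you are on Linux or just prefer the command line, the checkout is a single command (the target directory name here is only a suggestion):

    svn checkout https://svn.code.sf.net/p/dcm4che/svn/dcm4chee/sandbox/dcm4chee-hsm-cloud/ dcm4chee-hsm-cloud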

4. There's no build script, and you need to modify the source code anyway (see below), so import this project into your IDE as a standard Java project.

5. The following dependencies are needed to build the project: aws-java-sdk-1.2.5.jar, commons-io-1.3.1.jar, dcm4che.jar, dcm4chee-ejb-client.jar, dcm4chee.jar, httpclient-4.2.3.jar, httpcore-4.2.jar, jboss-common.jar, jboss-j2ee.jar, jboss-jmx.jar, jboss-system.jar, log4j-1.2.16.jar. Many can be found in the lib directory of your DCM4CHEE installation. Add these to the Java project's build path.

6. Now you can build your binary, but first you need to fix an issue found by Jonathan Morra (https://groups.google.com/forum/#!topic/dcm4che/rXbDH4QlJM8): add a call to fetchHSMFileFinished(fsID, filePath, file) at the end of the storeHSMFile method.

7. Also note carefully from that post that when you deploy the service as described in the instructions you will find that it registers itself as 

dcm4chee.archive:service=FileCopyHSMModule,type=CLOUD

-NOT-

dcm4chee.archive:service=FileCopyHSMModule,type=S3

Anywhere the instructions tell you to enter type=S3, change it to type=CLOUD.

8. Now you can finish following the installation instructions.

9. Depending on your deployment, you may want to change the following directory locations in the TarRetriever service to avoid filling the system drive and crashing your server:

CacheRoot and CacheJournalRootDirectory: set these to some non-system mount point.

And for similar reasons, in the FileCopyHSMModule service (type=CLOUD), change the following:

OutgoingDirectory and IncomingDirectory: set these to a non-system mount point.

10. Some of the jar dependencies above are needed at runtime and therefore must be copied to the DCM4CHEE deployment lib to make the HSM cloud module work. Add the following to /dcm4chee-root/server/default/lib: aws-java-sdk-1.2.5.jar, httpclient-4.2.3.jar, httpcore-4.2.jar
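For example, from the directory holding your build dependencies (paths are illustrative; substitute your actual install root):

    cp aws-java-sdk-1.2.5.jar httpclient-4.2.3.jar httpcore-4.2.jar /dcm4chee-root/server/default/lib/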

11. Once you've set this up, keep in mind that if a study is in both online and nearline storage, the web interface will only report ONLINE availability. You can verify in the files DB table that the study is stored in both locations. To manually move files to S3 as a test, do the following:

a. Log in to the JMX console and bring up the FileCopy service

b. Use the copyFilesOfStudy() operation to specify a study by UID to copy to S3

c. Verify with the AWS system admin that the files made it to the AWS S3 bucket

d. Run a query on the PACS DB to verify that each file of the study now has a record assigned to the new file system and has the keyword 'tar' in its file path (a sketch of such a query follows this list)

e. Bring up the FileSystemMgt (ONLINE_STORAGE group) and use the scheduleStudyForDeletion() method to schedule the study for deletion from the online group.

f. Upon logging in to the web admin interface, you should now see that the availability of the test study is NEARLINE rather than ONLINE.
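For step (d), here is a hedged sketch of such a query, assuming a MySQL backend with the default pacsdb database and the stock dcm4chee 2.x schema; the DB user and <study-uid> are placeholders, and table/column names may differ in your version:

    mysql -u pacs -p pacsdb -e "
      SELECT f.pk, fs.dirpath, f.filepath
        FROM files f
        JOIN filesystem fs ON f.filesystem_fk = fs.pk
        JOIN instance i    ON f.instance_fk = i.pk
        JOIN series s      ON i.series_fk = s.pk
        JOIN study st      ON s.study_fk = st.pk
       WHERE st.study_iuid = '<study-uid>'
         AND f.filepath LIKE '%.tar!%';"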


That's it. You should now have the ability to schedule studies for copy to S3 and delete from online storage.



Cahir King

Nov 13, 2015, 8:37:27 AM
to dcm4che
Guys,
  Thank you both for your replies.  I decided to try Rob's approach of mounting an S3 file store locally on my PACS server running dcm4chee, as it seemed less daunting than Scott's approach (which is undoubtedly effective for someone more competent than me).  For anyone interested, I followed the Amazon S3 mounting instructions here:

https://www.maketecheasier.com/mount-amazon-s3-in-ubuntu/

I then tried to set up nearline storage following the instructions on the dcm4chee Confluence site.  I added the S3 mount as a filesystem under NEARLINE filesystem management, configured the file copy service, and then configured the file sync service.  I set the archive to send any online images older than 4 weeks to the nearline S3 mount point.  But nothing happened.  Here is the error that was thrown:

org.dcm4che.net.DcmServiceException: Storage would require forbidden Coercion of (0020,000D) Study Instance UID,UI,*1,#56,[1.3.51.0.7.536499652.41360.59211.35763.39162.32012.35395] to (0020,000D) Study Instance UID,UI,*1,#56,[1.3.51.0.7.11177383263.42288.3151.39993.47053.6245.49294]

What did I miss in the NEARLINE setup?  What "coercion" is forbidden?  Or does anyone have an idiot's guide to setting up nearline storage that doesn't mention S3?  I got confused when the instructions referred to the S3 HSM.

Rob McLear

Nov 13, 2015, 10:56:45 AM
to dcm...@googlegroups.com, Cahir King
I have never seen that issue before. The only issues I recall were with the use of SSL links to S3; using a regular http link solved the problem.

One general troubleshooting tip: check your mount points for the s3fs file system and create a couple of files via the command line using the same user ID and permissions that dcm4chee runs under, just to be sure it isn't something really basic.
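Concretely, something like this (assuming dcm4chee runs as a user named dcm4chee and the bucket is mounted at /mnt/s3 - both assumptions):

    sudo -u dcm4chee touch /mnt/s3/write-test && echo "write OK"
    sudo -u dcm4chee rm /mnt/s3/write-test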

At one point we had an issue with the s3fs file system becoming unmounted during routine use, so I wrote a script to check that it was still mounted and to force an unmount and remount if needed; that problem seems to have resolved itself with either Ubuntu updates or s3fs updates.
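A sketch of that kind of watchdog, assuming the mount options I list later in this thread; run it from cron at whatever interval suits you:

    #!/bin/sh
    # remount the bucket if the FUSE mount has dropped (bucket name is a placeholder)
    if ! mountpoint -q /mnt/s3; then
        fusermount -u /mnt/s3 2>/dev/null
        s3fs -oallow_other -ourl=http://s3.amazonaws.com <nameofs3bucket> /mnt/s3
    fi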

-Rob

Cahir King

Nov 14, 2015, 5:56:39 AM
to dcm4che, cahi...@googlemail.com
Hmm - still getting the same "Storage would require forbidden Coercion" error in the dcm4chee server log.  I changed the owner of the S3 mount folder to "dcm4chee", the same user that has control of the dcm4chee process.

I also launched riofs under this username, with the -o "allow_other" flag.

Anyone got any thoughts on getting nearline storage running on an S3 mount?

Rob McLear

Nov 14, 2015, 10:40:47 AM
to dcm...@googlegroups.com, cahi...@googlemail.com
Looking at my setup, I have s3fs (https://code.google.com/p/s3fs/wiki/FuseOverAmazon) version 1.74 installed. 

Under the 'mount' listing, my s3fs file system shows up as:

s3fs on /mnt/s3 type fuse.s3fs (rw,nosuid,nodev,allow_other)

df -h:   

s3fs            256T     0  256T   0% /mnt/s3

My share is mounted with the following:

s3fs -oallow_other -ourl=http://s3.amazonaws.com <nameofs3bucket> /mnt/s3

Looking in ‘top’, the s3fs process is owned by user root.

The main storage directory for my nearline file system has the following permissions:

    2    0 drwxr-xr-x 1 root root    0 Jan  2  2015 2015
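For reference, a mount like this can also be made to survive reboots with an /etc/fstab entry; this is a hedged sketch using the s3fs#bucket syntax of the 1.x releases (bucket name is a placeholder):

    s3fs#<nameofs3bucket> /mnt/s3 fuse allow_other,url=http://s3.amazonaws.com 0 0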

In the JMX console, the NEARLINE_STORAGE MBean has these parameters:

DefaultAvailability (java.lang.String, RW): Default Availability, which will be associated with new file systems added by operation addRWFileSystem. Enumerated values: "ONLINE", "NEARLINE", "OFFLINE", "UNAVAILABLE".
DefaultUserInformation (java.lang.String, RW): Default User Information, which will be associated with new file systems added by operation addRWFileSystem.
DefaultStorageDirectory (java.lang.String, RW): Default Storage Directory, used on receipt of the first object if no Storage File System was explicitly configured by operation addRWFileSystem. A relative path name is resolved relative to <archive-install-directory>/server/default/. Use "NONE" to disable auto-configuration and force a failure to receive objects if no Storage File System was explicitly configured.


When I list all file systems, this is the output:

FileSystem[pk=1, archive, groupID=ONLINE_STORAGE, aet=petrad, ONLINE, RW+, userinfo=null]
FileSystem[pk=2, /mnt/s3, groupID=NEARLINE_STORAGE, aet=petrad, NEARLINE, RW+, userinfo=/mnt/s3]

I hope this information is helpful. I have had 7 or 8 virtual machines running this same setup without any errors for many years now.

-Rob

Cahir King

Nov 15, 2015, 3:52:30 AM
to dcm4che, cahi...@googlemail.com
Thanks Rob.  Very helpful.  Unfortunately I am still getting the "not allowed" error.  I think I have tracked down the problem: the user dcm4chee cannot save files to the S3 mount AFTER it is mounted, although it can beforehand.  I can't figure out a way to change the permissions of the S3 mount, once mounted, so that the dcm4chee user can write to it.  I have run the riofs command with allow_other, nosuid, and nodev, but nothing is working too well.  Any thoughts appreciated.  P.S. riofs seems to use FUSE.

Rob McLear

Nov 15, 2015, 5:14:49 PM
to dcm...@googlegroups.com, cahi...@googlemail.com
Can you try running dcm4chee as root to see if the problem resolves? 

(Insert outrage comments regarding security here)
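Alternatively, to avoid running as root, you could try mounting so that the mounted tree is owned by the dcm4chee user. A hedged s3fs-style sketch (riofs flags may differ, and the uid/gid values are assumptions - check them with id dcm4chee):

    s3fs <nameofs3bucket> /mnt/s3 -o allow_other,uid=1001,gid=1001,umask=022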

-Rob

fleetwoodfc

Nov 16, 2015, 6:37:24 AM
to dcm4che
This error is not related to S3. Each DICOM file contains Study, Series, and Instance UID tags. If two instances have matching Series Instance UIDs, then their Study Instance UIDs must also match; i.e., if the instances belong to the same series, they must also belong to the same study. If they do not, you get this error when you try to store the second instance.
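You can check the UIDs of the offending instances before storing them; a hedged sketch using dcm4che3's dcmdump (older toolkits ship dcm2txt instead, and the file name is a placeholder):

    # print the Study, Series and SOP Instance UIDs of a suspect file
    dcmdump problem-image.dcm | grep -E '\(0020,000D\)|\(0020,000E\)|\(0008,0018\)'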

Cahir King

Dec 31, 2015, 5:19:51 AM
to dcm4che, cahi...@googlemail.com
Hey Guys,
  I am coming back to this after a few months off.

OK - I have set up and mounted the S3 storage and added it to NEARLINE.  However, when I log in to the dcm4chee-web3 console, the NEARLINE storage does not appear under Filesystems.  Is there an additional setting required to get it to appear?

Rob McLear

Jan 4, 2016, 11:47:10 AM
to dcm...@googlegroups.com, cahi...@googlemail.com
I’m not sure about the dcm4chee-web3 interface, but in the JMX console, is it listed under Online or Nearline filesystems when you do a list all?

-Rob

matthe...@netscape.net

Jan 15, 2016, 6:00:54 AM
to dcm4che, cahi...@googlemail.com
Hi Rob,

Found some old instructions from Damien; hope they help. Sorry about the weird characters embedded - no idea where this document picked them up. The attached file might be better.
I've used these to set up a few dcm4chee archives for testing and never had any issues - when I followed them correctly, anyway :-)
dcm4chee v2.1x

In this section we will configure the various services and get an understanding of what they do.
1. Add a file system to the NEARLINE storage group
   1. Open a web browser and navigate to the JMX console, e.g. http://localhost:8080/jmx-console
   2. Locate the "dcm4chee.archive" section and click on "group=NEARLINE_STORAGE,service=FileSystemMgt"
   3. Scroll down to the "List of MBean Operations" section and find the "addRWFileSystem()" operation. Enter a path for nearline storage. This will not actually be used, since we are going to store the files in S3, but we need to configure something here so that the system knows we are using the NEARLINE storage. I have entered: "tar:/storage/nearline". Note the tar prefix; this tells dcm4chee that all of the files going to this storage group will be tarred up.
   4. Click Invoke to add the file system record to the database.
2. Configure the FileCopy service
   The FileCopy service is responsible for physically copying files to your nearline storage. This is where we configure our particular plugin.
   1. Specify a value for DestinationFileSystem. This value should equal the value you specified for your nearline storage file system, so that dcm4chee knows this FileCopy service is associated with that file system configuration, e.g. "tar:/storage/nearline"
   2. Specify a value for HSMModulServicename. This should be the JMX ObjectName of our S3 plugin module, and enables it for use within this service when storing and retrieving files. Enter: "dcm4chee.archive:service=FileCopyHSMModule,type=S3"
   3. Leave the FileStatus set to TO_ARCHIVE. This will be the status of files stored in S3. When the SyncFileStatus service runs and verifies that these files are stored properly, it will change the status to ARCHIVED.
   4. Click Apply Changes.
3. Configure the TarRetriever service
   This service is responsible for fetching and extracting tar files from the nearline storage during retrieve requests.
   1. Specify a value for HSMModulServicename. This should be the JMX ObjectName of our S3 plugin module, and enables it for use within this service when storing and retrieving files. Enter: "dcm4chee.archive:service=FileCopyHSMModule,type=S3"
   2. Click Apply Changes.
4. Configure the S3 HSMModule (service=FileCopyHSMModule,type=S3)
   Now we are ready to configure the S3 integration. The main things to configure here are:
   1. Amazon S3 bucket name
   2. Amazon AWS Access Key
   3. Amazon AWS Secret Key (this is write-only, and you will not see a value after clicking Apply Changes)
   4. The Outgoing and Incoming directories; these are temporary storage areas used for tarring and untarring files.
   5. Click Apply Changes.
5. Configure the SyncFileStatus service
   This service will run periodically and verify the files that have been stored to S3. It will fetch the tar files and ensure that the correct files are contained within. Once it verifies the files, it will update the file status in the database to ARCHIVED.
   1. Specify a value for MonitoredFileSystem. This value should equal the value you specified for your nearline storage file system, so that dcm4chee knows this service is associated with that file system configuration, e.g. "tar:/storage/nearline"
   2. Specify a value for HSMModulServicename. This should be the JMX ObjectName of our S3 plugin module, and enables it for use within this service when fetching tar files. Enter: "dcm4chee.archive:service=FileCopyHSMModule,type=S3"
   3. Specify a TaskInterval. It is set to NEVER by default, so you should set it to an interval that suits your workflow, preferably not during peak business hours.
Summary
At this point you should be able to store DICOM objects to dcm4chee, and it will archive them to Amazon S3. The S3 key will be a hierarchical path, which should look familiar if you have looked at how dcm4chee stores objects on a file system. (The original document included a screenshot of the Amazon Management Console showing the archived path.)

In the database, the files should have a changed file status and should reflect their tar path (also shown in a screenshot in the original document). The file paths look like this: 2011/8/3/16/745ABFED/CF024730-323397.tar!CF024730/000004A0
Note the "tar" designator in the path. This tells the system that the file is contained within a tar file.
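If you would rather check from the command line than the console, the AWS CLI can list the archived keys. A hedged sketch, not part of the original instructions (bucket name and prefix are placeholders):

    aws s3 ls s3://<nameofs3bucket>/2011/8/3/ --recursive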
Setting up Retention Rules to Remove Studies from Online
We probably don't want two copies of the study forever, so let's set up some rules now so that studies are deleted from the ONLINE storage group after a period of time. This will leave the remaining copy on S3.
Note that this is only an example. Your retention/deletion requirements may differ!
1. Configure deletion of ONLINE studies
   1. Open a web browser and navigate to the JMX console, e.g. http://localhost:8080/jmx-console
   2. Locate the "dcm4chee.archive" section and click on "group=ONLINE_STORAGE,service=FileSystemMgt"
   3. Set DeleteStudyIfNotAccessedFor = your retention period (52w or whatever your SLA requires)
   4. Set DeleteStudyOnlyIfStorageNotCommited = false
   5. Set DeleteStudyOnlyIfCopyOnMedia = false
   6. Set DeleteStudyOnlyIfCopyOnReadOnlyFileSystem = false
   7. Set ScheduleStudiesForDeletionInterval = a reasonable interval for the system to check the database and schedule deletion jobs
   8. Set DeleteStudyOnlyIfCopyOnFileSystemOfFileSystemGroup = NEARLINE_STORAGE
   9. Set DeleteStudyOnlyIfCopyArchived = true (only delete studies that have been verified by the SyncFileStatus service; if you don't care about that or are not running that service, you can set this to false)
   10. Click Apply Changes
At this point, dcm4chee will look for studies in ONLINE that meet these criteria and schedule them for deletion. After they are deleted and the only copy is on S3, a retrieve request will trigger a fetch from Amazon: the tar file(s) will be fetched, and the images extracted and sent to the destination.
That's it!
Attachment: Cloud installation and configuration.odt

Alejandro Rovira

Mar 8, 2016, 2:27:18 PM
to dcm4che
Hi all, are there any other nearline storage providers that we can use?
I've seen another interesting one: https://www.backblaze.com/b2/cloud-storage.html
I would like to use it with my PACS - is that possible?