ONLINE/NEARLINE approaches based on study date


Thiago Henrique Busarello

Feb 12, 2019, 8:13:07 AM
to dcm4che

Good Morning.

I have the following scenario: a cloud VM with a Virtual Disk (/storage/fs1/) and a Google Cloud Bucket (/storage/cold/) mapped.

I would like the exams to be moved to the Bucket after 90 days in order to save money. I thought of two approaches:

First, I thought about using the "NEARLINE" option; however, as I understand it, it would have two problems:
  • the copy is made immediately, regardless of whether the 90 days have expired;
  • it only deletes from ONLINE when the disk space limit is reached, not when the study has expired.
Second, I thought of using the "Reject Expired Studies" option along with the "Move Expired Studies before Rejection" option, so that I could keep the exams on the Virtual Disk for 90 days, after which they would be moved to the Bucket and then deleted. The problem with this approach is:
  • after being moved, the exams would no longer be available under the same AeTitle; you would have to switch to another one.

In short, I would like to keep the exams in /storage/fs1/ for 90 days, and then move them to /storage/cold/, but without changing AeTitle, and consequently still leaving the exams accessible.

What would be the best approach?

Thanks in advance.

vrinda...@j4care.com

Feb 13, 2019, 12:04:09 PM
to dcm...@googlegroups.com
For your requirement, you just need the following:

- Add Nearline storage (= Google Cloud Bucket): Refer to dcmStorageID=nearline in sample-config.ldif and add dcmRetrieveCacheStorageID: fs1
- Update fs1: Set Storage Duration to CACHE (by default it is PERMANENT).
- Add an Exporter: Refer to the dcmExporterID=CopyToNearlineStorage exporter in sample-config.ldif. You may choose not to reject expired studies but just delete objects from the Virtual Disk, which acts as online cache storage. For this, make sure to set Delete Study From Storage ID to fs1. Reject Entity for Data Retention Expiry may be left as it is as false (default value).
- Add a Study Retention Policy : Ensure to set Export expired Study to Exporter ID of exporter created above.

With this setup, studies remain available and can still be retrieved after they have been exported to nearline and deleted from the online file system. I do not quite understand what you mean by "but without changing AeTitle".

For the sake of testing, you may set the expiration date to a past date and also lower the values of Reject Expired Studies Polling Interval and Purge Storage Polling Interval. Verify in the logs/database/filesystem that objects are deleted from the virtual disk. Once done, do a query/retrieve and verify that they are still retrievable.

leogrande

Feb 19, 2019, 10:22:06 AM
to dcm4che
I have a scenario similar to Thiago's and need some elaboration from you on this subject.

I am running 5.15.1 on AWS EC2/S3.
The EC2 volume is the ONLINE storage and the S3 bucket is the NEARLINE storage.

Everything works fine but I am trying to figure out on how to translate dcm4chee-2 constraints into dcm4chee-arc 5 settings:
DeleterThresholds
DeleteStudyIfNotAccessedFor
DeleteStudyOnlyIfNotAccessedFor
DeleteStudyOnlyIfCopyOnFileSystemOfFileSystemGroup

I realize that there are no direct replacements for those very convenient settings but maybe you can give some clarifications on how to use the current CACHE, PERMANENT storage in my case.
From your response I understand that fs1 (online) must be set to the CACHE duration and NEARLINE to PERMANENT.

I can't find these settings:
 "Reject Entity for Data Retention Expiry may be left as it is as false (default value)"

Add a Study Retention Policy : Ensure to set Export expired Study to Exporter ID of exporter created above.

On which level they have to be set up, Archive or AE?

Thank you in advance.



vrinda...@j4care.com

Feb 19, 2019, 11:02:36 AM
to dcm4che
In the default configuration, the fs1 online storage is set to PERMANENT. To fulfill his requirement (first post), I suggested changing it from PERMANENT to CACHE.

    maybe you can give some clarifications on how to use the current CACHE, PERMANENT storage in my case.

    "I can't find these settings:
      "Reject Entity for Data Retention Expiry may be left as it is as false (default value)"
      Add a Study Retention Policy : Ensure to set Export expired Study to Exporter ID of exporter created above.

    On which level they have to be set up, Archive or AE?"
- Again, this was mentioned in response to his requirement (first post). A Study Retention Policy child object may be added at Archive Device level (= apply the policy to objects received by any AE) or at Archive AE level (= apply the policy to objects received only by this AE). Reject Entity for Data Retention Expiry is a field of Exporter Descriptor child object, which can be configured only at Archive Device level. See the attached screenshot, which shows the list of child objects that can be configured at Archive Device level.

   
    Everything works fine but I am trying to figure out on how to translate dcm4chee-2 constraints into dcm4chee-arc 5 settings:
- Since I have never worked with DCM4CHEE Archive version 2, Gunter is the best person to answer this.
  

Screenshot_2019-02-19 dcm4chee-arc-ui.png

leogrande

Feb 19, 2019, 12:53:36 PM
to dcm...@googlegroups.com
"...Indicates if the Storage is used as permanent (=PERMANENT), cache (=CACHE) or temporary (=TEMPORARY) storage. Objects get purged from cache and temporary storage according configured deleter thresholds or - if no deleter threshold is specified - all objects on the Storage will get purged. ..."

It sounds scary.

Please see an attachment in regard of ". Reject Entity for Data Retention Expiry is a field of Exporter Descriptor child object.."

I guess, I am missing something.

EDIT:

Yes, I know that my scenario is a little bit different from the original post. I need to have studies on both ONLINE and NEARLINE storage, and I have already set up an Exporter and an Export Rule for this. I just want to keep studies on the ONLINE storage for 6 months, based not on the age of the studies but on something like "DeleteStudyIfNotAccessedFor" (dcm4chee-2). If that is not possible, I can live with the Retention Policy.

So PERMANENT duration means that it is not possible to delete studies from this storage, right?
What if I need to keep studies on the NEARLINE storage (PERMANENT) for just 7 years and purge them after this period?

I am a bit confused with this new duration stuff.

Just checked Study Retention Policy document and compared it with my dcm4chee-arc. Mine is missing half of fields.

Screenshots.pdf

gunterze

Feb 20, 2019, 3:36:46 AM
to dcm4che
Retention Policies do not consider study access time.
You may use threshold-based deletion and dimension the ONLINE file system(s) to hold (at least) 7 years of production.

gunterze

Feb 20, 2019, 3:41:52 AM
to dcm4che
Correction:
You may use threshold-based deletion and dimension the ONLINE file system(s) to hold (at least) 6 months of production, and control permanent deletion from NEARLINE storage after 7 years by a Study Retention Policy.

gunterze

Feb 20, 2019, 4:41:00 AM
to dcm4che
Opened "Delete Studies not accessed a configurable time from ONLINE Storage" #1854 as an alternative to threshold-based deletion. But I can't give a date for when it will be implemented. If you are willing to pay, you may contact J4Care - which pays my salary - to increase its priority.

vrinda...@j4care.com

Feb 20, 2019, 4:59:50 AM
to dcm4che
   Please see an attachment in regard of ". Reject Entity for Data Retention Expiry is a field of Exporter Descriptor child object.."
   I guess, I am missing something.
  Just checked Study Retention Policy document and compared it with my dcm4chee-arc. Mine is missing half of fields.

- These fields are present in the latest code (which for now you may pull and install) and will be part of 5.16.0.

leogrande

Feb 20, 2019, 10:04:25 AM
to dcm4che
Do you mean that this code (dcm4chee-arc-5.15.1-psql-secure.zip) was fixed after 1-10-19? I downloaded it on 1-10-19.

How about PERMANENT and the study retention policy? Will studies be purged after the 7-year expiration, or can that be done only on CACHE storage?

If I understand it correctly, as of now a deleter threshold is based on the Minimal Usable Space on Storage System, not on study age, right?


    control permanent deletion from NEARLINE storage after 7 years by a Study Retention Policy.

My NEARLINE storage is PERMANENT. Will it work on PERMANENT storage, or do I need to switch both ONLINE and NEARLINE to make it work?


In my case I already have studies stored on both storages, so a Study Retention Policy will not work for me, and I just need to switch to the scenario of the original post, where studies are either on ONLINE or on NEARLINE storage but not on both (to make a study retention policy work).

I probably need to change my study storage design from keeping studies on both storages (for 6 months) to the above-mentioned scenario (considering that my PACS is an AWS cloud-based app).
Retrieval from local storage (in my case an EC2 volume) is faster than from an S3 bucket, which is why I kept the most recent studies on both storages.

I used this scenario on dcm4chee-2 and expected it to work on dcm4chee-arc-5. Maybe "Delete Studies not accessed a configurable time from ONLINE Storage" #1854 will work the same way as it did in dcm4chee-2.

gunterze

Feb 21, 2019, 3:23:54 AM
to dcm4che
Delete Studies not accessed a configurable time from ONLINE Storage #1854 is only needed if you use a cloud storage as online storage, otherwise you can use the threshold based deletion of the least recently accessed study.

leogrande

Feb 21, 2019, 9:04:41 AM
to dcm4che
I do not see any settings in Deleter Threshold that are based on the study age.

"...

Minimal Usable Space on Storage System to trigger deletion. If present, studies are deleted from the Storage System configured for cache (Storage Duration = CACHE) or temporary (Storage Duration = TEMPORARY) storage, if the usable space fall below that value. If absent all studies are deleted from cache/temporary storage. Format [nn’[‘<schedule>’]’]nnn(MB|GB|MiB|GiB).

(dcmDeleterThreshold) ..."
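
To make the trigger condition in the quoted description concrete, here is a minimal Python sketch of the comparison it describes. It is not the archive's code: the function names are hypothetical, the optional schedule prefix of the format is deliberately not handled, and the byte values assumed for MB/GB (decimal) and MiB/GiB (binary) are the conventional ones rather than something stated in the quote.

```python
import re

# Assumed multipliers: MB/GB decimal, MiB/GiB binary (conventional meaning,
# not confirmed by the quoted documentation).
_UNITS = {"MB": 10**6, "GB": 10**9, "MiB": 2**20, "GiB": 2**30}
_SIZE = re.compile(r"^(\d+)(MB|GB|MiB|GiB)$")


def parse_size(text: str) -> int:
    """Return the number of bytes for a value such as '200GB' or '512MiB'.

    Only the nnn(MB|GB|MiB|GiB) part is supported; a schedule prefix is not.
    """
    match = _SIZE.match(text.strip())
    if match is None:
        raise ValueError(f"not a size of the form nnn(MB|GB|MiB|GiB): {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def deletion_triggered(usable_bytes: int, threshold: str) -> bool:
    """True if the usable space has fallen below the configured minimum."""
    return usable_bytes < parse_size(threshold)
```

The point of the sketch is that the only input to the trigger is free space; nothing in it looks at study age or access time.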


There is  Deletion of least recently accessed studies article, but I do not see any settings there that relate to the study age.


    is only needed if you use a cloud storage as online storage, otherwise you can use the threshold based deletion of the least recently accessed study.


Yes, I know. This is exactly what I need considering that I can't find any deleter based on the study age (besides Study Retention Policy).


Do I have to wait until 5.16.0 released to have all configuration settings mentioned in  this document? Once again,
dcm4chee-arc-5.15.1-psql-secure.zip downloaded on 1-10-19 is behind of that document.


Thank you.


I have hijacked the original post, which I usually never do, so I will just shut up and wait for a resolution.


vrinda...@j4care.com

Feb 21, 2019, 10:24:18 AM
to dcm4che
   There is  Deletion of least recently accessed studies article, but I do not see any settings there that relate to the study age.
- What do you mean by study age here? The wiki page talks about least recently accessed studies, meaning the studies whose access_time field in the database is oldest.

   Do I have to wait until 5.16.0 released to have all configuration settings mentioned in  this document? Once again,
   dcm4chee-arc-5.15.1-psql-secure.zip downloaded on 1-10-19 is behind of that document.

- The DICOM Conformance Statement is updated as and when any new LDAP attributes are added (on Device/Extensions/Child Objects) depending on the issues that are worked on.



Gunter Zeilinger

Feb 21, 2019, 10:24:24 AM
to dcm...@googlegroups.com
The order of the deletion of studies is determined by the access time, deleting least recently accessed studies first. So the time for which a study that is not accessed is kept can be derived from the filesystem size and the daily amount of received objects.
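
As a rough illustration of that derivation, the following Python sketch does the arithmetic in both directions. It assumes a steady daily ingest and that the deleter threshold is a fixed minimum of usable space; the function and parameter names are made up for the example, and the figures are not from any real installation.

```python
def retention_days(capacity_gb: float, min_usable_gb: float,
                   daily_ingest_gb: float) -> float:
    """Approximate number of days an unaccessed study stays on ONLINE storage.

    The space actually available for studies is the capacity minus the
    minimal usable space that the deleter threshold keeps free.
    """
    if daily_ingest_gb <= 0:
        raise ValueError("daily_ingest_gb must be positive")
    return (capacity_gb - min_usable_gb) / daily_ingest_gb


def required_capacity_gb(days: float, min_usable_gb: float,
                         daily_ingest_gb: float) -> float:
    """Inverse: capacity needed to keep `days` of production online."""
    return days * daily_ingest_gb + min_usable_gb


# Example: a 2000 GB volume, 200 GB kept free, 10 GB received per day.
print(retention_days(2000, 200, 10))        # 180.0
print(required_capacity_gb(180, 200, 10))   # 2000
```

Studies that are retrieved again get a newer access time and therefore stay longer, so the result is only a lower bound for studies that are never accessed after being received.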

--
You received this message because you are subscribed to the Google Groups "dcm4che" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dcm4che+u...@googlegroups.com.
To post to this group, send email to dcm...@googlegroups.com.
Visit this group at https://groups.google.com/group/dcm4che.
For more options, visit https://groups.google.com/d/optout.

leogrande

Feb 21, 2019, 8:01:00 PM
to dcm4che
That article explains how to use the Deleter Threshold to delete studies based on the access time (Study table). That is great. But it is triggered not by the access time but by the storage size.

I am talking about study deletion triggered by the access time, and not just simple deletion but deletion only if the study is also stored on the NEARLINE storage (plus some other constraints).
dcm4chee-2 (sorry for mentioning this masterpiece of software development again, I am serious) also has a Deleter Threshold based on the storage size, but in addition it has configurable constraints based on the study access time.

dcm4chee-arc-5 has the Retention Policy with Exporters, which would probably serve as a workaround for me, but unfortunately my dcm4chee-arc-5.15.1-psql-secure doesn't have all the necessary fields (mentioned in my previous posts) to properly accomplish it.

Gunter Zeilinger

Feb 22, 2019, 2:39:13 AM
to dcm...@googlegroups.com
dcm4chee-arc 5 supports constraining the deletion to Studies which are also stored on the NEARLINE storage. Tell me a use case where you want to delete studies based on last access date although you still have space left on the ONLINE storage.
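
For readers following the thread, the behaviour discussed here (threshold-triggered deletion of the least recently accessed studies, optionally restricted to studies that also have a NEARLINE copy) can be modelled with a short Python sketch. This is a simplified model and not the archive's implementation; the `Study` class, its fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Study:
    uid: str                # hypothetical identifier
    size_bytes: int         # space freed on ONLINE storage if deleted
    access_time: datetime   # last access, as in the discussion above
    on_nearline: bool       # True if a copy exists on NEARLINE storage


def select_for_deletion(studies: Iterable[Study], usable_bytes: int,
                        min_usable_bytes: int,
                        require_nearline_copy: bool = True) -> List[Study]:
    """Pick studies to delete from ONLINE storage, oldest access time first,
    until the usable space is back at or above the configured minimum."""
    selected = []
    for study in sorted(studies, key=lambda s: s.access_time):
        if usable_bytes >= min_usable_bytes:
            break  # enough space again, nothing more is deleted
        if require_nearline_copy and not study.on_nearline:
            continue  # never delete the only copy
        selected.append(study)
        usable_bytes += study.size_bytes
    return selected
```

In this model nothing is deleted while the usable space is above the minimum, regardless of how old the access times are, which is the difference from an access-time-triggered deleter that the thread is about.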

leogrande

Apr 12, 2019, 4:36:52 PM
to dcm4che
Sorry for the late reply.

There is a HIPAA compliance requirement to keep studies for 6-7 years (NY). It doesn't make sense to keep studies forever, especially on AWS S3 (nearline storage, in my case).


gunterze

May 17, 2019, 9:08:02 AM
to dcm4che