FEDESOFT S.L.U.

Nov 6, 2023, 7:24:52 AM
to AtoM Users
Hello everyone,

After consulting the documentation and reviewing similar questions regarding permissions and PREMIS rights I'd like to illustrate my question with the following example:

Admin
  • A simple archival description has been created:
    • Fonds > Series > File/Item (digital object import, i.e. document.pdf)
  • Group > anonymous > All archival description:
    • All options are set to deny, except for 'read,' which is granted.
  • All premis act > Disallow permissions 
    • Master, Reference and Thumb -> unchecked
  • document.pdf > create new rights > Act/granted rights
    • Discover, display, replicate -> Disallow

Anonymous User
  • Can read the archival description and the pdf is displayed as in the following image:
document_example_anonymous.png



Everything is correct and as expected. Accessing the record as admin, or as another user with permission to view the document, generates a URL similar to the following:

https://domain.test/uploads/r/null/0/9/e/<long-hex-code>/document.pdf?token=<long-hex-code>

However, with that URL, any other anonymous user can view the document. Is that the correct behaviour when the file has the Text media type? That is, can anyone access the file as long as they are given the URL?

version: 2.7.3 - 192

Thank you in advance for your help.


Dan Gillean

Nov 6, 2023, 2:36:13 PM
to ica-ato...@googlegroups.com
Hi there, 

This could certainly be improved, but it is an edge case that applies only to PDFs. 

PDFs are the one exception to the various Permissions rules in AtoM that deny access to the master digital object. This was hardcoded early on in the application as a sort of workaround: in general, if you are adding PDFs to your public-facing catalog, you want users to read any text that might be found in the PDF. However, the reference image generated from PDFs is a JPG that is generally just the first page, and too small to be readable. For that reason, early ICA-AtoM developers decided that, rather than adding custom rules just for PDFs, they would hard-code an exception for digital objects identified as PDFs so that AtoM won't apply the usual permission restrictions. This allows users to still restrict access to other digital object masters using the permission settings, without rendering the PDFs completely unusable.

This is also part of the reason that we use hashing when determining the digital object directory structure and final path to the master - to help obfuscate it. To be able to access a PDF as a public user that has been restricted as you have done, you need someone with access to the master digital object to give you the URL directly - it would take a lot of effort to brute force that link, and random guessing alone will not do it. 

This means that the security risk is human - an insider with access helping to violate the established permissions by navigating to the master object, copying the URL, and giving it to someone who cannot log in themselves. We could certainly look to improve this in the future and ensure that the master PDF is fully restricted when there are also PREMIS Rights applied, etc - but in this theoretical security breach, why wouldn't the insider simply share their login credentials, or download the PDF themselves and give it to the public user, or help in some other way, etc...?

In any case: though low risk, I do agree that this should be fixed. While hardcoding the permissions to allow for the PDF exclusion is understandable (if not ideal) from an historical development perspective, it would be good to patch the case where specific restrictions via PREMIS Rights are applied, as this should not have the same exception. 

I would also personally love to see the hard-coded rule about PDFs and permissions changed to a user-configurable global setting, as this seems a manageable compromise between the complexity of adding even more granular permissions to an already-overloaded module in AtoM needing an overhaul, and doing nothing. This would allow users with particularly strict security requirements to consistently enforce the digital object permissions for all object types, while still allowing others who want to configure permissions globally but still allow PDFs to be readable to maintain their current configurations. 

In the short term, I have let our Maintainers know about this edge case, so they can determine how they would like to proceed. 

Cheers, 
 
Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Abelardo Torres

Apr 22, 2024, 12:55:17 PM
to AtoM Users

Hello everyone.


In my case, the problem is that when the master file is a PDF, anonymous users can click on the reference copy of the item and access the master file. This worked correctly in version 2.6.4 and stopped working in version 2.7.0. If the master file is, for example, a JPG or PNG, there is no security problem.

Is it the same problem or would it be a different problem?

Thank you so much.

Dan Gillean

Apr 23, 2024, 9:21:45 AM
to ica-ato...@googlegroups.com
Hi Abelardo, 

I think this sounds like the same issue as I described in my previous reply. There is a mention of it in the introduction to the Upload Digital Objects documentation page: 
atom-pdf-permissions-warning.png
If you want your end users to be able to read the PDF, they will need to access the file - the thumbnail provided is too small and low-resolution, and will only include the first page. This is why there are different rules in place. However, I understand that this is not always preferable, and I have let the Maintainers know about this feedback, so they can consider making it a configurable setting in the future. 

In the meantime, you can apply PREMIS rights to your descriptions with PDFs to restrict access - see the Rights documentation; particularly this section: 
There are also other workarounds, such as: 
  • Don't upload a PDF at all, or create a Draft duplicate description with the PDF, so that public users never see the Draft record (or the PDF)
  • Try editing the digital object metadata and change the Media type from Text to something else - perhaps "Other." I haven't tested this myself, but it may change the behavior so that the usual permissions are applied and users cannot click through anymore. Note that it will affect the ability to filter search results by media type
  • etc
Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Abelardo Torres

Apr 25, 2024, 9:57:25 AM
to AtoM Users

Thank you very much, Dan.

I have tried applying PREMIS rights to my descriptions with PDFs to restrict access, but it didn't work.

What did work was editing the digital object metadata and changing the Media type from "Text" to "Other".

I already have thousands of PDF files loaded, so I can't change the type of them one by one. Do you know if there is a way to do it automatically for all of them?

Thank you very much again for your help

Dan Gillean

Apr 25, 2024, 10:41:15 AM
to ica-ato...@googlegroups.com
Hi Abelardo, 

It's possible we could do this via SQL. Please keep in mind that I have not tested this myself, so you will definitely want to back up your data and proceed at your own risk.

First, database backup: 
General information on how to access the MySQL command prompt: 
  • DESCRIBE digital_object;
+---------------+---------------+------+-----+---------+-------+
| Field         | Type          | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| id            | int           | NO   | PRI | NULL    |       |
| object_id     | int           | YES  | MUL | NULL    |       |
| usage_id      | int           | YES  | MUL | NULL    |       |
| language      | varchar(50)   | YES  |     | NULL    |       |
| mime_type     | varchar(255)  | YES  |     | NULL    |       |
| media_type_id | int           | YES  | MUL | NULL    |       |
| name          | varchar(1024) | NO   |     | NULL    |       |
| path          | varchar(1024) | NO   | MUL | NULL    |       |
| sequence      | int           | YES  |     | NULL    |       |
| byte_size     | bigint        | YES  |     | NULL    |       |
| checksum      | varchar(255)  | YES  |     | NULL    |       |
| checksum_type | varchar(50)   | YES  |     | NULL    |       |
| parent_id     | int           | YES  | MUL | NULL    |       |
+---------------+---------------+------+-----+---------+-------+
13 rows in set (0.01 sec)


We want to find those records that have a Media type of "Text" and change that to "Other." However, we can tell just by the field name (media_type_id) that AtoM uses an ID, not a string, to set the media type. So first, we need to figure out which IDs correspond to which media types.

These are likely stored in an internal taxonomy - we can get a list of all taxonomies and their IDs with: 
This will spit out a big 2-column table. I can see that in my instance, the Media Types taxonomy has an ID of 46. You should run this query yourself just to confirm that the ID is the same in your installation (it should be, but it's good to double-check).
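A query along the following lines should produce that list. This is only a sketch: it assumes the standard AtoM schema, where translated taxonomy names are stored in a taxonomy_i18n table with id, name, and culture columns. Those names are assumptions here, so confirm them with DESCRIBE in your own database, and replace 'en' with your installation's culture code if needed.

```sql
-- Assumption: taxonomy names live in taxonomy_i18n (id, name, culture).
-- Filtering on one culture keeps each taxonomy to a single row.
SELECT id, name
FROM taxonomy_i18n
WHERE culture = 'en'
ORDER BY id;
```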

I can now use that Taxonomy ID to look up the relevant taxonomy values: 
+-----+-------+
| id  | name  |
+-----+-------+
| 135 | Audio |
| 136 | Image |
| 137 | Text  |
| 138 | Video |
| 139 | Other |
+-----+-------+
5 rows in set (0.01 sec)
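The lookup behind a table like this could be written roughly as follows. This is a sketch that assumes term IDs are stored in a term table with a taxonomy_id column, and their names in a term_i18n table; neither table is described in this thread, so verify both in your own database first, and substitute your own taxonomy ID if it is not 46.

```sql
-- Assumption: term (id, taxonomy_id) holds the terms,
-- and term_i18n (id, name, culture) holds their translated names.
SELECT term.id, term_i18n.name
FROM term
JOIN term_i18n ON term_i18n.id = term.id
WHERE term.taxonomy_id = 46        -- Media Types taxonomy ID found above
  AND term_i18n.culture = 'en'
ORDER BY term.id;
```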


Ok, now we are into the experimental part, where we are actually going to apply an update. We want to change all media_type_id values of 137 to 139, in the digital_object table. We want to target ONLY PDFs, so perhaps we can use the mime type. I tried the following select query so I could see how some PDF mime types might look: 
  • SELECT mime_type, name FROM digital_object WHERE media_type_id='137' LIMIT 10;
+-----------------+-------------------+
| mime_type       | name              |
+-----------------+-------------------+
| application/pdf | 015-.1-3-1-1.pdf  |
| application/pdf | 015-.2-1-1.pdf    |
| application/pdf | 015-.2-1-2.pdf    |
| application/pdf | 015-2-1.pdf       |
| application/pdf | 016-.1-2-1-1_.pdf |
| application/pdf | 016-.1-2-2-1_.pdf |
| application/pdf | 016-.1-2-3-1.pdf  |
| application/pdf | 016-.1-2-3-2.pdf  |
| application/pdf | 016-.1-2-1-2.pdf  |
| application/pdf | 016-.1-5.pdf      |
+-----------------+-------------------+
10 rows in set (0.00 sec)



So, it's "application/pdf". Let's try using that to limit our query, and see if we can update everything now: 
  • UPDATE digital_object SET media_type_id='139' WHERE media_type_id='137' AND mime_type='application/pdf';
Hopefully that should do it! 
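Since the update is untested, it may be worth wrapping it in a few checks. The sketch below counts the rows that would change, runs the same UPDATE inside a transaction, and reports how many rows were actually changed before anything is committed. It assumes the digital_object table uses a transactional storage engine such as InnoDB (otherwise ROLLBACK has no effect), and it reuses the IDs 137 and 139 from the lookup above, which may differ in your installation.

```sql
-- 1. Dry run: how many PDF digital objects are currently typed as Text (137)?
SELECT COUNT(*) AS pdfs_to_change
FROM digital_object
WHERE media_type_id = 137
  AND mime_type = 'application/pdf';

-- 2. Apply the change inside a transaction so it can still be undone.
START TRANSACTION;

UPDATE digital_object
SET media_type_id = 139
WHERE media_type_id = 137
  AND mime_type = 'application/pdf';

-- 3. Rows changed by the UPDATE; this should equal pdfs_to_change.
SELECT ROW_COUNT() AS rows_changed;

-- 4. Run COMMIT if the numbers match, or run ROLLBACK instead to undo.
COMMIT;
```

Once committed, the changed rows cannot be told apart from any PDFs that were already set to "Other", so saving the output of a SELECT of the matching id values before the update gives you a list to revert from if needed.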

You will want to clear the cache, restart PHP-FPM, build the nested set, and repopulate the search index after making these changes. Links and further instructions for all these tasks can be found from our Troubleshooting page: 
Hope that does it! Let us know how it goes. 

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


Abelardo Torres

Apr 26, 2024, 12:27:25 PM
to AtoM Users
This solution could work.

Thank you very much, Dan, for your help.