Zotero API pulling some PDFs but not others from the same collection


Mike Reid

Jun 10, 2024, 2:49:04 PM
to zotero-dev
Hi there, 

I am trying to write a Python program that takes all of the PDFs from a collection and gathers them into a single folder for further processing. The issue I'm running into is that it works great for some of the PDFs and fails for others.

For example: 
It pulls this file with no problems: /Users/mikereid/Zotero/storage/DVYBEVMK/Dorsey et al. - 2022 - Identifying service quality gaps between patients .pdf 

but when it gets to 

/Users/mikereid/Zotero/storage/QKNJLX27/Zullig et al. - 2016 - A Systematic Review of Conceptual Frameworks of Me.pdf

It throws the error 
2024-06-10 10:50:47,975 - DEBUG - Starting new HTTPS connection (1): api.zotero.org:443
2024-06-10 10:50:48,126 - DEBUG - https://api.zotero.org:443 "GET /users/2667172/items/QKNJLX27/file HTTP/1.1" 404 9
2024-06-10 10:50:48,126 - ERROR - Error: Attachment QKNJLX27 not found at URL: https://api.zotero.org/users/2667172/items/QKNJLX27/file

Here is the code I am using to make the call: 
def get_collection_items(collection_id, start=0, limit=100):
    url = f'{BASE_URL}/collections/{collection_id}/items?start={start}&limit={limit}'
    logger.debug(f"Fetching collection items from URL: {url}")
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json(), int(response.headers.get('Total-Results', 0))

def get_item_children(item_key):
    url = f'{BASE_URL}/items/{item_key}/children'
    logger.debug(f"Fetching children for item {item_key} from URL: {url}")
    response = requests.get(url, headers=headers)
    if response.status_code == 404:
        logger.error(f"Error: Parent item {item_key} not found.")
        return []
    response.raise_for_status()
    return response.json()

def download_pdf(attachment_key, filename):
    url = f'{BASE_URL}/items/{attachment_key}/file'
    logger.debug(f"Downloading PDF from URL: {url} with filename: {filename}")
    try:
        response = requests.get(url, headers=headers, stream=True)
    except requests.exceptions.RequestException as e:
        logger.error(f"Request failed for URL: {url} with error: {e}")
        return

    if response.status_code == 404:
        logger.error(f"Error: Attachment {attachment_key} not found at URL: {url}")
        return

    response.raise_for_status()

    file_path = os.path.join(OUTPUT_DIR, filename)
    logger.debug(f"Saving PDF to file path: {file_path}")
    with open(file_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    logger.info(f'Downloaded {filename}')

Any thoughts would be much appreciated. 

Thanks so much 

Mike 

Dan Stillman

Jun 10, 2024, 3:47:36 PM
to zoter...@googlegroups.com
On 6/10/24 10:22 AM, Mike Reid wrote:
> 2024-06-10 10:50:48,126 - DEBUG - https://api.zotero.org:443 "GET
> /users/2667172/items/QKNJLX27/file HTTP/1.1" 404 9
> 2024-06-10 10:50:48,126 - ERROR - Error: Attachment QKNJLX27 not found
> at URL: https://api.zotero.org/users/2667172/items/QKNJLX27/file

This doesn't have anything to do with the API specifically. It's just a
file that's not available online. You're at your file-storage quota, and
you would be getting a warning about that on every sync on the device
where you added the file.

https://www.zotero.org/support/kb/files_not_syncing
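
In a script like the one quoted above, a 404 from the `/file` endpoint could be reported as "no file in online storage" rather than a generic "not found", which points more directly at a sync or quota problem. A minimal sketch of such a message helper follows; `describe_file_status` is a hypothetical name that does not appear in the original code, and the message wording is illustrative only.

```python
def describe_file_status(status_code, attachment_key):
    """Return a human-readable explanation for a /file response status.

    Hypothetical helper: not part of the original script or of any library.
    """
    if status_code == 200:
        return f"Attachment {attachment_key}: file available for download."
    if status_code == 404:
        # Per the reply above, a 404 on /file can simply mean the file was
        # never uploaded to online storage (for example, a full quota).
        return (
            f"Attachment {attachment_key}: no file in online storage. "
            "Check the Zotero client for sync or quota warnings."
        )
    return f"Attachment {attachment_key}: unexpected HTTP status {status_code}."
```

Inside `download_pdf`, the 404 branch could then log `describe_file_status(response.status_code, attachment_key)` instead of the generic message.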

Mike Reid

Jun 11, 2024, 11:43:35 AM
to zotero-dev

Thanks so much, Dan. I feel a bit silly for not catching it myself, but I'll take feeling silly with a path forward every time.
Basically, I was getting caught up because whenever I looked on my local machine, I could see the clear path to the file sitting on my hard drive. I didn't put two and two together that by using the API I was (now quite obviously) pulling the document from the online library. I think that was largely because I wanted to use the API to generate a list of collections that I could pick from, so I didn't have to process my entire library every time. I didn't make the connection that I would be pulling the PDFs through the API as well. What can I say, it's my first day.

Anyway, I upgraded my storage plan, synced, and all seems to be working now. Once again, I really appreciate your help. Thank you.

Mike 
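
When the script runs on the same machine as the Zotero client, another option is to copy the PDF straight from the local storage folder, which in the paths earlier in this thread has the form `~/Zotero/storage/<attachment key>/`. A minimal sketch under that assumption follows; `copy_local_pdf` and its parameter names are hypothetical, and the storage location may differ on other setups.

```python
import shutil
from pathlib import Path


def copy_local_pdf(storage_dir, attachment_key, output_dir):
    """Copy the first PDF in storage_dir/attachment_key into output_dir.

    Returns the destination path, or None if no local PDF is found.
    Hypothetical helper: assumes one subfolder per attachment key.
    """
    item_dir = Path(storage_dir) / attachment_key
    if not item_dir.is_dir():
        return None
    # Sort so the choice is deterministic if a folder holds several PDFs.
    pdfs = sorted(p for p in item_dir.iterdir() if p.suffix.lower() == ".pdf")
    if not pdfs:
        return None
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(pdfs[0], out / pdfs[0].name))
```

The API would still be used to list collections and attachment keys; only the file transfer itself would be replaced by a local copy.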