Zotero API pulling some PDFs but not others from the same collection

Mike Reid

Jun 10, 2024, 2:49:04 PM

to zotero-dev

Hi there,

I am trying to write a Python program that takes all of the PDFs from a collection and gathers them into a single folder for further processing. The issue I'm running into is that it works great for some of the PDFs and fails for others.

For example:
It pulls this file with no problems: /Users/mikereid/Zotero/storage/DVYBEVMK/Dorsey et al. - 2022 - Identifying service quality gaps between patients .pdf

but when it gets to

/Users/mikereid/Zotero/storage/QKNJLX27/Zullig et al. - 2016 - A Systematic Review of Conceptual Frameworks of Me.pdf

It throws the error

2024-06-10 10:50:47,975 - DEBUG - Starting new HTTPS connection (1): api.zotero.org:443
2024-06-10 10:50:48,126 - DEBUG - https://api.zotero.org:443 "GET /users/2667172/items/QKNJLX27/file HTTP/1.1" 404 9
2024-06-10 10:50:48,126 - ERROR - Error: Attachment QKNJLX27 not found at URL: https://api.zotero.org/users/2667172/items/QKNJLX27/file

Here is the code I am using to make the call:

import os

import requests

# BASE_URL, headers, logger, and OUTPUT_DIR are defined elsewhere in the script (not shown).

def get_collection_items(collection_id, start=0, limit=100):
    # Fetch one page of items; the Total-Results header gives the full count for paging.
    url = f'{BASE_URL}/collections/{collection_id}/items?start={start}&limit={limit}'
    logger.debug(f"Fetching collection items from URL: {url}")
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json(), int(response.headers.get('Total-Results', 0))

def get_item_children(item_key):
    # Return the child items (attachments, notes) of a parent item, or [] on 404.
    url = f'{BASE_URL}/items/{item_key}/children'
    logger.debug(f"Fetching children for item {item_key} from URL: {url}")
    response = requests.get(url, headers=headers)
    if response.status_code == 404:
        logger.error(f"Error: Parent item {item_key} not found.")
        return []
    response.raise_for_status()
    return response.json()

def download_pdf(attachment_key, filename):
    # Stream the attachment's file from the online library into OUTPUT_DIR.
    url = f'{BASE_URL}/items/{attachment_key}/file'
    logger.debug(f"Downloading PDF from URL: {url} with filename: {filename}")
    try:
        response = requests.get(url, headers=headers, stream=True)
    except requests.exceptions.RequestException as e:
        logger.error(f"Request failed for URL: {url} with error: {e}")
        return

    # A 404 here means the file itself is not available on the server.
    if response.status_code == 404:
        logger.error(f"Error: Attachment {attachment_key} not found at URL: {url}")
        return

    response.raise_for_status()

    file_path = os.path.join(OUTPUT_DIR, filename)
    logger.debug(f"Saving PDF to file path: {file_path}")
    with open(file_path, 'wb') as f:
        # Write in 8 KB chunks so large PDFs are never held fully in memory.
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    logger.info(f'Downloaded {filename}')

Any thoughts would be much appreciated.

Thanks so much

Mike

Dan Stillman

Jun 10, 2024, 3:47:36 PM

to zoter...@googlegroups.com

On 6/10/24 10:22 AM, Mike Reid wrote:
> 2024-06-10 10:50:48,126 - DEBUG - https://api.zotero.org:443 "GET
> /users/2667172/items/QKNJLX27/file HTTP/1.1" 404 9
> 2024-06-10 10:50:48,126 - ERROR - Error: Attachment QKNJLX27 not found
> at URL: https://api.zotero.org/users/2667172/items/QKNJLX27/file

This doesn't have anything to do with the API specifically. It's just a
file that's not available online. You're at your file-storage quota, and
you would be getting a warning about that on every sync on the device
where you added the file.

https://www.zotero.org/support/kb/files_not_syncing
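
As a rough illustration of the point above, a script could separate attachments whose files are on the server from those that are not before attempting any download. The sketch below is not from this thread. It assumes that the JSON for an attachment item carries a `links.enclosure` entry only when the file has been uploaded to Zotero storage, which should be checked against the Web API documentation. The function name `split_by_online_availability` is hypothetical.

```python
def split_by_online_availability(attachments):
    """Partition attachment item dicts into (available_keys, missing_keys).

    Assumption (not confirmed in this thread): an attachment's JSON includes
    a `links.enclosure` entry only when its file exists in online storage.
    """
    available, missing = [], []
    for item in attachments:
        links = item.get("links") or {}
        if "enclosure" in links:
            available.append(item["key"])
        else:
            missing.append(item["key"])
    return available, missing
```

The `attachments` argument would be the list returned by something like `get_item_children` above. Keys that land in the second list are candidates for the quota problem described here, so they can be reported once instead of producing a 404 per file.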

Mike Reid

Jun 11, 2024, 11:43:35 AM

to zotero-dev

Thanks so much, Dan. I feel a bit silly for not catching it myself, but I'll take feeling a bit silly with a path forward every time.
Basically, I was getting caught up because whenever I looked on my local machine, I could see the clear path to the file sitting on my hard drive. I didn't put two and two together that by using the API I was (now quite obviously) pulling the document from the online library. I think it was largely because I wanted to use the API to generate a list of collections that I could pick from, so I didn't have to process my entire library every time. I didn't make the connection that I would be pulling the PDFs through the API as well. What can I say, it's my first day.

Anyway, I upgraded my storage plan, synced, and all seems to be working now. Once again, I really appreciate your help. Thank you.

Mike
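
For the situation described in this thread, where a file is present under the local Zotero storage folder but not in the online library, one alternative to downloading is to copy the file straight from disk. The following is a minimal sketch, not code from the thread. It assumes the layout shown in the paths above (`<storage folder>/<attachment key>/<file>.pdf`), and the function name `copy_local_pdfs` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_local_pdfs(storage_dir, attachment_key, output_dir):
    """Copy every PDF in <storage_dir>/<attachment_key>/ into output_dir.

    Returns the list of destination paths (empty if nothing was found).
    Assumes stored attachments live in a folder named after their key.
    """
    source_dir = Path(storage_dir) / attachment_key
    destination = Path(output_dir)
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    if not source_dir.is_dir():
        return copied
    # Sorted so the result order is stable across platforms.
    for pdf in sorted(source_dir.glob("*.pdf")):
        target = destination / pdf.name
        shutil.copy2(pdf, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

A call such as `copy_local_pdfs("/Users/mikereid/Zotero/storage", "QKNJLX27", OUTPUT_DIR)` could serve as a fallback when the `/file` request returns 404. It only helps on a machine that actually holds the file, and linked attachments stored outside the storage folder would not be found this way.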
