Pdf.js Changelog

Lucretia

Aug 5, 2024, 3:25:50 AM
to nalbroomonbou
I've been trying for a few hours to replace a link-based pdf.js with an npm install of pdfjs-dist, since I noticed that the links I was using were not meant to be used as CDNs and could become unstable, as described here.

I could not find much documentation on how to make that work other than a few examples, and when Webpack is involved they are mostly for React, while I am simply using ES6 in a Django project (compiling static files into the desired Django directory, without using the webpack plugin).


This is caused by worker-loader loading NodeTargetPlugin, which in turn runs require("module"). I think (but I'm not 100% sure) that this is for native Node modules, which are not relevant when Webpack targets the web.


The issue I had was that Webpack packaged pdf.worker.js as an ES module (the default for worker-loader), so the require'd value needs to be unwrapped through the default property of the imported module. Said another way, the instantiation would have to be new PdfjsWorker.default().
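
The unwrapping described above can be sketched in plain JavaScript. This is only an illustration, not pdf.js or worker-loader code: FakeWorker, fakeEsModule and unwrapDefault are hypothetical names standing in for what Webpack hands back when a module is emitted as an ES module.

```javascript
// Hypothetical stand-in for the worker constructor produced by worker-loader.
class FakeWorker {
  constructor() {
    this.started = true;
  }
}

// Roughly the shape require() returns for a module emitted as an ES module:
// the real export sits under the `default` property.
const fakeEsModule = { __esModule: true, default: FakeWorker };

// Return the default export when the module is ES-module-shaped,
// otherwise return the module itself.
function unwrapDefault(mod) {
  return mod && mod.__esModule && "default" in mod ? mod.default : mod;
}

const PdfjsWorker = unwrapDefault(fakeEsModule);
const worker = new PdfjsWorker(); // same effect as new fakeEsModule.default()
console.log(worker.started);
```

Calling new fakeEsModule() directly would throw a TypeError, because the wrapper object is not a constructor. That is the failure mode the default property works around.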


I was able to mitigate this with NormalModuleReplacementPlugin, which can rewrite a require statement based on a regex match and replace. Here it matches the original require string and replaces it with one that sets the worker-loader option esModule=false, followed by the absolute path to the pdf.worker.js file on the local system.
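
A rough sketch of that replacement follows. The regex and the absolute path are assumptions to adapt to your own install (the original require string depends on the pdfjs-dist version), and rewriteWorkerRequest is a hypothetical helper name. The plugin wiring is shown in comments so the snippet runs without Webpack installed.

```javascript
// ASSUMPTION: this pattern must match the request string your pdfjs-dist
// version uses to require the worker; adjust it as needed.
const WORKER_REQUEST_PATTERN = /worker-loader.*pdf\.worker(\.min)?\.js$/;

// ASSUMPTION: placeholder path; point it at pdf.worker.js on your system.
const WORKER_ABSOLUTE_PATH =
  "/abs/path/to/node_modules/pdfjs-dist/build/pdf.worker.js";

// Callback for NormalModuleReplacementPlugin: it mutates resource.request so
// worker-loader is invoked with esModule=false on the absolute worker path.
function rewriteWorkerRequest(resource) {
  resource.request = "worker-loader?esModule=false!" + WORKER_ABSOLUTE_PATH;
}

// In webpack.config.js (not executed here):
//   const webpack = require("webpack");
//   plugins: [
//     new webpack.NormalModuleReplacementPlugin(
//       WORKER_REQUEST_PATTERN,
//       rewriteWorkerRequest
//     ),
//   ]

// Simulate the object Webpack would pass to the callback.
const resource = { request: "worker-loader!./build/pdf.worker.js" };
if (WORKER_REQUEST_PATTERN.test(resource.request)) {
  rewriteWorkerRequest(resource);
}
console.log(resource.request);
```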


And lastly, if you want to silence the warnings about the missing FetchCompileWasmPlugin and FetchCompileAsyncWasmPlugin modules, you can set up the Webpack IgnorePlugin to ignore these imports. My assumption is that they are WASM-based and not actually needed.
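
One way this could look, assuming the option-object form of IgnorePlugin with a checkResource predicate. The module paths in the demo calls and the shouldIgnore name are illustrative assumptions; check the actual warning text for the request strings your build reports.

```javascript
// Module names taken from the warnings; matching on the name suffix is an
// assumption about how the imports are spelled.
const IGNORED_MODULES = /FetchCompile(Async)?WasmPlugin$/;

// Predicate usable as IgnorePlugin's checkResource option:
// returning true means "skip this import".
function shouldIgnore(resource) {
  return IGNORED_MODULES.test(resource);
}

// In webpack.config.js (not executed here):
//   new webpack.IgnorePlugin({ checkResource: shouldIgnore })

console.log(shouldIgnore("webpack/lib/web/FetchCompileWasmPlugin")); // true
console.log(shouldIgnore("webpack/lib/web/FetchCompileAsyncWasmPlugin")); // true
console.log(shouldIgnore("pdfjs-dist/build/pdf.worker.js")); // false
```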


Zotfile is a Zotero plugin to manage your attachments: automatically rename, move, and attach PDFs (or other files) to Zotero items, sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.) and extract annotations from PDF files.


ZotFile can rename and add the most recently modified file from the Firefox download or a user specified folder as a new attachment to the currently selected Zotero item. It renames the file using metadata from the selected Zotero item (user configurable), and stores the file as a Zotero attachment to this item (or alternatively, moves it to a custom location).


The user can also select any number of Zotero items and automatically rename and move all attachments of these items according to the user defined rules using metadata of the respective zotero item (batch processing).


To read and annotate PDF attachments on your mobile device, zotfile can sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.). Zotfile sends files to a location on your PC or Mac that syncs with your PDF reader App (PDF Expert, iAnnotate, GoodReader etc.), allows you to configure custom subfolders for easy access, and even extracts the annotations and highlighted text to Zotero notes when you get the files back from your tablet. Instructions are below.


After highlighting and annotating pdfs on your tablet (or with the PDF reader application on your computer), ZotFile can automatically extract the highlighted text and note annotations from the pdf. The extracted text is saved in a Zotero note. Thanks to Joe Devietti, this feature is now available on all platforms based on the pdf.js library.


To read and annotate PDF attachments on your mobile device, zotfile can sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.). For this purpose, Zotfile sends files to a location on your PC or Mac that syncs with your PDF reader App (PDF Expert, iAnnotate, GoodReader etc.), and gets them back when you have finished reading them.


Set up a folder on your PC or Mac that syncs with your tablet reader application. Files that are copied to this folder should automatically appear in your PDF reader application. One possibility is Dropbox, which is free for up to 2GB of space and works with most PDF reader apps. More detailed instructions as well as alternative options should be available on the website of your PDF reader App.


All wildcards are now defined in the hidden preference zotfile.wildcards.default and can be changed by the user. But I strongly suggest that you do not change this preference. Instead, there is a second hidden preference zotfile.wildcards.user that allows you to add and overwrite wildcards (hidden preferences can be changed in about:config). This preference is for advanced users and has no error checking, so be careful what you do! By default, zotfile.wildcards.user is set to an empty JSON object so that no user wildcards are defined. Below is an example JSON that defines wildcards for %1, %2, %3, %4 illustrating all the possibilities:


ZotFile uses the specified field as an input string and then applies the transformations specified in operations. The value of field can either be the name of a Zotero field (see 1) or a javascript object with item type specific field names (see 2). operations is an array of javascript objects and supports three types of transformations that are identified by the function element:


exec: Search for matches using regular expressions (%3). Zotfile uses the exec() function based on the regular expression defined in regex, and returns the element specified in group so that 0 returns the matched text and higher values the corresponding capturing parenthesis. If the match fails, this operation returns the input data.
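
The exec behaviour described above can be sketched as follows. This is not ZotFile's source: applyExec is a hypothetical helper, the regex and input strings are made up for illustration, and returning the input when the requested group did not participate in the match is an assumption on my part.

```javascript
// Sketch of an "exec"-style operation. The `regex` and `group` keys mirror
// the element names in the text; the function itself is hypothetical.
function applyExec(input, operation) {
  const match = new RegExp(operation.regex).exec(input);
  // Failed match: return the input data unchanged.
  // (Treating a missing group the same way is an assumption.)
  if (match === null || match[operation.group] === undefined) {
    return input;
  }
  return match[operation.group];
}

const op = { function: "exec", regex: "(\\d{4})-(\\d{2})", group: 1 };

console.log(applyExec("2013-04-17", op)); // "2013" (first capturing group)
console.log(applyExec("2013-04-17", { ...op, group: 0 })); // "2013-04" (whole match)
console.log(applyExec("no date", op)); // "no date" (match failed)
```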


A complete list of Zotero fields is available here (dateModified and dateAdded seem to be missing from that list) and all the item types are here. The fields for each item type are listed on this page. Zotfile defines a number of additional fields that can be used as part of wildcards: itemType is the language-specific item type, formatedTitle is the title formatted using the options defined in the zotfile preferences, author is the author string formatted using the zotfile preferences, authorLastF is the author string formatted as LastnameF, and authorInitials is the initials of the authors.


Most importantly, validate your JSON. Check out zotfile.wildcards.default for more examples (see below). Finally, the JSON has to be reformatted to one line that can be pasted into the preference field in about:config. Here is the example from above:
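
Validation and one-line reformatting can be done in a single step by parsing and re-serializing. A minimal sketch, in which toOneLine is a hypothetical helper and the wildcard definition is a made-up placeholder rather than the example referred to above:

```javascript
// Parse first so invalid JSON fails loudly, then re-serialize without
// whitespace to get a single line.
function toOneLine(jsonText) {
  return JSON.stringify(JSON.parse(jsonText));
}

// Hypothetical wildcard definition, only to have something to minify.
const pretty = `{
  "1": {
    "field": "title",
    "operations": [
      { "function": "exec", "regex": "[a-z]+", "group": 0 }
    ]
  }
}`;

const oneLine = toOneLine(pretty);
console.log(oneLine);

// Invalid JSON (here a trailing comma) is rejected with a SyntaxError.
let rejected = false;
try {
  toOneLine('{ "1": "title", }');
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```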


The information in this file might not be up to date but you can look at the default wildcards and learn something about user-defined wildcards here. The minified version in one line is here so that you can copy it to zotfile.wildcards.default if you screw up.


In background mode (mode=1, default), zotfile leaves zotero attachments at their current location and moves a copy of the file to the tablet folder (set in the zotfile preference window) when they are sent to the tablet. Getting the file back from the tablet replaces the zotero attachment file and removes it from the tablet folder. This mode is recommended when you sync attachment files in your zotero library across multiple computers or when you index your attachments.


The foreground mode (mode=2) moves the attachment file to the tablet folder and links to this location from zotero. In this mode there is always only one copy of the file. You cannot, however, sync linked attachments to the zotero server.


By default, zotfile asks for confirmation before sending an attachment to the tablet when it is already on the tablet, which can be useful to move it to a different subfolder. This user confirmation can be disabled with this option.


.pdfExtraction.NoteHtmlTagStart, .pdfExtraction.NoteHtmlTagEnd, .pdfExtraction.HighlightHtmlTagStart, .pdfExtraction.HighlightHtmlTagEnd, .pdfExtraction.UnderlineHtmlTagStart, .pdfExtraction.UnderlineHtmlTagEnd


These options allow the user to fine-tune the formatting of the extracted PDF annotations in the zotero note. They define the opening and closing HTML tags for different types of annotations. The default settings format highlighted text from the pdf normally, put note text in italics, and underline underlined text. The end options for note, highlight and underline have to be the closing tag for the corresponding start option.


The new URL format also supported by Zotero is zotero://open-pdf/library/items/[itemKey]?page=[page] for pdf attachments in the personal library and zotero://open-pdf/groups/[groupID]/items/[itemKey]?page=[page] for items in group libraries.
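
The two URL shapes can be produced by one small helper. This is a sketch only: openPdfUrl is a hypothetical function, and the item key and group ID below are made-up values.

```javascript
// Build an open-pdf URL following the two formats given in the text.
// When groupID is omitted, the personal library form is used.
function openPdfUrl({ itemKey, page, groupID }) {
  const library = groupID === undefined ? "library" : `groups/${groupID}`;
  return `zotero://open-pdf/${library}/items/${itemKey}?page=${page}`;
}

const personal = openPdfUrl({ itemKey: "ABCD2345", page: 4 });
const group = openPdfUrl({ itemKey: "ABCD2345", page: 4, groupID: 12345 });

console.log(personal); // zotero://open-pdf/library/items/ABCD2345?page=4
console.log(group); // zotero://open-pdf/groups/12345/items/ABCD2345?page=4
```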


This is not really a new feature but with two recent changes in Zotero (see this and this pull request), it became much more useful! Simply click on the link that is part of your extracted annotations, and zotfile will open the pdf on the page with the annotation. The feature now works on Windows as well (thanks to aurimasv) and I have added support for Skim on Mac. Check out the documentation for some more details.


This version includes four improvements for the extraction of annotations. First, the new version greatly improves the detection of correct spaces between words. Second, the extraction is now based on the most recent pdf.js version (here is my fork with the modified version of pdf.js used in zotfile). With this update, zotfile should work with more pdfs. Third, the extraction is now about 40-60% faster (depending on the pdf) thanks to some improvements in the extraction code. Fourth, the extraction now runs in the background so that Zotero is not blocked while annotations are extracted.


The right pane now includes a row with the current tablet status such as No for files that are not on the tablet or [Basefolder] for files that are in the tablet base-folder. Click on this information to change the tablet status and open or reveal the file on the tablet (very convenient because double-clicking on the attachment opens the imported zotero attachment and not the file on the tablet).


The extracted annotations now include a link that opens the pdf file on the corresponding page. For the extracted annotation "This is my text" (zotfile 2013: 4), zotfile 2013: 4 is a link that opens the pdf on the page with the annotation. Currently, this feature only works from reports (right-click on an item and select generate report), but future versions of Zotero might be able to open the links directly from the note (see discussion here and here).
