Overview handles more files now

Adam Hooper

Apr 25, 2018, 6:27:58 PM
to overview-users
Good news: Overview handles more file types.

For instance: Overview extracts zipfiles, treating them like folders.
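For the curious, here's a rough Python sketch of the "zipfiles as folders" idea, using only the standard library's zipfile module. It is not Overview's converter code (that lives in our GitHub repositories); the walk_zip name and the "outer.zip/folder/file" title format are assumptions modeled on the example titles in this post. Nested zipfiles are opened recursively, so their contents get longer, folder-like titles.

```python
import io
import zipfile
from typing import Iterator, Tuple


def walk_zip(data: bytes, prefix: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (title, contents) for every file in a zip archive.

    Hypothetical helper: directory entries are skipped, and nested .zip
    members are opened recursively so they read like sub-folders.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            title = f"{prefix}/{info.filename}"
            contents = archive.read(info)
            if info.filename.lower().endswith(".zip"):
                # Treat a zip inside a zip as another folder level.
                yield from walk_zip(contents, title)
            else:
                yield title, contents


# Build a tiny in-memory example: outer.zip holds a folder and a nested zip.
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w") as inner:
    inner.writestr("report.txt", "quarterly numbers")

outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as outer:
    outer.writestr("Inbox/notes.txt", "hello")
    outer.writestr("archive.zip", inner_buf.getvalue())

for title, contents in walk_zip(outer_buf.getvalue(), "outer.zip"):
    print(title, len(contents))
# outer.zip/Inbox/notes.txt 5
# outer.zip/archive.zip/report.txt 17
```

Each (title, contents) pair would then go to whichever converter handles that file type.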

Overview also handles JPEG and PNG images -- using OCR, if that's what you select.

It handles emails in "message/rfc822" format. (".eml" is the usual file extension.) Each attachment becomes a separate document. We're missing key features, though -- read on for workarounds.
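To illustrate "each attachment becomes a separate document", here's a minimal sketch using Python's standard email package. Again, this is an assumption-laden illustration, not Overview's email converter: the Document class, the split_eml name, and the "message.eml/attachment-name" titles are hypothetical. Every document it produces carries the parent email's Message-ID, which is what makes searching by "Message-ID" useful.

```python
from dataclasses import dataclass
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import List


@dataclass
class Document:
    title: str
    message_id: str
    content: bytes


def split_eml(raw: bytes, title: str) -> List[Document]:
    """Split one RFC 822 message into a body document plus one per attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    message_id = str(msg.get("Message-ID", "")).strip()
    # Prefer the plain-text body; fall back to HTML if that's all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    docs = [Document(title, message_id, text.encode("utf-8"))]
    # Each attachment gets its own document, titled like a file in a folder.
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed-attachment"
        payload = part.get_payload(decode=True) or b""
        docs.append(Document(f"{title}/{name}", message_id, payload))
    return docs


# Example: one email with one attachment.
m = EmailMessage()
m["Subject"] = "I'll be late"
m["Message-ID"] = "<0001@example.com>"
m.set_content("See attached.")
m.add_attachment(b"fake docx bytes", maintype="application",
                 subtype="octet-stream", filename="some-attachment.docx")

docs = split_eml(m.as_bytes(), "sample.pst/Inbox/0001.eml")
for doc in docs:
    print(doc.title, doc.message_id)
```

The attachment's bytes would then be handed to the matching converter (for ".docx", the office-document one), which is how a document inside an email inside a PST can end up split or OCRed.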

Plus, experimentally, we extract Outlook PST files. Journalists who acquire PST files are heroes. Before today, that heroism led to sadness as they wondered how to probe their prize. Now it's easy. Hurrah! We're missing some features, though -- we'll get to that later.

Overview uses its existing "title" field to show where each document came from, which helps you sift through them. For instance, a zipped PST might produce documents with titles like "compressed.zip/sample.pst/Inbox/0001.eml/some-attachment.docx".

We also fixed longstanding bugs Overview had with HTML and plaintext files. Overview no longer blanks out the first page of HTML files. Text files render much more quickly; they're rendered in monospace; and Overview no longer wraps long lines during import.

The "One document per page" and "OCR" features work as you'd expect. For instance, if you choose "One document per page", then Overview will split a ".docx" attachment within an email within a PST.

This is live in overview-local and https://www.overviewdocs.com. If you're using overview-local, run "./stop && ./update && ./start" to enable the new converters.

Want to hack Overview? Each converter is its own project. We'll happily help you build a new one -- in any programming language -- or alter the existing ones. See our GitHub repositories for code.

We've released early, so there are rough edges. Here are some annoyances and their workarounds:
  • During import of a zipfile containing 1,000 sub-files, Overview will report "0 files imported" throughout. That's because Overview counts the zipfile as a single file (1) and ignores the files within it (1,000). Workaround: trust the progress bar, not the text.
  • Overview hides email's "To", "From" and "Date" by default. The workaround:
    1. Open a document
    2. Open "Fields" underneath it
    3. Click "Organize Fields"
    4. Add "To", "From", "Date", "Reply-To", "Cc", "Bcc", and "Subject" fields -- capitalized exactly as I've capitalized them here.
      • (Don't get excited: "Bcc" is empty in all received emails. But it should appear in "Sent Items" folders.)
  • HTML emails won't include embedded images: embedded images will be separate documents, like attachments.
  • When Overview presents an email, it doesn't link to its attachments. There are two workarounds:
    • Search by document title. If an email has the title "sample.pst/Inbox/0001.eml", its attachments all have titles like "sample.pst/Inbox/0001.eml/some-attachment.docx".
    • Add the "Message-ID" field and search by it. An email and its attachments have the same "Message-ID".
  • Overview doesn't handle email "threads". Workaround: search by "Subject". If the subject is "I'll be late," search 'Subject:"I'll be late"'. This will find all attachments in the thread, too. It works for most email threads.
    • Alternatively, enable the "References", "Message-ID" and "In-Reply-To" fields and use them in searches. Wielding these fields takes expertise, but it's more accurate than "Subject".
  • Overview's scheduling isn't fair: a big zipfile or PST can slow down other users' smaller imports. Don't feel guilty: that's our fault, and we can fix it.
  • overview-local will only convert one file of each type at a time.
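If you export those fields and want to see how "References", "Message-ID" and "In-Reply-To" pin down a thread, here's a simplified Python sketch. It is not an Overview feature and not the full JWZ threading algorithm; it just links every email to the IDs it mentions using a union-find structure, so replies land in the same group even when the original message is missing. The thread_groups name and the dict-per-email input shape are assumptions for illustration.

```python
import re
from typing import Dict, List

# A Message-ID looks like "<something@host>"; References may hold several.
MSGID_RE = re.compile(r"<[^<>]+>")


def thread_groups(emails: List[Dict[str, str]]) -> List[List[str]]:
    """Group Message-IDs into threads via In-Reply-To and References."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for email in emails:
        own = email["Message-ID"]
        find(own)
        linked = email.get("In-Reply-To", "") + " " + email.get("References", "")
        for ref in MSGID_RE.findall(linked):
            union(own, ref)

    # Collect the emails we actually have, keyed by their thread's root.
    groups: Dict[str, List[str]] = {}
    for email in emails:
        groups.setdefault(find(email["Message-ID"]), []).append(email["Message-ID"])
    return list(groups.values())


emails = [
    {"Message-ID": "<a@x>", "Subject": "I'll be late"},
    {"Message-ID": "<b@x>", "In-Reply-To": "<a@x>", "References": "<a@x>"},
    {"Message-ID": "<c@x>", "References": "<a@x> <b@x>"},
    {"Message-ID": "<d@x>", "Subject": "Lunch?"},
]
print(thread_groups(emails))
# [['<a@x>', '<b@x>', '<c@x>'], ['<d@x>']]
```

Unlike a "Subject" search, this won't merge two unrelated emails that happen to share a subject line, which is why those fields are more accurate.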

This change affects everybody, and all these new features may bring new bugs.

If your import is stuck at 0% for an hour or more, please email in...@overviewdocs.com so we can investigate.

If Overview doesn't convert a file properly, other users are probably suffering as well. Please email the file to in...@overviewdocs.com along with a brief description of the problem, so we can fix it for everybody. Better yet, please make your file public and add an issue on GitHub.

If Overview is too slow for your task, please email in...@overviewdocs.com and/or add an issue on GitHub. Describe what you're trying to do and what you expect to happen. Even if we can't fix your problem right away, you'll have added your voice: we'll keep you in mind in the future.

Enjoy life,
Adam