OK, for those interested... First of all, the issue of wrongly encoded PDFs has been bugging me for a (very) long time. The core of the problem lies in the way text is encoded in PDFs. Whereas regular file formats store the character codes of the text, a PDF typically stores only glyph IDs, which refer directly to glyphs in the embedded font and have no inherent relationship to the underlying character codes. A well-formed PDF will therefore contain, for each embedded font, a mapping from glyphs to Unicode characters, known as the 'ToUnicode' map. The issue is that many (older) Hebrew fonts are not encoded using Unicode but use their own encoding scheme, and different font vendors have used several different encodings for Hebrew. The program creating the document must be aware of the encoding and can then, in theory, create a correct mapping. If a virtual printer is used, however, it has no way of knowing that the encoding is not Unicode, and it will create an incorrect 'ToUnicode' entry for the font.
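(To make this concrete, here is a rough sketch of how you could inspect this yourself with PDFBox, which I mention below. I'm assuming PDFBox 2.x here, and "input.pdf" is just a placeholder name. Note this only reports whether a 'ToUnicode' map exists at all; it can't tell you whether the map is *correct*, which is exactly the problem with virtual printers.)

    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.font.PDFont;

    public class ListToUnicode {
        public static void main(String[] args) throws IOException {
            // Open the PDF and report, for each font used on each page,
            // whether a /ToUnicode CMap is present at all.
            try (PDDocument doc = PDDocument.load(new File("input.pdf"))) {
                int pageNum = 0;
                for (PDPage page : doc.getPages()) {
                    pageNum++;
                    PDResources res = page.getResources();
                    if (res == null) {
                        continue; // page has no resource dictionary
                    }
                    for (COSName name : res.getFontNames()) {
                        PDFont font = res.getFont(name);
                        boolean hasMap = font.getCOSObject().containsKey(COSName.TO_UNICODE);
                        System.out.printf("page %d: %s -> /ToUnicode %s%n",
                                pageNum, font.getName(), hasMap ? "present" : "missing");
                    }
                }
            }
        }
    }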
The solution is to replace this mapping with a correct one. Originally I planned on doing it manually, but some intensive googling turned up a few programs that can help. First there is "axesPDF QuickFix" (https://www.axes4.com/axespdf-quickfix-features.html), which recently added support for correcting 'ToUnicode' maps. There is also Infix Pro (http://www.iceni.com/infix.htm), which has (also only recently) added that feature. Another helpful tool is the PDF Debugger that is part of the PDFBox project: it shows a graphical representation of the mapping, but it doesn't have the ability to edit it.
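Since PDFBox is already in the picture, the replacement can in principle also be scripted rather than done through one of those GUIs. Below is a rough sketch along those lines (again assuming PDFBox 2.x). The API calls are real, but the two glyph-to-character pairs in the CMap are made-up placeholders (glyph <0003> mapped to Alef, <0004> to Bet); a real fix would have to enumerate the actual glyph IDs of the problem font according to its vendor encoding.

    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.common.PDStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;

    public class FixToUnicode {
        // A minimal ToUnicode CMap. The two bfchar pairs are placeholders:
        // glyph <0003> -> U+05D0 (Alef), glyph <0004> -> U+05D1 (Bet).
        private static final String CMAP =
                "/CIDInit /ProcSet findresource begin\n" +
                "12 dict begin\n" +
                "begincmap\n" +
                "/CMapName /Custom-Hebrew-UCS def\n" +
                "/CMapType 2 def\n" +
                "1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n" +
                "2 beginbfchar\n" +
                "<0003> <05D0>\n" +
                "<0004> <05D1>\n" +
                "endbfchar\n" +
                "endcmap\n" +
                "CMapName currentdict /CMap defineresource pop\nend\nend\n";

        public static void main(String[] args) throws IOException {
            try (PDDocument doc = PDDocument.load(new File("input.pdf"))) {
                PDResources res = doc.getPage(0).getResources();
                for (COSName name : res.getFontNames()) {
                    PDFont font = res.getFont(name);
                    // Wrap the corrected CMap in a stream and overwrite
                    // the font's /ToUnicode entry with it.
                    PDStream toUnicode = new PDStream(doc,
                            new ByteArrayInputStream(CMAP.getBytes(StandardCharsets.US_ASCII)));
                    font.getCOSObject().setItem(COSName.TO_UNICODE, toUnicode);
                }
                doc.save("fixed.pdf");
            }
        }
    }

Note that this sketch blindly overwrites the map for every font on the first page; in practice you would match only the specific problem font by name.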
Do you mean when viewing locally or from the web? If you are viewing locally, the file format is already optimized for that. For viewing from the web, the PDF must be linearized (AKA "fast web view"). This can be done using the free Adobe Reader: set Edit -> Preferences -> Documents -> "Save As optimizes for Fast Web View" to true, then use Save As.
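As an aside, if a command-line route is acceptable, the open-source qpdf tool can also linearize a file (the file names here are just examples):

    qpdf --linearize input.pdf output.pdf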