Descrambling PDF to Text conversion (long)

N_Cook

May 12, 2010, 10:29:51 AM
Just for future reference, is there a general principle that can be applied
to descrambling PDF-to-text output?
It's not relevant now, as I got the thing repaired, but in the process I was
looking for the pinout of the Sony CXD3058AR and found only this reference.
From the example I had in front of me,
Vss was pins 17, 31, 37, 46, 78, 79 (easy to establish),
112 pins in total.

I'm assuming a pinout of the IC is buried in this gibberish,
presumably with something else mixed in, as it sat horizontally alongside the
CXD one (part only):
http://www.scribd.com/doc/18318751/Sony-Hcdne3compact-Disc-Deck-Receiv


. IC Block Diagrams - BD Board - IC101 CXD3058AR
IOVSS0 XTACN WDCK SYSM WFCK LMUT
85

SCOR

EXCK SBSO SQCK SQSO

COUT

SENS

CLOK XLAT

DATA

XPCK

XUGF

XRST

C2PO

ATCK SCLK

112 111 110 109

108 107 106 105 104 103 102 101

100 99

98 97 96 95

94 93

92 91 90 89

88

87

86

Approx. 400 mVp-p

150 Vp-p

3 Vp-p
CPU INTERFACE

SERVO INTERFACE

13 µs
2 IC101 ws (FEI) (CD Play mode) qs Q344 (Collector) (REC mode)

30.5 µs
ws IC801 qh (CF2)

XTSL

C4M

VDD

VDD

GFS

VSS

84 RMUT 83 IOVDD0

82 AVDD2 D/A COVERTER LPF 81 AOUT2 80 VREFR 79 AVSS2 78 AVSS1 LPF 77 VREFL
76 AOUT1 75 AVDD1 74 XVDD CLOCK GENERATOR PWM GENERATOR DIGITAL CLV SERVO
AUTO SEQUENCER 73 XTAI 72 XTAO 71 XVSS

Approx. 200 mVp-p

15 Vp-p

2.9 Vp-p

MIRR 1 DFCT 2 FOK 3 VSS 4 LOCK 5 MDP 6 SSTP 7 IOVSS1 8

MIRR DFCT FOK

13 µs
3 IC101 ek (RFACO) (CD Play mode)

100 ns

SFDR SRDR TFDR TRDR FFDR FRDR

9 10 11 12 13 14

70 IOVSS2 69 TES1 68 TEST DIGITAL OUT 67 DOUT 66 IOVDD2

IOVDD1 15 AVDD0 16 AVSS0 17 SERVO DSP

0.6 Vp-p
A/D CONVERTER

65 EMPHI 64 EMPH

E F TEI TEO

18 19 20 21

63 VDD TE D/A INTERFACE SELECTOR ERROR CORRECTOR VC FE 32k RAM 62 BCK 61
PCMD 60 VSS 59 LRCK

4 IC101 us (XTAO) (CD Play mode)

FEI 22 FEO 23 VC 24

EFM DEMODULATOR

3.4 Vp-p

A B C D

25 26 27 28

SUM

SUB CODE PROCESSOR

59 ns

ASYNMMETRY CORRECTOR ATT EQ AMP

DIGITAL PLL DC/DC CONVERTER 58 LRCKI 57 PCMDI

APC

29 30 31 32

33 34

35

36 37 38

39 40 41 42 43 44 45 46 47 48 49 50

51

52 53 54 55

56

RFDCO

RFACO

DDVROUT DDVRSEN

EQ_IN

PDSENS

AC_SUM

AVDD4

AVSS4

RFACI

AVDD3

BIAS ASYI ASYO

VPCO VCTL

AVSS3

CLTV FILO FILI PCO

RFC

LD

AVDD5

AVSS5

DDCR

PD

28

28

BCKI

28
PREVCC

Robert Macy

May 12, 2010, 11:06:50 AM
On May 12, 7:29 am, "N_Cook" <dive...@tcp.co.uk> wrote:
> Just for future reference , is there a general principle concerning pdf to
> text that can be applied.
> Not relevant now as got the thing repaired but in the process I was looking
> for the pinout of Sony CXD 3058AR and only found this reference.
> From the example I had in front of me
> Vss was pins 17,31,37,46,78,79 , easy to establish
> 112 pins in total
>
> I'm assuming a pinout of the IC appeared buried in this gibberish ,
> presumably something else mixed in there as was horizontal to the CXD one
...snip....

Not sure, but how about using a free PDF-to-DOC/TXT tool?

http://www.Free-PDF-to-Word.com

N_Cook

May 12, 2010, 11:58:14 AM
Robert Macy <ma...@california.com> wrote in message
news:c60df179-31f6-411e...@g1g2000pro.googlegroups.com...

http://www.Free-PDF-to-Word.com

Missing the point: this is the case where only the text version is out there,
derived from a PDF that is not available.

Trying to reverse engineer one, I did a PDF-to-text conversion myself.
The source was 80-pin chip data in a PDF, with pins 1 to 24 running left to
right along the bottom and vertical text read from the right, then continuing
anticlockwise. The horizontal text on the left and right edges comes out as
though scanned from right to left, i.e. swapped over.
Also, the grey-tone blocking of Vcc and Gnd made them get lumped together
where they appear in the vertical script.


hr(bob) hofmann@att.net

May 12, 2010, 8:34:00 PM
On May 12, 10:58 am, "N_Cook" <dive...@tcp.co.uk> wrote:
> Robert Macy <m...@california.com> wrote in message

Did you try converting the text file to .DOC first? I have done that
a few times with some success.

Colin Horsley

May 12, 2010, 8:45:20 PM
"N_Cook" <div...@tcp.co.uk> wrote in message
news:hsedss$521$1...@news.eternal-september.org...

Just for future reference , is there a general principle concerning pdf to
text that can be applied.
Not relevant now as got the thing repaired but in the process I was looking
for the pinout of Sony CXD 3058AR and only found this reference.
From the example I had in front of me
Vss was pins 17,31,37,46,78,79 , easy to establish
112 pins in total

I'm assuming a pinout of the IC appeared buried in this gibberish ,
presumably something else mixed in there as was horizontal to the CXD one
(part
only)
http://www.scribd.com/doc/18318751/Sony-Hcdne3compact-Disc-Deck-Receiv

_________________

Why do you need to convert it? You can download and read the whole pdf file
from that link.
The IC pinouts are quite readable.

Colin @ CATronics


N_Cook

May 13, 2010, 3:36:35 AM
Colin Horsley <horsle...@westnet.com.au> wrote in message
news:tZKdnfwXkq4o1nbW...@westnet.com.au...

Not available to me without registering for more unsolicited junk; I had to
grab the Google-cached version. I tried proxies etc. but no admittance.

Just in case anyone has cottoned on to the intention of the thread: the
following is for an 80-pin device, originally a good PDF graphic with
24/16/24/16 pins per side, starting with pin 1 at the lower left corner and
running anticlockwise. The pinning is horizontal with vertical text, Gnd text
is white in black blocks, and Vcc is in grey blocks. This is it as a straight
listing:
1 DISCON#
2 VCC
3 GND
4 CLK24
5 GND
6 GND
7 A0
8 A1
9 A2
10 A3
11 A4
12 A5
13 GND
14 GND
15 A6
16 A7
17 GND
18 AGND
19 XIN
20 XOUT
21 AVCC
22 VCC
23 GND
24 EA
25 RESET
26 A8
27 A9
28 A10
29 A11
30 PC0/RxD0
31 PC1/TxD0
32 PC2/INT0#
33 PC3/INT1#
34 A12
35 A13
36 A14
37 A15
38 PC4/T0
39 PC5/T1
40 PC6/WR#
41 PB7/T2out
42 VCC
43 GND
44 PB0/T2
45 PB1/T2EX
46 PB2/RxD1
47 PB3/TxD1
48 D0
49 D1
50 D2
51 D3
52 PB4/INT4
53 PB5/INT5#
54 PB6/INT6
55 PC7/RD#
56 GND
57 D4
58 D5
59 D6
60 D7
61 BKPT
62 VCC
63 GND
64 SDA
65 SCL
66 WAKEUP#
67 NC
68 PA0/T0out
69 PA1/T1out
70 PA2/OE#
71 PA3/CS#
72 GND
73 PA4/FWR#
74 PA5/FRD#
75 PA6/RXD0out
76 PA7/RXD1out
77 USBD-
78 GND
79 USBD+
80 PSEN#


and this is the mangled version via Foxit text capture:

80 PQFP
14x20mm
label in middle

SDA BKPT PB2/RxD1 PB1/T2EX PC7/RD#
GND VCC D7 D6 D5 D4 GND PB7/T2out PB6/INT6 PB5/INT5# PB4/INT4 D3 D2 D1 D0
PB3/TxD1 PB0/T2 GND VCC
60 59 58 57 56 55 54 53 52 51 50 49 48 46 45 44 43 42 4164 63 62 61
47
PC6/WR#SCL
4065
PC5/T1WAKEUP#
66
39
PC4/T0
NC
3867
PA0/T0out
A15
3768
PA1/T1out
A14
3669
PA2/OE#
A13
3570
PA3/CS#
A12
3471
80 PQFP
GND
PC3/INT1#
3372
PA4/FWR#
PC2/INT0#
3273
14x20mm
PA5/FRD#
PC1/TxD0
3174
PA6/RXD0out
PC0/RxD0
3075
PA7/RXD1out
A11
2976
A10
USBD-
2877
GND
A9
2778
USBD+
A8
2679
PSEN#
RESET
2580
123456789 11 15 23
10 12 13 14 16 17 19 20 21 22 24
18
T
A6 A7A0 A1 A2 A3 A4 A5
EA
XIN
VCCVCC
GND GND GNDGND GND GND GND
AVCC
XOU
AGND
CLK24
DISCON#

I could not find a correlation between the mangled order and word length, or
word start or end position.
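
One regularity does show up in the Foxit dump: the four-digit tokens such as 4065 and 3867 look like a right-edge pin number fused to the left-edge pin number from the same row (the labels fuse the same way, e.g. PC6/WR#SCL for pins 40 and 65). On a 24/16/24/16 package numbered anticlockwise from the lower left, pins facing each other across the left and right edges add up to 105 (40 + 65 at the top, 25 + 80 at the bottom). The following is only a sketch of how that check could be automated; the function name `split_fused` is made up for illustration, and it assumes both halves are two-digit numbers.

```python
from typing import List, Optional, Tuple

# Four-digit tokens copied from the Foxit dump above.
FUSED: List[str] = [
    "4065", "3867", "3768", "3669", "3570", "3471", "3372", "3273",
    "3174", "3075", "2976", "2877", "2778", "2679", "2580",
]

# Right-edge pin + left-edge pin on the same row (e.g. 40 + 65).
ROW_SUM = 105


def split_fused(token: str) -> Optional[Tuple[int, int]]:
    """Split 'RRLL' into (right_pin, left_pin), or None if implausible."""
    if len(token) != 4 or not token.isdigit():
        return None
    right, left = int(token[:2]), int(token[2:])
    # On this package the right edge carries pins 25-40, the left 65-80.
    if 25 <= right <= 40 and 65 <= left <= 80 and right + left == ROW_SUM:
        return right, left
    return None


pairs = [split_fused(t) for t in FUSED]
print(pairs)
```

A token such as 4164 from the top-edge number row is rejected by the same test, since 41 and 64 are both top-edge pins. This only untangles the two side edges; it does not help with the vertical text on the top and bottom edges.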


Scott

May 13, 2010, 10:30:56 AM

"N_Cook" <div...@tcp.co.uk> wrote in message
news:hsga2c$hmk$1...@news.eternal-september.org...
Not that it will help, but the (non)order is likely the order in which it
was composed, using individual blocks of text, and it reflects the order of
those layers within the PDF. The PDF contains positioning information for
vector objects (like blocks of text), which is stripped out when it is
converted to text only.
Scott
Dunedin, FL
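
If the positioning information Scott mentions can be kept alongside each block of text, the reading order can in principle be rebuilt by sorting on position rather than trusting the stored order. The sketch below assumes the fragments have already been obtained somehow as hypothetical `(x, y, text)` tuples with y increasing down the page; getting them out of a real PDF needs a PDF library and is not shown. The name `reading_order` and the tolerance `row_tol` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical fragment type: (x, y, text), with y increasing down the page.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], row_tol: float = 2.0) -> List[str]:
    """Group fragments into rows by y, then sort each row left to right."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1]):
        # Same row if close enough in y to the first fragment of the last row.
        if rows and abs(frag[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [
        " ".join(text for _, _, text in sorted(row, key=lambda f: f[0]))
        for row in rows
    ]


# Fragments in "composition" order, as a plain text dump would see them.
scrambled: List[Fragment] = [
    (60.0, 10.0, "VCC"),
    (10.0, 20.0, "3"),
    (10.0, 10.0, "2"),
    (60.0, 20.0, "GND"),
]
lines = reading_order(scrambled)
print(lines)  # ['2 VCC', '3 GND']
```

This handles only horizontal text; rotated text like the vertical pin labels would need the rotation taken into account before sorting.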


N_Cook

May 13, 2010, 10:26:53 AM
Scott <nos...@vool.con> wrote in message
news:4bebff87$0$5008$9a6e...@unlimited.newshosting.com...

>
> "N_Cook" <div...@tcp.co.uk> wrote in message
> news:hsga2c$hmk$1...@news.eternal-september.org...
> > Colin Horsley <horsle...@westnet.com.au> wrote in message
> > news:tZKdnfwXkq4o1nbW...@westnet.com.au...
> >> "N_Cook" <div...@tcp.co.uk> wrote in message
> >> news:hsedss$521$1...@news.eternal-september.org...
> >> Just for future reference , is there a general principle concerning pdf
> >> to
> >> text that can be applied.

> >
> >


> Not that it will help, but the (non)order is likely the order in which it
> was composed, using individual blocks of text. And it reflects the order of
> those layers within the PDF. The PDF contains positioning information for
> vector objects (like blocks of text) which is stripped out when converted to
> text only.
> Scott
> Dunedin, FL
>
>

It shows it's probably a hiding to nothing. The same presumably applies to
highlighting just a single block of "vertical" text: it will not copy
across unscrambled. Both Foxit and Acrobat mangle that PDF-text-to-text
conversion; maybe somewhere there is a PDF reader with highlighting, or a
PDF-to-text app, that does not mangle it.

I had to descramble pins 1 to 24 and 41 to 64 in the above 1-to-80 listing
manually.

IanM

May 14, 2010, 4:53:35 AM
Bugmenot usually has a couple of active Scribd IDs .......

I *WON'T* run Flash here, but I can assure you PDFs can be downloaded
from Scribd (if you have a valid ID & password) without it. Obviously
the preview doesn't work! ;-)

--
Ian Malcolm. London, ENGLAND. (NEWSGROUP REPLY PREFERRED)
ianm[at]the[dash]malcolms[dot]freeserve[dot]co[dot]uk
[at]=@, [dash]=- & [dot]=. *Warning* HTML & >32K emails --> NUL:

vjp...@at.biostrategist.dot.dot.com

May 29, 2010, 12:19:57 AM
If the text is stored as ASCII you can use any text editor, like emacs,
to find the relevant text. I used to do this with WORD files in UNIX
as well, until they started using RAM techniques, splattering the text
all over the place. There is also a program on UNIX called antiword.
And there is a UNIX filtering technique, which I forgot, where you
remove any binary longer than two characters.


- = -
Vasos Panagiotopoulos, Columbia'81+, Reagan, Mozart, Pindus, BioStrategist
http://www.panix.com/~vjp2/vasos.htm
---{Nothing herein constitutes advice. Everything fully disclaimed.}---
[Homeland Security means private firearms not lazy obstructive guards]
[Urb sprawl confounds terror] [Phooey on GUI: Windows for subprime Bimbos]

vjp...@at.biostrategist.dot.dot.com

May 30, 2010, 12:20:24 AM
strings -n2 x.pdf

is the command to strip the binary out of a PDF file in Unix.
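
For anyone without `strings` to hand, roughly the same filtering can be sketched in Python: keep every run of at least two printable ASCII bytes and drop the rest. The function name `printable_runs` and the sample bytes are made up for illustration, and the result is not guaranteed to match `strings` byte for byte. Note also that page text in a PDF is often stored in compressed streams, in which case this kind of filter will not surface it at all.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 2) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes as strings."""
    # \x20-\x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


# Made-up sample: text fragments separated by non-printable bytes.
sample = b"\x00\x01BT\x02(VCC) Tj\xff\x00A\x00ET"
runs = printable_runs(sample)
print(runs)  # ['BT', '(VCC) Tj', 'ET']
```

To run it on a file, read the bytes with `open("x.pdf", "rb").read()` and pass them in. Like the plain-text conversions earlier in the thread, the output comes in stored order, so it does nothing to unscramble the layout.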
