formatting strings with italics/bold/etc


Nathan Van Gheem

Dec 7, 2010, 4:08:09 PM
to python-excel
Hello All,

I'm in need of being able to render inline italics and bold
formatting.

The readme at http://www.lexicon.net/sjmachin/xlrd.html mentions it
not being available for xlrd 0.6.1 in the "Formatting features not
included in xlrd version 0.6.1" section. Is this the case for 0.7.1
also? I assume it is because I can't seem to figure out how to do it.

If this is indeed not possible right now, how much would it take to
add this functionality? I would be willing to contribute if it seemed
like a worthy undertaking.


Thanks,
Nathan

John Machin

Dec 7, 2010, 5:25:42 PM
to python...@googlegroups.com
On Wed, December 8, 2010 8:08 am, Nathan Van Gheem wrote:
> Hello All,

Hi Nathan,

>
> I'm in need of being able to render inline italics and bold
> formatting.
>
> The readme at http://www.lexicon.net/sjmachin/xlrd.html mentions it
> not being available for xlrd 0.6.1 in the "Formatting features not
> included in xlrd version 0.6.1" section. Is this the case for 0.7.1
> also? I assume it is because I can't seem to figure out how to do it.

It is not included. You are the first to ask, IIRC.

> If this is indeed not possible right now, how much would it take to
> add this functionality? I would be willing to contribute if it seemed
> like a worthy undertaking.

Let's start by roughing out the API:

Sheet object: would need a dictionary rich_text_sstx_map which would, for
each cell that had rich text, map (rowx, colx) to SST (shared string
table) index.

Book object: would need a dictionary rich_text_runlist_map which would,
for each string in the SST that had rich text, map SST index to the
corresponding list of formatting runs.

What is a "list of formatting runs"? Here's an example: Suppose you had a
string "plainbolditalic" formatted as the contents suggest.

plainbolditalic
012345678901234

The list of formatting runs would be:
[
    (5,                       # offset where bold formatting starts
     a_non_negative_integer), # an index into Book.font_list
    (9,                       # offset where italic formatting starts
     a_non_negative_integer), # an index into Book.font_list
]

Note that a Font object has many attributes besides bold and italic ...
what you do with these is up to you.

So for a given cell, you'd need something like this:

sstx = sheet.rich_text_sstx_map.get((rowx, colx))
if sstx is not None:
    run_list = book.rich_text_runlist_map[sstx]
    for offset, fontx in run_list:
        font = book.font_list[fontx]
        render()

Note that if (as in the example) the first run doesn't start with offset
0, you would need to fill in the gap using the "normal" font for the cell:

xf_index = sheet.cell_xf_index(rowx, colx)
xf = book.xf_list[xf_index]
run_list[0:0] = [(0, xf.font_index)]  # insert one tuple, not two bare integers

You might find it easier to work with the run_list if you appended
(len(the_text), None) to it ...
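
To make the fill-in and the sentinel concrete, here is a minimal sketch in plain Python 3 (no xlrd needed). The name split_runs and its arguments are hypothetical and not part of any xlrd API; it only assumes a run list of (offset, font_index) tuples as described above.

```python
def split_runs(the_text, run_list, cell_font_index):
    """Split the_text into (segment, font_index) pairs.

    run_list is a list of (offset, font_index) tuples; cell_font_index
    is the "normal" font of the cell, used to fill a gap at offset 0.
    """
    runs = list(run_list)  # work on a copy, leave the caller's list alone
    if not runs or runs[0][0] != 0:
        runs.insert(0, (0, cell_font_index))  # fill the leading gap
    runs.append((len(the_text), None))        # sentinel marking the end
    segments = []
    for (start, fontx), (end, _unused) in zip(runs, runs[1:]):
        if start < end:  # skip empty or out-of-range runs
            segments.append((the_text[start:end], fontx))
    return segments


# Font indexes 0, 1 and 2 are made up for the example.
print(split_runs("plainbolditalic", [(5, 1), (9, 2)], 0))
```

With the sentinel in place, each run's end is simply the next run's start, so no special case is needed for the last run.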

Comments, please.

Cheers,
John

Nathan Van Gheem

Dec 7, 2010, 5:43:59 PM
to python...@googlegroups.com
Hi John,

Great! Thanks for the head start.

Is the xlrd source anywhere accessible so I could maybe do a branch for this?

I think what you're saying makes sense. That was kind of how I assumed
it would end up working since that is congruent with the way the rest
of the package works with fonts, formats, etc.

What was odd to me was when I was stepping through the code, I
couldn't seem to find the place where those styles were getting
stripped out. But I didn't dig too deep into it all. Would this just
be a different record code that is being used for inline fonts?

Should we continue discussing this on the list from here on out or
should I email you personally if I have further questions?


Thanks,
Nathan


Chris Withers

Dec 7, 2010, 6:58:23 PM
to python...@googlegroups.com, Nathan Van Gheem
On 07/12/2010 22:43, Nathan Van Gheem wrote:
> Is the xlrd source anywhere accessible so I could maybe do a branch for this?

https://secure.simplistix.co.uk/svn/xlrd/trunk/

...but no, I'm afraid you're not getting commit rights at this stage.
Please work up a patch against the trunk.

> What was odd to me was when I was stepping through the code, I
> couldn't seem to find the place where those styles were getting
> stripped out.

It's not a case of stripping anything out, it's a case of writing new
code to parse record types that aren't currently parsed.

> Should we continue discussing this on the list from here on out or
> should I email you personally if I have further questions?

On list please!

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

John Machin

Dec 7, 2010, 7:42:49 PM
to python...@googlegroups.com
On Wed, December 8, 2010 9:43 am, Nathan Van Gheem wrote:
> Hi John,
>
> Great! Thanks for the head start.

It's not a head start. Discussion of the API and agreement of what the
product is to be are essential prerequisites.

Other requirements:

1. The additional data structures are to be built only if formatting_info
is true.

2. The code must run on Python 2.1 to 2.7 inclusive.

3. No tabs, no camelCase, PEP8 rules about spaces next to punctuation, ...

John Machin

Dec 7, 2010, 8:12:41 PM
to python...@googlegroups.com
On Wed, December 8, 2010 9:43 am, Nathan Van Gheem wrote:
> Hi John,

Hi. Sorry about the previous message -- I hit send by accident. Ignore it,
its content is repeated here.

>
> Great! Thanks for the head start.
>

It's not a head start. Discussion of the API and agreement of what the
product is to be are essential prerequisites.

Other requirements:

1. The additional data structures are to be built only if formatting_info
is true.

2. The code must run on Python 2.1 to 2.7 inclusive.

3. No tabs, no camelCase, PEP8 rules about spaces next to punctuation, ...

> Is the xlrd source anywhere accessible so I could maybe do a branch for
> this?

See Chris's answer :-)


> What was odd to me was when I was stepping through the code, I
> couldn't seem to find the place where those styles were getting
> stripped out. But I didn't dig too deep into it all. Would this just
> be a different record code that is being used for inline fonts?

No. Where it's done is in __init__.py in the function unpack_SST_table,
which is called by the Book method handle_sst ...

Please note that this is critical code, and is complicated by the need to
handle the SST being split over an SST record and 0 or more CONTINUE
records (a legacy of the days when 640 KB of memory was all anyone would
ever need) and the arcane rules about what data can be split where. Care
in coding and lots of testing on both real-world files and try-to-break-it
files are indicated.

You will note that there is a mild kludge with the variable rtsz in that
it holds the size of the rich text runs being skipped AND the size of the
phonetic (Furigana etc) stuff being skipped. You will need to unwind the
kludge by introducing a separate phosz variable.
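
For orientation, here is a rough Python 3 sketch of how a single BIFF8 string with rich text and phonetic data is laid out, with the two sizes kept in separate variables. This is not xlrd's unpack_SST_table: it assumes the whole string sits in one buffer and ignores CONTINUE splitting entirely, which is the hard part. The function name unpack_rich_string is hypothetical, and the flag bits (0x01 for 16-bit characters, 0x04 for phonetic data, 0x08 for rich text) reflect my reading of the BIFF8 string format.

```python
import struct


def unpack_rich_string(data, pos=0):
    """Return (text, runlist, new_pos) for one unsplit BIFF8 string."""
    nchars, flags = struct.unpack_from("<HB", data, pos)
    pos += 3
    rtcount = 0  # number of rich text runs
    phosz = 0    # size in bytes of the phonetic (Furigana etc) block
    if flags & 0x08:
        (rtcount,) = struct.unpack_from("<H", data, pos)
        pos += 2
    if flags & 0x04:
        (phosz,) = struct.unpack_from("<I", data, pos)
        pos += 4
    if flags & 0x01:
        nbytes = 2 * nchars
        text = data[pos:pos + nbytes].decode("utf_16_le")
    else:
        nbytes = nchars
        text = data[pos:pos + nbytes].decode("latin_1")
    pos += nbytes
    runlist = []
    for _ in range(rtcount):
        # each run is (character offset, font index), two 16-bit integers
        runlist.append(struct.unpack_from("<HH", data, pos))
        pos += 4
    pos += phosz  # phonetic data is still skipped, but counted separately
    return text, runlist, pos


# Hand-built sample: 15 compressed characters, rich text flag set, 2 runs.
sample = (struct.pack("<HBH", 15, 0x08, 2)
          + b"plainbolditalic"
          + struct.pack("<HHHH", 5, 1, 9, 2))
print(unpack_rich_string(sample))
```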

> Should we continue discussing this on the list from here on out or
> should I email you personally if I have further questions?

On-list, please.

Cheers,
John

Nathan Van Gheem

Dec 10, 2010, 2:43:14 PM
to python...@googlegroups.com
Hello,

Alright, I've got some time to put into this. The hardest part is just wrapping your head about the weird way the data is stored.

I have it working now basically how you described earlier. I figured we'd get something to work with before we spent too much time finalizing the api.

I've attached an svn diff along with a test file that is used with the tests I created for it.

Basically it looks like this, assuming you already have a Sheet and Book object:

>>> for offset, fontindex in sheet.rich_text_sstx_map.get((8,0)):
...     font = book.font_list[fontindex]

So right now the user is completely responsible for rendering it differently which is fine for me. 

I added a rich_text_runlist_map attribute to the Book class and a rich_text_sstx_map attribute to the Sheet class which does all of it.


What would be useful though is to have an html renderer which could look like this:

>>> from xlrd.renderers import html_cell_renderer
>>> html_cell_renderer(sheet, cell)

Which could translate: "some italic text" into "some <i>italic</i> text"

If people thought this sort of thing was useful we could add other renderers even(ReST, etc). If you do not accept that as part of the package, it'd probably just be something I'd do myself anyways.


Thanks,
Nathan



inline-styles.diff
inlinestyles.xls

Nathan Van Gheem

Dec 16, 2010, 4:00:17 PM
to python...@googlegroups.com
Is there any reason that there has been no response with this?

John Machin

Dec 16, 2010, 5:15:27 PM
to python...@googlegroups.com
On Fri, December 17, 2010 8:00 am, Nathan Van Gheem wrote:
> Is there any reason that there has been no response with this?
>
>
> On Fri, Dec 10, 2010 at 1:43 PM, Nathan Van Gheem
> <vang...@gmail.com>wrote:
>

Yes. I've been thinking about it.

Cheers,
John

John Machin

Dec 17, 2010, 8:31:19 PM
to python-excel


On Dec 11, 6:43 am, Nathan Van Gheem <vangh...@gmail.com> wrote:
> Hello,
>
> Alright, I've got some time to put into this. The hardest part is just
> wrapping your head about the weird way the data is stored.
>
> I have it working now basically how you described earlier. I figured we'd
> get something to work with before we spent too much time finalizing the api.

The contents and semantics of the sheet-level data structure ARE the
API, hence the request for comments rather than code.

Treating your code as comments: I had proposed a book-level mapping
from sstx to runlist, and a sheet-level mapping from (rowx, colx) to
sstx. Your code has the sheet-level gadget mapping directly from
(rowx, colx) to runlist; *this is much better than my proposal*.
However this means that it needs to be renamed from rich_text_sstx_map
to rich_text_runlist_map.

>
> I've attached an svn diff along with a test file that is used with the tests
> I created for it.
>
> Basically it looks like this, assuming you already have a Sheet and Book
> object:
>
> >>> for offset, fontindex in sheet.rich_text_sstx_map.get((8,0)):
> ...     font = book.font_list[fontindex]

Let's look at what sheet.rich_text_runlist_map.get((rowx, colx)) will
return:

1. (rowx, colx) out of bounds: None
2. cell type is not XL_CELL_TEXT: None
3. cell type is XL_CELL_TEXT, as the result of a formula: None
   Note: formula results are not entered/enterable in the SST, and Excel
   appears not to preserve rich text during formula evaluations. Simple
   example: enter some rich text in cell A1, and =A1 in cell A2. Result:
   A2 displays plain text.
4. cell type is XL_CELL_TEXT, resulting from an SST entry with no rich
   text: an empty list
5. cell type is XL_CELL_TEXT, resulting from an SST entry with rich
   text: a runlist

Scenario 3 above presents a problem: users may assume that if the cell
type is XL_CELL_TEXT then there should be an entry in
Sheet.rich_text_runlist_map.

Scenario 4 presents a problem of memory waste ... my expectation is
that in a typical moderately large file, the percentage of text cells
that actually have associated rich text will be rather small. For all
the others, the cost of a tuple in the key and a (non-unique) empty
list in the value (36 + 36 bytes on a 32-bit Python) plus the space
occupied in the dict doesn't seem well spent.

I'm strongly suggesting that both the sheet-level mapping and the book-
level mapping are sparse i.e. there's an entry if and only if there's
a non-empty runlist associated with the key. This will of course be
documented for the sheet-level mapping; this has the side effect of
removing any concern about scenario 3. Note that in any case if we
wanted to fully populate the Book-level "mapping" it would be much
better as a list (the key is the SST index) just like
Book._sharedstrings.
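
A minimal sketch of what "sparse" means here, in plain Python 3 with made-up data. The helper build_sparse_runlist_map is hypothetical and only illustrates the rule "an entry if and only if there is a non-empty runlist"; it is not how xlrd builds the mapping.

```python
def build_sparse_runlist_map(cells):
    """cells: iterable of (rowx, colx, runlist_or_None) triples."""
    sparse = {}
    for rowx, colx, runlist in cells:
        if runlist:  # skips both None and empty lists
            sparse[(rowx, colx)] = runlist
    return sparse


cells = [
    (0, 0, None),              # e.g. a number cell or a formula result
    (1, 0, []),                # text cell whose SST entry has no rich text
    (8, 0, [(5, 1), (9, 2)]),  # text cell with rich text
]
runlist_map = build_sparse_runlist_map(cells)

# Callers use .get() and treat None as "no rich text in this cell".
print(runlist_map.get((8, 0)))
print(runlist_map.get((1, 0)))
```

Scenarios 1 to 4 then all collapse into the same answer (None), and only scenario 5 costs any memory.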

Other issues:

A. Needs in-code documentation of Sheet.rich_text_runlist_map. Should
point out that it's a sparse mapping. Explain runlist.

B. Book.rich_text_runlist_map (1) needs a _ at the start of the name
(2) could usefully be deleted along with Book._sharedstrings when that
is deleted in some circumstances ... look for "del
self._sharedstrings" in __init__.py

C. Patch has extraneous empty lines and trailing spaces in a few cases

D. Please don't invent new jargon like "inlineStyles" when it's of
dubious interpretation (they're inline font objects, not styles) and
there is existing jargon ("rich text") anyway.

E. Unit tests are fine as far as they go. We also need some code to do
a thrash test that has a high chance of detecting problems with
continue records. This will be aided by the newly-in-svn xlwt code for
writing rich text. I would also like to see a round-trip check of rich
text driven by a text file containing test specs in xlwt.easyxf
format, to be used to create an xls file with xlwt, which can then be
read back using xlrd ... this would need a converter from xlrd.Font
object to xlwt.easyxf text. Any volunteers to help with this?

>
> So right now the user is completely responsible for rendering it differently
> which is fine for me.
>
> I added a rich_text_runlist_map attribute to the Book class and
> a rich_text_sstx_map attribute to the Sheet class which does all of it.
>
> What would be useful though is to have an html renderer which could look
> like this:
>
> >>> from xlrd.renderers import html_cell_renderer
> >>> html_cell_renderer(sheet, cell)
>
> Which could translate: "some italic text" into "some <i>italic</i> text"

Sounds OK in principle. Of course if the whole cell was in italic
(i.e. not inline rich text), you'd need to generate "<i>some italic
text</i>"

> If people thought this sort of thing was useful we could add other renderers
> even(ReST, etc). If you do not accept that as part of the package, it'd
> probably just be something I'd do myself anyways.

Let's get this show on the road first :-)

Cheers,
John

Nathan Van Gheem

Dec 17, 2010, 9:04:55 PM
to python...@googlegroups.com
Hello,
 
> > Hello,
> >
> > Alright, I've got some time to put into this. The hardest part is just
> > wrapping your head about the weird way the data is stored.
> >
> > I have it working now basically how you described earlier. I figured we'd
> > get something to work with before we spent too much time finalizing the api.
>
> The contents and semantics of the sheet-level data structure ARE the
> API, hence the request for comments rather than code.

Understood. I guess it just came down to me not being comfortable enough with the library to feel like I could offer more to the discussion than what you already have and if you didn't end up liking it, it's easy to change... Also, I'm strapped on time so I kind of needed to get something together internally for now at least...

I appreciate the time and thought you've put into your comments.

I will address your points and post back to the list as soon as I can.


Thanks,
Nathan



Nathan Van Gheem

Dec 28, 2010, 11:52:35 AM
to python...@googlegroups.com
Hello Again,

Sorry for the wait... I've attached a diff again.

> However this means that it needs to be renamed from rich_text_sstx_map
> to rich_text_runlist_map.

Done.

> I'm strongly suggesting that both the sheet-level mapping and the book-
> level mapping are sparse

It now only adds to the dictionary if there is rich text. So any call to sheet.rich_text_runlist_map.get((row, col)) for a cell without rich text will return None and we're not wasting memory.

> A. Needs in-code documentation of Sheet.rich_text_runlist_map. Should
> point out that it's a sparse mapping. Explain runlist.

Done.

> B. Book.rich_text_runlist_map (1) needs a _ at the start of the name
> (2) could usefully be deleted along with Book._sharedstrings when that
> is deleted in some circumstances ... look for "del
> self._sharedstrings" in __init__.py

Done and Done.

> C. Patch has extraneous empty lines and trailing spaces in a few cases

Should be cleaned up now.

> D. Please don't invent new jargon like "inlineStyles" when it's of
> dubious interpretation (they're inline font objects, not styles) and
> there is existing jargon ("rich text") anyway.

Agreed. I have renamed all the tests referring to it that way also.

> E. Unit tests are fine as far as they go. We also need some code to do
> a thrash test that has a high chance of detecting problems with
> continue records. This will be aided by the newly-in-svn xlwt code for
> writing rich text. I would also like to see a round-trip check of rich
> text driven by a text file containing test specs in xlwt.easyxf
> format, to be used to create an xls file with xlwt, which can then be
> read back using xlrd ... this would need a converter from xlrd.Font
> object to xlwt.easyxf text. Any volunteers to help with this?

Do you have a set of xls files on the svn anywhere that are a good test? I have added a test for a cell that contains a very long string, but I'm just doing this with OpenOffice and I'd feel better about it if there were a standard set of test files to play against. If there isn't one, I could try and round one up, creating files in various Excel-compatible software with lots of nasty data for tests if you'd like.


Thanks for your help,
Nathan
xlrd.diff
richtext.xls

John Machin

Dec 28, 2010, 9:22:23 PM
to python-excel


On Dec 29, 3:52 am, Nathan Van Gheem <vangh...@gmail.com> wrote:
> Hello Again,
>
> Sorry for the wait... I've attached a diff again.

So I see. I'm sorry that I was not more explicit. In response to your
mentioning an HTML renderer, I said "Let's get this show on the road
first :-)", "this show" meaning the basic rich text functionality.
That was not intended to be an invitation to include code for
including your renderer into xlrd in the patch, and I find it
difficult to imagine how you could construe it otherwise.

Treating it as a request for code review:

1. Using (row, col) instead of (rowx, colx) or (row_index, col_index)
is severely deprecated.

2. Your to_html function has several problems:

+html_tags = {
### This should be a tuple of tuples; no dict functionality is required
+    'b' : lambda font: bool(font.bold or font.weight >= 700),
### using bool is (1) redundant (2) not consistent with underline (see below)
### (3) fails with Python 2.1
+    'i' : lambda font: font.italic,
### text constants should be unicode constants
### so that (1) decoding is done at compile time (2) I don't have to
### think about it when doing the Python 3.x port
+    'u' : lambda font: font.underlined or font.underline_type > 0,
+    'sup' : lambda font: font.escapement_type == 1,
+    'sub' : lambda font: font.escapement_type == 2
+}
+
+def to_html(string, font):
+    if not font:
+        return string
### don't shadow built_in module names
### what about a swift exit if "string" is empty? otherwise you get e.g. "<b></b>"
+    ### extraneous blank line (first of many)
+    for tag, func in html_tags.items(): ### if a dict were actually required, this should be iteritems
+        if func(font):
+            string = "<%(tag)s>%(string)s</%(tag)s>" % locals()
### I find that
### strg = u"<%s>%s</%s>" % (tag, strg, tag)
### is somewhat more readable
+
+    return string

3. Your html_renderer function has several problems:

(a) if not runlist or len(runlist) == 0:
        return value

At this stage, runlist should be either None or a list (or maybe a
tuple); if a list etc, it should not be empty (sparse, remember) but
in any case "not []" is treated as true; I can't imagine a case where
"or len(runlist) == 0" would be required. Secondly, the "value"
should be formatted according to the "cellfont"; there's no guarantee
that "cellfont" doesn't require any markup.

(b) if runlist[0][0] != 0:

The != 0 is not necessary. If this is true, the code mucks about
getting "font" which AFAICT is the same as the "cellfont" it already
has (and if it's not the same, it should be!).

(c) don't use "range(...)" for iteration; 2to3 changes that to
list(range(...)) which is an extra function lookup and call.

(d) Don't you need to escape characters like <, >, and & ??

(e) The double loop is rather baroque.

Attached file has suggested replacement function.

I have lots of nasty xls files. Many of them can't be put in a public
repository because of NDAs. Some of them were created not with Excel,
OOo Calc, Gnumeric or the apache.jakarta.poi.etc.etc thingy (the
authors of all of which know what they are doing most of the time) but
with some unknown wet-Sunday-afternoon works-on-my-machine kit. I
doubt whether any of them have any rich text in them. When I get your
patch integrated, I'll be writing a script to scan my whole pathology
museum checking for rich text.

My main concerns with rich text are that the changes do not affect
reading files with no rich text or reading files with rich text where
the richness is of no interest to the reader, and that there are no
problems caused by the CONTINUE records. Hence the mention of a
"thrash test". Ervin Hegedüs provided one for xlwt; it concentrated on
ensuring that current xlrd could read back the data correctly,
ignoring the rich text runlists. Now we need one that will do that and
check that the rich text runlists match what xlwt was supposed to
write. I shall do that myself.

>
> Thanks for your help,

Likewise.

What I'm going to do now/soon is to put the actual source code changes
(i.e. not the renderer stuff) in and do a few tests myself.

Cheers,
John

John Machin

Dec 28, 2010, 10:37:59 PM
to python-excel
On Dec 29, 1:22 pm, John Machin <sjmac...@lexicon.net> wrote:
> 3. Your html_renderer function has several problems:
>
>
> Attached file has suggested replacement function.

... better late than never :-) It's been uploaded to the group files
area.

Cheers,

John

Nathan Van Gheem

Dec 28, 2010, 11:08:22 PM
to python...@googlegroups.com
Hi Jon,

 
> So I see. I'm sorry that I was not more explicit. In response to your
> mentioning an HTML renderer, I said "Let's get this show on the road
> first :-)", "this show" meaning the basic rich text functionality.
> That was not intended to be an invitation to include code for
> including your renderer into xlrd in the patch, and I find it
> difficult to imagine how you could construe it otherwise.

Right. I had mentioned I would end up doing it anyway as I needed it for a client. I didn't misunderstand you--I just needed it done. You could have just ignored the file--I do my own internal releases anyways. I hope I didn't waste too much of your time with it.

> At this stage, runlist should be either None or a list (or maybe a
> tuple); if a list etc, it should not be empty (sparse, remember) but
> in any case "not []" is treated as true; I can't imagine a case where
> "or len(runlist) == 0" would be required. Secondly, the "value"
> should be formatted according to the "cellfont"; there's no guarantee
> that "cellfont" doesn't require any markup.

Just an artifact of when there were empty lists in...

Again, thanks for the help in putting this feature into action and I appreciate all the work you do for the library--it's a great help to many people.





John Machin

Dec 30, 2010, 7:23:21 PM
to python...@googlegroups.com
On Wed, December 29, 2010 3:08 pm, Nathan Van Gheem wrote:
> Hi Jon,
>
>
>
>> So I see. I'm sorry that I was not more explicit. In response to your
>> mentioning an HTML renderer, I said "Let's get this show on the road
>> first :-)", "this show" meaning the basic rich text functionality.
>> That was not intended to be an invitation to include code for
>> including your renderer into xlrd in the patch, and I find it
>> difficult to imagine how you could construe it otherwise.
>
> Right. I had mentioned I would end up doing it anyway as I needed it for a
> client. I didn't misunderstand you--I just needed it done. You could have
> just ignored the file--I do my own internal releases anyways. I hope I
> didn't waste too much of your time with it.

Hi Nathan, I do hope that pointing out a bug or two wasn't a waste of my
time :-)


>
>> At this stage, runlist should be either None or a list (or maybe a
>> tuple); if a list etc, it should not be empty (sparse, remember) but
>> in any case "not []" is treated as true; I can't imagine a case where
>> "or len(runlist) == 0" would be required. Secondly, the "value"
>> should be formatted according to the "cellfont"; there's no guarantee
>> that "cellfont" doesn't require any markup.
>
> Just an artifact of when there were empty lists in...

Sorry, I don't understand; an empty list doesn't make a difference; "or
len(runlist) == 0" is redundant in any case:

| >>> same = lambda rl: (not rl or len(rl) == 0) == (not rl)
| >>> map(same, [None, [], [(9, 99)]])
| [True, True, True]

I was however wrong about this: """Secondly, the "value"
should be formatted according to the "cellfont"; there's no guarantee
that "cellfont" doesn't require any markup.""" --- I ignored this
shapeshifter:

value = to_html(value, cellfont)

However there's still a bug lurking there; if the "cellfont" includes an
attribute requiring markup, value will become e.g. u"<i>foo</i>". Then, if
there is rich text and the runlist doesn't start at offset 0, it starts
operating on the marked-up value when it should work on the raw value:

val = value[:runlist[0][0]]
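
One way to sidestep that bug is to slice the raw text first and mark up each slice afterwards, escaping as you go. A minimal Python 3 sketch follows; it is not the replacement function uploaded to the group files area. FakeFont, TAG_TESTS, markup and render_cell are hypothetical names, and FakeFont is a stand-in for an xlrd Font object carrying only bold and italic attributes.

```python
from collections import namedtuple
from html import escape

# Stand-in for a Font object; only the attributes used here.
FakeFont = namedtuple("FakeFont", "bold italic")

# A tuple of (tag, test) pairs; no dict functionality is needed.
TAG_TESTS = (
    ("b", lambda font: font.bold),
    ("i", lambda font: font.italic),
)


def markup(raw_text, font):
    """Escape raw_text, then wrap it in the tags that font calls for."""
    if not raw_text:
        return ""  # swift exit, avoids output like "<b></b>"
    out = escape(raw_text)
    for tag, test in TAG_TESTS:
        if test(font):
            out = "<%s>%s</%s>" % (tag, out, tag)
    return out


def render_cell(raw_text, runlist, cell_fontx, font_list):
    """Render one cell; runlist may be None (sparse mapping)."""
    runs = list(runlist or [])
    if not runs or runs[0][0] != 0:
        runs.insert(0, (0, cell_fontx))  # leading text uses the cell font
    runs.append((len(raw_text), None))   # sentinel
    pieces = []
    for (start, fontx), (end, _unused) in zip(runs, runs[1:]):
        # always slice the raw value, never the marked-up one
        pieces.append(markup(raw_text[start:end], font_list[fontx]))
    return "".join(pieces)


fonts = [FakeFont(False, False), FakeFont(True, False), FakeFont(False, True)]
print(render_cell("some italic text", [(5, 2), (11, 0)], 0, fonts))
print(render_cell("R&D", None, 2, fonts))
```

Because offsets in a runlist refer to the raw string, escaping and tagging must both happen after slicing; otherwise "&amp;" or an opening tag shifts every later offset.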

> Again, thanks for the help in putting this feature into action and I
> appreciate all the work you do for the library--it's a great help to many
> people.

And many thanks to you too.

Progress report:

Changes to xlrd put into working copy. Passes elementary tests.

Efficiency improvement in LABELSST code (just about every text cell in an
Excel 97-2003 file goes through there).

Rounded out the functionality offering by adding support for rich text in
Excel 5.0/95 files (no SST; need to slurp rt info out of the end of
RSTRING records); Excel 2003 files saved as 5.0/95 deliver the same rt
results (limits: max offset 255, max font index 255).

By the way, your test file richtext.xls has a problem: cell A15 contains a
41590-character string. This is OK for OOo Calc and Gnumeric but the max
for Excel 2003 and 2007 is 32767. When the file is opened, Excel goes into
"recovery" mode which blows away ALL the strings from the SST, leaving
only the number cells. Worse: it doesn't even say what the problem was.
xlwt already (since a month ago) restricts strings to 32767 characters
(the limit was 65535 because the length is an unsigned 16-bit int). I'll add a
warning to xlrd.

I'm making good progress with a combination thrash test and round trip
test. My idea of a nasty file: 384 different fonts (Excel 2003 maxes out
at "about 400" fonts), 100 rows each with a 32767-character string each
with a 32767-element runlist. This is round-tripping OK with xlrd. Excel
is happy enough with this. If you select all cells and change the font
height, it blinks and flashes for say 30 seconds. That's relatively OK.
Trying to open a file with only 8191-byte strings, OOo Calc goes into
lala-land; after about 20 MINUTES of watching it grab extra memory very
slowly in 4 KB chunks, I killed it. Haven't tried Gnumeric yet. I'll do
some more mucking about (this is fun!) and publish the script. Note that a
32K-char unicode string takes 64 KB and then you need to add 4 x 32 KB for
the runlist; each CONTINUE record is only about 8200 bytes. So it looks
like we don't have any problems with SST data being split over CONTINUE
records.
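
Spelling out that arithmetic as a small Python 3 calculation: the 8200-byte payload is the approximate figure quoted above, and the count ignores string headers and per-record overhead, so treat the result as a rough lower bound on how many records one such string spans.

```python
import math

CHARS = 32767            # characters in one test string
RUNS = 32767             # entries in its runlist
BYTES_PER_CHAR = 2       # uncompressed (16-bit) characters
BYTES_PER_RUN = 4        # 2-byte offset + 2-byte font index
CONTINUE_PAYLOAD = 8200  # approximate usable bytes per record

string_bytes = CHARS * BYTES_PER_CHAR
runlist_bytes = RUNS * BYTES_PER_RUN
total_bytes = string_bytes + runlist_bytes
records_needed = math.ceil(total_bytes / CONTINUE_PAYLOAD)

print(string_bytes, runlist_bytes, total_bytes, records_needed)
```

So each of the 100 strings is forced across a couple of dozen record boundaries, both inside the character data and inside the runlist, which is what makes the file a useful thrash test.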

Cheers,
John

