Huge Files


Dennis Whiteman

Nov 13, 2008, 9:32:23 AM
to BBEdit Talk, sup...@barebones.com

One of the things I've always liked about BBEdit (and TextWrangler) is
the ability to open really large text files. I used this a lot when I
was hosting web sites to edit very large log files, trimming a few
entries from last year and/or adding them to next year's.

It was not uncommon for me to open files 1-gig or larger, so long as I
had more RAM on the Mac than the size of the file. When you opened
such large files, you might get the spinning donut of death for five
minutes or more, but BBEdit would open the file and allow you to make
some changes and then save it.

I have a project right now that has a 668MB MySQL dumped sql file and
I'm unable to open it in BBEdit. I get a MacOS Error code -116 with an
option to copy it to the clipboard, but of course it fails. In this
case, I'd really like to do a search and replace on that file to fix
some character encoding problems I'm having.

It's been a while since I tried this. Is this something that changed
in Mac OS X at some point (I'm running Leopard), or is it something
with BBEdit?

Dennis

Charlie Garrison

Nov 13, 2008, 9:45:37 AM
to bbe...@googlegroups.com
Good morning,

On 13/11/08 at 6:32 AM -0800, Dennis Whiteman
<fast...@gmail.com> wrote:

>I have a project right now that has a 668MB MySQL dumped sql file and
>I'm unable to open it in BBEdit. I get a MacOS Error code -116 with an
>option to copy it to the clipboard, but of course it fails. In this
>case, I'd really like to do a search and replace on that file to fix
>some character encoding problems I'm having.

Make sure it's not using the mysql (or any other) language file
to do syntax highlighting. I don't really know if that's the
problem, but I can imagine it's making BBEdit's job harder (&
maybe too hard).


Charlie

--
Charlie Garrison <garr...@zeta.org.au>
PO Box 141, Windsor, NSW 2756, Australia

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
http://www.ietf.org/rfc/rfc1855.txt

Maarten Sneep

Nov 13, 2008, 9:47:52 AM
to bbe...@googlegroups.com
On 13 nov 2008, at 15:32, Dennis Whiteman wrote:

> It was not uncommon for me to open files 1-gig or larger, so long as I
> had more RAM on the Mac than the size of the file. When you opened
> such large files, you might get the spinning donut of death for five
> minutes or more, but BBEdit would open the file and allow you to make
> some changes and then save it.

BBEdit uses a Unicode representation internally. The advantage of
being able to use any character you fancy is offset, for some, by the
larger memory requirements. It seems you've been bitten by this.

> I have a project right now that has a 668MB MySQL dumped sql file and
> I'm unable to open it in BBEdit. I get a MacOS Error code -116 with an
> option to copy it to the clipboard, but of course it fails. In this
> case, I'd really like to do a search and replace on that file to fix
> some character encoding problems I'm having.

Error code -116 is a memory error, as you probably suspected (or found
with Google, as I did).

> It's been a while since I tried this. Is this something that occurred
> with Mac OS X at some point (I'm running Leopard) or is this something
> with BBEdit?

BBEdit, I'd say, although the switch was probably inevitable.

split and cat in the terminal can probably help you here. Split the
file into multiple parts, edit as needed, and cat them back together.
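For instance (a sketch; the file name "dump.sql" is a stand-in, and a
real 668 MB dump would want something like -l 2000000 rather than -l 2):

```shell
# Make a tiny stand-in for the dump so this is runnable as-is:
printf 'INSERT INTO a VALUES (1);\nINSERT INTO a VALUES (2);\nINSERT INTO a VALUES (3);\nINSERT INTO a VALUES (4);\n' > dump.sql

# Split on line boundaries (important for a SQL dump, so no statement
# is cut in half), producing part_aa, part_ab, ...:
split -l 2 dump.sql part_

# ...edit the pieces individually, then stitch them back together:
cat part_* > dump-fixed.sql

# The round trip is lossless:
cmp dump.sql dump-fixed.sql && echo "identical"
```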

Maarten

David Kelly

Nov 13, 2008, 10:28:54 AM
to bbe...@googlegroups.com
On Thu, Nov 13, 2008 at 03:47:52PM +0100, Maarten Sneep wrote:
>
> split and cat in the terminal can probably help you here. Split the
> file into multiple parts, edit as needed, and cat them back together.

Sed, awk, or perl can stream the file in, make the change(s), and write
the result to a new file using very little memory, and very little time
more than it would take to copy the file.
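A sketch of the sed variant (file name and table names are made-up
examples, not from the actual dump):

```shell
# A one-line stand-in for the dump so this is runnable as-is:
printf 'INSERT INTO old_table VALUES (1);\n' > dump.sql

# sed reads a line, rewrites it, and emits it -- memory use stays
# constant no matter how big the input file is:
sed 's/old_table/new_table/g' dump.sql > dump-fixed.sql

cat dump-fixed.sql
# prints: INSERT INTO new_table VALUES (1);
```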

--
David Kelly N4HHE, dke...@HiWAAY.net
========================================================================
Whom computers would destroy, they must first drive mad.

Dennis Whiteman

Nov 13, 2008, 10:56:04 AM
to BBEdit Talk
Thanks. I've used SED before and it works very well for scripted
search and replace.

In this case, I may not be aware of all the things I want to change, so
being able to quickly and visually glance through the file would be
helpful. For example, I might not know the names of some of the tables
I want to change.

I suspected it was a problem with syntax coloring or character
encoding (or both) and the resources they use over plain text. I
received a link from support to this part of the FAQ, saying that only
files up to 384MB are supported...

<http://faq.barebones.com/do_getanswer.php?record_id=36>

I need to split that file into chunks smaller than that. You would
think this is one of the reasons that 64-bit computing was invented,
but at least there's a workaround.

Dennis

Doug McNutt

Nov 13, 2008, 3:44:23 PM
to bbe...@googlegroups.com
At 06:32 -0800 11/13/08, Dennis Whiteman wrote, and I snipped a bit:

>
>It was not uncommon for me to open files 1-gig or larger, so long as I
>had more RAM on the Mac than the size of the file. When you opened
>such large files, you might get the spinning donut of death for five
>minutes or more, but BBEdit would open the file and allow you to make
>some changes and then save it.
>
>It's been a while since I tried this. Is this something that occurred
>with Mac OS X at some point (I'm running Leopard) or is this something
>with BBEdit?

Depending on the time interval, it's possible that BBEdit's change to
an internal representation of 16 bits per character has made your old
experience off by a factor of two.

But then, it is UNIX, which ought to be capable of page swapping to
allow, slowly, for huge files up to the 32-bit address limit.

--

--> From the USSA, the only socialist country that refuses to admit it. <--

Johan Solve

Nov 13, 2008, 4:38:08 PM
to bbe...@googlegroups.com
Here's an earlier discussion about this, with an authoritative answer:
http://www.listsearch.com/BBEdit/Thread/index.lasso?4878#22517

At 15.12 -0500 2006-11-12, Rich Siegel wrote:
>Because of a limitation in the OS, the largest file size we can currently read is 768M when represented as Unicode. Note the qualification; since BBEdit represents files internally as Unicode, a character takes up two bytes, so today's practical limit is 384M on disk (since that'll expand to 768M when represented as Unicode). Working around the limitation is on our to-do list.
--
Johan Sölve [FSA Member, Lasso Partner]
Web Application/Lasso/FileMaker Developer
MONTANIA SOFTWARE & SOLUTIONS
http://www.montania.se mailto:jo...@montania.se
(spam-safe email address, replace '-' with 'a')

David Kelly

Nov 13, 2008, 5:07:54 PM
to bbe...@googlegroups.com
On Thu, Nov 13, 2008 at 10:38:08PM +0100, Johan Solve wrote:
>
> Here's an earlier discussion about this, with an authoritative answer:
> http://www.listsearch.com/BBEdit/Thread/index.lasso?4878#22517
>
> At 15.12 -0500 2006-11-12, Rich Siegel wrote:
> >Because of a limitation in the OS, the largest file size we can
> >currently read is 768M when represented as Unicode. Note the
> >qualification; since BBEdit represents files internally as Unicode, a
> >character takes up two bytes, so today's practical limit is 384M on
> >disk (since that'll expand to 768M when represented as Unicode).
> >Working around the limitation is on our to-do list.

OK, that is not so much a limitation as it's a deliberate limit imposed
by the OS to prevent a process from claiming all the system resources.

For example the default in FreeBSD is 512MB per process. It can be
overridden with a kernel tuning parameter. Perhaps the same is true of
MacOS X?

OTOH I wouldn't recommend it for casual use. Some of the tables a kernel
allocates for tracking resources are statically allocated at boot time
for efficiency and to make sure they exist. The larger the allowed max
process size the larger the process tables have to be to track memory
allocated to the process. All processes no matter how small have to have
these large tables because one never knows how large the process will
grow. So the larger the process memory size limit the more overhead is
used tracking all processes and/or threads.

Steve Kalkwarf

Nov 13, 2008, 5:41:33 PM
to bbe...@googlegroups.com
On Nov 13, 2008, at 5:07 PM, David Kelly wrote:

>
> On Thu, Nov 13, 2008 at 10:38:08PM +0100, Johan Solve wrote:
>>
>> Here's an earlier discussion about this, with an authoritative
>> answer:
>> http://www.listsearch.com/BBEdit/Thread/index.lasso?4878#22517
>>
>> At 15.12 -0500 2006-11-12, Rich Siegel wrote:
>>> Because of a limitation in the OS, the largest file size we can
>>> currently read is 768M when represented as Unicode. Note the
>>> qualification; since BBEdit represents files internally as
>>> Unicode, a
>>> character takes up two bytes, so today's practical limit is 384M on
>>> disk (since that'll expand to 768M when represented as Unicode).
>>> Working around the limitation is on our to-do list.
>
> OK, that is not so much a limitation as its a deliberate limit imposed
> by the OS to prevent a process from claiming all the system resources.

It's not as simple as raising limits. Some of the APIs BBEdit uses
require Handles, and for reasons out of our control, Handles are
currently limited to representing 768M of memory.

Working around this limit is still on the to-do list.

Steve

johnde...@gmail.com

Nov 13, 2008, 6:16:56 PM
to bbe...@googlegroups.com
At 07:56 -0800 13/11/08, Dennis Whiteman wrote:

>Thanks. I've used SED before and it works very well for scripted
>search and replace.
>
>In this case, I may not be aware of all the things I want to change so
>being able to quickly and visually glance trough the file would be
>helpful. For example, I might not know the names of some of the tables
>I want to change.

If you run a Perl script in the Terminal then you can display
whatever you need as you go along, but you must let us know how you
quickly and visually glance through 668 megabytes. Sounds rather
like lightly tripping barefoot through a snowdrift!
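One answer: don't glance at all of it, just pull out the parts you
need, e.g. the table names (a sketch; "dump.sql" is a stand-in, and
this relies on mysqldump writing a CREATE TABLE statement per table):

```shell
# Tiny stand-in for the dump so this is runnable as-is:
printf 'CREATE TABLE `users` (id INT);\nINSERT INTO `users` VALUES (1);\nCREATE TABLE `logs` (id INT);\n' > dump.sql

# List every table defined in the dump without opening the file
# in an editor -- grep streams it in constant memory:
grep '^CREATE TABLE' dump.sql
```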

JD

David Kelly

Nov 13, 2008, 9:30:14 PM
to bbe...@googlegroups.com

That's what I said, to the effect that the table allocation for
tracking resources is statically allocated and therefore limited, and
that this is often deliberate. Remember, each process is running in a
virtual memory space, so somehow the OS has to keep track of the pages
allocated to the process, and there is a tradeoff between doing this
efficiently and doing it fast. It would appear Apple drew that line at
768 MB. Even so, sometimes those lines can be moved early in the boot
process, before they have been drawn.

> Working around this limit is still on the to-do list.

It means one has to change from a flat memory model, where the OS and
MMU move data on and off disk for you, to one where you have to do it
manually in software. This is exactly the sort of thing that made
WordPerfect's and WordStar's fame in CP/M days: the ability to edit
files limited only by disk space rather than by the usual 58k or 60k
of available RAM.

Classic MacOS used Handles to page CODE resources in and out, which
allowed programs bigger than 128k to run on the original Mac.

Rich Siegel

Nov 13, 2008, 10:11:43 PM
to bbe...@googlegroups.com
On 11/13/08 at 9:30 PM, dke...@hiwaay.net (David Kelly) wrote:

>Thats what I said. To the effect the table allocation for
>tracking resources was statically allocated and therefore limited.

That actually has nothing to do with it. :-) Other allocations,
e.g. via malloc() do not share the same limitation. It's an
implementation detail of NewHandle() in the guts of the OS, and
is for us to work around at some future point.

R.
--
Rich Siegel Bare Bones Software, Inc.
<sie...@barebones.com> <http://www.barebones.com/>

Someday I'll look back on all this and laugh... until they
sedate me.

David Kelly

Nov 14, 2008, 12:46:39 AM
to bbe...@googlegroups.com

On Nov 13, 2008, at 9:11 PM, Rich Siegel wrote:

>
> On 11/13/08 at 9:30 PM, dke...@hiwaay.net (David Kelly) wrote:
>
>> Thats what I said. To the effect the table allocation for
>> tracking resources was statically allocated and therefore limited.
>
> That actually has nothing to do with it. :-) Other allocations,
> e.g. via malloc() do not share the same limitation. It's an
> implementation detail of NewHandle() in the guts of the OS, and
> is for us to work around at some future point.

Very interesting. malloc() does work for 2 GB. It broke at 3 GB:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

#define GIGABYTES 2000000000UL

int main(void)
{
    uint32_t i;
    char *cp;

    cp = malloc( GIGABYTES );
    printf("cp = %p\n", (void *)cp );
    if( cp == NULL )
        return 1;

    for( i = 0 ; i < GIGABYTES ; i++ )
        cp[i] = 0x55;

    printf("sleeping\n");
    sleep( 1000 );

    free( cp );
    return 0;
}

Bee

Nov 17, 2008, 11:43:05 PM
to bbe...@googlegroups.com

On Nov 13, 2008 6:32 AM, Dennis Whiteman wrote:
> I have a project right now that has a 668MB MySQL dumped sql file
> and I'm unable to open it in BBEdit. I get a MacOS Error code -116

I have been using 0xED to open huge files; I just opened a 3.52 GB
file, and it opened almost instantly on an ancient PowerBook G3 Pismo
500. It is a hex editor, but the window is fully resizable, so the
text area can be quite large. Search/replace is fast, scrolling is
fast, and it is a free program, with many other great features.

<http://www.macupdate.com/info.php/id/22750/0xed>

--
Bill
Santa Cruz, California

