Surprising but convenient


Jacob Jozef

Dec 19, 2021, 11:50:17 AM
to Racket Users

Hi,

I start a thread that prints to a file. When the output is very long (say 50 GB), the thread takes too much time, and I kill it once it exceeds the time I am willing to allow. Outside the thread I clean up. Closing the output port before deleting the file takes a long time. But to my surprise I can delete the file before closing the port, which takes hardly any time.

Nice.

Is this intended behaviour?

After deleting the file, the port remains open, but writing to it does nothing. Correct?

 

Thanks, Jos

Sage Gerard

Dec 19, 2021, 12:08:22 PM
to racket...@googlegroups.com

If I understand this correctly, there's a difference between deleting a reference to 50 GB (like an inode), and actually writing 50 GB.

When you write to an output port, you are writing to a buffer in memory. This prevents slowdowns like the ones you've witnessed, because storage media are comparatively slow. "Flushing", with (flush-output) or by plumbers on port closure, is what actually sends bytes outside of the process.
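A minimal sketch of the buffering described above, in Python rather than Racket, since the idea is not specific to one language. The function name `sizes_around_flush` and the 64 KiB buffer size are assumptions made for illustration. It compares the file size reported by the OS before and after an explicit flush.

```python
import os
import tempfile

def sizes_around_flush(payload: bytes = b"x" * 100):
    """Return the file size seen by the OS before and after flushing."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The buffer (64 KiB, an arbitrary choice) is much larger than the
        # payload, so the write below stays inside the process.
        with open(path, "wb", buffering=64 * 1024) as port:
            port.write(payload)
            before = os.path.getsize(path)  # nothing handed to the OS yet
            port.flush()
            after = os.path.getsize(path)   # the bytes have left the process
        return before, after
    finally:
        os.remove(path)
```

Note that flushing only hands the bytes to the operating system, which may keep them in its own cache for a while before they reach the device.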

I wouldn't try using the same port after deleting a file. If Racket and the operating system initially agreed on a file descriptor that is now invalid, then you'll need to address that by opening a new port.

Note that I don't know Racket's implementation details here. I'm recalling what I've seen happen across languages.

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/EEC04679-E526-4215-B4DE-502B5B10567A%40hxcore.ol.

Bruce O'Neel

Dec 19, 2021, 12:19:54 PM
to Sage Gerard, racket...@googlegroups.com
Hi,

It is a touch unclear what you mean by deleting the file, and which OS you are using.

On Linux and similar OSes the rm command just calls the unlink system call. This removes the file from the directory, and if the link count is now 0 the file is removed from disk. But this last part happens only if the file is not open. If it is open, you get a situation where the file is still on disk but not visible in any directory.

As long as the process does not close the file, all is good. You can read from and write to the file with no problems. But when the file is closed, it will disappear from disk.
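The unlink behaviour described above can be tried with a short Python sketch. It assumes a Linux-like (POSIX) system; on Windows the `os.unlink` call on an open file would normally fail instead. The function name `write_after_unlink` is made up for this example.

```python
import os
import tempfile

def write_after_unlink():
    """Unlink an open file, then keep using the handle (POSIX only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "r+b") as f:
        f.write(b"before unlink\n")
        f.flush()
        os.unlink(path)                 # removes the directory entry only
        visible = os.path.exists(path)  # the name is no longer in the directory
        f.write(b"after unlink\n")      # the open handle still works
        f.flush()
        f.seek(0)
        data = f.read()                 # both writes can be read back
    # Leaving the with-block closes the last handle; only then can the OS
    # reclaim the space.
    return visible, data
```

Here `visible` comes back false while `data` still contains both lines, which matches the "still on disk but not visible in any directory" situation.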

More than once in my life there has been the "where did all the disk space go?" conversation that has resulted in a multi-megabyte, multi-gigabyte or now multi-terabyte file being found and deleted, without anyone figuring out that it was still open and therefore would not go away until some process was killed as well.

Now, possibly none of the above applies to Racket. One would need to know the gory details of how file writing is implemented.

But this would explain what you see.

cheers

bruce


Jacob Jozef

Dec 19, 2021, 2:05:52 PM
to Bruce O'Neel, Sage Gerard, racket...@googlegroups.com

Thanks to Bruce and Sage.

I work with DrRacket 8.3 CS under Windows 10. Indeed, I can see that while the port is being closed before the file is deleted, there is activity on the output device for some time, probably because buffers are being flushed. I have experimented a bit with output of 1 GB to a 16 GB USB stick. I have no idea where buffers with a capacity of 1 GB or more may reside. I can see that after deletion and closing, the 1 GB is freed.

Jos

George Neuner

Dec 19, 2021, 11:37:51 PM
to racket...@googlegroups.com
On Sun, 19 Dec 2021 17:50:11 +0100, Jacob Jozef
<jos....@gmail.com> wrote:

>I start a thread printing a file. When the file will be very long (say
>50 GB) the thread takes too much time and I kill it when it takes
>more time than I want to permit. Outside the thread I clean up.
>Closing the output-port before deleting the file takes much time. But
>to my surprise I can delete the file before closing the port, which
>hardly takes time.

It will take less time to stop the thread if you work line by line, or
in small(ish) groups of lines, and flush the output port each time you
go back for more input.
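A rough sketch of that chunked approach, in Python. Python has no counterpart to Racket's kill-thread, so this swaps in a cooperative stop flag that the writer checks after each flushed group of lines. The names `write_in_chunks` and `print_with_deadline` and the chunk size of 1000 are assumptions for illustration only.

```python
import os
import tempfile
import threading

def write_in_chunks(port, lines, stop, chunk=1000):
    """Write lines in small groups, flushing and checking `stop` after each."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= chunk:
            port.writelines(batch)
            port.flush()           # little is left buffered at any moment
            batch.clear()
            if stop.is_set():      # give up promptly at a chunk boundary
                return False
    port.writelines(batch)         # write whatever remains
    port.flush()
    return True

def print_with_deadline(lines, seconds):
    """Run the writer in a thread; abandon it and clean up if it overruns."""
    fd, path = tempfile.mkstemp(text=True)
    stop = threading.Event()
    result = {}
    with os.fdopen(fd, "w") as port:
        worker = threading.Thread(
            target=lambda: result.update(done=write_in_chunks(port, lines, stop)))
        worker.start()
        worker.join(timeout=seconds)  # wait at most `seconds`
        stop.set()                    # ask the writer to stop
        worker.join()                 # at most one chunk is still pending
    os.remove(path)                   # clean up outside the thread, after closing
    return result["done"]
```

Because every group is flushed as it is written, closing the port afterwards has very little left to do, whichever order the close and the delete happen in.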

As for deleting the file, all you have done is remove the directory
entry ... the file itself won't go away until the last open handle on
it is closed. That particular behavior /is the same/ in Windows and
Unix/Linux.


>After deleting the file, the port remains open, but writing to it does
>nothing. Correct?

No. Your program still has an open handle to the file and it is still
writing to it. Only the file's directory entry is gone.

You need to close the open file handle in your program in order to
truly delete the file. Depending on your code, simply killing the
thread may not do that.


>Thanks, Jos

Hope this helps,
George

Jacob Jozef

Dec 20, 2021, 7:02:10 AM
to George Neuner, racket...@googlegroups.com

Thanks George,

I got it now and after some experimenting have seen what you describe.

Jos

 

PS,

I do close the port after killing the thread and deleting the file. I came across this because I have an object that takes about two pages to write with sharing enabled, but more than GBs to write without sharing enabled. I included my thread as an example. It is in the file The-Little-LISPer.rkt in joskoot/The-Little-LISPer (github.com). The object is the result of having a meta-recursive interpreter interpret its own source code. Following the style of the book by Daniel P. Friedman and Matthias Felleisen, non-primitive functions are represented by symbolic expressions. In the same place on GitHub is the file interpreter.rkt, which represents functions by procedures. The latter writes “#<procedure>” when interpreting its own source code. (If you want to run my code, you may have to install my package test, which is also in my GitHub repositories, documentation included.)

Jos
