In article <7810mk$1i...@nnrp1.dejanews.com> , robert_m_mi...@yahoo.com wrote:
> What's the fastest way to write a copy-file routine? It seems like there's no > way to do block reads and writes, and the fastest copy-file routine I can > write is about 4-5 times slower than the unix `cp', because it reads a byte, > checks for the eof condition, then writes the byte for each and every byte.
Portably? I think the best you can do is to allocate a large buffer and use read-sequence and write-sequence. Use file-length to determine the size of the buffer (and to decide whether you need to copy the file in chunks).
Here's a start. This code does not do chunking -- you must be able to allocate a buffer large enough to hold the file.
The speed of this will depend upon your Lisp vendor's implementation of read/write-sequence.
See the CLHS for additional hints on efficiency of the read/write-sequence functions.
What's the fastest way to write a copy-file routine? It seems like there's no way to do block reads and writes, and the fastest copy-file routine I can write is about 4-5 times slower than the unix `cp', because it reads a byte, checks for the eof condition, then writes the byte for each and every byte.
-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
* robert_m_mi...@yahoo.com | What's the fastest way to write a copy-file routine? It seems like | there's no way to do block reads and writes...
there is. READ-SEQUENCE and WRITE-SEQUENCE. apply it to vectors of some reasonable machine size (8192 is frequently large enough to hold at least one complete disk block) having an element-type of either CHARACTER or (UNSIGNED-BYTE 8), and make the streams match that type.
#:Erik -- SIGTHTBABW: a signal sent from Unix to its programmers at random intervals to make them remember that There Has To Be A Better Way.
> In article <7810mk$1i...@nnrp1.dejanews.com> , robert_m_mi...@yahoo.com > wrote:
> > What's the fastest way to write a copy-file routine? It seems like there's no > > way to do block reads and writes, and the fastest copy-file routine I can > > write is about 4-5 times slower than the unix `cp', because it reads a byte, > > checks for the eof condition, then writes the byte for each and every byte.
> Portably? I think the best you can do is to allocate a large buffer and use > read-sequence and write-sequence. Use file-length to determine the size of > the buffer (and to decide whether you need to copy the file in chunks).
> Here's a start. This code does not do chunking -- you must be able to > allocate a buffer large enough to hold the file.
> The speed of this will depend upon your Lisp vendor's implementation of > read/write-sequence.
> See the CLHS for additional hints on efficiency of the read/write-sequence > functions.
This works great. I tried writing a version with chunking, and it's just as fast as the Linux `cp' on ACL, but it hangs CMUCL for some reason. Since it's so basic, I'm sure I screwed up somewhere. Have a look:
> What's the fastest way to write a copy-file routine? It seems like there's no > way to do block reads and writes, and the fastest copy-file routine I can > write is about 4-5 times slower than the unix `cp', because it reads a byte, > checks for the eof condition, then writes the byte for each and every byte.
> -----------== Posted via Deja News, The Discussion Network ==---------- > http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Think about what happens if SIZE is less than 8192.
BTW, stylistic note: any time a number outside the range -1:1, e.g. 8192, appears multiple times in a piece of code, it's probably a candidate for a DEFCONSTANT. In the case of this function, DEFPARAMETER might be more appropriate, so you can experiment with different size buffers simply by changing the value (for those of you who have wondered, this is where DEFPARAMETER gets its name).
-- Barry Margolin, bar...@bbnplanet.com GTE Internetworking, Powered by BBN, Burlington, MA *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups. Don't bother cc'ing followups to me.
robert_m_mi...@yahoo.com writes: > This works great. I tried writing a version with chunking, and it's just as > fast as the Linux `cp' on ACL, but it hangs CMUCL for some reason. Since it's > so basic, I'm sure I screwed up somewhere. Have a look:
This is a bug in CMUCL. Funnily, this bug is dependant on the size of the buffer. If the size exceeds a certain limit, read-sequence suddenly starts to return 0, although it reads the sequence correctly. I'll forward this post to the CMUCL bug mailing-list.
CMUCL's read-sequence has already a serious bug in current versions, which was exposed by the get-file-as-string I recently posted, and reported to the cmucl-imp mailing-list by Christopher R. Barry. This bug is a flaw in the handling of read-sequence by the compiler, which incorrectly flushes the call, because read-sequence is set-up as a flushable (i.e. non side-effecting) function. Douglas Thomas Crosher located the bug and produced a patch, so future versions of CMUCL will not be affected by this.
* Barry Margolin <bar...@bbnplanet.com> | BTW, stylistic note: any time a number outside the range -1:1, e.g. 8192, | appears multiple times in a piece of code, it's probably a candidate for | a DEFCONSTANT. In the case of this function, DEFPARAMETER might be more | appropriate, so you can experiment with different size buffers simply by | changing the value (for those of you who have wondered, this is where | DEFPARAMETER gets its name).
in this case, I think it's also worth pointing out that the size of the string is still available. (length buffer) will yield 8192.
FWIW, here's my cut at it:
(defconstant disk-block-size 8192)
(defun copy-file (from to) (with-open-file (in from :direction :input) (with-open-file (out to :direction :output :if-exists :supersde) (let ((buffer (make-array disk-block-size :element-type (stream-element-type in)))) (loop for read = (read-sequence buffer in) until (zerop read) do (write-sequence buffer out :end read))))))
#:Erik -- SIGTHTBABW: a signal sent from Unix to its programmers at random intervals to make them remember that There Has To Be A Better Way.
Rainer Joswig wrote: > Do you really want character as the element type?
No. None of the proposed solutions (all of which have tried to copy files as characters) has been portably sufficient to the task of copying a file efficiently. Surprise: It cannot be done portably! It is necessary to make some assumptions about the filesystems etc.
First, a character stream won't suffice because it isn't necessary on every operating system and every file system that any file can smoothly be processed as a stream of characters. Three example glitches are worthy of mention: First, the ANS requires that line breaks in an external character file be translated to and from the single Lisp character #\newline. This isn't a problem on most Unix systems, but on (losing) Windows this requires the Lisp IO routines look at every character coming in and every character going out. Second, some characters (e.g. ^Z on Windows, which is taken as EOF unless special care is taken) won't read on all systems. Third, the existence of the pretty printer suggests that character output streams should try to keep track of the current column position. Consider a program that does:
If mumble-string doesn't end with a #\newline then the pretty printer will need to know the column it ends in to determine if and where to insert line breaks while printing foo.
Now, it might be DWIM for write-sequence to imply that I'm not concerned about tracking the line position, or about line break processing, but that isn't implied by anything in the language standard -- and is sometimes definitely _not_ the case!
X3J13 added read-sequence and write-sequence with good reason and the need they were intended to fill was very real. But they were not based on sufficient real use and were therefore not carefully considered in light of the range of problems they were intended to solve. I know: I was there, I voted for them, and I've been trying to live with them ever since. I have some thoughts on how the IO specification for this industrial, real-world language might be brought into the real world in a future revision of the standard...
Meanwhile, back to the original problem. If one is willing to assume that any file can be copied as a stream of (UNSIGNED-BYTE 8), which is true on all current systems on my aquaintance, then this would be my approach:
(defun copy-file (from to) (with-open-file (out to :element-type '(unsigned-byte 8) :direction :output :if-exists :error :if-does-not-exist :create) (let ((buf (make-array 2048 :element-type '(unsigned-byte 8)))) (with-open-file (in from :element-type '(unsigned-byte 8)) (loop as x = (read-sequence buf in) until (= x 0) do (write-sequence buf out :end x)))) out))
I originally wrote this as part of ACL and casual tests showed it to be essentially as fast as cat on Unix. I think there is a heftier, exported version with lots of options in the 5.0 release.
>Do you really want character as the element type?
And the answer is, of course, no, as Steven Haflich post states. The following works on ACL for Windows, and copies Windows binaries, which the previous version did not):
COPY-FILE old-pathname new-pathname &key :if-exists :fork [Function] copies old-pathname to new-pathname and returns new-pathname. If new-pathname already exists and the value of :if-exists is :error, Macintosh Common Lisp signals an error. If its value is nil, Macintosh Common Lisp returns nil; if it is :overwrite, the previous file is overwritten and new-pathname is returned. :fork can be :data, :resource, or :both. The default is :both.
CL-HTTP has:
HTTP:STREAM-COPY-UNTIL-EOF from-stream to-stream &optional copy-mode [generic-function] Copies all input from FROM-STREAM to TO-STREAM. COPY-MODE can be any of :TEXT, :BINARY, or :CRLF, but defaults to :TEXT. Ports and applications may specialize this method to optimize data transfer rates.
And CL-HTTP also has:
HTTP:COPY-FILE from-pathname to-pathname &key copy-mode &allow-other-keys [generic-function] A portable file copy. COPY-MODE is one of :TEXT, CRLF, or binary.
This function copies one file to another. from-pathname identifies the source file and must refer to a single file and contain no wild com- ponents. to-pathname identifies the destination file and can contain wild components, which are eliminated after merging the defaults by means of :translate-wild-pathname.
copy-file first attempts to open from-path. When that has happened successfully, it parses to-pathname and merges it (using merge- pathnames) against the link-opaque truename of from-pathname and a version of :newest. The output file specified by to-pathname is opened with :if-exists :supersede. The processing of to-pathname has the following result for version numbers.
Source Target Result
>foo>a.b.newest >bar> Retains the version number >foo>a.b.newest >bar>x Makes a new version of >bar>x.b
The defaults for to-pathname come from the link-opaque truename of from-pathname. For systems without links, this is indistinguishable from the truename. Otherwise, the link-opaque truename depends on whether from-pathname contains an :oldest or :newest version. If it does not and if it is fully defaulted, with no wild components, the pathname is its own link-opaque truename. If a pathname x contains an :oldest or :newest version, the link-opaque truename is the pathname of the file or link that corresponds to x, with the version number filled in. For example, copying the LMFS file >a>p1.lisp to
>b> results in >b>p1.lisp, with the version of >a>p1.lisp.newest in-
herited. This is so whether >a>p1.lisp.newest is a real file, a link, or a rename-through link.
By default, copy-file copies the creation date and author of the file.
Following is a description of the other options:
:element-type This argument specifies the type of Lisp object transferred by the stream. Anything that can be rec- ognized as being a finite subtype of character or integer is acceptable. In particular, the following types are recognized:
:characters Possible values: :default copy-file decides whether this is a binary or character transfer ac- cording to the canonical type of from-pathname. You do not need to supply this argument for standard file types. For types that are not known canonical types, it opens from-pathname in :default mode. In that case, the server for the file system containing from-pathname makes the character-or-binary de- cision.
t Specifies that the transfer must be in character mode.
nil Specifies that the transfer must be binary mode (in this case, you must supply byte-size if using a byte size other than 16).
:byte-size Specifies the byte size with which both files are opened for binary transfers. You must supply :byte-size when :characters is nil and the byte size is other than 16. Otherwise, copy-file deter- mines the byte size from the file type for from- pathname. When from-pathname is a binary file with a known canonical type, it determines the byte size from the :binary-file-byte-size property of the type. When the file does not have a known type, it requests the byte size for from-pathname from the file server. When the server for the file system con- taining from-pathname cannot supply the byte size, it assumes that the byte size is 16.
:report-stream When :report-stream is nil (the default), the copying takes place with no messages. Otherwise, the value must be a stream for reporting the start and successful completion of the copying. The com- pletion message contains the truename of to- pathname.
:create-directories Determines whether directories should be created, if needed, for the target of the copy. Permissible val- ues are as follows:
t Try to create the target directory of the copy and all superiors. Re- port directory creation to *standard-output*.
nil Do not try to create directories. If the directory does not exist, handle this condition like any other error.
:query If the directory does not exist, ask whether or not to create it. This is the default.
Erik Naggum <e...@naggum.no> writes: > * Jeff Sandys <Jeffrey.P.San...@NOboeingSPAM.com> > | Why not just issue a shell command?
> calling out to a Unix process suspends the entire Allegro CL process. > this is not good for multiprocessing.
I thought that they promised that version 5.0 would fix that on most Unix platforms via os level threading. I only have 5.0 on Linux which still uses user level threading so I cannot verify that until they send us our Solaris version...
* eric dahlman <dahl...@cs.colostate.edu> | I thought that they promised that version 5.0 would fix that on most | Unix platforms via os level threading.
Allegro CL 5.0 uses OS-level threading only under Bill Gates' regime.
| I only have 5.0 on Linux which still uses user level threading so I | cannot verify that until they send us our Solaris version...
no difference between Linux and Solaris in this regard in Allegro CL 5.0.
BTW, I don't _want_ OS-level threading. it has far more problems than it could ever hope to solve. OS-level threading is something you do when you can't get it right in the language, IMNSHO. C needs it. that's almost enough reason to avoid it.
#:Erik -- SIGTHTBABW: a signal sent from Unix to its programmers at random intervals to make them remember that There Has To Be A Better Way.
One reason for using OS level threads is to allow multiprocessing. Probably there is an alternative way of doing that without having to create processes? Also, could you tell what major problems you see with OS level threads so I can avoid them (I am using LWW) and how you would design applications that scale well on multiprocessor machines. I am interested in others' experiences with multithreading or multiprocessing with LWW or ACL for B.G.R.
> * eric dahlman <dahl...@cs.colostate.edu> > | I thought that they promised that version 5.0 would fix that on most > | Unix platforms via os level threading.
> Allegro CL 5.0 uses OS-level threading only under Bill Gates' regime.
> | I only have 5.0 on Linux which still uses user level threading so I > | cannot verify that until they send us our Solaris version...
> no difference between Linux and Solaris in this regard in Allegro CL 5.0.
> BTW, I don't _want_ OS-level threading. it has far more problems than it > could ever hope to solve. OS-level threading is something you do when > you can't get it right in the language, IMNSHO. C needs it. that's > almost enough reason to avoid it.
> #:Erik > -- > SIGTHTBABW: a signal sent from Unix to its programmers at random > intervals to make them remember that There Has To Be A Better Way.
* Robert Monfera <monf...@fisec.com> | One reason for using OS level threads is to allow multiprocessing.
obviously, Allegro CL has multiprocessing within the same Unix process.
| Probably there is an alternative way of doing that without having to | create processes?
a call to MAKE-PROCESS does indeed create a process, but not in the Unix sense with a separate process idea and memory and all that.
| Also, could you tell what major problems you see with OS level threads so | I can avoid them (I am using LWW) and how you would design applications | that scale well on multiprocessor machines.
the problem with scaling on multiprocessor machines is not the OS-thread or not OS-thread issue, but how Lisp wants a single address space and you are changing the semantics of the language as a whole when you make each thread a separate address space or almost a separate address space.
BTW, I have struggled with cooperating processes on two machines and I really thought I could cut corners relative to the full theory, but it turns out I can't, so I had to settle for one master and multiple slaves. whether the slaves are Lisp processes in the same Unix process, Lisp processes in separate Unix processes on the same machine, or Lisp processes in Unix process elsewhere is not an issue: the interaction between them is based on a low-bandwidth exchange of s-expressions. where I have used run-reasons in the Allegro CL multiprocessing model, I send forms for evaluations to the slaves.
| I am interested in others' experiences with multithreading or | multiprocessing with LWW or ACL for B.G.R.
well, it would be nice if we had a journal in which to publish such things, instead of _just_ the newsgroup and the Web. perhaps at the next Lisp Users and Vendors or Lisp Users Group Meeting.
#:Erik -- SIGTHTBABW: a signal sent from Unix to its programmers at random intervals to make them remember that There Has To Be A Better Way.
Erik Naggum <e...@naggum.no> writes: > the problem with scaling on multiprocessor machines is not the OS-thread > or not OS-thread issue, but how Lisp wants a single address space and you > are changing the semantics of the language as a whole when you make each > thread a separate address space or almost a separate address space.
Kinda OT...
Awhile back I was at one of my friend's places and in his bookshelf he had some cheesy "Time Life Illustrated Guide to Artificial Intelligence" or something like that, and I was bored so I cracked it open and looked through it a bit. It mentioned this computer called the Connection Machine that had 65,000 processors or so (I've never heard of any computer having even 1/8th that many processors, so I think they might have got their info messed up, but maybe not). In CLTL2 Guy Steele mentioned Connection Machine Lisp when talking about the XAPPING format control string for FORMAT, and on Rainier's page there's a picture of one (I can't really make it out though).
I tried doing some web searches for it, but I couldn't find any pictures or documentation of the hardware or software.
Not that this has anything to do with the immediate problem at hand, but it's cool that Lisp has been implemented before to do such massively parallel processing.
* cba...@2xtreme.net (Christopher R. Barry) | Not that this has anything to do with the immediate problem at hand, but | it's cool that Lisp has been implemented before to do such massively | parallel processing.
massively parallel computing really isn't like a multiprocessor: there aren't any "threads" in them. MPC is about breaking down a sequential problem into time-independent parts and has a granularity that is miniscule compared to the very bulky "thread" thingies. e.g., an MPC conceptually executes the several computations in a LET binding, time- independent assignments, and the like on separate processors. some of the benefits of this is actually achieved in modern processors for a small number of parallel computations that use independent registers.
OS-level threads are really the wrong solution to the wrong problem: because the OS likes to suspend the whole process and wake it up when the kernel has serviced a request (which is bad design), you "need" a means to make the kernel let some other, independent part of the process run, but the right way is to send requests to the kernel for execution and pass a continuation along, so you can go on with other things. so this OS-level thread stuff really is about blocking system calls, not so much about exploiting symmetrical multiprocessors. however, it wouldn't be very hard to create a Common Lisp with SMP support -- it just had to work its way down to the machine level in an unprecedented fashion, which the OS-level threads themselves make it very hard for us to do, because they solve the problem "well enough" that people don't bother to do all the extra work it takes to get it _right_.
#:Erik -- SIGTHTBABW: a signal sent from Unix to its programmers at random intervals to make them remember that There Has To Be A Better Way.
In article <87ogntrr8s....@2xtreme.net>, cba...@2xtreme.net (Christopher
R. Barry) wrote:
[snip]
> It mentioned this computer called > the Connection Machine that had 65,000 processors or so (I've never > heard of any computer having even 1/8th that many processors, so I > think they might have got their info messed up, but maybe not). [snip] > I tried doing some web searches for it, but I couldn't find any > pictures or documentation of the hardware or software.
[snip]
65k processors is correct.
read hillis' dissertation, probably available as an MIT tech report. it was also published under the title 'the connection machine'. it makes interesting reading.
there's also the following reference:
Wholey, Skef. An experimental implementation of connection machine lisp. in Peter Lee (Ed.), "Topics in Advanced Language Implementation".
i don't know much about traditional approaches to multiprocessing but enjoyed these readings.
In article <87ogntrr8s....@2xtreme.net>, Christopher R. Barry <cba...@2xtreme.net> wrote:
>Awhile back I was at one of my friend's places and in his bookshelf he >had some cheesy "Time Life Illustrated Guide to Artificial >Intelligence" or something like that, and I was bored so I cracked it >open and looked through it a bit. It mentioned this computer called >the Connection Machine that had 65,000 processors or so (I've never >heard of any computer having even 1/8th that many processors, so I >think they might have got their info messed up, but maybe not). In >CLTL2 Guy Steele mentioned Connection Machine Lisp when talking about >the XAPPING format control string for FORMAT, and on Rainier's page >there's a picture of one (I can't really make it out though).
I worked for Thinking Machines, and the CM-1 and CM-2 did support up to 64K processors. However, these processors are extremely simple, 1-bit processors; there were something like 16 or 32 of them on a chip. The idea was modeled on the brain -- a complex network of extremely simple components.
The followon system, the CM-5, used SPARC processors. I think the architecture was designed to scale potentially to 16K, but the largest we could make with the technology at the time was somewhere between 1K and 4K (I don't remember).
>I tried doing some web searches for it, but I couldn't find any >pictures or documentation of the hardware or software.
W. Daniel Hillis, the founder of TMC, wrote a book on the Connection Machine, published by MIT Press.
The company filed for bankruptcy in 1994; when they came out, the only thing left was the division working on data warehousing, and that's what TMC currently does. The hardware is no longer sold (another company is still providing maintenance on the ones that are still in operation).
-- Barry Margolin, bar...@bbnplanet.com GTE Internetworking, Powered by BBN, Burlington, MA *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups. Don't bother cc'ing followups to me.
We had a 16k processor CM-1 at Yale when I was a grad student. I recall having great fun (and miserable headaches) trying to program in *LISP (the name of the lisp for the CM). Poking around on my bookshelf, I found I still have a *LISP manual from 1990 (version 5.2). I (and many others) wrote a fairly fast backprop/neural network learner in *LISP, but the most popular thing we did was a version of Conway's game of life that used the LEDs on the computer itself for a display. Very high coolness factor.
Since Hillis and the CM came from the AI lab, there was an initial bias toward Lisp. Unfortunately, according to my friends who were at TMC, it became clear fairly quickly that most of the people actually putting up money to buy the machine wanted *FORTRAN, so *LISP languished.
Larry
-- Lawrence Hunter, PhD. National Library of Medicine phone: +1 (301) 496-9303 Bldg. 38A, 9th fl, MS-54 fax: +1 (301) 496-0673 Bethesda. MD 20894 USA email: hun...@nlm.nih.gov