On Fri, 30 Oct 2009 12:19:46 +0100, Pascal J. Bourguignon wrote: > Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
>> It would be nice if there were >> a graphical differences tool- that is one that would display the >> picture and highlight the differences. Not undoable I'm sure.
> (Unfortunately, it's only a pixel diff/merge operation, we would have > to use more sophisticated algorithms to detect in a concise form block > transformations; but even in this case, this gimp script is rather > effective, visually).
Yes, as someone with an image processing, pattern recognition and AI background I can tell you, there are no such algorithms. You will need a complex scene analysis in order to get to the level of a diagram. Even this does not work, because image segmenting and line detection do not really work. Such a trivial task, but alas, no algorithm can compete with a human in image segmenting. But even if you got through that and obtained geometric regions, lines etc., analysing their connections and shapes (i.e. the scene) does not work either. Diagrams generated by GUI tools skip these steps, because the tool places this information into the intermediate file.
The point is that the task is immense and though we have solved it during our evolution, my guess is that it leaves no free "computational" resources to analyse the picture at a much more detailed level. It wasn't evolutionarily needed either.
On Tue, 27 Oct 2009 10:09:25 -0400, Joshua Cranmer
<Pidgeo...@verizon.invalid> wrote: >On 10/26/2009 06:22 PM, Richard Harter wrote: >> The antithesis is that it isn't turtles all the way up, or at >> least it shouldn't be turtles all the way up. That is, the kinds >> of languages and programming technologies that we use in >> large scale programming should be quite different from the kind >> of languages used for programming in the small.
>I think most people would agree that methodologies for small-scale >programs are different from those for large-scale programs, although >there might be considerable disagreement over the dividing line. I >certainly am not going to use a full-blown OOP paradigm for writing >even something like an automatic code generator; on the other hand, I >shudder at the thought of writing something so complex as an operating >system in a functional paradigm.
IIANM people have written operating systems in functional languages without difficulty.
>An interesting project would be to take a large corpus of various mature >open-source programs of varying size and see if there is a correlation >between size and the use of certain language features or patterns.
Perhaps this would be an appropriate project for CS PhD candidates.
>> One difference between traditional programs and data flow >> programs is that traditional programs use LIFO semantics whereas >> data flow programs use FIFO semantics. That is, a traditional >> program puts data on a stack and gets data back on the same >> stack. In data flow programs each element gets data from a queue >> and puts data to other queues.
>One nit to point out: at this stage, many programs don't follow a >procedural model of programming. OOP is the dominant paradigm, and I >don't see the sequence of data flow as being LIFO. Yet you continually >refer to procedural programming as those that make up "traditional >programs."
To nit a nit - the word "procedural" doesn't appear at all in the article. I used the term "traditional imperative". More to the point, data flow is LIFO. By this I don't mean all data flow; rather I am referring to the data passed to functions/methods via calling sequences and the data returned from them via return statements. Likewise the flow of control is LIFO. That is, when a function/method exits it returns control to the place it was called from.
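The LIFO/FIFO distinction drawn above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`traditional`, `dataflow`, `square`) are made up for the example and do not come from the article under discussion. The first version relies on call and return, so data and control come back to the caller in stack order; the second moves the same data through explicit queues.

```python
from collections import deque

def square(x):
    # Call/return: the argument goes in and the result comes back
    # to the caller, so both data and control follow stack (LIFO) order.
    return x * x

def traditional(values):
    return [square(v) + 1 for v in values]

def dataflow(values):
    # Each element takes data from one queue and puts data on another (FIFO).
    q_in, q_mid, q_out = deque(values), deque(), deque()
    while q_in:                       # element 1: square
        q_mid.append(q_in.popleft() ** 2)
    while q_mid:                      # element 2: add one
        q_out.append(q_mid.popleft() + 1)
    return list(q_out)
```

Both functions compute the same values; what differs is where the intermediate data lives (the call stack versus queues between elements).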
>> Another difference is that the connectivity of traditional >> programs is deeply embedded in the code. To pass data from A to >> B, A calls B. That is, the caller has to specify where the data >> goes. In data flow programs the connectivity can be separate >> from the code. A doesn't pass data directly to B; instead it >> passes data to the run time system which in turn passes the data >> to B. The caller does not have to specify where the data goes.
>Two words: function pointers. You can also go with virtual or dynamic >method dispatch, but function pointers is shorter and sounds better.
Well, no. Function pointers et al are still specifications of destination. More than that, A actually sends the data immediately to B via a call. Consider the following two lines of code which look very similar:
    func(x);   /* func is a function pointer */
    send(x);   /* send is a messaging primitive */
In the first line "func" is bound (directly or indirectly) to the actual function that will act on x. In the second, "send" is not bound to the function that will act on x.
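A minimal sketch of that difference, in Python rather than C. Everything here is hypothetical: the `Runtime` class, the port name `"a.out"`, and the two elements are invented for illustration and do not describe any particular data flow system. The point it tries to show is that the sender names only its own output, the wiring is held by the run-time system, and delivery happens later rather than at the moment of the call.

```python
from collections import defaultdict, deque

class Runtime:
    """Hypothetical run-time system: owns the wiring and delivers messages."""

    def __init__(self):
        self.routes = defaultdict(list)   # output port -> list of receivers
        self.pending = deque()            # FIFO of (port, message) pairs

    def connect(self, port, receiver):
        self.routes[port].append(receiver)

    def send(self, port, message):
        # The sender names only its own output port, never a destination.
        self.pending.append((port, message))

    def run(self):
        while self.pending:
            port, message = self.pending.popleft()
            for receiver in self.routes[port]:
                receiver(message)

log = []

def element_a(rt, x):
    rt.send("a.out", x * 2)        # A does not know who will consume this

def element_b(message):
    log.append(("B", message))

rt = Runtime()
rt.connect("a.out", element_b)     # the connectivity lives outside A and B
element_a(rt, 21)                  # only queues the message; B has not run yet
rt.run()                           # the run-time system delivers it to B
```

Rewiring `"a.out"` to a different receiver requires no change to `element_a`, which is the separation of connectivity from code described above.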
>> * Concurrency and parallelism are natural. Code can be >> distributed between cores and even across networks. Many of the >> problems associated with threads disappear.
>They don't disappear, they're pushed into the runtime system. From >practical experience, that means they probably bubble up and annoy you >in the programming language.
What kind of practical experience have you had with data flow languages?
>> * Data flow networks are natural for representing process. This >> is particularly true for transaction processing.
>Maybe I'm just being a stupid idiot here, but I don't see how data flow >is natural for representing some common processes. For example, the >HTML5 tokenization process. I suppose you can flow current-state output >back around and into itself as next-state input, but that's not exactly >natural.
It sounds as though you're asking how one would implement a state machine in a dataflow language. It would be straightforward enough, but there's not much point in doing it unless it gives you cheap thrills.
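For what it is worth, a state machine with the state fed back as input can be sketched as below. This is a toy under assumptions: the two states (`"text"`, `"tag"`), the token names, and the queue layout are invented for the example and have nothing to do with the real HTML5 tokenizer.

```python
from collections import deque

def step(state, ch):
    """Transition function: returns (next_state, token or None)."""
    if state == "text":
        return ("tag", None) if ch == "<" else ("text", ("char", ch))
    # state == "tag"
    if ch == ">":
        return ("text", ("tag_end", None))
    return ("tag", ("tag_char", ch))

def run(chars):
    chars_q = deque(chars)        # input queue of characters
    state_q = deque(["text"])     # feedback queue, primed with the initial state
    out_q = deque()               # output queue of tokens
    while chars_q and state_q:
        state, ch = state_q.popleft(), chars_q.popleft()
        next_state, token = step(state, ch)
        state_q.append(next_state)    # the cycle in the graph: state flows back in
        if token is not None:
            out_q.append(token)
    return list(out_q)
```

The element fires only when both a character and a state are available, so the cycle in the graph is harmless here: there is exactly one state token in flight at any time.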
>This also brings up a question of how the language deals with loops in >the dependency graph. The solutions I see either bring back the problems >with threading or could create unreliability.
>> * Message passing gets rid of the problems associated with shared >> memory and locks.
>Just as credit default swaps and other financial machinations got rid of >risk :-).
It's the other way around. Threading, shared memory, mutexes, and locks are the equivalent of default swaps. :-)
The argument really is straightforward. Data flow languages typically require that shared memory be immutable. If it's not immutable then it's not shared. This works. There are prices. Mostly they are performance prices.
>From what I can tell, similar problems arise, but they're in a >different form (data flow dependency graphs, particularly the fact that >they're typically not acyclic).
Perhaps you could give an example of what you are thinking of. It is not quite clear to me why you think this is a problem.
>> * Data flow programs are more extensible than traditional >> programs. Elements can be readily composed into composite >> elements from the bottom up rather than top down.
>Even if you consider good old procedural programming, I would say that >the exact same thing is easily doable in traditional programs. I can >take code that does asynchronous socket reads and turn it into a MIME >decoder by creating a module that calls the asynchronous socket read >module appropriately. It's also an example of your "bottom-up composition."
>> Some significant disadvantages:
>> * Using data flow programming in the large pretty much requires >> that it be used from the start. That is, converting traditional >> programs into data flow programs is difficult because the >> structuring is so different.
>I don't see it that way. Ultimately, most libraries are merely "pass X >in as input, get Y out as output"--it shouldn't be too hard to make that >an atomic data-flow program block.
The problem doesn't lie in the libraries; it lies in the superstructure and program organization.
> On Fri, 30 Oct 2009 12:19:46 +0100, Pascal J. Bourguignon wrote: > > Nick Keighley <nick_keighley_nos...@hotmail.com> writes: > >> It would be nice if there were > >> a graphical differences tool- that is one that would display the > >> picture and highlight the differences. Not undoable I'm sure.
> > (Unfortunately, it's only a pixel diff/merge operation, we would have > > to use more sophisticated algorithms to detect in a concise form block > > transformations; but even in this case, this gimp script is rather > > effective, visually).
> Yes, as someone with an image processing, pattern recognition and AI > background I can tell you, there are no such algorithms. You will need a > complex scene analysis in order to get to the level of a diagram. Even this > does not work, because image segmenting and line detection do not really > work. Such a trivial task, but alas, no algorithm can compete with a human in image > segmenting. But even if you got through that and obtained geometric regions, lines > etc., analysing their connections and shapes (i.e. the scene) does not > work either. Diagrams generated by GUI tools skip these steps, because the tool > places this information into the intermediate file.
I had in mind an easier problem than comparing two pixel collections. I was assuming the diagram was a collection of shapes (typical sort of thing graphical design tools mess around with). I was hoping such diagrams could be compared fairly easily, though in another post you dash this hope! :-(
> The point is that the task is immense and though we have solved it during > our evolution, my guess is that it leaves no free "computational" resources > to analyse the picture at a much more detailed level. It wasn't evolutionarily > needed either.
> On Fri, 30 Oct 2009 03:32:30 -0700 (PDT), Nick Keighley wrote: > > On 29 Oct, 18:15, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> > >> On Thu, 29 Oct 2009 05:10:37 -0700 (PDT), Nick Keighley wrote: > >>> On 27 Oct, 08:55, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> > >>>> * Unmaintainable code. Look at large data flow programs (e.g. in DiaDem, > >>>> Simulink, LabView). It is impossible to use reasonable software processes > >>>> on them. How to compare two diagrams?
> >>> convert it to a textual representation then run diff on it. I'm not > >>> saying it's trivial but I don't think it's intractable either.
> >> Are pixel positions and sizes of the blocks relevant to the comparison? > >> (:-))
> > I presume the smiley means you can answer that yourself...
> No, equivalence of two directed graphs is not a simple task.
Oh, shame. I was perhaps hoping it was doable if the two diagrams were "similar".
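The easy case can at least be sketched. The example below assumes something the thread does not guarantee: that every block carries a stable, unique name, so that two diagrams can be lined up by name instead of by structure. The dictionary layout (`blocks`, `wires`, `type`, `x`, `y`) is invented for the illustration and is not the file format of any real tool. Layout attributes are dropped, the rest is written out as sorted text lines, and the standard `difflib` module does the comparison.

```python
import difflib

def canonical_lines(diagram):
    """Render a diagram as sorted text lines, ignoring layout (x, y)."""
    lines = [f"block {name} type={attrs['type']}"
             for name, attrs in diagram["blocks"].items()]
    lines += [f"wire {src} -> {dst}" for src, dst in diagram["wires"]]
    return sorted(lines)

def diagram_diff(old, new):
    """Return only the added ('+ ') and removed ('- ') lines."""
    delta = difflib.ndiff(canonical_lines(old), canonical_lines(new))
    return [line for line in delta if line.startswith(("+ ", "- "))]

old = {
    "blocks": {"src": {"type": "source", "x": 0, "y": 0},
               "gain": {"type": "gain", "x": 10, "y": 0}},
    "wires": [("src", "gain")],
}
new = {
    "blocks": {"src": {"type": "source", "x": 5, "y": 40},   # moved only
               "gain": {"type": "gain", "x": 60, "y": 40},   # moved only
               "sink": {"type": "sink", "x": 90, "y": 40}},  # added
    "wires": [("src", "gain"), ("gain", "sink")],
}
changes = diagram_diff(old, new)
```

Moving blocks around produces no differences, while the added block and wire show up as two `+` lines. Without stable names the lines cannot be matched up this way, and the comparison turns into the graph matching problem mentioned above.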
> >> The very argument that a text form is somewhat better raises a suspicion > >> why not to use it from the start?
> > where did I say better? They have different strengths and weaknesses. > > Programmers I know spend a fair amount of time drawing strange > > diagrams on whiteboards. Personally, I think they should then throw > > the diagrams away and code it up but a lot of people like diagrams. > > You can have both.
> Yes, physicists draw lots of diagrams too. Nevertheless the language of > physics is differential equations. The role of diagrams is always > supplementary, they cannot serve as a language.
Some of the UML people seem to think they can. (Perhaps UML + something else, as I'm not sure UML itself has any semantics.)
I know I'm posting from ignorance here, but I'd be interested in learning something
> > yes, but I fail to see why the graphical form presents any different > > problems from the textual form.
> It is easier to analyse textual form for both humans and computers.
> >> Ah, that is maybe because there were only alphanumeric display device back > >> then? (:-))
> > blinking lights and switches. If you can't read the machine code in > > binary you aren't a Real programmer. Octal is for quiche eaters.
> Yep, that was fun. You could immediately see if the program ran in a cycle, > the pattern repeated itself. Blue screen? That's for suckers! Real men read > the program counter of the crash location looking at the front panel LEDs! > (:-))
> > I think that most of the problems inherent in any large-scale > > programming project result from the inherent 'fragility' of all > > programming languages.
> > If you compare a large computing project with a large engineering > > project there are clear similarities, but one very significant > > difference is that almost any undetected error in code has the potential > > to result, somewhere down the line, in catastrophic outcomes; whereas if > > a nail is not quite hammered in as far as specified or if a light > > fitting (or even a window) is put in the wrong place then the building > > usually remains safe, functional, and fit for purpose.
> > If someone has some ideas about a programming language/paradigm that is > > fault-resistant, not only in the sense that it reduces the number of > > actual bugs and/or errors produced, but also in the sense that many bugs > > have no significant effect on function and behaviour then large-scale > > projects may be a lot easier to manage.
> > Whether this will ever be a possibility remains, in my opinion, unknown > > to date.
> In engineering works there are "factors of safety" to take account > of variability of materials, unpredictability of actual loadings > and inaccuracies in construction etc. etc. In computing, something analogous > has been done. For control of critical (including > in particular potentially dangerous) processes, one employs > double or triple hardware to take care of the possibility of > malfunction of hardware. As far as I know, in order to better > detect programmer errors, one similarly employs different and > independent teams of programmers to do the same job and then uses > test cases to compare the results of the resulting programs. > However, if I don't err, this comparison is only done in the design > phase of the software and later only the work of one of the teams > is selected for practical application. A safer way, I think, would > have these presumably equivalent programs (resulting from different > teams, employing preferably different programming languages and > environments) in actual production runs always work in parallel > on multiple hardware (of possibly different types), so as to further > reduce the risk of errors, since testing of software in the design > phase might not be thorough enough to uncover all errors that may be > present. Of course, errors could not be "absolutely" eradicated, > in accordance with Murphy's Law.
Yes - it is true that paralleling the hardware and software with result comparison might help reduce software errors, but I think it may be a while before I can load up a [Windows/Linux/Chrome] operating system on my new [AMD/Intel/Via(?)] chipset...
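The "run all versions in production and compare results" idea can be sketched as a simple majority vote. The names below (`vote`, `team_a`, `team_b`, `team_c`) are hypothetical, the three "teams" are toy integer square root routines, and one of them is deliberately wrong; a real system would also have to deal with timing, non-hashable results, and versions that share the same misunderstanding of the specification.

```python
import math
from collections import Counter

def vote(implementations, x):
    """Run every version on x and return the strict-majority result."""
    results = []
    for impl in implementations:
        try:
            results.append(impl(x))
        except Exception:
            results.append(None)      # a crash counts as a dissenting vote
    value, count = Counter(results).most_common(1)[0]
    if value is None or count * 2 <= len(results):
        raise RuntimeError(f"no majority among {results!r}")
    return value

def team_a(x):
    return math.isqrt(x)              # library routine

def team_b(x):
    r = 0                             # independent, naive search
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r

def team_c(x):
    return x // 2                     # deliberately buggy version

root = vote([team_a, team_b, team_c], 49)
```

Here the buggy version is outvoted by the other two. The vote only helps when the errors are independent, which is the reason for the different teams, languages, and environments suggested above.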
mike wrote: > mok-kong.shen wrote: [snip] >> .......... A safer way, I think, would >> have these presumably equivalent programs (resulting from different >> teams, employing preferably different programming languages and >> environments) in actual production runs always work in parallel >> on multiple hardware (of possibly different types), so as to further >> reduce the risk of errors, since testing of software in the design >> phase might not be thorough enough to uncover all errors that may be >> present. Of course, errors could not be "absolutely" eradicated, >> in accordance with Murphy's Law.
> Yes - it is true that paralleling the hardware and software with result > comparison might help reduce software errors, but I think it may be a > while before I can load up a [Windows/Linux/Chrome] operating system on > my new [AMD/Intel/Via(?)] chipset...
It depends of course on how critical the consequence of an eventual error would be in one's application. In everyday work, the tolerance is relatively high. That's why almost all commonly used software repeatedly receives updates, which not only introduce new features but also often correct errors, and yet people use it. One takes that risk consciously, just like anyone taking a flight knows that there is a very small but certainly non-zero probability that his plane would have troubles en route and he might not arrive at his destination. He must judge whether it is wise for him to take the risk. That is unfortunately life.
On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote: >>> Now if you need to write a million source lines, then just don't >>> do it. Use metaprogramming to generate this million source lines >>> from a smaller source. And so on, you can add layers of >>> metaprogramming all you need to compact your sources and always have >>> something of manageable size.
>> So, you're saying that *every* programming problem can be solved in at >> most a few tens of thousands of lines of code?
> Can you not specify any programming problem in less than a few > thousand lines of specification?
Yes, but that is usually a top-down specification, and lots of little decisions still have to be made while implementing it.
So usually, such specifications are not complete.
> Well, you can always write more detailed specifications, but I can > assure you that salespeople will always be able to put the whole > specifications of your software on a 2-page booklet.
Yes, but strictly speaking that is a summary. Not a full specification.
IOW you reduce complexity by removing details and describing first-order operation only, not by introducing an abstraction that reduces data.
>> Certainly some problems can, but most can't. Metaprogramming is just >> a form of compression, and there is no compression system that can >> reduce every source below a given size.
I like that analogy.
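The compression point rests on a counting argument that is easy to check mechanically. The sketch below is illustrative only, and the function names are made up: for any alphabet there are strictly fewer strings shorter than n symbols than there are strings of exactly n symbols, so no lossless scheme can map every length-n source to a shorter one.

```python
def strings_of_length(n, alphabet_size=2):
    """Number of distinct strings of exactly n symbols."""
    return alphabet_size ** n

def strings_shorter_than(n, alphabet_size=2):
    """Number of distinct strings of 0 .. n-1 symbols."""
    return sum(alphabet_size ** k for k in range(n))

# Pigeonhole: there are never enough shorter strings to go round.
for n in range(1, 20):
    assert strings_shorter_than(n) < strings_of_length(n)
    assert strings_shorter_than(n, 75) < strings_of_length(n, 75)
```

Whatever a metaprogramming layer gains on some sources it must therefore lose on others, which is the sense in which it behaves like compression.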
>> Some problems really are irreducibly complex, and demand complex >> solutions.
> Yes indeed. However, assuming a big ontology (eg. take wikipedia, or > even the whole web), wouldn't it be possible to express the needs for > any software in less than ten thousand lines, and let the > sufficiently smart system develop it, filling in the blanks in the > specifications with all the knowledge it can extract from the web?
I don't think so. Trying to do that, you just move a lot of assumptions and decisions into the architecture of the interpreter of those thousand lines, which will make that interpreter harder to reuse.
On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Yes indeed. However, assuming a big ontology (eg. take wikipedia, > or even the whole web), wouldn't it be possible to express the needs > for any software in less than ten thousand lines, and let the > sufficiently smart system develop it, filling in the blanks in the > specifications with all the knowledge it can extract from the web?
You can do that: but now your program's correctness depends upon the correctness of every detail in Wikipedia, or even the whole web, as well as that of both the "sufficiently smart" system that extracts information from it, and the ten thousand lines of specification.
And yet, there will still be programs that cannot be specified in less than ten thousand lines (unless the lines are of unbounded length).
> Or take the problem actually in the other direction. Would you > trust any implementation of a system that has orders of magnitude > more than ten thousand lines of specifications?
Not with the fate of the human species, no. As an acceptable element of risk to my own life, yes. You probably have done so too: air traffic control systems do have much more than ten thousand lines of specification. So do the systems that actually allow the pilots to fly the jets. So do hospital patient record systems, including records of medications to be administered.
> How can you ensure these specifications are consistent? How can you > ensure that they're effectively implemented?
I cannot. I could not even read most of the specifications if I wanted to, nor the source code of virtually any system upon which my life may depend. Even if I could compare them, checking most of them for correctness would require domain-specific background knowledge that could take a decade to acquire.
> Wouldn't you be more able to understand and check the specifications > if they were shorter, that is indeed, given the ultimate limits to > compression, if what they specified was less complex or of a more > limited scope?
If that were possible. At some point reduction in length can only come with less comprehensibility, and beyond that there is a point where no reduction in length is possible at all.
> If you accept that big systems must be decomposed into small > programs,
I do not. Sometimes it is appropriate to decompose a big system into small programs, sometimes it does not help much. Sometimes it makes the problem worse by greatly increasing the complexity of interactions between programs.
> Then the degree of automatization in the process of translating the > specifications into executable code is only a matter of advancement > of the techniques, while the size of the executable code is only > (roughly) a function of the number of metaprogramming levels used.
Resource requirements for a given task can scale exponentially with the number of metaprogramming levels used, and frequently do. Also, the concept of "leaky abstraction" usually applies, becoming worse with every level added.
Note that most large real-world systems have enormous numbers of details that are both highly specific and necessary to correct function. They will not be present in the pre-existing programming environment because they are specific to the particular problem. They cannot be ignored because they are necessary. Hence they must all be represented in both the specification and resulting system source code, which irreducibly blows both their lengths well past ten thousand lines.
On Wednesday 04 November 2009 21:15, Tim Little wrote:
> On 2009-10-28, Pascal J. Bourguignon <p...@informatimago.com> wrote: >> ... However, assuming a big ontology (eg. take wikipedia, >> or even the whole web), wouldn't it be possible to express the needs >> for any software in less than ten thousand lines, and let the >> sufficiently smart system develop it, filling in the blanks in the >> specifications with all the knowledge it can extract from the web?
> You can do that: but now your program's correctness depends upon the > correctness of every detail in Wikipedia, or even the whole web, as > well as that of both the "sufficiently smart" system that extracts > information from it, and the ten thousand lines of specification.
> And yet, there will still be programs that cannot be specified in less > than ten thousand lines (unless the lines are of unbounded length).
Let me take another tack. Suppose I want to create a specification language that is so powerful that ANY program can be specified in less than 10 000 lines.
But wait, there are an infinite number of computer programs! If I limit my specification language to about 75 characters (upper and lower case letters plus numbers and punctuation) and 80 characters per line, I can only write about 75^(10 000 * 80) = 10^1500049 specifications. That's a lot of programs, but not infinite.
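The exponent can be checked with logarithms instead of computing the huge number itself. This is just a sanity check of the arithmetic; the constant names are made up, and the line length of 80 characters is an assumption read off the "* 80" in the formula.

```python
import math

ALPHABET = 75      # distinct characters in the specification language
LINES = 10_000     # maximum number of lines
COLUMNS = 80       # assumed characters per line (the "* 80" in the formula)

# log10(75 ** (LINES * COLUMNS)) = LINES * COLUMNS * log10(75)
exponent = LINES * COLUMNS * math.log10(ALPHABET)
power_of_ten = math.floor(exponent)
```

`power_of_ten` comes out as 1500049, matching the figure in the post. Counting specifications shorter than the maximum as well changes the total only by a small constant factor, not the order of magnitude.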
Ok, but I really don't care about specifying bizarre "programs" in my language. I'll settle for only specifying "reasonable" ones.
Hold on, how can I decide beforehand which programs are reasonable (that is, those I should be able to specify) and those that aren't??
I hope it is clear that specifications are no magic bullet.
Yes, I much prefer a short specification (in an "easily understood" language) to a long, possibly complex-for-decent-performance program. But experience has shown that many perfectly reasonable tasks have neither succinct specifications nor succinct programs.