Optimization Techniques In Engineering Pdf


Barton Ostby

unread,
Aug 4, 2024, 5:05:17 PM8/4/24
to slobbuetingti
My question: Are those practices myths or reality? In the context of an application with heavy performance requirements (like high-frequency trading software), could those practices supersede "good development" practices?

In my experience, it's exactly the opposite. Compilers tend to have trouble generating good code for huge functions. I've had cases where execution time really mattered, and I got a considerable, measurable speedup by extracting small loops into separate functions: the code generated for the same time-critical loop was better when the loop lived in a separate function. (What actually happened was that the simple loop kept all its variables in registers, while the same loop inside a large function spilled those variables to stack memory.)


PS. I just find it a bit disturbing to see all these suggestions that are unlikely to speed up your code but make it a lot less readable and maintainable. What about a simple trick that any C++ developer should know? Say you have a standard vector containing instances of a class T. Now compare these two loops:


The first loop constructs and destructs a copy of every single item in the vector; two potentially expensive operations. The second loop does no such thing. That single character "&" can make a loop 100 times faster under the right circumstances.


Not specific to C++, but one of the most common misconceptions I see is the idea that things will be faster if you pull all your data together before moving to the next step of your algorithm. This manifests itself in a number of ways. For example:


These approaches are disastrous from a performance perspective, and they can be hard to eliminate once in place. The design often leaks across abstraction boundaries, so moving to a more efficient solution tends to require a major rewrite. For example, if you are generating a large document for another application to consume, you can't simply break it up into smaller pieces without modifying the downstream application. And if your application is built to expect an array or another type with a known size, it's not always trivial to replace that with an iterator.


If I remember correctly, each of these techniques had its justification some 20 to 30 years ago on slow hardware, especially with not-so-sophisticated compilers or interpreters, in cases where even micro-optimization could bring a required performance gain.


Today, with modern compilers for languages like C++, Java, or C#, I don't think any of the examples listed in the question will bring a speed benefit in optimized code, not even in hot spots. In some older but still-used language implementations such as VBA, some of those techniques may still be valid, sometimes.


Even today, there are many small embedded devices that don't have much CPU power and are usually programmed in C. I'm not actually sure about the optimization quality of the C compilers for those platforms, but I can imagine cases where some of the optimization techniques you mentioned still matter.


The next step is looking for work that you do repeatedly for no good reason: sorting an array repeatedly, calculating the same dataset repeatedly, reading the same data from a database multiple times, and so on. Avoiding that can give you massive savings.


This completely neglects the distinction between common-case and rare-case branches of code. People won't get far neglecting this difference. There are also factors to consider, like the increased cost of more distant jumps and icache misses.


There is an unusual and not-so-commonly-cited case where inlining a somewhat large function can improve performance, but it has to do with constant propagation and dead code elimination more than with calling overhead or interference with register allocation. Take an example like this:


... will fail to eliminate the branching overhead of the if statements within the function. That's a bit unfortunate, and there are speed-ups to be gained here if we just brute-force inline the function. Ideally, though, we just need three versions of the function: one where cull is a known compile-time value of true, another where it is false, and another where cull can only be deduced at runtime (an l-value). Inlining every single call to the function is rather brutish and bloated in terms of code generation, but it may be the best practical option we have in C in response to a hotspot.


The only practical overhead to an object in a language like C++ is what the programmer introduces. Naturally, there will be some if you introduce virtual functions, in the form of a vptr and virtual dispatch, but there is no per-object overhead you don't add yourself, as there is in Java or C#.

That said, I have written many posts here about trapping ourselves in performance-critical cases by designing our objects too granularly: trying to represent a pixel of an image as an object, or a particle of a particle system as an object. That has nothing to do with compilers or optimizers; it's human design. If you have a system with a boatload of dependencies on a teeny Pixel object, there is no breathing room to change it to, say, loopy SIMD algorithms without extremely invasive changes to the codebase.

For very loopy, performance-critical code, it helps to avoid dependencies on tiny objects storing very little data, not because of some object overhead, but because of the human overhead of trapping yourself in a design that leaves little room for bulky, meaty optimizations. You don't leave much room to optimize an Image operation if the majority of your code depends on working with a Pixel object; with that many dependencies on teeny objects interfering with broad, meaningful performance improvements, you'll work yourself towards rewriting your entire engine.


Variables don't take any memory or resources. I keep repeating myself like a broken record, but variables don't require memory; operations do. If you compute x+y+z, the computer can only perform scalar additions (excluding SIMD) on two operands at a time, so it can only do x+y, and it needs to remember that sum to add z to it. That is what takes memory, and I've found this widely misconceived. Variables are just human-friendly handles to the memory that results from operations like these. Understanding this is key to really understanding how our compilers and optimizers work, so that we can better understand the results from our profilers.


That said, there are some cases where reusing an object can net a performance improvement. For example, in C++, if you use a std::vector to store small temporary results in each iteration of a loop with a million iterations, you can see substantial performance improvements by hoisting the vector out of the loop, clearing it, and reusing it across iterations. That's because defining a vector is much more than a variable declaration: it involves initializing the vector, and subsequent push_backs, resizes, and the like involve heap allocations and possibly a linear-time copy construction (or something akin to a memcpy for trivially copy-constructible types).


This overhead drops to practically zero if you use a tiny_vector or small_vector implementation with a small buffer optimization, in loops where you don't exceed the size of the small buffer. Unfortunately, the standard library doesn't offer an SBO for std::vector (although, oddly enough, it was prioritized for std::string in C++11 and later). But forget the idea that variables have overhead; they have none. It's the operations involved in constructing the vector and inserting elements into it that have overhead in this case.


A quick glance at the disassembly should show no difference here unless you're using complex objects, and by "complex" I mean beyond a simple random-access iterator (maybe something that allocates memory per iteration which the compiler failed to optimize away). 99.9% of the time, there should be no difference for most people. That said, I have never understood stubborn C++ programmers who refuse to use the prefix notation in favor of postfix in places where it makes no difference, when prefix is guaranteed to be as fast as or faster than postfix. Still, the stubborn ones are very rarely causing inefficiencies with their stubbornness (at least from a runtime standpoint; we could perhaps still save the compiler some extra work, and reduce build times a little, by favoring ++i across the board, especially where UDTs are involved).


If you make a decent profiler your best friend, you'll see first-hand a lot of cases where the general rules of thumb are mostly right and a lot where they're mostly misleading. I recommend that: make a profiler your best friend if you haven't already. It is definitely the case that many things passed off as "general wisdom" are misleading, while some are not and remain good advice. But the key to thinking critically and telling the difference is measuring with a good tool that breaks things down for you in detail.


Application of computational optimization techniques to constrained engineering design. Theory and application of gradient-based and gradient-free nonlinear algorithms for unconstrained and constrained problems. Robust design methods.


In the real world, there are many problems in which it is desirable to optimize one or more objective functions at the same time. These are known as single- and multi-objective optimization problems respectively, and continuous research is being conducted in this field; nature-inspired heuristic optimization methods (also called advanced optimization algorithms) are proving to be better than the classical deterministic methods and thus are widely used. These algorithms have been applied to many engineering optimization problems and have proved effective for solving some specific kinds of problems. In this paper, a review of the most popular optimization algorithms used in different problems related to civil engineering during the last two decades is presented. It is hoped that this work will be useful to researchers involved in optimization.
