Analysis Of Algorithm Pdf

0 views

Skip to first unread message

Griselda Hamiter

unread,

Aug 4, 2024, 4:17:33 PM8/4/24

to breedoverov

Theterm "analysis of algorithms" was coined by Donald Knuth.[1] Algorithm analysis is an important part of a broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm which solves a given computational problem. These estimates provide an insight into reasonable directions of search for efficient algorithms.

In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big O notation, Big-omega notation and Big-theta notation are used to this end.[2] For instance, binary search is said to run in a number of steps proportional to the logarithm of the size n of the sorted list being searched, or in O(log n), colloquially "in logarithmic time". Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.

Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually require certain assumptions concerning the particular implementation of the algorithm, called a model of computation. A model of computation may be defined in terms of an abstract computer, e.g. Turing machine, and/or by postulating that certain operations are executed in unit time.For example, if the sorted list to which we apply binary search has n elements, and we can guarantee that each lookup of an element in the list can be done in unit time, then at most log2(n) + 1 time units are needed to return an answer.

Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully to the actual run-time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here; for instance, some analyses count an addition of two numbers as one step. This assumption may not be warranted in certain contexts. For example, if the numbers involved in a computation may be arbitrarily large, the time required by a single addition can no longer be assumed to be constant.

A key point which is often overlooked is that published lower bounds for problems are often given for a model of computation that is more restricted than the set of operations that you could use in practice and therefore there are algorithms that are faster than what would naively be thought possible.[8]

Run-time analysis is a theoretical classification that estimates and anticipates the increase in running time (or run-time or execution time) of an algorithm as its input size (usually denoted as n) increases. Run-time efficiency is a topic of great interest in computer science: A program can take seconds, hours, or even years to finish executing, depending on which algorithm it implements. While software profiling techniques can be used to measure an algorithm's run-time in practice, they cannot provide timing data for all infinitely many possible inputs; the latter can only be achieved by the theoretical methods of run-time analysis.

Since algorithms are platform-independent (i.e. a given algorithm can be implemented in an arbitrary programming language on an arbitrary computer running an arbitrary operating system), there are additional significant drawbacks to using an empirical approach to gauge the comparative performance of a given set of algorithms.

Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm, and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the two computers running their respective programs might look something like the following:

Based on these metrics, it would be easy to jump to the conclusion that Computer A is running an algorithm that is far superior in efficiency to that of Computer B. However, if the size of the input-list is increased to a sufficient number, that conclusion is dramatically demonstrated to be in error:

Computer A, running the linear search program, exhibits a linear growth rate. The program's run-time is directly proportional to its input size. Doubling the input size doubles the run-time, quadrupling the input size quadruples the run-time, and so forth. On the other hand, Computer B, running the binary search program, exhibits a logarithmic growth rate. Quadrupling the input size only increases the run-time by a constant amount (in this example, 50,000 ns). Even though Computer A is ostensibly a faster machine, Computer B will inevitably surpass Computer A in run-time because it is running an algorithm with a much slower growth rate.

It is clearly seen that the first algorithm exhibits a linear order of growth indeed following the power rule. The empirical values for the second one are diminishing rapidly, suggesting it follows another rule of growth and in any case has much lower local orders of growth (and improving further still), empirically, than the first one.

The run-time complexity for the worst-case scenario of a given algorithm can sometimes be evaluated by examining the structure of the algorithm and making some simplifying assumptions. Consider the following pseudocode:

A given computer will take a discrete amount of time to execute each of the instructions involved with carrying out this algorithm. Say that the actions carried out in step 1 are considered to consume time at most T1, step 2 uses time at most T2, and so forth.

In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it should be assumed that step 3 will be run as well. Thus the total amount of time to run steps 1-3 and step 7 is:

A more elegant approach to analyzing this algorithm would be to declare that [T1..T7] are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's run-time breaks down as follows:[12]

The methodology of run-time analysis can also be utilized for predicting other growth rates, such as consumption of memory space. As an example, consider the following pseudocode which manages and reallocates memory usage by a program based on the size of a file which that program manages:

In this instance, as the file size n increases, memory will be consumed at an exponential growth rate, which is order O(2n). This is an extremely rapid and most likely unmanageable growth rate for consumption of memory resources.

Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient algorithm can significantly impact system performance. In time-sensitive applications, an algorithm taking too long to run can render its results outdated or useless. An inefficient algorithm can also end up requiring an uneconomical amount of computing power or storage in order to run, again rendering it practically useless.

Analysis of algorithms typically focuses on the asymptotic performance, particularly at the elementary level, but in practical applications constant factors are important, and real-world data is in practice always limited in size. The limit is typically the size of addressable memory, so on 32-bit machines 232 = 4 GiB (greater if segmented memory is used) and on 64-bit machines 264 = 16 EiB. Thus given a limited size, an order of growth (time or space) can be replaced by a constant factor, and in this sense all practical algorithms are O(1) for a large enough constant, or for small enough data.

The Multi-Mission Algorithm and Analysis Platform (MAAP) is a joint effort between NASA and ESA (European Space Agency) designed to support collaborative research. MAAP brings together relevant data, algorithms, and computing capabilities in a common cloud environment to address the challenges of sharing and processing data from field, airborne, and satellite measurements related to NASA, ESA, and other space agency missions.

Data from NASA and ESA biomass missions such as AfriSAR and the Global Ecosystem Dynamics Investigation (GEDI) mission are serving as the initial test set for MAAP. Future missions such as ESA's BIOMASS and the joint NASA/Indian Space Research Organization (ISRO) NISAR will gather unprecedented data about Earth, but will also generate exponentially higher data volumes than any currently operating Earth observation missions. The high volume and heterogeneous nature of the data from these missions (which are collected from satellites, aircraft, and ground stations at various resolutions, coverages, and processing levels) present data storage, processing, and sharing challenges. These challenges present the scientific community with a unique and immediate need for improved data sharing and collaboration strategies.

In order to make biomass relevant data more discoverable and usable, NASA and ESA are working together to make data and metadata more interoperable across organizations. The MAAP platform is being developed as a virtual open and collaborative environment, leveraging cloud technologies to facilitate open data use across aggregated datasets.

Interoperability requires a new approach to data stewardship. The MAAP data team has explored new methods of metadata curation to meet the needs of the Earth observation research community. The data publication workflow enables traditional data publication and data aggregation activities such as organizing and storing data and developing metadata, then makes these data discoverable to users in a centralized, cloud-based location.

Metadata are collected and published to the Common Metadata Repository (CMR) via the Metadata Management Tool (MMT). The MAAP data team also has developed tools to support data stewardship and has created a process that is flexible, collaborative, and adaptive to support more effective data discovery and use.