A useful way to look at computer program is: there is data arranged in
certain ways and functions try to rearrange/modify it. When we are satisfied
with rearrangements and modifications, we stop. This is not something one writes in exam
and will be considered correct but it is a useful picture.
This
view of program laid emphasis on data vis-a-vis functions or algorithms. Often, functions are given primary
importance, and data is considered secondary. This is why students know more about
functions than they know about containers/iterators after doing their
first course in programming. Perhaps the real reason is that programming
had been a CS forte so far. CS has its own data-structure course
running in parallel along with "introduction to programming". Others
borrows more from 'introduction to programming' and very little form
'data-structures'. This can lead to an impression that data-structures
are not as important as functions. Perhaps this is not important when so
much can be done just with arrays (Matlab/Scilab/C).
Container is an abstract concept: it
holds data/data-structures. Merely holding data is not enough, we need
to specify how data should be arranged so that we can access data in
some definite way. Usually we don't know the exact location of a
particular element and we have to "travel the container" to find it. The
traveling part is accomplished using what is called
iterators. Think C++ where each
container (vector, list, set, map) comes with its own iterators.
These containers are traversed sequentially: second comes after first,
third after second etc. They allow us to store any type of data but we
can only arrange it "along a line" or "in a plane" . They are not
general enough to arrange data in any way we like.
Graphs can be
used to arrange data in any order we like (not just in line or in a
plane, but more). We usually attach data to nodes or edges. Edges
defines the arrangement pattern and can also define the direction in
which traveling is possible on them. This flexibility of arranging data
makes them very attractive. Their presence is rather ubiquitous in all
branches of natural sciences. All self-respecting programming languages
which claim to "compute" comes with a heavy graph support in its
libraries [1]. A good and lucid introduction to graph is "Graph theory:
with applications to engineering and computer science" by Narsing Deo.
In
C++, two prominent libraries are boost-graph and LEDA graph. LEDA came
out of academia and it was the standard before the advent of boost. LEDA
is not open-source anymore but a
free-version is available.
Compared to boost, it is easier to use and compiles faster. Its
documentation is great. Boost, on the other hand, in under constant
development and fairly advanced. It has more functionality than LEDA.
Unfortunately, boost is rather heavily dependent on advanced C++
(Templates, Meta-programming, etc), and it makes it rather frightening
to even a decent programmer of C++. If you plan to use graph in any of
your academic project, prefer LEDA. With such a heavy dose of templates,
the error message generated by compiler does not make much sense to
most of us. If you are comfortable with templates and
generic-programming in C++ then boost should be your first choice. Boost
documentation is relatively poor than LEDA.
Python has
networkx which is great for all practical purpose for graphs of modest size. It is pure-python which means it is very slow. Python
graph-tool is
another option. It is bindings to boost-graph and therefore it is as
fast as boost graph. And you get Python ease and expressiveness for
free. It is available in repositories of all major Linux and perhaps for
Windows too. If you are not into programming languages but want to
learn at least one for building stuff, Python can be a great choice.
To store graph, you can use
graphviz
format which is now extensively used. To visualize graph, you can use
graphviz again. Programs 'dot', 'neato' etc. can be used to turn a graph
file to a fine graph in various formats (png, eps, pdf, ps, jpg etc).
See
this collection
of large graphs stored and plotted using graphviz. Another option is to
store graph in graphml format which is XML. This might even help
drawing figures for report.
Knuth made a graph-library called 'Stanford Graph Base' in C.
[1] Graphs which are also trees have a well-defined recursive nature. With
some effort, directed acyclic graphs (DAG) can also be defined
recursively. But a general graph is too costly to define recursively. That
is why Graphs and functional programming languages, so
reputed to handle recursion in a very elegant manner, do not mix easily.
Dilawar