On 2015-10-04 05:54:21 +0000, IanD said:
> Hadoop is gaining everywhere
>
> I just think that VMS clusters forwarded VMS to a respectable position
> today, for it survive, it needs to go forward again
>
> I've seen some clusters with 10 nodes in them, no way to pool all those
> resources together is just a crying shame IMO
Ten is a trivially-small cluster. Configurations with thousands of
servers are common. Those are not OpenVMS clusters, obviously.
Interconnection speeds are inversely proportional to distances
involved, and adding to this problem, the OpenVMS-supported
interconnects are also comparatively old and slow. This means that
big SMP boxes and big OpenVMS-style DLM,
distributed-write-shared-storage clusters have overhead, and that the
overhead increases faster than the configurations can be scaled up.
Find a way to scale server computing linearly and generically, and
there's at least a PhD waiting for you.
Interestingly, your applications can run faster reading and writing to
servers across the network than to local disk storage, too. This is
strictly a matter of hardware speeds, and before any consideration of
the glacial speed of OpenVMS file I/O in comparison to other platforms.
Hadoop <http://hadoop.apache.org> is one of many packages that allow
better use of resources, and the easier and incremental addition of
servers, etc.
Other Apache projects which can be associated with Hadoop configurations:
Kafka distributed messaging
http://kafka.apache.org/
Flume log aggregation
http://flume.apache.org
Mesos distributed computing
http://mesos.apache.org
Spark data mining
http://spark.apache.org
Java 8 support will help with getting these and other Java and
JVM-based bits running on OpenVMS.
There's also OpenMP distributed multiprocessing, which ties into the
compilers and the run-time:
http://openmp.org
LLVM has hooks here, too.
Outside of cases where OpenVMS brings specific additional benefits to
the project, there's very little reason to use OpenVMS for Hadoop or to
add OpenVMS to a Hadoop configuration; pricing, support, and server
management all weigh against incorporating OpenVMS boxes.
My own as well as the more general fondness for technologies and tools
here in the comp.os.vms newsgroup aside, it's the financials and the
staffing costs and the associated efforts to consolidate onto fewer
platforms and fewer tools and fewer applications that drive more than a
few of the decisions. That, and how much work you can get done with
the too-low staffing available. TCO is nice, but any discussion of
TCO that omits the application costs, the tools, the management
interfaces, and the staffing costs is incomplete. That there aren't
enough OpenVMS folks available has also been reported, but that's arguably a
statement that the necessary staff for the competitive platforms are
cheaper. Maintaining and lowering the fixed costs — of which salaries
are part — is basic business operations, after all. In short, the
finances have to work, or the tech is irrelevant.
> Concepts like bitcoin's blockchain could add extra security to VMS and
> be at the forefront. Banks are already getting together to work on
> bitcoin technology (I'll leave my dislike of that they actually want to
> do aside for now). The blockchain could take VMS security one step
> further than just encryption
Blockchains target distributed consensus-based transactional integrity
and the associated detection of tampering (or however the Blockchain
folks phrase that statement), and not "traditional" data security.
Blockchains can certainly be applicable for certain sorts of
transaction logs and for security logs in computing, which are cases
analogous to what the cryptocurrencies use the technique for.
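As a rough illustration of the tamper-evidence part (and only that
part; no distributed consensus, no proof-of-work, and the record
format here is entirely made up for the sketch), a hash-chained log
takes only a few lines of Python:

```python
import hashlib
import json

def append_entry(chain, record):
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"prev": prev_hash, "body": body, "hash": entry_hash})

def verify(chain):
    """Return True only if no entry was altered, removed, or re-ordered."""
    prev_hash = "0" * 64
    for entry in chain:
        expected = hashlib.sha256((prev_hash + entry["body"]).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"txn": 1, "amount": 100})
append_entry(log, {"txn": 2, "amount": -40})
print(verify(log))   # the intact chain verifies

# Tampering with any earlier record breaks every later link.
log[0]["body"] = json.dumps({"txn": 1, "amount": 999}, sort_keys=True)
print(verify(log))
```

Note what this does and does not buy you: the alteration is detected,
but nothing here recovers the original record.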
AFAICT, blockchains also get rather hairy across multiple files — each
file likely ends up with a blockchain, and now you have to figure out
how to roll back across multiple files and across multiple chains, and
quite possibly somebody will decide to hang a blockchain off the
application's own activity.
(Dealing with blockchain rollover is an issue that would have to be
handled more cleanly, as otherwise the data grows without bound; a
busy transaction log on a busy server can grow extremely quickly. But
I digress.)
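One way rollover could be handled, sketched under the assumption that
per-segment files are acceptable: close each segment at some size
bound and carry only its tip hash forward as the seed of the next
segment, so the linkage survives rotation and archived segments remain
re-verifiable:

```python
import hashlib

def chain_hash(prev_hash, payload):
    # Each step's hash covers the running hash plus the new payload.
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

def segment_tip(entries, seed):
    """Fold one bounded segment of log records into a single tip hash."""
    h = seed
    for payload in entries:
        h = chain_hash(h, payload)
    return h

GENESIS = "0" * 64

# Segment 1 fills up and is rotated out; only its tip hash carries forward.
tip1 = segment_tip(["txn 1", "txn 2"], GENESIS)
# Segment 2 is seeded with segment 1's tip, so linkage survives rotation.
tip2 = segment_tip(["txn 3"], tip1)

# Re-verifying an archived segment just recomputes its tip from its seed.
assert segment_tip(["txn 1", "txn 2"], GENESIS) == tip1
# Any alteration of the archived data changes that tip and every later one.
assert segment_tip(["txn 1", "txn 999"], GENESIS) != tip1
```

The archived segments can then be moved to cheaper or write-once
storage, with only the current segment and the chain of tips kept hot.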
(Blockchains and data encryption don't mix all that well, AFAICT, and
particularly with data that must be kept private — re-encrypting
encrypted data with upgraded ciphers would corrupt the blockchain — and
with data that must be expunged for business or legal reasons. But I
digress again.)
Other mechanisms that have been used with OpenVMS for this general goal
are write-once tape drive cartridges and write-once optical media.
Both are readily available, for those that need this.
The general benefits of blockchains for other sorts of files and for
other server environments are rather less clear (to me). That there'll
be added overhead is clear.
Implementing anew or porting libbitcoin or analogous would probably get
folks most of what they need, and (just) for the files that they want
and need blockchains. That, and getting a semi-modern certificate
store and related baggage implemented on OpenVMS, too.
Transactional and data integrity against accidental or intentional
data corruption is nice and useful, but recoverability and rollback
are key to many operations; to the recovery from these corruptions or
hacks. Blockchains can help identify the damage, but somebody then
has to implement the rollbacks, and that can and will be
implementation- and application-specific work.
If you're doing banking or financials, then blockchains can help. If
you're after transactional integrity and recovery, there are other
options and alternatives to blockchains. Some generic examples of
alternatives: Apple Time Machine and other tools all target integrity
and the ability to restore, with some definite limits. OpenVMS
expects the system manager to establish archiving locally and entirely
site-specifically, often with bespoke, artisanal DCL procedures
created from the finest field-grown organic bits. DVCS packages such as git
and mercurial can also be used for integrity and restoration. The
Rational ClearCase VOBs — a virtual disk volume that instantiates a
particular software baselevel or particular release — are a solution to
working with "dumb" tools.
OpenVMS has a good 1980s-style we-have-a-backup-window backup
mechanism. Many sites lack that window and are either moving to
databases with online backup capabilities, are quiescing and splitting
off HBVS volumes, or (less desirably and less reliably) are betting
that BACKUP /IGNORE=INTERLOCK or the entirely analogous
controller-level data replication mechanisms will be restorable.
OpenVMS itself is also still stupidly* solving the same problems within
its own components and tools using component-specific punched-card
record layouts rather than databases. 1980s-style RMS record storage
only gets you so far: the inevitable data changes and additions are a
pain to design and deploy incrementally, and, in the absence of RMS
journaling or analogous, transactional integrity (integrity against
partially-completed update I/O sequences, crashes, and data
corruptions, which is not really what blockchains help with or recover
from) is questionable.
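For contrast, a minimal sketch of what even an embedded database buys
over fixed record layouts (incremental schema change, plus automatic
rollback of half-done updates), using SQLite here purely for
illustration, with a hypothetical queue_entries table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue_entries (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO queue_entries (name) VALUES ('SYS$BATCH')")

# The inevitable data addition is one statement, not a record-layout
# redesign plus a convert pass over every existing file:
con.execute("ALTER TABLE queue_entries ADD COLUMN priority INTEGER DEFAULT 100")
con.commit()

# Updates are transactional: a crash mid-update leaves the old record
# intact rather than a partially-written one.
try:
    with con:  # commits on success, rolls back on exception
        con.execute("UPDATE queue_entries SET priority = 50 "
                    "WHERE name = 'SYS$BATCH'")
        raise RuntimeError("simulated crash mid-update")
except RuntimeError:
    pass

row = con.execute("SELECT priority FROM queue_entries").fetchone()
print(row[0])  # still 100; the half-done update was rolled back
```

None of which is exotic; it's exactly the sort of integrity that
component-specific record formats have to reimplement by hand, if they
get it at all.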
As was stated repeatedly at the 2015 Boot Camp, the VSI priority is the
x86-64 port, and — with very few exceptions — everything else waits for
that work, and that work specifically targets server-oriented
computing. VSI is targeting the installed base. Whether, in five or
ten years and after the port is available and stable and in use, VSI
can provide new features and tools that pick up enough new
applications and new installations to at least offset the retirements
of existing applications and environments, and preferably to increase
the installed base, remains an open question. Some few additions were
mentioned as semi-distant
— my words, not VSI — future possibilities, but that's after the x86-64
port is available.
——
*the current strategy is wise in the short term, and utterly foolhardy
in the medium and long term.
--
Pure Personal Opinion | HoffmanLabs LLC