+1 for this. I believe I initiated some of that back and forth
yesterday on twitter ;-)
I was asking if people use ohai (or facter as an alternative) as
standalone tools and not part of chef or puppet, in order to glean
information about their servers.
Grig
I am currently using facter with a custom agent (we may just consolidate with puppet) in addition to puppet to report facts about systems up in to a central inventory system.
-Isaac
On Nov 3, 2010, at 8:10 AM, Grig Gheorghiu wrote:
I am also unsure what you mean by silos in tools? Both facter and ohai
are great tools. Some have pointed out that ohai provides more detail, which
is a fair point. Some have also pointed out that ohai outputs JSON and
facter doesn't, but that is more of a presentation issue than anything else.
It would be trivial to write a wrapper to massage facter output into
something that looks very much like ohai output.
Call me confused.
Vladimir
I take 'silos' to mean that they're both doing essentially the same
thing, but there is no common way to interpret their output (which is
much more terse for facter than it is for ohai).
Ideally there would be some JSON structure with well-defined key names
(such as "machine", "os", etc) and a tool which would run the plugin
of your choice (facter, ohai, maybe others) and output the
correctly-formatted JSON. Then other tools could consume that output
and store it in the NoSQL engine of your choice ;-)
Grig
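That wrapper-plus-normalizer idea could be sketched in a few lines of Ruby. To be clear, everything here is illustrative: the fact-name mapping is a made-up placeholder for whatever keyspace gets agreed on, and the facter output is a canned sample rather than a live run:

```ruby
require 'json'

# Illustrative mapping from facter fact names to hypothetical common keys
# ("machine", "os", etc.) - not an agreed standard.
COMMON_KEYS = {
  'hardwaremodel'   => 'machine',
  'operatingsystem' => 'os',
  'fqdn'            => 'hostname'
}.freeze

# Parse facter's flat "key => value" lines into a hash.
def parse_facter(output)
  output.each_line.with_object({}) do |line, facts|
    key, value = line.strip.split(' => ', 2)
    facts[key] = value if value
  end
end

# Remap to the common keyspace, dropping facts without an agreed name.
def to_common_json(facts)
  common = facts.each_with_object({}) do |(name, value), out|
    out[COMMON_KEYS[name]] = value if COMMON_KEYS.key?(name)
  end
  JSON.generate(common)
end

sample = "hardwaremodel => x86_64\noperatingsystem => Ubuntu\nfqdn => web01.example.com\n"
to_common_json(parse_facter(sample))
# => {"machine":"x86_64","os":"Ubuntu","hostname":"web01.example.com"}
```

Anything consuming that JSON then no longer cares whether facter or ohai produced it.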
Scott McCarty wrote:
> +1, I would also love to see a standard form
>
We (Puppet Labs) would happily support a standard format. It's
something Luke and I have discussed a bit in the past.
Regards
James Turnbull
--
Author of:
* Pro Linux Systems Administration
(http://www.amazon.com/gp/product/1430219122/)
* Pulling Strings with Puppet
(http://www.amazon.com/gp/product/1590599780/)
* Pro Nagios 2.0
(http://www.amazon.com/gp/product/1590596099/)
* Hardening Linux
(http://www.amazon.com/gp/product/1590594444/)
John E. Vincent wrote:
> Grig beat me to it but look at it from the perspective of someone
> who's writing tools to interact with the data. Admittedly, it's not
> too hard now to support both, but as newer products come on the scene
> it risks getting unwieldy. I could see a world where Chef gets its
> information from Facter and Puppet speaks to Ohai.
I don't see that as an infeasible goal. I'd break the requirements into:
1. Data interchange
2. Data emission
2. is a solved problem. If you want Facter facts in YAML, for example,
you can have them. It would be easy to add a JSON output as well (if someone
wants to send us a patch supporting JSON, that'd be awesome, BTW -
http://projects.puppetlabs.com/issues/5193). Ditto for Ohai and others.
1. is trickier, but still doable I think. Rather than thinking about it
just as a data format, it might be better to consider it as an API and a
format, perhaps:
http://datasource/data/network/interface
Then you don't care how the data is stored internally as long as you can
query it.
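That path-style query could be modeled over any tool's nested facts; a minimal sketch, where the tree shape and the `query` helper are both hypothetical:

```ruby
# Hypothetical facts tree - the shape is illustrative, not any tool's
# real output.
facts = {
  'network' => {
    'interface' => {
      'eth0' => { 'ipaddress' => '10.0.0.5' }
    }
  }
}

# Resolve a slash-separated path against the nested tree, so a consumer
# never needs to care how the data is stored internally.
def query(facts, path)
  path.split('/').reduce(facts) do |node, key|
    node.is_a?(Hash) ? node[key] : nil
  end
end

query(facts, 'network/interface/eth0/ipaddress')  # => "10.0.0.5"
query(facts, 'network/bogus')                     # => nil
```

The same path could just as easily be served over HTTP, which is all the `http://datasource/...` form really implies.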
>
> Personally I have no concern which format tools use internally but I'd
> love to simply have a way to get information from both systems in a
> standard accepted format. It could be as simple as a command line arg
> to facter and ohai or exposing it over a REST interface.
+1
>
> What I don't want to do right now is implementation details. I'm
> really just trying to gauge interest in the idea and possibly draft a
> first round for describing one component - i.e. system - and what the
> community thinks are the basic bits of information needed to describe
> 'system'. From there, I could see other objects like
> 'network' (possibly too abstract) or better yet 'application' (i.e.
> has a version, a name, a path, whatever).
Count us in - myself and Nigel Kersten would be happy to be involved.
>
> Maybe it's pie in the sky. Most attempts at a standardized language
> fail miserably.
True, but it's guaranteed to fail if no one starts it. :)
Regards
James
I'd love to see this happen.
There are better areas we can all innovate in compared to data collection.
I should also note that we've started the early stages of organizing a
FOSDEM[1] devroom around config management, and so far the projects
that have expressed interest are Puppet, Chef, cfengine, and bcfg2.
More involvement is always welcome from other config management
projects, and I'd be overjoyed if we had something concrete to
coalesce around at FOSDEM with regard to common metadata.
Nigel
[1] - http://fosdem.org/2011/
Note that you could pretty easily replace Facter with Ohai or whatever else in most of Puppet today - it's accessed through a plugin interface that's pretty trivial to replace:
https://github.com/puppetlabs/puppet/blob/master/lib/puppet/indirector/facts/facter.rb
Just write an 'ohai.rb' or equivalent, and set 'facts_terminus = ohai' and it should work.
Today you'd still need Facter for some framework-level pieces (e.g., seeing if providers are suitable) but at least the data being sent to the server would now be from Ohai.
Of course, the whole point of Facter was that people wouldn't have to write another one of these tools, but since they were written anyway, we'd like to do what we can to make it easier to swap them out. That does feel like going the wrong direction, though.
--
I don't want any yes-men around me. I want everybody to tell me the
truth even if it costs them their jobs. -- Samuel Goldwyn
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199
Apologies for top posting in advance.
Let me clarify that that sort of swapping isn't my intention. My goal is simply to decide on a possible standard format for getting that information from the system.
I just used that as an example. For instance, in vogeler I'm working on a way to get baseline information from a system. Right now that consists of shelling out, running either ohai or facter, and parsing the output. I'd love to have an API (it doesn't need to be full-blown RPC) that provides an agreed-upon subset of configuration data in an agreed-upon format.
Sent from my Droid. Please excuse any spelling or grammar mistakes.
--
Of the thirty-six ways of avoiding disaster, running away is best.
-- Chinese Proverb
Not at the level of a single system, if you're really paying
attention. Puppet, Chef, Cfengine, Bcfg2, they all install packages on
RedHat the same way, with roughly the same amount of characters
involved in the process.
> So to answer the last question/statement, maybe we're getting caught
> up in semantics. Or maybe I'm projecting my personal wants and it's
> really NOT an issue for most people. That was part of what I wanted to
> ask originally.
>
> So maybe a common "dictionary" is what I'm wanting. Everyone agrees
> that a "host" is made up of a hostname, an ip address and an operating
> system. Or maybe we throw out operating system. I don't want to over-
> complicate it.
What I hear you asking for is a common data format for automatically
discovered data about systems. For example, what does "ipaddress"
mean in Ohai? What does it mean in Facter? In Ohai, it means the IP
address of the interface that has the default route configured on it.
(That works most of the time, except when it doesn't, and you can
override it if you have to.) It also means "the one I want to use most
often if I have more than one", but that's a human meaning.
I think getting agreement on "how you determine" will be much harder
than "how I find it". For example:
{ "ipaddress": "127.0.0.1" }
Makes sense to everyone. If we all agree on that, poof, puff the
magic standard.
We can knock a large number of these out really easily - many of the
top-level Ohai keys are the same as top-level Facter keys, and I think
you could basically call a v1 of something like this done simply by
identifying where they overlap.
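As a toy illustration of that overlap-driven v1 (the key lists here are made up for the example, not either tool's complete real keyspace):

```ruby
# Illustrative (not exhaustive) top-level keys from each tool.
ohai_keys   = %w[platform hostname ipaddress macaddress kernel uptime]
facter_keys = %w[operatingsystem hostname ipaddress macaddress kernel]

# A candidate v1 keyspace: simply the intersection of the two.
v1_keys = ohai_keys & facter_keys
# => ["hostname", "ipaddress", "macaddress", "kernel"]
```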
Now things get more complicated as you get deeper into the system.
For example, Ohai displays lots of data about your file systems:
"filesystem": {
  "/dev/disk0s2": {
    "block_size": 512,
    "kb_size": 211812352,
    "kb_used": 194387604,
    "kb_available": 17168748,
    "percent_used": "92%",
    "mount": "/",
    "fs_type": "hfs",
    "mount_options": [
      "local"
    ]
  },
  ...
}
On my laptop, Facter collects none of that data. Assuming it would be
useful for them to have it, would they use my hash? :) Did we mess it
up and need to change a key?
Similarly, for data that does overlap, take network interfaces:
Facter:
interfaces => lo0,gif0,stf0,en1,fw0,en0,cscotun0,en2,en3,vboxnet0,vmnet1,vmnet8
ipaddress_en2 => 10.37.129.2
Chef:
"network": {
  "interfaces": {
    "lo0": {
      ...
    }
  }
}
If you wanted the equivalent of the interfaces key in Facter in Ohai,
you would do:
ruby-1.9.2-p0 > o[:network][:interfaces].keys
=> ["lo0", "gif0", "stf0", "en1", "fw0", "en0", "cscotun0", "en2",
"en3", "vboxnet0", "vmnet1", "vmnet8"]
So outside of solving this problem on this thread (and I really don't
want to try), you can see the kind of detail involved in what you're asking for.
Getting to a place where we all agree on how to discover the data
would be the hardest thing - getting to a place where we agree on what
the data structure looks like under the hood at the basic level
("feels like a hash of hashes to me, bob") is easy, and in the middle
is negotiation about keyspaces.
I think there is value in the keyspace negotiation.
I think there may be one "easy" path, which would be for us to patch
Ohai to have a Facter compatible mode that spits out the flat keyspace
- but that's a compatibility thing, not an "easier for the end user"
thing.
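A rough sketch of what such a compatibility shim might do: flatten the nested hash into underscore-joined keys. The joining convention here is an assumption; Facter's real flat names (e.g. `ipaddress_en2`) don't always follow the nesting path order.

```ruby
# Recursively flatten a nested hash into a flat keyspace by joining
# the nesting path with underscores (an assumed convention).
def flatten_facts(node, prefix = nil)
  return { prefix => node } unless node.is_a?(Hash)
  node.each_with_object({}) do |(key, value), flat|
    full = prefix ? "#{prefix}_#{key}" : key.to_s
    flat.merge!(flatten_facts(value, full))
  end
end

ohai = { 'network' => { 'interfaces' => { 'en0' => { 'ipaddress' => '10.37.129.2' } } } }
flatten_facts(ohai)
# => {"network_interfaces_en0_ipaddress"=>"10.37.129.2"}
```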
> Honestly the OS is becoming a commodity at this point anyway. I should
> really be creating SHA1 fingerprints of my infrastructure components
> based on capabilities. Is this Ubuntu host capable of running Apache
> 2.2 with these given DSOs and does it have two nics? Yes? Then its
> fingerprint is the same as this RHEL5 host over here. When we're
> operating at the scale that virtualization allows, I don't have time
> to be concerned with some of the lower level stuff. It's the same as
> the difference between troubleshooting an OS problem or saying "Screw
> it. I don't have time for this. Kick the blade and get it back in
> service"
Neat idea.
> As I said, please don't take anything I've said as an indictment
> against any particular tool or vendor. I've used almost all the tools
> out there over the last 15 years and each one has a special place in
> my heart ;)
You're clearly totally evil and partisan, man.
Adam
--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: ad...@opscode.com
Enthusiastic +1 on this v1 ;-)
> So outside of solving this problem on this thread (and I really don't
> want to try), you can see the kind of detail involved in what you're asking for.
> Getting to a place where we all agree on how to discover the data
> would be the hardest thing - getting to a place where we agree on what
> the data structure looks like under the hood at the basic level
> ("feels like a hash of hashes to me, bob") is easy, and in the middle
> is negotiation about keyspaces.
>
> I think there is value in the keyspace negotiation.
>
So let's start the negotiation!
It would be very good if we got to the point where asking for a
certain key value was as easy as doing a query with curl on an EC2
instance against http://169.254.169.254/latest/meta-data/
Here's the keys that are spit out:
# curl -s http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
hostname
instance-action
instance-id
instance-type
kernel-id
local-hostname
local-ipv4
placement/
public-hostname
public-ipv4
public-keys/
ramdisk-id
reservation-id
If you query for a specific key you get its value:
# curl -s http://169.254.169.254/latest/meta-data/local-ipv4
10.209.177.142
Grig
Well, SNMP certainly has a standard for getting some of this data over the
network. There are a few reasons none of the tools use it for discovery:
A. They don't need the protocol/service overhead - they just wanted a library.
B. Publishing extensions properly would require an OID assignment from IANA,
and a tracking process, making adding custom data cumbersome.
C. For deeply nested or complex data, an SNMP walk is pretty inefficient.
It's more likely you would see a plugin for SNMP to publish facter/ohai data
than you would see facter/ohai replaced with SNMP discovery.
Best,
Adam
On Nov 4, 10:07 am, Scott McCarty <scott.mcca...@gmail.com> wrote:
> Am I missing something or was much of this solved by SNMP? I understand that
> things like package lists, and deeper OS stuff are not solved, but wouldn't
> it be prudent to try and expand upon the UC Davis MIBs for this?
>
> Scott M
>
Correlation of SNMP indexes with actual important data is a pain in
the ass. I have to do three lookups to correlate an interface with an
IP address at minimum. Maybe two. I'm kind of over SNMP at this point
except where I HAVE to use it. Interestingly enough, installed software
IS enumerated in a MIB already. I don't remember which one it is
offhand, though, and it doesn't address unmanaged software for obvious
reasons.
You can absolutely make SNMP work just fine. But it's fundamentally a
network discovery protocol, and we're not doing network discovery.
I'm not saying it's not nice to have, but really, the right
relationship between tools like Ohai and SNMP is as a data source for
exposure via SNMP. (Or, if you want, gathering data *from* SNMP and
presenting it)
> For real time stuff like knife or Mcollective, I am sympathetic.
Good! :) Because there is a reason we keep writing it.
> I just hear a lot of the same problems coming up as with SNMP and it feels like
> reinventing a lot of the wheel again because SNMP is too esoteric (it annoys
> me too, but it might be wrapped).
It's not because SNMP is too esoteric, it's because SNMP solves a
different problem.
> The other thing is, when dealing with network devices, none of this stuff is
> going to be on a Cisco router?
Then of course we will do discovery with the means at hand.
Gonéri, I'm putting together a combined configuration management
devroom at FOSDEM where it looks like we may have input from Puppet,
Chef, cfengine, OpenQRM and bcfg2.
I think devoting some of that time to interoperability between
inventory systems could be awesomely productive for us all.
This is a great idea, thank you for the invitation. We can also do a
little presentation of GLPI
and FusionInventory if you want.
--
Gonéri Le Bouder
> Okay. I've taken a swipe at what I think makes sense as a first draft
> of some basic system data:
>
> https://gist.github.com/712574
>
> I want to clarify some things going through my head when I crafted
> that:
>
> 1) Out with the old
> This means making some assumptions about modern hardware (by modern I
> mean the last 5 or so years). This means that memory is specified in
> MB and physical disk is specified in GB.
> 2) Try NOT to nest too deep
> This was REALLY hard to do when it came to disks and network but I
> think I got it to something useful. You might wonder about the format
> but I tried to make it easy to get to the relevant information. It's a
> bit more work to ensure that you preserve order but it keeps the
> structure rather shallow. I can easily grab the count of disks/
> interfaces and use that as the positional index to get specifics about
> a given interface.
...I would probably lean toward an array-of-hashes structure for disks and
interfaces, because it keeps facts about the same object grouped in a single
data structure.
That makes it easier to iterate over them without needing to pass a
potentially large number of data structures around, or pack them myself.
Also, your network facts are ... interesting. How do you represent this
machine in your proposed structure? (Also, keep in mind that this is the
*simplified* version of this, because we had to make an emergency rollout
without actually putting in the HA links to the second system, or the support
for multiple paths up to our data center. Those should land in the next few
days. :)
Oh, and please note eth2 - multiple IP addresses, but none of the usual label
aliasing for it. Plus, I omitted a bunch of essentially duplicate interfaces
that had already been seen.
root@fitz-fw01:~# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:15:17:f4:2e:69 brd ff:ff:ff:ff:ff:ff
3: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 100
link/ether 00:15:17:f4:2e:68 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.1/24 brd 192.168.2.255 scope global eth4
inet6 fe80::215:17ff:fef4:2e68/64 scope link
valid_lft forever preferred_lft forever
[...]
5: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 100
link/ether 00:15:17:f4:2e:6a brd ff:ff:ff:ff:ff:ff
inet 203.214.67.82/29 brd 203.214.67.87 scope global eth2
inet 203.214.67.83/29 brd 203.214.67.87 scope global secondary eth2
inet 203.214.67.84/29 brd 203.214.67.87 scope global secondary eth2
inet 203.214.67.85/29 brd 203.214.67.87 scope global secondary eth2
inet 203.214.67.86/29 brd 203.214.67.87 scope global secondary eth2
inet6 fe80::215:17ff:fef4:2e6a/64 scope link
valid_lft forever preferred_lft forever
[...]
8: vlan1@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:15:17:f4:2e:6b brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/24 brd 192.168.1.255 scope global vlan1
inet6 fe80::215:17ff:fef4:2e6b/64 scope link
valid_lft forever preferred_lft forever
9: vlan101@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:15:17:f4:2e:6b brd ff:ff:ff:ff:ff:ff
inet 192.168.254.4/24 brd 192.168.254.255 scope global vlan101
inet6 fe80::215:17ff:fef4:2e6b/64 scope link
valid_lft forever preferred_lft forever
10: vlan201@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:15:17:f4:2e:6b brd ff:ff:ff:ff:ff:ff
inet 192.168.201.1/24 brd 192.168.201.255 scope global vlan201
inet6 fe80::215:17ff:fef4:2e6b/64 scope link
valid_lft forever preferred_lft forever
[...]
13: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
link/[65534]
inet 192.168.20.1 peer 192.168.20.2/32 scope global tun0
> 3) Minimal facts/Avoid transient data
> I tried to avoid any transient data at this first swipe.
The network details contain a whole lot of transient data: on many of our
systems we have duplication of IP addresses across multiple machines, or HA
pool addresses that are present on one machine or another only if they are the
active master, or similar complexities.
[...]
> Thoughts?
The network side falls short in representing even medium-complexity machines
like mine above - which is hardly uncommon. Puppet/facter currently fall
pretty far short in representing that, because they made the same sort of odd
assumptions about how networks look.
Otherwise, it is good to see someone pushing this sort of standardization.
Regards,
Daniel
--
✣ Daniel Pittman ✉ dan...@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
I thought about when I use information like that, and it is pretty much always
"walk over the set of disks, filter on some field, act on some others", so to
me that makes more sense. I can see your argument, though, too. :)
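That walk-filter-act pattern is a one-liner with an array of hashes; a small illustrative example (the field names are assumed, not from the gist):

```ruby
# Disks as an array of hashes; field names assumed for illustration.
disks = [
  { 'name' => '/dev/sda1', 'fs_type' => 'ext3' },
  { 'name' => '/dev/sdb1', 'fs_type' => 'xfs'  },
  { 'name' => '/dev/sdc1', 'fs_type' => 'ext3' }
]

# Walk over the set of disks, filter on one field, act on another:
ext3_names = disks.select { |d| d['fs_type'] == 'ext3' }.map { |d| d['name'] }
# => ["/dev/sda1", "/dev/sdc1"]
```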
>> That makes it easier to iterate over them without needing to pass a
>> potentially large number of data structures around, or pack them myself.
>>
>> Also, your network facts are ... interesting. How do you represent this
>> machine in your proposed structure? (Also, keep in mind that this is the
>> *simplified* version of this, because we had to make an emergency rollout
>> without actually putting in the HA links to the second system, or the support
>> for multiple paths up to our data center. Those should land in the next few
>> days. :)
> < snip>
>> The network side falls short in representing even medium-complexity machines
>> like mine above - which is hardly uncommon. Puppet/facter currently fall
>> pretty far short in representing that, because they made the same sort of odd
>> assumptions about how networks look.
>
> I snipped out all the text above but I'm not ignoring it. I realized I left
> out a few things (netmasks/routing anyone? heh) and was pondering the best
> way to represent aliases, or whether to include them at all.
*nod*
> Let me explain that.
>
> I would personally consider HA addressing and MOST secondary addressing to
> be "application level" configuration. The data I'm trying to represent is
> basic information about a system.
Well, for the systems in question it literally does have, and respond to, a
range of services on all those different addresses. (Also, most large web
servers are going to have multiple IP addresses - because HTTPS support still
needs them, darn it all.)
So, those are not really "application level" in any meaningful way that
whatever random address gets picked as the first one is - and are often *less*
"application level", since they are what the host actually communicates using.
It is probably important to think about how this information is going to be
used, also: in pretty much every case I have needed to know about an IP
address (as opposed to the hostname, or some other identifier) it is because
we need to do something meaningful with it.
In which case having partial information is going to eventually fail. It
could be binding services to an interface, or building firewall rules, or
ensuring that the on-disk network configuration matches the running
configuration, or any number of things - but the missing information is
eventually going to be the information that I actually need this time.
I can kind of see an effort to distinguish these addresses as somehow less
important than the underlying management address - but on at least some
servers we have a separate management network, and the only parts we care
about configuring (or reporting) are the front-end parts.
[...]
> If I'm wrong on the aliasing/HA addresses please let me know but that's my
> personal opinion ;)
I think that it would be a fundamental mistake to bake in the utterly wrong
idea that there is any distinction between those addresses, frankly. Which,
obviously, is my opinion - but is informed by the frequency at which people do
get this wrong.
(The same fault comes up with "gateway" facts and other network related things
every time, too, because they look like they are simple, one-value items, but
they turn out to be full of lurking complexity in modern large-scale
networks.)
Hello,
I'm Walid Nouh, I work with Goneri on the FusionInventory project, and also on GLPI (http://glpi-project.org), which is an asset management software.
Okay. I've taken a swipe at what I think makes sense as a first draft of some basic system data: https://gist.github.com/712574
I am +1 on Joe's modifications which use arrays of hashes. Those also
map very well to document-oriented DBs like CouchDB and MongoDB.
Grig
I've not been on this list too long, so I missed the start of this
thread. Clearly this discussion resonates with my interest in facter.
> Okay. I've updated the gist: https://gist.github.com/712574 with a new
> network block. I merged what I considered relevant L2 and L3 facts
> into one section. I also added secondary addresses in there. I left
> out the default route for now, but added the default gateway for
> each interface.
Previously I was quite heavily involved in systems with OpenFirmware
and I always liked the representation from IEEE 1275 and as exported
in /proc/device-tree.
I guess it depends on whether you want to represent buses as in
sysfs/OF or merely logical entities such as disks.
One feature I like about OF is the ability to have aliases and the
fact they are simply mapped as part of the tree so it is easy to
discover.
> I'm trying to keep what would be considered monitoring data (% free on
> a given disk or mem in use) out of the format for now, but it's
> obviously up for discussion.
Thinking about data about devices is an interesting thought. Disks
don't really have free space - that's a property of an FS which is
elsewhere, and can even span multiple disks. Same with memory usage -
that's really a property of the kernel/running system not of the
hardware map.
I think it'd be important to distinguish between these. I quite like
what I've seen in the json representation so far but I'd like to play
with it a bit more.
Paul
>> Okay. I've updated the gist: https://gist.github.com/712574 with a new
>> network block. I merged what I considered relevant L2 and L3 facts
>> into one section. I also added secondary addresses in there. I left
>> out the default route for now, but added the default gateway for
>> each interface.
>
> Previously I was quite heavily involved in systems with OpenFirmware
> and I always liked the representation from IEEE 1275 and as exported
> in /proc/device-tree.
>
> I guess it depends on whether you want to represent buses as in
> sysfs/OF or merely logical entities such as disks.
>
> One feature I like about OF is the ability to have aliases and the
> fact they are simply mapped as part of the tree so it is easy to
> discover.
My gut says that the JSON representation should in many cases be an abstraction above what might be included in device-tree. That said, parsing something like device-tree and sysfs to produce it makes sense.
>> I'm trying to keep what would be considered monitoring data (% free on
>> a given disk or mem in use) out of the format for now, but it's
>> obviously up for discussion.
>
> Thinking about data about devices is an interesting thought. Disks
> don't really have free space - that's a property of an FS which is
> elsewhere, and can even span multiple disks. Same with memory usage -
> that's really a property of the kernel/running system not of the
> hardware map.
>
> I think it'd be important to distinguish between these. I quite like
> what I've seen in the json representation so far but I'd like to play
> with it a bit more.
I agree, we should keep running state metrics out of the hardware map.
Does this really bother folks? From a data structure point of view,
these really are hashes - they have a unique identifier. While I'm
certainly sympathetic to MongoDB being kind of lame about key
identifiers, I'm not that sympathetic. :)
The difference here can be significant - think about how you look the
data up:
{
  "disks": {
    "/dev/sda1": {
      "size": "100"
    }
  }
}
If you wanted to know if /dev/sda1 exists:
data["disks"].key?("/dev/sda1")
Will do the job, in constant time. Whereas:
{
  "disks": [
    {
      "name": "/dev/sda1",
      "size": "100"
    }
  ]
}
data["disks"].find { |d| d["name"] == "/dev/sda1" }
Does it in linear time.
This will happen to you every time you want to do this kind of lookup,
which is pretty frequently.
Take this into another language without Ruby's block syntax, and it gets
even stranger.
exists($data->{"disks"}{"/dev/sda1"})
grep { $_->{"name"} eq "/dev/sda1" } @{$data->{"disks"}}
I care a lot more about that than I do Mongo's key choices.
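For concreteness, here are both shapes and both lookups side by side as plain Ruby (a minimal sketch of the point above, using the sample data from the thread):

```ruby
# Both shapes from the example above, as plain Ruby data.
by_key  = { 'disks' => { '/dev/sda1' => { 'size' => '100' } } }
as_list = { 'disks' => [{ 'name' => '/dev/sda1', 'size' => '100' }] }

# Constant-time existence check against the hash-of-hashes:
by_key['disks'].key?('/dev/sda1')                       # => true

# Linear scan against the array-of-hashes:
as_list['disks'].find { |d| d['name'] == '/dev/sda1' }  # => {"name"=>"/dev/sda1", "size"=>"100"}
```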
Best,
Adam
> Okay. I've updated the gist: https://gist.github.com/712574 with a new
> network block. I merged what I considered relevant L2 and L3 facts into one
> section. I also added secondary addresses in there. I left out the default
> route for now, but added the default gateway for each interface.
That looks pretty good to me, compared to the last lot. The structure makes a
lot more sense, I think.
You should probably update your example to show what an interface looks like
with two unequal priority default gateways on an interface, though. (Which
is a real example: we have a leased line, and a VPN fallback, for our
connection up to our other data center, so two gateways, same interface, same
network, different metrics. :)
> I also added a few top level identifiers. Timestamp right now would be the
> timestamp when the information was last gathered as opposed to
> generated. Basically a freshness check?
I would suggest you name it explicitly for what it contains, because otherwise
someone won't read the spec right, make an assumption, and get upset. (Not
that I would ever do that or anything. ;)
Maybe 'collected_time' or 'collected_at'? Anyway, something that makes it
unnecessary to guess what the timestamp refers to would be good.
> The role and provisioned fields I'm not sure about but I added them just the
> same. The thought is that if you were using this information from
> Chef/Puppet/Whatever to populate another database somewhere would you want
> it? I think so.
We have servers that have multiple "roles", just to annoy. (...or, perhaps,
we have some roles that are no broader than a single server, so were not
individually named. :)
[...]
>> Well, for the systems in question it literally does have, and respond to, a
>> range of services on all those different addresses. (Also, most large web
>> servers are going to have multiple IP addresses - because HTTPS support
>> still needs them, darn it all.)
>>
>> So, those are not really "application level" in any meaningful way that
>> whatever random address gets picked as the first one is - and are often *less*
>> "application level", since they are what the host actually communicates using.
>>
>> It is probably important to think about how this information is going to be
>> used, also: in pretty much every case I have needed to know about an IP
>> address (as opposed to the hostname, or some other identifier) it is because
>> we need to do something meaningful with it.
>>
>> In which case having partial information is going to eventually fail. It
>> could be binding services to an interface, or building firewall rules, or
>> ensuring that the on-disk network configuration matches the running
>> configuration, or any number of things - but the missing information is
>> eventually going to be the information that I actually need this time.
>>
>> I can kind of see an effort to distinguish these addresses as somehow less
>> important than the underlying management address - but on at least some
>> servers we have a separate management network, and the only parts we care
>> about configuring (or reporting) are the front-end parts.
>
> Here's where we might differ in philosophy. I treat the hardware that
> something runs on as transient for lack of a better word (and despite my
> previous usage).
So do I, frankly. I pretty much ignored the "hwaddress" parts of the data,
for example, because they are transient. That routing, and the associated
firewall stuff? That is actually part of the role of those machines, not
something added on.
I think our mismatch is at the level of what is "hardware" and what isn't,
rather than over the basic concepts. :)
> Yes, there are basic firewall rules that exist for hosts but I separate
> those outside of the firewall rules that my apache server or database might
> need. When I apply a theoretical role of ApacheServer to a box the
> following happens:
>
> - Install Apache
> - Apply basic apache config bound to, say, the secondary NICs IP address
We differ here, because "secondary" and "primary" are not particularly
meaningful in our environment. We might have "unclassified",
"medical-in-confidence", and "management" interfaces attached to a machine,
though.
[...]
> In my mind the secondary IP doesn't belong to the box, it doesn't belong to
> being an apache server. It belongs to the role of serving my SSL enabled
> site.
*nod* I absolutely agree. If I was going to express things the same way you
do, though, the primary IP of the Apache server would be the *service*
address, and the secondary IP would be the management one.
> This is something of a contrived example but I think it makes the point. The
> philosophical difference is that the role of the box could change at any
> time. The base platform itself is transient because hardware sucks. The
> same would apply to firewalls or proxy servers or whatever.
...but would you expect the management address to change when the hardware
did? If so, why?
[...]
> All I REALLY care about at a larger scale is:
> - Do I have a box attached to network X and network Y (say management
> network and external/cluster network). I don't care if the secondary
> NIC is bound or not yet but a management iface needs to be there.
*nod*
> - Does it have the appropriate amount of free disk space and memory to
> serve the top most role I want for it (ApacheServer::MySSLEnabledSite)
> - Does it have connectivity to any shared resources if appropriate
> (SAN/NAS/whatever)
*nod* Me either. Which is why I think a flat list of addresses, rather than
the primary/secondary distinction, is the right one.
(Incidentally, are you sure you want to continue the illusion that the IP
address is tied to the interface, rather than being a property of the
machine? I probably would, but I figure I may as well ask. :)
[...]
> I can totally appreciate that perspective. With regards to routing
> specifically, I left that out too by mistake. As I sit here and think about
> how to represent it, I honestly have no idea. This goes back to the first
> thing - data structure.
*nod* For what it is worth: the *only* way I can possibly imagine
representing it is a logical view of the routing table. You can't really
simplify that, and even that is going to hurt. (Hello, source-based routing
on Linux, I love you and your multiple routing tables. :)
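For what that logical view might look like, here is one hedged sketch (the field names are my invention, and source-based routing with multiple tables would need another level of nesting on top of this):

```ruby
# An ordered array of route entries, since route lookup order matters.
# "via" of nil means a directly connected network.
routes = [
  { "destination" => "10.0.0.0/24", "via" => nil,             "dev" => "eth1" },
  { "destination" => "default",     "via" => "192.168.1.254", "dev" => "eth0" }
]

default_route = routes.find { |r| r["destination"] == "default" }
puts default_route["via"]
```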
> I think you're right in that an array of hashes makes more sense for
> disk/network but I still want to keep it as skinny as possible. Remember
> that the first round of this is simply basic facts about hardware and OS
> that can be used to determine if a box is appropriate for a higher level
> role.
*nod* I think that ditching the primary/secondary distinction for the
addresses on an interface makes sense. (Include a "default source address" if
you really want to be able to reconstruct that.)[1]
Something like this gives the same information, but without imposing local
administrative distinctions on them, I think:
"network":{
"devices": [
{
"name": "eth0",
"address": ["192.168.1.1/24", "192.168.1.3/24", "10.0.0.0/24"],
"sourceip": "192.168.1.1",
"hwaddress": "01:01:01:01:01:01",
"speed": 1000,
"mtu": 9000,
}
]
}
Regards,
Daniel
Footnotes:
[1] Technically, Linux does have a primary/secondary distinction, but that
only dictates what addresses stick around or go away when an interface is
brought down, and is mostly irrelevant to this management layer.
Ew - that's a hash:
"network":{
"devices": {
"eth0": {
"address": ["192.168.1.1/24", "192.168.1.3/24", "10.0.0.0/24"],
"sourceip": "192.168.1.1",
"hwaddress": "01:01:01:01:01:01",
"speed": 1000,
"mtu": 9000
}
}
}
}
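Either shape is a few lines of code to consume; a quick sketch of working with the hash-keyed form (values taken from the example above, trimmed for brevity):

```ruby
require 'json'

doc = JSON.parse(<<~JSON)
  {"network": {"devices": {"eth0": {
    "address": ["192.168.1.1/24", "192.168.1.3/24", "10.0.0.0/24"],
    "mtu": 9000}}}}
JSON

eth0 = doc["network"]["devices"]["eth0"]       # direct lookup, constant time
doc["network"]["devices"].each do |name, dev|  # iteration is still one line
  puts "#{name}: mtu #{dev['mtu']}"
end
```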
--
Opscode, Inc.
Adam Jacob, CTO
Constant vs linear time is perfectly reasonable and something I didn't take into account. My thought is that in practice these lists will be short, rarely more than 100 or 1000 elements, so it's unlikely we will spend much wall clock time iterating to find an ethernet interface. Realistically, parsing JSON will probably be more of an efficiency killer than iterating through lists, but I don't think a binary format is realistic because it's not human readable. To that end my only concern is that we would be compromising aspects like readability, compatibility (like with MongoDB) and usability for what will likely be a small albeit non-zero efficiency gain.
> Take this into another language without Ruby's block syntax, and it gets
> even stranger.
>
> exists($data{"disks"}{"/dev/sda1"})
> grep { $_->{"name"} eq "/dev/sda1" } @{$data{"disks"}}
Fair enough, that ain't pretty.
I don't think you're really compromising on readability - with JSON in
particular, you'll get no promised key order, so things like "name" as
an attribute in the array-of-hashes will never be in the same place
visually. I'm the king of just pretty-printing JSON and calling it
usable, and take it from me: if you want it readable, you're going to do
it in a custom format. :)
As for compatibility, I would say that MongoDB's issues with dots in its
keyspace are for folks who want to use this data in MongoDB to handle. If
there is a compatibility issue, it's with MongoDB not accepting all
valid JSON. (Which they are perfectly clear about - it's "JSON-style",
not JSON)
Adam
> On Wed, Nov 24, 2010 at 03:33:46PM -0800, Joe Williams wrote:
>> Constant vs linear time is perfectly reasonable and something I didn't take into account. My thought is that in practice these lists will be short, rarely more than 100 or 1000 elements, so it's unlikely we will spend much wall clock time iterating to find an ethernet interface. Realistically, parsing JSON will probably be more of an efficiency killer than iterating through lists, but I don't think a binary format is realistic because it's not human readable. To that end my only concern is that we would be compromising aspects like readability, compatibility (like with MongoDB) and usability for what will likely be a small albeit non-zero efficiency gain.
>
> I don't think you're really compromising on readability - with JSON in
> particular, you'll get no promised key order, so things like "name" as
> an attribute in the array-of-hashes will never be in the same place
> visually. I'm the king of just pretty-printing JSON and calling it
> usable, and take it from me: if you want it readable, you're going to do
> it in a custom format. :)
Sure, I certainly see your point although I think I personally still prefer an array-of-hashes.
> As for compatibility, I would say that MongoDB's issues with dots in its
> keyspace are for folks who want to use this data in MongoDB to handle. If
> there is a compatibility issue, it's with MongoDB not accepting all
> valid JSON. (Which they are perfectly clear about - it's "JSON-style",
> not JSON)
Heh, "JSON-style", there's a MongoDB /dev/null joke in there somewhere.
Really though, I agree, valid JSON is valid JSON and if a system isn't compatible with the standard that's their fault. Regardless, we shouldn't knowingly shut them out if we can help it. Although it is most certainly impossible to make everyone happy.
Why?
> Really though, I agree, valid JSON is valid JSON and if a system isn't compatible with the standard that's their fault. Regardless, we shouldn't knowingly shut them out if we can help it. Although it is most certainly impossible to make everyone happy.
Right - I'm just advocating that if we're going to have an extensible
system, limiting the valid key name space to omit dots is bad mojo,
since dots often appear in valid places.
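A concrete illustration of why that matters: keys with dots arise naturally, e.g. IP addresses used as keys, and are perfectly valid JSON (the data here is made up):

```ruby
require 'json'

# Valid JSON; but a key like "192.168.1.1" would trip MongoDB's
# historical restriction on dots in field names.
facts = JSON.parse('{"addresses": {"192.168.1.1": {"dev": "eth0"}}}')
puts facts["addresses"]["192.168.1.1"]["dev"]
```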
> On Wed, Nov 24, 2010 at 04:20:17PM -0800, Joe Williams wrote:
>>> I don't think you're really compromising on readability - with JSON in
>>> particular, you'll get no promised key order, so things like "name" as
>>> an attribute in the array-of-hashes will never be in the same place
>>> visually. I'm the king of just pretty-printing JSON and calling it
>>> usable, and take it from me: if you want it readable, you're going to do
>>> it in a custom format. :)
>>
>> Sure, I certainly see your point although I think I personally still prefer an array-of-hashes.
>
> Why?
The aforementioned reasons, but of those probably most importantly, I like the idea of each attribute having an explicit name. Programmatically this allows one to know ahead of time how to get what they are looking for, i.e. the "name" of the device or the "size" or "mtu". The key describes what the attribute is, so if you want a device name you get data["devices"][1...n]["name"], not data["devices"].keys[1..n]. In both cases, if one doesn't know what device they want they still have to iterate through them until they find it, except in my example one can bank on "name" being the name of the device, not the name implicitly being the key.
Additionally, as Paul Nasrat alluded to earlier, in the end these are all PCI buses, etc.; eth0 is just the alias the operating system gave pci0000:02. I'm not sure it deserves to be treated any differently than the MTU or IP address, which also describe that PCI bus.
Even in your example it sucks - you want the name of "what"? :) The
zeroth device? You're basically saying you want to iterate all the
time, which I don't think you actually do.
If the issue is that you want the key to be flexible, feel free to dupe
the data into a "name" field.
There is a semantic value to these data structures - if you want to know
the "size" of something, that something has a name. If you want to get
the answer to the question "what is the size of /dev/sda1", in your
world we need to iterate - in mine you just ask. You still have to make
a choice (you only get one key, after all), but you'll make life easier
for people most of the time.
By sticking an array in front of the data you are explicitly creating a
new semantic, and rather than having people understand what you are
using as a key, you're forcing them to understand how you determine a
consistent order for the resulting hash. You can't opt-out of there
being a key in the middle, you can just make it something that has
discoverable semantic meaning or something that has opaque semi-random
meaning.
> Additionally, as Paul Nasrat alluded to earlier, in the end these
> are all PCI buses, etc.; eth0 is just the alias the operating
> system gave pci0000:02. I'm not sure it deserves to be treated any
> differently than the MTU or IP address, which also describe that PCI
> bus.
There are certainly other options for valid keys, the PCI-bus identifier
being one of them. I would argue it's a bad one, as it is very rarely of
any value to the end-user. The number of use-cases where you look
something up mentally by PCI bus identifier is mind-bogglingly low, in
comparison to how often you look something up by interface identifier.
> On Wed, Nov 24, 2010 at 05:49:05PM -0800, Joe Williams wrote:
>> The aforementioned reasons but of those probably most importantly, I
>> like that idea of each attribute having an explicit name.
>> Programmatically this allows one to know ahead of time how to get what
>> they are looking for, i.e. the "name" of the device or the "size" or
>> "mtu". The key describes what the attribute is, so if you want a
>> device name you get data["devices"][1...n]["name"] not
>> data["devices"].keys[1..n]. In both cases if one doesn't know what
>> device they want they still have to iterate through them until they
>> find it, except in my example one can bank on the "name" being the
>> name of the device not the name implicitly being the key.
>
> Even in your example it sucks - you want the name of "what"? :) The
> zeroth device? You're basically saying you want to iterate all the
> time, which I don't think you actually do.
>
> If the issue is that you want the key to be flexible, feel free to dupe
> the data into a "name" field.
That's a good compromise but duping data sucks. :P
> There is a semantic value to these data structures - if you want to know
> the "size" of something, that something has a name. If you want to get
> the answer to the question "what is the size of /dev/sda1", in your
> world we need to iterate - in mine you just ask. You still have to make
> a choice (you only get one key, after all), but you'll make life easier
> for people most of the time.
Right, if you know what you want ahead of time it works great. If you don't, you still have to iterate, and having explicit keys for each attribute gives you more to work with. In my mind the "name" of the device is an attribute of the device, not the device itself. I look at the array in this case as an unordered list of device descriptions, not devices.
> By sticking an array in front of the data you are explicitly creating a
> new semantic, and rather than having people understand what you are
> using as a key, you're forcing them to understand how you determine a
> consistent order for the resulting hash. You can't opt-out of there
> being a key in the middle, you can just make it something that has
> discoverable semantic meaning or something that has opaque semi-random
> meaning.
I'll argue that I'm not creating a new semantic because "devices" signifies that it is the key for something that is iterable, regardless if it's a hash, array, etc.
>> Additionally, as Paul Nasrat alluded to earlier, in the end these
>> are all PCI buses, etc.; eth0 is just the alias the operating
>> system gave pci0000:02. I'm not sure it deserves to be treated any
>> differently than the MTU or IP address, which also describe that PCI
>> bus.
>
> There are certainly other options for valid keys, the PCI-bus identifier
> being one of them. I would argue it's a bad one, as it is very rarely of
> any value to the end-user. The number of use-cases where you look
> something up mentally by PCI bus identifier is mind-bogglingly low, in
> comparison to how often you look something up by interface identifier.
Certainly, I was just using it as an example.
How so?
data[:devices].each do |name, device_data|
...
end
while(my($name, $device_data) = each %{$data{'devices'}}) {
...
}
Or:
data[:devices].each do |device|
...
end
foreach my $device (@{$data{'devices'}}) {
...
}
I agree that I like the second form better when I'm iterating (kind of
obviously) but it's not because any data is missing.
> In my mind the "name" of the device is an attribute of the device not
> the device itself. I look at the array in this case as an unordered
> list of device descriptions not devices.
So do I - I'm just saying the data has more than one use case.
Iteration is one, but direct access is another. In the hash form, you
get both: iterating is easy, and direct access is at least possible. In
the array form, you lose the second use case entirely.
> I'll argue that I'm not creating a new semantic because "devices"
> signifies that it is the key for something that is iterable,
> regardless if it's a hash, array, etc.
It's both a key for something iterable and a key for a direct lookup for
a device. There is a reason you called it "name", after all. :)
>> In my mind the "name" of the device is an attribute of the device not
>> the device itself. I look at the array in this case as an unordered
>> list of device descriptions not devices.
>
> So do I - I'm just saying the data has more than one use case.
> Iteration is one, but direct access is another. In the hash form, you
> get both: iterating is easy, and direct access is at least possible. In
> the array form, you lose the second use case entirely.
>
>> I'll argue that I'm not creating a new semantic because "devices"
>> signifies that it is the key for something that is iterable,
>> regardless if it's a hash, array, etc.
>
> It's both a key for something iterable and a key for a direct lookup for
> a device. There is a reason you called it "name", after all. :)
You have convinced me; locking out direct access would be suboptimal. To that end, direct access is probably worth more than having explicit keys for each attribute, "name" or otherwise.
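The compromise Adam floated earlier - key by device name for direct access, and duplicate the name inside the record so iteration stays uniform - might look like this sketch (device data invented for illustration):

```ruby
require 'json'

devices = JSON.parse(<<~JSON)
  {"eth0": {"name": "eth0", "mtu": 9000},
   "eth1": {"name": "eth1", "mtu": 1500}}
JSON

devices["eth0"]["mtu"]                        # direct access, constant time
names = devices.values.map { |d| d["name"] }  # iterate records array-style
puts names.inspect
```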
I'm not overly sympathetic either, especially with the attitude that
10gen has in general, but since this is an "interchange" format of
sorts, I'd like for it to be as flexible as possible.
> The difference here can be significant - think about how you look the
> data up:
>
> {
> "disks": {
> "/dev/sda1": {
> "size": "100"
> }
> }
> }
>
> If you wanted to know if /dev/sda1 exists:
>
> data["disks"].exists?("/dev/sda1")
>
> Will do the job, in constant time. Whereas:
>
> {
> "disks": [
> {
> "name": "/dev/sda1",
> "size": "100"
> }
> ]
> }
>
> data["disks"].find { |d| d["name"] == "/dev/sda1" }
>
> Does it in linear time.
>
Excluding other languages, I had to give this a test. As far as Ruby
goes, I've always been "semi-smart" about coding practices (using ''
instead of "" to avoid the interpolation pass, using << instead of +=
because it's faster). I was really curious about basic lookup speed
between the two so I gave it a go:
(gist and json files are here - https://gist.github.com/716336)
jvincent@jvx64:~/development/json-tests$ ruby bm.rb
Testing without cleanup
user system total real
{}.has_key? 3 0.000000 0.000000 0.000000 ( 0.000087)
[].find 3 0.000000 0.000000 0.000000 ( 0.000067)
{}.has_key? 10 0.000000 0.000000 0.000000 ( 0.000376)
[].find 10 0.000000 0.000000 0.000000 ( 0.000129)
Testing with cleanup
user system total real
{}.has_key? 3 0.000000 0.000000 0.000000 ( 0.000065)
[].find 3 0.000000 0.000000 0.000000 ( 0.000058)
{}.has_key? 10 0.000000 0.000000 0.000000 ( 0.000061)
[].find 10 0.000000 0.000000 0.000000 ( 0.000085)
That was with 1.9.2 (which ships with JSON support OOB).
I'm no benchmark wizard so I might have screwed something up. Short of
forcing GC to run, it APPEARS that array.find is faster. I haven't
delved into WHY that is. Maybe an Array is a faster native data
structure than a Hash? I wouldn't say there are any GROSS time
differences between them.
As for which way to go? Like I said, I'm no fan of 10gen and MongoDB
so I won't let that be the massive deciding factor but my personal
preference is to not use transient values as key names if possible.
John
Yep - which means to me not restricting the values in the keyspace
beyond what is already in JSON.
> Excluding other languages, I had to give this a test. As far as Ruby
> goes, I've always been "semi-smart" about coding practices (using ''
> instead of "" to avoid the interpolation pass, using << instead of +=
> because it's faster). I was really curious about basic lookup speed
> between the two so I gave it a go:
.. snip ..
> I'm no benchmark wizard so I might have screwed something up. Short of
> forcing GC to run, it APPEARS that array.find is faster. I haven't
> delved into WHY that is. Maybe an Array is a faster native data
> structure than a Hash? I wouldn't say there are any GROSS time
> differences between them.
There isn't at small scale. Check out:
http://en.wikipedia.org/wiki/Time_complexity
A hash lookup happens in constant time - the time it takes to hash the
key, essentially. An array traversal (like find) happens in linear time
- as you add more elements to the array, it takes longer to find. (Now,
if the item you are looking for is first in the array, it may always
be short, because the find method might quit after the first item is
found, for example.)
This was Joe's point from earlier in the thread - given the likely size
of the array input, the differential for walking the array or looking up
an item in a hash is probably minimal.
require 'benchmark'
hash_buddy = Hash.new
array_buddy = Array.new
0.upto(10000000) do |number|
hash_buddy[number] = true
array_buddy << number
end
Benchmark.bm do |x|
x.report("hash lookup 10m:") { hash_buddy.has_key?(10000000) }
x.report("array lookup 10m:") { array_buddy.find { |i| i == 10000000 } }
x.report("hash lookup 1:") { hash_buddy.has_key?(1) }
x.report("array lookup 1:") { array_buddy.find { |i| i == 1 } }
end
And you'll get this:
user system total real
hash lookup 10m: 0.000000 0.000000 0.000000 ( 0.000017)
array lookup 10m: 0.900000 0.010000 0.910000 ( 0.905898)
hash lookup 1: 0.000000 0.000000 0.000000 ( 0.000008)
array lookup 1: 0.000000 0.000000 0.000000 ( 0.000010)
Notice the array lookup takes almost a second for the 10m case, while
the hash lookup remains constant.
Best,
So in an effort to revise it, I'll take a look at the last state of my
gists and see where we left off. I did submit a basic patch to facter
that did nothing more than convert the fact output to yaml. My plan
was, when a final format + set of facts was decided, to actually
submit patches to both facter and ohai that would accept a flag to
dump the information in the format we all agreed upon - something like
'facter --format common' or 'ohai --format common'
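Until such a flag exists, a minimal wrapper could massage facter's plain "key => value" output into JSON. A hedged sketch (the sample facter output lines are invented for illustration):

```ruby
require 'json'

# Stand-in for `facter` output; in practice this would come from
# something like `facter_output = %x(facter)`.
facter_output = <<~OUT
  kernel => Linux
  operatingsystem => Ubuntu
OUT

facts = facter_output.lines.each_with_object({}) do |line, h|
  key, value = line.chomp.split(" => ", 2)
  h[key] = value
end
puts JSON.generate(facts)
```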
--
John E. Vincent
http://about.me/lusis
Ok, I missed this thread originally; if someone can send me a link to the discussion I'd be happy to see if I could integrate it into Edison.
Matt
It aims to centralise a Configuration Management DataBase,
Configuration Deployment (using puppet at the moment but contributions
for other systems are welcome!) and Change Management in one place.
The main idea behind it was that I'd be able to say "What has changed
on server X in the past 8 hours" and it would tell me - life-saving in
the middle of a major incident!
Cheers,
Matt (ProfFalken)
On 27 March 2011 17:22, Clay McClure <cl...@daemons.net> wrote:
> Noah? Edison?
>
> On Sun, Mar 27, 2011 at 3:36 AM, Matthew Macdonald-Wallace
> <mattm...@gmail.com> wrote:
>>
>> Ok, I missed this thread originally; if someone can send me a link to the
>> discussion I'd be happy to see if I could integrate it into Edison.
>>
>> Matt
>>
Noah is a service registry plus distributed coordination system similar to/inspired by Apache ZooKeeper:
https://github.com/lusis/Noah
Vogeler is a stalled project of mine that was going to be a framework for a command and control + CMDB. I started this thread when I was still working on it.
https://github.com/lusis/vogeler
I have every intention of picking it back up and I'm actually going to refactor it quite a bit based on my experiences developing Noah.
Edison is proffalken's baby. I don't have the URL on me.
Matt,
Best bet is to hit groups.google.com and find the mailing list page. This thread is right up top right now ;) I warn you that the previous discussion was 3 or so pages long.
I'll hit you off list on the work stuff.
As for Vogeler, It morphed over the time I was working on it but the
original use case was a system of record for a company I worked at.
Cobbler didn't really work as a model for our environment and puppet
was using Cobbler for lookups. We were going to hack on Cobbler to
support a CouchDB backend or wrap all our cobbler calls in lookups to
couchdb. So I started hacking on something in my spare time - Vogeler.
As it evolved, it really became more of a generic command-and-control
system. That was born out of the desire to not rewrite facter or
cobbler. When I first "announced" it, Patrick (Debois) asked me some
questions so I wrote this blog post:
http://lusislog.blogspot.com/2010/09/follow-up-to-vogeler-post.html
So while it's in a stalled state now, I actually want to refactor it
and remove the rabbitmq + couchdb stuff and replace them both with
Redis (since it actually works well for both storage and queuing -
pubsub replacing fanout and LISTs replacing direct exchanges).
You should also take a look at what Miquel Torres and Grig Gheorghiu
are doing with Overmind - https://github.com/tobami/overmind
We had talked about using some form of the Vogeler C&C capabilities in
it but I've not had time (surprise) to check back in with it.
On Mon, Mar 28, 2011 at 4:23 PM, Clay McClure <cl...@daemons.net> wrote:
> John,
> Noah looks pretty cool. I like the stack you're using with that: sinatra,
> ohm, redis.
> Tell me more about vogeler. It sounds interesting, and perhaps related to a
> project that's been floating around in my head for a while now.
> Whereabouts do you work?
> Cheers,
> Clay
>
> On Sun, Mar 27, 2011 at 12:54 PM, John Vincent <lusi...@gmail.com> wrote:
>>
>> Noah is a service registry plus distributed coordination system
>> similar/inspired by Apache zookeeper:
>>
>> https://github.com/lusis/Noah
>>
>> Vogeler is a stalled project of mine that was going to be a framework for
>> a command and control + CMDB. I started this thread when I was still working
>> on it.
>>
>> https://github.com/lusis/vogeler
>>
>> I have every intention of picking it back up and I'm actually going to
>> refactor it quite a bit based on my experiences developing Noah.
>>
>> Edison is proffalken's baby. I don't have the URL on me.
>>
>> On Mar 27, 2011 12:22 PM, "Clay McClure" <cl...@daemons.net> wrote: