Re: question about hostcheck


Jehiah Czebotar

Oct 7, 2008, 6:19:40 PM
to txlb...@googlegroups.com
this is now bugified, and i'm responding to the mailing list so i
start remembering to use that =)

https://bugs.launchpad.net/txloadbalancer/+bug/279883

hopefully we can check in a fix for this before getting release 1.2 together.

--
Jehiah
http://jehiah.cz/

On Tue, Oct 7, 2008 at 3:36 PM, Duncan McGreggor
<duncan.m...@gmail.com> wrote:
> On Tue, 2008-10-07 at 15:15 -0400, Jehiah Czebotar wrote:
>> so i have a scenario which i want to ask you about
>>
>> suppose i have a service in txlb named 'webkit' and that has 4 hosts
>> app01,app02,app03,app04.
>>
>> when i restart the listening application on app01-04 it stops
>> receiving connections on the specified port for about half a second.
>> If a request comes through txlb during that half second that host gets
>> marked as dead as it should.
>>
>> My understanding is that this dead host will not be re-added to the
>> webkit service till the hostCheck runs and is able to connect. However
>> if all 4 hosts happen to be marked as dead, i really want the next
>> incoming connection to try to connect to a dead host until one is
>> successful and added back into rotation, all without having to wait
>> for a timer to fire on the hostCheck event.
>>
>> Is the only (current) way to accomplish this to set the hostCheck to a
>> really small interval? (which has the downside of, when set at .5sec
>> or so, clobbering hosts while they are trying to start up)
>>
>> Thanks. If you can think of a better way to handle the above scenario,
>> I'll be happy to try and work up an implementation.
>
> What we need to do is add some logic in the host-checking code: in the
> event that all hosts are marked as dead, mark all as active. There should
> be something like that in there from the original code, so maybe all we
> have to do is make sure it actually gets called.
>
> Regardless, it's an easy fix :-)
>
> d
>
>
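
A minimal sketch of the fallback Duncan describes above (when every host is marked dead, put them all back in rotation so the next incoming connection retries them instead of waiting for the hostCheck timer). The HostPool class and its method names are hypothetical and are not txLB's actual tracker API; the round-robin selection is only an assumption for illustration.

```python
class HostPool:
    """Hypothetical stand-in for a host tracker; not txLB's real class."""

    def __init__(self, hosts):
        self.active = list(hosts)  # hosts currently in rotation
        self.dead = []             # hosts that failed a connection attempt

    def mark_dead(self, host):
        # Take a host out of rotation after a failed connection.
        if host in self.active:
            self.active.remove(host)
            self.dead.append(host)

    def pick(self):
        # If every host is dead, revive them all so the next
        # connection attempt can succeed without waiting for a timer.
        if not self.active and self.dead:
            self.active, self.dead = self.dead, []
        if not self.active:
            return None  # the pool was empty to begin with
        # Simple round robin: take the first host and move it to the back.
        host = self.active.pop(0)
        self.active.append(host)
        return host


pool = HostPool(["app01", "app02", "app03", "app04"])
for h in list(pool.active):
    pool.mark_dead(h)   # simulate all four hosts restarting at once
first = pool.pick()     # all hosts are revived, then one is chosen
```

A host that is still down would simply fail again and be marked dead on that attempt, so the periodic hostCheck can stay at a longer interval.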

Duncan McGreggor

Oct 8, 2008, 5:54:04 PM
to txlb...@googlegroups.com
On Tue, 2008-10-07 at 18:19 -0400, Jehiah Czebotar wrote:
> this is now bugified, and i'm responding to the mailing list so i
> start remembering to use that =)
>
> https://bugs.launchpad.net/txloadbalancer/+bug/279883
>
> hopefully we can check in a fix for this before getting release 1.2 together.
>
> --
> Jehiah

Thanks for all your help, Jehiah -- these bug reports are the best way
to get me to put some attention on the issues in txLB :-)

I've gone through the remaining issues (didn't realize you'd put more
up!) and have made plans for addressing them. All of these probably
won't get done this week, but let's keep our fingers crossed for next
week.

d

Charles Kaminski

Oct 9, 2008, 10:31:37 AM
to txlb...@googlegroups.com
Hi All,

My company needs automated load-balancing to bring reserve systems on line, take certain systems off line, perform maintenance, remove down machines, bring in fixed machines, and start the process over again.  This happens every few minutes through the http interface on the load balancer.  We use a separate master application to control these functions through our load balancer.  To that end, we need to know the current state of the load balancer.

The class and two functions written in pages.py below provide that information for us.  One thing I did notice, the count on open ports never goes down when we disable a group.  It looks like it could be a bug. . .

-Charles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
class RunningData(BasePage):
    """
    This class was thrown together real quick to give xml access to the current state.
    """
    def getPage(self, request):
        """
        Don't look at me; this craziness is a modified version of the original.
        """

        request.setHeader('Content-type', 'text/plain')

        verbose = False
        resultMessage = ''
        content = []
        msg = ''

        for service in self.parent.conf.getServices():
            buf = '  '
            content.append('<service name="%s">\n' % (service.name))
            
            # Get Hosts and ports service is listening on
            for index, l in enumerate(service.listen):
                proxy = self.parent.director.getProxy(service.name, index)
                content.append('  <listen ip="%s:%s"/>\n' % (proxy.host, proxy.port))

            # Get groups and enabled group
            eg1 = service.getEnabledGroup()
            groups = service.getGroups()

            # iterate through groups for each host
            for group in groups:
                # Get group info
                enabled = group is eg1
                content.append('  <group enabled="%s" name="%s" scheduler="roundr">\n' % (enabled, group.name))
                # Get parent objects to hosts
                tracker = self.parent.director.getTracker(service.name, group.name)
                stats = tracker.getStats()
                # Get hostnames dict, as well as bad hosts and good hosts
                hdict = tracker.getHostNames()
                bad = stats['bad']
                counts = stats['openconns']

                # Build lists of (ip, name) pairs
                badHosts = [('%s:%s' % b, hdict['%s:%s' % b]) for b in bad.keys()]
                goodHosts = [(h, hdict[h]) for h in counts.keys()]
                # Write host info
                for ip, name in goodHosts:
                    content.append('    <host ip="%s" name="%s" disabled="False"/>\n' % (ip, name))
                for ip, name in badHosts:
                    content.append('    <host ip="%s" name="%s" disabled="True"/>\n' % (ip, name))

                content.append('  </group>\n')
            content.append('</service>\n')

        return ''.join(content)
    
class PythonData(BasePage):
    """
    This class was thrown together real quick to give python access to the current state.
    """
    def getPage(self, request):
        """
        Don't look at me; this craziness is a modified version of the original.
        """

        request.setHeader('Content-type', 'text/plain')

        # Load balancer dictionary object
        lb = {}
        for service in self.parent.conf.getServices():

            # New service dictionary object
            lb[service.name] = {}

            # Get groups and enabled group
            eg1 = service.getEnabledGroup()
            groups = service.getGroups()

            # Iterate through groups for each host
            for group in groups:
                # Create the group entry first; assigning into a missing
                # key would raise a KeyError
                lb[service.name][group.name] = {}
                lb[service.name][group.name]['enabled'] = group is eg1
                # Get parent objects to hosts
                tracker = self.parent.director.getTracker(service.name, group.name)
                stats = tracker.getStats()
                # Get hostnames dict, as well as bad hosts and good hosts
                hdict = tracker.getHostNames()
                bad = stats['bad']
                counts = stats['openconns']

                # Build lists of (ip, name) pairs
                badHosts = [('%s:%s' % b, hdict['%s:%s' % b]) for b in bad.keys()]
                goodHosts = [(h, hdict[h]) for h in counts.keys()]
                lb[service.name][group.name]['badHosts'] = badHosts
                lb[service.name][group.name]['goodHosts'] = goodHosts

        return repr(lb)

Jehiah Czebotar

Oct 9, 2008, 10:46:35 AM
to txlb...@googlegroups.com
i like the idea of replacing RunningPage with something formatted as
json/python data structures instead of plain text (aka the PythonData
Charles just provided).

Charles: can you checkout the trunk branch? we recently added a
RunningConfigXml class
(https://bugs.launchpad.net/txloadbalancer/+bug/277899) which maps to
/running.xml and seems to do the same thing as your RunningData

--
Jehiah


Charles Kaminski

Oct 9, 2008, 10:53:00 AM
to txlb...@googlegroups.com
Hi Jehiah,

Thanks for the info on RunningConfigXml.  This was written many months ago and is in production.  I'll see what I can do. . .

-Charles

Jehiah Czebotar

Oct 9, 2008, 11:18:52 AM
to txlb...@googlegroups.com
On Thu, Oct 9, 2008 at 10:53 AM, Charles Kaminski
<ckam...@datascoutinc.com> wrote:
> Hi Jehiah,
> Thanks for the info on RunningConfigXml. This was written many months ago
> and is in production. I'll see what I can do. . .
> -Charles

Thanks for passing the patch along to the list.

I also wrote a bug to track getting the output in json/python data
structure format, so we don't forget about it.

https://bugs.launchpad.net/txloadbalancer/+bug/280790

--
Jehiah

Duncan McGreggor

Oct 9, 2008, 11:23:15 AM
to txlb...@googlegroups.com
On Thu, 2008-10-09 at 09:31 -0500, Charles Kaminski wrote:
> Hi All,
>
>
> My company needs automated load-balancing to bring reserve systems on
> line, take certain systems off line, perform maintenance, remove down
> machines, bring in fixed machines, and start the process over again.
> This happens every few minutes through the http interface on the load
> balancer. We use a separate master application to control these
> functions through our load balancer. To that end, we need to know the
> current state of the load balancer.

I guess your controller application has already been written... does it
do screen-scraping-type activities to automate filling in form data and
submitting it?

If you hadn't already written your application or if you do rewrite it
sometime in the future, would an XML-RPC interface be of interest for
you? Or are you specifically in need of a RESTful interface?

Thanks,

d

Duncan McGreggor

Oct 9, 2008, 11:28:15 AM
to txlb...@googlegroups.com
On Thu, 2008-10-09 at 09:31 -0500, Charles Kaminski wrote:
> Hi All,

[snip]

> The class and two functions written in pages.py below provide that
> information for us. One thing I did notice, the count on open ports
> never goes down when we disable a group. It looks like it could be a
> bug. . .
>

I have added a ticket for this:
https://bugs.launchpad.net/txloadbalancer/+bug/280799

Thanks!

d


Duncan McGreggor

Oct 9, 2008, 11:30:02 AM
to txlb...@googlegroups.com
On Thu, 2008-10-09 at 10:46 -0400, Jehiah Czebotar wrote:
> i like the idea of replacing RunningPage with something formatted as
> json/python data structures instead of plain text (aka the PythonData
> Charles just provided).
>
> Charles: can you checkout the trunk branch? we recently added a
> RunningConfigXml class
> (https://bugs.launchpad.net/txloadbalancer/+bug/277899) which maps to
> /running.xml and seems to do the same thing as your RunningData
>
> --
> Jehiah

Charles,

Note that I will be adding Jehiah's code sometime in the next day or
two. We should have a new release out with several bug fixes and
features sometime next week.

d

Jehiah Czebotar

Oct 9, 2008, 11:32:25 AM
to txlb...@googlegroups.com

one additional thought (perhaps slightly off topic) is that in the
scripts i have written to enable/disable services via the rest api, i
ended up scraping the web page to see if an ip was enabled/disabled.
perhaps we should add a rest api for /checkStatus?service=&group=&ip=
command that returns 'ENABLED', 'DISABLED', or 'NOT_FOUND' ? That
might solve at least one of the cases where currently you have to
scrape the admin page
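
A rough sketch of the lookup such a /checkStatus call might perform. It assumes the state is a dictionary shaped like the one the PythonData page earlier in this thread builds (service -> group -> 'goodHosts' / 'badHosts' lists of (ip, name) pairs), and it assumes a host listed under 'goodHosts' counts as ENABLED and one under 'badHosts' as DISABLED. The check_status function is hypothetical and is not part of txLB.

```python
def check_status(state, service, group, ip):
    """Return 'ENABLED', 'DISABLED', or 'NOT_FOUND' for one host.

    `state` is assumed to look like:
    {service: {group: {'goodHosts': [(ip, name)], 'badHosts': [(ip, name)]}}}
    """
    try:
        entry = state[service][group]
    except KeyError:
        # Unknown service or group
        return 'NOT_FOUND'
    if any(addr == ip for addr, _name in entry.get('goodHosts', [])):
        return 'ENABLED'
    if any(addr == ip for addr, _name in entry.get('badHosts', [])):
        return 'DISABLED'
    return 'NOT_FOUND'


# Made-up sample data for illustration only
state = {
    'webkit': {
        'main': {
            'enabled': True,
            'goodHosts': [('10.0.0.1:8080', 'app01')],
            'badHosts': [('10.0.0.2:8080', 'app02')],
        }
    }
}
status = check_status(state, 'webkit', 'main', '10.0.0.2:8080')
```

Returning a fixed set of plain strings keeps the response trivial to test from a shell script, which is the use case described above.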

--
Jehiah

Duncan McGreggor

Oct 9, 2008, 11:49:25 AM
to txlb...@googlegroups.com

I have just created a ticket for addressing this:
https://bugs.launchpad.net/txloadbalancer/+bug/280810

Fortunately, I have much of the code already written and should be able
to drop all the needed supporting infrastructure in place without many
changes.

d


Charles Kaminski

Oct 9, 2008, 12:11:52 PM
to txlb...@googlegroups.com
Hi Duncan,

No screen scrapes initially.  In this case, I make an http call to a page named runningData.py (defined below).  That gives me a text representation of the state of the loadbalancer that I can bring right into a python application (looking back, I could have pickled it instead of using repr. . .)

Once I have the current state, I manipulate the loadbalancer through additional http calls.
No need to fill out forms.  I just code the proper variable values right into the url.

I do need to check that the command went through properly . . .

-Charles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    def getChild(self, name, request):
        """
        A simple object publisher that maps part of a URL path to an object.
        """
        if not self.authenticateUser(request):
            return self.unauthorized()
        if name == 'all' or name == '':
            page = RunningPage(self)
            return page
        elif name == 'txlb.css':
            return StyleSheet()
        elif name == 'runningData.xml':
            return RunningData(self)
        elif name == 'runningData.py':
            return PythonData(self)
        elif name == 'config.obj':
            return RunningConfig(self)
        elif name == 'config.xml':
            return StoredConfig(self)
        elif name == 'delHost':
            return DeleteHost(self)
        elif name == 'addHost':
            return AddHost(self)
        elif name == 'enableGroup':
            return EnableGroup(self)
        return resource.Resource.getChild(self, name, request)
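
On the client side, the repr() text served by runningData.py can be turned back into a Python object without eval(). The sketch below uses the standard library's ast.literal_eval, which only accepts literals (dicts, lists, tuples, strings, numbers, booleans, None). The parse_state function name and the sample text are made up for illustration; in practice the text would be the body of the HTTP response.

```python
import ast


def parse_state(text):
    """Rebuild the load balancer state from its repr() text.

    ast.literal_eval refuses anything that is not a plain literal,
    so a malformed or hostile response cannot execute code.
    """
    state = ast.literal_eval(text)
    if not isinstance(state, dict):
        raise ValueError('expected a dict, got %s' % type(state).__name__)
    return state


# Made-up sample of what the page might return
sample = ("{'webkit': {'main': {'enabled': True, "
          "'goodHosts': [('10.0.0.1:8080', 'app01')], 'badHosts': []}}}")
state = parse_state(sample)
good = state['webkit']['main']['goodHosts']
```

Compared with pickling, a literal-only format stays human-readable and avoids unpickling data fetched over the network.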

Charles Kaminski

Oct 9, 2008, 12:19:22 PM
to txlb...@googlegroups.com
I personally prefer RESTful, but wouldn't mind either.

-Charles