What causes "balancing pool" messages?

Discussion:

Nomad

2008-01-10 05:23:29 UTC

I have two DHCP 3.1.0 servers running and have noticed these
messages in my logs:

Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a854600 172.16.81.96/28 total 9 free 6 backup 3 lts 1 max-own (+/-)1
Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a854600 172.16.81.96/28 total 9 free 6 backup 3 lts 1 max-misbal 1
Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a8487c0 172.16.145.160/27 total 29 free 12 backup 7 lts 2 max-own (+/-)2
Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a8487c0 172.16.145.160/27 total 29 free 12 backup 7 lts 2 max-misbal 3
Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a81b3e0 172.21.1/24 total 123 free 74 backup 49 lts 12 max-own (+/-)12
Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a81b3e0 172.21.1/24 total 123 free 74 backup 49 lts 12 max-misbal 18

For each network segment, I see these messages coming in every minute. In
case it matters, here are the failover sections of my two DHCP servers:

#failover definition
failover peer "dhcp" {
    primary;
    address 172.16.100.10;
    port 520;
    peer address 172.16.100.11;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 300;
    split 255;
    load balance max seconds 3;
}

#failover definition
failover peer "dhcp" {
    secondary;
    address 172.16.100.11;
    port 520;
    peer address 172.16.100.10;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 3;
}

And by the way, what does the "lts" mean in these messages?

David W. Hankins

2008-01-10 19:08:29 UTC

Post by Nomad
Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a854600 172.16.81.96/28 total 9 free 6 backup 3 lts 1 max-own (+/-)1
Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a854600 172.16.81.96/28 total 9 free 6 backup 3 lts 1 max-misbal 1

These occur on a schedule now in 3.1.0: by default, balancing runs at
least once an hour, and no more often than once every 60 seconds.

Post by Nomad
For each network segment, I see these messages coming in every minute.

The leases database is more relevant here than your failover
config (except that you have not set min-balance to override the 60s
default). The server is trying to estimate when the remote system's
pools will reach the "max lease misbalance" point at the current
estimated rate of change (as measured by the expiration times of the
oldest free and backup leases).

It only ever reduces this estimate, and never below min-balance, so in
practical terms it schedules for the "worst case" pool: the one whose
estimate shows it reaching max-misbalance soonest.
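To make the "worst case pool" rule concrete, here is a rough Python model of how the next run could be picked. It is a sketch under stated assumptions, not dhcpd's code: the per-pool estimates are taken as given inputs (how dhcpd derives them from lease expiration times is not modeled), and the 60 and 3600 second defaults are the min-balance default and the once-an-hour ceiling mentioned in this thread. The function name is hypothetical.

```python
from typing import Iterable


def next_balance_delay(pool_estimates: Iterable[float],
                       min_balance: float = 60.0,
                       max_balance: float = 3600.0) -> float:
    """Seconds until the next balancing run, under a simplified model.

    pool_estimates holds, for each pool, a (hypothetical) estimate of how
    many seconds remain until that pool drifts to max-misbalance.
    """
    delay = max_balance  # start from the once-an-hour ceiling
    for estimate in pool_estimates:
        if estimate < delay:
            # Only ever shorten the delay, but never below min-balance.
            delay = max(estimate, min_balance)
    return delay


# One "hot" pool is enough to pin every run to the 60-second floor.
print(next_balance_delay([5000.0, 1200.0, 20.0]))                     # 60.0
print(next_balance_delay([5000.0, 1200.0, 20.0], min_balance=300.0))  # 300.0
print(next_balance_delay([]))                                         # 3600.0
```

In this model a single pool estimated to drift out of balance within 20 seconds puts every run on the 60-second floor, which would look like messages every minute; raising min_balance to 300 stretches that to five minutes.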

At least in theory, the time between balance events should grow as
the expiration times on your free/backup leases recede further into
the past, and it should dynamically shrink as recent activity makes
them fresh again.

If in practice your leases change owners so quickly that they never get
to become memories of the distant past, you can simply raise the
min-balance percentage as a workaround. The dhcpd.conf manpage has a
section on min-balance and the other parameters that govern pool
balance maintenance.

Post by Nomad
And by the way, what does the "lts" mean in these messages?

"Leases to send." The convention predates me, but it means that if the
local system (the one logging the line) were to send 'lts' leases over
to its peer, the pool would be "evenly balanced."

So the logs you pasted show that the server looked but took no
action (the lts number did not change between the two lines).
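The numbers on those lines can be reproduced with a little integer arithmetic. The following Python sketch is inferred from the three pools in the original post, not taken from the dhcpd source, and it assumes: (a) DHCP1 is the primary, so lts is half of free minus backup, truncated toward zero; (b) max-own and max-misbal are the max-lease-ownership and max-lease-misbalance percentages of free + backup, rounded to nearest; (c) those percentages are at assumed defaults of 10 and 15. The names PoolCounts and balance_numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoolCounts:
    """Lease counts as printed on a 'balancing pool' log line."""
    free: int    # leases the primary may hand out
    backup: int  # leases the secondary may hand out


def c_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like C's '/'."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def balance_numbers(p: PoolCounts, misbalance_pct: int = 15,
                    ownership_pct: int = 10) -> Tuple[int, int, int]:
    """Return (lts, max_own, max_misbal) for one pool.

    Assumptions (inferred from the log lines, not from dhcpd itself):
    - thresholds are percentages of free + backup, rounded to nearest;
    - lts is half the free/backup difference, seen from the primary.
    """
    base = p.free + p.backup
    lts = c_div(p.free - p.backup, 2)
    max_own = (base * ownership_pct + 50) // 100
    max_misbal = (base * misbalance_pct + 50) // 100
    return lts, max_own, max_misbal


for counts in (PoolCounts(6, 3), PoolCounts(12, 7), PoolCounts(74, 49)):
    print(counts, balance_numbers(counts))
```

For the three pools this prints (1, 1, 1), (2, 2, 3) and (12, 12, 18), matching the logged lts, max-own and max-misbal values. The /27 pool is why the sketch uses free + backup rather than the logged total: with total 29, the same formulas would give max-own 3 and max-misbal 4 instead of the logged 2 and 3. In all three pools lts is at or below max-misbal, which fits the server taking no action.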

It is normal for:

lts to be greater than max-misbalance on a "balancing" line.

lts to be greater than zero on a "balancing" line, but less than zero
on a "balanced" line (leases were shifted to the peer according to
their most recent client and the load balancing hash algorithm).

lts to remain precisely the same non-zero value less than or equal to
'max-own' between both log lines.
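If you want to check a larger log for pools where balancing actually moved leases, a short Python script can pair each "balancing" line with the following "balanced" line and compare lts, using the rule of thumb above that an unchanged lts means no action was taken. This is an illustrative sketch: the regular expression assumes the unwrapped 3.1.0 line format quoted in this thread and uses the hex pool identifier as a key, and the function name is hypothetical.

```python
import re

# Matches the start of the unwrapped dhcpd 3.1.0 lines quoted in this thread.
BALANCE_RE = re.compile(
    r"dhcpd: (?P<kind>balancing|balanced) pool (?P<pool>\S+) (?P<net>\S+)"
    r" total (?P<total>\d+) free (?P<free>\d+) backup (?P<backup>\d+)"
    r" lts (?P<lts>-?\d+)"
)


def balance_actions(lines):
    """Map each pool id to True if its lts changed between the
    'balancing' line and the following 'balanced' line."""
    pending = {}  # pool id -> lts seen on the 'balancing' line
    actions = {}
    for line in lines:
        m = BALANCE_RE.search(line)
        if m is None:
            continue  # not a balancing report
        pool, lts = m.group("pool"), int(m.group("lts"))
        if m.group("kind") == "balancing":
            pending[pool] = lts
        elif pool in pending:
            actions[pool] = pending.pop(pool) != lts
    return actions


SAMPLE = [
    "Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a854600 172.16.81.96/28 "
    "total 9 free 6 backup 3 lts 1 max-own (+/-)1",
    "Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a854600 172.16.81.96/28 "
    "total 9 free 6 backup 3 lts 1 max-misbal 1",
    "Jan 9 04:13:23 DHCP1 dhcpd: balancing pool a81b3e0 172.21.1/24 "
    "total 123 free 74 backup 49 lts 12 max-own (+/-)12",
    "Jan 9 04:13:23 DHCP1 dhcpd: balanced pool a81b3e0 172.21.1/24 "
    "total 123 free 74 backup 49 lts 12 max-misbal 18",
]

print(balance_actions(SAMPLE))
```

On the sample lines it returns {'a854600': False, 'a81b3e0': False}, i.e. no leases were moved in either pool.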

--
Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
Why settle for the lesser evil? https://secure.isc.org/store/t-shirt/
--
David W. Hankins "If you don't do it right the first time,
Software Engineer you'll just have to do it again."
Internet Systems Consortium, Inc. -- Jack T. Hankins

Denis Laventure

2008-01-10 19:22:53 UTC

Post by David W. Hankins

These occur on a schedule now in 3.1.0; by default it will happen at
least once an hour, and should happen no faster than 60 seconds.

Is it possible to hide those messages from the log with an option or
command line argument?
I have 2897 pools that get balanced every time I reload the DHCP server
and every hour.

Denis

David W. Hankins

2008-01-10 19:54:29 UTC

Post by Denis Laventure
Is it possible to hide those messages from the log with an option or
command line argument?

The balancing messages are logged at INFO level, so you can tell
syslog not to log them, or to log them elsewhere.

But a lot of other messages you might want to keep are also logged at
INFO level.
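One way around that is to filter on the message text instead of the level, so the other INFO messages survive. As a minimal sketch, assuming the line format quoted earlier in this thread, the following Python filter drops only the balancing reports when reading a saved log. It post-processes a file; it is not a syslog or dhcpd option. Many syslog daemons (rsyslog, for example) can also filter on message content, so check your daemon's documentation for its syntax.

```python
import sys

# Substrings taken from the dhcpd 3.1.0 log lines quoted in this thread.
BALANCE_MARKERS = ("dhcpd: balancing pool ", "dhcpd: balanced pool ")


def without_balance_messages(lines):
    """Yield every line except dhcpd's pool-balancing reports."""
    for line in lines:
        if not any(marker in line for marker in BALANCE_MARKERS):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python drop_balance.py /path/to/logfile (script name hypothetical)
    with open(sys.argv[1]) as log:
        sys.stdout.writelines(without_balance_messages(log))
```

For example, `python drop_balance.py /var/log/messages`; the log path depends on how your syslog is set up.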

This isn't really as flexible as we'd like, and the way DHCP handles
logging like this in general is something we want to look at someday.

Post by Denis Laventure
I have 2897 pools that get balanced every time I reload the DHCP server
and every hour.

If you want the balancing to happen every hour but not tell you the
results, there aren't a lot of tools available. If your experience is
that the balancing events don't do anything, or weren't necessary, you
could raise the maximum limit on the rebalance interval (I forget the
config syntax; check the dhcpd.conf manpage), but this means the
balancing runs will happen less frequently as well.

David W. Hankins

2008-01-10 20:12:24 UTC

Post by Denis Laventure
I have 2897 pools that get balanced every time I reload the DHCP server
and every hour.

Although honestly, if you have that many pools, it would actually work
out better if all the pools' rebalance events were done individually
and scheduled separately with some random dispersion. 3,000 pools to
check is a significant amount of time to spend churning CPU and not
answering queries.

The trouble with that in the 3.1.x and 4.0.x sources is that our
event (timed) scheduler does not cope well with large numbers of
pending events, and the failover protocol message POOLREQ (which one
peer uses to politely ask the remote peer to cough up some leases)
has no means of identifying which pool the request is for, so you
have to search them all. So it makes for some challenges in keeping
shared code.

Some day. Maybe they could even be balanced by different processors
in that ethereal future everyone keeps talking about.
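For illustration only, the per-pool scheduling with random dispersion described above might look like the following Python sketch. It uses a binary heap (heapq) as the event queue, so each pending event costs O(log n) to pop or push even with thousands of pools. This is not how dhcpd 3.1.x or 4.0.x schedules balancing, it does nothing about POOLREQ not identifying a pool, and the function names, interval and jitter values are all hypothetical.

```python
import heapq
import random


def initial_schedule(pool_ids, now, interval, rng):
    """Spread each pool's first balance check uniformly over one interval."""
    heap = [(now + rng.uniform(0.0, interval), pid) for pid in pool_ids]
    heapq.heapify(heap)  # O(n); the earliest check is always heap[0]
    return heap


def run_due(heap, now, interval, jitter, rng, check):
    """Run every pool check that is due and reschedule each one separately.

    A pool is rescheduled at now + interval +/- jitter, so checks stay
    spread out instead of all landing in the same second.
    """
    if not 0 <= jitter < interval:
        raise ValueError("jitter must be in [0, interval)")
    ran = []
    while heap and heap[0][0] <= now:
        _, pid = heapq.heappop(heap)
        check(pid)
        ran.append(pid)
        # The next due time is always later than now, so the loop ends.
        heapq.heappush(heap, (now + interval + rng.uniform(-jitter, jitter), pid))
    return ran


rng = random.Random(42)
heap = initial_schedule(range(2897), now=0.0, interval=3600.0, rng=rng)
due = run_due(heap, now=60.0, interval=3600.0, jitter=300.0, rng=rng,
              check=lambda pid: None)  # stand-in for a real per-pool check
print(f"{len(due)} of {len(heap)} pools checked in the first minute")
```

With 2897 pools spread over an hour, on average about 48 checks (2897 / 60) fall in any given minute, instead of all of them at once.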

Nomad

2008-01-10 20:25:05 UTC

Post by David W. Hankins

Post by Denis Laventure
I have 2897 pools that get balanced every time I reload the DHCP server
and every hour.

Although honestly, if you have that many pools, it would actually work
out better if all the pools' rebalance events were done individually
and scheduled separately with some random dispersion. 3,000 pools to
check is a significant amount of time to spend churning CPU and not
answering queries.

Am I misunderstanding something? Are you saying that while the servers are
balancing pools, they're not responding to DHCP requests?

David W. Hankins

2008-01-10 20:34:30 UTC

Post by Nomad
Am I misunderstanding something? Are you saying that while the servers are
balancing pools, they're not responding to DHCP requests?

The DHCP software is not currently threaded or fork-managed or anything
like that; there is no parallelism. With some exceptions (like ICMP echo
requests), the server only does one thing at a time.

David W. Hankins

2008-01-10 20:13:57 UTC

Post by David W. Hankins
min-balance percentage as a workaround. The dhcpd.conf manpage has a

Correction, min-balance is a fixed time, not a percentage.

Aggarwal Vivek-Q4997C

2008-01-14 10:19:09 UTC

Hi

I have configured DHCP 3.0.6 on NetBSD. Can anyone tell me how long it
should take to assign an IP address to the client?

Regards
Vivek Aggarwal

Simon Hobson

2008-01-14 11:37:10 UTC

Post by Aggarwal Vivek-Q4997C
I have configured DHCP 3.0.6 on NetBSD. Can anyone tell me how long it
should take to assign an IP address to the client?

Normally it's some small fraction of a second from receiving a request.

If it's not, then your next steps should be:

1) Inspect the logs and check that a) the server has started without
error, and b) it logs some activity in response to the client's
request.

2) Assuming the server has started and it's "not working", your
next step is to use a packet sniffer (e.g. Wireshark) to see what
packets (if any) are being passed on the network. You are looking to
see that a) the client is actually sending request packets, b) the
server is actually receiving them, c) the server is responding, and
d) the responses get back to the client. Steps b and d are usually
not an issue on a flat network, but can be a problem when routers
and relay agents are involved.

John Hascall

2008-01-14 11:52:09 UTC

Post by Aggarwal Vivek-Q4997C
I have configured DHCP 3.0.6 on NetBSD. Can anyone tell me how long it
should take to assign an IP address to the client?

I'm not exactly sure what you are asking, but
assuming the DHCP server is also ISC's and that
you have not turned off "ping-check", it should
take just over 1 second (the server will ping
the address it wishes to assign and then wait
1 second to see that there is no reply).

If you are having difficulty, the first two places
to look are the logfiles on the client and server,
and then the network traffic, using something
like tcpdump.

John