Discussion:
Good old "peer holds all free leases" appearing in a (formerly) very stable setup
Nicolas Ecarnot
2014-02-18 12:37:07 UTC
Hi,

We set up a complex DHCP infrastructure 5 years ago, and it has been very,
very stable and has worked very nicely:
- 10 RHEL 5 servers, serving around 150 shared-networks for 150 physical
sites
- in every network there is usually one pool, sometimes 2, and rarely more
- every pool is covered by failover between a pair of physical servers
- we are using DhcId classes successfully
- our routers use a dhcp-helper setup to relay the DHCP traffic correctly

Tests were made, time has passed, and the whole setup is versioned in SVN.
We frequently change things/values/subnets, but we never change the basics
described above. We use it every day and we think we have a clear view
of it. It is all working FINE.

Today, in one subnet, I just wanted to add a pool whose access is
restricted to a new class. As usual, I wanted this pool to be managed
by a server and its failover peer.

What I get is the good old "peer holds all free leases" error, which I
learned years ago NOT to trust, as it has appeared even when the problem
was not related to leases at all.

The subnet looks like this:

shared-network dhcp51-dsi {
    subnet 192.168.51.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 192.168.51.254;
        option domain-name-servers 192.168.39.215, 192.168.12.215;
    }

    pool {
        allow members of "oldStableClass";
        failover peer "serv-net-adm1_serv-net-adms1";
        deny dynamic bootp clients;
        range 192.168.51.2 192.168.51.229;
    }
    pool {
        allow members of "brandNewClass";
        failover peer "serv-net-adm1_serv-net-adms1";
        deny dynamic bootp clients;
        range 192.168.51.230 192.168.51.241;
    }
}

You can see there are 2 servers involved:
- serv-net-adm1
- serv-net-adms1
Our DHCP test client is a Windows XP machine whose NIC is set up with a
DhcId class, like its thousand office neighbors.

Here are some strange things I witnessed:
- When restarting the daemon on adms1, the log file shows the
shared-networks it will manage, but sometimes "dhcp51-dsi" is missing...
- I tried completely removing the failover setup and letting net-adm1
alone manage this subnet. The error message then changes to
"No free leases". This makes no sense to me, as I think/know the pool is
far from full. I'm using dhcpd-pools to confirm this, and it agrees
with me.

I had a (read-only) look at the lease file, and it does not show any
reference to this subnet, though the subnet appears in the log file when
restarting this server (net-adm1).

I must admit I'd be glad to get some advice on things to check, even
the obvious ones I may have skipped.

You'll find the versions used below. You'll note these are not recent
ones, but I think that is not relevant, because this whole setup has been
running fine as is for years and no additional exotic features are
required there.

- Red Hat Enterprise Linux (RHEL) 5.3, 64-bit
- isc dhcp 3.0.5-18.el5

Regards,
--
Nicolas Ecarnot
Sten Carlsen
2014-02-18 17:32:21 UTC
Post by Nicolas Ecarnot
- When restarting the daemon on adms1, the log file shows the
shared-networks it will manage, but sometimes "dhcp51-dsi" is missing...
- I tried completely removing the failover setup and letting net-adm1
alone manage this subnet. The error message then changes to
"No free leases". This makes no sense to me, as I think/know the pool is
far from full. I'm using dhcpd-pools to confirm this, and it agrees
with me.
Sounds a bit like a CLASS issue. This is what I would expect if, for some
reason, the host did not match either class.
I might test with a third pool that is denied to both classes, just to
see whether that would catch it.
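A minimal sketch of such a diagnostic catch-all pool, to be added inside the same shared-network next to the two existing pools. It reuses the class names and failover peer from the configuration above; the range 192.168.51.242 to 192.168.51.250 is a hypothetical choice and must be replaced with addresses that are really unused. Because this pool has only deny entries, any client that is a member of neither class may draw from it, so a lease handed out here would point to a class-matching problem rather than an exhausted pool. If failover is kept for it, the same change has to be made on both peers.

```
pool {
    # Diagnostic catch-all: only clients matching NEITHER class are eligible.
    deny members of "oldStableClass";
    deny members of "brandNewClass";
    failover peer "serv-net-adm1_serv-net-adms1";
    deny dynamic bootp clients;
    # Hypothetical spare addresses; adjust to a range that is really free.
    range 192.168.51.242 192.168.51.250;
}
```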
--
Best regards

Sten Carlsen

No improvements come from shouting:

"MALE BOVINE MANURE!!!"
Peter Rathlev
2014-02-18 21:10:15 UTC
Post by Nicolas Ecarnot
- isc dhcp 3.0.5-18.el5
I moved this to the top, since it's IMHO the most important thing. This
is an _ancient_ release. Upgrading could solve many strange problems.
Post by Nicolas Ecarnot
We set up a complex DHCP infrastructure 5 years ago, and it has been very
Just to rub it in: DHCPd 3.0.5 was released in 2006. That's a lot more
than five years ago. ;-)
Post by Nicolas Ecarnot
shared-network dhcp51-dsi {
    subnet 192.168.51.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 192.168.51.254;
        option domain-name-servers 192.168.39.215, 192.168.12.215;
    }
    pool {
        allow members of "oldStableClass";
        failover peer "serv-net-adm1_serv-net-adms1";
        deny dynamic bootp clients;
        range 192.168.51.2 192.168.51.229;
    }
...

I'm not an authority on this, but I have always placed "pool" statements
inside the "subnet" statements. I don't know whether one is supposed to do
that, but it looks more intuitive to me. (Inheritance shouldn't be a
problem with your method though, AFAIK.)
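For comparison, here is the same configuration rearranged the way Peter describes, with both pools nested inside the subnet declaration. Addresses, classes and the failover peer are copied unchanged from the original post; only the nesting differs.

```
shared-network dhcp51-dsi {
    subnet 192.168.51.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 192.168.51.254;
        option domain-name-servers 192.168.39.215, 192.168.12.215;

        # The pools now live inside the subnet they draw addresses from.
        pool {
            allow members of "oldStableClass";
            failover peer "serv-net-adm1_serv-net-adms1";
            deny dynamic bootp clients;
            range 192.168.51.2 192.168.51.229;
        }
        pool {
            allow members of "brandNewClass";
            failover peer "serv-net-adm1_serv-net-adms1";
            deny dynamic bootp clients;
            range 192.168.51.230 192.168.51.241;
        }
    }
}
```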
Post by Nicolas Ecarnot
I had a (read-only) look at the lease file, and it does not show any
reference to this subnet, though the subnet appears in the log file when
restarting this server (net-adm1).
Hmm... At least on ISC DHCP v4.x I would expect "unknown network
segment" if the pool just didn't exist. OTOH, if it's not in the leases
file (where every address, free or not, would appear in a failover setup,
at least on v4), then something is really wrong.

How many "subnet" and "pool" statements do you have? Could you
have run into some kind of limit in the software? Does adding yet
another subnet change things in any way?
--
Peter
Glenn Satchell
2014-02-19 01:58:14 UTC
Post by Peter Rathlev
Post by Nicolas Ecarnot
- isc dhcp 3.0.5-18.el5
I moved this to the top, since it's IMHO the most important thing. This
is an _ancient_ release. Upgrading could solve many strange problems.
Agreed that it's ancient, but unfortunately this is the version that
shipped with every release of RHEL and CentOS 5.x.

This has been addressed in dhcpd v4 with the 4.1 extended support
version, which picks up all the bug fixes from later dhcpd v4 releases.

regards,
-glenn
Nicolas Ecarnot
2014-02-19 13:22:13 UTC
Post by Glenn Satchell
Post by Peter Rathlev
Post by Nicolas Ecarnot
- isc dhcp 3.0.5-18.el5
I moved this to the top, since it's IMHO the most important thing. This
is an _ancient_ release. Upgrading could solve many strange problems.
Agree that it's ancient, but unfortunately this is the version that
shipped with every release of RHEL and CentOS 5.x
It was addressed in dhcpd v4 with the 4.1 extended support version that
picks up all the bug fixes in later dhcpd v4 releases.
regards,
-glenn
Thanks to all who replied. We are aware of our archaeological data center
and are on the way to upgrading it soon.
I'll test embedding the pool in the subnet declaration to see
whether it improves the situation (and let you know).

Have a nice day.
--
Nicolas Ecarnot
Doug Barton
2014-02-19 18:11:23 UTC
Post by Nicolas Ecarnot
Thanks to all who replied. We are aware of our archaeological data center
and are on the way to upgrading it soon.
I'll test embedding the pool in the subnet declaration to see
whether it improves the situation (and let you know).
Putting pools in the subnets they refer to is a good habit in any case,
and may even be mandatory in 4.x.

Meanwhile, one possibility I haven't seen mentioned yet: because your
second pool is so small, any DHCP reservations in it, or any static hosts
in the range that answer ICMP, will adversely affect the servers' ability
to balance it.

hth,

Doug
Glenn Satchell
2014-02-20 00:33:03 UTC
Post by Nicolas Ecarnot
Thanks to all who replied. We are aware of our archaeological data center
and are on the way to upgrading it soon.
I'll test embedding the pool in the subnet declaration to see
whether it improves the situation (and let you know).
Pool statements are OK in the shared-network; they will work fine as they
are. Thinking back to an ancient 3.0.something RC, there was a bug, and the
workaround was to move pools out of the subnet into the shared-network.
Long since fixed, but 5 years ago it was probably recent enough to be
relevant.

Are you able to post the class definition statements? It seems to me that
some clients are not matching either of your classes, and there is no
default pool, so in this case "no free leases" means "I couldn't find a
matching pool".

regards,
-glenn
Nicolas Ecarnot
2014-02-20 09:37:08 UTC
Post by Glenn Satchell
Pool statements are OK in the shared-network; they will work fine as they
are. Thinking back to an ancient 3.0.something RC, there was a bug, and the
workaround was to move pools out of the subnet into the shared-network.
Long since fixed, but 5 years ago it was probably recent enough to be
relevant.
Hi,

Today, I tried placing the pool declarations *inside* the subnet
declaration and ran the same tests.
No success...
It fails with the same error as usual.

During these tests, I checked that the host is *actually* sending the
correct class ID (tcpdump confirms it).
Post by Glenn Satchell
Are you able to post the class definition statements? Seems to me that
some clients are not matching either of your classes and there is no
default pool so in this case "no free leases" means "I couldn't find a
matching pool".
The relevant part of the class declaration is:

class "c-prtest" {
    match if substring (option user-class,1,6) = "\0prtest";
}

and there are 18 such definitions that have been working for years. I hope
18 is not high enough to hit a design limit.

Is there a way I could increase the debug level, so that I can help you
help me by providing additional logs?

Regards,
--
Nicolas Ecarnot
Glenn Satchell
2014-02-20 10:29:04 UTC
Post by Nicolas Ecarnot
class "c-prtest" {
    match if substring (option user-class,1,6) = "\0prtest";
}
and there are 18 such definitions that have been working for years. I hope
18 is not high enough to hit a design limit.
Is there a way I could increase the debug level, so that I can help you
help me by providing additional logs?
Hi Nicolas

Years ago I ran dhcpd 3.0.5 at a big site with about 100 subnets with
multiple pools in each, so I don't think 18 is a limit.

I can see one thing in that match line that doesn't look right. The
substring produces a string of length 6, but you are comparing it to
a string of 7 characters ("\0" followed by "prtest"). They will never
be equal.

In the dhcp-eval man page it is defined as

    substring (data-expr, offset, length)

I'm guessing that string on the right should just be "prtest".
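A sketch of the class with that guess applied, so that both sides of the comparison are 6 bytes long. The offset of 1 is kept from the original; whether the first byte of the user-class data really should be skipped depends on what the client sends, which the tcpdump capture mentioned earlier can confirm. Treat this as an untested assumption, not a verified fix.

```
class "c-prtest" {
    # substring() returns at most 6 bytes starting at offset 1,
    # so the literal on the right must also be exactly 6 bytes long.
    match if substring (option user-class, 1, 6) = "prtest";
}
```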

You can get dhcpd to log variables into dhcpd.leases, e.g. by adding
this in some relevant scope, say the subnet or pool:

    set myuserclass = option user-class;

It should then add a line to dhcpd.leases.
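A possible extension, assuming your dhcpd build supports the `log` statement and the `concat` and `binary-to-ascii` functions described in dhcp-eval(5) (check the man page shipped with 3.0.5 before relying on this). The subnet declaration below would replace the existing one; its options are copied from the original post. It keeps the `set` line and additionally writes the user-class bytes, as colon-separated hex, to syslog, which may provide the extra debugging output asked for without changing the daemon's debug level. The "user-class: " prefix is an arbitrary label.

```
subnet 192.168.51.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    option routers 192.168.51.254;
    option domain-name-servers 192.168.39.215, 192.168.12.215;

    # Per the advice above, this value should be recorded in dhcpd.leases.
    set myuserclass = option user-class;

    # Assumed available in this version: log the raw user-class bytes
    # in hex, to compare against the tcpdump capture and the class literal.
    log (info, concat ("user-class: ",
                       binary-to-ascii (16, 8, ":", option user-class)));
}
```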

If you're still getting no free leases, then try adding a third pool with

    deny members of "c-prtest";
    deny members of "theotherclass";

and see if that catches the test client. Then check what value is getting
stored in the lease.

regards,
-glenn
