DHCP failover - disk full, can not commit to lease file

Discussion:

Louis Lau

2013-05-27 18:53:12 UTC

Dear all,

I have two DHCP servers configured as failover peers, and the router relays DHCP/BOOTP requests to both servers. When the system works normally, a DHCPDISCOVER from a client reaches both servers, and the observed behaviour is that both servers send a DHCPOFFER to the same client. The primary DHCP server offers IP-A, which is managed by the primary, and the secondary offers IP-B, which is managed by the secondary. Assume the client selects IP-A and sends a DHCPREQUEST for IP-A to both servers.

However, the primary server's disk recently filled up, and the log contains many lines like this:

commit_leases: unable to commit: No space left on device

We observed that DHCP does not fail over to the secondary in this case: the primary still responds to DHCPDISCOVER with a DHCPOFFER. We also observed that most clients do not try the offer from the secondary; they choose the IP address offered by the primary DHCP server.

Is this behaviour normal?

Is there any configuration that makes the service fail over to the secondary DHCP server in a case like this, or that lets the client try the other DHCPOFFER when its request for the first address does not receive an ACK?

Thanks for your help on this matter.

Louis Lau

Steven Carr

2013-05-27 22:56:37 UTC

No, your disk is full (and you risk more issues than just DHCP
problems). DHCPD needs to be able to write the lease information to
disk; without that it will not function. The DHCP failover
protocol does not take into account the resources on the local system
when assigning DHCP leases; it assumes your resources are adequate
for the job.

_______________________________________________
dhcp-users mailing list
https://lists.isc.org/mailman/listinfo/dhcp-users

Doug Barton

2013-05-27 23:27:16 UTC

I could make a pretty good argument that "My disk is full and I cannot
write leases" would be something that should trigger failover.

Doug

Ted Lemon

2013-05-28 01:26:17 UTC

Post by Doug Barton
I could make a pretty good argument that "My disk is full and I cannot write leases" would be something that should trigger failover.

You would, IMHO, be correct.

Louis Lau

2013-05-28 02:34:21 UTC

Would the behavior differ if the relay forwarded DHCP requests to the dedicated IP addresses of the DHCP servers instead of broadcasting them? If the offers were delivered to the client round-robin, the client could request the IP offered by the secondary when the first attempt fails. I was expecting the OFFERs to reach the client one at a time, rather than letting the client choose which OFFER to take.

Louis Lau

Senior Consultant
NEC Hong Kong Limited

Glenn Satchell

2013-05-28 04:02:01 UTC

DHCP is not a monitoring system. Why not use your existing, dedicated
monitoring system to detect the failure modes you care about, and get that
to trigger failover? This could be simple: turn off the dhcp service, or
put it in partner-down mode.

There are many possible scenarios where this would be appropriate, but is
this something the dhcp server needs to be able to handle? Is this the
best use of available dhcp developer time?

regards,
-glenn

Louis Lau

2013-05-28 04:25:52 UTC

Yes, agreed. A monitoring system can handle this, given some documentation. Currently we could use HPOV or another monitoring script to watch the DHCP log, but the pattern must match the log output exactly; otherwise too many alarms will be triggered.
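
A minimal sketch of such an exact match, in Python. It assumes dhcpd logs the message as `commit_leases: unable to commit: <reason>`, with whatever timestamp and host prefix the local syslog adds in front. The function name `is_commit_failure` is hypothetical and not part of ISC DHCP.

```python
import re

# Match only the lease-commit failure message, so that unrelated dhcpd
# log lines (DHCPOFFER, DHCPACK, ...) never raise an alarm.
# Using search() lets any syslog prefix precede the message (assumed format).
COMMIT_FAILURE = re.compile(r"commit_leases: unable to commit: (?P<reason>.+)$")


def is_commit_failure(line: str):
    """Return the failure reason if the line reports a failed lease commit, else None."""
    match = COMMIT_FAILURE.search(line.rstrip("\n"))
    return match.group("reason") if match else None
```

A log watcher could call `is_commit_failure()` on each new line and alert only when it returns a reason, which keeps the alarm tied to this one message rather than to any dhcpd output.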

We noted that there is a subscription service for ISC DHCP. Would it cover documentation for the monitoring part?

We would also like to know which application-level faults the failover mechanism can handle. So far we only know that failover kicks in when a node or the communication between the peers goes down; we are not sure what other conditions trigger it.

Louis Lau

Sten Carlsen

2013-05-28 09:10:16 UTC

You should monitor the disks, not the log?

Any system that uses disk space on that particular host needs free
space. The systems I have seen warn when less than 10% free disk space
is remaining, that is the point you should act on - free up disk space
or add new disks. This information is not available in dhcp-logs but a
well designed monitoring system will tell you.
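
As a small illustration of that kind of check, here is a Python sketch using the standard library's `shutil.disk_usage`. The 10% threshold comes from the paragraph above; the function names are hypothetical, and a real monitoring system would do this for you.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


def needs_attention(path: str, min_free: float = 0.10) -> bool:
    """True when free space has dropped below the warning threshold."""
    return free_fraction(path) < min_free
```

Calling `needs_attention()` on the directory that holds the lease file would flag the host well before dhcpd starts failing to commit leases.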

Once you see the message from dhcp, the disaster is at the door, knocking.

--
Best regards

Sten Carlsen

No improvements come from shouting:
"MALE BOVINE MANURE!!!"

Glenn Satchell

2013-05-28 03:58:25 UTC

Your monitoring system could always detect no disk free space and shut
down the dhcp service?

regards,
-glenn

Ted Lemon

2013-05-28 12:19:40 UTC

Post by Glenn Satchell
Your monitoring system could always detect no disk free space and shut
down the dhcp service?

The DHCP server got an error when it tried to write the lease, so it knows that it's down. Why wait for a mythical monitoring system to do something?

Steven Carr

2013-05-28 13:27:02 UTC

How does the DHCP server know what the cause of the error is, and why
would it care? All it knows is that it had a problem writing to the
lease file, which could be caused by any number of issues, not just a
full disk. DHCP failover isn't meant to guard against your server
running into problems; it's there to fail over in the event of
communications failure between the network and one of the DHCP
servers. Yes, you could argue that if it can't write the lease it
should fail over, but your system is in much bigger trouble if you
don't have any free disk space, so the DHCP issue becomes moot.

As has already been said you should be monitoring the server and
alerting if you ever get anywhere near filling a disk, it's called
server management/system administration.

Ted Lemon

2013-05-28 13:40:26 UTC

Post by Steven Carr
How does the DHCP server know what the cause of the error is, and why
would it care? All it knows is that it had a problem writing to the
lease file, which could be caused by any number of issues, not just a
full disk. DHCP failover isn't meant to guard against your server
running into problems; it's there to fail over in the event of
communications failure between the network and one of the DHCP
servers.

Failover is there to allow one server to continue operating when the other fails, whether that's from a full disk or a network outage. In cases where the server can detect that it has failed, for whatever reason, it's entirely appropriate for it to indicate to its peer that it is no longer able to serve leases, and there's a provision for that in the failover protocol.

It doesn't matter whether the server failed to write the lease because the disk was full or because of a hardware or permission error: what matters is that the server definitely can't make progress until something changes. Gracefully handling this state in a failover peering environment is clearly in scope for failover. Sure, the server could punt, and indeed when I wrote the code that you are seeing the problem with, I never got around to adding this feature. But that was because I had other code I needed to write, not because I thought it was a bad idea.
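
The point that the cause does not matter can be sketched as a probe that treats every write failure the same way. This is an illustration in Python, not how dhcpd is implemented; `can_persist` and the probe file name are hypothetical.

```python
import os


def can_persist(directory: str, probe_name: str = ".write-probe") -> bool:
    """Try to write and fsync a small file in `directory`.

    Any OSError (disk full, permission denied, I/O error, missing
    directory, ...) gives the same answer: persistent storage is not
    usable right now.
    """
    probe_path = os.path.join(directory, probe_name)
    try:
        fd = os.open(probe_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, b"probe\n")
            os.fsync(fd)  # push the data to disk so errors surface here
        finally:
            os.close(fd)
        os.remove(probe_path)
    except OSError:
        return False
    return True
```

A server, or a watchdog beside it, that gets `False` here knows it cannot make progress, whatever the underlying reason, and could tell its failover peer so.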

Niall O'Reilly

2013-05-28 13:41:05 UTC

It shouldn't care about the cause of the error, but it
should care that it's no longer working to specification
since it no longer can write to persistent storage.

Why would the message, "I'm down; carry on without me" to
the failover partner be inappropriate in this case?

ATB
Niall O'Reilly

A***@lboro.ac.uk

2013-05-28 14:52:09 UTC

Hi,

Post by Niall O'Reilly
It shouldn't care about the cause of the error, but it
should care that it's no longer working to specification
since it no longer can write to persistent storage.
Why would the message, "I'm down; carry on without me" to
the failover partner be inappropriate in this case?

Many server daemons have similar problems. For example, the default config
for FreeRADIUS, if logging is enabled, will fail to authenticate the user if
it cannot log due to a full disk. The argument goes along the lines of not
wanting to let someone on to the network if there's no logging. Would you
want to hand out an address via DHCP if it cannot be stored/maintained in
the lease table?

However, something like 'munin' could check disk space on the host and be
tied to a script that launches an OMAPI script when disk usage reaches 99%,
e.g.:

https://kb.isc.org/article/AA-00475/0/Sending-a-Server-Shutdown-Message-Via-OMAPI.html
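
A rough sketch of what such a monitoring-triggered script might do, in Python. It relies on the control object described in dhcpd(8), where setting `state` to 2 asks the server to shut down cleanly and notify its failover peer. The OMAPI port (7911), the absence of an OMAPI key, and all function names are assumptions; a real setup needs `omapi-port` configured in dhcpd.conf and would normally use a key.

```python
import shutil
import subprocess


def omshell_shutdown_commands(server: str = "127.0.0.1", port: int = 7911) -> str:
    """Build the omshell input that opens the control object and sets state = 2."""
    lines = [
        f"server {server}",
        f"port {port}",   # assumed; must match omapi-port in dhcpd.conf
        "connect",
        "new control",
        "open",
        "set state = 2",  # 2 requests a shutdown, per dhcpd(8)
        "update",
    ]
    return "\n".join(lines) + "\n"


def used_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def shutdown_if_full(path: str, limit: float = 99.0) -> bool:
    """Send the shutdown request when disk usage reaches `limit` percent."""
    if used_percent(path) < limit:
        return False
    # Feed the commands to omshell on stdin, as one would with a shell here-document.
    subprocess.run(["omshell"], input=omshell_shutdown_commands(), text=True, check=True)
    return True
```

The threshold is checked before anything is sent, so running this periodically from a monitoring hook is harmless while the disk has room.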

alan