UNIX MATRIX - Where there is a shell, there is a way


UNIX Document Center

Hardware Setup and Assumptions

This guide is intended primarily to get the basic Solaris 9 Operating Environment set up and to configure the FC3510 array for use. It stops short of setting up the partitioning scheme used by ON-Center, but leaves the system in a state where this is easily done as the next step.
This guide assumes that the FC3510 only has a single controller, and is connected to the host with a single Fibre Channel HBA. The HBA should be connected to Port 0 of the FC3510 controller.

Operating System Setup

This section guides you through the installation and configuration of Solaris 9.

Operating System Installation
Install Solaris 9 on the server. Keep the following in mind:

  • Just install Solaris - don't worry about Extra Value Software, the Software Companion, or any extra products.
  • Select "Entire Distribution plus OEM support" as the software group to install.
  • Allocate only system partitions during the install (/, /var, and swap). Choose Manual Layout and use the example below.
  • Use only the first disk for system partitions, as we will encapsulate and mirror these to the second disk later with Solaris Volume Manager.
  • The system should have two internal physical disks. Leave the bulk of the free space on the first disk unallocated. We can use this space for additional soft partitions later, if needed.

Example disk configuration for a system with two 146GB drives (this is just the first disk, we will set up the second disk after install):

Part Tag Size

Slice 0 root 30.00GB # /

Slice 1 swap 8.00GB # swap space

Slice 2 backup 146.35GB # whole disk

Slice 3 var 10.00GB # /var

Slice 4 unassigned 8.00GB # dump device

Slice 5 unassigned 0

Slice 6 unassigned 90.00GB # soft partitions mirror

Slice 7 unassigned 34.78MB # SVM state database

  • Configure networking on the server as appropriate.
  • Configure the network-based Service Processor for future administration. This is highly recommended.

Disk Configuration

We use Solaris Volume Manager (formerly Solstice DiskSuite) to provide redundancy and management for our system. For this description, the first disk (primary mirror) is c1t0d0 and the second disk (secondary mirror) is c1t1d0.

  • Duplicate the slice layout of the boot disk on the drive that will be its mirror:

# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2

  • Configure the new dedicated dump device. This is done so that we don't have to depend on a working swap mirror to retrieve a kernel core dump:

# dumpadm -d /dev/dsk/c1t0d0s4

  • Initialize the state database replicas for Solaris Volume Manager. This gives us four state replicas, two per disk:

# metadb -a -f -c 2 c1t0d0s7

# metadb -a -c 2 c1t1d0s7
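To double-check that the replicas were created (two per disk), you can list them; this is just a quick sanity check:

# metadb -i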

  • Create the /, swap, and /var volumes as follows:

# metainit -f d10 1 1 c1t0d0s0

# metainit -f d11 1 1 c1t0d0s1

# metainit -f d13 1 1 c1t0d0s3

# metainit d20 1 1 c1t1d0s0

# metainit d21 1 1 c1t1d0s1

# metainit d23 1 1 c1t1d0s3

# metainit d0 -m d10

# metainit d1 -m d11

# metainit d3 -m d13

  • Set up the system to boot from the mirror:

# metaroot d0
  • Change the entries for swap and /var in /etc/vfstab to point to their new locations (/dev/md/dsk/d1, /dev/md/dsk/d3). Don't forget to change the /dev/rdsk entries to /dev/md/rdsk as well.
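For example, the updated entries should end up looking something like this (field values assumed from the layout above; adjust to your system):

/dev/md/dsk/d1 - - swap - no -

/dev/md/dsk/d3 /dev/md/rdsk/d3 /var ufs 1 no -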
  • Write down the physical device path of the second disk in the mirror (the target of the symbolic link in the output below):

# ls -l /dev/rdsk/c1t1d0s0

lrwxrwxrwx 1 root root 47 Feb 22 11:38 /dev/rdsk/c1t1d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:a,raw

  • Halt the system:

# init 0

  • Create OpenBoot PROM aliases for the bootable disks. Some controllers require you to replace sd with disk; you can verify this with the show-disks command before running the nvalias commands. For the V210 and the V240, it appears we should make this substitution. Also, remove ,raw from the end of the device name. You can also look at the output of devalias for examples; in particular, look at disk0 and disk1.

ok nvalias bootdisk /pci@1c,600000/scsi@2/disk@0,0:a

ok nvalias mirrdisk /pci@1c,600000/scsi@2/disk@1,0:a

ok setenv boot-device bootdisk mirrdisk

ok boot
Once booted, attach the submirrors to complete the mirroring:

# metattach d0 d20

# metattach d1 d21

# metattach d3 d23
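The mirrors resynchronize in the background, which can take a while on large slices; you can check progress at any time with metastat:

# metastat d0 d1 d3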

  • Mirror the remaining free space on the disks for future soft partitions. Later, you can allocate soft partitions from d6 for whatever you like:

# metainit d16 1 1 c1t0d0s6

# metainit d26 1 1 c1t1d0s6

# metainit d6 -m d16

# metattach d6 d26

  • For example, to allocate a 10g soft partition that would be mounted at /data:

# metainit d100 -p d6 10g

# newfs -m 0 /dev/md/dsk/d100

# mkdir /data

  • Then, edit /etc/vfstab to include the new mountpoint and set it to mount during boot.
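For example, a vfstab entry for the hypothetical d100 /data soft partition above might look like this:

/dev/md/dsk/d100 /dev/md/rdsk/d100 /data ufs 2 yes -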

Patching

Now we will install the Solaris 9 Recommended and Security patches, last updated Feb 21, 2007. Download 9_Recommended_Security-20070221.zip. Once it's on the system, run the following commands to install:

# unzip -d 9_Recommended_Security-20070221 9_Recommended_Security-20070221.zip

... unzip output ...

# cd 9_Recommended_Security-20070221/9_Recommended

# ./install_cluster

Answer `y' to start installing patches.
Many of the patches will fail to install, but this is (usually) because the patch has already been applied or is not needed on the system. Check the patch log at /var/sadm/install_data/Solaris_9_Recommended_Patch_Cluster_log for details of the process.
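If you want to confirm that a specific patch from the cluster was applied, showrev can list the installed patches; the patch ID below is just a placeholder:

# showrev -p | grep 112233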

FC3510 Setup

Here we install the software required to access the FC3510 storage array, and then we configure it.

SAN 4.4.12 Installation

Next we install the SAN software, which includes drivers for the Fibre Channel host adapter.

Download SAN 4.4.12: SAN_4.4.12_install_it.tar.Z. Once it's on the system, run the following commands to install:

# zcat SAN_4.4.12_install_it.tar.Z | tar xvf -

... tar output ...

# cd SAN_4.4.12_install_it

# ./install_it

Answer `y' to start installing the software.
If the installation succeeds, reboot the system:

# shutdown -y -i6 -g0

It would be wise to watch the system console while it reboots in case errors appear during boot related to the new patches.
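Assuming the HBA is supported by the newly installed driver stack, one quick way to confirm that it is visible after the reboot is cfgadm; the exact attachment point names will vary:

# cfgadm -al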

StorEdge 3000 Family Software Installation

The Sun StorEdge 3000 Family Software includes the command line utility (sccli) used to manage the external array.
Download the Sun StorEdge 3000 Family Software:
2.2_sw_solaris-sparc.zip and 2.3_smis_provider.zip. Once on the system, run the following commands to install:

# unzip -d 2.2_sw_solaris-sparc 2.2_sw_solaris-sparc.zip ... unzip output ...

# pkgadd -d 2.2_sw_solaris-sparc/solaris/sparc

Answer `all' to install all packages, and then `y' to any questions.

# unzip -d 2.3_smis_provider 2.3_smis_provider.zip

... unzip output ...

# pkgadd -d 2.3_smis_provider

Answer `all' to install all packages, and then `y' to any questions.
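A simple sanity check that the CLI landed on the system (the installed path may vary by package version):

# which sccli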

FC3510 Firmware Upgrade

Now we will upgrade the firmware on the FC3510.
Download patch 113723-15:
113723-15.zip. Decompress the patch with unzip and read the section entitled "Patch Installation Instructions" inside of README.113723-15. This README documents the upgrade steps better than I could do here.

Array Configuration

We need to delete any existing LUN mappings, and then delete the logical drives themselves. Run sccli to enter the configuration tool. It should connect to the 3510 automatically.
First, display the current LUN mappings (this is an example and may not match what you see):

sccli> show lun-maps

Ch Tgt LUN ld/lv ID-Partition Assigned Filter Map

-----------------------------------------------------------------

0 40 0 ld0 1A6C4238-00 Primary

For each LUN mapping, run unmap Ch.Tgt.LUN. Do this starting with the highest numbered LUNs and work your way down to 0. For example, using the above output:

sccli> unmap 0.40.0

sccli>
Exit sccli and run the following:

# devfsadm -Cv

... output regarding device changes, if any ...

Now restart sccli, and display the logical drives:

sccli> show logical-drives

LD LD-ID Size Assigned Type Disks Spare Failed Status
------------------------------------------------------------------------

ld0 1A6C4238 58.59GB Primary RAID0 2 0 0 Good
    Write-Policy Default StripeSize 128KB

And delete each logical drive by running delete logical-drive LD for each logical drive. For example:

sccli> delete logical-drive ld0

This operation will result in the loss of all data on the logical drive.

Are you sure? y

sccli: ld0: deleted logical drive
Now we have a clean array, ready for a new configuration. Each array has six 73GB drives; we will configure five of them into a RAID5 set with the sixth disk as a global spare. This should give us about 292GB of usable storage. Note that the following examples were done using only 5 disks, so the size numbers will be smaller than on a 6-disk system. Make sure that your configuration uses all but one disk for the RAID5 array and the last disk as a spare.
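For reference, RAID5 usable capacity works out to (disks in the set - 1) x disk size: five 73GB disks in the RAID5 set give roughly (5 - 1) x 73GB = 292GB, while the 5-disk example output below (four disks in the set plus the spare) gives roughly (4 - 1) x 68.37GB = 205GB.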

First, we configure SCSI channel 0. This setting should be the same as the default config, but we'll make sure:

sccli> configure channel 0 host pid 40 --reset

sccli: shutting down controller...

sccli: controller is shut down

sccli: resetting controller... sccli: controller has been reset

Now we will set the cache parameters for the array. Since this is a single-controller array, we need to make sure we're using write-through caching, as write-back caching is dangerous without a redundant controller. Also, we set the array to optimize for random access:

sccli> set cache-parameters random write-through

Changes will not take effect until controller is reset

Do you want to reset the controller now? y

sccli: resetting controller...

sccli: controller has been reset

Now we're ready to create our logical disk. Type the following commands into sccli to display the disks in the system, configure a RAID5 logical disk, and configure a global spare drive. Remember, make sure that your configuration uses all but 1 disk for the RAID5 array and the last disk for a spare:

sccli> show disks

Ch Id Size Speed LD Status IDs Rev

----------------------------------------------------------------------------

2(3) 0 68.37GB 200MB NONE FRMT FUJITSU MAT3073F SUN72G 0602 S/N 000513B02RF7 WWNN 500000E010FC3EF0

2(3) 1 68.37GB 200MB NONE FRMT FUJITSU MAT3073F SUN72G 0602 S/N 000512B02DYP WWNN 500000E010F8CF60

2(3) 2 68.37GB 200MB NONE FRMT FUJITSU MAT3073F SUN72G 0602 S/N 000512B02E3S WWNN 500000E010F8D410

2(3) 3 68.12GB 200MB NONE NEW FUJITSU MAT3073F SUN72G 0602 S/N 000513B02RN8 WWNN 500000E010FC8500

2(3) 4 68.37GB 200MB NONE FRMT FUJITSU MAT3073F SUN72G 0602 S/N 000514B02VRY WWNN 500000E010FE8100

sccli> create logical-drive raid5 2.0,2.1,2.2,2.3 primary global-spare 2.4

sccli> map ld0 0.40.0

sccli> show disks

Ch Id Size Speed LD Status IDs Rev

----------------------------------------------------------------------------

2(3) 0 68.37GB 200MB ld0 ONLINE FUJITSU MAT3073F SUN72G 0602 S/N 000513B02RF7 WWNN 500000E010FC3EF0

2(3) 1 68.37GB 200MB ld0 ONLINE FUJITSU MAT3073F SUN72G 0602 S/N 000512B02DYP WWNN 500000E010F8CF60

2(3) 2 68.37GB 200MB ld0 ONLINE FUJITSU MAT3073F SUN72G 0602 S/N 000512B02E3S WWNN 500000E010F8D410

2(3) 3 68.37GB 200MB ld0 ONLINE FUJITSU MAT3073F SUN72G 0602 S/N 000513B02RN8 WWNN 500000E010FC8500

2(3) 4 68.37GB 200MB GLOBAL STAND-BY FUJITSU MAT3073F SUN72G 0602 S/N 000514B02VRY WWNN 500000E010FE8100

sccli> show logical-drives

LD LD-ID Size Assigned Type Disks Spare Failed Status
------------------------------------------------------------------------

ld0 7D1F7008 204.35GB Primary RAID5 4 1 0 Good I
    Write-Policy Default StripeSize 32KB

sccli> show map
We are now done with the array configuration. Exit sccli and run:

# devfsadm -Cv

... output regarding device changes, if any ...

Host Configuration

We will now configure the drive array for access from the host system.

Slice Setup

Now, when you run format, you should see the new device and should be able to configure it:

# format

Searching for disks...done

c2t40d0: configured with capacity of 204.34GB

AVAILABLE DISK SELECTIONS:

0. c1t0d0
   /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f6cd7,0
1. c1t1d0
   /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100000c50010967,0
2. c2t40d0
   /pci@9,600000/SUNW,qlc@1,1/fp@0,0/ssd@w216000c0ff88655a,0

Specify disk (enter its number): 2

selecting c2t40d0

[disk formatted]

Disk not labeled. Label it now? y

... format menu, choose `p' and `p' again ...
And there's the disk, right on target 40 where we put it. You should now see the slices in the new LUN. Reconfigure the slices such that all space is given to the first slice (i.e., it matches the s2 slice). It should look something like this:

partition> p

Current partition table (unnamed):

Total disk cylinders available: 52723 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks

0 unassigned wm 0 - 52722 204.34GB (52723/0/0) 428532544

1 unassigned wm 0 0 (0/0/0) 0

2 backup wu 0 - 52722 204.34GB (52723/0/0) 428532544

3 unassigned wm 0 0 (0/0/0) 0

4 unassigned wm 0 0 (0/0/0) 0

5 unassigned wm 0 0 (0/0/0) 0

6 unassigned wm 0 0 (0/0/0) 0

7 unassigned wm 0 0 (0/0/0) 0

Label the disk and exit the format utility.

Solaris Volume Manager Setup

We need to add another set of SVM state replicas to the new disk and create an SVM submirror so we can allocate soft partitions:

# metadb -a -c 2 c2t40d0s0

# metainit d7 1 1 c2t40d0s0

At this point we should have two SVM devices suitable for soft partition allocation: d6 and d7. d6 has the extra space from the disks internal to the server, and d7 has the entire RAID5 array from the FC3510. For example, to allocate a 50GB soft partition from the RAID5 array (we'll call it d101), you would run the following command:

# metainit d101 -p d7 50g
You could then run newfs on /dev/md/dsk/d101 and otherwise treat it as a standard block device.
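As a sketch, assuming the d101 example above and a hypothetical mountpoint of /array, the follow-on steps would look something like this:

# newfs /dev/md/rdsk/d101

# mkdir /array

# mount /dev/md/dsk/d101 /array

Add a matching entry to /etc/vfstab if you want the filesystem mounted at boot.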

DNS CONCEPTS

Why do we have DNS servers?
Without a Name Service there would simply not be a visible Internet. To understand why, we need to look at what DNS does and how and why it evolved.
1. A DNS translates (or maps) the name of a resource to its physical IP address.
2. A DNS can also translate the physical IP address to the name of a resource by using reverse look-up or mapping.
The Internet (or any network for that matter) works by allocating every point (host, server, router, interface etc.) a physical IP address (which may be locally unique or globally unique).

DNS CONCEPTS AND IMPLEMENTATION

The domain name system is a global network of servers that translate host names like http://www.unixmatrix.blogspot.com/ into numerical IP (Internet Protocol) addresses, like 10.15.25.0, which computers on the Net use to communicate with each other. Without DNS, we'd all be memorizing long numbers instead of intuitive URLs or email addresses.
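You can see this mapping in action from any host with a resolver configured, for example with nslookup (the answer you get depends on your DNS configuration):

# nslookup www.example.com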

The domain name space

In order to understand how a DNS server works, you should be familiar with what is called the domain name space. Domain Name Space looks like this:




Fig 1.1: The domain name space

Each node on the tree represents a domain. Everything below a node falls into its domain. One domain can be part of another domain. For example, the machine chichi is part of the .us domain as well as the .com domain.
The Internet's Domain Name Service (DNS) is just a specific implementation of the Name Server concept optimized for the prevailing conditions on the Internet.

DNS OVERVIEW

From the history of name servers three needs emerged:
1. The need for a hierarchy of names.
2. The need to spread the operational loads on our name servers
3. The need to delegate the administration of our name servers.

DOMAINS AND DELEGATIONS

The Domain Name System uses a tree (or hierarchical) name structure. At the top of the tree is the root followed by the Top Level Domains (TLDs) then the domain-name and any number of lower levels each separated with a dot.

NOTE: The root of the tree is represented most of the time as a silent dot ('.')

Top Level Domains (TLDs) are split into two types:
1. Generic Top Level Domains (gTLD) .com, .edu, .net, .org, .mil etc.
2. Country Code Top Level Domain (ccTLD) e.g. .us, .ca, .tv , .uk etc.
Country Code TLDs (ccTLDs) use a standard two letter sequence defined by ISO 3166.
The following figure shows this:




Fig 5.2: Top Level Domain structure and delegation

What is commonly called a 'Domain Name' is actually a combination of a domain-name and a TLD and is written from LEFT to RIGHT with the lowest level in the hierarchy on the left and the highest level on the right.

domain-name.tld e.g. example.com

In the case of the gTLDs, e.g. .com, .net etc., the user part of the delegated name - the name the user registered - is called a Second Level Domain (SLD); it is the second level in the hierarchy. The user part is frequently simply referred to as the SLD. So the Domain Name in the example above can be re-defined to consist of:

sld.tld e.g. example.com

The term Second Level Domain (SLD) is much less useful with ccTLDs where the user registered part is frequently the Third Level Domain e.g.:

example.co.in

example.com.br

The term Second Level Domain (SLD) provides technical precision but can be confusing - unless the precision is required we will continue to use the generic term Domain Name (or simply Domain) to refer to the whole name, e.g. a Domain Name is example.com or example.co.in.

Authority and Delegation

The concepts of Delegation and Authority lie at the core of the domain name system hierarchy. The Authority for the root domain lies with the Internet Corporation for Assigned Names and Numbers (ICANN). Since 1998 ICANN, a non-profit organisation, has assumed this responsibility from the US government.
The gTLDs are authoritatively administered by ICANN and delegated to a series of accredited registrars. The ccTLDs are delegated to the individual countries for administration purposes. Figure 5.2 above shows how any authority may in turn delegate to lower levels in the hierarchy; in other words it may delegate anything for which it is authoritative. Each layer in the hierarchy may delegate the authoritative control to the next lower level.

Countries with more centralized governments, like India and others, have opted for functional segmentation in their delegation models (e.g. .co = company, .ac = academic etc.). Thus mycompany.co.in is the 'Domain Name' of 'mycompany' registered as a company with the Indian registration authority.
By reading a domain name from RIGHT to LEFT you can track its delegation. This unit of delegation is usually referred to as a 'zone' in standards documentation.

DNS ORGANISATION AND STRUCTURE

The Internet's DNS exactly maps the 'Domain Name' delegation structure described above. There is a DNS server running at each level in the delegated hierarchy and the responsibility for running the DNS lies with the AUTHORITATIVE control at that level.
Figure 5.3 shows this diagrammatically.



Figure 5.3: DNS mapped to Domain Delegation

The Root Servers (Root DNS) are the responsibility of ICANN but operated by a consortium under a delegation agreement. ICANN created the Root Servers Systems Advisory Committee (RSSAC) to provide advice and guidance as to the operation and development of this critical resource. The IETF was requested by the RSSAC to develop the engineering standards for operation of the Root-Servers. This request resulted in the publication of RFC 2870.
There are currently (mid 2003) 13 root-servers world-wide.

The Root-Servers are known to every public DNS server in the world.
The TLD servers (ccTLD and gTLD) are operated by a variety of agencies and registrars under a fairly complex set of agreements by Registry Operators.

The Authority and therefore the responsibility for the User (or 'Domain Name') DNS servers lie with the owner of the domain. In many cases this responsibility is delegated by the owner of the Domain to an ISP, Web Hosting Company or increasingly a registrar. Many companies, however, elect to run their own DNS servers and even delegate the Authority and responsibility for sub-domain DNS servers to separate parts of the organisation.

When any DNS server cannot answer (resolve) a request for a domain name from a host, e.g. example.com, the query is passed to a root-server, which will direct the query to the appropriate TLD DNS server, which will in turn direct it to the appropriate Domain (User) DNS server.

DNS QUERIES

The major task carried out by a DNS server is to respond to queries (questions) from a local or remote resolver or other DNS acting on behalf of a resolver. A query would be something like 'what is the IP address of abc.example.com'.
A DNS server may receive such a query for any domain. A DNS server may be configured to be authoritative for some domains (if any), and to act as a slave, caching server, forwarder, or many other combinations for others.
Most of the queries that a DNS server will receive will be for domains for which it has no knowledge i.e. for which it has no local zone files. The DNS software typically allows the name server to respond in different ways to queries about which it has no knowledge.
There are three types of queries defined for DNS:
1. A recursive query - the complete answer to the question is always returned. DNS servers are not required to support recursive queries.
2. An Iterative (or non-recursive) query - where the complete answer MAY be returned. All DNS servers must support Iterative queries.
3. An Inverse query - where the user wants to know the domain name given a resource record.

Note: The process called Reverse Mapping (returns a host name given an IP address) does not use Inverse queries but instead uses Recursive and Iterative (non-recursive) queries using the special domain name IN-ADDR.ARPA. Historically reverse IP mapping was not mandatory. Many systems however now use reverse mapping for security and simple authentication schemes so proper implementation and maintenance is now practically essential.
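As a worked example, to reverse map the address 10.15.25.1 a resolver queries for the PTR record of 1.25.15.10.in-addr.arpa - the octets are reversed and the IN-ADDR.ARPA suffix is appended.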

Recursive Queries

A recursive query is one where the DNS server will fully answer the query (or give an error). DNS servers are not required to support recursive queries; the resolver (or another DNS server acting recursively on behalf of another resolver) negotiates the use of the recursive service using a bit in the query header.

There are three possible responses to a recursive query:

1. The answer to the query accompanied by any CNAME records (aliases) that may be useful. The response will indicate whether the data is authoritative or cached.
2. An error indicating the domain or host does not exist (NXDOMAIN). This response may also contain CNAME records that pointed to the non-existing host.
3. A temporary error indication - e.g. can't access other DNS's due to network error etc.
In a recursive query a DNS server will, on behalf of the client (resolver), chase the trail of DNS across the universe to get the real answer to the question. The journeys of a simple query such as 'what is the IP address of xyz.example.com' to a DNS server which supports recursive queries but is not authoritative for example.com could look something like this:

1. The resolver on a host sends the query 'what is the IP address of xyz.example.com' to its locally configured DNS server.
2. The DNS server looks up xyz.example.com in its local tables (its cache) - not found.
3. The DNS server sends a query to a root-server for the IP of xyz.example.com.
4. The root-server replies with a referral to the TLD servers for .com.
5. The DNS server sends the query 'what is the IP address of xyz.example.com' to a .com TLD server.
6. The TLD server replies with a referral to the name servers for example.com.
7. The DNS server sends the query 'what is the IP address of xyz.example.com' to a name server for example.com.
8. The zone file defines a CNAME record which shows xyz is aliased to abc. The DNS server returns both the CNAME and the A record for abc.
9. The response abc=x.x.x.x (with the CNAME record xyz=abc) is sent to the original client resolver. Transaction complete.

Iterative (non-recursive) Queries

An Iterative (or non-recursive) query is one where the DNS server may provide a partial answer to the query (or give an error). DNS servers must support non-recursive queries.

There are four possible responses to a non-recursive query:
1. The answer to the query accompanied by any CNAME records (aliases) that may be useful. The response will indicate whether the data is authoritative or cached.
2. An error indicating the domain or host does not exist (NXDOMAIN). This response may also contain CNAME records that pointed to the non-existing host.
3. A temporary error indication - e.g. can't access other DNS's due to network error etc.
4. A referral: the name(s) and IP address(es) of one or more name servers that are closer to the requested domain name. This may, or may not be, the authoritative name server for the target domain.
The journeys of a simple query such as 'what is the IP address of xyz.example.com' to a DNS server which supports Iterative (non-recursive) queries but is not authoritative for example.com could look something like this:
1. The resolver on a host sends the query 'what is the IP address of xyz.example.com' to its locally configured DNS server.
2. The DNS server looks up xyz.example.com in its local tables (its cache) - not found.
3. The DNS server replies with a referral containing the root-servers.
4. The resolver sends the query for the IP of xyz.example.com to a root-server.
5. The root-server replies with a referral to the TLD servers for .com.
6. The resolver sends the query 'what is the IP address of xyz.example.com' to a .com TLD server.
7. The TLD server replies with a referral to the name servers for example.com.
8. The resolver sends the query 'what is the IP address of xyz.example.com' to a name server for example.com.
9. The zone file defines a CNAME record which shows xyz is aliased to abc. The DNS server returns both the CNAME and the A record for abc.
10. Transaction complete.

Note: The above sequence is highly artificial since the resolver on Windows and most *nix systems is a stub resolver - which is defined in the standards to be a minimal resolver which cannot follow referrals. If you reconfigure your local PC or workstation to point to a DNS server that only supports Iterative queries, it will not work.

Inverse Queries

An Inverse query maps a resource record to a domain. An example Inverse query would be 'what is the domain name for this MX record'. Inverse query support is optional and a DNS server is permitted to return a 'Not Implemented' response.
Inverse queries are NOT used to find a host name given an IP address. That process, called Reverse Mapping (look-up), uses recursive and Iterative (non-recursive) queries with the special domain name IN-ADDR.ARPA.

Zone Updates

The initial design of DNS allowed for changes to be propagated using Zone Transfer (AXFR) but the world of the Internet was simpler and more sedate in those days (1987). The desire to speed up the process of zone update propagation while minimizing resources used has resulted in a number of changes to this aspect of DNS design and implementation from simple - but effective - tinkering such as Incremental Zone Transfer (IXFR) and Notify messages to the concept of Dynamic Updates which is still not widely deployed.
Warning: While zone transfers are generally essential for the operation of DNS systems, they are also a source of threat. A slave DNS can become poisoned if it accepts zone updates from a malicious source. Care should be taken during configuration to ensure that, as a minimum, the 'slave' will only accept transfers from known sources.
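A minimal sketch of such a restriction in BIND's named.conf (the zone name and addresses are placeholders): the master limits who may pull the zone with allow-transfer, and the slave names its trusted master in the masters clause:

// on the master
zone "example.com" {
    type master;
    file "master/example.com";
    allow-transfer { 192.168.1.2; };   // the slave's address only
};

// on the slave
zone "example.com" {
    type slave;
    file "slave/example.com";
    masters { 192.168.1.1; };          // the master's address
};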

Full Zone Update (AXFR)

The original DNS specifications (RFC 1034 & RFC 1035) envisaged that slave (or secondary) DNS servers would 'poll' the 'master'. The time between such 'polling' is determined by the REFRESH value on the domain's SOA Resource Record.

The polling process is accomplished by the 'slave' sending a query to the 'master' and requesting the latest SOA record. If the SERIAL number of the record is different from the current one maintained by the 'slave', a zone transfer (AXFR) is requested. This is why it is vital to be very disciplined about updating the SOA serial number every time anything changes in ANY of the zone records. Zone transfers are always carried out using TCP on port 53 (normal DNS query operations use UDP on port 53).

Incremental Zone Update (IXFR)

Transferring very large zone files can take a long time and waste bandwidth and other resources. This is especially wasteful if only a single record has been changed! RFC 1995 introduced Incremental Zone Transfers (IXFR) which as the name suggests allows the 'slave' and 'master' to transfer only those records that have changed.
The process works as for AXFR. The 'slave' sends a query for the domain's SOA Resource Record every REFRESH interval. If the SERIAL value of the SOA record has changed, the 'slave' requests a Zone Transfer and indicates whether or not it is capable of accepting an Incremental Transfer (IXFR). If both 'master' and 'slave' support the feature, an Incremental Transfer (IXFR) takes place; otherwise a Full Zone Transfer (AXFR) takes place. Incremental Zone Transfers use TCP on port 53 (normal DNS query operations use UDP on port 53).
The default mode for BIND when acting as a 'slave' is to use IXFR unless it is configured not to using the request-ixfr parameter in the server or options section of the named.conf file.
The default mode for BIND when acting as a 'master' is to use IXFR only when the zone is dynamic. The use of IXFR is controlled using the provide-ixfr parameter in the server or options section of the named.conf file.
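A minimal sketch of how these parameters might appear in named.conf (check the defaults for your BIND version before relying on this):

options {
    request-ixfr yes;   // when acting as a slave, ask for incremental transfers
    provide-ixfr yes;   // when acting as a master, offer incremental transfers
};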

Notify (NOTIFY)

RFC 1912 recommends a REFRESH interval of up to 12 hours for the SOA Resource Record. This means that changes to the 'master' DNS may not be visible at the 'slave' DNS for up to 12 hours. In a dynamic environment this may be unacceptable.

RFC 1996 introduced a scheme whereby the master will send a NOTIFY message to the slave DNS systems that a change MAY have occurred in the domain records. The 'slave' on receipt of the NOTIFY will request the latest SOA Resource Record and if the SERIAL value is different will attempt a Zone Transfer using either a full Zone Transfer (AXFR) or an Incremental Transfer (IXFR).
NOTIFY behavior in BIND is controlled by notify, also-notify and notify-source parameters in the zone or options statements of the named.conf file.
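A hedged example of these parameters in a zone statement (addresses and names are placeholders):

zone "example.com" {
    type master;
    file "master/example.com";
    notify yes;
    also-notify { 192.168.1.3; };   // notify a slave not listed in the NS records
};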

Dynamic Update

The classic method of updating Zone Resource Records is to manually edit the zone file and then stop and start the name server to propagate the changes. When the volume of changes reaches a certain level this can become operationally unacceptable - especially considering that in organizations which handle large numbers of Zone Files, such as service providers, BIND itself can take a long time to restart as it plows through very large numbers of zone statements.
The 'holy grail' of DNS is to provide a method of dynamically changing the DNS records while DNS continues to service requests.
There are two architectural approaches to solving this problem:
1. Allow 'run-time' updating of the Zone Records from an external source/application.
2. Directly feed BIND (say via one of its two APIs) from a database which can be dynamically updated.
RFC 2136 takes the first approach and defines a process where zone records can be updated from an external source. The key limitation in this specification is that a new domain cannot be added dynamically. All other records within an existing zone can be added, changed or deleted. In fact this limitation also applies to both of BIND's APIs.

As part of this specification the term Primary Master is coined to describe the Name Server defined in the SOA Resource Record for the zone. The significance of this term is that when dynamically updating records it is essential to update only one server even though there may be multiple master servers for the zone. In order to solve this problem a 'boss' server must be selected; this 'boss' server, termed the Primary Master, has no special characteristics other than that it is defined as the Name Server in the SOA record and may appear in an allow-update clause to control the update process.
While normally associated with Secure DNS features (TSIG - RFC 2845 & TKEY - RFC 2930), Dynamic DNS (DDNS) does not REQUIRE TSIG/TKEY. However there is a good reason to associate the two specifications when you consider that by enabling Dynamic DNS you are opening up the possibility of master zone file corruption or poisoning. Simple IP address protection (ACL) can be configured into BIND but this provides - at best - limited protection. For that reason serious users of Dynamic DNS will always use TSIG/TKEY procedures to authenticate incoming requests.
Dynamic Updating defaults to deny from all hosts. Control of Dynamic Update is provided by the BIND allow-update (usable with and without TSIG/TKEY) and update-policy (only usable with TSIG/TKEY) clauses in the zone or options statements of the named.conf file.
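A minimal sketch of enabling dynamic updates with an IP-based ACL only (which, as noted above, provides limited protection; the address is a placeholder):

zone "example.com" {
    type master;
    file "master/example.com";
    allow-update { 192.168.1.10; };   // host permitted to send dynamic updates
};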
There are a number of Open Source tools which will initiate Dynamic DNS updates; these include dnsupdate (not the same as DNSUpdate) and nsupdate, which is distributed with bind-utils.
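As a sketch, an interactive dynamic update with nsupdate might look like this; the server, zone and record are placeholders, and the master must permit the update via allow-update or update-policy:

# nsupdate
> server 192.168.1.1
> zone example.com
> update add newhost.example.com. 86400 A 192.168.1.50
> send
> quit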

Alternative Dynamic DNS Approaches

As noted above the major limitation in the standard Dynamic DNS (RFC 2136) approach is that new domains cannot be created dynamically.


BIND-DLZ takes a much more radical approach and, using a serious patch to BIND, allows replacement of all zone files with a single zone file which defines a database entry. The database support, which includes most of the major databases (MySQL, PostgreSQL, BDB and LDAP among others), allows the addition of new domains as well as changes to pre-existing domains without the need to stop and start BIND. As with all things in life there is a trade-off, and performance can drop precipitously. Current work being carried out (early 2004) with a high-performance Berkeley DB (BDB) is showing excellent results approaching raw BIND performance.