IBM SG24-5131-00 manual

A good user manual

The rules should oblige the seller to give the purchaser the operating instructions for IBM SG24-5131-00 along with the item. Missing instructions, or false information given to the customer, are grounds for a complaint of nonconformity of the goods with the contract. By law, a customer can receive instructions in non-paper form; graphic and electronic manuals, as well as instructional videos, have lately become widespread. A necessary precondition is that the instructions be unambiguous and legible.

What is an instruction?

The term originates from the Latin word „instructio”, which means organizing. An instruction for IBM SG24-5131-00 therefore describes a process. Its purpose is to teach and to ease start-up, use of the item, or the performance of certain activities. An instruction is a compilation of information about an item or a service; it serves as a guide.

Unfortunately, only a few customers take the time to read the instructions for IBM SG24-5131-00. A good user manual introduces a number of additional functions of the purchased item and also helps to avoid most defects.

What should a perfect user manual contain?

First and foremost, a user manual for IBM SG24-5131-00 should contain:
- information concerning the technical data of IBM SG24-5131-00
- name of the manufacturer and a year of construction of the IBM SG24-5131-00 item
- rules of operation, control and maintenance of the IBM SG24-5131-00 item
- safety signs and mark certificates which confirm compatibility with appropriate standards

Why don't we read the manuals?

Usually it results from a lack of time and from certainty about the functions of purchased items. Unfortunately, networking and start-up of IBM SG24-5131-00 alone are not enough. An instruction contains a number of hints concerning individual functions, safety rules, maintenance methods (including which means should be used), possible defects of IBM SG24-5131-00, and methods of problem resolution. If one still cannot find the answer, one is directed to the IBM service. Lately, animated manuals and instructional videos have become quite popular among customers. These kinds of user manuals are effective; they help ensure that a customer becomes familiar with the whole material and does not skip complicated technical information about IBM SG24-5131-00.

Why should one read the manuals?

It is mostly in the manuals that we find details concerning the construction and capabilities of the IBM SG24-5131-00 item and the use of its accessories, as well as information concerning all its functions and facilities.

After a successful purchase, one should take a moment to get to know every part of the instructions. Manuals today are carefully prepared and translated so that their users can fully understand them. The manuals serve as an informational aid.

Table of contents for the manual

  • Page 1

    SG24-5131-00 International Technical Support Organization http://www.redbooks.ibm.com IBM Certification Study Guide AIX HACMP David Thiessen, Achim Rehor, Reinhard Zettler[...]

  • Page 2

    [...]

  • Page 3

    IBM Certification Study Guide AIX HACMP May 1999 SG24-5131-00 International Technical Support Organization[...]

  • Page 4

    © Copyright International Business Machines Corporation 1999. All rights reserved. Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. First Edition (May 1999) This edition applies to H[...]

  • Page 5

    © Copyright IBM Corp. 1999 iii Contents Figures ... ix Tables ... xi Preface ...[...]

  • Page 6

    iv IBM Certification Study Guide AIX HACMP Chapter 3. Cluster Hardware and Software Preparation ... 51 3.1 Cluster Node Setup ... 51 3.1.1 Adapter Slot Placement ... 51 3.1.2 Rootvg Mirroring ...[...]

  • Page 7

    v 5.1.3 Event Notification ... 122 5.1.4 Event Recovery and Retry ... 122 5.1.5 Notes on Customizing Event Processing ... 123 5.1.6 Event Emulator ...[...]

  • Page 8

    vi IBM Certification Study Guide AIX HACMP 8.1.1 The clstat Command ... 152 8.1.2 Monitoring Clusters using HAView ... 152 8.1.3 Cluster Log Files ... 153 8.2 Starting and Stopp[...]

  • Page 9

    vii 9.3 VSDs - RVSDs ... 190 9.3.1 Virtual Shared Disk (VSDs) ... 190 9.3.2 Recoverable Virtual Shared Disk ... 193 9.4 SP Switch as an HACMP Network ...[...]

  • Page 10

    viii IBM Certification Study Guide AIX HACMP[...]

  • Page 11

    © Copyright IBM Corp. 1999 ix Figures 1. Basic SSA Configuration ... 17 2. Hot-Standby Configuration ... 31 3. Mutual Takeover Configuration ...[...]

  • Page 12

    x IBM Certification Study Guide AIX HACMP[...]

  • Page 13

    © Copyright IBM Corp. 1999 xi Tables 1. AIX Version 4 HACMP Installation and Implementation ... 4 2. AIX Version 4 HACMP System Administration ... 5 3. Hardware Requirements for the Different HACMP Versions ... 8 4. Number of Adapter Slots in Ea[...]

  • Page 14

    xii IBM Certification Study Guide AIX HACMP[...]

  • Page 15

    xiii Preface The AIX and RS/6000 Certifications offered through the Professional Certification Program from IBM are designed to validate the skills required of technical professionals who work in the powerful and often complex environments of AIX and RS/6000. A complete set of professional certifications is available. It includes: • IBM Ce [...]

  • Page 16

    xiv IBM Certification Study Guide AIX HACMP • AIX parameters that are affected by an HACMP installation, and their correct settings • The cluster and resource configuration process, including how to choose the best resource configuration for a customer requirement • Customization of the standard HACMP facilities to satisfy special custo[...]

  • Page 17

    xv POWERparallel Systems area, known as the SP1 at that time. In 1997 he began working on HACMP as the Service Groups for HACMP and RS/6000 SP merged into one. He holds a diploma in Computer Science from the University of Frankfurt in Germany. This is his first redbook. Reinhard Zettler is an AIX Software Engineer in Munich, Germany. He has two[...]

  • Page 18

    xvi IBM Certification Study Guide AIX HACMP[...]

  • Page 19

    © Copyright IBM Corp. 1999 1 Chapter 1. Certification Overview This chapter provides an overview of the skill requirements for obtaining an IBM Certified Specialist - AIX HACMP certification. The following chapters are designed to provide a comprehensive review of specific topics that are essential for obtaining the certification. 1.1 IBM Cert[...]

  • Page 20

    2 IBM Certification Study Guide AIX HACMP 1.2 Certification Exam Objectives The following objectives were used as a basis for what is required when the certification exam was developed. Some of these topics have been regrouped to provide better organization when discussed in this publication. Section 1 - Preinstallation The following items s[...]

  • Page 21

    Certification Overview 3 • Create an application server. • Set up Event Notification. • Set up event notification and pre/post event scripts. • Set up error notification. • Post Configuration Activities. • Configure a client notification and ARP update. • Implement a test plan. • Create a snapshot. • Create a customizatio[...]

  • Page 22

    4 IBM Certification Study Guide AIX HACMP 1.3 Certification Education Courses Courses and publications are offered to help you prepare for the certification tests. These courses are recommended, but not required, before taking a certification test. At the printing of this guide, the following courses are available. For a current list, plea[...]

  • Page 23

    Certification Overview 5 The following table outlines information about the next course. Table 2. AIX Version 4 HACMP System Administration Course Number Q1150 (USA); AU50 (Worldwide) Course Duration Five days Course Abstract This course teaches the student the skills required to administer an HACMP cluster on an ongoing basis after it is [...]

  • Page 24

    6 IBM Certification Study Guide AIX HACMP[...]

  • Page 25

    © Copyright IBM Corp. 1999 7 Chapter 2. Cluster Planning The area of cluster planning is a large one. Not only does it include planning for the types of hardware (CPUs, networks, disks) to be used in the cluster, but it also includes other aspects. These include resource planning, that is, planning the desired behavior of the cluster in failur[...]

  • Page 26

    8 IBM Certification Study Guide AIX HACMP RISC System/6000 models as nodes in an HACMP 4.1 for AIX, HACMP 4.2 for AIX, or HACMP 4.3 for AIX cluster. Table 3. Hardware Requirements for the Different HACMP Versions 1 AIX 4.3.2 required For a detailed description of system models supported by HACMP/6000 and HACMP/ES, you should refer to t[...]

  • Page 27

    Cluster Planning 9 Much of the decision centers around the following areas: • Processor capacity • Application requirements • Anticipated growth requirements • I/O slot requirements These paradigms are certainly not new ones, and are also important considerations when choosing a processor for a single-system environment. However, when d[...]

  • Page 28

    10 IBM Certification Study Guide AIX HACMP Your slot configuration must also allow for the disk I/O adapters you need to support the cluster’s shared disk (volume group) configuration. If you intend to use disk mirroring for shared volume groups, which is strongly recommended, then you will need to use slots for additional disk I/O adapt[...]

  • Page 29

    Cluster Planning 11 2.2 Cluster Networks HACMP differentiates between two major types of networks: TCP/IP networks and non-TCP/IP networks. HACMP utilizes both of them for exchanging heartbeats. HACMP uses these heartbeats to diagnose failures in the cluster. Non-TCP/IP networks are used to distinguish an actual hardware failure from the failu[...]

  • Page 30

    12 IBM Certification Study Guide AIX HACMP • FDDI • SP Switch • SLIP • SOCC • Token-Ring As an independent, layered component of AIX, the HACMP for AIX software works with most TCP/IP-based networks. HACMP for AIX has been tested with standard Ethernet interfaces (en*) but not with IEEE 802.3 Ethernet interfaces (et*), where * [...]

  • Page 31

    Cluster Planning 13 Network types also differentiate themselves in the maximum distance they allow between adapters, and in the maximum number of adapters allowed on a physical network. • Ethernet supports 10 and 100 Mbps currently, and supports hardware address swapping. Alternate hardware addresses should be in the form xxxxxxxxxxyy, where[...]

  • Page 32

    14 IBM Certification Study Guide AIX HACMP • SP Switch is a high-speed packet switching network, running on the RS/6000 SP system only. It runs bidirectionally up to 80 MBps, which adds up to 160 MBps of capacity per adapter. This is node-to-node communication and can be done in parallel between every pair of nodes inside an SP. The SP Sw[...]

  • Page 33

    Cluster Planning 15 2.2.2.2 Special Considerations As for TCP/IP networks, there are a number of restrictions on non-TCP/IP networks. These are explained for the three different types in more detail below. Serial (RS232) A serial (RS232) network needs at least one available serial port per cluster node. In case of a cluster consisting of more [...]

  • Page 34

    16 IBM Certification Study Guide AIX HACMP 2 a PCI Multiport Async Card is required in an S7X model, no native ports 3 only one serial port available for customer use, i.e. HACMP In case the number of native serial ports doesn’t match your HACMP cluster configuration needs, you can extend it by adding an eight-port asynchronous adapter, th[...]

  • Page 35

    Cluster Planning 17 SSA subsystems are built up from loops of adapters and disks. A simple example is shown in Figure 1. Figure 1. Basic SSA Configuration Here, a single adapter controls one SSA loop of eight disks. Data can be transferred around the loop, in either direction, at 20 MBps. Consequently, the peak transfer rate of the adapter is [...]

  • Page 36

    18 IBM Certification Study Guide AIX HACMP • 7133 Serial Storage Architecture (SSA) Disk Subsystem Models 010, 500, 020, 600, D40 and T40. The 7133 models 010 and 500 were the first SSA products announced in 1995 with the revolutionary new Serial Storage Architecture. Some IBM customers still use the Models 010 and 500, but these have bee[...]

  • Page 37

    Cluster Planning 19 2.3.1.1 Disk Capacities Table 8 lists the different SSA disks, and provides an overview of their characteristics. Table 8. SSA Disks 2.3.1.2 Supported and Non-Supported Adapters Table 9 lists the different SSA adapters and presents an overview of their characteristics. Table 9. SSA Adapters Supported RAID level [...]

  • Page 38

    20 IBM Certification Study Guide AIX HACMP 1 See 2.3.1.3, “Rules for SSA Loops” on page 20 for more information. The following rules apply to SSA Adapters: • You cannot have more than four adapters in a single system. • The MCA SSA 4-Port RAID Adapter (FC 6217) and PCI SSA 4-Port RAID Adapter (FC 6218) are not useful for HACMP, becaus[...]

  • Page 39

    Cluster Planning 21 • A maximum of 48 devices can be connected in a particular SSA loop. • Only one pair of adapter connectors can be connected in a particular SSA loop. • Member disk drives of an array can be on either SSA loop. For SSA loops that include a Micro Channel Enhanced SSA Multi-initiator/RAID EL adapter, Feature 6215 or a PCI S[...]
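
The first two loop rules quoted on this page lend themselves to a mechanical check. The following is a minimal sketch, not an IBM tool: the function name and the data layout are hypothetical, only disks are counted as devices, and "one pair of adapter connectors" is read as one pair per adapter. All three are assumptions, since the excerpt is truncated.

```python
from collections import Counter

MAX_DEVICES_PER_LOOP = 48  # limit quoted in the excerpt above


def check_ssa_loop(disks, adapter_connections):
    """Return a list of rule violations for one SSA loop.

    disks: names of the disk drives cabled into the loop.
    adapter_connections: one entry per connector pair cabled into the
        loop, naming the adapter it belongs to (hypothetical layout).
    """
    problems = []
    if len(disks) > MAX_DEVICES_PER_LOOP:
        problems.append(
            f"{len(disks)} devices exceed the limit of {MAX_DEVICES_PER_LOOP}"
        )
    # Assumption: the connector rule applies per adapter.
    for adapter, pairs in Counter(adapter_connections).items():
        if pairs > 1:
            problems.append(f"{adapter} has {pairs} connector pairs in the same loop")
    return problems


eight_disks = [f"pdisk{i}" for i in range(8)]
print(check_ssa_loop(eight_disks, ["ssa0", "ssa1"]))
```

An empty list means the loop passes both checks; the remaining rules on the page are truncated in this excerpt and are therefore not modelled.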

  • Page 40

    22 IBM Certification Study Guide AIX HACMP 2.3.1.4 RAID vs. Non-RAID RAID Technology RAID is an acronym for Redundant Array of Independent Disks. Disk arrays are groups of disk drives that work together to achieve higher data-transfer and I/O rates than those provided by single large drives. Arrays can also provide data redundancy so that no [...]

  • Page 41

    Cluster Planning 23 RAID Levels 2 and 3 RAID 2 and RAID 3 are parallel process array mechanisms, where all drives in the array operate in unison. Similar to data striping, information to be written to disk is split into chunks (a fixed amount of data), and each chunk is written out to the same physical position on separate disks (in parallel). [...]

  • Page 42

    24 IBM Certification Study Guide AIX HACMP As with RAID 3, in the event of disk failure, the information can be rebuilt from the remaining drives. RAID level 5 array also uses parity information, though it is still important to make regular backups of the data in the array. RAID level 5 stripes data across all of the drives in the array, o[...]
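
How parity allows a lost chunk to be rebuilt from the remaining drives can be shown in a few lines. This is a minimal sketch that assumes the parity is the byte-wise XOR of the data chunks, which is the usual scheme; the excerpt itself does not spell out the parity function, and the chunk contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data chunks of one stripe (illustrative contents).
data = [b"node", b"disk", b"heat"]
parity = xor_blocks(data)  # stored on a further drive

# The drive holding data[1] fails: XOR of the survivors and the parity
# yields the missing chunk again.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same property explains why such an array tolerates only one failed drive per stripe: with two chunks missing, the XOR of the rest no longer determines either of them.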

  • Page 43

    Cluster Planning 25 • Array member drives and spares must be on same loop (cannot span A and B loops) on the adapter. • You cannot boot (ipl) from a RAID. 2.3.1.5 Advantages Because SSA allows SCSI-2 mapping, all functions associated with initiators, targets, and logical units are translatable. Therefore, SSA can use the same command desc[...]

  • Page 44

    26 IBM Certification Study Guide AIX HACMP 2.3.2 SCSI Disks After the announcement of the 7133 SSA Disk Subsystems, the SCSI Disk subsystems became less common in HACMP clusters. However, the 7135 RAIDiant Array (Model 110 and 210) and other SCSI Subsystems are still in use at many customer sites. We will not describe other SCSI Subsystems s[...]

  • Page 45

    Cluster Planning 27 • Enhanced SCSI-2 Differential Fast/Wide Adapter/A (MCA, FC: 2412, Adapter Label: 4-C); not usable with 7135-110 • SCSI-2 Fast/Wide Differential Adapter (PCI, FC: 6209, Adapter Label: 4-B) • DE Ultra SCSI Adapter (PCI, FC: 6207, Adapter Label: 4-L); not usable with 7135-110 2.3.2.4 Advantages - Disadvantages The 71[...]

  • Page 46

    28 IBM Certification Study Guide AIX HACMP withdraw the 7135 RAIDiant Systems from marketing because it is equally possible to configure RAID on the SSA Subsystems. 2.4 Resource Planning HACMP provides a highly available environment by identifying a set of cluster-wide resources essential to uninterrupted processing, and then defining relati[...]

  • Page 47

    Cluster Planning 29 • Cascading • Rotating • Concurrent Each of these types describes a different set of relationships between nodes in the cluster, and a different set of behaviors upon nodes entering and leaving the cluster. Cascading Resource Groups: All nodes in a cascading resource group are assigned priorities for that resource grou[...]

  • Page 48

    30 IBM Certification Study Guide AIX HACMP reintegration, a node remains as a standby and does not take back any of the resources that it had initially served. Concurrent Resource Groups: A concurrent resource group may be shared simultaneously by multiple nodes. The resources that can be part of a concurrent resource group are limited to vol[...]

  • Page 49

    Cluster Planning 31 Figure 2. Hot-Standby Configuration In this configuration, there is one cascading resource group consisting of the four disks, hdisk1 to hdisk4, and their constituent volume groups and file systems. Node 1 has a priority of 1 for this resource group while node 2 has a priority of 2. During normal operations, node 1 provid[...]
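
The ownership rule behind this hot-standby example can be sketched as a small function. This is an illustration only, not HACMP code: the function and node names are hypothetical, and it assumes that priority 1 is the highest, as the example with node 1 and node 2 suggests.

```python
def cascading_owner(priorities, active_nodes):
    """Pick the active node with the best priority (1 is the highest).

    priorities: node name -> priority number for one resource group.
    active_nodes: set of nodes that are currently up in the cluster.
    Returns None when no participating node is active.
    """
    candidates = [node for node in priorities if node in active_nodes]
    if not candidates:
        return None
    return min(candidates, key=priorities.get)


# Priorities taken from the hot-standby example; node names are made up.
priorities = {"node1": 1, "node2": 2}
print(cascading_owner(priorities, {"node1", "node2"}))  # normal operation
print(cascading_owner(priorities, {"node2"}))           # node1 has failed
```

Calling the function again after node1 rejoins returns node1, which mirrors the cascading behaviour of handing the group back to the highest-priority node.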

  • Page 50

    32 IBM Certification Study Guide AIX HACMP the cluster becomes a standby node. You must choose a rotating standby configuration if you do not want a break in service during reintegration. Since takeover nodes continue providing services until they have to leave the cluster, you should configure your cluster with nodes of equal power. While[...]
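
The reintegration behaviour of a rotating configuration can be contrasted with a priority-based one in a short sketch. The function name is hypothetical, and the branch for a group that nobody currently holds is an assumption added so that the function is total; the excerpt does not cover that case.

```python
def rotating_owner_after_rejoin(current_owner, rejoining_node):
    """Owner of a rotating resource group after a node rejoins.

    The node that took over keeps the resources and the rejoining node
    becomes a standby, so there is no second break in service.
    Assumption: if nobody holds the group, the rejoining node acquires it.
    """
    if current_owner is None:
        return rejoining_node
    return current_owner


# node2 took over while node1 was down; node1 now rejoins the cluster.
print(rotating_owner_after_rejoin("node2", "node1"))
```

Because ownership does not move back, either node may end up serving the resources for a long time, which is why the text recommends nodes of equal power.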

  • Page 51

    Cluster Planning 33 When a failed node reintegrates into the cluster, it takes back the resource group for which it has the highest priority. Therefore, even in this configuration, there is a break in service during reintegration. Of course, if you look at it from the point of view of performance, this is the best thing to do, since you have one[...]

  • Page 52

    34 IBM Certification Study Guide AIX HACMP Here the resource groups are the same as the ones in the mutual takeover configuration. Also, similar to the previous configuration, nodes 1 and 2 each have priorities of 1 for one of the resource groups, A or B. The only thing different in this configuration is that there is a third node which has [...]

  • Page 53

    Cluster Planning 35 • Design the network topology • Define a network mask for your site • Define IP addresses (adapter identifiers) for each node’s service and standby adapters. • Define a boot address for each service adapter that can be taken over, if you are using IP address takeover or rotating resources. • Define an alternate h[...]

  • Page 54

    36 IBM Certification Study Guide AIX HACMP Dual Network A dual-network setup has two separate networks for communication. Nodes are connected to two networks, and each node has two service adapters available to clients. If one network fails, the remaining network can still function, connecting nodes and providing resource access to clients. I[...]

  • Page 55

    Cluster Planning 37 The following diagram shows a cluster consisting of two nodes and a client. A single public network connects the nodes and the client, and the nodes are linked point-to-point by a private high-speed SOCC connection that provides an alternate path for cluster and lock traffic should the public network fail. Figure 7. A Point-t[...]

  • Page 56

    38 IBM Certification Study Guide AIX HACMP SLIP are considered public networks. Note that a SLIP line, however, does not provide client access. Private A private network provides communication between nodes only; it typically does not allow client access. An SOCC line or an ATM network are also private networks; however, an ATM network[...]

  • Page 57

    Cluster Planning 39 until it assumes the shared IP address. Consequently, Clinfo makes known the boot address for this adapter. In an HACMP for AIX environment on the RS/6000 SP, the SP Ethernet adapters can be configured as service adapters but should not be configured for IP address takeover. For the SP switch network, service addresses used[...]

  • Page 58

    40 IBM Certification Study Guide AIX HACMP service label (address) instead of the boot label. If the node should fail, a takeover node acquires the failed node’s service address on its standby adapter, thus making the failure transparent to clients using that specific service address. During the reintegration of the failed node, which come[...]

  • Page 59

    Cluster Planning 41 If you do not use Hardware Address Takeover, the ARP cache of clients can be updated by adding the clients’ IP addresses to the PING_CLIENT_LIST variable in the /usr/sbin/cluster/etc/clinfo.rc file. 2.4.4 NFS Exports and NFS Mounts There are two items concerning NFS when doing the configuration of a Resource Group: Fi[...]

  • Page 60

    42 IBM Certification Study Guide AIX HACMP application on the takeover node when a fallover occurs. For more information about creating application server resources, see the HACMP for AIX, Version 4.3: Installation Guide, SC23-4278. 2.5.1 Performance Requirements In order to plan your application’s needs, you must have a thorough underst[...]

  • Page 61

    Cluster Planning 43 2.5.3 Licensing Methods Some vendors require a unique license for each processor that runs an application, which means that you must license-protect the application by incorporating processor-specific information into the application when it is installed. As a result, it is possible that even though the HACMP for AIX software[...]

  • Page 62

    44 IBM Certification Study Guide AIX HACMP 2.6 Customization Planning The Cluster Manager’s ability to recognize a specific series of events and subevents permits a very flexible customization scheme. The HACMP for AIX software provides an event customization facility that allows you to tailor cluster event processing to your site. 2.6.1 [...]

  • Page 63

    Cluster Planning 45 event to inform system administrators that traffic may have to be rerouted. Afterwards, you can use a network_up notification event to inform system administrators that traffic can again be serviced through the restored network. 2.6.1.3 Predictive Event Error Correction You can specify a command that attempts to recover[...]

  • Page 64

    46 IBM Certification Study Guide AIX HACMP 2.6.2.1 Single Point-of-Failure Hardware Component Recovery As described in 2.2.1.2, “Special Network Considerations” on page 12, the HPS Switch network is one resource that has to be considered as a single point of failure. Since a node can support only one switch adapter, its failure will dis[...]

  • Page 65

    Cluster Planning 47 The above example screen will add a Notification Method to the ODM, so that upon appearance of the HPS_FAULT9_ER entry in the error log, the error notification daemon will trigger the execution of the /usr/sbin/cluster/utilities/clstop -grsy command, which shuts HACMP down gracefully with takeover. In this way, the swi[...]

  • Page 66

    48 IBM Certification Study Guide AIX HACMP 2.7 User ID Planning The following sections describe various aspects of User ID Planning. 2.7.1 Cluster User and Group IDs One of the basic tasks any system administrator must perform is setting up user accounts and groups. All users require accounts to gain access to the system. Every user accou[...]

  • Page 67

    Cluster Planning 49 2.7.2 Cluster Passwords While user and group management is very much facilitated with C-SPOC, the password information still has to be distributed by some other means. If the system is not configured to use NIS or DCE, the system administrator still has to distribute the password information, meaning that found in the /et[...]

  • Page 68

    50 IBM Certification Study Guide AIX HACMP 2.7.3.3 NFS-Mounted Home Directories on Shared Volumes So, a combined approach is used in most cases. In order to make home directories a highly available resource, they have to be part of a resource group and placed on a shared volume. That way, all cluster nodes can access them in case they need t[...]

  • Page 69

    © Copyright IBM Corp. 1999 51 Chapter 3. Cluster Hardware and Software Preparation This chapter covers the steps that are required to prepare the RS/6000 hardware and AIX software for the installation of HACMP and the configuration of the cluster. This includes configuring adapters for TCP/IP, setting up shared volume groups, and mirroring [...]

  • Page 70

    52 IBM Certification Study Guide AIX HACMP mirroring rootvg in order to avoid the impact of the failover time involved in a node failure. In terms of maximizing availability, this technique is just as valid for increasing the availability of a cluster as it is for increasing single-system availability. The following procedure contains info[...]

  • Page 71

    Cluster Hardware and Software Preparation 53 mirrored. If the dump devices are NOT the paging device, that dump logical volume will not be mirrored. 3.1.2.1 Procedure The following steps assume the user has rootvg contained on hdisk0 and is attempting to mirror the rootvg to a new disk: hdisk1. 1. Extend rootvg to hdisk1 by executing the follo[...]

  • Page 72

    54 IBM Certification Study Guide AIX HACMP “-m” option. You should consult documentation on the usage of the “-m” option for mklvcopy. 4. Synchronize the newly created mirrors with the following command: 5. Bosboot to initialize all boot records and devices by executing the following command: where hdisk? is the first hdisk listed [...]

  • Page 73

    Cluster Hardware and Software Preparation 55 3.1.2.2 Necessary APAR Fixes Table 11. Necessary APAR Fixes To determine if either fix is installed on a machine, execute the following: 3.1.3 AIX Prerequisite LPPs In order to install HACMP and HACMP/ES the AIX setup must be in a proper state. The following table gives you the prerequisi[...]

  • Page 74

    56 IBM Certification Study Guide AIX HACMP • nv6000.database.obj 4.1.0.0 • nv6000.Features.obj 4.1.2.0 • nv6000.client.obj 4.1.0.0 and for HAView 4.3 • xlC.rte 3.1.4.0 • nv6000.base.obj 4.1.2.0 • nv6000.database.obj 4.1.2.0 • nv6000.Features.obj 4.1.2.0 • nv6000.client.obj 4.1.2.0 3.1.4 AIX Parameter Settings This section disc[...]

  • Page 75

    Cluster Hardware and Software Preparation 57 and low-water marks. If a process tries to write to a file at the high-water mark, it must wait until enough I/O operations have finished to make the low-water mark. Use the smit chgsys fastpath to set high- and low-water marks on the Change/Show Characteristics of the Operating System screen. By d[...]
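
The interplay of the two marks is easier to follow in a toy model. The sketch below is not AIX code: the class and method names are hypothetical, and the mark values 4 and 2 are chosen only to keep the trace short; they are not recommended settings.

```python
class PacedFile:
    """Toy model of I/O pacing for one file."""

    def __init__(self, high_water, low_water):
        self.high_water = high_water
        self.low_water = low_water
        self.pending = 0      # writes queued but not yet finished
        self.waiting = False  # True while the writer is held back

    def request_write(self):
        """Return True if the write is queued, False if the writer must wait."""
        if self.waiting or self.pending >= self.high_water:
            self.waiting = True
            return False
        self.pending += 1
        return True

    def complete_write(self):
        """One pending write finishes; release the writer at the low-water mark."""
        self.pending -= 1
        if self.waiting and self.pending <= self.low_water:
            self.waiting = False


f = PacedFile(high_water=4, low_water=2)
accepted = [f.request_write() for _ in range(5)]  # the fifth request has to wait
f.complete_write()                                # 3 pending, still above low water
still_waiting = not f.request_write()
f.complete_write()                                # 2 pending, low-water mark reached
print(accepted, still_waiting, f.request_write())
```

The gap between the two marks is what prevents a writer from being woken and put back to sleep on every single completed I/O.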

  • Page 76

    58 IBM Certification Study Guide AIX HACMP 3.1.4.3 Editing the /etc/hosts File and Nameserver Configuration Make sure all nodes can resolve all cluster addresses. See the chapter on planning TCP/IP networks (the section Using HACMP with NIS and DNS) in the HACMP for AIX, Version 4.3: Planning Guide, SC23-4277 for more information on name se[...]

  • Page 77

    Cluster Hardware and Software Preparation 59 3.1.4.5 Editing the /.rhosts File Make sure that each node’s service adapters and boot addresses are listed in the /.rhosts file on each cluster node. Doing so allows the /usr/sbin/cluster/utilities/clruncmd command and the /usr/sbin/cluster/godm daemon to run. The /usr/sbin/cluster/godm daemon[...]
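
Whether every required label is present in a /.rhosts file can be verified with a few lines of script. This is a minimal sketch under stated assumptions: the function name and the adapter labels are hypothetical, and each non-comment line is assumed to start with the host name, optionally followed by a user name.

```python
def missing_rhosts_entries(rhosts_text, required_labels):
    """Return the required labels that do not appear as a host in rhosts_text."""
    listed = set()
    for line in rhosts_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            listed.add(fields[0])  # first field is the host name
    return [label for label in required_labels if label not in listed]


# Hypothetical labels for the service and boot addresses of two nodes.
sample = """\
node1_svc root
node1_boot root
node2_svc root
"""
print(missing_rhosts_entries(
    sample, ["node1_svc", "node1_boot", "node2_svc", "node2_boot"]
))
```

Running such a check on each cluster node in turn would flag a label that was added on one node but forgotten on another.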

  • Page 78

    60 IBM Certification Study Guide AIX HACMP 3.2 Network Connection and Testing The following sections describe important aspects of network connection and testing. 3.2.1 TCP/IP Networks Since there are several types of TCP/IP Networks available within HACMP, there are several different characteristics and some restrictions on them. Charact[...]

  • Page 79

    Cluster Hardware and Software Preparation 61 Figure 9. Connecting Networks to a Hub 3.2.1.2 IP Addresses and Subnets The design of the HACMP for AIX software specifies that: • All client traffic be carried over the service adapter • Standby adapters be hidden from client applications and carry only internal Cluster Manager traffic[...]

  • Page 80

    62 IBM Certification Study Guide AIX HACMP To comply with these rules, pay careful attention to the IP addresses you assign to standby adapters. Standby adapters must be on a separate subnet from the service adapters, even though they are on the same physical network. Placing standby adapters on a different subnet from the service adapter[...]
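
The subnet rule for standby adapters can be checked with Python's standard ipaddress module. The following is a minimal sketch; the function name, the addresses, and the netmask are illustrative assumptions and are not taken from the guide.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """True if both addresses fall into the same subnet under the given netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


# Illustrative addresses: service and standby adapter of one node.
service = "192.168.10.1"
standby = "192.168.11.1"
netmask = "255.255.255.0"

if same_subnet(service, standby, netmask):
    print("standby adapter shares the service subnet: rule violated")
else:
    print("standby adapter is on a separate subnet")
```

Note that the outcome depends on the netmask: with 255.255.0.0 the same two addresses would fall into one subnet and violate the rule.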

  • Page 81

    Cluster Hardware and Software Preparation 63 • Scan the /tmp/hacmp.out file to confirm that the /etc/rc.net script has run successfully. Look for a zero exit status. • If IP address takeover is enabled, confirm that the /etc/rc.net script has run and that the service adapter is on its service address and not on its boot address. • Use the l[...]

  • Page 82

    64 IBM Certification Study Guide AIX HACMP TMSSA Target-mode SSA is only supported with the SSA Multi-Initiator RAID Adapters (Feature #6215 and #6219), Microcode Level 1801 or later. You need at least HACMP Version 4.2.2 with APAR IX75718. 3.2.2.2 Configuring RS232 Use the smit tty fastpath to create a tty device on the nodes. On the re[...]

  • Page 83

    Cluster Hardware and Software Preparation 65 3.2.2.4 Configuring Target Mode SSA The node number on each system needs to be changed from the default of zero to a number. All systems on the SSA loop must have a unique node number. To change the node number use the following command: chdev -l ssar -a node_number=# To show the system’s node [...]

  • Page 84

    66 IBM Certification Study Guide AIX HACMP cat /etc/environment > /dev/tmssay.im on the corresponding node for writing. x and y correspond to the appropriate opposite node number. You should see the first command hanging until the second command is issued, and then showing its output. Target Mode SCSI: After configuration of Target [...]

  • Page 85

    Cluster Hardware and Software Preparation 67 For more information regarding adapters and cabling rules see 2.3.1, “SSA Disks” on page 16 or the following documents: • 7133 SSA Disk Subsystems: Service Guide, SY33-0185-02 • 7133 SSA Disk Subsystem: Operator Guide, GA33-3259-01 • 7133 Models 010 and 020 SSA Disk Subsystems: Installation[...]

  • Page 86

    68 IBM Certification Study Guide AIX HACMP Adapter Definitions By issuing the following command, you can check the correct adapter configuration. In order to work correctly, the adapter must be in the “Available” state: The third column in the adapter device line shows the location of the adapter. Disk Definitions SSA disk drives are rep[...]

  • Page 87

    Cluster Hardware and Software Preparation 69 SSA physical disks: • Are configured as pdisk0, pdisk1,...,pdiskN. • Have errors logged against them in the system error log. • Support a character special file (/dev/pdisk0, /dev/pdisk1,...,/dev/pdiskN). • Support the IOCTL subroutine for servicing and diagnostic functions. • Do not accept [...]

  • Page 88

    70 IBM Certification Study Guide AIX HACMP Configuration Verification This option enables you to display the relationships between physical (pdisk) and logical (hdisk) disks. Format Disk This option enables you to format SSA disk drives. Certify Disk This option enables you to test whether data on an SSA disk drive can be read correctly. D[...]

  • Page 89

    Cluster Hardware and Software Preparation 71 12. Run cfgmgr to install the microcode to adapters. 13. To complete the device driver upgrade, you must now reboot your system. 14. To confirm that the upgrade was a success, type lscfg -vl ssaX where X is 0,1... for all SSA adapters. Check the ROS Level line to see that each adapter has the appropr[...]

  • Page 90

    72 IBM Certification Study Guide AIX HACMP 18. To confirm that the upgrade was a success, type lscfg -vl pdiskX where X is 0,1... for all SSA disks. Check the ROS Level line to see that each disk has the appropriate microcode level (for the correct microcode level see the above mentioned web-site). 3.3.1.4 Configuring a RAID on SSA Disks Disk[...]

  • Page 91

    Cluster Hardware and Software Preparation 73 3.3.2.1 Cabling The following sections describe important information about cabling. SCSI Adapters An overview of SCSI adapters that can be used on a shared SCSI bus is given in 2.3.2.3, “Supported SCSI Adapters” on page 26. For the necessary adapter changes, see 3.3.2.3, “Adapter SCSI ID and [...]

  • Page 92

    74 IBM Certification Study Guide AIX HACMP FC: 2902 or 9202 (2.4m), PN: 67G1260 - OR - FC: 2905 or 9205 (4.5m), PN: 67G1261 - OR - FC: 2912 or 9212 (12m), PN: 67G1262 - OR - FC: 2914 or 9214 (14m), PN: 67G1263 - OR - FC: 2918 or 9218 (18m), PN: 67G1264 • Terminator (T) Included in FC 2422 (Y-Cable), PN: 52G7350 • Cable Interposer (I) F[...]

  • Page 93

    Cluster Hardware and Software Preparation 75 FC: 2426 (0.94m), PN: 52G4234 • 16-Bit SCSI-2 Differential System-to-System Cable FC: 2424 (0.6m), PN: 52G4291 - OR - FC: 2425 (2.5m), PN: 52G4233 This cable is used only if there are more than two nodes attached to the same shared bus. • 16-Bit Differential SCSI Cable (RAID Cable) FC: 2901 or 9201 [...]

  • Page 94

    76 IBM Certification Study Guide AIX HACMP [Figure: shared 16-bit SCSI bus cabling, showing terminators (T) and cable features #2416 (16-bit), #2424, and #2426] Maximum total cable length: 25m[...]

  • Page 95

    Cluster Hardware and Software Preparation 77 Figure 11. 7135-110 RAIDiant Arrays Connected on Two Shared 16-Bit SCSI Buses 3.3.2.3 Adapter SCSI ID and Termination change The SCSI-2 Differential Controller is used to connect to 8-bit disk devices on a shared bus. The SCSI-2 Differential Fast/Wide Adapter/A or Enhanced SCSI-2 Differen[...]

  • Page 96

    78 IBM Certification Study Guide AIX HACMP SCSI-2 Differential Fast/Wide Adapter/A and Enhanced SCSI-2 Differential Fast/Wide Adapter/A) are shown in Figure 12 and Figure 13 respectively. Figure 12. Termination on the SCSI-2 Differential Controller Figure 13. Termination on the SCSI-2 Differential Fast/Wide Adapters 4-2 P/N 43G01[...]

  • Page 97

    Cluster Hardware and Software Preparation 79 The ID of an SCSI adapter, by default, is 7. Since each device on an SCSI bus must have a unique ID, the ID of at least one of the adapters on a shared SCSI bus has to be changed. The procedure to change the ID of an SCSI-2 Differential Controller is: 1. At the command prompt, enter smit chgscsi. 2. [...]
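
The uniqueness requirement for SCSI IDs on a shared bus can be expressed as a small planning check. This is a sketch in Python under stated assumptions: the device labels and the dictionary layout are hypothetical, and the code does not query any real adapter.

```python
from collections import defaultdict
from typing import Dict, List

# Default adapter ID stated in the text above.
DEFAULT_ADAPTER_ID = 7


def conflicting_ids(device_ids: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each SCSI ID used more than once to the devices that share it.

    device_ids is a hypothetical plan: label -> SCSI ID on one shared bus.
    An empty result means every device on the bus has a unique ID.
    """
    by_id: Dict[int, List[str]] = defaultdict(list)
    for label, scsi_id in device_ids.items():
        by_id[scsi_id].append(label)
    return {i: sorted(labels) for i, labels in by_id.items() if len(labels) > 1}


if __name__ == "__main__":
    # Two nodes left at the default ID collide on the shared bus.
    plan = {"nodeA/scsi1": DEFAULT_ADAPTER_ID, "nodeB/scsi1": DEFAULT_ADAPTER_ID, "disk0": 0}
    print(conflicting_ids(plan))
    # Changing one adapter's ID resolves the conflict.
    plan["nodeB/scsi1"] = 6
    print(conflicting_ids(plan))
```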

  • Page 98

    80 IBM Certification Study Guide AIX HACMP 4. Reboot the machine to bring the change into effect. The same task can be executed from the command line by entering: Also with this method, a reboot is required to bring the change into effect. The procedure to change the ID of an SCSI-2 Differential Fast/Wide Adapter/A or Enhanced SCSI-2 Diffe[...]

  • Page 99

    Cluster Hardware and Software Preparation 81 The command line version of this is: As in the case of the SCSI-2 Differential Controller, a system reboot is required to bring the change into effect. The maximum length of the bus, including any internal cabling in disk subsystems, is limited to 19 meters for buses connected to the SCSI-2 Different[...]

  • Page 100

    82 IBM Certification Study Guide AIX HACMP 3.4.1 Creating Shared VGs The following sections contain information about creating non-concurrent VGs and VGs for concurrent access. 3.4.1.1 Creating Non-Concurrent VGs This section covers how to create a shared volume group on the source node using the SMIT interface. Use the smit mkvg fastpath[...]

  • Page 101

    Cluster Hardware and Software Preparation 83 Creating a Concurrent Access Volume Group on Serial Disk Subsystems To use a concurrent access volume group, defined on a serial disk subsystem such as an IBM 7133 disk subsystem, you must create it as a concurrent-capable volume group. A concurrent-capable volume group can be activated (varied on[...]

  • Page 102

    84 IBM Certification Study Guide AIX HACMP Use the smit mkvg fastpath to create a shared volume group. Use the default field values unless your site has other requirements, or unless you are specifically instructed otherwise. Table 15. smit mkvg Options (Concurrent, RAID) 3.4.2 Creating Shared LVs and File Systems Use the smit crjf[...]

  • Page 103

    Cluster Hardware and Software Preparation 85 the journaled file system log (jfslog) is a logical volume that requires a unique name in the cluster. To make sure that logical volumes have unique names, rename the logical volume associated with the file system and the corresponding jfslog logical volume. Use a naming scheme that indicates the l[...]

  • Page 104

    86 IBM Certification Study Guide AIX HACMP That is, you enter this command for each disk. In the resulting display, locate the line for the logical volume for which you just added copies. For copies placed on separate disks, the numbers in the logical partitions column and the physical partitions column should be equal. Otherwise, the copies w[...]

  • Page 105

    Cluster Hardware and Software Preparation 87 The TaskGuide uses a graphical interface to guide you through the steps of adding nodes to an existing volume group. For more information on the TaskGuide, see 3.4.6, “Alternate Method - TaskGuide” on page 90. Importing the volume group onto the destination nodes synchronizes the ODM definit[...]

  • Page 106

    88 IBM Certification Study Guide AIX HACMP 3.4.4.4 Varying Off the Volume Group on the Destination Nodes Use the varyoffvg command to deactivate the shared volume group so that it can be imported onto another destination node or activated as appropriate by the cluster event scripts. Enter: varyoffvg volume_group_name. 3.4.5 Quorum Quorum[...]

  • Page 107

    Cluster Hardware and Software Preparation 89 command succeeds. If exactly half the copies are available, as with two of four, quorum is not achieved and the varyonvg command fails. 3.4.5.2 Quorum after Vary On If a write to a physical volume fails, the VGSAs on the other physical volumes within the volume group are updated to indicate that on[...]
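
The half-is-not-enough case described above is what a strict-majority test produces. The sketch below, in Python, assumes that quorum at vary-on means more than half of the copies are available; the function name is illustrative and the code does not call any LVM command.

```python
def has_quorum(available_copies: int, total_copies: int) -> bool:
    """Return True when a strict majority of the copies is available.

    Assumption: quorum requires more than half of the copies, which matches
    the "two of four fails" case in the text.
    """
    if total_copies <= 0 or not 0 <= available_copies <= total_copies:
        raise ValueError("copy counts must satisfy 0 <= available <= total and total > 0")
    # Integer arithmetic avoids rounding issues with odd totals.
    return 2 * available_copies > total_copies


if __name__ == "__main__":
    print(has_quorum(2, 4))  # exactly half
    print(has_quorum(3, 4))  # more than half
```

Comparing `2 * available` against `total` keeps the check exact for odd totals such as two of three.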

  • Page 108

    90 IBM Certification Study Guide AIX HACMP Forcing a Varyon A volume group with quorum disabled and one or more physical volumes unavailable can be “forced” to vary on by using the -f flag with the varyonvg command. Forcing a varyon with missing disk resources can cause unpredictable results, including a reducevg of the physical volume fro[...]

  • Page 109

    Cluster Hardware and Software Preparation 91 conflict with the cluster’s configuration. Online help panels give additional information to aid in each step. 3.4.6.1 TaskGuide Requirements Before starting the TaskGuide, make sure: • You have a configured HACMP cluster in place. • You are on a graphics capable terminal. 3.4.6.2 Starti[...]

  • Page 110

    92 IBM Certification Study Guide AIX HACMP[...]

  • Page 111

    © Copyright IBM Corp. 1999 93 Chapter 4. HACMP Installation and Cluster Definition This chapter describes issues concerning the actual installation of HACMP Version 4.3 and the definition of a cluster and its resources. It concentrates on the HACMP part of the installation, so we will assume AIX is already at the 4.3.2 level. Please refer to t[...]

  • Page 112

    94 IBM Certification Study Guide AIX HACMP cluster.base.server.utils HACMP Base Server Utilities • cluster.cspoc This component includes all of the commands and environment for the C-SPOC utility, the Cluster-Single Point Of Control feature. These routines are responsible for centralized administration of the cluster. There is no restri[...]

  • Page 113

    HACMP Installation and Cluster Definition 95 • cluster.vsm The Visual Systems Management fileset contains icons and bitmaps for the graphical management of HACMP resources, as well as the xhacmpm command: cluster.vsm HACMP X11 Dependent • cluster.haview This fileset contains the files for including HACMP cluster views into a TME 10 Net[...]

  • Page 114

    96 IBM Certification Study Guide AIX HACMP This fileset contains the Application Heart Beat Daemon; Oracle Parallel Server is an application that makes use of it: cluster.hc.rte Application Heart Beat Daemon The installation of CRM requires the following software: bos.rte.lvm.usr.4.3.2.0 AIX Run-time Executable Install Server Nodes Fro[...]

  • Page 115

    HACMP Installation and Cluster Definition 97 HACMP software to HACMP for AIX, Version 4.3. The comments on upgrading the Operating System are not included. If you are already running AIX 4.3, see the special note at the end of this section. 4.1.2.1 Upgrading from Version 4.1.0 through 4.2.2 to Version 4.3 The following procedure applies to up[...]

  • Page 116

    98 IBM Certification Study Guide AIX HACMP Install HACMP 4.3 for AIX on Node A 5. After upgrading AIX and verifying that the disks are correctly configured, install the HACMP 4.3 for AIX software on Node A. For a short description of the filesets, please refer to 4.1.1, “First Time Installs” on page 93 or to Chapter 8 of the HACMP for AIX[...]

  • Page 117

    HACMP Installation and Cluster Definition 99 file on Node A using the following command: /usr/sbin/cluster/utilities/cllsif -x >> /.rhosts This command will append information to the /.rhosts file instead of overwriting it. Then, you can ftp this file to the other nodes as necessary. 12. Verify the cluster topology on all nodes using [...]

  • Page 118

    100 IBM Certification Study Guide AIX HACMP 2. If you wish to save your cluster configuration, see the chapter Saving and Restoring Cluster Configurations in the HACMP for AIX, Version 4.3: Administration Guide, SC23-4279. 3. Commit your current HACMP for AIX software on all nodes. 4. Shut down one node (gracefully with takeover) using the [...]

  • Page 119

    HACMP Installation and Cluster Definition 101 • The network modules You define the cluster topology by entering information about each component into HACMP-specific ODM classes. You enter the HACMP ODM data by using the HACMP SMIT interface or the VSM utility xhacmpm. The xhacmpm utility is an X Windows tool for creating cluster configurat[...]

  • Page 120

    102 IBM Certification Study Guide AIX HACMP Adding or Changing a Node Name after the Initial Configuration If you want to add or change a node name after the initial configuration, use the Change/Show Cluster Node Name screen. See the chapter on changing the cluster topology of the HACMP for AIX, Version 4.3: Administration Guide, SC23-4279[...]

  • Page 121

    HACMP Installation and Cluster Definition 103 Network Name Enter an ASCII text string that identifies the network. The network name can include alphabetic and numeric characters and underscores. Use no more than 31 characters. The network name is arbitrary, but must be used consistently for adapters on the same physical network. If several adap[...]
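
The naming rule for networks (ASCII letters, digits, and underscores, at most 31 characters) lends itself to a quick pre-check before a name is typed into SMIT. This is a minimal Python sketch; the function name is hypothetical, and rejecting the empty string is an assumption, since the text does not address it.

```python
import re

# ASCII letters, digits, and underscores; 1 to 31 characters.
# The lower bound of 1 is an assumption not stated in the text.
_NETWORK_NAME = re.compile(r"[A-Za-z0-9_]{1,31}")


def is_valid_network_name(name: str) -> bool:
    """Return True if the name satisfies the character and length rules."""
    return _NETWORK_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("ether_net1", "token-ring", "x" * 32):
        print(candidate, is_valid_network_name(candidate))
```

The check covers only the syntax of the name; using the name consistently across adapters on the same physical network still has to be verified separately.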

  • Page 122

    104 IBM Certification Study Guide AIX HACMP Adapter Identifier Enter the IP address in dotted decimal format or a device file name. IP address information is required for non-serial network adapters only if the node’s address cannot be obtained from the domain name server or the local /etc/hosts file (using the adapter IP label given). Y[...]

  • Page 123

    HACMP Installation and Cluster Definition 105 Adding or Changing Adapters after the Initial Configuration If you want to change the information about an adapter after the initial configuration, use the Change/Show an Adapter screen. See the chapter on changing the cluster topology in the HACMP for AIX, Version 4.3: Administration Guide, SC23-[...]

  • Page 124

    106 IBM Certification Study Guide AIX HACMP • SLIP • SP Switch • ATM It is highly unlikely that you will add or remove a network module. For information about changing a characteristic of a Network Module, such as the failure detection rate, see the chapter on changing the cluster topology in the HACMP for AIX, Version 4.3: Administr[...]

  • Page 125

    HACMP Installation and Cluster Definition 107 configuration. If the cluster manager is active on some other cluster nodes but not on the local node, the synchronization operation is aborted. Before attempting to synchronize a cluster configuration, ensure that all nodes are powered on, that the HACMP software is installed, and that the /etc/h[...]

  • Page 126

    108 IBM Certification Study Guide AIX HACMP 4.3 Defining Resources The HACMP for AIX software provides a highly available environment by identifying a set of cluster-wide resources essential to uninterrupted processing, and then by defining relationships among nodes that ensure these resources are available to client processes. Resources incl[...]

  • Page 127

    HACMP Installation and Cluster Definition 109 4.3.1.1 Configuring Resources for Resource Groups Once you have defined resource groups, you further configure them by assigning cluster resources to one resource group or another. You can configure resource groups even if a node is powered down. However, SMIT cannot list possible shared resources[...]

  • Page 128

    110 IBM Certification Study Guide AIX HACMP These settings also have to be synchronized throughout the cluster. Therefore Synchronize Cluster Resources has to be chosen from the corresponding SMIT Menu. If the Cluster Manager is running on the local node, synchronizing cluster resources triggers a dynamic reconfiguration event (DARE, see 8.5.3[...]

  • Page 129

    HACMP Installation and Cluster Definition 111 as the path locations for start and stop scripts for the application. These scripts have to be in the same location on every service node. Just as for pre- and post-events, these scripts can be adapted to specific nodes. They don’t need to be equal in content. The system administrator has to ensure,[...]

  • Page 130

    112 IBM Certification Study Guide AIX HACMP 4.4.2 Initial Startup At this point in time, the cluster is not yet started. So the cluster manager has to be started first. To check whether the cluster manager is up, you can either look for the process with the ps command: ps -ef | grep clstr or look for the status of the cluster group subsystems:[...]

  • Page 131

    HACMP Installation and Cluster Definition 113 For cascading resource groups the failed node is going to reacquire its resources, once it is up and running again. So, you have to restart HACMP on it through smitty clstart and check again for the logfile, as well as the cluster’s status. Further and more intensive debugging issues are covered in Chap[...]

  • Page 132

    114 IBM Certification Study Guide AIX HACMP Essentially, a snapshot saves all the ODM classes HACMP has generated during its configuration. It does not save user customized scripts, such as start or stop scripts for an application server. However, the location and names of these scripts are in an HACMP ODM class, and are therefore saved. It i[...]

  • Page 133

    HACMP Installation and Cluster Definition 115[...]

  • Page 134

    116 IBM Certification Study Guide AIX HACMP[...]

  • Page 135

    © Copyright IBM Corp. 1999 117 Chapter 5. Cluster Customization Within an HACMP for AIX cluster, there are several things that are customizable. The following paragraphs explain the customizing features for events, error notification, network modules and topology services. 5.1 Event Customization An HACMP for AIX cluster environment acts upon[...]

  • Page 136

    118 IBM Certification Study Guide AIX HACMP acquire_service_addr (If configured for IP address takeover.) Configures boot addresses to the corresponding service address, and starts TCP/IP servers and network daemons by running the telinit -a command. acquire_takeover_addr The script checks to see if a configured standby address exists, then s[...]

  • Page 137

    Cluster Customization 119 event occurs only after a node_up_remote event has successfully completed. Sequence of node_down Events node_down This event occurs when a node intentionally leaves the cluster or fails. Depending on whether the exiting node is local or remote, this event initiates either the node_down_local or node_down_remote event, w[...]

  • Page 138

    120 IBM Certification Study Guide AIX HACMP node_down_local_complete Instructs the Cluster Manager to exit when the local node has left the cluster. This event occurs only after a node_down_local event has successfully completed. node_down_remote_complete Starts takeover application servers. This event runs only after a node_down_remote ev[...]

  • Page 139

    Cluster Customization 121 no actions since appropriate actions depend on the local network configuration. 5.1.1.3 Network Adapter Events swap_adapter This event occurs when the service adapter on a node fails. The swap_adapter event exchanges or swaps the IP addresses of the service and a standby adapter on the same HACMP network and then reco[...]

  • Page 140

    122 IBM Certification Study Guide AIX HACMP reconfig_resource_complete This event indicates that a cluster resource dynamic reconfiguration has completed. 5.1.2 Pre- and Post-Event Processing To tailor event processing to your environment, specify commands or user-defined scripts that should execute before and/or after a specific event is [...]

  • Page 141

    Cluster Customization 123 For example, a file system cannot be unmounted, because of a process running on it. Then, you might want to kill that process first, before unmounting the file system, in order to get the event script done. Now, since the event script didn’t succeed in its first run, the Retry feature enables HACMP for AIX to retry [...]

  • Page 142

    124 IBM Certification Study Guide AIX HACMP Each time an error is logged in the system error log, the error notification daemon determines if the error log entry matches the selection criteria. If it does, an executable is run. This executable, called a notify method, can range from a simple command to a complex program. For example, the no[...]

  • Page 143

    Cluster Customization 125 The failure rate of networks varies, depending on their characteristics. For example, for an Ethernet, the normal failure detection rate is two keepalives per second; fast is about four per second; slow is about one per second. For an HPS network, because no network traffic is allowed when a node joins the cluster, n[...]

  • Page 144

    126 IBM Certification Study Guide AIX HACMP To prevent problems with NFS file systems in an HACMP cluster, make sure that each shared volume group has the same major number on all nodes. The lvlstmajor command lists the free major numbers on a node. Use this command on each node to find a major number that is free on all cluster nodes, the[...]
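
Finding a major number that is free on every node amounts to intersecting the per-node lists of free numbers. The Python sketch below assumes the free numbers have already been collected from each node and converted to integers; it does not run or parse lvlstmajor, and the node names and values are made up for illustration.

```python
from typing import Dict, Iterable, List


def common_free_major_numbers(free_by_node: Dict[str, Iterable[int]]) -> List[int]:
    """Return the major numbers that are free on every node, in ascending order.

    free_by_node is a hypothetical mapping: node name -> free major numbers
    gathered on that node beforehand.
    """
    node_sets = [set(numbers) for numbers in free_by_node.values()]
    if not node_sets:
        return []
    # A number is usable cluster-wide only if it is free on all nodes.
    return sorted(set.intersection(*node_sets))


if __name__ == "__main__":
    free = {"nodeA": [43, 44, 50], "nodeB": [44, 50, 51]}
    print(common_free_major_numbers(free))
```

Any value in the result could then be chosen as the shared volume group's major number on all nodes.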

  • Page 145

    Cluster Customization 127 Figure 14. NFS Cross Mounts When Node A fails, Node B uses the cl_nfskill utility to close open files in Node A:/afs, unmounts it, mounts it locally, and re-exports it to waiting clients. After takeover, Node B has: /bfs locally mounted /bfs nfs-exported /afs locally mounted /afs nfs-exported Ensure that the shared vol[...]

  • Page 146

    128 IBM Certification Study Guide AIX HACMP • Ensure that node name and the service adapter label are the same on each node in the cluster or • Alias the node name to the service adapter label in the /etc/hosts file. 5.4.5 Cross Mounted NFS File Systems and the Network Lock Manager If an NFS client application uses the Network Lock M[...]

  • Page 147

    Cluster Customization 129 ######## Add for NFS Lock Removal (start) ######## ######## Add for NFS Lock Removal (finish) ######## ############################################################################### # # Name: cl_deactivate_nfs # # Given a list of nfs-mounted file systems, we try and unmount -f # any that are currently[...]

  • Page 148

    130 IBM Certification Study Guide AIX HACMP fi /bin/rm -f /etc/sm.bak/$host /bin/rm -f /etc/sm/$host /bin/rm -f /etc/state fi ######## Add for NFS Lock Removal (finish) ######## # Send a SIGKILL to all processes having open file # descriptors within this logical volume to allow # the unmount to succeed.. cl_nfskill -k -u $fs fi[...]

  • Page 149

    © Copyright IBM Corp. 1999 131 Chapter 6. Cluster Testing Before you start to test the HACMP configuration, you need to guarantee that your cluster nodes are in a stable state. Check the state of the: • Devices • System parameters • Processes • Network adapters • LVM • Cluster • Other items such as SP Switch, printers, and [...]

  • Page 150

    132 IBM Certification Study Guide AIX HACMP 6.1.2 System Parameters • Type date on all nodes to check that all the nodes in the cluster are running with their clocks on the same time. • Ensure that the number of user licenses has been correctly set (lslicense). • Check high water mark and other system settings (smitty chgsys). ?[...]

  • Page 151

    Cluster Testing 133 • Check that all interfaces communicate (ping <ip-address> or ping -R <ip-address>). • List the arp table entries with arp -a. • Check the status of the TCP/IP daemons (lssrc -g tcpip). • Ensure that there are no bad entries in the /etc/hosts file, especially at the bottom of the file. • Verify tha[...]

  • Page 152

    134 IBM Certification Study Guide AIX HACMP • Verify the cluster configuration by running /usr/sbin/cluster/diag/clconfig -v ’-tr’. • To show cluster configuration, run: /usr/sbin/cluster/utilities/cllscf. • To show the clstrmgr version, type: snmpinfo -m dump -o /usr/sbin/cluster/hacmp.defs clstrmgr. 6.2 Simulate [...]

  • Page 153

    Cluster Testing 135 • Use ifconfig to swap the service address back to the original service interface (ifconfig en1 down). This will cause the service IP address to fail over back to the service adapter on NodeF. 6.2.1.2 Ethernet or Token Ring Adapter or Cable Failure Perform the following steps in the event of an Ethernet or Token R[...]

  • Page 154

    136 IBM Certification Study Guide AIX HACMP • Generate the switch error in the error log which is being monitored by HACMP Error Notification (for configuration see 2.6.2.1, “Single Point-of-Failure Hardware Component Recovery” on page 46), or, if the network_down event has been customized, bring down css0 (ifconfig css0 down) or f[...]

  • Page 155

    Cluster Testing 137 • Verify that all sharedvg file systems and paging spaces are accessible (df -k and lsps -a). 6.2.2 Node Failure / Reintegration The following sections deal with issues of node failure and reintegration. 6.2.2.1 AIX Crash Perform the following steps in the event of an AIX crash: • Check, by way of the verification co[...]

  • Page 156

    138 IBM Certification Study Guide AIX HACMP • Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes). • Power cycle NodeF. If HACMP is not configured to start from /etc/inittab (on restart), start HACMP on NodeF (smit c[...]

  • Page 157

    Cluster Testing 139 • Monitor the cluster log files on NodeT. • Disconnect the network cable from the appropriate service and all the standby interfaces at the same time (but not the Administrative SP Ethernet) on NodeF. This will cause HACMP to detect a network_down event. • HACMP triggers events dependent on your configuration of the ne[...]

  • Page 158

    140 IBM Certification Study Guide AIX HACMP • Reconnect hdisk0, close the casing, and turn the key to normal mode. • Power on NodeF then verify that the rootvg logical volumes are no longer stale (lsvg -l rootvg). 6.2.4.2 7135 Disk Failure Perform the following steps in the event of a disk failure: • Check, by way of the verificati[...]

  • Page 159

    Cluster Testing 141 • Monitor cluster logfiles on NodeT if HACMP has been customized to monitor 7133 disk failures. • Since the 7133 disk is hot pluggable, remove a disk from drawer 1 associated with NodeF's shared volume group. • The failure of the 7133 disk will be detected in the error log (errpt -a | more) on NodeF, and the logic[...]

  • Page 160

    142 IBM Certification Study Guide AIX HACMP[...]

  • Page 161

    © Copyright IBM Corp. 1999 143 Chapter 7. Cluster Troubleshooting Typically, a functioning HACMP cluster requires minimal intervention. If a problem occurs, however, diagnostic and recovery skills are essential. Thus, troubleshooting requires that you identify the problem quickly and apply your understanding of the HACMP for AIX software to[...]

  • Page 162

    144 IBM Certification Study Guide AIX HACMP For a more detailed description of the cluster log files consult Chapter 2 of the HACMP for AIX, Version 4.3: Troubleshooting Guide, SC23-4280. 7.2 config_too_long If the cluster manager recognizes a state change in the cluster, it acts upon it by executing an event script. However, some circums[...]

  • Page 163

    Cluster Troubleshooting 145 hang. After a certain amount of time, by default 360 seconds, the cluster manager will issue a config_too_long message into the /tmp/hacmp.out file. The message issued looks like this: The cluster has been in reconfiguration too long;Something may be wrong. In most cases, this is because an event script has failed. Y[...]

  • Page 164

    146 IBM Certification Study Guide AIX HACMP 7.3.1 Tuning the System Using I/O Pacing Use I/O pacing to tune the system so that system resources are distributed more equitably during large disk writes. Enabling I/O pacing is required for an HACMP cluster to behave correctly during large disk writes, and it is strongly recommended if you antici[...]

  • Page 165

    Cluster Troubleshooting 147 7.3.4 Changing the Failure Detection Rate Use the SMIT Change/Show a Cluster Network Module screen to change the failure detection rate for your network module only if enabling I/O pacing or extending the syncd frequency did not resolve deadman problems in your cluster. By changing the failure detection rate to ?[...]

  • Page 166

    148 IBM Certification Study Guide AIX HACMP and control messages so that the Cluster Manager has accurate information about the status of its partner. When a cluster becomes partitioned, and the network problem is cleared after the point when takeover processing has begun so that keepalive packets start flowing between the partitioned nodes [...]

  • Page 167

    Cluster Troubleshooting 149 7.6 User ID Problems Within an HACMP cluster, you always have more than one node potentially offering the same service to a specific user or a specific user id. As the node providing the service can change, the system administrator has to ensure that the same user and group is known to all nodes potentially running [...]

  • Page 168

    150 IBM Certification Study Guide AIX HACMP • Go from the simple to the complex. Make the simple tests first. Do not try anything complex and complicated until you have ruled out the simple and obvious. • Do not make more than one change at a time. If you do, and one of the changes corrects the problem, you have no way of knowing which c [...]

  • Page 169

    © Copyright IBM Corp. 1999 151 Chapter 8. Cluster Management and Administration This chapter covers all aspects of monitoring and managing an existing HACMP cluster. This includes a description of the different monitoring methods and tools available, how to start and stop the cluster, changing cluster or resource configurations, applying softw[...]

  • Page 170

    152 IBM Certification Study Guide AIX HACMP Consult the HACMP for AIX, Version 4.3: Troubleshooting Guide, SC23-4280, for help if you detect a problem with an HACMP cluster. 8.1.1 The clstat Command HACMP for AIX provides the /usr/sbin/cluster/clstat command for monitoring a cluster and its components. The clstat utility is a clinfo c[...]

  • Page 171

    Cluster Management and Administration 153 More details on how to configure HAView and on how to monitor your cluster with HAView can be found in Chapter 3, “Monitoring an HACMP cluster” in HACMP for AIX, Version 4.3: Administration Guide, SC23-4279. 8.1.3 Cluster Log Files HACMP for AIX writes the messages it generates to the system conso[...]

  • Page 172

    154 IBM Certification Study Guide AIX HACMP 8.1.3.5 /tmp/cm.log Contains timestamped, formatted messages generated by HACMP for AIX clstrmgr activity. This file is typically used by IBM support personnel. 8.1.3.6 /tmp/cspoc.log Contains timestamped, formatted messages generated by HACMP for AIX C-SPOC commands. The /tmp/cspoc.log file re[...]

  • Page 173

    Cluster Management and Administration 155 (C-SPOC) utility can be used to start and stop cluster services on all nodes in cluster environments. Starting cluster services refers to the process of starting the HACMP for AIX daemons that enable the coordination required between nodes in a cluster. Starting cluster services on a node also triggers[...]

  • Page 174

    156 IBM Certification Study Guide AIX HACMP 8.2.1.4 Cluster Information Program daemon (clinfo) This daemon provides status information about the cluster to cluster nodes and clients and invokes the /usr/sbin/cluster/etc/clinfo.rc script in response to a cluster event. The clinfo daemon is optional on cluster nodes and clients. However, i[...]

  • Page 175

    Cluster Management and Administration 157 are started in sequential order - not in parallel. The output of the command run on the remote node is returned to the originating node. Because the command is executed remotely, there can be a delay before the command output is returned. 8.2.2.1 Automatically Restarting Cluster Services You can opti[...]

  • Page 176

    158 IBM Certification Study Guide AIX HACMP node. Because the command is executed remotely, there can be a delay before the command output is returned. 8.2.3.1 When to Stop Cluster Services You typically stop cluster services in the following situations: • Before making any hardware or software changes or other scheduled node shutdowns[...]

  • Page 177

    Cluster Management and Administration 159 prevents unpredictable behavior from corrupting the data on the shared disks. See the clexit.rc man page for additional information. 8.2.4 Starting and Stopping Cluster Services on Clients Use the /usr/sbin/cluster/etc/rc.cluster script or the startsrc command to start clinfo on a client, as sho[...]

  • Page 178

    160 IBM Certification Study Guide AIX HACMP 8.3 Replacing Failed Components From time to time, it will be necessary to perform hardware maintenance or upgrades on cluster components. Some replacements or upgrades can be performed while the cluster is operative, while others require planned downtime. Make sure you plan all the necessary actio[...]

  • Page 179

    Cluster Management and Administration 161 • The new adapter must be of the same type or a compatible type as the replaced adapter. • When replacing or adding an SCSI adapter, remove the resistors for shared buses. Furthermore, set the SCSI ID of the adapter to a value different than 7. 8.3.3 Disks Disk failures are handled differently acco[...]

  • Page 180

    162 IBM Certification Study Guide AIX HACMP 4. Logically remove the disk from the system (rmdev -l hdiskX -d; rmdev -l pdiskY -d if a SSA disk) on all nodes. 5. Physically remove the failed disk and replace it with a new disk. 6. Add the disk to the ODM (mkdev or cfgmgr) on all nodes. 7. Add the disk to the shared volume group (extendvg [...]
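Steps 4 and 6 above have to be repeated on every node, so it can help to generate the command list once and review it before running anything. The following is a minimal dry-run sketch in Python, not part of the guide: the function names, node names, and device names are hypothetical, and it only builds command strings without executing them. It uses cfgmgr for step 6, one of the two options the text names.

```python
def disk_replacement_commands(hdisk, pdisk=None):
    """Build the per-node commands around the physical disk swap (dry run).

    Returns (before_swap, after_swap). `hdisk` and `pdisk` are the device
    names on that node; `pdisk` is only given for an SSA disk.
    """
    # Step 4: logically remove the disk definition from the system.
    before_swap = [f"rmdev -l {hdisk} -d"]
    if pdisk is not None:
        before_swap.append(f"rmdev -l {pdisk} -d")
    # Step 5 (the physical replacement) happens between the two lists.
    # Step 6: add the new disk to the ODM; cfgmgr is one of the two
    # options mentioned in the text (the other is mkdev).
    after_swap = ["cfgmgr"]
    return before_swap, after_swap


def plan_for_nodes(nodes, hdisk, pdisk=None):
    """Map each (hypothetical) node name to its command lists."""
    return {node: disk_replacement_commands(hdisk, pdisk) for node in nodes}


if __name__ == "__main__":
    plan = plan_for_nodes(["nodeA", "nodeB"], "hdisk3", pdisk="pdisk2")
    for node, (before, after) in plan.items():
        print(node, "before swap:", before, "after swap:", after)
```

The sketch assumes the disk has the same device name on every node, which is not guaranteed in a real cluster; check the names on each node before using the output.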

  • Page 181

    Cluster Management and Administration 163 8.4 Changing Shared LVM Components Changes to VG constructs are probably the most frequent kind of changes to be performed in a cluster. As a system administrator of an HACMP for AIX cluster, you may be called upon to perform any of the following LVM-related tasks: • Creating a new shared volume gr[...]

  • Page 182

    164 IBM Certification Study Guide AIX HACMP When changing shared LVM components manually, you will usually need to run through the following procedure: 1. Stop HACMP on the node owning the shared volume group (sometimes a stop of the applications using the shared volume group may be sufficient). 2. Make the necessary changes to the shared L [...]

  • Page 183

    Cluster Management and Administration 165 Lazy Update has some limitations, which you need to consider when you rely on Lazy Update in general: • If the first disk in a sharedvg has been replaced, the importvg command will fail as Lazy Update expects to be able to match the hdisk number for the first disk to a valid PVID in the ODM. • Multi-[...]

  • Page 184

    166 IBM Certification Study Guide AIX HACMP • Shared volume groups • List all volume groups in the cluster. • Import a volume group (with HACMP 4.3 only). • Extend a volume group (with HACMP 4.3 only). • Reduce a volume group (with HACMP 4.3 only). • Mirror a volume group (with HACMP 4.3 only). • Unmirror a volume group (with HAC[...]

  • Page 185

    Cluster Management and Administration 167 To use the SMIT shortcuts to C-SPOC, type smit cl_lvm or smit cl_conlvm for concurrent volume groups. Concurrent volume groups must be varied on in concurrent mode to perform tasks. 8.4.4 TaskGuide The TaskGuide is a graphical interface that simplifies the task of creating a shared volume group withi[...]

  • Page 186

    168 IBM Certification Study Guide AIX HACMP To change the nodes associated with a given resource group, or to change the priorities assigned to the nodes in a resource group chain, you must redefine the resource group. You must also redefine the resource group if you add or change a resource assigned to the group. This section describes how[...]

  • Page 187

    Cluster Management and Administration 169 • If the Cluster Manager is active on the local node, synchronization triggers a cluster-wide, dynamic reconfiguration event. In dynamic reconfiguration, the configuration data stored in the DCD is updated on each cluster node, and, in addition, the new ODM data replaces the ODM data stored in the AC[...]

  • Page 188

    170 IBM Certification Study Guide AIX HACMP 8.5.3.1 Resource Migration Types Before performing a resource migration, decide if you will declare the migration sticky or non-sticky. Sticky Resource Migration A sticky migration permanently attaches a resource group to a specified node. The resource group attempts to remain on the specified[...]

  • Page 189

    Cluster Management and Administration 171 INACTIVE_TAKEOVER flag set to false and has not yet started because its primary node is down. In general, however, only rotating resource groups should be migrated in a non-sticky manner. Such migrations are one-time events and occur similar to normal rotating resource group flavors. After migration, [...]

  • Page 190

    172 IBM Certification Study Guide AIX HACMP If you do not include a location specifier in the location field, the DARE Resource Migration utility performs a default migration, again making the resources available for reacquisition. Stop Location The second special location keyword, stop, causes a resource group to be made inactive, preventin[...]

  • Page 191

    Cluster Management and Administration 173 Note that you cannot add nodes to the resource group list with the DARE Resource Migration utility. This task is performed through SMIT. Stopping Resource Groups If the location field of a migration contains the keyword stop instead of an actual nodename, the DARE Resource Migration utility attempts t[...]

  • Page 192

    174 IBM Certification Study Guide AIX HACMP Be aware that persistent sticky location markers are saved and restored in cluster snapshots. You can use the clfindres command to find out if sticky markers are present in a resource group. If you want to remove sticky location markers while the cluster is down, the default keyword is not a valid m[...]

  • Page 193

    Cluster Management and Administration 175 5. Restart the HACMP for AIX software on the node using the smit clstart fastpath and verify that the node successfully joined the cluster. 6. Repeat Steps 1 through 5 on the remaining cluster nodes. Figure 15 below shows the procedure: Figure 15. Applying a PTF to a Cluster Node Along with the normal[...]

  • Page 194

    176 IBM Certification Study Guide AIX HACMP • Cluster nodes should be running the same HACMP maintenance levels. There might be incompatibilities between various maintenance levels of HACMP, so you must ensure that consistent levels are maintained across all cluster nodes. The cluster must be taken down to update the maintenance levels. 8.7 B[...]
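The requirement that all nodes run the same HACMP maintenance level is easy to check once the level of each node has been collected. The sketch below is illustrative only: how the levels are gathered is left out, and the node names, level strings, and function names are hypothetical.

```python
def group_by_level(levels):
    """Group node names by the maintenance level reported for each node.

    `levels` maps a node name to a level string collected by the
    administrator; both are placeholders in this sketch.
    """
    groups = {}
    for node, level in levels.items():
        groups.setdefault(level, []).append(node)
    return groups


def levels_consistent(levels):
    """True when every node reports the same maintenance level."""
    return len(group_by_level(levels)) <= 1


if __name__ == "__main__":
    reported = {"nodeA": "4.3.0.1", "nodeB": "4.3.0.1", "nodeC": "4.3.0.0"}
    if not levels_consistent(reported):
        for level, nodes in group_by_level(reported).items():
            print(level, "->", ", ".join(nodes))
```

Grouping by level rather than picking a "correct" level keeps the decision about which nodes to update with the administrator.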

  • Page 195

    Cluster Management and Administration 177 8.7.1.1 How to do a split-mirror backup This same procedure can be used with just one mirrored copy of a logical volume. If you remove a mirrored copy of a logical volume (and file system), and then create a new logical volume (and file system) using the allocation map from that mirrored copy, your new log[...]

  • Page 196

    178 IBM Certification Study Guide AIX HACMP 9. After the backup is complete and verified, unmount and delete the new file system and the logical volume you used for it. 10. Use the mklvcopy command to add back the logical volume copy you previously split off to the fslv logical volume. 11. Resynchronize the logical volume. Once the mirror cop[...]

  • Page 197

    Cluster Management and Administration 179 they don’t match, the user won’t get anything done after a failover happened. So, the administrator has to keep definitions equal throughout the cluster. Fortunately, the C-SPOC utility, as of HACMP Version 4.3 and later, does this for you. When you create a cluster group or user using C-SPOC, [...]
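Where users were created node by node rather than through C-SPOC, mismatched IDs can be found by comparing the account definitions of all nodes. The following is a minimal sketch, assuming the contents of each node's /etc/passwd have already been collected into strings; the function names, node names, and sample entries are hypothetical.

```python
def parse_passwd(text):
    """Return {user: (uid, gid)} from passwd-format text.

    Expects colon-separated lines of the form name:password:uid:gid:...;
    blank lines, comments, and short lines are skipped.
    """
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue
        ids[fields[0]] = (int(fields[2]), int(fields[3]))
    return ids


def find_id_mismatches(passwd_by_node):
    """Return users whose (uid, gid) pair differs between nodes.

    `passwd_by_node` maps a node name to that node's passwd text.
    Users missing on some nodes are not reported by this sketch.
    """
    seen = {}
    for node, text in passwd_by_node.items():
        for user, pair in parse_passwd(text).items():
            seen.setdefault(user, {})[node] = pair
    return {
        user: by_node
        for user, by_node in seen.items()
        if len(set(by_node.values())) > 1
    }


if __name__ == "__main__":
    sample = {
        "nodeA": "dbadmin:!:205:20::/home/dbadmin:/usr/bin/ksh\n",
        "nodeB": "dbadmin:!:207:20::/home/dbadmin:/usr/bin/ksh\n",
    }
    print(find_id_mismatches(sample))
```

A similar comparison could be made for group definitions; the sketch covers only user entries.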

  • Page 198

    180 IBM Certification Study Guide AIX HACMP To add a user on one or more nodes in a cluster, you can either use the AIX mkuser command in a rsh to one cluster node after the other, or use the C-SPOC cl_mkuser command or the Add a User to the Cluster SMIT screen. The cl_mkuser command calls the AIX mkuser command to create the user account [...]

  • Page 199

    Cluster Management and Administration 181 To remove a user account from one or more cluster nodes, you can either use the AIX rmuser command on one cluster node after the other, or use the C-SPOC cl_rmuser command or the C-SPOC Remove a User from the Cluster SMIT screen. The cl_rmuser command executes the AIX rmuser command on all cluster node[...]

  • Page 200

    182 IBM Certification Study Guide AIX HACMP[...]

  • Page 201

    © Copyright IBM Corp. 1999 183 Chapter 9. Special RS/6000 SP Topics This chapter will introduce you to some special topics that only apply if you are running HACMP on the SP system. 9.1 High Availability Control Workstation (HACWS) If you are thinking about what could happen to your SP whenever the Control Workstation might fail, you will pr[...]

  • Page 202

    184 IBM Certification Study Guide AIX HACMP need to have the frame supervisors support dual tty lines in order to get both control workstations connected at the same time. Contact your IBM representative for the necessary hardware (see Figure 16 on page 184). Both the tty network and the RS/6000 SP internal ethernet are extended to the bac[...]

  • Page 203

    Special RS/6000 SP Topics 185 The backup cws has to be installed with the same level of AIX and PSSP. Depending on the kerberos configuration of the primary cws, the backup cws has to be configured either as a secondary authentication server for the authentication realm of your RS/6000 SP when the primary cws is an authentication server itself, [...]

  • Page 204

    186 IBM Certification Study Guide AIX HACMP ordinary HACMP cluster, as it is described in Chapter 7 of the HACMP for AIX, Version 4.3: Installation Guide, SC23-4278. Now the cluster environment has to be configured. Define a cluster ID and name for your HACWS cluster and define the two nodes to HACMP. Adapters have to be added to your clu[...]

  • Page 205

    Special RS/6000 SP Topics 187 After that, identify the HACWS event scripts to HACMP by executing the /usr/sbin/hacws/spcw_addevents command, and verify the configuration with the /usr/sbin/hacws/hacws_verify command. You should also check the cabling from the backup cws with the /usr/sbin/hacws/spcw_verify_cabling command. Then reboot the p[...]

  • Page 206

    188 IBM Certification Study Guide AIX HACMP The following is simply a shortened description on how kerberos works. For more details, the redbook Inside the RS/6000 SP, SG24-5145, covers the subject in much more detail. When dealing with authentication and Kerberos, three entities are involved: the client, who is requesting service from a se[...]

  • Page 207

    Special RS/6000 SP Topics 189 allow the clients to get service tickets to be used with other servers without the need to give them the password every time they request services. So, given a user has a ticket-granting ticket, if a user requests a kerberized service, he has to get a service ticket for it. In order to get one, the kerberized comm[...]

  • Page 208

    190 IBM Certification Study Guide AIX HACMP After setting the cluster’s security settings to enhanced for all these nodes, you can verify that it is working as expected, for example, by running clverify, which goes out to the nodes and checks the consistency of files. 9.3 VSDs - RVSDs VSDs (Virtual Shared Disks) and RVSDs (Recoverable V[...]

  • Page 209

    Special RS/6000 SP Topics 191 With reference to Figure 17 above, imagine two nodes, Node X and Node Y, running the same application. The nodes are connected by the switch and have locally-attached disks. On Node X’s disk resides a volume group containing the raw logical volume lv_X. Similarly, Node Y has lv_Y. For the sake of illustration, l[...]

  • Page 210

    192 IBM Certification Study Guide AIX HACMP The VSDs in this scenario are mapped to the raw logical volumes lv_X and lv_Y. Node X is a client of Node Y’s VSD, and vice versa. Node X is also a direct client of its own VSD (lv_X), and Node Y is a direct client of VSD lv_Y. VSD configuration is flexible. An interesting property of the archite[...]

  • Page 211

    Special RS/6000 SP Topics 193 impact of servicing a local I/O request through VSD relative to the normal VMM/LVM pathway is very small. IBM supports any IP network for VSD, but we recommend the switch for performance. VSD provides distributed data access, but not a locking mechanism to preserve data integrity. A separate product such as Oracle P[...]

  • Page 212

    194 IBM Certification Study Guide AIX HACMP operation that was in progress, as well as new I/O operations against rvsd_X, are suspended until failover is complete. When Node X is repaired and rebooted, RVSD switches the rvsd_X back to its primary, Node X. The RVSD subsystems are shown in Figure 20 on page 194. The rvsd daemon controls recove[...]

  • Page 213

    Special RS/6000 SP Topics 195 9.4 SP Switch as an HACMP Network One of the fascinating things with an RS/6000 SP is the switch network. It has developed over time; so, currently there are two types of switches at customer sites. The “older” HPS or HiPS switch (High Performance Switch), also known as the TB2 switch, and the “newer” SP Swi[...]

  • Page 214

    196 IBM Certification Study Guide AIX HACMP 9.4.2 Eprimary Management The SP switch has an internal primary backup concept, where the primary node, known as the Eprimary, is backed up automatically by a backup node. So, in case any serious failure happens on the primary, it will resign from work, and the backup node will take over the switch[...]

  • Page 215

    Special RS/6000 SP Topics 197 In case this node was the Eprimary node on the switch network, and it is an SP switch, then the RS/6000 SP software would have chosen a new Eprimary independently from the HACMP software as well.[...]

  • Page 216

    198 IBM Certification Study Guide AIX HACMP[...]

  • Page 217

    © Copyright IBM Corp. 1999 199 Chapter 10. HACMP Classic vs. HACMP/ES vs. HANFS So, why would you prefer to install one version of HACMP instead of another? This chapter summarizes the differences between them, to give you an idea in which situation one or the other best matches your needs. The certification test itself does not refer to these di[...]

  • Page 218

    200 IBM Certification Study Guide AIX HACMP handling membership and event management by using heartbeats. On the SP, the original High Availability infrastructure was built on this technology, and HACMP/ES Version 4.3 is now another instance relying on it. As of AIX 4.3.2 and PSSP 3.1, the High Availability infrastructure, which previously[...]

  • Page 219

    HACMP Classic vs. HACMP/ES vs. HANFS 201 See Part 4 of HACMP for AIX, Version 4.3: Enhanced Scalability Installation and Administration Guide, SC23-4284, for more information on these services. 10.2.2 Enhanced Cluster Security With HACMP Version 4.3 comes an option to switch security Mode between Standard and Enhanced. Standard Synchroniza[...]

  • Page 220

    202 IBM Certification Study Guide AIX HACMP 10.4 Similarities and Differences All three products have the basic structure in common. They all use the same concepts and structures. So, a cluster or a network, in the HACMP context, is the same, no matter what product is being used. There is always a Cluster Manager controlling the node, keepi[...]

  • Page 221

    HACMP Classic vs. HACMP/ES vs. HANFS 203 For switchless RS/6000 SP systems or SPs with the newer SP Switch, the decision will be based on a more functional level. Event Management is much more flexible in HACMP/ES, since you can define custom events. These events can act on anything that haemd can detect, which is virtually anything measurable on a[...]

  • Page 222

    204 IBM Certification Study Guide AIX HACMP[...]

  • Page 223

    © Copyright IBM Corp. 1999 205 Appendix A. Special Notices This publication is intended to help System Administrators, System Engineers and other System Professionals to pass the IBM HACMP Certification Exam. The information in this publication is not intended as the specification for any of the following programming interfaces: HACMP, HACM[...]

  • Page 224

    206 IBM Certification Study Guide AIX HACMP been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. Any pointers in this publication to external Web sites are p[...]

  • Page 225

    Special Notices 207 Java and HotJava are trademarks of Sun Microsystems, Incorporated. Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation. PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license. Pentium, MMX, ProShare, LANDesk, and [...]

  • Page 226

    208 IBM Certification Study Guide AIX HACMP[...]

  • Page 227

    © Copyright IBM Corp. 1999 209 Appendix B. Related Publications The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook. B.1 International Technical Support Organization Publications For information on ordering these ITSO publications see “How to Get[...]

  • Page 228

    210 IBM Certification Study Guide AIX HACMP B.3 Other Publications These publications are also relevant as additional sources of information: • IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment, GA22-7281 • IBM PSSP for AIX: Installation and Migration Guide, GA22-7347 • IBM PSSP for AIX: Managing Shar[...]

  • Page 229

    © Copyright IBM Corp. 1999 211 How to Get ITSO Redbooks This section explains how both customers and IBM employees can find out about ITSO redbooks, CD-ROMs, workshops, and residencies. A form for ordering books and CD-ROMs is also provided. This information was current at the time of publication, but is continually subjec[...]

  • Page 230

    212 IBM Certification Study Guide AIX HACMP How Customers Can Get ITSO Redbooks Customers may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and information about redbooks, workshops, and residencies in the following ways: • Online Orders – send orders to: • Telephone Orders • Mail Orders – send o[...]

  • Page 231

    213 IBM Redbook Order Form Please send me the following: We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not available in all countries. Signature mandatory for credit card payment. Title Order Number Quantity First name Last name Company Address City Postal code Telephone number T e[...]

  • Page 232

    214 IBM Certification Study Guide AIX HACMP[...]

  • Page 233

    © Copyright IBM Corp. 1999 215 List of Abbreviations AIX Advanced Interactive Executive APA All Points Addressable APAR Authorized Program Analysis Report The description of a problem to be fixed by IBM defect support. This fix is delivered in a PTF (see below). ARP Address Resolution Protocol ASCII American Standard Code for Inform[...]

  • Page 234

    216 IBM Certification Study Guide AIX HACMP NETBIOS Network Basic Input/Output System NFS Network File System NIM Network Interface Module (This is the definition of NIM in the HACMP context. NIM in the AIX 4.1 context stands for Network Installation Manager). NIS Network Information Service NVRAM Non-Volatile Random Access Me[...]

  • Page 235

    © Copyright IBM Corp. 1999 217 Index Symbols /.rhosts file editing 59 /etc/hosts file and adapter label 38 /sbin/rc.boot file 146 /usr/sbin/cluster/godm daemon 59 A abbreviations 215 Abnormal Termination 158 acronyms 215 Adapter Failure 134 Adapter Function 38 Adapter Hardware Address 104 Adapter Identifier 104 adapter label [...]

  • Page 236

    218 IBM Certification Study Guide AIX HACMP DGSP message 148 Disk Capacities 19 Disk Failure 139 dual-network 36 Dynamic Reconfiguration 169 E editing /.rhosts file 59 emsvcsd 156 Enhanced Cluster Security 201 Eprimary 196 Error Notification 45, 123 Ethernet 13 Event Customization 44, 117 Event Emulator 123 Event Manager 200 Eve[...]

  • Page 237

    219 Network Topology 35 networks point-to-point 36 NFS mounting filesystems 126 takeover issues 126 NFS cross mount 41 NFS Exports 41 NFS mount 41 NIM 199 NIS 58 Node Events 117 Node Failure / Reintegration 137 Node isolation 147 node relationships 108 non-concurrent access quorum 90 Non-Sticky Resource Migration 170 P partitio[...]

  • Page 238

    220 IBM Certification Study Guide AIX HACMP Token-Ring 13 Topology Service 200 topsvcsd 156 U Upgrading 96 user accounts adding 179 changing 180 creating 179 removing 180 User and Group IDs 48 V VGDA 88 VGSA 88 Virtual Shared Disk (VSDs) 190 X xhacmpm 101[...]

  • Page 239

    © Copyright IBM Corp. 1999 221 ITSO Redbook Evaluation IBM Certification Study Guide AIX HACMP SG24-5131-00 Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete this questionnaire and return it using one of the following methods: • Use the online evaluation form found at ht[...]

  • Page 240

    Printed in the U.S.A. SG24-5131-00 IBM Certification Study Guide AIX HACMP SG24-5131-00[...]