Saturday, October 22, 2022

Error: Do not have sufficient voting files - Restoring Voting Disk

MOS article 428681.1 should be your starting point for adding a voting disk to your RAC; the procedure depends on the version of RAC you are running and where the voting disk is stored. In an Oracle RAC environment, it is recommended to store the voting disk on a diskgroup with normal or high redundancy and to have 3 copies of it. During installation you can select the number of voting disks you want to create, or you can add extra copies later. If you have 4 physical disks in an ASM diskgroup with normal redundancy, you can choose to have 4 copies of the voting disk, and Oracle would create one voting file on each of the 4 physical disks. Clusterware must be able to access more than half of the configured voting files to start the RAC, otherwise the RAC resources will not start up: with 4 copies, a single disk failure still leaves 3 usable voting files, while with only 2 copies configured (as in the case below) both must be available.
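The arithmetic behind this requirement can be sketched in a few lines of Python. This is only an illustration: it assumes the commonly documented rule that a node must see a strict majority of the configured voting files, which agrees with the "found 1 of 2 configured files, needed at least 2" message shown below. The function names are made up for this example and are not part of any Oracle tool.

```python
def votes_needed(configured: int) -> int:
    """Smallest count that is a strict majority of `configured` voting files."""
    if configured < 1:
        raise ValueError("at least one voting file must be configured")
    return configured // 2 + 1


def failures_tolerated(configured: int) -> int:
    """How many voting files can be lost while a strict majority remains."""
    return configured - votes_needed(configured)


if __name__ == "__main__":
    # Print the threshold for the usual configurations.
    for n in (1, 2, 3, 4, 5):
        print(f"configured={n} needed={votes_needed(n)} "
              f"tolerated={failures_tolerated(n)}")
```

Under this rule an even number of copies buys nothing over the next lower odd number: 2 copies tolerate no failure, and 4 copies tolerate only one, the same as 3.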

Do not have sufficient voting files, found 1 of 2 configured files, needed at least 2 

If fewer than the required number of voting files are available, the Grid Infrastructure stack does not start up and writes the following to the ocssd.log file.

2018-04-19 06:55:31.864: [   SKGFD][9616]Handle 0000000006D14B90 from lib :UFS:: for disk :\\.\ORCLDISKCRS0:
2018-04-19 06:55:31.864: [    CSSD][9616]ASSERT clssnml.c 453

2018-04-19 06:55:31.864: [    CSSD][9616]clssnmlgetleasehdls: Do not have sufficient voting files, found 1 of 2 configured files, needed at least 2
2018-04-19 06:55:31.864: [    CSSD][9616]###################################
2018-04-19 06:55:31.864: [    CSSD][9616]clssscExit: CSSD aborting from thread Main
2018-04-19 06:55:31.864: [    CSSD][9616]###################################
2018-04-19 06:55:31.864: [    CSSD][9616](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2018-04-19 06:55:31.864: [    CSSD][9616]

----- Call Stack Trace -----
2018-04-19 06:55:32.310: [    CSSD][9616]
Symbol file D:\oracle\1120\grid\bin\orannzsbb11.SYM does not match binary.
 Symbol TimeStamp=52bb3a9d, Module TimeStamp=53a3d18a are different
2018-04-19 06:55:32.348: [    CSSD][9616]calling              call     entry                argument values in hex     
2018-04-19 06:55:32.348: [    CSSD][9616]location             type     point                (? means dubious value)    
2018-04-19 06:55:32.348: [    CSSD][9616]-------------------- -------- -------------------- ----------------------------
2018-04-19 06:55:32.634: [    CSSD][9616]
Symbol file D:\oracle\1120\grid\bin\orannzsbb11.SYM does not match binary.
 Symbol TimeStamp=52bb3a9d, Module TimeStamp=53a3d18a are different
2018-04-19 06:55:32.667: [    CSSD][9616]000000014009B293     CALL???  0000000140019820     000000000 000000000 1400D7760
2018-04-19 06:55:32.667: [    CSSD][9616]                                                   000000001
2018-04-19 06:55:32.667: [    CSSD][9616]000000014007D5B5     CALL???  000000014009B11A     004C500B8 140022DAD 1400CB560
2018-04-19 06:55:32.667: [    CSSD][9616]                                                   0002AF930
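If you want to spot this condition from a script rather than by reading ocssd.log by hand, the counts can be pulled out of the message with a regular expression. The sketch below is an assumption-laden example: it matches only the exact wording seen in the log above, and the names VotingFileShortage and parse_shortage are invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class VotingFileShortage(NamedTuple):
    found: int       # voting files CSSD could actually use
    configured: int  # voting files registered in the cluster
    needed: int      # minimum CSSD requires to continue


# Wording copied from the ocssd.log excerpt above; other versions may differ.
_PATTERN = re.compile(
    r"Do not have sufficient voting files, "
    r"found (\d+) of (\d+) configured files, needed at least (\d+)"
)


def parse_shortage(line: str) -> Optional[VotingFileShortage]:
    """Return the three counts if `line` carries the shortage message, else None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return VotingFileShortage(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    sample = ("2018-04-19 06:55:31.864: [    CSSD][9616]clssnmlgetleasehdls: "
              "Do not have sufficient voting files, found 1 of 2 configured "
              "files, needed at least 2")
    print(parse_shortage(sample))
```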


In the case I am discussing here, I had 3 disks in my normal redundancy CRS diskgroup, which stores the Cluster Registry and the voting files, and 2 copies of the voting file stored on 2 of the disks of this diskgroup. Because of a storage server failure, one of the disks got detached from the 2-node RAC, and that corrupted the voting file stored on that disk. When the disk came online again, we were not able to start the RAC because only one copy of the voting file was available. The CRS diskgroup was not mounted because one disk was gone, but we could still mount it using the FORCE option, because it still had 2 working disks attached and that is all that is needed to bring a normal redundancy diskgroup online.

In order to add a new copy of the voting file, we needed to drop the disk of the ASM diskgroup that held the corrupted voting file and add it back again. Adding it back automatically creates a new copy of the voting file on the disk.

First, stop CRS on all of the RAC nodes using the force option (-f), so that any service that is still up gets shut down completely.

C:\>crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'myrac01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'myrac01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'myrac01'
CRS-2673: Attempting to stop 'ora.crf' on 'myrac01'
CRS-2677: Stop of 'ora.mdnsd' on 'myrac01' succeeded
CRS-2677: Stop of 'ora.crf' on 'myrac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'myrac01'
CRS-2677: Stop of 'ora.gipcd' on 'myrac01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'myrac01'
CRS-2677: Stop of 'ora.gpnpd' on 'myrac01' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'myrac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'myrac01' has completed


Next, start CRS on one node only, in exclusive mode and without starting CRSD (-nocrs).

C:\>crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'myrac01'
CRS-2676: Start of 'ora.mdnsd' on 'myrac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'myrac01'
CRS-2676: Start of 'ora.gpnpd' on 'myrac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'myrac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'myrac01'
CRS-2676: Start of 'ora.gipcd' on 'myrac01' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'myrac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'myrac01'
CRS-2676: Start of 'ora.cssd' on 'myrac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'myrac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'myrac01'
CRS-2676: Start of 'ora.ctssd' on 'myrac01' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'myrac01' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'myrac01'
ORA-12560: TNS:protocol adapter error
CRS-2681: Clean of 'ora.asm' on 'myrac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'myrac01'
CRS-2676: Start of 'ora.asm' on 'myrac01' succeeded


Now we can check the details of the voting files and their availability:

C:\>crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0f8e22cd31fc4f50bffcc6ed4d4dd47c (\\.\ORCLDISKCRS0) [CRS]
 2. OFFLINE  3b079a1073454f5bbffaa7f8e28ce6d6 () []
Located 2 voting disk(s).


As can be seen above, I had 2 voting files, one of which was OFFLINE because the ASM disk that was part of the CRS diskgroup and contained that copy of the voting file got corrupted.
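The same check can be scripted. The following Python sketch parses text in the layout printed by "crsctl query css votedisk" above and lists the voting files that are not ONLINE. It assumes the column layout stays exactly as in this output (index, state, 32-character File Universal Id, file name in parentheses, disk group in square brackets); VotingFile and parse_votedisks are hypothetical names used only for this example.

```python
import re
from typing import List, NamedTuple


class VotingFile(NamedTuple):
    index: int
    state: str
    fuid: str       # File Universal Id
    path: str       # empty when the file cannot be located
    diskgroup: str  # empty when the file cannot be located


# One data row, e.g. " 1. ONLINE   <32 hex chars> (\\.\ORCLDISKCRS0) [CRS]"
_ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-fA-F]{32})\s+\((.*?)\)\s+\[(.*?)\]\s*$"
)


def parse_votedisks(output: str) -> List[VotingFile]:
    """Extract the voting file rows; header and summary lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = _ROW.match(line)
        if match:
            index, state, fuid, path, diskgroup = match.groups()
            rows.append(VotingFile(int(index), state, fuid, path, diskgroup))
    return rows


SAMPLE = r"""
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0f8e22cd31fc4f50bffcc6ed4d4dd47c (\\.\ORCLDISKCRS0) [CRS]
 2. OFFLINE  3b079a1073454f5bbffaa7f8e28ce6d6 () []
Located 2 voting disk(s).
"""

if __name__ == "__main__":
    for vf in parse_votedisks(SAMPLE):
        if vf.state != "ONLINE":
            print(f"voting file {vf.index} ({vf.fuid}) is {vf.state}")
```

In practice the text would come from running the command on the node; here the sample output from this post is embedded so the sketch runs on its own.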

Diskgroup CRS was still offline because one of its disks was missing, so I brought it online using the FORCE option.

Logging in to the ASM instance and checking the disks returned the following result.

SQL> select name,path,state,header_status from v$asm_disk where group_number=0;
 
NAME                           PATH                                     STATE    HEADER_STATU
------------------------------ ---------------------------------------- -------- ------------
CRS0                           \\.\ORCLDISKCRS0                         NORMAL   MEMBER
CRS1                           \\.\ORCLDISKCRS1                         NORMAL   MEMBER
CRS2                                                                    NORMAL   UNKNOWN


I have seen that even with normal redundancy, Oracle creates as many copies of the voting file as there are disks in the CRS diskgroup. In this case, as soon as diskgroup CRS was mounted, a new copy of the voting file appeared on disk CRS1.

C:\>crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0f8e22cd31fc4f50bffcc6ed4d4dd47c (\\.\ORCLDISKCRS0) [CRS]
 2. ONLINE   155153d4d1514fecbf71099b422a14dc (\\.\ORCLDISKCRS1) [CRS]
Located 2 voting disk(s).

After this, I dropped the disk CRS2 and added it back, and I could then see the 3rd copy of the voting file.

C:\>crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0f8e22cd31fc4f50bffcc6ed4d4dd47c (\\.\ORCLDISKCRS0) [CRS]
 2. ONLINE   155153d4d1514fecbf71099b422a14dc (\\.\ORCLDISKCRS1) [CRS]
 3. ONLINE   72d9d0e705f14f40bfa902341c1760d8 (\\.\ORCLDISKCRS2) [CRS]

If you want to move the voting files to a diskgroup with a different redundancy, or simply to another diskgroup, you can do so with the following command.

C:\> crsctl replace votedisk <+diskgroup>

