shutdown timeout

I do not like shutdown abort (see this post). I always use shutdown immediate and it always work… well almost always.

Today I discovered a 9iR2 new feature : shutdown timeout !
Shutdown Timeout
If all events blocking the shutdown do not occur within one hour, the shutdown operation aborts with the following message: ORA-01013: user requested cancel of current operation.

Ok, if and only if I am getting this ORA-1013, I shutdown abort, startup, shutdown immediate.

It is very unusual that a shutdown immediate does not terminate in 1 hour, and hard to reproduce. For this test case, I am doing a shutdown normal

1) make sure you have at least one other session open
2) shutdown normal
3) wait about 60 minutes (defined in _shutdown_completion_timeout_mins, not a supported parameter to change)

SQL> shutdown normal
ORA-01013: user requested cancel of current operation
SQL>

Now we received a ORA-1013 (but I did not use CTRL-C). The instance is now half-stopped, most sessions and background processes like MMON, CJQ, SMCO are already dead and it is probably a good idea to restart it properly. Maybe with startup force and shutdown immediate.

SQL> startup force
ORACLE instance started.

Total System Global Area 1069252608 bytes
Fixed Size 2166160 bytes
Variable Size 658510448 bytes
Database Buffers 402653184 bytes
Redo Buffers 5922816 bytes
Database mounted.
Database opened.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

This is all documented :
After ORA-01013 occurs, you must consider the instance to be in an unpredictable state…If subsequent SHUTDOWN commands continue to fail, you must submit a SHUTDOWN ABORT

I am extremly happy to have discovered this, because it will still allow a shutdown abort after one hour of shutdown immediate, which is an extremly rare case, but it is an enhancement for my backup scripts.

9 thoughts on “shutdown timeout

  1. Eric Grancher

    Laurent,
    (if needed to shutdown abort) after shutdown abort, I normally do startup *restrict* (** for emphasis) then shutdown immediate.
    best
    Eric

  2. DKB

    In cases where the database MUST be shutdown at a specific time, there is no alternative to a “shutdown abort”, startup, shutdown immediate.
    Can I set _shutdown_completion_timeout_mins to a value of one?

    DKB

  3. Laurent Schneider Post author

    putting a timeout of 1 would not be a good idea, not only it is not supported to set underscore parameters, but the chance to have an interupted shutdown will be to high and an interupted shutdown leave your db in a bad state

  4. Ludovico Caldara

    Hi Laurent,
    I know it looks a little complicated but that’s the way I did it for years…
    Lokk at the code to see how it works and feel free to take some ideas.
    Bests

    Ludovico

    #!/bin/bash

    #################################################################################
    # – Enables shutdown tracing (events 10046 and 10400)
    # – Tries a shutdown IMMEDIATE in background
    # – If after X minutes the db is down, it exits 0
    # – If after X minuti the smon is still there:
    # – select * from x$ktuxe where ktuxecfl = ‘DEAD’;
    # – optionally kills all LOCAL=NO processes and wait 10 more seconds
    # – shutdown abort
    # – startup restrict
    # – shutdown immediate
    # – At the end, it checks again if smon is really dead.
    # – exit 0 if the db is really down

    #################################################################################

    #########
    #MOFIFIABLE VARS
    # log can be stdout to be catch by the father
    LOG=$HOME/stop.log
    #wait 300 seconds for immediate, then abort
    # sleep seconds is the interval between polls
    sleep_seconds=10
    # sleep loops is the number of polls before the abort
    sleep_loops=30
    #########

    ORACLE_SID=$1
    ORACLE_HOME=$2
    export ORACLE_SID ORACLE_HOME

    shutdown_immediate () {
    #tries shutdown immediate.
    # launched in background while the “MAIN” checks for effective shutdown
    $ORACLE_HOME/bin/sqlplus /nolog <>$LOG
    connect / as sysdba
    alter system checkpoint;
    alter system flush shared_pool;
    alter system checkpoint;
    alter session set events ‘10046 trace name context forever,level 12’;
    alter session set events ‘10400 trace name context forever, level 1’;
    shutdown immediate;
    exit
    EOF
    }

    shutdown_abort () {
    # after the “grace period” the MAIN triggers this function
    # it does shutdown abort + start/stop for clean shutdown
    $ORACLE_HOME/bin/sqlplus /nolog <>$LOG
    connect / as sysdba
    shutdown abort;
    startup restrict;
    shutdown immediate;
    exit
    EOF
    }

    check_smon_down () {
    ## if smon is up returns 0 else 1
    ps -eaf | grep -v grep | grep ora_smon_${ORACLE_SID}\$ >>$LOG
    }

    echo `date +%Y-%m-%d_%H:%M`” — Starting Shutdown instance $ORACLE_SID” >>$LOG
    shutdown_immediate &
    sh_i_pid=$!
    #sh_i_pid set to the PID of the shutdown immediate. (not used but nice to have)

    #looping sleep_loops times, waiting sleep_seconds each iteration.
    # if the instance shuts down in the meantime, exit 0. if at the end it’s still alive , abort
    while [ ${sleep_loops} -gt 0 ] ; do
    sleep ${sleep_seconds}
    check_smon_down
    if [ $? -eq 0 ] ; then
    echo `date +%Y-%m-%d_%H:%M`” — ($sleep_loops) Instance $ORACLE_SID active.” >>$LOG
    else
    echo `date +%Y-%m-%d_%H:%M`” — Instabce $ORACLE_SID shutdown.” >>$LOG
    exit 0
    fi
    sleep_loops=$(($sleep_loops-1))
    done

    echo `date +%Y-%m-%d_%H:%M`” — WARNING: Shutdown of $ORACLE_SID in HANG. Trying kill sessions LOCAL=NO.” >>$LOG
    ps -eaf | grep oracle${ORACLE_SID} | grep -v grep | grep LOCAL=NO >>$LOG
    ps -eaf | grep oracle${ORACLE_SID} | grep -v grep | grep LOCAL=NO | awk ‘{print $2}’ | xargs kill -9
    sleep 10
    check_smon_down
    if [ $? -eq 0 ] ; then
    echo `date +%Y-%m-%d_%H:%M`” — WARNING: Shutdown of $ORACLE_SID in HANG. Forcing shutdown abort.” >>$LOG
    shutdown_abort
    fi

    sleep 2
    check_smon_down
    if [ $? -eq 0 ] ; then
    ## wtf!!! 🙂
    echo `date +%Y-%m-%d_%H:%M`” — ERROR: Instance $ORACLE_SID still alive” >>$LOG
    exit 1
    else
    echo `date +%Y-%m-%d_%H:%M`” — Instance $ORACLE_SID shutdown.” >>$LOG
    exit 0
    fi

Comments are closed.