Monitoring the age of the last backup with OEM

My customer wants to receive alerts if a database has not been backed up (either unsuccessful backup or no backup) for ages.

As the customer have Oracle Enterprise Manager Grid Control 10gR2, I started using User Defined Metric (UDM) yesterday. Thanks Troy for his comment at OEM Generic Service

I would do 2 checks, backup of datafiles and backup of redo logs.

Here the steps :
– Open a database target (LSC01)
– Click on User-Defined-Metrics
– Create
– Metric Name = Age of datafile backup
– Type = Number
– Output = Single Value
– SQL Query : the age in hour since the oldest checkpoint time of the newest backup
select (sysdate-min(t))*24 from
(
select max(b.CHECKPOINT_TIME) t
from v$backup_datafile b, v$tablespace ts, v$datafile f
where INCLUDED_IN_DATABASE_BACKUP='YES'
and f.file#=b.file#
and f.ts#=ts.ts#
group by f.file#
)
– Credentials : dbsnmp/*****
– Threshold Operator > Warning 24 Critical 48
– Repeat every 1 hour
– OK

Same for redologs, with a name of Age of redolog backup query of
select (sysdate-max(NEXT_TIME))*24 from v$BACKUP_REDOLOG

I am not going to do this for each instance, so I will create a monitoring template
– Setup
– Monitoring Templates
– Create
– Select the target (LSC01)
– Name = Age of last backup
– Metric Threshold : remove all metrics except Age of datafile backup and Age of redolog backup
– Policies : remove all policies from template
– OK

Start applying your new metric to your databases.
– Apply
– Add
– Select all your databases
– Continue
– dbsnmp/*** (if you have the same database password for dbsnmp on all databases, it is easier)

It is now possible to define alerts.
– Preferences
– Notification Rules
– Create
– Apply to specific targets : Add you productive databases group
– Deselect Availability Down
– Metric: Add : Show all: Check User defined metric : Select : Age of datafile backup , Age of redolog backup
– Severity : Critical and Clear
– Policy : None
– Method : Email

After a while, you can monitor the metric with a historical graphic
User Defined Metric graph in OEM

21 thoughts on “Monitoring the age of the last backup with OEM

  1. daryl

    Were you able to actually place the UDM into the templates? _ i had a tar open on this as it didnt work for me ..
    derwin at rim dot com

  2. daryl

    OEM uses the following for their metric
    select
    to_char(max(completion_time) ,’DD-MON-YY HH24:MI:SS’) lastbackup,
    round(sysdate – max(completion_time),0) numdays
    from (SELECT completion_time
    FROM v$backup_set
    UNION
    SELECT completion_time
    FROM v$datafile_copy
    union
    select sysdate-365 from dual
    )

  3. laurentschneider Post author

    daryl,
    thanks a lot for the update. Yes, I did integrate is on my template.

    My targets are 9.2.0.7 / AIX5L and my OEM is 10.2.0.2 / Solaris. What is not supported in 10gR2 is multicolumn UDM in monitoring template.

    About OEM metric, you makes me worried. You mean that metric already existed … I will have a look !

  4. daryl

    Ah multicolumn – that was my problem – atleast its in the template now.
    I was refering to the value that gets displayed on the database page. “Last Backup”

  5. daryl

    And now the apply is working too .. Thanks Laurent. I had all but given up on mass deploying UDM since I couldnt get the templates to work. I wrestled with support for a month trying get them to understand the problem let alone solve it – and you have have solved it for me in 5 minutes — “UDM cant be multicolumn for use in templates”.

  6. Marco Gralike

    Great post, just what i was looking for. Do you know if there is a decent manual to created customized reporting (with company logo’s, different headers etc…)?

  7. daryl

    (dang got the spam answer wrong and lost my nice reply)
    Let me just say that my tar was from before the date on the above tar. The rep insisted that there was no problem. After a month of me “fighting” with him. I gave in and gave up.
    Please post more OEM UDM – I have one for example, that alerts when a user approaches max connections based on their profile restrictions. This catches thos run away java apps that dont disconnect.

  8. laurentschneider Post author

    daryl: for your example, you may want to try something like :
    select max(c) from (select username,(select count(*) from v$session where username=dba_users.username)/decode(limit,’0′,-1,’UNLIMITED’,null,’DEFAULT’,(select decode(limit,’0′,-1,’UNLIMITED’,null,to_number(limit)) from dba_profiles where profile=’DEFAULT’ and resource_name=’SESSIONS_PER_USER’),to_number(limit)) c from dba_profiles join dba_users using(profile) where resource_name=’SESSIONS_PER_USER’ );

    and set a metric at 0.8

    HTH

  9. Marco Gralike

    @Laurent

    ….of course, I want always more, then that what is obvious 😉

    I have seen your suggested solution, but what I want is to completely change header and footer (for instance it, the report, always generates a “created by OEM” remarks and stuff).

    I want to generate a OEM report which can be used as a base for a SLA reporting solution (and/or if possible) imbedded in a portal…

  10. Roger Barton

    Hi Laurent!

    I stumbled across this when trying to set up something to report on my last backups… and thought “Wow … this is exactly what I needed!” Thanks.

    So I have set it up in OEM, and yes, it is showing me that my backups are being performed, and is warning me when a backup has not been performed within the last 24hours, or a serious alert when last backup is not within the last 48hours …. so far so good!

    My issue, now is that I discovered that my last backup was not in fact successful. OEM reports that the backup has failed, however, according to my alerting that I have set up with your guidence on this blog entry, I have a backup within the last 24 hours, so no alert.

    Thus, I’m looking to further extend on the alerting you have provided in this blog. Do you know a way of modifying what you have in your blog entry to alert me that I don’t have a SUCCESSFUL backup .
    ie. aim – I want to be alerted if I don’t have a SUCCESSFUL backup in the last say 24 hours – no alert – good news! If Alert – needs investigating to determine (a) if the backup process infact ran, and then (b) if it ran, why it was not successful.

    Regards,
    Roger Barton

  11. laurentschneider Post author

    Hi Roger,

    how do you do your backup? what is an unsuccessful backup?

    – if your server is down, is it an unsuccessful backup?
    – if your listener is down, is it an unsuccessful backup?
    – if the tape drive crashes after all datafiles have been backup, but not the archivelog, is it an unsuccessful backup?

    well, you could try the following approach :

    – when you do a backup, write a log file
    – in oem, define a generic service which is a shell script (if you are using linux/unix) and you grep for errors in your logfile
    – you output 0 for a successful backup and 1 for an unsuccessful backup (or the opposite)

    Have a nice day,
    Laurent

  12. Kent Faulkner

    I want to deploy a centralized UDM rather than one on every database (we have 700+). So I need a UDM against the central RMAN catalog that returns a list of registered databases and how long it has been since the last successful full backup. When threshold exceeds “X” days, I will raise an alert via Notification Rule to our Remedy ticketing system. I have the current UDM SQL in place, but it does not distinguish between an archive log backup vs. control file backup. vs. a full backup. Can you help?

    select db_name,
    latency
    from ( select b.db_name,
    round(sysdate – max(a.completion_time)) latency
    from rman.bp a,
    rman.dbinc b
    where a.db_key = b.db_key
    and b.db_name is not null
    /* Exclude list of databases */
    and b.db_name not in
    (SELECT db_name
    from BKP_VERIFY_DB_EXCEPTION_VW
    where exception_type like ‘Rman%’)
    group by b.db_name )

  13. Donna

    Hello:
    Your blog is extremely helpful! In my environment, Cloud Control 12c. We take a full backup once a week. My manager wants an alert raised (I believe this will be done through a metric extension in Cloud Control 12c) if a backup of the datafiles (or in other words, a full backup) is not done after 7 days (not before or on 7 days; but on or after the 8th day).

    In addition, I need a metric extension alert to find out if archivelogs have not been backed up that day.

    Can you PLEASE help me with the queries on these? I need to have these metric extensions in place as soon as possible.

    I greatly appreciate any help you can provide on the queries that I can put in the metric extensions’ steps, which is sort of like the user defined metric.

    Many thanks!

  14. Laurent Schneider Post author

    Hi Donna

    The query in my blog post does monitor the number of hours since the last datafile backup. Just create an alert when it reaches 7*24

    The query for redo logs is also in my blog post.

    Cheers
    Laurent

Comments are closed.