Monitoring the age of the last backup with OEM

My customer wants to receive alerts if a database has not been backed up (either unsuccessful backup or no backup) for ages.

As the customer have Oracle Enterprise Manager Grid Control 10gR2, I started using User Defined Metric (UDM) yesterday. Thanks Troy for his comment at OEM Generic Service

I would do 2 checks, backup of datafiles and backup of redo logs.

Here the steps :
- Open a database target (LSC01)
- Click on User-Defined-Metrics
- Create
- Metric Name = Age of datafile backup
- Type = Number
- Output = Single Value
- SQL Query : the age in hour since the oldest checkpoint time of the newest backup
select (sysdate-min(t))*24 from
(
  select max(b.CHECKPOINT_TIME) t
  from v$backup_datafile b, v$tablespace ts, v$datafile f
  where INCLUDED_IN_DATABASE_BACKUP='YES'
  and f.file#=b.file#
  and f.ts#=ts.ts#
  group by f.file#
)
- Credentials : dbsnmp/*****
- Threshold Operator > Warning 24 Critical 48
- Repeat every 1 hour
- OK

Same for redologs, with a name of Age of redolog backup query of
select (sysdate-max(NEXT_TIME))*24 from v$BACKUP_REDOLOG

I am not going to do this for each instance, so I will create a monitoring template
- Setup
- Monitoring Templates
- Create
- Select the target (LSC01)
- Name = Age of last backup
- Metric Threshold : remove all metrics except Age of datafile backup and Age of redolog backup
- Policies : remove all policies from template
- OK

Start applying your new metric to your databases.
- Apply
- Add
- Select all your databases
- Continue
- dbsnmp/*** (if you have the same database password for dbsnmp on all databases, it is easier)

It is now possible to define alerts.
- Preferences
- Notification Rules
- Create
- Apply to specific targets : Add you productive databases group
- Deselect Availability Down
- Metric: Add : Show all: Check User defined metric : Select : Age of datafile backup , Age of redolog backup
- Severity : Critical and Clear
- Policy : None
- Method : Email

After a while, you can monitor the metric with a historical graphic
User Defined Metric graph in OEM

19 thoughts on “Monitoring the age of the last backup with OEM”

  1. Were you able to actually place the UDM into the templates? _ i had a tar open on this as it didnt work for me ..
    derwin at rim dot com

  2. OEM uses the following for their metric
    select
    to_char(max(completion_time) ,’DD-MON-YY HH24:MI:SS’) lastbackup,
    round(sysdate – max(completion_time),0) numdays
    from (SELECT completion_time
    FROM v$backup_set
    UNION
    SELECT completion_time
    FROM v$datafile_copy
    union
    select sysdate-365 from dual
    )

  3. daryl,
    thanks a lot for the update. Yes, I did integrate is on my template.

    My targets are 9.2.0.7 / AIX5L and my OEM is 10.2.0.2 / Solaris. What is not supported in 10gR2 is multicolumn UDM in monitoring template.

    About OEM metric, you makes me worried. You mean that metric already existed … I will have a look !

  4. Ah multicolumn – that was my problem – atleast its in the template now.
    I was refering to the value that gets displayed on the database page. “Last Backup”

  5. And now the apply is working too .. Thanks Laurent. I had all but given up on mass deploying UDM since I couldnt get the templates to work. I wrestled with support for a month trying get them to understand the problem let alone solve it – and you have have solved it for me in 5 minutes — “UDM cant be multicolumn for use in templates”.

  6. Great post, just what i was looking for. Do you know if there is a decent manual to created customized reporting (with company logo’s, different headers etc…)?

  7. (dang got the spam answer wrong and lost my nice reply)
    Let me just say that my tar was from before the date on the above tar. The rep insisted that there was no problem. After a month of me “fighting” with him. I gave in and gave up.
    Please post more OEM UDM – I have one for example, that alerts when a user approaches max connections based on their profile restrictions. This catches thos run away java apps that dont disconnect.

  8. daryl: for your example, you may want to try something like :
    select max(c) from (select username,(select count(*) from v$session where username=dba_users.username)/decode(limit,’0′,-1,’UNLIMITED’,null,’DEFAULT’,(select decode(limit,’0′,-1,’UNLIMITED’,null,to_number(limit)) from dba_profiles where profile=’DEFAULT’ and resource_name=’SESSIONS_PER_USER’),to_number(limit)) c from dba_profiles join dba_users using(profile) where resource_name=’SESSIONS_PER_USER’ );

    and set a metric at 0.8

    HTH

  9. @Laurent

    ….of course, I want always more, then that what is obvious ;-)

    I have seen your suggested solution, but what I want is to completely change header and footer (for instance it, the report, always generates a “created by OEM” remarks and stuff).

    I want to generate a OEM report which can be used as a base for a SLA reporting solution (and/or if possible) imbedded in a portal…

  10. Hi Laurent!

    I stumbled across this when trying to set up something to report on my last backups… and thought “Wow … this is exactly what I needed!” Thanks.

    So I have set it up in OEM, and yes, it is showing me that my backups are being performed, and is warning me when a backup has not been performed within the last 24hours, or a serious alert when last backup is not within the last 48hours …. so far so good!

    My issue, now is that I discovered that my last backup was not in fact successful. OEM reports that the backup has failed, however, according to my alerting that I have set up with your guidence on this blog entry, I have a backup within the last 24 hours, so no alert.

    Thus, I’m looking to further extend on the alerting you have provided in this blog. Do you know a way of modifying what you have in your blog entry to alert me that I don’t have a SUCCESSFUL backup .
    ie. aim – I want to be alerted if I don’t have a SUCCESSFUL backup in the last say 24 hours – no alert – good news! If Alert – needs investigating to determine (a) if the backup process infact ran, and then (b) if it ran, why it was not successful.

    Regards,
    Roger Barton

  11. Hi Roger,

    how do you do your backup? what is an unsuccessful backup?

    - if your server is down, is it an unsuccessful backup?
    - if your listener is down, is it an unsuccessful backup?
    - if the tape drive crashes after all datafiles have been backup, but not the archivelog, is it an unsuccessful backup?

    well, you could try the following approach :

    - when you do a backup, write a log file
    - in oem, define a generic service which is a shell script (if you are using linux/unix) and you grep for errors in your logfile
    - you output 0 for a successful backup and 1 for an unsuccessful backup (or the opposite)

    Have a nice day,
    Laurent

  12. I want to deploy a centralized UDM rather than one on every database (we have 700+). So I need a UDM against the central RMAN catalog that returns a list of registered databases and how long it has been since the last successful full backup. When threshold exceeds “X” days, I will raise an alert via Notification Rule to our Remedy ticketing system. I have the current UDM SQL in place, but it does not distinguish between an archive log backup vs. control file backup. vs. a full backup. Can you help?

    select db_name,
    latency
    from ( select b.db_name,
    round(sysdate – max(a.completion_time)) latency
    from rman.bp a,
    rman.dbinc b
    where a.db_key = b.db_key
    and b.db_name is not null
    /* Exclude list of databases */
    and b.db_name not in
    (SELECT db_name
    from BKP_VERIFY_DB_EXCEPTION_VW
    where exception_type like ‘Rman%’)
    group by b.db_name )

Leave a Reply

Your email address will not be published.


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>