My customer wants to receive alerts if a database has not been backed up (either unsuccessful backup or no backup) for ages.
As the customer have Oracle Enterprise Manager Grid Control 10gR2, I started using User Defined Metric (UDM) yesterday. Thanks Troy for his comment at OEM Generic Service
I would do 2 checks, backup of datafiles and backup of redo logs.
Here the steps :
– Open a database target (LSC01)
– Click on User-Defined-Metrics
– Create
– Metric Name = Age of datafile backup
– Type = Number
– Output = Single Value
– SQL Query : the age in hour since the oldest checkpoint time of the newest backup
select (sysdate-min(t))*24 from
(
select max(b.CHECKPOINT_TIME) t
from v$backup_datafile b, v$tablespace ts, v$datafile f
where INCLUDED_IN_DATABASE_BACKUP='YES'
and f.file#=b.file#
and f.ts#=ts.ts#
group by f.file#
)
– Credentials : dbsnmp/*****
– Threshold Operator > Warning 24 Critical 48
– Repeat every 1 hour
– OK
Same for redologs, with a name of Age of redolog backup query of
select (sysdate-max(NEXT_TIME))*24 from v$BACKUP_REDOLOG
I am not going to do this for each instance, so I will create a monitoring template
– Setup
– Monitoring Templates
– Create
– Select the target (LSC01)
– Name = Age of last backup
– Metric Threshold : remove all metrics except Age of datafile backup and Age of redolog backup
– Policies : remove all policies from template
– OK
Start applying your new metric to your databases.
– Apply
– Add
– Select all your databases
– Continue
– dbsnmp/*** (if you have the same database password for dbsnmp on all databases, it is easier)
It is now possible to define alerts.
– Preferences
– Notification Rules
– Create
– Apply to specific targets : Add you productive databases group
– Deselect Availability Down
– Metric: Add : Show all: Check User defined metric : Select : Age of datafile backup , Age of redolog backup
– Severity : Critical and Clear
– Policy : None
– Method : Email
After a while, you can monitor the metric with a historical graphic
Were you able to actually place the UDM into the templates? _ i had a tar open on this as it didnt work for me ..
derwin at rim dot com
OEM uses the following for their metric
select
to_char(max(completion_time) ,’DD-MON-YY HH24:MI:SS’) lastbackup,
round(sysdate – max(completion_time),0) numdays
from (SELECT completion_time
FROM v$backup_set
UNION
SELECT completion_time
FROM v$datafile_copy
union
select sysdate-365 from dual
)
daryl,
thanks a lot for the update. Yes, I did integrate is on my template.
My targets are 9.2.0.7 / AIX5L and my OEM is 10.2.0.2 / Solaris. What is not supported in 10gR2 is multicolumn UDM in monitoring template.
About OEM metric, you makes me worried. You mean that metric already existed … I will have a look !
Correction: Goes into the template but does not get applied to the various targets.
Ah multicolumn – that was my problem – atleast its in the template now.
I was refering to the value that gets displayed on the database page. “Last Backup”
And now the apply is working too .. Thanks Laurent. I had all but given up on mass deploying UDM since I couldnt get the templates to work. I wrestled with support for a month trying get them to understand the problem let alone solve it – and you have have solved it for me in 5 minutes — “UDM cant be multicolumn for use in templates”.
Great post, just what i was looking for. Do you know if there is a decent manual to created customized reporting (with company logo’s, different headers etc…)?
I will have a look. I did not try the reports yet.
daryl: check note 399217.1
it will be available in 11g
(dang got the spam answer wrong and lost my nice reply)
Let me just say that my tar was from before the date on the above tar. The rep insisted that there was no problem. After a month of me “fighting” with him. I gave in and gave up.
Please post more OEM UDM – I have one for example, that alerts when a user approaches max connections based on their profile restrictions. This catches thos run away java apps that dont disconnect.
marco: you can add an image (logo/header) to your report with Elements –> Add –> Image Display
Daryl: you can Register to my blog to avoid those math challenge
daryl: for your example, you may want to try something like :
select max(c) from (select username,(select count(*) from v$session where username=dba_users.username)/decode(limit,’0′,-1,’UNLIMITED’,null,’DEFAULT’,(select decode(limit,’0′,-1,’UNLIMITED’,null,to_number(limit)) from dba_profiles where profile=’DEFAULT’ and resource_name=’SESSIONS_PER_USER’),to_number(limit)) c from dba_profiles join dba_users using(profile) where resource_name=’SESSIONS_PER_USER’ );
and set a metric at 0.8
HTH
@Laurent
….of course, I want always more, then that what is obvious 😉
I have seen your suggested solution, but what I want is to completely change header and footer (for instance it, the report, always generates a “created by OEM” remarks and stuff).
I want to generate a OEM report which can be used as a base for a SLA reporting solution (and/or if possible) imbedded in a portal…
Hi Laurent!
I stumbled across this when trying to set up something to report on my last backups… and thought “Wow … this is exactly what I needed!” Thanks.
So I have set it up in OEM, and yes, it is showing me that my backups are being performed, and is warning me when a backup has not been performed within the last 24hours, or a serious alert when last backup is not within the last 48hours …. so far so good!
My issue, now is that I discovered that my last backup was not in fact successful. OEM reports that the backup has failed, however, according to my alerting that I have set up with your guidence on this blog entry, I have a backup within the last 24 hours, so no alert.
Thus, I’m looking to further extend on the alerting you have provided in this blog. Do you know a way of modifying what you have in your blog entry to alert me that I don’t have a SUCCESSFUL backup .
ie. aim – I want to be alerted if I don’t have a SUCCESSFUL backup in the last say 24 hours – no alert – good news! If Alert – needs investigating to determine (a) if the backup process infact ran, and then (b) if it ran, why it was not successful.
Regards,
Roger Barton
Hi Roger,
how do you do your backup? what is an unsuccessful backup?
– if your server is down, is it an unsuccessful backup?
– if your listener is down, is it an unsuccessful backup?
– if the tape drive crashes after all datafiles have been backup, but not the archivelog, is it an unsuccessful backup?
well, you could try the following approach :
– when you do a backup, write a log file
– in oem, define a generic service which is a shell script (if you are using linux/unix) and you grep for errors in your logfile
– you output 0 for a successful backup and 1 for an unsuccessful backup (or the opposite)
Have a nice day,
Laurent
I want to deploy a centralized UDM rather than one on every database (we have 700+). So I need a UDM against the central RMAN catalog that returns a list of registered databases and how long it has been since the last successful full backup. When threshold exceeds “X” days, I will raise an alert via Notification Rule to our Remedy ticketing system. I have the current UDM SQL in place, but it does not distinguish between an archive log backup vs. control file backup. vs. a full backup. Can you help?
select db_name,
latency
from ( select b.db_name,
round(sysdate – max(a.completion_time)) latency
from rman.bp a,
rman.dbinc b
where a.db_key = b.db_key
and b.db_name is not null
/* Exclude list of databases */
and b.db_name not in
(SELECT db_name
from BKP_VERIFY_DB_EXCEPTION_VW
where exception_type like ‘Rman%’)
group by b.db_name )
do not you have those information in the rman views? I have not done this for a while
Hi Laurent,
I needed to do this same thing in cloud control (i found your blog when trying to find out the best way to do it). In the end i did it myself and then blogged about it.
http://dbaharrison.blogspot.de/2013/07/12c-metric-extension-for-cloud-control.html
Hopefully someone else will find that useful
Cheers,
Harry
thanks!
Hello:
Your blog is extremely helpful! In my environment, Cloud Control 12c. We take a full backup once a week. My manager wants an alert raised (I believe this will be done through a metric extension in Cloud Control 12c) if a backup of the datafiles (or in other words, a full backup) is not done after 7 days (not before or on 7 days; but on or after the 8th day).
In addition, I need a metric extension alert to find out if archivelogs have not been backed up that day.
Can you PLEASE help me with the queries on these? I need to have these metric extensions in place as soon as possible.
I greatly appreciate any help you can provide on the queries that I can put in the metric extensions’ steps, which is sort of like the user defined metric.
Many thanks!
Hi Donna
The query in my blog post does monitor the number of hours since the last datafile backup. Just create an alert when it reaches 7*24
The query for redo logs is also in my blog post.
Cheers
Laurent