Check if a program is already running in Unix

There is more than one way to do it. The safest is probably to check whether /home/lsc/OH_YES_I_AM_RUNNING exists and believe it. This is called the PID file method and is widely used (Apache has used it for a long, long time). It needs a file. It also needs cleanup if you reboot your server in the middle of something (and surely you do not want to delete old PID files yourself).
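A minimal sketch of the PID file method, with one way to ease the cleanup problem: store the PID in the file and treat the file as stale when that PID is no longer alive. The function names and the default path /tmp/myprogram.pid are assumptions for illustration only. The check-then-write step is not atomic, so two instances started at the very same moment could both win, and a recycled PID could be mistaken for the original program.

```shell
#!/bin/sh
# Hypothetical default location; pick whatever path suits your program.
PIDFILE=${PIDFILE:-/tmp/myprogram.pid}

# Return 0 if the PID recorded in file "$1" belongs to a live process.
# Note: kill -0 also fails for a live process owned by another user.
is_running() {
  [ -s "$1" ] || return 1
  pid=$(cat "$1") || return 1
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # not a number: treat the file as stale
  esac
  kill -0 "$pid" 2>/dev/null
}

# Record our own PID in file "$1" unless another instance is alive.
# A stale file (left over from a reboot, for instance) is overwritten.
claim_pidfile() {
  if is_running "$1"
  then
    return 1
  fi
  echo "$$" > "$1"
}
```

A program would call claim_pidfile "$PIDFILE" || exit 1 at startup and remove the file on exit, for instance with trap 'rm -f "$PIDFILE"' EXIT.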

OK, often you see this:


ps -ef | grep program

There you list all processes and keep the lines that contain program. So if someone runs vi program or anything worse (emacs?), you will get more rows than needed.

Maybe it is fine to run program several times with different arguments; this must be decided.
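One way to get fewer false matches is to compare the command name itself instead of grepping the whole line. The sketch below is an assumption of mine, not a standard tool: pids_of_script is a hypothetical helper that reads "pid args" lines (for example from ps -e -o pid= -o args=) and keeps only the processes that are the script itself, or one of sh, ksh or bash running it.

```shell
#!/bin/sh
# Print the PIDs of processes whose command is the script "$1".
# Reads "pid args" lines on stdin, e.g. from: ps -e -o pid= -o args=
pids_of_script() {
  awk -v name="$1" '
    # Return the last component of a path.
    function base(path,   parts, n) {
      n = split(path, parts, "/")
      return parts[n]
    }
    {
      cmd = base($2)
      # Either the script is executed directly, or a shell runs it.
      if (cmd == name ||
          ((cmd == "sh" || cmd == "ksh" || cmd == "bash") && base($3) == name))
        print $1
    }'
}
```

Usage would be ps -e -o pid= -o args= | pids_of_script x1.sh. An editor such as vi x1.sh, or the grep itself, is no longer reported, but the result is still only as reliable as the ps snapshot it is fed.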

Well, take a simple test case: two identical scripts, x1.sh and x2.sh:

#!/bin/ksh
while :
do
  date  > /dev/null
done

Let’s try to use ps:


$ nohup ./x1.sh &
$ nohup ./x2.sh &
$ jobs
[2] +  Running                 nohup ./x2.sh &
[1] -  Running                 nohup ./x1.sh &
$ ps -ef | egrep 'x[12]'
  u22  9240796  6226164  30 14:56:52  pts/2  0:00 /bin/ksh ./x2.sh
  u22 20840608  6226164  31 14:56:48  pts/2  0:01 /bin/ksh ./x1.sh

So far so good, I see I have one instance of each program.

Let’s see whether the results are consistent over time:

 $ n=9999;while :
  do 
    ps -ef | 
      egrep 'x[12].sh'>f
    if [ $(wc -l <f) != $n ]
    then 
      n=$(wc -l <f)
      echo
      date
      cat f
      echo "==> $n"
    fi
  done

Fri Oct 28 15:01:01 CEST 2011
  u22  9240796  6226164  32 14:56:52  pts/2  0:14 /bin/ksh ./x2.sh
  u22 20840608  6226164  28 14:56:48  pts/2  0:14 /bin/ksh ./x1.sh
==>        2

Fri Oct 28 15:01:08 CEST 2011
  u22  9240796  6226164  50 14:56:52  pts/2  0:14 /bin/ksh ./x2.sh
==>        1

Fri Oct 28 15:01:09 CEST 2011
  u22  9240796  6226164  52 14:56:52  pts/2  0:14 /bin/ksh ./x2.sh
  u22 20840608  6226164  53 14:56:48  pts/2  0:15 /bin/ksh ./x1.sh
==>        2

Fri Oct 28 15:01:17 CEST 2011
  u22  9240796  6226164  40 14:56:52  pts/2  0:15 /bin/ksh ./x2.sh
  u22 10944520  9240796   0 15:01:17  pts/2  0:00 /bin/ksh ./x2.sh
  u22 20840608  6226164  31 14:56:48  pts/2  0:16 /bin/ksh ./x1.sh
==>        3

The fact that a subshell of x2 (pid 10944520) appears is not a problem for me. I have much more of a problem at 15:01:08, where x1 disappeared!

Conclusion: you cannot trust ps

shell and list of files

How do you loop through a list of files?

For instance, you want to archive and then delete all PDF documents in the current directory:

Bad practice:


tar cvf f.tar *.pdf
rm *.pdf

There are multiple issues with the commands above.

1) new files could arrive while tar is running, so the rm would delete files that have not been archived


filelist=$(ls *.pdf)
tar cvf f.tar $filelist
rm $filelist

2) if there is no file, tar and rm will return an error


filelist=$(ls | grep '\.pdf$')
if [ -n "$filelist" ]
then
  tar cvf f.tar $filelist
  rm $filelist
fi

3) this will not work for a long list (above 100k documents)


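# Unique temporary file that holds the list of files to process
filelist=/tmp/filelist.$(date "+%Y%m%d%H%M%S").$$.$RANDOM
ls | grep '\.pdf$' > "$filelist"
if [ -s "$filelist" ]
then
  # -L reads the names to archive from a file (AIX tar; GNU tar uses -T)
  tar cvfL f.tar "$filelist"
  # Delete exactly the files that were listed, one at a time
  while IFS= read -r f
  do
    rm "$f"
  done < "$filelist"
fi
rm -f "$filelist"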

As you see, this requires special handling. tar, for instance, uses the -L option to accept a list of files; rm could delete files one by one (or in bunches with xargs -L).
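The snapshot idea behind issue 1 can be sketched as two small functions. The names snapshot_pdfs and remove_listed are hypothetical. The list is taken once, and only the files named in it are removed, so a file that arrives later is left alone. File names containing newlines are not handled.

```shell
#!/bin/sh
# Write the names of the *.pdf files in directory "$1" to list file "$2".
# Returns non-zero when there is nothing to archive.
snapshot_pdfs() {
  ( cd "$1" && ls | grep '\.pdf$' ) > "$2"
  [ -s "$2" ]
}

# Remove, one by one, only the files named in list "$2" (relative to "$1").
remove_listed() {
  while IFS= read -r f
  do
    rm -f -- "$1/$f"
  done < "$2"
}
```

Between the two calls you would archive from the same list, for example with tar cvfL f.tar "$list" on AIX, run from inside that directory.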

This 100’000 limit (the exact limit varies with your shell/OS) is something that often gets forgotten.

Typical errors that could occur are:


ksh: no space
bash: Arg list too long
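The limit behind "Arg list too long" is expressed in bytes (arguments plus environment) and can be queried with getconf ARG_MAX. As a rough sketch, the hypothetical helper below checks whether the names in a list file would fit on one command line. The safety margin of half the limit is an arbitrary assumption, not a documented rule.

```shell
#!/bin/sh
# Succeed if the names in list file "$1" would fit on one command line.
fits_on_command_line() {
  limit=$(getconf ARG_MAX) || return 2
  # The arithmetic expansion strips the padding some wc versions print.
  bytes=$(( $(wc -c < "$1") ))
  # Keep half of the space for the environment and per-argument
  # overhead (arbitrary margin).
  [ "$bytes" -lt $((limit / 2)) ]
}
```

When the check fails, fall back to processing the list file line by line or in batches.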

pstree in AIX

For those who do not want to download some Linux-like freeware onto their AIX box, use ps -T :)


ps -fT 2412672
     UID     PID    PPID   C    STIME    TTY  TIME CMD
  oracle 2412672       1   0   Sep 05      -  0:00 /u01/app/oracle/product/OAS
  oracle  630956 2412672   0   Sep 05      -  6:11     \--/u01/app/oracle/prod
  oracle 1347672  630956   0   Sep 05      - 15:32        |\--/u01/app/oracle/
  oracle 1437836  630956   0   Sep 05      -  1:02        |\--/u01/app/oracle/
  oracle  880820 1437836   0   Sep 05      -  0:32        |   |\--/u01/app/ora
  oracle 1036532 1437836   0   Sep 05      -  0:00        |   |\--/u01/app/ora
  oracle 1134796 1437836   0   Sep 05      -  0:01        |   |\--/u01/app/ora
  oracle 1343712 1437836   0   Sep 05      -  0:33        |   |\--/u01/app/ora
  oracle 1368166 1437836   0   Sep 05      -  1:11        |   |\--/u01/app/ora
  oracle 1384684 1437836   0   Sep 05      -  0:33        |   |\--/u01/app/ora
  oracle 1392862 1437836   0   Sep 05      -  0:32        |   |\--/u01/app/ora
  oracle 1396898 1437836   0   Sep 05      -  0:33        |   |\--/u01/app/ora
  oracle 1482978 1437836   0   Sep 05      -  0:32        |   |\--/u01/app/ora
  oracle 1527890 1437836   0   Sep 05      -  0:00        |   |\--/u01/app/ora
  oracle 1781798 1437836   0   Sep 05      -  0:32        |   |\--/u01/app/ora
  oracle 2195474 1437836   0   Sep 26      -  0:13        |    \--/u01/app/ora
  oracle 1626296  630956   0   Sep 05      - 13:49         \--/u01/app/oracle/
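On a system whose ps has no -T flag, a similar view can be approximated from plain ps output. The sketch below is only an illustration: ps_tree is a hypothetical helper that reads "pid ppid command" lines (for example from ps -e -o pid= -o ppid= -o args=) and prints the subtree of one PID, indented by depth instead of drawn with \-- branches.

```shell
#!/bin/sh
# Print the process tree rooted at PID "$1".
# Reads "pid ppid command" lines on stdin,
# e.g. from: ps -e -o pid= -o ppid= -o args=
ps_tree() {
  awk -v root="$1" '
    {
      pid = $1; ppid = $2
      sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "")   # keep only the command
      cmd[pid] = $0
      # Remember the children of each parent, in input order.
      # A process that is its own parent would cause endless recursion.
      if (pid != ppid) kids[ppid] = kids[ppid] " " pid
    }
    # Print one process, then its children one level deeper.
    function show(pid, depth,   list, n, i, pad) {
      pad = ""
      for (i = 0; i < depth; i++) pad = pad "  "
      print pad pid " " cmd[pid]
      n = split(kids[pid], list, " ")
      for (i = 1; i <= n; i++) show(list[i], depth + 1)
    }
    END { if (root in cmd) show(root, 0) }'
}
```

Usage would be ps -e -o pid= -o ppid= -o args= | ps_tree 2412672.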

Large zip on Windows

I have never been a Microsoft fanatic nor an anti-Microsoft terrorist, but today I could not believe that large compressed folders got corrupted in Windows!

I sent a relatively small zip file (5 GB, peanuts) from AIX to Windows via sftp, and in Windows Explorer some files in the compressed folder (read: zip) were just pointing to the wrong content.

I had some issues with large zip files on Unix, but that was last century! How come a modern filesystem/operating system has such issues?

I have found a few bugs on support.microsoft.com.

Example: Compressed folder becomes corrupted when larger than 2 gigabytes
Workaround: make sure that you limit the size of a compressed folder to 2 GB or less

Amazing!

_optimizer_random_plan parameter

I was trying to find a workaround for a wrong-results bug in Oracle 11.2.0.2:

SELECT *  FROM 
  (SELECT 2 B FROM DUAL WHERE DUMMY = 'Y'), 
  (SELECT 3 C FROM DUAL WHERE DUMMY LIKE '%') 
  WHERE C = B(+);

         B          C
---------- ----------
         2          3

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER|      |     1 |     4 |     4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

As DUMMY is not 'Y' (the single row of DUAL has DUMMY = 'X'), B cannot be 2; the outer join should return a NULL B.

OK, I tried:


alter session set "_optimizer_random_plan"=1;

SELECT *  FROM 
  (SELECT 2 B FROM DUAL WHERE DUMMY = 'Y'), 
  (SELECT 3 C FROM DUAL WHERE DUMMY LIKE '%') 
  WHERE C = B(+);

         B          C
---------- ----------
                    3

Execution Plan
----------------------------------------------------------
Plan hash value: 837538736

-------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1146 |  5730 |    27G|
|   1 |  MERGE JOIN OUTER    |      |   603K|  2946K|    27G|
|*  2 |   TABLE ACCESS FULL  | DUAL |   392K|   767K|   136K|
|   3 |   VIEW               |      |     2 |     6 | 69180 |
|*  4 |    FILTER            |      |       |       |       |
|*  5 |     TABLE ACCESS FULL| DUAL |   123K|   240K| 69180 |
-------------------------------------------------------------

Cool, I got correct results! The fact that the cost jumped from 4 to 27 billion is just a minor annoyance, I suppose :twisted:

I also tried:


alter session set "_optimizer_random_plan"=0; -- default

alter session  set "_complex_view_merging"=false;

SELECT *  FROM 
  (SELECT 2 B FROM DUAL WHERE DUMMY = 'Y'), 
  (SELECT 3 C FROM DUAL WHERE DUMMY LIKE '%') 
  WHERE C = B(+);

         B          C
---------- ----------
                    3

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER  |      |     1 |     5 |     2   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | DUAL |     1 |     2 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |      |     1 |     3 |            |          |
|*  4 |    FILTER            |      |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

The plan now shows a cost of 2 instead of 4 (and 5 bytes instead of 4), and the results are correct.

The first thing I did was open an SR; now I am impatiently waiting for Oracle Support guidance…