shell and list of files

How do you loop through a list of files?

For instance, you want to archive and then delete all PDF documents in the current directory:

Bad practice:


tar cvf f.tar *.pdf
rm *.pdf

There are multiple issues with the commands above.

1) New files could arrive while tar is running, so rm would delete files that have not been archived.


filelist=$(ls *.pdf)
tar cvf f.tar $filelist
rm $filelist

2) If there are no matching files, tar and rm will return an error.


filelist=$(ls|grep '\.pdf$')
if [ -n "$filelist" ]
then
  tar cvf f.tar $filelist
  rm $filelist
fi

3) This will not work for a long list (above 100k documents).


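filelist=/tmp/filelist.$(date "+%Y%m%d%H%M%S").$$.$RANDOM
ls|grep '\.pdf$' > "$filelist"
if [ -s "$filelist" ]
then
  # L: read the names to archive from the list file
  tar cvfL f.tar "$filelist"
  # read the list line by line, so names with spaces survive
  while IFS= read -r f
  do
    rm "$f"
  done < "$filelist"
fi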

As you can see, this requires special handling. tar, for instance, needs an option to read the names from a list file (-L here, as on AIX tar; GNU tar uses -T or --files-from instead), and rm can delete the files one by one (or in batches with xargs -L).
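For readers on Linux, here is a minimal sketch of the same idea, assuming GNU tar (where -T reads the names from a file). The scratch directory and the sample file names are made up so the script is safe to try, and rm only runs if tar succeeded:

```shell
#!/bin/sh
# Sketch assuming GNU tar (-T reads member names from a file).
# Works in a scratch directory with sample files (hypothetical names).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch a.pdf "b c.pdf" notes.txt

# The list file lives outside the directory being archived.
filelist=$(mktemp) || exit 1
# One name per line; names containing newlines are not handled here.
ls | grep '\.pdf$' > "$filelist"

if [ -s "$filelist" ]
then
  # Delete only if the archive was written successfully.
  if tar -cf f.tar -T "$filelist"
  then
    while IFS= read -r f
    do
      rm -- "$f"
    done < "$filelist"
  fi
fi
rm -f "$filelist"
```

Because the names never appear on a command line as one long argument list, the number of files does not matter here.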

This 100’000 limit (the limit may vary for your shell/os) is something that often gets forgotten.

Typical errors that can occur are:


ksh: no space
bash: Arg list too long
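The "Arg list too long" error is tied to the operating system's limit on the total size, in bytes, of the arguments plus environment passed to a new process (ARG_MAX), rather than to the number of files as such. A small sketch to look at it on your own system; the generated file names are made up and no files are touched:

```shell
#!/bin/sh
# Limit, in bytes, on arguments plus environment for a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# xargs splits its input into as many invocations as needed to stay under
# its command-line size limit; echo prints one line per invocation, so the
# line count is the number of batches.
batches=$(awk 'BEGIN { for (i = 1; i <= 200000; i++) print "file" i ".pdf" }' | xargs echo | wc -l)
echo "200000 names were passed in $batches batch(es)"
```

The number of batches depends on the system and on the xargs implementation, so no figure is given here.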

7 Comments

  • good evening,

    This could be one way…

    F=/tmp/x
    find . -name \*.pdf > $F
    tar cvfz /tmp/t.tgz --files-from $F
    xargs --arg-file=$F --max-lines=1 rm

    cheers
    Eric

  • please, before calling rm, check the outcome of the tar !
    What, if the tar fails ? (for example: disk full) – I assume, you don’t want to call rm then, ok ?
    So, don’t forget to test the exit status of the tar – command !

    tar cvf … && rm …

    is the idea

  • “…
    There are multiple issue with the command above

    1) new files could come during the tar, so the rm will delete files that have not been archived

    or:

    1b) a file already added to the tar-archive could have been overwritten by another file with the same name while tar is still running, so the rm will delete files that have not been archived

    file-lists don’t protect you from that
