shell and list of files

How do you loop through a list of files?

For instance, you want to archive and then delete all PDF documents in the current directory:

Bad practice :


tar cvf f.tar *.pdf
rm *.pdf

There are multiple issues with the commands above.

1) New files could arrive while tar is running, so rm would delete files that have not been archived. Capture the list once and reuse it:


filelist=$(ls *.pdf)
tar cvf f.tar $filelist
rm $filelist

2) If there are no files, tar and rm will return an error. Test for an empty list first:


filelist=$(ls | grep '\.pdf$')
if [ -n "$filelist" ]
then
  tar cvf f.tar $filelist
  rm $filelist
fi
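
A possible alternative for the empty case, sketched here assuming a POSIX shell: let the shell expand the pattern into its positional parameters instead of parsing ls output, so names containing spaces survive. archive_pdfs is a hypothetical helper name, and the && before rm is an extra safety choice rather than part of the steps above. Like the snippet above, it still passes every name on the command line.

```shell
# Hypothetical helper: archive, then delete, the PDFs in the current directory.
archive_pdfs() {
  # Expand the glob once, so tar and rm see the same snapshot of names.
  # The ./ prefix protects against names that start with a dash.
  set -- ./*.pdf
  # With no match the pattern stays literal, so the "file" does not exist.
  if [ ! -e "$1" ]; then
    echo "no pdf files found" >&2
    return 0
  fi
  # Delete only if the archive was written successfully.
  tar cvf f.tar "$@" && rm "$@"
}
```

Call archive_pdfs from the directory that holds the documents; with no PDF present it prints a message and leaves everything untouched.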

3) This will not work for a long list (above 100k documents). Write the list to a file instead:


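filelist=/tmp/filelist.$(date "+%Y%m%d%H%M%S").$$.$RANDOM
ls | grep '\.pdf$' > "$filelist"
if [ -s "$filelist" ]
then
  # -L takes the list file on some tar implementations (e.g. AIX); GNU tar uses -T
  tar cvfL f.tar "$filelist"
  # read the list line by line so names with spaces survive
  while IFS= read -r f
  do
    rm "$f"
  done < "$filelist"
fi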

As you can see, this requires special handling. tar, for instance, uses the -L option to accept a list of files (on GNU tar the equivalent option is -T), and rm can delete the files one by one (or in batches with xargs -L).
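
The list-file approach can also be sketched with NUL-separated names, which keeps any filename intact. This is only a sketch under assumptions: it relies on find with -maxdepth and -print0, a tar that accepts --null and -T (GNU tar does), and an xargs with -0. archive_pdfs_from_list is a hypothetical name, and running rm only when tar succeeds is an added precaution.

```shell
# Hypothetical helper; assumes GNU-style find, tar and xargs.
archive_pdfs_from_list() {
  list=$(mktemp) || return 1
  # Snapshot the names once, NUL-terminated so any filename is safe.
  find . -maxdepth 1 -type f -name '*.pdf' -print0 > "$list"
  if [ -s "$list" ]; then
    # tar reads the names from the list file; xargs splits the rm calls
    # into batches that fit the argument limit. rm runs only if tar succeeded.
    tar cvf f.tar --null -T "$list" && xargs -0 rm -f < "$list"
  fi
  status=$?
  rm -f "$list"
  return $status
}
```

Because the names never appear on a single command line, the size of the list no longer matters.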

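This limit of roughly 100,000 files is often forgotten. It is really a limit on the total size of the arguments passed to a command, so the exact number varies with your shell, your OS and the length of the file names.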

Typical errors that could occur are:


ksh: no space
bash: Arg list too long
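
To see what the limit is on a given system, getconf can report ARG_MAX, the maximum combined size in bytes of the arguments and environment handed to a new process. A small sketch (the variable name arg_max is just for illustration):

```shell
# Query the system limit on argument + environment size, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"
```

The value is a byte count, not a file count, so many short names fit where fewer long ones would not.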

7 thoughts on “shell and list of files”

  1. good evening,

    This could be one way…

    F=/tmp/x
    find . -name \*.pdf > $F
    tar cvfz /tmp/t.tgz --files-from $F
    xargs --arg-file=$F --max-lines=1 rm

    cheers
    Eric

  2. Please, before calling rm, check the outcome of the tar!
    What if the tar fails (for example, disk full)? I assume you don’t want to call rm then.
    So don’t forget to test the exit status of the tar command!

    tar cvf … && rm …

    is the idea

  3. “…
    There are multiple issue with the command above

    1) new files could come during the tar, so the rm will delete files that have not been archived

    or:

    1b) a file already added to the tar-archive could have been overwritten by another file with the same name while tar is still running, so the rm will delete files that have not been archived

    file-lists don’t protect you from that
