I tweeted yesterday:
laurentsch copying 1TB over ssh sucks. How do you fastcopy in Unix without installing Software and without root privilege?
I got plenty of expert answers. I have not gone so far as recompiling ssh, and I did not try plain ftp.
OK, let’s first try to transfer 10 files of 100MB from srv001 to srv002 with scp:
time scp 100M* srv002:
100M1 100% 95MB 4.5MB/s 00:21
100M10 100% 95MB 6.4MB/s 00:15
100M2 100% 95MB 6.0MB/s 00:16
100M3 100% 95MB 4.2MB/s 00:23
100M4 100% 95MB 3.4MB/s 00:28
100M5 100% 95MB 4.2MB/s 00:23
100M6 100% 95MB 6.4MB/s 00:15
100M7 100% 95MB 6.8MB/s 00:14
100M8 100% 95MB 6.8MB/s 00:14
100M9 100% 95MB 6.4MB/s 00:15
real 3m4.50s
user 0m27.07s
sys 0m21.56s
More than 3 minutes for 1GB.
I got hints about the buffer size, about SFTP, about the cipher algorithm, and about parallelizing. I did not install new software and I have a pretty old openssh client (3.8). Thanks to all my contributors tmuth, Ik_zelf, TanelPoder, fritshoogland, jcnars, aejes, surachart, and those who will answer after the writing of this blog post…
OK, let’s now try a faster cipher, with sftp (instead of scp), a larger buffer, and in parallel:
$ cat batch.ksh
echo "progress\nput 100M1" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M2" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M3" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M4" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M5" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M6" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M7" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M8" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M9" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
echo "progress\nput 100M10" | sftp -B 260000 -o Ciphers=arcfour -R 512 srv002&
wait
$ time batch.ksh
real 0m19.07s
user 0m12.08s
sys 0m5.86s
This is a 1000% speed enhancement 🙂
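The ten near-identical lines of batch.ksh can also be wrapped in a small function. A sketch, assuming the same host and options as above; printf is used instead of echo so the \n escape is interpreted portably across shells:

```shell
#!/bin/sh
# build the two-line sftp batch (progress + put) for one file
sftp_batch() {
    printf 'progress\nput %s\n' "$1"
}

# upload every named file to $1, one background sftp per file, then wait
parallel_sftp_put() {
    host=$1; shift
    for f in "$@"; do
        sftp_batch "$f" | sftp -B 260000 -o Ciphers=arcfour -R 512 "$host" &
    done
    wait
}

# usage: time parallel_sftp_put srv002 100M*
```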
Laurent
Nice post, but in the first example you do the transfers serially, whereas in the sftp method you run multiple processes in the background.
Did you try running the scp in the background in parallel too (I’d be interested in seeing the timings)?
Thanks
John.
Yes, with the scp transfers run in parallel in the background:
real 0m27.86s
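The parallel scp run John asked about presumably looked something like this. A sketch, with the host and file names assumed; the SCP variable is only there so the loop can be dry-tested with SCP=echo:

```shell
#!/bin/sh
# copy each named file to $1 in its own background scp, then wait for all
parallel_scp() {
    host=$1; shift
    for f in "$@"; do
        ${SCP:-scp} "$f" "$host": &
    done
    wait
}

# usage: time parallel_scp srv002 100M*
```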
What about passing less data through the network, such as gzip on the source and gunzip on the target (over ssh)?
I have already used this approach to dramatically reduce database copy times over the network.
Find out more : http://gasparotto.blogspot.com/2009/02/speed-up-copy-over-network.html
Nicolas.
Hi Nicolas,
This may help on a slow network, but not on a fast one.
According to man ssh:
“Compression is desirable on modem lines and other slow connections, but will only slow down things on fast networks.”
$ time (tar cvf - 100*|gzip|ssh srv002 "gzip -d -c|tar xvf -")
real 3m49.61s
user 0m21.63s
sys 0m6.67s
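Without the compression stage, the pipeline reduces to tar piped straight into ssh, which is usually the better trade-off on a fast LAN. A sketch, wrapped in a function with RSH overridable so it can be exercised locally:

```shell
#!/bin/sh
# stream a tar of the named files to host $1 over ssh, no compression stage
tar_push() {
    host=$1; shift
    tar cf - "$@" | ${RSH:-ssh} "$host" "tar xf -"
}

# usage: time tar_push srv002 100M*
```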
Did you try the FDT that Chan mentioned in the comments?
I thought scp was a wrapper around sftp. So what would cause scp to be so much slower?
Good question!
irresponsible programming in scp?
On AIX, scp accepts a “-C” option that instructs ssh to use compressed data transmission. I think if you use that together with parallelism, you’ll get similar results.
1. scp is not a wrapper around sftp. They are separate protocols that both use SSH as the underlying security layer. Specifically, sftp is also a file-system protocol that allows remote directory listing, while scp is not. Generally speaking, scp is known to be faster.
2. In this case, I suspect the main trick (other than the 10 channels, which gave most of the benefit) was the use of -B to request a larger buffer for the transfer. You don’t have this option (at least not easily) in scp.
@chen I started the test this morning with a huge file… but I know you are impatient to get an answer, so I tried again with an export dump that is 1.66GB in size. SFTP is clearly faster.
With no options at all, the simplest possible test, with 2 different dump files, both 1.66GB:
time sftp oracle@srv004ax:/u02/oradata/201001110940/aaa1.dmp
real 1m30.56s
user 0m40.82s
sys 0m10.76s
time scp oracle@srv004ax:/u02/oradata/201001110940/aaa2.dmp .
real 3m42.98s
user 0m45.85s
sys 0m30.66s
$ ls -lrt
-rw-r----- 1 lsc dba 1781233135 May 19 15:10 aaa1.dmp
-rw-r----- 1 lsc dba 1780086038 May 19 15:14 aaa2.dmp
The files are different (the second is a few bytes smaller and was loaded afterwards), and sftp is faster.
Generally speaking scp is known to be faster
Sometimes
$ time scp localhost:/etc/hosts xxx
hosts 100% 2483 2.4KB/s 00:00
real 0m0.32s
user 0m0.02s
sys 0m0.01s
$ time sftp localhost
Connecting to localhost...
sftp> cd /etc
sftp> get hosts xxx
Fetching /etc/hosts to xxx
/etc/hosts 100% 2483 2.4KB/s 00:00
sftp> bye
real 0m32.18s
user 0m0.01s
sys 0m0.00s
But this is because I am a slow typist 🙂
Hi
I tried your method but I did not get any performance improvement. Below is the timing to transfer 500MB. I am using HP-UX 11.23. Any suggestions or guidance to improve the file transfer?
real 2m22.40s
user 0m53.66s
sys 1m11.24s
Thank you.
-haris
You should check what kind of bandwidth your network offers. A single 10Mb/s link, a shared 10Gb/s link, or a large number of dedicated gigabit interfaces will each behave differently.
Obviously, if you have a few shared 10Gb/s links virtualized into a large number of interfaces and you open 8 connections at full speed, other users may be affected.
Maybe try opening 2 channels in parallel to start with.
Did you use compression? It seems your “sys” time is much larger than mine… Try without compression and compare.
Hi Laurent
Thanks for reply.
Actually I tested with 1 file but without “-o Ciphers=arcfour”, because I am not sure what it is for. I checked the ssh_config file; most of the options are commented out. I did not use the compression option -C in the testing.
Here is what i did.
$ time echo "Progress\nput /u07/oradata/EXPDP/DP_FILE1.dmp" | sftp -B 260000 -R 512 srv005:/u01/app/oracle/reorg/.
Should I run multiple files to see better performance? Or are there other options I should use to increase the speed of the file transfer?
I am sorry if my questions are not valid.
Thank you
-haris
@Laurent Schneider
Hi Laurent
Today I tested with 2 files, 2GB each. I did not use compression mode.
Result:
srv001:oracle/reorg $ cat batch_transfer0.sh
echo "Progress\nput /u07/oradata/EXPDP/DP_INTERLIVE_OFF_02.dmp" | sftp -B 262100 -R 512 srv005:/u01/app/oracle/reorg/. &
echo "Progress\nput /u07/oradata/EXPDP/DP_INTERLIVE_OFF_03.dmp" | sftp -B 262100 -R 512 srv005:/u01/app/oracle/reorg/. &
wait
srv001:oracle/reorg $
real 10m39.69s
user 9m19.72s
sys 9m0.88s
Any comments or suggestions ?
Thanks
-haris
Hi Haris,
Your comments landed in my spam box, sorry about that…
The cipher suite is how the traffic is encrypted. arcfour is believed to be faster than the default 3des.
The result will also be affected by your server and network load… maybe try when there is little activity on the server.
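To check how much the cipher actually matters on a given pair of hosts, one can time the same file with each candidate. A sketch; which cipher names are available depends on the ssh version on both sides, and the SCP variable is only there so the loop can be dry-tested with SCP=echo:

```shell
#!/bin/sh
# time one scp of file $1 to host $2 for each cipher name given afterwards
try_ciphers() {
    file=$1 host=$2; shift 2
    for c in "$@"; do
        echo "cipher: $c"
        time ${SCP:-scp} -o Ciphers="$c" "$file" "$host":
    done
}

# usage: try_ciphers bigfile srv002 3des-cbc aes128-cbc arcfour
```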