Fast Large File Transfers on Windows Shares? Jumbo Frames?
Sometimes you may want to move large amounts of data over a network. The natural and probably naive assumption is that if you have a Gigabit Ethernet network, the transfer will occur at 1Gb/sec.
Many things can prevent this from happening:
1. Flow control on the transmitting node.
If a server with Gigabit Ethernet has an active data transfer with a 100Mb/sec client, this “feature” seems to throttle transfers to faster (Gb Ethernet) clients for as long as the transfer with the slower client is active.
Some drivers, including Intel’s, let you turn off flow control in the device manager.
2. SAMBA/CIFS/SMB/OS inefficiency.
SMB/CIFS is very convenient, as it’s fully integrated into Windows, but it is not the lightest or fastest method of transferring data. You can use FTP or many other methods to transfer data quickly, but most of them are not nearly as convenient as SMB. SMB2.0, which Microsoft introduced in Vista and Windows Server 2008, is much better:
(Note: arrows indicate the direction of transfer, and the first computer listed is the initiator. “2008x64<-box” means that the Windows Server 2008 x64 computer initiated a copy of a file from the “box” computer to itself.)

This improved throughput with SMB2.0/vista comes at a cost of CPU Utilization:
If you want high performance, use SMB2.0, but don’t expect miracles for free.
————————————————————————————————————————–
On the topic of OS liability in network throughput, there are two registry changes needed for Vista SP1 and Server 2008:
in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile
change NetworkThrottlingIndex to 0xffffffff
and change SystemResponsiveness to 0x64.
This essentially disables the Windows MultiMedia Class Scheduler Service’s ability to prioritize multimedia playback at the cost of networking performance.
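As a minimal sketch, the two values above can be written out as a .reg file for import. The function and constant names here are illustrative, and the script only builds the text; nothing is written to the registry.

```python
# Sketch: build the text of a .reg file holding the two values from the post.
# KEY and the value names come from the text; build_reg_file is a
# hypothetical helper introduced for illustration.
KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Multimedia\SystemProfile"
)

# DWORD values exactly as given in the post.
VALUES = {
    "NetworkThrottlingIndex": 0xFFFFFFFF,
    "SystemResponsiveness": 0x64,
}


def build_reg_file(key, values):
    """Return .reg file text that sets each name to a DWORD under key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        # .reg files store DWORDs as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(KEY, VALUES))
```

Saving the output as a .reg file and reviewing it before importing keeps the change easy to inspect and to undo.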
3. Encryption Tax
This may hurt delay more than bandwidth, since there must be processing on each end of the transfer, but nevertheless VPN data overhead can be significant, >10% (often depending on packet size distribution, as some costs are fixed per-packet).
4. Gigabit Ethernet over the PCI Bus.
It is true that the 32-bit 33MHz PCI bus has a theoretical peak bandwidth over 1Gb/sec (100/3 MHz * 32 bits = ~1.067 Gb/sec).
While this exceeds the 1Gb/sec that Gigabit Ethernet needs, the PCI bus has overhead, is not full duplex, and shares that bandwidth among all devices on the bus. The effective bandwidth of a regular PCI bus is around 600Mb/sec half-duplex; some chipsets are better than others for actual PCI bandwidth. PCIe 1x has 4Gb/sec total bandwidth per device (2Gb/sec in each direction, full-duplex), so PCIe Gb Ethernet interfaces do not have the bus as a bottleneck.
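The bus arithmetic above can be checked with a few lines of Python. This is only a sketch of the figures quoted in the text; the 0.6 Gb/sec effective PCI rate is the post's estimate, and the helper names are made up for illustration.

```python
# Sketch of the bus bandwidth comparison, using the figures from the text.
PCI_CLOCK_HZ = 100e6 / 3      # 33.33 MHz PCI clock
PCI_WIDTH_BITS = 32           # 32-bit bus
GIGE_GBPS = 1.0               # what Gigabit Ethernet needs in one direction


def peak_gbps(clock_hz, width_bits):
    """Theoretical peak bandwidth in Gb/sec: one bus-width transfer per clock."""
    return clock_hz * width_bits / 1e9


def is_bottleneck(bus_gbps, link_gbps=GIGE_GBPS):
    """True if the bus cannot keep up with the network link."""
    return bus_gbps < link_gbps


pci_peak = peak_gbps(PCI_CLOCK_HZ, PCI_WIDTH_BITS)  # about 1.067 Gb/sec
pci_effective = 0.6           # Gb/sec, the post's estimate for real PCI
pcie_x1_one_way = 2.0         # Gb/sec per direction for PCIe 1x

if __name__ == "__main__":
    print(f"PCI peak: {pci_peak:.3f} Gb/sec")
    print("PCI effective is a bottleneck:", is_bottleneck(pci_effective))
    print("PCIe 1x is a bottleneck:", is_bottleneck(pcie_x1_one_way))
```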
5. Hard Disc stuff.
SCSI, SAS, ATA-66/100/133, and SATA1/2 all had impressive throughput rates for their time, but the interface was never the bottleneck.
The discs themselves have sustained transfer rates (STR) limited by:
1. Linear Speed L of the Head which is a function of:
- Rotational Velocity V (revolutions per time) on most hard drives this is not dynamic.
- Location of the data r (radial distance of the head where the data is being operated on).
so L = V*2*Pi*r. In a normal 7200 revolution per minute, 3.5″ hard disc, the linear velocity at the outer edge is L = 7200*2*Pi*3.5/2 = ~79168 inches per minute.
This linear velocity falls off linearly as you approach the spindle of the disc.
2. Linear data density d, (bits per inch), which is usually proportional to the square root of the areal density (higher density means the head can traverse and r/w more sectors for a given linear velocity)
so
Sequential Throughput ~ L * d ~ 2 * Pi * V * r * d
so there you have it, in a given disc, sequential data throughput is linearly related to distance from center.
With the same rotational velocity, a 3.5″ disc has an outer edge 40% further from the center (3.5/2.5 = 1.4), which makes it 40% faster than the outer edge of a similarly dense 2.5″ platter.
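The relations above are easy to check numerically. The sketch below uses the same simplification as the text (the full 3.5″ or 2.5″ figure is treated as the platter diameter); the function names are illustrative only.

```python
import math


def linear_velocity(rpm, radius_in):
    """Linear speed of the track under the head, in inches per minute."""
    return rpm * 2 * math.pi * radius_in


def sequential_throughput(rpm, radius_in, bits_per_inch):
    """Sequential throughput ~ L * d, in bits per second."""
    return linear_velocity(rpm, radius_in) * bits_per_inch / 60


# Outer edge of a 7200rpm disc, taking the nominal size as the diameter.
outer_35 = linear_velocity(7200, 3.5 / 2)   # about 79168 inches per minute
outer_25 = linear_velocity(7200, 2.5 / 2)

# Same rpm and same linear density: throughput scales with radius alone.
ratio = outer_35 / outer_25                 # 1.4, i.e. 40% faster

if __name__ == "__main__":
    print(f"3.5 inch outer edge: {outer_35:.0f} inches per minute")
    print(f"3.5 vs 2.5 inch outer edge: {ratio:.2f}x")
```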
————————————————————————–
It may not be called a disc problem, but effective disc large-file transfer rate can be throttled if the data has to be fragmented on various spots on the disc, since that requires head seeks for something that could be a sequential operation. Seeks cost time and transfer no data.
Carefully chosen modern 7200rpm SATA2 high-areal-density discs can perform sustained sequential reads or writes close to 1Gb per second at the outer edge. The discs I have been messing with ( Western Digital
These graphs are decreasing because the program calls 100% the innermost and 0% the outermost part of the disc. You may also notice the graphs are not linear as I suggested; this is because the horizontal axis is “%”, which is % of data, not % of radial distance.
I will not bore anyone with the math/logic to understand why this makes sense, but it does, and graphing it like HD-Tune did here should theoretically yield a quadratic, which it seems to by the pictures.
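For readers who do want the logic, here is a minimal sketch under simplifying assumptions that are mine, not measured: constant areal density across the platter, constant rpm, and a made-up inner radius of half the outer radius. The data stored between the outer edge and radius r is proportional to the ring area, so the data fraction p is quadratic in r, and throughput (proportional to r) plotted against p follows the matching square-root curve.

```python
import math


def relative_throughput(p, outer_r=1.75, inner_r=0.875):
    """Throughput at data position p, relative to the outer edge.

    p is the fraction of data counted from the outer edge (0.0 = outermost,
    1.0 = innermost). The radii are in inches; inner_r is a hypothetical value.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Ring area from the outer edge to r is proportional to outer_r^2 - r^2,
    # so p = (outer_r^2 - r^2) / (outer_r^2 - inner_r^2). Solve for r:
    r = math.sqrt(outer_r**2 - p * (outer_r**2 - inner_r**2))
    # Throughput is proportional to radius at constant rpm and linear density.
    return r / outer_r


if __name__ == "__main__":
    for percent in (0, 25, 50, 75, 100):
        print(percent, round(relative_throughput(percent / 100), 3))
```

With these assumed radii the curve stays above the straight line between its endpoints, dropping slowly at first and faster toward the inner edge.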
————————————————————————–
For my testing, I used 2 WD WD640AAKSs in Raid0:
If the data was placed at the inner edge, where throughput is only 800Mb/sec, that could obviously throttle any Gigabit Ethernet transfers relying on it.
In this case I am not worried about data location because my drives are empty and windows/ntfs is fairly smart about data placement, here is where my 10GB test file was placed:

Exactly where I would want it, at the outer edge. I can rely on Windows to do this consistently because the disc is otherwise empty.
So with this scenario, I am confident the discs will not bottleneck my transfers.
6. Framing overhead
Standard Ethernet frames carry at most 12,000 bits (1,500 bytes) of payload each. On a link capable of moving 1,000,000,000 bits each second, that means a lot of frames and significant framing overhead.
72,000 bit (9,000 byte) “Jumbo” frames are small enough to still be checked effectively for integrity by Ethernet’s 32-bit CRC, while reducing the overhead-per-bit ratio for data transfers that are huge compared to the frame size.
Note that each end of a network, and every node traversed in the path, must support these “jumbo” frames in order for the transaction to use them.
further reading on jumbo frames: http://sd.wareonearth.com/~phil/jumbo.html , http://docs.hp.com/en/783/jumbo_final.pdf
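To put rough numbers on the framing overhead, here is a small sketch. It assumes 38 bytes of per-frame Ethernet overhead on the wire (preamble, start delimiter, header, FCS, and inter-frame gap) plus 40 bytes of IPv4 and TCP headers without options inside each frame; the function names are illustrative.

```python
# Per-frame Ethernet cost on the wire, in bytes:
# 7 preamble + 1 start delimiter + 14 header + 4 FCS + 12 inter-frame gap.
ETH_OVERHEAD_BYTES = 7 + 1 + 14 + 4 + 12

# Assumption: IPv4 (20 bytes) and TCP (20 bytes) headers with no options.
IP_TCP_HEADER_BYTES = 20 + 20


def payload_efficiency(mtu_bytes):
    """Fraction of wire time that carries application data."""
    return (mtu_bytes - IP_TCP_HEADER_BYTES) / (mtu_bytes + ETH_OVERHEAD_BYTES)


def frames_per_second(mtu_bytes, link_bps=1_000_000_000):
    """Full-size frames per second needed to saturate the link."""
    return link_bps / ((mtu_bytes + ETH_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: {payload_efficiency(mtu):.1%} efficient, "
            f"{frames_per_second(mtu):,.0f} frames/sec at line rate"
        )
```

Under these assumptions the bandwidth gain from jumbo frames is only a few percent; the larger effect is that each host handles roughly one sixth as many frames per second.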
well, I wanted to explore this in the real world of windows on platforms with abundant processing power:
Jumbo Frames make a big difference here, but arguably more important is the observation that the transfer is a lot faster if the current holder of the file initiates the transfer, rather than the recipient.

Here is where we see the other benefit of jumbo frames: CPU utilization is less than half on the recipient of the transfer, regardless of who initiates it.
—————————————————————————————————————————–
2008x64 is a q9450 12MB 45nm quad core @ 333×8=2.66GHz Core 2 Quad “Yorkfield” CPU
with 8GB of ram
Windows server 2008 x64
Intel 82566DC Gigabit Ethernet on PCIe w/9.11.5.7 drivers from Microsoft on 6/21/2006
DG33TL motherboard
dedicated 2xWD640AAKS in RAID0 on ich9r for network share w/ntfs cluster size of 64kB and hdd&volume cache enabled.
—————————————————————————————————————————–
Both Computers are directly connected to a Trendnet TEG-S8/A switch.
—————————————————————————————————————————–
“box” is an e6300 2MB 65nm dual core @ 333×7 = 2.33GHz Core 2 Duo
with 4GB of ram
vista x64 sp1
Atheros L1 Gigabit Ethernet on PCIe w/2.4.7.13 drivers from Atheros on 4/28/2008
Asus G35 motherboard
dedicated 2xWD640AAKS in RAID0 on ich9r for network share w/ntfs cluster size of 64kB and hdd&volume cache enabled.
—————————————————————————————————————————–
Why does it matter which side initiates?
Before I put this to rest, let’s examine what goes on during the transfer to see why it is faster when the sender initiates.
Consider the transfer from box to 2008x64.
When it’s initiated by the sender, this happens:
[Screenshots, sender and initiator vs. recipient: Network Throughput and Memory Usage]
This seems pretty normal. Let’s look again at the difference in transfer rates depending on the initiator.

Now let’s look and see why it’s slower when the recipient initiates:
[Screenshots, sender vs. recipient and initiator: Network and Memory]
Whatever exactly is going on in the sending computer when a transfer is initiated by the recipient, it’s clearly using as much RAM as it can find, until there’s none left. When it runs out of RAM, the transfer goes on without degradation of performance, which leaves me at a loss as to why it needed all that RAM so badly to begin with.
Let’s look at CPU use:

Although it uses more CPU cycles on the sending node, on modern multi-core CPUs with SMB2.0 as-is it’s clearly better to initiate a transfer from the sender of the file: throughput is significantly better, and it doesn’t pillage the node of its RAM.
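To repeat this kind of comparison on your own machines, a minimal timing sketch could look like the one below. It is not the tool used for the measurements in this post, and the paths in the usage note are hypothetical; it simply copies a file and reports megabits per second.

```python
import os
import shutil
import time


def timed_copy(src, dst):
    """Copy src to dst and return the throughput in megabits per second."""
    size_bits = os.path.getsize(src) * 8
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bits / elapsed / 1e6
```

Run on the machine that holds the file, with a share on the other machine as the destination, the sender initiates, for example `timed_copy(r"D:\share\test.bin", r"\\2008x64\share\test.bin")`. Run on the other machine with the paths swapped to a remote source and local destination, the recipient initiates. Use a file much larger than RAM, or a freshly booted machine, so that disk caching does not inflate the result.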
—————————————————————————————————————————————
Conclusion:
- High Performance is possible in Windows Shares with SMB2.0
- Jumbo Frames are good
- Sending node should initiate large transfers
- Must be careful to rule out disc bottlenecks, especially important to consider data location on the drive.
This entry was posted on Thursday, July 24th, 2008 at 12:00 and is filed under it.

on 31.07.2008 at 18:36 houkouonchi wrote:
I am curious why the speeds suck so bad on windows unless you use jumbo frames or newer samba? The only time I have seen decent speeds with samba is when going from Vista -> Linux and I was able to get 95 megabytes/sec or just under 800 megabits. Linux -> linux I never have any problems hitting 950 megabits and it holding stable for the entire transfer (not using jumbo frames at all).
on 16.08.2008 at 21:16 Turtle wrote:
Yes Yes!, What is up with the ram on the sending computer. I have a similar environment with the same issue. When vista initiates transfer from 2k8 server ram rockets… NOT good. any ideas for a fix or reason for the problem. Great article!!! You’ve got to love the 6400AAKS in the raid. Peace out. Thanks!
on 29.09.2008 at 00:47 Daniel wrote:
Excellent post. I have the same issue with the RAM usage when the client requests the download. Odd and wasteful.
on 24.11.2008 at 22:03 Ciro wrote:
I learned a lot with this post! I was already cursing my gigabit adapters, but it seems SMB1 is not ready for such high speeds. I will try to use something more lightweight for my transfers now. Sadly, nothing beats the practical side of SMB in Windows. Thanks for the post! :)
on 09.09.2009 at 14:05 David wrote:
Vista/2008 work differently than xp/2003. They allocated as much memory as possible for OS and for application. What it appears to be doing is caching the disk information into memory, as it is faster to read from memory to the network than it is from disk to the network.