After weeks of poking around in the code, I've finally figured out what was causing download corruption in Squid 2.4 and 2.5.
Note that the corruption does not occur in UFS due to its synchronous file access, so if you don't mind slow performance, stick with that store format. For the performance minded who use AUFS, read on.
The scenario:
1) user makes request
2) squid sets up read request in AIO queues:
2002/10/28 14:04:31| storeAufsOpen: fileno 00007DFE
2002/10/28 14:04:31| storeAufsRead: queueing read because FD < 0
3) comm_poll runs again, and sees that client hit 'Stop' on browser:
2002/10/28 14:04:31| clientReadRequest: FD 66: reading request...
2002/10/28 14:04:31| commSetSelect: FD 66 type 1
2002/10/28 14:04:31| clientReadRequest: FD 66: (104) Connection reset by peer
2002/10/28 14:04:31| comm_close: FD 66
4) storeAufsClose marks the flags->close_request = 1 on the sio:
2002/10/28 14:04:31| storeAufsClose: dirno 0, fileno 00007DFE, FD -1
5) squidaio_poll_done hasn't heard about any cancellation, so it keeps going:
opens:
2002/10/28 14:04:31| squidaio_poll_done: 0x86d7878 type=1 result=0x86d8a3c
2002/10/28 14:04:31| OPEN of /content/00/2D/00007DFE to FD 17
2002/10/28 14:04:31| DONE: 17 -> 0
***reads (and writes results to memory areas which were freed above!!!!!)***
2002/10/28 14:04:31| squidaio_poll_done: 0x86d7878 type=2 result=0x86d55fc
2002/10/28 14:04:31| READ on fd: 17
2002/10/28 14:04:31| DONE: 2132 -> 0
closes:
2002/10/28 14:04:31| squidaio_poll_done: 0x86d7878 type=4 result=0x86d8a3c
2002/10/28 14:04:31| CLOSE of fd: 17
2002/10/28 14:04:31| DONE: 0 -> 0
At this point, the request has been written out to a random memory area. Unfortunately, it occasionally ends up in someone else's download. Thus, the corruption.
The fix is simply to move the check for close_request a little higher up in store_io_aufs.c so that it occurs after the OPEN and before the READ).
I was able to replicate the corruption at least once a minute in my testing. The patch below has been running solid for the past few hours with ZERO corruption.
Although I never heard back from the squid developers as to whether they were working on this, hopefully they will have something to say now that it has (ostensibly) been solved. Hello...anybody there?
The standard disclaimers apply of course: Works for Me, YMMV, etc.
-Phil Oester
(not a squid developer, but I did sleep at a Holiday Inn express last night)
--- original/src/fs/aufs/store_io_aufs.c Sun Aug 11 16:14:35 2002
+++ fixed/src/fs/aufs/store_io_aufs.c Mon Oct 28 14:39:32 2002
@@ -286,6 +286,10 @@
aiostate->fd = fd;
commSetCloseOnExec(fd);
fd_open(fd, FD_FILE, storeAufsDirFullPath(INDEXSD(sio->swap_dirn), sio->swap_filen, NULL));
+ if (aiostate->flags.close_request) {
+ storeAufsIOCallback(sio, errflag);
+ return;
+ }
if (FILE_MODE(sio->mode) == O_WRONLY) {
if (storeAufsKickWriteQueue(sio))
return;
@@ -293,8 +297,6 @@
if (storeAufsKickReadQueue(sio))
return;
}
- if (aiostate->flags.close_request)
- storeAufsIOCallback(sio, errflag);
debug(79, 3) ("storeAufsOpenDone: exiting\n");
}
Received on Mon Oct 28 2002 - 16:19:57 MST
This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 17:10:55 MST