Greetings all,
I am new to Squid (as of yesterday), so please excuse my abundant
ignorance.
I'm working on a small research project, for which I need
to ``get in between'' the browser and the rest of the Universe in order
to capture and/or diddle some of the information going back and
forth.
Now, I _could_ build my own little proxy, from scratch, in order to
accomplish this, but frankly, I'd rather not.
I did a bit of web searching yesterday, and it didn't take me long
to find Squid and see that it might be a useful code base
to start from in building the kind of tool I actually
need. I hate reinventing the wheel, so if possible, I would
like to build on this existing code base.
In a nutshell, I need to be able to:
a) capture and log to local disk a selected subset of all HTTP
messages going outwards from the browser to the proxy. In
particular, I need to capture/log all GETs and POSTs resulting
from the completion/submission of HTML FORMs, and...
b) diddle/edit (in some minor ways) all HTML passing from the
proxy to the browser. (This could be done either at the time
the proxy gets the page in question from some remote server
or else on-the-fly, just as the proxy is about to send the
page down to the browser. It really doesn't matter either
way, as long as I can affect what the browser receives.)
All I really need to know, I think, in order to accomplish the above two
tasks is where, exactly, within the existing Squid code I should place
my `hooks' for these two operations. File names and line numbers would
be most helpful. (I don't know my way around the code at all. I could
probably find the Right Points to install the hooks myself, but it
would take me some time, whereas one or more of you
folks may just be able to tell me where to go... in the code, that is. :-)
As regards goal (a), if someone will just tell me where I might install
a hook in the existing code that would have the effect of grabbing _all_
messages going from the browser to the proxy, that will be sufficient to
get me started, I think. I already know how _I_ can programmatically
differentiate FORM submissions from other types of HTTP messages that the
browser might send to the proxy. So I will write the code to do the
differentiation and the logging (of just the form submissions). I just
need to know where to hook in my code additions (for Squid) to do that.
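To make the differentiation step concrete, here is a rough sketch of the
kind of test I have in mind. It is written in Python purely for
illustration and is not Squid code. The function name, the lower-cased
header dictionary, and the rule that a GET carrying a query string counts
as a form submission are all my own assumptions.

```python
from urllib.parse import urlsplit

# Content types a browser uses when it POSTs an HTML FORM.
FORM_CONTENT_TYPES = (
    "application/x-www-form-urlencoded",
    "multipart/form-data",
)


def looks_like_form_submission(method, url, headers):
    """Heuristic guess at whether a request came from an HTML FORM.

    `headers` is assumed to be a dict whose keys are lower-cased header
    names. This is a hypothetical helper, not part of Squid.
    """
    method = method.upper()
    if method == "POST":
        # Drop any parameters such as "; charset=..." or "; boundary=...".
        ctype = headers.get("content-type", "").split(";")[0].strip().lower()
        return ctype in FORM_CONTENT_TYPES
    if method == "GET":
        # Assumption: a GET with a non-empty query string is a FORM result.
        return bool(urlsplit(url).query)
    return False
```

The GET rule will also match ordinary links that happen to carry a query
string, so it over-counts. That is acceptable for my purposes, but a
stricter test would need more context than a single request provides.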
Regarding both goals, I _do_ understand (I think) that I won't be able to
do anything useful (as regards EITHER logging or diddling) with messages
passing between the browser and the proxy in cases where the messages in
question are encrypted (https?) traffic. That's OK. I understand the
problem. It will still be useful to me (and I will settle for) being
able to do the kinds of logging and diddling/editing that I want to do
on just the subset of messages that are unencrypted.
Any help/pointers would be greatly appreciated.
P.S. Just in case I need to allay any unwarranted fears, allow me to say
again that I'm only working on a small (private) research project whose
goal is just to collect some data about web pages. I am _not_ building
any clever commercial tool that will do evil things to other people's
HTML. 'nuff said.
Received on Thu Aug 30 2001 - 12:38:55 MDT