On 01/11/2011 06:27 PM, Amos Jeffries wrote:
> On 12/01/11 14:20, Alex Rousskov wrote:
>> On 01/11/2011 02:08 AM, Kinkie wrote:
>>> On Tue, Jan 11, 2011 at 8:03 AM, Amos Jeffries<squid3_at_treenet.co.nz>
>>> wrote:
>>>> The profiler aufs assertions are caused by the profiler not being
>>>> thread
>>>> safe but attempting to account operations inside each of the AIO
>>>> threads.
>>>> Lack of thread safety is due to the stack the profiler maintains of
>>>> what
>>>> states are currently being counted.
>>>>
>>>> I don't believe we should be maintaining such a stack. Instead I
>>>> think we
>>>> should leverage the existing system stack by having the
>>>> PROF_start(X) macro
>>>> create a counter object in the local scope being counted. When the
>>>> local
>>>> scope exits for any reason the system stack object will be
>>>> destructed and
>>>> the destructor can accumulate the counters back into the global data.
>>>
>>> +1. I like this.
>>
>> I have a crude patch that implements such scope-based profiling. We used
>> it to find performance regressions in Squid3 (still an ongoing
>> project)...
>>
>> I would be happy to find that patch and post if somebody wants to
>> polish it.
>>
>> Thank you,
>>
>> Alex.
>
> Yes please. I threw together a patch last night. I'm happy to compare
> and merge the two for inclusion.

Attached. It is crude and dirty, as promised. The code did work well for
me at some point, though. Notes and caveats:
* Most of the stuff you need is in src/base/Profiling.{cc,h}. The rest
is mostly unimportant.
* The ProfileProgress(Here) macro starts profiling from the current
location until the end of the async call or something like that. You can
use many of them. It is handy when you want to find the performance
culprit in a long function that does many things. Most or all of the
ProfileProgress(Here) calls added by the patch should be removed. They
were used for chasing problems.
* The ProfileScope() macro is what this patch uses to implement
PROF_start(). I think we used either ProfileProgress(Here) or
ProfileScope(), but not both at the same time.
* Profiling scopes know when they are nested so that you can report or
exclude the time spent in child scopes.
* The stats collection is not thread-safe because two threads may be in
one scope at the same time. If you want thread-safety, consider using
atomic integers.
* The patch includes several optimizations that you would have to
remove. I did not do that for fear of corrupting the patch. Not all of
those optimizations will result in a visible performance improvement.
They need more work.
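
For readers without the attachment, the scope-based idea discussed above
can be sketched roughly as follows. This is not code from the patch: the
names ScopeStats and ScopedProfile are hypothetical, the nesting is
tracked with a thread-local pointer as one possible approach, and the
counters are atomic integers as suggested in the caveats. Treat it as an
illustration under those assumptions only.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical per-scope statistics (not from the patch). Atomic counters
// let several threads accumulate into the same entry.
struct ScopeStats {
    std::atomic<std::uint64_t> calls{0};
    std::atomic<std::uint64_t> totalNs{0}; // time including nested scopes
    std::atomic<std::uint64_t> selfNs{0};  // time excluding nested scopes
};

// Hypothetical guard object living on the system stack. The constructor
// starts the clock; the destructor runs however the scope is left
// (return, exception, fall-through) and accumulates into the shared stats.
class ScopedProfile {
public:
    explicit ScopedProfile(ScopeStats &stats):
        stats_(stats), parent_(current_), start_(Clock::now())
    {
        current_ = this; // become the innermost scope of this thread
    }

    ~ScopedProfile()
    {
        const auto elapsed = static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                Clock::now() - start_).count());
        // Guard against underflow in case of rounding surprises.
        const std::uint64_t self = elapsed > childNs_ ? elapsed - childNs_ : 0;

        stats_.calls.fetch_add(1, std::memory_order_relaxed);
        stats_.totalNs.fetch_add(elapsed, std::memory_order_relaxed);
        stats_.selfNs.fetch_add(self, std::memory_order_relaxed);

        if (parent_)
            parent_->childNs_ += elapsed; // let the parent exclude our time
        current_ = parent_;               // pop back to the enclosing scope
    }

    ScopedProfile(const ScopedProfile &) = delete;
    ScopedProfile &operator=(const ScopedProfile &) = delete;

private:
    using Clock = std::chrono::steady_clock;

    ScopeStats &stats_;
    ScopedProfile *parent_;    // enclosing scope in this thread, if any
    Clock::time_point start_;
    std::uint64_t childNs_ = 0; // time spent in directly nested scopes

    // One "innermost scope" pointer per thread, so AIO threads would not
    // share nesting state.
    static thread_local ScopedProfile *current_;
};

thread_local ScopedProfile *ScopedProfile::current_ = nullptr;
```

A caller would declare a ScopeStats object with static storage for each
profiled location and put a ScopedProfile guard at the top of the scope;
a PROF_start(X)-style macro could hide both declarations. The relaxed
atomics only keep the individual counters consistent; a reader may still
see calls and totalNs updated at slightly different moments.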
HTH,
Alex.