[rtg] buffering mechanism

Brad Killebrew brad at txic.net
Thu May 25 14:38:00 EDT 2006


But we still love you bill. unf.


On Thu, 25 May 2006, bill fumerola wrote:

> On Wed, May 24, 2006 at 04:07:16PM -0700, Matt Provost wrote:
>> Yeah I saw this before, but like you said it doesn't patch cleanly
>> against the current source.
>
> it did when i first sent it out (aug 2004) and i merged the vendor source
> and regenerated the patch monthly until i left yahoo in july 2005. i
> kept three perforce branches (vendor, yahoo, public) and some makefile
> magic to accomplish this.
>
> anyways.
>
>>                              The db stuff changed quite a bit with the
>> new drivers. In any case, the buffering that I did happens outside of
>> the db drivers so it will work with any of them. If I had more time I'd
>> love to get some of your changes ported to the new version, but I don't
>> at the moment.
>
> the sqlbuf code i pointed to does the buffering in a db-independent
> way. i haven't looked at the current code. doing buffering completely
> w/o some db-dependent knowledge has a few problems that spring to mind.
>
> 0) does the current code coalesce rows over a single poll period? this
> is the single largest performance bottleneck that the old code suffered.
>
> 1) you have to at least know the difference between a soft error (network
> hiccup, server restart on a persistent connection, repair) and a hard
> error (index corruption).  otherwise, without treating them differently,
> buffering may occur needlessly on some tables.
>
> 2) you need to know the max amount of rows or characters a query can
> contain and this is database dependent.
>
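
Points 0 and 2 above can be sketched together: coalesce the rows from one poll period into a single multi-row INSERT, and stop adding rows once a database-dependent length limit would be exceeded. This is only an illustration, not the sqlbuf code from the patch. The struct, the function name and the column names (id, dtime, counter) are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample: one counter value for one interface id. */
struct sample {
    unsigned int       iid;
    unsigned long long counter;
};

/*
 * Pack as many samples as fit into one multi-row INSERT without
 * exceeding max_len bytes (trailing NUL included). Returns the number
 * of samples consumed; the caller loops until everything is sent.
 * max_len stands in for the database-dependent query size limit.
 */
size_t coalesce_insert(char *buf, size_t max_len, const char *table,
                       const struct sample *s, size_t n)
{
    size_t used, i;
    int len;

    assert(buf != NULL && max_len > 0);
    len = snprintf(buf, max_len,
                   "INSERT INTO %s (id, dtime, counter) VALUES ", table);
    if (len < 0 || (size_t)len >= max_len) {
        buf[0] = '\0';
        return 0;                   /* the prefix alone does not fit */
    }
    used = (size_t)len;

    for (i = 0; i < n; i++) {
        char row[64];
        int rlen = snprintf(row, sizeof(row), "%s(%u, NOW(), %llu)",
                            i ? ", " : "", s[i].iid, s[i].counter);
        if (rlen < 0 || (size_t)rlen >= sizeof(row))
            break;
        if (used + (size_t)rlen >= max_len)
            break;                  /* next row would overflow the limit */
        memcpy(buf + used, row, (size_t)rlen + 1);
        used += (size_t)rlen;
    }
    if (i == 0)
        buf[0] = '\0';              /* not even one row fit */
    return i;
}
```

A caller would invoke coalesce_insert() repeatedly, advancing the sample pointer by the return value each time, so one poll period costs a handful of queries instead of one query per target.
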
> the new db layer may have abstracted most of that logic out. still i
> wonder how much work it'd be to have written an sqlbuf_pgsql_cfg() and
> sqlbuf_flush_pgsql() versus Yet Another Database Abstraction Layer.
>
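
To make the "per-database hooks instead of a full abstraction layer" idea concrete, here is a rough sketch. The thread names sqlbuf_pgsql_cfg() and sqlbuf_flush_pgsql() but does not show their signatures, so every type and function below is hypothetical. The fake flush function stands in for a real driver call and talks to no database.

```c
#include <assert.h>
#include <stddef.h>

/* Result of one flush attempt, as classified by the driver. */
enum sqlbuf_err { SQLBUF_OK, SQLBUF_SOFT, SQLBUF_HARD };

/* What the buffering layer should do next. */
enum sqlbuf_action { SQLBUF_SENT, SQLBUF_KEEP, SQLBUF_REPORT };

/* Hypothetical per-database hooks: the only db-dependent knowledge. */
struct sqlbuf_driver {
    const char *name;
    size_t      max_query_len;                    /* differs per database */
    enum sqlbuf_err (*flush)(const char *query);  /* send one query */
};

/* Decide whether to keep buffering after a flush attempt. */
enum sqlbuf_action sqlbuf_try_flush(const struct sqlbuf_driver *drv,
                                    const char *query)
{
    assert(drv != NULL && drv->flush != NULL);
    switch (drv->flush(query)) {
    case SQLBUF_OK:
        return SQLBUF_SENT;
    case SQLBUF_SOFT:
        return SQLBUF_KEEP;     /* e.g. network hiccup: retry later */
    default:
        return SQLBUF_REPORT;   /* e.g. index corruption: buffering won't help */
    }
}

/* Stand-in for something like sqlbuf_flush_pgsql(); returns a preset code. */
enum sqlbuf_err fake_result = SQLBUF_OK;

enum sqlbuf_err fake_flush(const char *query)
{
    (void)query;
    return fake_result;
}
```

The point of the design is that the buffering logic never looks at driver-specific error codes; each driver maps its own errors onto soft or hard once, in one place.
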
> in fact there are/were a number of things that limit{ed,s} rtg usage in
> high performance (many hosts and/or many targets per host) environments:
>
> 0) does the buffering code send the data using a helper thread? does it
> do it in between poll periods? does it insert every time a snmp query
> happens? i could run thousands of targets on a sub-10 second interval,
> down the database for an hour, and not a single poll interval was missed
> and every insert was coalesced and buffered.  try that without a helper
> thread.
>
> 1) what is the mutex locking situation? last i checked if you increased
> max threads too large you could have every thread hit the same device.
> set too low and you could have one device stall all your threads. my
> version had host-based locking so no N>1 targets in a host{} stanza would
> be polled until the first was complete. this was my second largest
> performance improvement.
>
> 2) related to #0: per-thread db connections are just a wasteful use of
> resources. if they're still there i'm sorry for whoever i just offended.
>
> 3) per-instance timeouts, retry, snmp port, etc need to be per-host.
> some devices can be treated aggressively, some need tender care. most
> globally configured parameters in rtg should be inherited but
> overridable per host.
>
> 4) removal of targets that return hard errors or consistently time out.
> the user needs to be able to define targets that stay no matter what and
> define how many times constitutes "consistently".
>
> 5) i still hate libtool. you guys were willing to carry a few thousand
> #ifdefs for insane reasons (NEW_TARGET_FORMAT, FEATURES) but a handful
> of HAVE_MYSQL/HAVE_PGSQL weren't going to work? can you have both compiled
> in with the current code? is there any consideration that you could have
> multiple databases at some point and point some targets (hosts?) at one
> and some at another? could these be one mysql and one pgsql?
>
> 6) there were some minor performance points where snmp_sessions were
> being created and torn down per-target at runtime when they could be
> cached and used per-host. snmp oids were being compiled at runtime instead
> of at config-time. etc. i'm not sure how measurable the performance
> difference was but it sure made the code cleaner.
>
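
The host-based locking from point 1 might look roughly like the sketch below: each host carries a mutex, and a poller thread skips any target whose host is already being polled instead of piling onto the same device. This is an assumption about the general shape, not the code from the patch. The struct layout and function names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical host entry: one lock per host{} stanza. */
struct host {
    const char     *name;
    pthread_mutex_t busy;   /* held while one target on this host is polled */
};

struct target {
    struct host *host;
    const char  *oid;
};

/* Try to claim the target's host; 1 if claimed, 0 if a poll is in flight. */
int host_claim(struct target *t)
{
    return pthread_mutex_trylock(&t->host->busy) == 0;
}

/* Release the host once the SNMP query for this target has finished. */
void host_release(struct target *t)
{
    pthread_mutex_unlock(&t->host->busy);
}

/*
 * Scan from 'start' and return the index of the first target whose host
 * could be claimed, or -1 if every remaining host is busy. The caller
 * polls that target and then calls host_release() on it.
 */
int next_pollable(struct target *targets, int n, int start)
{
    int i;

    assert(targets != NULL && start >= 0);
    for (i = start; i < n; i++)
        if (host_claim(&targets[i]))
            return i;
    return -1;
}
```

With this shape, raising the thread count cannot make every thread hit one device, and a slow device only ties up the single thread that holds its lock.
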
> i don't mean to piss on any parades, but the community could have a lot
> of the above written and maintained by me at the expense of my former
> employer. now it's going to take someone rewriting it or bringing my
> patch up to date. hell, even the last time my code was committed it was
> behind giant #ifdef FUMEROLA and turned off essentially.
>
> i've considered bringing my work up-to-date myself but: i don't have any
> warmer feeling that it would be committed this time than i did last time.
> i don't have a giant environment to test it on like i did before but i
> could get around that. more to the point, if i'm going to rewrite/update
> the sql buffering/coalescing, config reader, config internal storage,
> half the snmp code, and all the #ifdef and other C cleanups... then i
> start to realize that's 50% of the code, i start thinking about code
> forks and then i get angry at the development model.
>
> anyways, i bring up these performance/usability/feature things once a
> year now and i'll crawl back into my hole for 2006 until i'm convinced
> my work will end up in the tree or someone pays me my hourly rate to
> give me something to walk away with if it doesn't get committed.
>
> apologies for sour grapes and posting before my morning coffee,
> -- bill
> _______________________________________________
> RTG mailing list
> RTG at fireflynetworks.net
> http://fireflynetworks.net/mailman/listinfo/rtg
>

