[rtg] rated update
Matt Provost
mprovost at termcap.net
Wed Nov 4 21:53:25 EST 2009
I just wanted to give everyone an update on the status of rated, my fork
of rtgpoll.
I've completely rewritten the snmp code so that it uses getnexts now,
similar to the way that snmpwalk works. This lets it walk through
sections of the snmp tree on the polled device without having to specify
each oid. My goal is to eliminate the need for scripts that generate
target lists.
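
A minimal sketch of a getnext-style walk, for readers who haven't seen one. This is not rated's actual code: the agent is simulated in memory, and the names FakeAgent, parse_oid and walk are made up for illustration.

```python
from bisect import bisect_right


def parse_oid(text):
    """Turn '.1.3.6.1' into a tuple of ints so OIDs sort numerically."""
    return tuple(int(part) for part in text.strip(".").split("."))


class FakeAgent:
    """Hypothetical stand-in for a device; a real poller sends GETNEXT PDUs."""

    def __init__(self, values):
        self.values = {parse_oid(k): v for k, v in values.items()}
        self.oids = sorted(self.values)

    def getnext(self, oid):
        """Return the first (oid, value) strictly after `oid`, or None."""
        i = bisect_right(self.oids, oid)
        if i == len(self.oids):
            return None
        nxt = self.oids[i]
        return nxt, self.values[nxt]


def walk(agent, root_text):
    """Repeat getnext from `root_text` until the reply leaves that subtree."""
    root = parse_oid(root_text)
    current = root
    results = []
    while True:
        reply = agent.getnext(current)
        if reply is None:
            break  # end of the agent's tree
        oid, value = reply
        if oid[:len(root)] != root:
            break  # stepped past the subtree we were asked to walk
        results.append((oid, value))
        current = oid
    return results
```

Only the root oid (for example ifHCInOctets) has to be configured; the per-interface instances underneath it are discovered by the walk.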
Because there isn't a config for each oid now, it has to generate a
mapping of oids -> iids (the id that it uses in the database tables). So
I made another table that just stores any oid that it sees and the
corresponding auto-incremented iid.
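
The oid -> iid table could look roughly like the sketch below. It uses Python's built-in sqlite3 as a stand-in for postgres, and the table name (oids), column names and the helper iid_for are assumptions, not rated's real schema.

```python
import sqlite3


def open_db():
    """Open an in-memory database holding the hypothetical oid -> iid table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE IF NOT EXISTS oids ("
        " iid INTEGER PRIMARY KEY AUTOINCREMENT,"
        " oid TEXT NOT NULL UNIQUE)"
    )
    return db


def iid_for(db, oid):
    """Return the iid for `oid`, assigning a new one the first time it is seen."""
    # The UNIQUE constraint makes this a no-op for an oid that already exists.
    db.execute("INSERT OR IGNORE INTO oids (oid) VALUES (?)", (oid,))
    row = db.execute("SELECT iid FROM oids WHERE oid = ?", (oid,)).fetchone()
    return row[0]
```

Calling iid_for twice with the same oid returns the same iid, so the data tables can store the small integer instead of the full oid string.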
Again, because it doesn't know about every oid from a config, it
determines whether a value is 32 or 64 bit, and whether it's a counter
or a gauge, from the type in the snmp response.
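
As a rough illustration of why the type matters: the SMIv2 types Counter32, Counter64 and Gauge32 tell you the width and whether the value only counts up. The sketch below maps type names to (bits, kind) and uses the width to handle a single counter wrap. The function names and the wrap handling are assumptions for illustration, not taken from rated.

```python
# Width and kind for the SMIv2 numeric types a poller typically stores.
SNMP_TYPES = {
    "Counter32": (32, "counter"),
    "Counter64": (64, "counter"),
    "Gauge32": (32, "gauge"),
}


def classify(type_name):
    """Return (bits, kind) for the type name reported in a response."""
    return SNMP_TYPES[type_name]


def counter_delta(previous, current, bits):
    """Increase between two counter samples, allowing for one wrap."""
    if current >= previous:
        return current - previous
    # The counter passed its maximum and started again from zero.
    return current + (1 << bits) - previous
```

A gauge would be stored as-is; only counters need the delta.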
Another change is that each thread chooses a host and does all the oids
for that host by itself. This means that two threads will never
simultaneously make requests to the same device, which could overwhelm
its cpu. It also reduces the amount of locking done internally in the
poller since it only has to lock when it gets a new host, not for each
target.
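
The locking pattern described above can be sketched like this. It is a Python illustration only; poll_all and poll_host are hypothetical names, and poll_host stands in for "walk every target on this host".

```python
import threading


def poll_all(hosts, poll_host, num_threads=4):
    """Poll each host exactly once, with one thread owning a host at a time."""
    lock = threading.Lock()
    pending = list(hosts)
    # Each thread appends to its own list, so results need no locking.
    per_thread = [[] for _ in range(num_threads)]

    def worker(slot):
        while True:
            with lock:  # the lock is held only while claiming a host
                if not pending:
                    return
                host = pending.pop()
            # No lock here: this thread does all the oids for its host.
            per_thread[slot].append((host, poll_host(host)))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {host: result for rows in per_thread for host, result in rows}
```

Because a host is removed from the pending list before it is polled, no two threads can ever be talking to the same device.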
To reduce database contention, each host writes the polled data into its
own table. So two threads won't ever update the same table. If you're
polling hundreds of devices you will end up with a lot of tables, but
generally databases don't care. And this helps a lot with the (probably
more) common case where you have a smaller number of devices with a lot
of oids each.
In order to make all of this easy to use, I've added code to the
database driver (just postgres for now) that automatically creates all
of these tables as it needs to. So there's no need for a script to
prepare the database before you run the poller.
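
Here is a small sketch of the create-on-demand idea with one table per host. It uses sqlite3 in place of postgres, and the table naming scheme and column names (iid, dtime, counter) are assumptions rather than rated's actual schema.

```python
import re
import sqlite3


def table_for(host):
    """Build a table name from a host name, keeping only safe characters."""
    # Table names can't be passed as bound parameters, so sanitize them.
    return "data_" + re.sub(r"[^A-Za-z0-9_]", "_", host)


def insert_sample(db, host, iid, timestamp, value):
    """Insert one sample, creating the host's table if it doesn't exist yet."""
    table = table_for(host)
    db.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        " iid INTEGER NOT NULL,"
        " dtime INTEGER NOT NULL,"
        " counter INTEGER NOT NULL)"
    )
    db.execute(
        f"INSERT INTO {table} (iid, dtime, counter) VALUES (?, ?, ?)",
        (iid, timestamp, value),
    )
    return table
```

A real driver would presumably create the table once and remember that it exists, rather than issuing CREATE TABLE IF NOT EXISTS before every insert.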
The target file format has also changed: you can now group hosts into
templates that share the same targets, since you usually have multiple
devices where you want the same information from each one.
Here's a sample target config:
template switch {
    # ifHCInOctets
    target .1.3.6.1.2.1.31.1.1.1.6;
    # ifHCOutOctets
    target .1.3.6.1.2.1.31.1.1.1.10;
    # ifHCInUcastPkts
    target .1.3.6.1.2.1.31.1.1.1.7;
    # ifHCOutUcastPkts
    target .1.3.6.1.2.1.31.1.1.1.11;
    host myswitch1 {
        address 192.168.1.1;
        community public;
        snmpver 2;
    }
    host myswitch2 {
        address 192.168.1.2;
        community public;
        snmpver 2;
    }
}
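
To make the template idea concrete, the sketch below models the sample config as plain Python data and expands it into the (host, target) pairs that would be walked. It does not parse the file format; the Host and Template classes and work_list are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    address: str
    community: str
    snmpver: int


@dataclass
class Template:
    name: str
    targets: list = field(default_factory=list)
    hosts: list = field(default_factory=list)


def work_list(template):
    """Every host in a template is walked for every target in that template."""
    return [(h.name, t) for h in template.hosts for t in template.targets]


# The sample config above, written out as data.
switch = Template(
    name="switch",
    targets=[
        ".1.3.6.1.2.1.31.1.1.1.6",   # ifHCInOctets
        ".1.3.6.1.2.1.31.1.1.1.10",  # ifHCOutOctets
        ".1.3.6.1.2.1.31.1.1.1.7",   # ifHCInUcastPkts
        ".1.3.6.1.2.1.31.1.1.1.11",  # ifHCOutUcastPkts
    ],
    hosts=[
        Host("myswitch1", "192.168.1.1", "public", 2),
        Host("myswitch2", "192.168.1.2", "public", 2),
    ],
)
```

Adding a third switch is then one more host block; the four targets don't need to be repeated.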
When it's done walking each target, it does one more insert for the
target oid that records the number of getnexts that it did this round,
and the rate. This lets you look at the performance of the poller itself
and you can identify oids that are slow to respond. For example, on
some of my switches the temperature sensors are very slow compared to
the interface counters. So I could start up a second rated poller that
uses a longer interval for those oids.
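
One way to picture the per-target bookkeeping is below. It assumes "rate" means getnexts per second over the walk, which is an interpretation and not something confirmed above; timed_walk and its arguments are hypothetical.

```python
import time


def timed_walk(walk_fn, clock=time.monotonic):
    """Run one walk and return (getnext_count, getnexts_per_second).

    `walk_fn` is assumed to return the number of getnexts it issued.
    """
    start = clock()
    count = walk_fn()
    elapsed = clock() - start
    # Guard against a zero-length interval on very fast walks.
    rate = count / elapsed if elapsed > 0 else 0.0
    return count, rate
```

Storing that pair against the target oid each round would make a slow subtree, like the temperature sensors, show up as a low rate next to the interface counters.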
I'm sure there are lots of other changes and fixes that I've made that
I'm forgetting to list here. The code should be a lot faster and more
stable than before, and I've been using valgrind to fix any memory
leaks.
You can download the latest source at:
http://github.com/mprovost/rated
At the moment it only works with the postgres driver.
I'd like to hear of any success/failures people have with the code or
any comments about the new changes.
Thanks,
Matt