[rtg] RTG database pruning.

Matt Simerson matt at layeredtech.com
Sun Mar 2 10:49:05 EST 2008


Hello RTGers,

I'm new to RTG so I beg forgiveness should I ramble too far off into  
the bushes.

We have a fairly large RTG installation with about 23,000 ports being  
monitored. We have a number of polling machines that poll and feed  
data back into a central MySQL database. For the most part, this works  
pretty well. However, we have a bit of a problem in that our database  
size it getting out of hand. We're up to 184GB of bandwidth data and  
running out of disk space.

As you can imagine, having that much data causes problems, such as  
taking a really long time to generate reports. We replicate the  
databases to a mysql slave and offload all the reporting to it. To  
give you an idea of what I mean by slow; I just wrote a report that  
exports a CSV file containing every interface and how much bandwidth  
it passed during a given time range. I chose CSV because our bean  
counters can easily import the billing system. The network team and  
managers can click the email attachment and open it in Excel.

To generate that report for a one month period takes about 26 hours to  
complete. On AMD Opterons with 8GB RAM. I have heavily tuned MySQL to  
perform adequately under this workload. Tuning helped. A lot. But it  
can only go so far.

So, we're faced with the choice of buying some really expensive  
hardware to move RTG onto or find some way to reduce the size oft the  
databases. We really don't want to simply discard old data.

I've done a bit of work with rrdtool in the past and got the bright  
idea that I could implement some data aggregation. Rather than do  
averaging like rrdtool does, I could reduce the data set significantly  
by writing a script that condensed older bandwidth data. I chose to  
condense an hours worth of bandwidth data into a single record which  
reduces the record count by one twelfth.

I wrote a perl script that loops over every interface. It finds the  
oldest record and then generates a list of 1 hour intervals. Then it  
queries the database for all the records in each 1 hour interval and  
generates a summary entry. After successfully writing the summary  
entry into the database, it deletes all the entries that comprised  
it.  Here's an example:

         range: 20070228220000 - 20070228230000 - 12 records.
         id:  609 count: 15418917957  time: 2007-02-28 22:03:04
         id:  609 count: 15322162907  time: 2007-02-28 22:08:04
         id:  609 count: 16916402934  time: 2007-02-28 22:13:00
         id:  609 count: 15361727173  time: 2007-02-28 22:17:59
         id:  609 count: 15906038566  time: 2007-02-28 22:23:09
         id:  609 count: 16745419536  time: 2007-02-28 22:28:01
         id:  609 count: 16628650088  time: 2007-02-28 22:33:02
         id:  609 count: 15940862962  time: 2007-02-28 22:38:00
         id:  609 count: 15126799678  time: 2007-02-28 22:42:59
         id:  609 count: 16288752555  time: 2007-02-28 22:47:59
         id:  609 count: 14757762970  time: 2007-02-28 22:52:59
         id:  609 count: 14268752126  time: 2007-02-28 22:58:01
  INSERT INTO ifOutOctets_5 (id,counter,dtime) VALUES  
(609,188682249452,20070228225959);
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:03:04' AND  
id=609 AND counter=15418917957;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:08:04' AND  
id=609 AND counter=15322162907;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:13:00' AND  
id=609 AND counter=16916402934;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:17:59' AND  
id=609 AND counter=15361727173;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:23:09' AND  
id=609 AND counter=15906038566;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:28:01' AND  
id=609 AND counter=16745419536;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:33:02' AND  
id=609 AND counter=16628650088;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:38:00' AND  
id=609 AND counter=15940862962;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:42:59' AND  
id=609 AND counter=15126799678;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:47:59' AND  
id=609 AND counter=16288752555;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:52:59' AND  
id=609 AND counter=14757762970;
  DELETE FROM ifOutOctets_5 WHERE dtime='2007-02-28 22:58:01' AND  
id=609 AND counter=14268752126;

I test ran this on one interface and then previewed the changes shown  
in RTG.
BEFORE:




and AFTER:




As you can see, the transfer amounts are still calculated and shown,  
but none of the rate info. I read through the PHP code, which calls  
rtgplot.cgi and then I perused the rtgplot.h and rtgplot.c files  
trying to figure out what I'd have to alter to get lines to print. I  
didn't get very far.

Anyone got an idea how I might be able to accomplish this?  The data  
consolidation script is one I'd be happy to contribute back to the RTG  
community if others might find it useful. Unless rtgplot.cgi can be  
coerced into printing this aggregated data, its usefulness is somewhat  
limited.

Matt Simerson
Unix Automation Developer
Email: matt at layeredtech.com
Phone: 214.564.6085

Layered Technologies, Inc.
On-Demand Utility Computing & Hosting Solutions
Learn more>> http://www.layeredtech.com



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.grdata.com/pipermail/rtg/attachments/20080302/03c43c4c/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rtgplot.cgi.png
Type: image/png
Size: 10064 bytes
Desc: not available
Url : http://lists.grdata.com/pipermail/rtg/attachments/20080302/03c43c4c/attachment-0002.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rtgplot-2.cgi.png
Type: image/png
Size: 2697 bytes
Desc: not available
Url : http://lists.grdata.com/pipermail/rtg/attachments/20080302/03c43c4c/attachment-0003.png 


More information about the RTG mailing list