FineConnection.com

Network monitoring with Monitor one: a six-step guide

Posted On: August 19, 2005 - 15:18 by Admin

Why monitor your network?

In today's connected world, practical computing revolves around networking, which makes thorough network management extremely important. It is therefore surprising how many businesses and organizations fail to invest a reasonable amount of time and money in setting up reliable, useful network management.

The obvious (yet often overlooked) fact is that without a reliable network, there can be no reliable networking! Yet information about what the network is doing, how it is performing and where its problem areas lie is often simply not available.

The importance of a business's network is often overlooked, or the network is dismissed as "simply there". Yet the critical role and growing importance of LANs in business organizations make it obvious that network management is vital and cannot be omitted or neglected!

One possible reason so few network management applications are in use is unfamiliarity with the network management approach, combined with the complexity of many suitable applications.

Step one: visualize your network.

  • The best way of truly understanding how a network functions is to use a network management application that can display a graphical representation of your network.
  • Steer clear of management applications that only allow you to monitor network health via lists of detected hardware in your network. These applications only focus on individual devices and do not take the important network relationships into account (further explained in step two).
  • Building your network maps as accurately as possible will improve error tracking and speed up the solving of network problems. Accurate maps also help you locate trouble spots and decide where to add new hardware to introduce fault tolerance.
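A network map is essentially a graph of devices and links, and one way to use it for the fault-tolerance question above is to look for single points of failure: devices whose failure would cut the network in two. The Python sketch below assumes the map has been exported as a plain list of (device, device) link pairs; that export format and the function name are illustrative assumptions, not Monitor one features. It uses Tarjan's articulation-point algorithm, written with an explicit stack so that large maps do not hit Python's recursion limit.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def single_points_of_failure(links: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Return devices whose failure would disconnect part of the network map.

    `links` is an assumed export of the map: one (device_a, device_b) pair per link.
    Uses Tarjan's articulation-point algorithm with an explicit DFS stack.
    """
    graph: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    discovery: Dict[Hashable, int] = {}  # order in which the DFS visits devices
    low: Dict[Hashable, int] = {}        # lowest discovery index reachable from the subtree
    cut_points: Set[Hashable] = set()
    timer = 0

    for root in list(graph):
        if root in discovery:
            continue
        discovery[root] = low[root] = timer
        timer += 1
        root_children = 0
        stack = [(root, None, iter(graph[root]))]
        while stack:
            node, parent, neighbours = stack[-1]
            descended = False
            for nxt in neighbours:
                if nxt == parent:
                    continue
                if nxt in discovery:
                    # Back edge to an already visited device.
                    low[node] = min(low[node], discovery[nxt])
                else:
                    discovery[nxt] = low[nxt] = timer
                    timer += 1
                    if node == root:
                        root_children += 1
                    stack.append((nxt, node, iter(graph[nxt])))
                    descended = True
                    break
            if descended:
                continue
            # All neighbours of `node` handled: propagate its low value upwards.
            stack.pop()
            if parent is not None:
                low[parent] = min(low[parent], low[node])
                # A non-root parent is a cut point if this subtree has no
                # path around it back to an earlier device.
                if parent != root and low[node] >= discovery[parent]:
                    cut_points.add(parent)
        # The root is a cut point only if it has two or more DFS subtrees.
        if root_children > 1:
            cut_points.add(root)
    return cut_points
```

With links `[("a", "b"), ("b", "c")]` the function returns `{"b"}`: b is the only device whose loss disconnects the others. Devices reported this way are natural candidates for a redundant link or spare hardware.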

Step two: set up alerting and logging.

Before setting up an effective alerting mechanism, consider the following:

  • To check the status of a network device, the device must be able to respond to status requests from a management station. Only manageable equipment will respond to polling. If possible, limit the amount of unmanageable equipment, as it creates black spots: parts of the network whose status you cannot see.
  • Not all network equipment is of equal importance. Backbone ATM switches, for example, are usually far more important, in terms of the impact their failure has on continuity, than a terminal server or printer.
    • In general, network managers want to be informed about major network events, such as a power failure of a backbone switch or a crash of the corporate mail server. A non-functioning printer is less important and can wait, especially when alerts are sent to a pager or mobile phone during the weekend.
  • Network managers should be alerted only once about the same failure. The management application should be able to evaluate the event stream to pinpoint the root cause of the failure and prevent superfluous and incorrect alerts.
  • To provide this level of alerting, use a management application that lets you assign different priority levels to different types of equipment and that also includes a suitable error control feature to provide intelligent alerting.
  • An often-encountered negative side effect of intelligent alerting is that not only incorrect and superfluous events, but sometimes also less important events, are hidden. Less important events do not require immediate action but still matter, because they can indicate potential problems. So pay attention to these events as well: enable logging and check the logs regularly.
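The alerting rules above (priority levels, one alert per failure, suppression of failures that are only symptoms of an upstream failure, and logging of everything) can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`Device`, `AlertFilter`, and a single `upstream` link per device taken from the network map, which is assumed to form a tree); it is not how Monitor one implements its error control feature.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Device:
    name: str
    priority: int                   # 1 = critical (backbone switch) ... 3 = low (printer)
    upstream: Optional[str] = None  # device it is reached through, taken from the map


@dataclass
class AlertFilter:
    devices: Dict[str, Device]
    page_priority: int = 1          # priorities up to this value trigger a page
    down: Set[str] = field(default_factory=set)
    alerted: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def _upstream_down(self, name: str) -> bool:
        """True if any device on the path towards the management station is down."""
        hop = self.devices[name].upstream
        while hop is not None:
            if hop in self.down:
                return True
            hop = self.devices[hop].upstream
        return False

    def device_down(self, name: str) -> Optional[str]:
        """Record a failed poll; return an alert text only when one should be sent."""
        self.down.add(name)
        self.log.append(f"DOWN {name}")     # every event is logged
        if name in self.alerted:
            return None                     # already alerted about this failure
        if self._upstream_down(name):
            return None                     # symptom, not root cause
        if self.devices[name].priority > self.page_priority:
            return None                     # less important: log only
        self.alerted.add(name)
        return f"ALERT: {name} is down"

    def device_up(self, name: str) -> None:
        """Record recovery so that a later failure is alerted again."""
        self.down.discard(name)
        self.alerted.discard(name)
        self.log.append(f"UP {name}")
```

One consequence of this design: a downstream device whose failed poll is processed before its upstream switch is known to be down would still raise an alert, so devices should be polled (or their alerts briefly delayed) in map order, from the management station outward.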

Step three: collect historical information for baselining and trending purposes.

  • Baselining is a broad term for any analysis method that compares changes in actual data against a baseline. The most common use of baselining is as a tool in performance management for trend analysis: comparing a performance metric to a historical value to find a trend that can be used to estimate future performance or needs. A second use of baselining is monitoring network health (watching for changes in problem indicators), which is a proactive form of fault management.
  • Before you can define thresholds for proactive network problem detection, you need to know the normal behavior (the baseline) of your network. Determine which information to collect from each type of device to get a clear picture of its typical behavior. Keep in mind that the collected data will also be used later to determine threshold values!
  • Collecting historical data also allows you to trace when and why a problem occurred in the past, and when and why it may happen again in the future!
    • Historical information combined with well-defined threshold monitoring is essential for discovering potential problems before they actually occur.
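One common way to turn collected history into a baseline and a trend is a mean and standard deviation band for health monitoring, plus a least-squares line for trending. The sketch below assumes evenly spaced samples of a single metric (at least two); the 3-sigma default (`k=3.0`) and the function names are illustrative choices, not values prescribed by this guide or by Monitor one.

```python
import statistics
from typing import Optional, Sequence, Tuple


def baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Normal behavior of a metric: mean and sample standard deviation (needs 2+ samples)."""
    return statistics.mean(samples), statistics.stdev(samples)


def deviates(value: float, mean: float, stdev: float, k: float = 3.0) -> bool:
    """Health monitoring: flag a reading more than k standard deviations from the baseline."""
    return abs(value - mean) > k * stdev


def linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of samples[i] = slope * i + intercept, one sample per interval."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(samples)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def intervals_until(samples: Sequence[float], limit: float) -> Optional[float]:
    """Trending: intervals after the last sample until the fitted line reaches `limit`."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None  # flat or falling trend: no crossing predicted
    return (limit - intercept) / slope - (len(samples) - 1)
```

For example, with daily disk usage samples of 50, 52, 54, 56 and 58 GB, `intervals_until(samples, 100)` fits a slope of 2 GB per day and returns 21.0: about three weeks before a 100 GB volume would be full if the trend holds.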

Step four: set up threshold monitoring.

  • In general, you can rely on two different approaches to monitor network health, or more specifically the individual devices that form the network: polling and traps.
  • Health monitoring by polling usually requires a management application that can read individual SNMP MIB fields of a network device and check these values against known baseline values to determine whether there is a potential problem. You can also rely on the trap mechanism, but polling is preferable: if network connectivity is lost, polling will reveal the failure. A device does send traps when it experiences problems, but there is no guarantee that a trap will reach the monitoring station when the network is in serious trouble.
  • The network traffic generated by polling is limited. Depending on your needs, you can define polling periods ranging from 1 minute to 1 hour. For example, 1000 threshold data reads every 10 minutes generate 1000 requests plus 1000 responses, or 2000 frames per 600 seconds: 2000/600 ≈ 3.3 frames per second. Even a 9k6 (9600 bit/s) dial-up line can handle that!
  • When you start defining thresholds, first concentrate on thresholds that monitor known problems. If a server frequently suffers from hanging processes, define a threshold that detects 100% CPU utilization sustained over, for example, a 15-minute period. Define specific thresholds for each type of equipment and activate them by default. Define thresholds to monitor all, or at least the most important, services of the monitored equipment.
  • For a file server, for example, define thresholds that keep track of free disk space, disk failures, server temperature and network interface error rates. For uninterruptible power supplies, define thresholds that trigger on power failures and monitor the output power load.
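The polling-load arithmetic and the "100% CPU over 15 minutes" rule from this step can both be expressed in a short Python sketch. Two assumptions are built in: each threshold read counts as exactly one request frame plus one response frame, and the sustained-threshold check receives one sample per polling period (so a 15-sample window equals 15 minutes at a 1-minute period). The names `polling_frames_per_second` and `SustainedThreshold` are hypothetical.

```python
from collections import deque
from typing import Deque


def polling_frames_per_second(reads: int, period_s: float, frames_per_read: int = 2) -> float:
    """Average frame rate caused by polling; a read is assumed to be one request + one response."""
    return reads * frames_per_read / period_s


class SustainedThreshold:
    """Fires only when the last `window` samples are all at or above `limit`."""

    def __init__(self, limit: float, window: int) -> None:
        self.limit = limit
        self.samples: Deque[float] = deque(maxlen=window)  # oldest samples drop out

    def update(self, value: float) -> bool:
        """Add one polled value; return True while the condition is sustained."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v >= self.limit for v in self.samples)
```

`polling_frames_per_second(1000, 600)` returns 2000/600, about 3.33 frames per second. Requiring every sample in the window to reach the limit, rather than reacting to a single reading, keeps short and harmless CPU spikes from raising an alarm while a hung process that pins the CPU for the whole period still does.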

Step five: define real-time graphing.

  • While collected historical data can tell you how your network will behave and what you can expect of it in terms of performance and reliability, real-time statistics let you perform detailed, in-depth analysis.
  • To serve your users better, you also need real-time tables and graphs that let you respond immediately to basic user requests.
  • Monitor one allows you to create and save shooters (SNMP request definition files) that show real-time tables, graphs or meters. To view these statistics, you only need to execute the appropriate shooter.
    • Define essential real-time graphs and tables, and keep them at hand so they are available immediately when you need them. Before building a new shooter, always ask yourself which user question it will help you answer.
  • Define shooters that help you answer the most frequently asked questions. Use standards: for every SNMP-enabled device, for example, define at least definition files that retrieve the mib-2 system group and the ifEntry table. For routers, also add definition files that read the ipRouteEntry table and graphically display the traffic/load per interface. For application servers, create files that show CPU utilization, user sessions, disk usage, buffer usage, etc.
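The shooter file format itself is not shown here, but the arithmetic behind a per-interface traffic graph is standard: read `ifInOctets` (or `ifOutOctets`) twice, take the difference, convert octets to bits, divide by the interval, and compare with `ifSpeed` (in bits per second). The Python sketch below assumes the two counter values have already been read by some SNMP tool; the function names are hypothetical. The OIDs are the standard MIB-2 (RFC 1213) object identifiers; append `.0` for scalar objects such as `sysUpTime` and `.<ifIndex>` for table columns.

```python
from typing import Tuple

# Standard MIB-2 (RFC 1213) object identifiers used by the definitions above.
OIDS = {
    "sysDescr": "1.3.6.1.2.1.1.1",
    "sysUpTime": "1.3.6.1.2.1.1.3",
    "ifSpeed": "1.3.6.1.2.1.2.2.1.5",
    "ifInOctets": "1.3.6.1.2.1.2.2.1.10",
    "ifOutOctets": "1.3.6.1.2.1.2.2.1.16",
    "ipRouteEntry": "1.3.6.1.2.1.4.21.1",
}

COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: int, current: int) -> int:
    """Difference between two Counter32 readings, allowing for one wrap past 2**32 - 1."""
    if current >= previous:
        return current - previous
    return current + COUNTER32_MODULUS - previous


def interface_load(prev_octets: int, curr_octets: int,
                   interval_s: float, if_speed_bps: int) -> Tuple[float, float]:
    """Return (bits per second, percent utilization) for one direction of an interface."""
    bps = counter_delta(prev_octets, curr_octets) * 8 / interval_s
    return bps, 100.0 * bps / if_speed_bps
```

The wrap handling assumes the 32-bit counter wrapped at most once between polls. On fast links it can wrap within minutes (at full load on 100 Mbit/s, roughly every 6 minutes), so the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters from IF-MIB are preferable there. A device reboot also resets the counters, which this sketch would misread as a wrap; comparing `sysUpTime` between polls catches that case.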

Step six: stay alert!

  • Never sit back! Downtime is not an option!
  • Check your log files every day.
  • Pay attention to changing circumstances.
  • Try to interpret and explain recurring events.
  • Fine-tune threshold settings, but do not change them every day!
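To make "interpret and explain recurring events" part of the daily log check, it helps to count which events keep coming back. The sketch below assumes a hypothetical plain-text log format of `<date> <time> <device> <event text>` per line; the function name is illustrative, and the parsing should be adapted to whatever your management application actually writes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_events(lines: Iterable[str], min_count: int = 3) -> List[Tuple[str, int]]:
    """Count events that repeat in a log; most frequent first.

    Assumes the hypothetical format "<date> <time> <device> <event text>":
    the timestamp is dropped, and device + event text identify the event.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) == 3:  # skip blank or malformed lines
            counts[parts[2].strip()] += 1
    return [(event, n) for event, n in counts.most_common() if n >= min_count]
```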


By Paul van Bergen
Founder FineConnection
Oct 2002


News

FineConnection is pleased to announce the availability of the new stable Monitor one version FP1.106.391 (February 2008).
For superior trending and long-term analysis, Monitor one can act as a "front end" for RRD, a system to store and display time-series data. RRD is also well suited to exporting logged trending data to text files for use in spreadsheets or databases.
If you're using HP/Compaq servers with Insight Manager agents in your network, they can also be monitored with Monitor one.
Monitor one provides an interface to messaging gateway systems, making it easy to send alert messages to pagers, mobile phones, PIMs and wireless devices.
The Monitor one "Desktop" option allows you to save Monitor one desktop configurations to the database for quick access later.
The new version also comes with a new licensing policy. The required license type is now determined only by the number of device objects on the network map whose uptime you want to monitor. The number of concurrently running shooters (SNMP monitors) is now "unlimited" in all versions (previously it depended on the license type).
The new version allows you to define the font name, size and color for object labels on the network map.