Error control

About error control

Error control (EC) is a powerful feature that helps you locate a problem quickly and prevents superfluous alerts and incorrect interpretation of a problem. EC tries to find the root cause when a device no longer responds to status requests.

More precisely, if a "No response" event occurs for a device, EC tries to find out whether the event is caused by a failure of the device itself or by another device experiencing problems in the chain of devices (the network path) between the station running Monitor one (the "ThisStation" object on the map) and the device.

EC uses the information provided by the network map (connections and device types) to find out which device causes a "No response" event. It is therefore extremely important to set up your network maps as accurately as possible. If a device "A" is physically connected to device "B", draw a link between them on your network map accordingly!

graphics44
Without EC

When "Switch 10" fails, the four servers behind it also get the "No response" status. If Alerting by email is enabled, the network manager receives 5 email alerts (of which 4 are superfluous and incorrect!).

graphics45
With Error control

Only "Switch 10" gets the "No response" status. The servers all get the "Unknown" status (blue tick). The network manager only receives one alert email.

How Error Control determines the root cause of a "No response from device" event

Every time a device stops responding to status requests, EC verifies the status of all devices in the chain (network path) of devices from the ThisStation object to the device that stopped responding. If one of the devices in the chain already has the "No response" status, Monitor one assumes that this device is the root cause of the event. In this case, the device that stopped responding gets the "Unknown" status (blue tick).

If more than one chain exists (because of network redundancy), Monitor one verifies all possible network paths!
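
The decision described above can be sketched in a few lines of Python. This is only an illustration, not Monitor one's actual implementation: the function name `classify` and the status constants are made up for this example, and the rule for redundant chains (the device is only marked "Unknown" when every chain is blocked) is an assumption, since the text only states that all paths are verified.

```python
NO_RESPONSE = "No response"
UNKNOWN = "Unknown"
OK = "OK"

def classify(paths, status):
    """Return the status for a device that just stopped responding.

    paths  -- every chain from ThisStation to the device, each given as a
              list of the intermediate device names (endpoints excluded)
    status -- current status of the other devices on the map
    """
    # A chain counts as blocked if any device on it already has the
    # "No response" status.
    blocked = [any(status[d] == NO_RESPONSE for d in chain) for chain in paths]
    # Assumption: with redundant chains, the device is marked "Unknown"
    # only if every chain is blocked. One clear chain means the device
    # itself is treated as the failing device.
    if paths and all(blocked):
        return UNKNOWN
    return NO_RESPONSE

status = {"Switch 10": NO_RESPONSE, "Switch 11": OK}

# A server reachable only through the failed switch:
print(classify([["Switch 10"]], status))                 # Unknown
# A server that also has a redundant chain through a healthy switch:
print(classify([["Switch 10"], ["Switch 11"]], status))  # No response
```

"Switch 11" is a hypothetical second switch added here only to show the redundant case.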

To determine all possible network paths from the "ThisStation" object to a device, Monitor one needs two pieces of information:

  1. Link or connection information (which device is connected to which other device)
  2. Forwarding information: if a device has more than one connected interface, whether it forwards traffic (routes or switches packets) or whether the second interface is only a "hot-standby" used for redundancy

Monitor one extracts link or connection information from the network map. It is therefore extremely important to draw network maps as accurately as possible. The information whether or not a device forwards traffic comes from the definition of the Class each device belongs to (the checkbox "This device forwards traffic via routing, switching, bridging or repeating" on the Add/Modify a Class window). If this option is not set correctly, EC will not work as expected!
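
To illustrate how these two pieces of information combine, here is a minimal Python sketch of path enumeration. It is not taken from Monitor one: `find_paths`, the `links` list and the `forwards` dictionary are hypothetical names, and the device names are invented. The sketch assumes that a path may only pass through a device whose Class has the forwarding checkbox set.

```python
def find_paths(links, forwards, start, target):
    """Return all loop-free chains from start to target.

    links    -- pairs of device names that are connected on the map
    forwards -- device name -> True if its Class forwards traffic
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    paths = []

    def walk(node, chain):
        if node == target:
            paths.append(chain)
            return
        # Only the start station and forwarding devices pass traffic on.
        if node != start and not forwards.get(node, False):
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in chain:  # never visit a device twice
                walk(nxt, chain + [nxt])

    walk(start, [start])
    return paths

links = [
    ("ThisStation", "Switch1"),
    ("Switch1", "Switch2"),
    ("Switch1", "Server1"),
    ("Server1", "Switch2"),  # second, hot-standby interface of Server1
]
forwards = {"Switch1": True, "Switch2": True, "Server1": False}

for path in find_paths(links, forwards, "ThisStation", "Switch2"):
    print(" -> ".join(path))
```

With `Server1` marked as non-forwarding, only the direct chain through `Switch1` is found. Setting `forwards["Server1"]` to `True` would add a second chain through the server, which is exactly the kind of false path a wrong Class setting introduces.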

The list below shows some examples of device Classes that forward traffic.

The list below shows examples of devices with more than one connected interface that do not forward traffic

Enabling Error control

Enabling Error Control is simple: add the ThisStation object to the network map and add a link object between this object and the switch or hub to which the station is actually connected. The ThisStation object is a special-purpose object representing the physical workstation (or server) that runs the Monitor one software. The ThisStation object is the key object for the "Error control" feature.

graphics46

After adding the "ThisStation" object (and each time you add or remove links between device objects), the EC information database needs to synchronize. The EC icon on the Monitor one control panel changes to the Sync icon. To start synchronizing, click this icon. When synchronization finishes, the icon changes back to the normal EC icon.

Note: During synchronizing, Monitor one automatically switches to Designer mode and will prevent you from entering Designer mode while processing!

Note: The time needed to synchronize the EC information database depends heavily on the amount of redundancy (the number of redundant paths) in your network and ranges from less than a second to a couple of minutes!

Verifying Error Control activity

If Error Control is enabled, it takes more time before a "No response" status is propagated to the multilevel network map structure and the control panel. The color of the "EC panel" on the Monitor one main window shows Error Control activity.

Note: The "ThisStation" object can only be added once (of course!)

Verifying network paths used by Error control

You can verify whether your map is "EC proof" by enabling EC and then clicking the EC verifier speedbutton on the Monitor one control panel.

Example 1.

graphics49

A small company has two offices in different cities, connected over the internet via ADSL. The Firewall in the main office has a problem and is down. As you can see from the screenshot, EC is enabled (the "ThisStation" object is present on the map) but nevertheless all devices in the remote office have been marked "down" (erroneously)!

In the above case, the problem is caused by not checking the "This device forwards traffic…" checkbox for the Class the device "InternetCloud" belongs to. The "InternetCloud" device represents the entire internet routing network as a single device. Because its Class is not marked as forwarding, Monitor one "thinks" that there are no network paths available from the "ThisStation" object to the devices at the other end of the WAN link, and displays the little "network disconnected" symbols at the bottom left of each device in the remote office.

After checking the "This device forwards traffic…" checkbox for the Class the "InternetCloud" device belongs to, the network map shows:

graphics50
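
The effect of the checkbox in Example 1 can be reproduced with a small reachability check. The sketch below is an illustration under assumptions, not Monitor one's code: `reachable` is a hypothetical helper, and apart from "ThisStation", "Firewall" and "InternetCloud" the device names are invented, because the screenshot's actual names are not given in the text.

```python
from collections import deque

def reachable(links, forwards, start, target):
    """True if traffic from start can reach target over the drawn links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        # A device whose Class does not forward traffic is a dead end.
        if node != start and not forwards.get(node, False):
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical map: main office on the left, remote office on the right.
links = [
    ("ThisStation", "MainSwitch"),
    ("MainSwitch", "Firewall"),
    ("Firewall", "InternetCloud"),
    ("InternetCloud", "RemoteRouter"),
    ("RemoteRouter", "RemoteServer"),
]
forwards = {"MainSwitch": True, "Firewall": True,
            "InternetCloud": False, "RemoteRouter": True}

# Checkbox not set: no path, so EC has no information for the remote office.
print(reachable(links, forwards, "ThisStation", "RemoteServer"))

# Checkbox set: the remote office becomes reachable on the map.
forwards["InternetCloud"] = True
print(reachable(links, forwards, "ThisStation", "RemoteServer"))
```

Only the forwarding flag of one Class changes between the two calls, yet it decides whether any device behind the WAN link has a usable path.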

Example 2.

graphics51

The screenshot above shows another interesting example. For reasons of redundancy, a cluster system has two connections to two different switches. Only the first NIC is active; the second one is "hot-standby". By mistake, the "Forward" setting of the Class the device "Cluster1" belongs to is checked. Switch4 is actually down! Because of the "Forward" setting of Cluster1, Monitor one "thinks" that there is an alternate network path to device Switch3, gets no reply from device Switch3 and marks it accordingly.

After clicking the EC verifier speedbutton on the Monitor one control panel, the map shows:

graphics53

Only TestServer1 has the "No Error Control information available" indicator (it is not connected).

After resetting the "Forward" control (unchecking the checkbox) of the Class the device Cluster1 belongs to, the map shows:

graphics54
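
Example 2 can be worked through in the same way. The sketch below is a rough model under stated assumptions: the topology is guessed, because the text does not list the links in the screenshot ("Switch1" and "Switch2" are invented names), the helpers `chains` and `status_for` are hypothetical, and the rule that a device is only marked "Unknown" when every chain is blocked is an assumption.

```python
def chains(links, forwards, start, target):
    """All loop-free chains from start to target (intermediate devices only)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    found, stack = [], [[start]]
    while stack:
        chain = stack.pop()
        node = chain[-1]
        if node == target:
            found.append(chain[1:-1])
            continue
        # Non-forwarding devices cannot be passed through.
        if node != start and not forwards.get(node, False):
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in chain:
                stack.append(chain + [nxt])
    return found

def status_for(device, links, forwards, down):
    """Status EC would assign to a silent device (assumed rule, see above)."""
    paths = chains(links, forwards, "ThisStation", device)
    if paths and all(any(d in down for d in chain) for chain in paths):
        return "Unknown"
    return "No response"

# Assumed topology: Switch3 sits behind Switch4; Cluster1 has one NIC on
# Switch2 and a hot-standby NIC on Switch3.
links = [
    ("ThisStation", "Switch1"), ("Switch1", "Switch2"),
    ("Switch1", "Switch4"), ("Switch4", "Switch3"),
    ("Switch2", "Cluster1"), ("Cluster1", "Switch3"),
]
switches = {"Switch1": True, "Switch2": True, "Switch3": True, "Switch4": True}
down = {"Switch4"}

# "Forward" wrongly checked for Cluster1: a false alternate path exists.
print(status_for("Switch3", links, dict(switches, Cluster1=True), down))
# "Forward" unchecked: the only chain runs through the failed Switch4.
print(status_for("Switch3", links, dict(switches, Cluster1=False), down))
```

In this model the wrong setting makes Switch3 look like a failed device, while the corrected setting leaves Switch4 as the only reported root cause.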