How would you approach Network Management?

“Tools must do the work. People must think.” When it comes to managing an IT environment, I see this principle forgotten far too often. We spend too much time on things that can be a) automated; b) done by a tool with much better quality.

In my experience with customer environments, the most popular management scheme is a variation of the “hero approach”: a few “heroes” within the company have the required knowledge, but all of it is inside their heads. You can recognize such a case if:

  • You try to remove these key people from the process, and things start falling apart. Even on holiday they need to check emails and be reachable.
  • It takes ages for a newcomer to learn the IT environment because of the lack of documentation, tooling and common standards.
  • Presence of standards or rules means little, as they are often bypassed, creating undocumented exceptions.
  • Most things are done manually.
  • Reporting and statistics are incomplete or absent.
  • Operations are reactive: issues are detected only after users start to scream.

At the other end of the spectrum is a well-oiled IT Delivery machine that works in proactive mode, by itself. There are many definitions of an IT Maturity Model; use the one you like.

However, it is easy to criticize; it is much harder to come up with something that makes sense. If I ask myself how I would start building a proper foundation, then before processes and procedures I would start with tooling and automation. I have had the chance to put this into practice, and can confirm it works.

So this post is a brief overview of the toolset that provides the foundation for IT Service Delivery.

1. Network Monitoring System

There must be a vendor-neutral tool able to act as the heart of IT operations. Everything else hardly makes sense if we don’t have a portal that gives us information on inventory, availability, events, and other essential domains. Don’t skimp on this one: buy the tool you need.

2. Naming Standard

Before you start populating your NMS, devices must be named properly. It is an obvious thing, but seldom done well. Based on the name, we can run inventory reports, raise custom alarms, group things by region, etc. One common approach uses hierarchical CCLLLTTNNN notation, where CC is a 2-letter country code, LLL is a 3-letter location code, TT is a 2-letter device type and NNN is a 3-digit sequential number. Or use whatever other naming standard suits the size and diversity of your organization.
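To make the CCLLLTTNNN idea concrete, here is a minimal Python sketch that checks hostnames against the scheme and groups the compliant ones by country. The hostnames and the helper names (`parse_device_name`, `DeviceName`) are made up for illustration; your own standard may well need a different pattern.

```python
import re
from typing import NamedTuple, Optional

# CCLLLTTNNN: 2-letter country, 3-letter location,
# 2-letter device type, 3-digit sequential number.
NAME_RE = re.compile(r"([A-Z]{2})([A-Z]{3})([A-Z]{2})(\d{3})")


class DeviceName(NamedTuple):
    country: str
    location: str
    device_type: str
    number: int


def parse_device_name(hostname: str) -> Optional[DeviceName]:
    """Return the parsed fields, or None if the name breaks the standard."""
    match = NAME_RE.fullmatch(hostname.strip().upper())
    if match is None:
        return None
    country, location, device_type, number = match.groups()
    return DeviceName(country, location, device_type, int(number))


# Hypothetical hostnames, for illustration only.
hostnames = ["NLAMSRT001", "NLAMSSW014", "DEFRASW002", "core-router-1"]

# Names that do not follow the standard and need renaming.
non_compliant = [h for h in hostnames if parse_device_name(h) is None]

# Group compliant devices by country code, e.g. for regional reports.
by_country = {}
for h in hostnames:
    parsed = parse_device_name(h)
    if parsed is not None:
        by_country.setdefault(parsed.country, []).append(h)
```

The same parsed fields can drive grouping by location or device type; the point is only that a strict name turns into structured data for free.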

3. Inventory and Availability

Now, with proper names assigned, all devices that we care about must be added to the NMS and inventoried. Along with that, all Telnet/SSH/SNMP access credentials are standardized. Once that is done, the Availability domain is covered too, as each device will have its reachability statistics collected automatically.

4. Authentication and Authorization

Get rid of shared passwords: deploy an AAA engine based on the RADIUS or TACACS+ protocol. Access should be personalized and accounted for. Configuration rights must be given on a “need to have” basis.

5. Event Management

All devices must send Syslog messages and SNMP traps to your Network Management Station. Not for the sake of “sending”, but so that automated alarms are raised when particular events are detected. Your NMS must be able to raise service tickets.
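A real NMS does this through its own rule engine, but the principle of "raise an alarm when a particular event is seen" fits in a few lines of Python. Everything here is an assumption for illustration: the rule names, the severities, and the sample log lines, which imitate a Cisco-IOS-like message style. Adjust the patterns to what your devices actually send.

```python
import re

# Hypothetical rules: (alarm name, severity, pattern to look for).
RULES = [
    ("link-down", "major", re.compile(r"changed state to down")),
    ("config-change", "minor", re.compile(r"Configured from")),
]


def match_alarms(lines):
    """Return one alarm record for each line that matches a rule."""
    alarms = []
    for line in lines:
        for name, severity, pattern in RULES:
            if pattern.search(line):
                alarms.append(
                    {"alarm": name, "severity": severity, "message": line}
                )
                break  # first matching rule wins
    return alarms


# Sample messages, invented for this example.
sample_lines = [
    "NLAMSRT001: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
    "NLAMSRT001: %SYS-5-CONFIG_I: Configured from console by admin",
    "NLAMSSW014: %SYS-6-CLOCKUPDATE: System clock has been updated",
]

alarms = match_alarms(sample_lines)
for alarm in alarms:
    print(alarm["severity"], alarm["alarm"], "->", alarm["message"])
```

In practice the `alarms` list is what you would hand over to the ticketing system; lines that match no rule are simply stored.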

6. Capacity

Basic SNMP-based interface utilization must be collected for all critical interfaces. It may be enabled in step 3, or as a separate exercise. Later you may decide to enable (x)Flow on your devices.
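For reference, this is the usual arithmetic behind an SNMP utilization graph: take two readings of an interface octet counter, convert the difference to bits, and divide by what the link could have carried in that interval. The sketch below assumes the counter values have already been polled (for example ifHCInOctets, or the 32-bit ifInOctets); the function name and the sample numbers are mine, and the polling itself is left out.

```python
def utilization_percent(octets_start, octets_end, interval_seconds,
                        speed_bps, counter_bits=64):
    """Average utilization of one direction of an interface, in percent.

    octets_start / octets_end are two readings of an octet counter
    taken interval_seconds apart. counter_bits is 64 for the HC
    counters and 32 for the classic ones.
    """
    if interval_seconds <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    # Modulo arithmetic handles a single counter wrap between polls.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100.0 / (interval_seconds * speed_bps)


# Invented example: 7.5 GB received in 5 minutes on a 1 Gbit/s link.
usage = utilization_percent(
    octets_start=1_000_000_000,
    octets_end=8_500_000_000,
    interval_seconds=300,
    speed_bps=1_000_000_000,
)
print(f"{usage:.1f}%")
```

Note that a 32-bit counter on a fast link can wrap more than once between polls, which this calculation cannot detect; that is one reason to prefer the 64-bit counters where the device supports them.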

7. Configurations

Your NMS must be able to do automated and manual backups of your device configurations. This helps to detect unauthorized changes and to roll back in case of trouble.
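Detecting a change boils down to comparing the latest backup with the previous one. A minimal sketch using Python's standard difflib module is below; the configuration snippets are invented, and how the two texts get fetched from the device or the backup store is outside its scope.

```python
import difflib
from typing import List


def config_changes(previous: str, current: str) -> List[str]:
    """Return added ('+ ') and removed ('- ') lines between two configs."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Invented configuration fragments, for illustration only.
last_backup = """hostname NLAMSRT001
logging host 192.0.2.10
snmp-server location Amsterdam
"""

running = """hostname NLAMSRT001
logging host 192.0.2.10
snmp-server location Amsterdam
username tempuser privilege 15
"""

changes = config_changes(last_backup, running)
if changes:
    print("Configuration changed since last backup:")
    for line in changes:
        print(" ", line)
```

An empty result means the device still matches its last backup; anything else is worth checking against the approved change list.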

8. Software

Standardize on certain software levels. This simplifies management and minimizes the number of bugs that we may encounter.
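Once inventory data is in place, checking compliance with the software standard is a simple comparison. The sketch below assumes an inventory export as a list of records; the platform names, version strings and field names are hypothetical.

```python
# Hypothetical approved software level per platform.
APPROVED = {
    "router-model-a": "15.2.4",
    "switch-model-b": "12.2.55",
}

# Hypothetical inventory export from the NMS.
inventory = [
    {"name": "NLAMSRT001", "platform": "router-model-a", "version": "15.2.4"},
    {"name": "NLAMSSW014", "platform": "switch-model-b", "version": "12.2.50"},
    {"name": "DEFRASW002", "platform": "switch-model-c", "version": "1.0.3"},
]


def off_standard(devices, approved):
    """Names of devices whose version differs from the approved level.

    Devices on a platform with no approved level are reported too,
    since nobody has decided what they should be running.
    """
    return [
        d["name"]
        for d in devices
        if approved.get(d["platform"]) != d["version"]
    ]


to_review = off_standard(inventory, APPROVED)
print(to_review)
```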

Well, that’s it. I mean, the easy part is done. We now have the basic tools to do some analysis and start making informed decisions.

In my experience, however, implementing the steps above is not the biggest challenge. The real challenge is the procedures that keep all of the above-mentioned domains 100% up to date and operating proactively.


Alex Mavrin, CCIE #7846

Visit http://www.apteriks.com and use FREE ONLINE tools for network professionals.
