I have been trudging forward with bringing our alerting more in line with our SLAs, escalation process, and support tiers, and hopefully making it scalable as the business grows.
- Alerts are set up with T1, T2, T3 naming conventions that map back to Tier 1, Tier 2, etc. Each tier has a general SLA expectation/commitment attached to it. A naming scheme then keeps that structure in place, along with the AOR (Area of Responsibility), or discipline.
- Servers, VM Environment, Disk Storage/SAN, basically Systems - these alerts go specifically to the Server Team/s.
- Routers, switches, WAN links, LAN links, Firewalls - these alerts go specifically to the Network Team/s.
- Application performance, even specific app servers, specific services - these alerts go specifically to the Application Development Team/s.
- HVAC, room temperatures, generator power - these types of alerts go specifically to the Facilities Team (and secondarily to the Server and Network Teams for areas of common interest).
- IPS, IDS, and some Firewall - these types of alerts go specifically to the IT Security Team.
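
As a rough illustration of how a tier + AOR naming scheme like the one above could drive routing, here is a small Python sketch. To be clear, the `T1-NET-...` name format, the AOR codes (`SYS`, `NET`, `APP`, `FAC`, `SEC`), and the SLA minutes are all made up for the example and are not our actual values or anything built into a monitoring product.

```python
from dataclasses import dataclass

# Hypothetical SLA response targets per tier, in minutes (illustrative only).
TIER_SLA_MINUTES = {"T1": 15, "T2": 60, "T3": 240}

# Hypothetical AOR codes mapped to the teams from the list above.
# The first team is the primary recipient; the rest are secondary.
AOR_TEAMS = {
    "SYS": ["Server Team"],
    "NET": ["Network Team"],
    "APP": ["Application Development Team"],
    "FAC": ["Facilities Team", "Server Team", "Network Team"],
    "SEC": ["IT Security Team"],
}


@dataclass(frozen=True)
class Route:
    tier: str
    sla_minutes: int
    teams: tuple


def route_alert(alert_name: str) -> Route:
    """Parse an alert named '<TIER>-<AOR>-<description>' and return its route."""
    parts = alert_name.split("-", 2)
    if len(parts) < 3:
        raise ValueError(f"Alert name does not follow the scheme: {alert_name!r}")
    tier, aor = parts[0].strip().upper(), parts[1].strip().upper()
    if tier not in TIER_SLA_MINUTES:
        raise ValueError(f"Unknown tier {tier!r} in {alert_name!r}")
    if aor not in AOR_TEAMS:
        raise ValueError(f"Unknown AOR {aor!r} in {alert_name!r}")
    return Route(tier, TIER_SLA_MINUTES[tier], tuple(AOR_TEAMS[aor]))


if __name__ == "__main__":
    for name in ("T1-NET-WAN link down", "T2-FAC-Server room temperature high"):
        print(name, "->", route_alert(name))
```

The point of the sketch is only that a consistent name prefix lets the tier and the owning team be derived mechanically, so alerts that do not follow the scheme fail loudly instead of going to nobody.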
Management does not receive alerts, as I don't believe that would fit their function within the business; I would imagine it wouldn't yield an efficient work process between Engineers/Support Staff and Management. But ehh... what do I know? There are reports, whether on demand or custom built, polled and auto-delivered on a schedule, etc., that help Management keep abreast of, and keep a hand on the heartbeat of, the area they are responsible for. If an outage occurs, they are notified at the appropriate time, but not through active alerts.
If I could change anything, probably the biggest thing would be getting our escalation process embedded into our company culture as we refine it. As others have mentioned here, the Engineer or specific Support Staff member who is working on or troubleshooting an issue should not also be responsible for communicating to the organization or for being the POC for the issue. It is extremely difficult for someone to work on an issue while also being tasked with responding to status requests, calls, emails, texts, whatever. So the goal is to refine our process while continuing to get "buy-in" on how our organization reports, escalates, and checks the status of situations, who responds, etc.
It is always exciting watching companies grow and mature into the best businesses they can become. It has been a pleasure to be a part of such growth with any of the companies at which I have worked. It allows for the creative design of the entire Monitoring Environment: how it runs, what it does, how it acts, what it looks like, and who gets to see the results and act upon them, empowering business support units.
NetEng33