Role Overview:
We are seeking an L1 NOC Infrastructure Engineer to join our IT operations team. The ideal candidate will have a foundational understanding of IT infrastructure and a strong desire to grow their technical skills in a fast-paced, mission-critical environment. The L1 NOC Infrastructure Engineer will be the first point of contact for monitoring and responding to alerts, troubleshooting issues, and ensuring the stability and performance of the infrastructure and systems.
Key Responsibilities:
Monitoring and Incident Detection:
- Continuously monitor IT Infrastructure and systems using SolarWinds.
- Respond to alerts and incidents as they arise, ensuring quick detection and resolution of issues.
- Escalate issues to L2 or L3 engineers as necessary, following established protocols.
Initial Troubleshooting:
- Perform initial troubleshooting of IT Infrastructure and system issues, covering systems, Windows and Linux operating systems, databases, storage systems, cloud solutions (e.g., Microsoft Azure), and virtual environments.
- Use remote tools to access and troubleshoot IT Infrastructure devices and servers.
- Document all incidents, troubleshooting steps, and resolutions in the incident management system.
Incident Management:
- Categorize and prioritize incidents based on severity and impact.
- Ensure timely communication with affected stakeholders and provide updates until the issue is resolved.
- Follow up on escalated issues to ensure they are being addressed promptly by higher-level support teams.
IT Infrastructure and System Maintenance:
- Assist in routine IT Infrastructure and system maintenance tasks, including applying patches, updates, and configuration changes.
- Support the team in performing health checks on IT Infrastructure devices, servers, and critical services to ensure optimal performance.
Documentation and Reporting:
- Maintain accurate and up-to-date documentation of the IT Infrastructure topology, configurations, and standard operating procedures (SOPs).
- Generate daily, weekly, and monthly reports on IT Infrastructure performance, incidents, and other key metrics.
Collaboration and Communication:
- Collaborate with L2 and L3 engineers, as well as other IT teams, to resolve complex issues and improve IT Infrastructure reliability.
- Participate in team meetings to discuss ongoing issues, share knowledge, and contribute to continuous improvement initiatives.
Shift Work and On-Call Support:
- Be willing to work in a 24/7 shift environment, providing round-the-clock support as part of the NOC team.
- Participate in on-call rotations to ensure coverage during off-hours and weekends.
Required Qualifications:
- Experience: 3+ years of experience monitoring digital infrastructure in production environments within a large organization.
- Technical Skills: Experience troubleshooting infrastructure components (systems, Windows and Linux operating systems, databases, storage systems, cloud solutions such as Microsoft Azure, and virtual environments).
- Monitoring Tools: Experience with IT Infrastructure monitoring tools (SolarWinds) and incident management systems (ServiceNow).
- Troubleshooting: Strong analytical and problem-solving skills, with the ability to perform basic troubleshooting on IT Infrastructure and system issues.
- Communication: Excellent communication skills, both written and verbal, with the ability to document and explain technical issues clearly.
- Certifications: Microsoft, Linux, cloud, hardware, or SolarWinds certifications are preferred.
Preferred Qualifications:
- Experience in a 24/7 operations environment, particularly in a NOC or similar role.
- Familiarity with ITIL processes, particularly incident and problem management.
- Knowledge of cloud environments (e.g., AWS, Azure) and their monitoring tools.
- Good understanding of digital and cybersecurity service management processes.
- Exposure to configuration management tools.