Senior Site Reliability Engineer
Friday, July 9, 2021
AvidXchange is the industry leader in automating invoice and payment processes for mid-market businesses. Founded in 2000, AvidXchange processes over $140 billion in transactions annually across its network of more than 600,000 suppliers, transforming the way 6,000 customers in North America pay their bills. AvidXchange is distinguished as a global fintech unicorn and one of the fastest-growing technology companies in the U.S., with 1,400 employees supporting customers across seven office locations. Our employees live by our core values, including “Innovate to Change the Game”, “Passion about Customer Success”, “Win as a Team”, “Play to our Strengths”, and “Have a Blast”. We are on a mission to create something different at AvidXchange. Come join the team!
The Senior Site Reliability Engineer is responsible for providing continuous feedback on site health, reliability, availability, and user experience for a specific Avid product domain. You’ll help redefine standards and practices as our SRE model evolves to align with product domains. You’ll have the opportunity to work with leading-edge technology, keep your skills up to date, and mentor junior SRE teammates by building out processes, libraries, and guidelines.
This role is expected to understand the product in depth, collect and analyze meaningful measurements, and provide feedback to the business, Software Engineering, and Product teams. The SRE will work closely with key stakeholders to drive changes that increase customer satisfaction, product availability, and reliability, and that advance strategic technical initiatives.
In addition to monitoring and integration with the observability platform, the role places a heavy focus on finding automation opportunities and automating operational processes to maintain 99.9% availability of the product. These efforts are in addition to Production SaaS Operational and Support responsibilities: responding quickly to and resolving production incidents, preventing service disruption, and continuously improving MTTR.
- Perform application-specific production support, incident management, problem management, RCAs, and service restoration as needed to quickly respond to and resolve production issues.
- Free up developers to focus on building new product features by handling most aspects of operating the products effectively and proactively managing the customer experience.
- Plan and achieve high availability and performance of the product service.
- Ensure proactive monitoring of all core services and processes to prevent unplanned service disruption.
- Implement self-healing and scalable technical services to avoid unplanned disruptions.
- Establish observability of business system health by integrating with the observability platform through automation.
- Maintain the operations runbook for business-hours and off-hours system support.
- Partner with engineering to ensure successful change management from development to delivery.
- Implement the tool consolidation strategy, and train team members on it, to optimize spend versus value for our end-to-end monitoring platform.
- Contribute to the definition of strategy, the standardization of technologies, and the establishment of patterns for rapid, continuous development and application of automated solutions that address reliability issues and automate manual tasks.
- Lead and implement the capability to measure core product availability across Azure and AvidXchange Cloud using HTTP endpoint testing and synthetic user testing, and train team members on it.
- Present on the usability, reliability, incidents, and user experience of the core product services to senior and/or executive leadership on a weekly basis.
- Define and report SLOs/SLAs for 99.9% availability to executive leadership and business partners.
- Influence product delivery teams to implement usability and reliability enhancements that lead to improved user experience index scores and improved availability.
- Provide detailed analysis and troubleshooting for system outages, with feedback to product and software engineering.
Required Experience, Qualifications, & Skills
- 5+ years of experience in a Site Reliability Engineering or Software Engineering role
- Bachelor’s degree in Computer Science or Information Technology, or equivalent experience plus certifications
- 3-5 years of working experience with Windows Server OS administration
- 3-5 years of working experience with SQL Server and Entity Framework ORM
- 3-5 years of working experience writing and tuning SQL queries
- 3-5 years of working experience with IIS configuration and scalability
- 3-5 years of working experience with VMware vSphere
- 3-5 years of working experience with ASP.NET MVC
- 2-4 years of working experience with Dynatrace, Azure Monitor, Application Insights, and Log Analytics
- Familiarity with RESTful APIs
- Experience with Azure or similar cloud providers
- Strong understanding of web hosting infrastructure and high availability architecture
- Experience measuring and monitoring .NET applications, SQL Server databases, and serverless cloud resources, or equivalent Java-based experience
- PowerShell or Linux shell scripting to create automated routines that ensure site availability
- Development/coding experience and skills for writing custom automation solutions
AvidXchange is an equal opportunity employer. AvidXchange is committed to equal employment opportunity in accordance with applicable federal, state and local laws. AvidXchange will not discriminate against applicants for employment on any legally recognized basis. This includes, but is not limited to: veteran status, race, color, religion, sex, sexual orientation, gender identity, gender expression, national origin, age and physical or mental disability.
- Job Function: Software Product Usability Engineering