
Lead Software Engineer
Intellum
full-time
Location Type: Remote
Location: United States
About the role
- SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
- Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
- Core Engineering & Performance: Take ownership of critical code components (e.g., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
- Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
- Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
- Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it" ownership.
Requirements
- 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
- Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
- Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.
- Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the four "Golden Signals" (latency, traffic, errors, and saturation).
- SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate error budgets, and use data to balance feature velocity with reliability.
- Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
- Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).
- Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
- Experience documenting solutions and training operational teams to effectively support and maintain systems.
- Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.
Benefits
- Medical - 100% of employee premiums covered for selected individual plans
- Dental - 100% of employee premiums covered
- Vision - 100% of employee premiums covered
- LinkedIn Learning
- 401(k) plus matching (US Based Only)
- Unlimited PTO
- Calm subscription
- Annual Company Retreat