
Senior Research Engineer / Research Scientist
METR
full-time
Location Type: Hybrid
Location: Berkeley • California • 🇺🇸 United States
Salary
💰 $250,000 - $450,000 per year
Job Level
Senior
Tech Stack
Python
About the role
- We're seeking a researcher to help us better understand AI capabilities.
- Previous work in this vein includes agent time horizons, a commonly used metric for measuring AI progress, and RCTs on open-source developer productivity.
- Lead a project investigating transcripts as a source of evidence about agent capabilities.
- Improve METR's time-horizon metric to make it more externally valid, more interpretable, and more predictive of threat-model-relevant capabilities.
- Design and build experiments testing agent capabilities in the wild.
- Lead large-scale human-subjects experiments measuring the impact of AI agents on economically valuable R&D.
Requirements
- You can write code. At the very least, you should be able to quickly write a data analysis script in Python to answer an important question. Bonus points if you can write a clean PR too.
- You're excited to get your hands dirty. METR researchers often interact with LLMs in a wide variety of scenarios, read lots of agent transcripts, and closely review human outputs (e.g. video recordings of developers in our productivity RCT).
- You are undaunted by open-ended mandates. You can take a confusing or ill-posed question and produce insightful and helpful frameworks/proposals/results.
- You should be able to read, understand, and critique a research proposal. You're able to understand how particular projects fit into METR's overall mission.
- You're a good written communicator. Bonus points if you can write a great paper.
- You work fast and are highly reliable.
Benefits
- $21k referral bonus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, data analysis, experiment design, statistical analysis, programming, RCTs, agent capabilities, transcript analysis, human-subjects experiments, clean PR
Soft skills
written communication, problem-solving, framework development, reliability, adaptability, critical thinking, insight generation, project leadership, interpersonal skills, time management