Tech Stack
Keras, Open Source, PySpark, Python, PyTorch, React, Scala, SDLC, Spark, SQL, Tableau, TensorFlow
About the role
- Design, develop, and deploy AI solutions to automate expense and invoice management processes and solve complex optimization problems
- Build and train machine learning models, develop algorithms, and integrate AI systems into existing software and infrastructure
- Translate product design requirements into pipelines of heuristic and stochastic algorithms
- Perform exploratory data analysis for product feasibility studies and ground truth testing
- Write and run SQL queries and Python scripts to manipulate, analyze, and visualize data
- Implement explainable AI solutions and rationalize model inferences
- Follow SDLC processes, including agile practices and meetings and peer code reviews
- Work with machine learning engineer/architect to deploy data products into production
- Follow and understand legal data use restrictions
- Contribute to algorithm library development and design for ML, NLP and XAI
- Deliver product pipelines for deployment to production
- Build applications that integrate third-party and self-hosted foundation models
- Fine-tune open-source foundation models (LLMs, VLMs) with proprietary data
- Develop autonomous AI inference and tool use orchestration using ReAct AI agents
- Provide root cause analysis for machine learning model inference
- Complete data analysis or processing tasks as directed, and document the end-to-end design and development of data products
- Perform data annotation, labeling and other related data generation activities
- Provide thought leadership, present data product updates and trainings, mentor and lead others as project lead
Requirements
- BS in Statistics, Mathematics, Computer Science or another quantitative field (Graduate degree preferred)
- At least 6 years of experience manipulating data sets and building GLM/regression models, ensemble decision trees, and neural networks
- Demonstrated hands-on experience with foundation models, GenAI and agents (1-2 years ideal)
- 6+ years of experience developing data science products
- Strong experience using and optimizing common Python machine learning and deep learning libraries such as scikit-learn, PyTorch, TensorFlow, Keras, MXNet, and Spark MLlib
- Experience using statistical computer languages (Python, R, Scala, SQL, etc.) to manipulate data and draw insights from large data sets
- Hands-on generative AI development experience using foundation models (LLMs, VLMs)
- Experience fine-tuning open-source foundation models with proprietary data
- Experience leveraging AI metrics for monitoring and value tracking
- Knowledge of AI Agent frameworks with recent hands-on experience building an AI agent able to autonomously use data stores, tools and other AI models to solve inquiries
- Deep knowledge of data science concepts and related product development lifecycle
- Working knowledge of machine learning tuning and optimization procedures
- Experience working with and creating data architectures
- Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.)
- Excellent written and verbal communication skills for coordinating across teams
- A drive to learn and master new technologies and techniques
- Preferred: Experience with big data analytical frameworks such as Spark/PySpark
- Preferred: Experience analyzing data from third-party providers such as Google Knowledge Graph and Wikidata
- Preferred: Experience visualizing and presenting data for stakeholders using tools such as Looker, Power BI, or Tableau