
Hanif Leoputera

San Francisco, CA

Summary

Experienced research engineer deploying and accelerating multimodal LLMs end-to-end. I stand up HPC (Slurm/Kubernetes) clusters, push GPU MFU with FSDP, quantization, and custom CUDA kernels, and ship ultra-low-latency model serving (Triton Inference Server/Ray Serve, KV-cache, paged attention, Triton kernels). Patented inventor who owns SLOs, throughput, and $/token.

Overview

7 years of professional experience

Work History

Research Engineer

Stealth
05.2024 - Current

Seed-stage startup backed by Khosla Ventures, Greylock, and Asimov Ventures. I build, scale, and ship multimodal LLMs end-to-end—from data to training to ultra-low-latency serving.

  • Built and operate an in-house HPC cluster (Slurm on B200s, InfiniBand, high-performance FS), cutting experimentation & hyperparameter sweeps from weeks to days and sustaining 50%+ MFU on real workloads.
  • Productionized the training stack with quantization (MXFP8), fused/tuned kernels, custom flex-attention, and robust checkpointing—improving throughput by 500%.
  • Pioneered low-data multimodal generation for a confidential “X” modality, adapting CV/audio techniques (diffusion/flow-matching, VQ/codec, multimodal LLMs, self-supervised pretraining) to reach zero-shot performance with only 750 hours of labeled “X”.
  • Prototyped and shipped a full-duplex X→speech assistant (streaming ASR/LLM/TTS, semantic VAD) with barge-in and overlap handling, enabling real-time, interruptible conversations with natural turn-taking at sub-300 ms TTFT.
  • Shipped an ultra-fast inference stack (Triton/Ray Serve/WebSockets, KV-cache, speculative decoding, paged attention, autoscaling on Kubernetes) with p99 TTFT ≤ 150 ms and 560 tok/s/GPU throughput for our dual transcription and voice-synthesis multimodal LLM.
  • Built reliable data pipelines ingesting hundreds of hours of recordings per day with validation and weak-supervision labeling (multimodal validation), cutting bad-data incidents by 30%.
  • Wrote training and deployment infra as code (reproducible configs, CI/CD, observability with Grafana), shrinking time-to-first-result and onboarding time to just a few minutes.
  • Patented inventor in multimodal generation, hardware design, and the "X" modality (5 patents filed).
  • Drove delivery: owned roadmaps and SLOs (99% availability), led post-mortems, mentored 4 engineers, and maintained a healthy on-call rotation.

Tech: PyTorch, CUDA, Triton, FlashAttention, Slurm, Ray, Kubernetes, Terraform/Ansible, NVMe/IB, S3/GCS, Triton Inference Server/Ray Serve, gRPC/WebSockets, Prometheus/Grafana

Senior Machine Learning Engineer

Affirm
06.2021 - 05.2024
  • Invented and shipped a patented Pricing Optimization & Multi-Armed Bandits platform (US20240104645A1) powering real-time, guardrailed pricing at checkout; increased contribution margin by 100 bps, raised the A/B experiment hit rate, and cut time-to-learn 10x (more than 1 month → less than 1 week to reach 95% confidence).
  • Built a Return-on-Assets (ROA) underwriting system with real-time risk services and feature store; improved approvals at constant loss by 30%.
  • Created a self-serve pricing experimentation platform (online/offline eval + contextual bandit) enabling per-merchant/segment personalization; shrank rollout cycles from weeks → hours.
  • Led end-to-end platform delivery: streaming + batch pipelines, CI/CD, observability, and compliance/auditability; maintained 99.9–99.95% availability with clear post-mortems and runbooks.


Tech: Kubernetes, Docker, PyTorch, XGBoost, PySpark, Flink, Airflow, gRPC, React/Redux
Skills: ETL · Bandits/RL · Experimentation Platforms · Pricing Science · Distributed Systems

Machine Learning Intern

Electronic Arts
06.2020 - 09.2020
  • Enabled natural-language → charts in a Lex-powered chatbot by orchestrating Amazon Lex + AWS Lambda + Snowflake; users can ask, “show sales by region” and get auto-generated visualizations—no manual SQL.
  • Trained domain-tuned GPT-2 and T5 paraphrasers to boost phrasing coverage and intent robustness in Lex; T5 delivered +10.5 BLEU and ~10% higher variety (ROUGE-L proxy) over GPT-2, improving suggestion quality and recall.
  • Delivered an end-to-end demand forecasting service with Amazon Forecast, orchestrated by AWS Step Functions, surfaced via a Node.js + React UI (instrumented with Amplitude); improved accuracy from 45% → 78% (MAPE-based KPI), with automated retraining and deployments.
  • Built a promotion “what-if” simulator by estimating price elasticity with EconML’s ORF; enabled scenario planning, guardrailed discounting, and uplift/margin trade-off analysis.
  • Developed a contract metadata extraction service using custom spaCy NER (entities: parties, effective dates, renewal terms, amounts, obligations); shipped as an API feeding a searchable datastore for compliance and workflow triggers.

Tech: Amazon Lex, AWS Lambda, Snowflake, Amazon Forecast, AWS Step Functions, Node.js, React, Amplitude, GPT-2, T5, spaCy, EconML (ORF), gRPC
Skills: Conversational AI · Forecasting/MLOps · Causal/Elasticity Modeling · NER/IE · Data Apps & ETL

Data Intern

Traveloka
06.2018 - 09.2018
  • Automated data-quality gates in the ingestion pipeline (PySpark + Airflow) in 6 weeks, cutting bad records 42% and shrinking incident MTTR from ~3h → 35m; brought anomaly time-to-detect down from ~24h.
  • Tuned ads models via CTR analysis (Matplotlib/Seaborn/Tableau), improving targeting by +11% CTR and +3.5% revenue on a core channel; reduced hyperparameter search from 2 weeks → 3 days with scripted sweeps.
  • Implemented a PySpark schema-tracking wrapper for migrations in 4 weeks, eliminating ~90% of env-mismatch failures and cutting rollout time 5 days → 1 day.
  • Delivered real-time AWS log analytics (S3 → SQS → PySpark) in 8 weeks, sustaining ~20k events/s (peaks ~50k/s); dashboard added schema-change alerts (p50 90s) plus velocity/frequency/query-mix monitoring.

Education

Bachelor of Science - Mathematics of Computation

University of California Los Angeles
Los Angeles, CA
06.2021

Skills

  • Orchestration: Kubernetes; Terraform/Ansible
  • Data/MLOps: PySpark, Airflow, Flink; CI/CD
  • Storage/I/O: NVMe RAID, parallel FS, S3/GCS
  • Multimodal LLMs (train/eval/serve)
  • Audio synthesis & generation (TTS/vocoders)
  • GPU performance engineering
  • High-throughput inference: Triton, Ray Serve, TensorRT-LLM
  • CUDA/Triton kernels; FlashAttention

Accomplishments

Patent: US 20240104645A1 (Pricing Optimization & Multi-Armed Bandits); additional patents pending.


Open-source contributions: NVIDIA NeMo, TorchTitan, TorchAO, Liger-Kernel, and Hugging Face.
