Autonomous code optimization, backed by real benchmarks
AutoPerf uses LLM pipelines (including RAG) to generate, compile, and benchmark code variants on Google Cloud. CPU support ships first; NVIDIA GPUs and CUDA are next.
Built for HPC simulation, quantitative trading, ML inference, and kernel engineering teams that need measurable speedups.
Reproducible cloud benchmarks
Pinned Google Cloud instances ensure consistent CPU characteristics for every run.
Python + C/C++ support
Start by optimizing numeric kernels, trading models, and scientific loops. CUDA is next.
LLM-guided exploration
RAG pipelines analyze your codebase, propose candidates, and focus on high-impact tweaks.
Kernel-level insights
Track vectorization, memory access, and scheduling suggestions with benchmark deltas.
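One common candidate of this kind is swapping a scalar Python loop for a vectorized NumPy equivalent. The sketch below is our own illustration (not AutoPerf output) of the before/after pair and timing delta such a report surfaces:

```python
import time
import numpy as np

def scalar_sum_of_squares(xs):
    # Baseline: pure-Python loop, one multiply and add per element.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def vectorized_sum_of_squares(xs):
    # Candidate: NumPy pushes the loop into optimized C with SIMD.
    return float(np.dot(xs, xs))

xs = np.random.default_rng(0).random(1_000_000)

for fn in (scalar_sum_of_squares, vectorized_sum_of_squares):
    start = time.perf_counter()
    fn(xs)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```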
Safe execution
Container isolation, resource limits, and audit trails keep private code under control.
Auto GitHub PRs
Review diffs, metrics, and commentary directly inside an automatically opened pull request.
How it works
Profile
Instrument your workload to surface hot kernels, loops, and call stacks.
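As a minimal sketch of this step (using Python's built-in cProfile, not AutoPerf's actual instrumentation), this is the kind of hot-function ranking that profiling produces:

```python
import cProfile
import pstats

def hot_loop(n):
    # Stand-in for a workload kernel; real targets come from your repo.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
hot_loop(2_000_000)
profiler.disable()

# Rank functions by cumulative time to find optimization targets.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```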
Generate
Feed LLM + RAG pipelines with repo context to draft high-impact code variants.
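The retrieval half of such a pipeline can be sketched as cosine-similarity search over code-chunk embeddings; `embed` and `draft_variant` below are hypothetical stand-ins for whichever embedding model and LLM are wired in:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real embedding model: a deterministic
    # unit vector per string, good enough to show the retrieval shape.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    vec = rng.random(64)
    return vec / np.linalg.norm(vec)

def top_k_context(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank repo chunks by cosine similarity to the hot-spot description.
    q = embed(query)
    return sorted(chunks, key=lambda c: float(embed(c) @ q), reverse=True)[:k]

chunks = ["def kernel(a, b): ...", "def io_helper(path): ...", "def matmul_tile(t): ..."]
context = top_k_context("optimize inner matmul loop", chunks)
prompt = "Rewrite for speed, preserving semantics:\n" + "\n".join(context)
# draft_variant(prompt)  # hypothetical LLM call returning candidate code
```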
Compile
Build candidates in sandboxed containers with pinned toolchains and dependencies.
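A minimal sketch of what sandboxing can look like in practice, assuming Docker is available and the candidate lives in ./candidate; the actual containers, images, and limits may differ:

```python
import subprocess

# Build the candidate inside a container with a pinned toolchain image,
# then run it with no network and hard CPU/memory caps.
subprocess.run(
    ["docker", "build", "-t", "candidate:latest", "./candidate"],
    check=True,
)
subprocess.run(
    [
        "docker", "run", "--rm",
        "--network=none",   # no exfiltration path for private code
        "--cpus=4",         # deterministic CPU budget
        "--memory=8g",      # hard memory cap
        "candidate:latest",
    ],
    check=True,
)
```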
Benchmark
Run reproducible trials on dedicated GCP runners to gather real performance deltas.
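The trial loop itself is straightforward: warm up, time repeated runs, compare medians. A sketch, with `baseline` and `candidate` as hypothetical stand-ins for the two builds:

```python
import statistics
import time

def bench(fn, warmup=3, trials=10):
    # Warm caches before timing anything.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical workloads standing in for baseline and candidate builds.
baseline = lambda: sum(i * i for i in range(500_000))
candidate = lambda: sum(i * i for i in range(500_000))

b, c = bench(baseline), bench(candidate)
print(f"baseline {b:.4f}s, candidate {c:.4f}s, delta {100 * (b - c) / b:+.1f}%")
```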
Promote
Ship the winning change via an auto-generated GitHub PR complete with diffs and metrics.
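Opening that PR goes through GitHub's standard REST endpoint; a minimal sketch using the `requests` library, assuming the winning branch is already pushed (org, repo, branch, and token below are placeholders):

```python
import requests

# POST /repos/{owner}/{repo}/pulls -- standard GitHub REST API.
resp = requests.post(
    "https://api.github.com/repos/your-org/your-repo/pulls",
    headers={
        "Authorization": "Bearer <token>",
        "Accept": "application/vnd.github+json",
    },
    json={
        "title": "perf: vectorize hot kernel",   # illustrative title
        "head": "autoperf/candidate-1",          # branch with the winning variant
        "base": "main",
        "body": "Benchmark deltas, diffs, and notes go here.",
    },
)
resp.raise_for_status()
print(resp.json()["html_url"])
```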
Roadmap
Private alpha
CPU runners ship first. Join the waitlist to reserve a slot and share your workloads.
NVIDIA GPU support
A10/A100-class GPUs follow, enabling CUDA kernel tuning for ML and HPC pipelines.
Extended workflows
Broader language coverage, hybrid CPU/GPU strategies, and deeper profiling integrations.
Built for teams shipping performance-critical software
Whether you run HPC simulations, low-latency trading systems, algorithmic engines, or ML inference, AutoPerf automates the optimization loop while you stay in control of the merge.
- HPC & scientific simulation teams chasing runtime gains.
- Fintech and quant orgs optimizing latency-critical code paths.
- ML/AI engineers tuning serving kernels and feature pipelines.
FAQ
Is CPU support live?
CPU runners will ship first in the private alpha. Join the waitlist to be notified as soon as slots open.
Which languages are supported?
We start with Python and C/C++. CUDA and broader GPU workflows are on the near-term roadmap.
How do I review changes safely?
AutoPerf opens a GitHub pull request with diffs, benchmark data, and suggested notes so you stay in full control.
Where do benchmarks run?
Runs execute on pinned Google Cloud machine types to keep results reproducible across sessions.
Reserve your spot
Join the waitlist and be the first to try AutoPerf when the private alpha opens.