Releases: tinygrad/tinygrad
tinygrad 0.12.0
New Year's and Post-CES Release
Over 1800 commits since 0.11.0.
At 19075 lines.
Release Highlights
- rangeify [PRs]
- lots of `VIZ=1` changes
- AMD SQTT & PMC in viz
- AMD wave visualization
- block based disassembly viewer
- NIR / NAK support
- MI300 / MI350 support in AM
See the full changelog: v0.11.0...v0.12.0
Join the Discord!
tinygrad 0.11.0
ONNX support, NV runtime without kernel, and more!
Over 1000 commits since 0.10.3.
At 16671 lines.
Release Highlights
- ONNX support merged into master, no longer need to import from extra to use ONNX [#11675] [PRs]
- lots of runtime changes
- Multi-host support over IB [#9746]
- Muon optimizer [#11414]
- Lots of changes in `VIZ=1` [PRs]
See the full changelog: v0.10.3...v0.11.0
Join the Discord!
tinygrad 0.10.3
Lots of runtime changes.
Over 800 commits since 0.10.2.
At 12990 lines.
Release Highlights
- support for an RDNA3/RDNA4 GPU attached over USB3 to an ASM2464PD controller [#8766] [PRs]
- more `AMD`/`AM` runtime improvements
- `AMD_LLVM=1` to no longer require `comgr` and use LLVM directly [#9543] [PRs]
- torch frontend support [#9191] [PRs]
- `CLOUD` was renamed to `REMOTE` with perf improvements [#10166] [#9876] [#10235] [PRs]
See the full changelog: v0.10.2...v0.10.3
Join the Discord!
tinygrad 0.10.2
Minor fixes.
At 11263 lines.
Release Highlights
- `CLANG` was renamed to `CPU`
- `VIZ` should work on release
- Refactors of rewriter
- KERNEL UOp
- Switch WebGPU to Dawn backend, matching Chrome [#8646]
See the full changelog: v0.10.1...v0.10.2
Join the Discord!
tinygrad 0.10.1
LazyBuffers are gone!
At 10941 lines.
Release Highlights
- No LazyBuffer, just immutable UOp + Tensor
- New multi and gradient using graph_rewrite
- Many scheduler upgrades, try `VIZ=1`
- AM driver for a fully AMD-free experience!
- llvmlite no longer a dependency
- DSP simulator
See the full changelog: v0.10.0...v0.10.1
Join the Discord!
tinygrad 0.10.0
A significant under-the-hood update.
Over 1200 commits since 0.9.2.
At 9937 lines.
Release Highlights
- `VIZ=1` to show how rewrites are happening, try it
- 0 python dependencies!
- 3 new backends
- More Tensor Cores
- Core refactors
- Removal of symbolic, it's just UOp rewrite now
- Many refactors with EXPAND, VECTORIZE, and INDEX
- Progress toward the replacement of `LazyBuffer` with `UOp`
See the full changelog: v0.9.2...v0.10.0
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.2
Small changes.
Over 700 commits since 0.9.1.
Release Highlights
- experimental Monte Carlo Tree Search when `BEAM>=100` [#5598]
- `TRANSCENDENTAL>=2`, or by default on `CLANG` and `LLVM`, to provide `sin`, `log2`, and `exp2` approximations. [#5187]
- when running with `DEBUG>=2` you now see the tensor ops that are part of a kernel [#5271]
- `PROFILE=1` for a profiler when using HCQ backends (`AMD`, `NV`)
- Refactor `Linearizer` to `Lowerer` [#4957]
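These flags are plain environment variables read by tinygrad at startup; a minimal invocation sketch (the script name `train.py` is a hypothetical stand-in for your own entry point) might look like:

```shell
# Hypothetical entry point; substitute your own tinygrad script.
# BEAM>=100 enables the experimental MCTS kernel search,
# DEBUG=2 prints the tensor ops that make up each kernel.
BEAM=100 DEBUG=2 python train.py

# On the HCQ backends (AMD, NV), enable the profiler instead:
PROFILE=1 python train.py
```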
See the full changelog: v0.9.1...v0.9.2
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.1
Now sitting at 7844 lines, fewer than the last release.
Looking to tag releases more often.
Over 320 commits since 0.9.0.
Release Highlights
- Removal of the HSA backend, defaulting to AMD. [#4885]
- tinychat, a pretty simple llm web ui. [#4869]
- SDXL example. [#5206]
- A small tqdm replacement. [#4846]
- NV/AMD profiler using perfetto. [#4718]
Known Issues
- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.
See the full changelog: v0.9.0...v0.9.1
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.0
Close to the new line limit of 8000 lines, sitting at 7958 lines.
tinygrad is much more usable now.
Just over 1200 commits since 0.8.0.
Release Highlights
- New documentation: https://docs.tinygrad.org
- `gpuctypes` has been brought in-tree and is no longer an external dependency. [#3253]
- `AMD=1` and `NV=1` experimental backends that don't require any userspace runtime components like ROCm or CUDA.
  - These backends should reduce the amount of python time, especially in multi-GPU use cases.
- `PTX=1` for rendering directly to PTX instead of CUDA. [#3139] [#3623] [#3775]
- Nvidia tensor core support. [#3544]
- `THREEFRY=1` for numpy-less random number generation using threefry2x32. [#2601] [#3785]
- More stable multi-tensor API.
- Core tinygrad has been refactored into 4 pieces, read more about it here.
- Linearizer and codegen have support for generating kernels with multiple outputs.
- Lots of progress towards greater kernel fusion in the scheduler.
- Fusing of ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.
- New LoadOps.ASSIGN allows fusing optimizer updates with grad.
- Schedule kernels in BFS order. This improves resnet and llama speed.
- W.I.P. for fusing multiple reduces: [#4259] [#4208]
- MLPerf ResNet and BERT with a W.I.P. UNet3D
- Llama 3 support with a new `llama3.py` that provides an OpenAI-compatible API. [#4576]
- NF4 quantization support in Llama examples. [#4540]
- `label_smoothing` has been added to `sparse_categorical_crossentropy`. [#3568]
Known Issues
- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.
See the full changelog: v0.8.0...v0.9.0
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.8.0
Close to the new limit of 5000 lines at 4981.
Release Highlights
- Real dtype support within kernels!
- New `.schedule()` API to separate the concerns of scheduling and running
- New lazy.py implementation that doesn't reorder at build time. `GRAPH=1` is usable to debug issues
- 95 TFLOPS FP16->FP32 matmuls on 7900XTX
- GPT2 runs (jitted) in 2 ms on an NVIDIA 3090
- Powerful and fast kernel beam search with `BEAM=2`
- GPU/CUDA/HIP backends switched to `gpuctypes`
- New (alpha) multi-GPU sharding API with `.shard`
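Like the other tunables, `BEAM` and `GRAPH` are environment variables; a hedged sketch of turning both on (the script path is a hypothetical example):

```shell
# Hypothetical script path. BEAM=2 runs the kernel beam search;
# GRAPH=1 emits the computation graph for debugging.
BEAM=2 GRAPH=1 python examples/my_model.py
```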