Releases: tinygrad/tinygrad
tinygrad 0.12.0
New Year's and Post-CES Release
Over 1800 commits since 0.11.0.
At 19075 lines.
Release Highlights
- rangeify [PRs]
- lots of `VIZ=1` changes
- AMD SQTT & PMC in viz
- AMD wave visualization
- block based disassembly viewer
- NIR / NAK support
- MI300 / MI350 support in AM
See the full changelog: v0.11.0...v0.12.0
Join the Discord!
tinygrad 0.11.0
ONNX support, NV runtime without kernel, and more!
Over 1000 commits since 0.10.3.
At 16671 lines.
Release Highlights
- ONNX support merged into master, no longer need to import from extra to use ONNX [#11675] [PRs]
- lots of runtime changes
- Multi-host support over IB [#9746]
- Muon optimizer [#11414]
- Lots of changes in `VIZ=1` [PRs]
See the full changelog: v0.10.3...v0.11.0
Join the Discord!
tinygrad 0.10.3
Lots of runtime changes.
Over 800 commits since 0.10.2.
At 12990 lines.
Release Highlights
- support for an RDNA3/RDNA4 GPU attached over USB3 to an ASM2464PD controller [#8766] [PRs]
- more `AMD`/`AM` runtime improvements
- `AMD_LLVM=1` to no longer require `comgr` and use LLVM directly [#9543] [PRs]
- torch frontend support [#9191] [PRs]
- `CLOUD` was renamed to `REMOTE` with perf improvements [#10166] [#9876] [#10235] [PRs]
See the full changelog: v0.10.2...v0.10.3
Join the Discord!
tinygrad 0.10.2
Minor fixes.
At 11263 lines.
Release Highlights
- `CLANG` was renamed to `CPU`
- `VIZ` should work on release
- Refactors of rewriter
- KERNEL UOp
- Switch WebGPU to Dawn backend, matching Chrome [#8646]
See the full changelog: v0.10.1...v0.10.2
Join the Discord!
tinygrad 0.10.1
LazyBuffers are gone!
At 10941 lines.
Release Highlights
- No LazyBuffer, just immutable UOp + Tensor
- New multi and gradient using graph_rewrite
- Many scheduler upgrades, try `VIZ=1`
- AM driver for a fully AMD-free experience!
- llvmlite no longer a dependency
- DSP simulator
See the full changelog: v0.10.0...v0.10.1
Join the Discord!
tinygrad 0.10.0
A significant under-the-hood update.
Over 1200 commits since 0.9.2.
At 9937 lines.
Release Highlights
- `VIZ=1` to show how rewrites are happening, try it
- 0 python dependencies!
- 3 new backends
- More Tensor Cores
- Core refactors
- Removal of symbolic, it's just UOp rewrite now
- Many refactors with EXPAND, VECTORIZE, and INDEX
- Progress toward the replacement of `LazyBuffer` with `UOp`
See the full changelog: v0.9.2...v0.10.0
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.2
Small changes.
Over 700 commits since 0.9.1.
Release Highlights
- experimental Monte Carlo Tree Search when `BEAM>=100` [#5598]
- `TRANSCENDENTAL>=2`, or by default on `CLANG` and `LLVM`, to provide `sin`, `log2`, and `exp2` approximations. [#5187]
- when running with `DEBUG>=2` you now see the tensor ops that are part of a kernel [#5271]
- `PROFILE=1` for a profiler when using HCQ backends (`AMD`, `NV`)
- Refactor `Linearizer` to `Lowerer` [#4957]
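These flags are plain environment variables read by tinygrad at startup; a minimal invocation sketch (the script name `train.py` is a hypothetical stand-in for your own entry point) might look like:

```shell
# Hypothetical entry point; substitute your own tinygrad script.
# BEAM>=100 enables the experimental MCTS kernel search,
# DEBUG=2 prints the tensor ops that make up each kernel.
BEAM=100 DEBUG=2 python train.py

# On the HCQ backends (AMD, NV), enable the profiler instead:
PROFILE=1 python train.py
```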
See the full changelog: v0.9.1...v0.9.2
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.1
Now sitting at 7844 lines, fewer than the last release.
Looking to tag releases more often.
Over 320 commits since 0.9.0.
Release Highlights
- Removal of the HSA backend, defaulting to AMD. [#4885]
- tinychat, a pretty simple llm web ui. [#4869]
- SDXL example. [#5206]
- A small tqdm replacement. [#4846]
- NV/AMD profiler using perfetto. [#4718]
Known Issues
- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.
See the full changelog: v0.9.0...v0.9.1
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.9.0
Close to the new line limit of 8000 lines, sitting at 7958 lines.
tinygrad is much more usable now.
Just over 1200 commits since 0.8.0.
Release Highlights
- New documentation: https://docs.tinygrad.org
- `gpuctypes` has been brought in-tree and is no longer an external dependency. [#3253]
- `AMD=1` and `NV=1` experimental backends that don't require any userspace runtime components like ROCm or CUDA.
  - These backends should reduce the amount of python time, especially in multi-GPU use cases.
- `PTX=1` for rendering directly to PTX instead of CUDA. [#3139] [#3623] [#3775]
- Nvidia tensor core support. [#3544]
- `THREEFRY=1` for numpy-less random number generation using threefry2x32. [#2601] [#3785]
- More stable multi-tensor API.
- Core tinygrad has been refactored into 4 pieces, read more about it here.
- Linearizer and codegen have support for generating kernels with multiple outputs.
- Lots of progress towards greater kernel fusion in the scheduler.
- Fusing of ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.
- New LoadOps.ASSIGN allows fusing optimizer updates with grad.
- Schedule kernels in BFS order. This improves resnet and llama speed.
- W.I.P. for fusing multiple reduces: [#4259] [#4208]
- MLPerf ResNet and BERT with a W.I.P. UNet3D
- Llama 3 support with a new `llama3.py` that provides an OpenAI-compatible API. [#4576]
- NF4 quantization support in Llama examples. [#4540]
- `label_smoothing` has been added to `sparse_categorical_crossentropy`. [#3568]
Known Issues
- Using tinygrad in a conda env on macOS is known to cause problems with the `METAL` backend. See #2226.
See the full changelog: v0.8.0...v0.9.0
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.8.0
Close to the new limit of 5000 lines at 4981.
Release Highlights
- Real dtype support within kernels!
- New `.schedule()` API to separate the concerns of scheduling and running
- New lazy.py implementation that doesn't reorder at build time. `GRAPH=1` is usable to debug issues
- 95 TFLOPS FP16->FP32 matmuls on 7900XTX
- GPT2 runs (jitted) in 2 ms on an NVIDIA 3090
- Powerful and fast kernel beam search with `BEAM=2`
- GPU/CUDA/HIP backends switched to `gpuctypes`
- New (alpha) multi-GPU sharding API with `.shard`
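Like the other tunables, `BEAM` and `GRAPH` are environment variables; a hedged sketch of turning both on (the script path is a hypothetical example):

```shell
# Hypothetical script path. BEAM=2 runs the kernel beam search;
# GRAPH=1 emits the computation graph for debugging.
BEAM=2 GRAPH=1 python examples/my_model.py
```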