Releases: tinygrad/tinygrad

tinygrad 0.12.0

12 Jan 17:04
6b0a9f5

New Year's and Post-CES Release
Over 1800 commits since 0.11.0.
At 19075 lines.

Release Highlights

  • rangeify [PRs]
  • lots of VIZ=1 changes
    • AMD SQTT & PMC in viz
    • AMD wave visualization
    • block based disassembly viewer
  • NIR / NAK support
  • MI300 / MI350 support in AM
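Most of these tools are toggled with environment variables. As a hedged usage sketch (the script path is illustrative), the visualizer attaches to any tinygrad run:

```shell
# launch a tinygrad script with the VIZ=1 rewrite visualizer attached;
# a local web UI shows kernels and graph rewrites (plus SQTT/PMC data on AMD)
VIZ=1 python3 examples/beautiful_mnist.py
```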

See the full changelog: v0.11.0...v0.12.0

Join the Discord!

tinygrad 0.11.0

19 Aug 21:11
bcc7623

ONNX support, NV runtime without a kernel driver, and more!
Over 1000 commits since 0.10.3.
At 16671 lines.

Release Highlights

  • ONNX support merged into master, no longer need to import from extra to use ONNX [#11675] [PRs]
  • lots of runtime changes
    • MI350 support in AMD runtime [#11148]
    • Blackwell support in NV [#10487]
    • userspace driver for NV [#10521]
  • Multi host support over IB [#9746]
  • Muon optimizer [#11414]
  • Lots of changes in VIZ=1 [PRs]
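Muon-style optimizers orthogonalize the momentum matrix before applying it as an update. A minimal pure-Python sketch of the classic cubic Newton–Schulz iteration at the core of that idea (illustrative only; the real Muon uses tuned quintic coefficients and runs on-device as tensor ops):

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def orthogonalize(x, steps=15):
    # scale by the Frobenius norm so every singular value is below 1,
    # which guarantees the cubic iteration converges
    norm = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        a = matmul(matmul(x, transpose(x)), x)   # (X X^T) X
        x = [[1.5 * x[i][j] - 0.5 * a[i][j]      # X <- 1.5 X - 0.5 (X X^T) X
              for j in range(len(x[0]))] for i in range(len(x))]
    return x  # approximately the nearest orthogonal matrix
```

Running a near-identity matrix through `orthogonalize` drives X X^T to the identity, which is the spectral normalization the optimizer exploits.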

See the full changelog: v0.10.3...v0.11.0

Join the Discord!

tinygrad 0.10.3

14 May 22:45
9b14e8c

Lots of runtime changes.
Over 800 commits since 0.10.2.
At 12990 lines.

Release Highlights

  • support for an RDNA3/RDNA4 GPU attached over USB3 to an ASM2464PD controller [#8766] [PRs]
  • more AMD/AM runtime improvements
  • AMD_LLVM=1 to use LLVM directly, no longer requiring comgr [#9543] [PRs]
  • torch frontend support [#9191] [PRs]
  • CLOUD was renamed to REMOTE with perf improvements [#10166] [#9876] [#10235] [PRs]

See the full changelog: v0.10.2...v0.10.3

Join the Discord!

tinygrad 0.10.2

21 Feb 03:09

Minor fixes.
At 11263 lines.

Release Highlights

  • CLANG was renamed to CPU
  • VIZ should work on release
  • Refactors of rewriter
  • KERNEL UOp
  • Switch WebGPU to Dawn backend, matching Chrome [#8646]

See the full changelog: v0.10.1...v0.10.2

Join the Discord!

tinygrad 0.10.1

05 Feb 03:26

LazyBuffers are gone!
At 10941 lines.

Release Highlights

  • No LazyBuffer, just immutable UOp + Tensor
  • New multi and gradient using graph_rewrite
  • Many scheduler upgrades, try VIZ=1
  • AM driver for a fully AMD free experience!
  • llvmlite no longer a dependency
  • DSP simulator

See the full changelog: v0.10.0...v0.10.1

Join the Discord!

tinygrad 0.10.0

19 Nov 00:48

A significant under-the-hood update.
Over 1200 commits since 0.9.2.
At 9937 lines.

Release Highlights

  • VIZ=1 to show how rewrites are happening, try it
  • 0 python dependencies!
    • Switch from numpy random to threefry, removing numpy [#6116]
    • Switch from pyobjc to ctypes for metal, removing pyobjc [#6545]
  • 3 new backends
    • QCOM=1 HCQ backend for runtime speed on Adreno 630 [#5213]
    • CLOUD=1 for remote tinygrad [#6964]
    • DSP=1 backend on Qualcomm devices (alpha) [#6112]
  • More Tensor Cores
    • Apple AMX support [#5693]
    • Intel XMX tensor core support [#5622]
  • Core refactors
    • Removal of symbolic; it's just UOp rewrite now
    • Many refactors with EXPAND, VECTORIZE, and INDEX
    • Progress toward the replacement of LazyBuffer with UOp
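The threefry switch replaces numpy's RNG with the threefry2x32 counter-based generator from the Random123 family (Salmon et al.). A pure-Python sketch following the published algorithm, for intuition only (tinygrad implements it as on-device UOps, and the details below come from the Random123 reference, not tinygrad's source):

```python
# threefry2x32: a keyed, counter-based PRNG. Same (key, counter) always
# yields the same pair of 32-bit words, so random numbers can be generated
# in parallel with no shared state.
MASK = 0xFFFFFFFF
ROTATIONS = [13, 15, 26, 6, 17, 29, 16, 24]  # reference rotation schedule

def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def threefry2x32(key, counter, rounds=20):
    k0, k1 = key
    ks = [k0, k1, k0 ^ k1 ^ 0x1BD11BDA]      # key schedule + parity constant
    x0 = (counter[0] + k0) & MASK             # initial key injection
    x1 = (counter[1] + k1) & MASK
    for i in range(rounds):
        x0 = (x0 + x1) & MASK                 # mix
        x1 = rotl32(x1, ROTATIONS[i % 8]) ^ x0
        if i % 4 == 3:                        # key injection every 4 rounds
            s = i // 4 + 1
            x0 = (x0 + ks[s % 3]) & MASK
            x1 = (x1 + ks[(s + 1) % 3] + s) & MASK
    return x0, x1
```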

See the full changelog: v0.9.2...v0.10.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!

tinygrad 0.9.2

13 Aug 23:19
518c022

Small changes.
Over 700 commits since 0.9.1.

Release Highlights

  • experimental Monte Carlo Tree Search when BEAM>=100 [#5598]
  • TRANSCENDENTAL>=2 (on by default for CLANG and LLVM) provides sin, log2, and exp2 approximations. [#5187]
  • when running with DEBUG>=2 you now see the tensor ops that are part of a kernel [#5271]
  • PROFILE=1 for a profiler when using HCQ backends (AMD, NV)
  • Refactor Linearizer to Lowerer [#4957]
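Transcendental approximations like these generally follow the same pattern: range reduction plus a polynomial. An illustrative pure-Python exp2 along those lines (the structure and term count are generic, not tinygrad's actual kernels):

```python
import math

def exp2_approx(x, terms=12):
    n = math.floor(x)            # 2^x = 2^n * 2^f, with f in [0, 1)
    f = x - n
    t = f * math.log(2.0)        # 2^f = e^t, t in [0, ln 2)
    acc = 1.0                    # Horner evaluation of the Taylor series of e^t
    for k in range(terms, 0, -1):
        acc = 1.0 + acc * t / k
    return math.ldexp(acc, n)    # exact scaling by 2^n
```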

See the full changelog: v0.9.1...v0.9.2

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!

tinygrad 0.9.1

29 Jun 03:16
7bcb74a

Now sitting at 7844 lines, fewer than the last release.
Looking to tag releases more often.

Over 320 commits since 0.9.0.

Release Highlights

Known Issues

  • Using tinygrad in a conda env on macOS is known to cause problems with the METAL backend. See #2226.

See the full changelog: v0.9.0...v0.9.1

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!

tinygrad 0.9.0

28 May 18:48
6fcf220

Close to the new line limit of 8000 lines, sitting at 7958 lines.
tinygrad is much more usable now.

Just over 1200 commits since 0.8.0.

Release Highlights

  • New documentation: https://docs.tinygrad.org
  • gpuctypes has been brought in-tree and is no longer an external dependency. [#3253]
  • AMD=1 and NV=1 experimental backends that don't require userspace runtime components like ROCm or CUDA.
    • These backends should reduce Python overhead, especially in multi-GPU use cases.
  • PTX=1 for rendering directly to PTX instead of CUDA. [#3139] [#3623] [#3775]
  • Nvidia tensor core support. [#3544]
  • THREEFRY=1 for numpy-less random number generation using threefry2x32. [#2601] [#3785]
  • More stabilized multi-tensor API.
  • Core tinygrad has been refactored into 4 pieces, read more about it here.
  • Linearizer and codegen have support for generating kernels with multiple outputs.
  • Lots of progress towards greater kernel fusion in the scheduler.
    • Fusing of ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.
    • New LoadOps.ASSIGN allows fusing optimizer updates with grad.
    • Schedule kernels in BFS order. This improves resnet and llama speed.
    • W.I.P. for fusing multiple reduces: [#4259] [#4208]
  • MLPerf ResNet and BERT with a W.I.P. UNet3D
  • Llama 3 support with a new llama3.py that provides an OpenAI compatible API. [#4576]
  • NF4 quantization support in Llama examples. [#4540]
  • label_smoothing has been added to sparse_categorical_crossentropy. [#3568]
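Label smoothing mixes the one-hot target with a uniform distribution: target = (1 - s) * one_hot + s / C over C classes. A minimal pure-Python sketch of the smoothed loss for one sample (illustrative; tinygrad computes this on-device over batches):

```python
import math

def smoothed_ce(logits, label, smoothing=0.1):
    c = len(logits)
    # log-softmax with the usual max-shift for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    # smoothed target: (1 - s) on the true class plus s/C spread uniformly
    target = [(1 - smoothing) * (1.0 if i == label else 0.0) + smoothing / c
              for i in range(c)]
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

With smoothing=0 this reduces to the plain sparse cross-entropy; a positive s penalizes overconfident predictions.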

Known Issues

  • Using tinygrad in a conda env on macOS is known to cause problems with the METAL backend. See #2226.

See the full changelog: v0.8.0...v0.9.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!

tinygrad 0.8.0

09 Jan 18:16
2c6f2e8

Close to the new limit of 5000 lines at 4981.

Release Highlights

  • Real dtype support within kernels!
  • New .schedule() API to separate concerns of scheduling and running
  • New lazy.py implementation doesn't reorder at build time; use GRAPH=1 to debug issues
  • 95 TFLOP FP16->FP32 matmuls on 7900XTX
  • GPT2 runs (jitted) in 2 ms on NVIDIA 3090
  • Powerful and fast kernel beam search with BEAM=2
  • GPU/CUDA/HIP backends switched to gpuctypes
  • New (alpha) multigpu sharding API with .shard
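Beam search is enabled per-run through the environment. A hedged usage sketch (script path illustrative):

```shell
# search over kernel optimization actions with beam width 2;
# found kernels are cached, so subsequent runs skip the search cost
BEAM=2 python3 examples/gpt2.py
```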

See the full changelog: v0.7.0...v0.8.0

Join the Discord!