
AI Precision Playground

FP4 / FP8 / BF16 / FP16 / TF32 / FP32 / FP64 / MPFR-style high-precision demo

An educational page for experimenting with the low-precision formats, mixed-precision accumulation, and scaling used in AI, and for comparing the results against high-precision reference values.

The scenario can be edited freely in demo_scenario_en.txt.

Function quantization demo

What to observe
Watch where each quantized format departs from the reference (thick line), and use the absolute-error graph (log axis) below to see which format has the larger error. Switch scaling to per-block to compare how the error near outliers decreases.

In this single-file version, the reference is computed with JavaScript double precision. For true MPFR table generation, use the "Legacy MPFR graph" tab.
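
The difference between per-tensor and per-block scaling can be sketched in a few lines of Python. This is a minimal illustration, not the page's own implementation: the value grid assumes an FP4 E2M1 layout with bias 1 and no Inf/NaN encodings, the names `quantize_fp4` and `quantize_scaled` are hypothetical, and ties resolve to the smaller magnitude rather than following round-to-nearest-even.

```python
# Positive magnitudes representable in FP4 E2M1 (assumed: bias 1, no Inf/NaN).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def quantize_fp4(v):
    """Round v to the nearest grid value, saturating at +/-6."""
    mag = min(abs(v), FP4_MAX)
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return nearest if v >= 0 else -nearest


def quantize_scaled(values, block_size=None):
    """Quantize with one scale per block (block_size=None means per-tensor)."""
    n = len(values)
    block_size = block_size or n
    out = []
    for start in range(0, n, block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto the largest FP4 value.
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(quantize_fp4(v / scale) * scale for v in block)
    return out


def max_abs_error(values, quantized):
    return max(abs(a - b) for a, b in zip(values, quantized))


data = [0.1, -0.2, 0.3, 0.25, 0.15, -0.05, 0.2, 48.0]  # last entry is an outlier
per_tensor = quantize_scaled(data)
per_block = quantize_scaled(data, block_size=4)

print("per-tensor error, first block:", max_abs_error(data[:4], per_tensor[:4]))
print("per-block  error, first block:", max_abs_error(data[:4], per_block[:4]))
```

With a single per-tensor scale, the outlier 48 forces every small value to zero. With per-block scaling the first block, which contains no outlier, stays within about 0.05 of the original values, while the block that contains the outlier still loses its small entries.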

Table of representative values

Softmax stabilization demo

What to observe
With large inputs around 1000 (as in the defaults), the naive version breaks because \(\exp(x_i)\) overflows: \(\exp(1000)\) exceeds even the FP64 range. The max-shift version stays stable. Lower the input format to FP8, for example, and check the |error| column of the table to see how far the probabilities drift.

Softmax algorithm

Converts the input vector \(\mathbf{x}=(x_1,\ldots,x_n)\) into a probability distribution.

\[ y_i = \frac{\exp(x_i)}{\sum_{j=1}^{n}\exp(x_j)} \]

When \(x_i\) is large, \(\exp(x_i)\) overflows, so implementations subtract the maximum \(m=\max_j x_j\).

\[ y_i = \frac{\exp(x_i-m)}{\sum_{j=1}^{n}\exp(x_j-m)} \]

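This transformation is mathematically identical, because the numerator and the denominator are both multiplied by \(\exp(-m)\), but it is far more stable numerically: every exponent satisfies \(x_i-m\le 0\), so each term is at most 1 and cannot overflow.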
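
The two formulas above can be compared directly in double precision. The following is a minimal Python sketch; the function names `softmax_naive` and `softmax_shifted` are chosen here for illustration and are not taken from the page's script.

```python
import math


def softmax_naive(xs):
    """Direct evaluation of exp(x_i) / sum_j exp(x_j)."""
    exps = [math.exp(x) for x in xs]  # overflows for x above roughly 709.78
    total = sum(exps)
    return [e / total for e in exps]


def softmax_shifted(xs):
    """Max-shift version: exp(x_i - m) / sum_j exp(x_j - m) with m = max(xs)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]


inputs = [1000.0, 999.0, 998.0]

try:
    naive = softmax_naive(inputs)
except OverflowError:
    naive = None  # Python raises instead of returning infinity

stable = softmax_shifted(inputs)
print("naive :", naive)
print("stable:", stable)
```

In Python, `math.exp(1000.0)` raises `OverflowError`. In JavaScript double precision the same expression evaluates to `Infinity`, and `Infinity / Infinity` is `NaN`, so the naive probabilities are unusable in both cases.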

Dot product & accumulation demo

What to observe
With a low-precision accumulator, the partial sums stop tracking the reference partway through, because once the running sum is large each small product is rounded away. An FP32 accumulator keeps following the reference; see the partial-sum graph. Setting the distribution to "Uniform + outliers" emphasizes the weakness of low-precision inputs.

Dot product & accumulation

The dot product is a basic operation in matrix multiplication and neural networks.

\[ s = \sum_{i=1}^{n} x_i y_i \]

On AI accelerators, the inputs \(x_i,y_i\) are often kept in low precision such as FP8 or FP16, while the accumulated sum \(s\) is held in a wider format such as FP32.

\[ s_k = \operatorname{round}_{\mathrm{acc}}\left(s_{k-1} + \operatorname{round}_{\mathrm{mul}}(x_k y_k)\right) \]

The goal is to reduce memory bandwidth and compute with low-precision inputs while suppressing accumulated rounding error with a high-precision accumulator.
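
The recurrence for \(s_k\) can be sketched with Python's standard `struct` module, which rounds a double to IEEE binary16 (`"e"`) or binary32 (`"f"`) when packing. This is an illustrative sketch, not the demo's code: the helper names are hypothetical, and each sum is formed in double precision and then rounded once, which matches the recurrence only as long as the intermediate sum is exact in double precision (true for the small integers used here).

```python
import struct


def round_fp16(v):
    """Round a double to the nearest FP16 value (raises OverflowError if too large)."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def round_fp32(v):
    """Round a double to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def dot(xs, ys, round_mul, round_acc):
    """s_k = round_acc(s_{k-1} + round_mul(x_k * y_k))."""
    s = 0.0
    for x, y in zip(xs, ys):
        s = round_acc(s + round_mul(x * y))
    return s


n = 4096
xs = [1.0] * n  # exactly representable in FP16
ys = [1.0] * n

acc_fp16 = dot(xs, ys, round_fp16, round_fp16)
acc_fp32 = dot(xs, ys, round_fp16, round_fp32)
print("FP16 accumulator:", acc_fp16)
print("FP32 accumulator:", acc_fp32)
```

FP16 has an 11-bit significand, so above 2048 the spacing between representable values is 2. Adding 1 to 2048 lands exactly halfway and rounds back to 2048, so the FP16 accumulator stalls there, while the FP32 accumulator reaches the exact sum 4096.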

Format visualization

Plots the positive values representable in small floating-point formats on a number line. Useful for explaining the trade-off between range and significant digits.

What to observe
Notice how the spacing between points (the "gap to next value" column) widens toward the right. As the exponent grows, representable values become sparser, so even at the same bit width larger values have coarser resolution.
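
The widening gaps can be reproduced by enumerating a small format. The sketch below is a simplified model, not the page's implementation: it assumes bias \(2^{E_\mathrm{bits}-1}-1\), subnormals at exponent field 0, and no encodings reserved for Inf or NaN. Those assumptions fit FP4 E2M1; formats that reserve some encodings for special values, such as the FP8 variants, have fewer finite values than this model lists. The name `positive_values` is hypothetical.

```python
def positive_values(exp_bits, man_bits):
    """List the positive finite values of a small floating-point format.

    Assumed model: bias = 2**(exp_bits - 1) - 1, subnormals at exponent 0,
    and every exponent code encodes finite numbers (no Inf/NaN).
    """
    bias = 2 ** (exp_bits - 1) - 1
    values = []
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                v = frac * 2.0 ** (1 - bias)          # subnormal: 0.M
            else:
                v = (1.0 + frac) * 2.0 ** (e - bias)  # normal: 1.M
            if v > 0:
                values.append(v)
    return values


fp4 = positive_values(exp_bits=2, man_bits=1)
gaps = [b - a for a, b in zip(fp4, fp4[1:])]

for value, gap in zip(fp4, gaps):
    print(f"{value:4.1f}  gap to next value: {gap}")
```

For FP4 E2M1 this lists 0.5, 1, 1.5, 2, 3, 4, 6 with gaps 0.5, 0.5, 0.5, 1, 1, 2: the gap doubles each time the exponent increases by one.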

Overview: Pursuing high-performance, high-precision computation for the AI era

This page introduces part of the work of the High Performance Computing Laboratory.

The keywords are AI, high-performance computing, and high-precision computing. AI centers on massive low-precision computation of roughly 4–16 bits, whereas scientific computing is dominated by high-precision computation of 64 bits or more. To bridge this gap, the High Performance Computing Laboratory pursues techniques to accelerate high-precision scientific computing on AI-oriented hardware.

Low-precision computation in the AI era

  • Makes heavy use of 4–16 bit formats such as FP4, FP6, FP8, BF16, and FP16.
  • Dedicated accelerators and Tensor Core-style units are used to process matrix products, dot products, and convolutions quickly.
  • When lowering precision, scaling is important to suppress rounding error, underflow, and overflow.

High-precision computation in scientific computing

  • FP64 is usually the standard, with multi-word formats such as DD (double-double), TD (triple-double), and QD (quad-double), or arbitrary precision such as MPFR, used as needed.
  • In numerical linear algebra, nonlinear equations, eigenvalue problems, ODEs and the like, insufficient precision directly affects the reliability of results.
  • A research focus is to harness the high compute performance of AI hardware for high-precision computation as well.

Basic structure of floating-point numbers

Floating-point numbers represent a real number as a sign, an exponent, and a mantissa. For normalized numbers the conceptual form is as follows.

\[ x = (-1)^S \times (1.M)_2 \times 2^{E-\mathrm{bias}} \]
S: Sign — 0 for positive, 1 for negative.
E: Exponent — Determines the range of representable values. More exponent bits make extremely large or small values easier to handle.
M: Mantissa — Determines the number of significant digits. More mantissa bits resolve nearby values more finely.
scale — In low-precision AI computation such as FP4/FP8, a per-tensor or per-block factor adjusts the representable range.

In low-precision formats, how the bits are split between exponent and mantissa matters. For example, FP8 E4M3 has more mantissa (precision-leaning), while FP8 E5M2 has more exponent (range-leaning).
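
The formula for normalized numbers can be applied directly to a bit pattern. The following Python sketch uses a hypothetical helper `decode`; it assumes bias \(2^{E_\mathrm{bits}-1}-1\), handles subnormals at exponent field 0, and deliberately ignores Inf/NaN encodings. The FP16 result is checked against Python's built-in binary16 support in `struct`.

```python
import struct


def decode(bits, exp_bits, man_bits):
    """Evaluate (-1)^S * (1.M)_2 * 2^(E - bias) for the given bit pattern.

    Assumptions: bias = 2**(exp_bits - 1) - 1; Inf/NaN encodings are not handled.
    """
    bias = 2 ** (exp_bits - 1) - 1
    sign = (bits >> (exp_bits + man_bits)) & 1
    exponent = (bits >> man_bits) & (2 ** exp_bits - 1)
    mantissa = bits & (2 ** man_bits - 1)
    frac = mantissa / 2 ** man_bits
    if exponent == 0:
        magnitude = frac * 2.0 ** (1 - bias)                 # subnormal: 0.M
    else:
        magnitude = (1.0 + frac) * 2.0 ** (exponent - bias)  # normal: 1.M
    return -magnitude if sign else magnitude


# FP16: 0x3E00 = 0 01111 1000000000
fp16_value = decode(0x3E00, exp_bits=5, man_bits=10)
fp16_reference = struct.unpack("<e", struct.pack("<H", 0x3E00))[0]

# The same 8 bits, 0x3C, read with two different exponent/mantissa splits.
as_e4m3 = decode(0x3C, exp_bits=4, man_bits=3)  # 0 0111 100
as_e5m2 = decode(0x3C, exp_bits=5, man_bits=2)  # 0 01111 00

print(fp16_value, fp16_reference, as_e4m3, as_e5m2)
```

Reading `0x3C` as E4M3 gives 1.5, while reading it as E5M2 gives 1.0: the bit that E4M3 spends on the mantissa is part of the exponent field in E5M2.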

Comparing floating-point format lengths

The figure below lines up the floating-point formats used on this page with lengths proportional to their actual bit widths, taking MPFR 128-bit as the maximum width. You can compare the lengths of FP4/FP8 against FP64/MPFR 128-bit at a glance, without horizontal scrolling.

Horizontal axis guide: 0, 32, 64, 96, 128 bit from the left. FP4 through TF32 are very short, so a minimum display width is reserved to keep the labels readable.
Legend: S = sign, E = exponent (range), M = mantissa (precision), MPFR arbitrary precision, scale factor.

  • FP4 E2M1 (4 bit: S, E×2, M×1): An extreme low-precision 4-bit format. With \(S=1\), \(E=2\), \(M=1\) it is coarse on its own, but combined with a scale it suits AI-inference quantization demos.
  • FP6 E3M2 (6 bit: S, E×3, M×2): An intermediate example between FP4 and FP8. With \(E=3\), \(M=2\) you can see the compromise between range and precision.
  • FP8 E4M3 (8 bit: S, E×4, M×3): The precision-oriented FP8. With \(E=4\), \(M=3\) it has one more mantissa bit than E5M2, suiting quantization comparisons of weights and activations.
  • FP8 E5M2 (8 bit: S, E×5, M×2): The range-oriented FP8. With \(E=5\), \(M=2\) the mantissa is coarser, but it is handy for explaining values or gradients whose magnitudes vary widely.
  • BF16 (16 bit: S, E×8, M×7): \(1+8+7=16\) bits. The exponent is 8 bits, the same as FP32, so the range is wide, while the 7-bit mantissa makes precision coarse.
  • FP16 (16 bit: S, E×5, M×10): A \(1+5+10=16\) bit half precision. In AI it is used for inputs and weights, often with mixed-precision computation that accumulates in FP32.
  • TF32 (≈19 bit: S, E×8, M×10): Conceptually equivalent to \(1+8+10\) bits. It keeps FP32's exponent width but shortens the mantissa to speed up matrix multiplication.
  • FP32 (32 bit: S, E×8, M×23): A \(1+8+23=32\) bit single precision. It is the standard reference format for AI computation and is also used as an accumulator for low-precision inputs.
  • FP64 (64 bit: S, E×11, M×52): A \(1+11+52=64\) bit double precision. It is the standard format in scientific computing and serves as a baseline for comparing against low-precision AI computation.
  • MPFR 128 (arbitrary-precision mantissa, 128-bit setting): MPFR lets you set the mantissa precision freely. On this page, the legacy MPFR graph tab can generate high-precision reference values such as 128-bit.
  • With scale (auxiliary: per-tensor or per-block scale factor): For FP4/FP8 and similar, in addition to the low-precision value itself, a per-tensor or per-block factor \(\mathrm{scale}\) makes a practical range easier to handle even with limited bit width.

Bar length represents total bit width. Within each bar, red is the sign, blue the exponent, and green the mantissa. A longer exponent means a wider range; a longer mantissa resolves nearby values more finely. MPFR is shown not as a fixed-length IEEE format but as an arbitrary-precision format with the specified mantissa precision.

How this maps to the demo tabs

Topic | Tab to try | What you can observe
4–16 bit low-precision AI computation | Function quantization, Softmax stabilization | Rounding error, saturation, and the effect of scaling
Mixed-precision computation | Dot product & accumulation | Effect of low-precision input + high-precision accumulator
High-precision computation | Legacy MPFR graph | Function tables/graphs at a specified MPFR bit width
Understanding formats | Format visualization & overview | Exponent/mantissa allocation and density of representable values

Arbitrary-precision MPFR graph & table generation (integrated)

In addition to the original input form, the evaluation functions from the uploaded mathfunc.php and mathfunc_mpfr.php have been integrated into this script. Where mpfr_gexpr is available it evaluates at the specified MPFR bit width; otherwise it falls back to PHP standard double precision.

MPFR available
Detected an mpfr_gexpr call from mathfunc_mpfr.php. MPFR graph/table generation uses evaluation at the specified MPFR bit width.
With the MPFR engine, the expression is passed as-is to mpfr_gexpr. In fallback mode, the evaluator supports the operators + - * / ^ ( ); the functions sin, cos, tan, exp, log, log10, sqrt, abs, floor, ceil, tanh, pow, min, and max; and the constants pi and e.

How to use this page

  1. First, use "Floating-point basics" to grasp the overall picture of the formats and the differences in bit length.
  2. Experiment with low- and mixed-precision behavior in "Function quantization," "Softmax stabilization," and "Dot product & accumulation" (see the yellow "What to observe" note in each tab).
  3. Generate high-precision reference values and function tables in "Arbitrary-precision MPFR graph" and compare them with the low-precision results.


Copyright (c) Tomonori Kouya, High Performance Computing Laboratory
AI Precision Playground v20 (improved) / Intended for educational use.