The tokenizer for heart rate
by Steinar Agnarsson
The next wave of health AI will be built on wearable data. Hundreds of millions of people already wear devices that record heart rate continuously — during exercise, sleep, and daily life. The raw material for health foundation models is being collected at scale, right now, on wrists around the world.
There is a problem with this raw material. It has no labels.
What raw heart rate actually is
A heart rate trace is a sequence of numbers: 72, 74, 73, 71, 68, 95, 112, 134, 152, 161, 158, 142, 118, 94, 78. Timestamps and BPM values. That is all.
A foundation model trained on millions of these traces will learn statistical patterns — correlations between heart rate values and time of day, between resting HR and age, between peak HR and activity duration. It will learn population-level regularities. It will be able to predict, given a partial trace, what the next few values are likely to be.
What it will not learn is physiology.
A heart rate of 155 bpm means something completely different for a marathoner (below lactate threshold, sustainable for hours, predominantly aerobic) than for a sedentary 55-year-old (above threshold, glycolytic, unsustainable for more than minutes). The number is the same. The physiology is opposite. A model trained on raw traces cannot distinguish these two cases because it has never been told where the thresholds are, what endurance means, or how the body partitions fuel between fat and carbohydrate.
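The ambiguity can be made concrete in a few lines. This is an illustrative sketch only: the threshold values and the function are assumptions for the example, not TrueZone output.

```python
# Hedged sketch: why 155 bpm is ambiguous without individual context.
# The lactate-threshold heart rates below are illustrative assumptions.

def intensity_relative_to_threshold(hr: int, lactate_threshold_hr: int) -> str:
    """Classify a heart rate against an individual's lactate threshold."""
    if hr < lactate_threshold_hr:
        return "below threshold: predominantly aerobic, sustainable"
    return "above threshold: glycolytic, unsustainable for long"

marathoner_lt = 168   # assumed: well-trained, threshold sits high
sedentary_lt = 142    # assumed: untrained, threshold sits lower

print(intensity_relative_to_threshold(155, marathoner_lt))
print(intensity_relative_to_threshold(155, sedentary_lt))
```

The same input, 155 bpm, lands on opposite sides of the classification depending on a per-individual parameter the raw trace never contains.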
This is the labeling problem. Raw heart rate data is abundant but unlabeled. And in physiology, the labels are everything.
What a tokenizer does
In language modeling, a tokenizer converts raw text into structured units that carry meaning. The sentence "The cat sat on the mat" becomes a sequence of tokens, each mapped to a position in a vocabulary that the model understands. The tokenizer doesn't do the reasoning — it prepares the input so the model can.
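A minimal toy version of that mapping, with a hand-built word-level vocabulary purely for illustration:

```python
# Toy tokenizer: map raw words to vocabulary indices a model can consume.
# The vocabulary is hand-built for this example only.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}

def tokenize(sentence: str) -> list[int]:
    """Lowercase, split on whitespace, map each word to its vocab id."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

print(tokenize("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```

No reasoning happens here; the tokenizer only turns an opaque string into structured units with defined meanings.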
TrueZone does the same thing for heart rate.
It takes a raw HR trace — noisy, unlabeled, physiologically opaque — and converts it into structured, labeled features: three base parameters (Endurance, Maximum Speed, HRmax) and over 60 derived metrics. Thresholds, substrate balance, fatigue state, recovery kinetics, metabolic zones, training load partitioning, cardiac drift — all interpretable, all grounded in an ODE model of human physiology.
The output is not a prediction. It is a structured physiological description of the individual. Every feature has a defined meaning. Every feature is derived from the same physical model, not learned from data.
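One plausible shape for such a description, sketched as a data structure. Only the three base parameters (Endurance, Maximum Speed, HRmax) come from the text; the derived field names and all values here are hypothetical.

```python
# Illustrative sketch of a structured physiological description.
# Derived field names and all numeric values are hypothetical examples.
from dataclasses import dataclass

@dataclass
class PhysiologicalProfile:
    endurance: float            # base parameter E, dimensionless
    max_speed_kmh: float        # base parameter: maximum speed
    hr_max: int                 # base parameter: maximal heart rate
    lactate_threshold_hr: int   # derived metric (hypothetical field name)
    fat_carb_crossover_hr: int  # derived metric (hypothetical field name)

profile = PhysiologicalProfile(
    endurance=0.75, max_speed_kmh=22.0, hr_max=188,
    lactate_threshold_hr=164, fat_carb_crossover_hr=141,
)
print(profile.endurance)  # 0.75
```

The point of the shape: every field has a name and a physiological meaning, unlike an anonymous embedding dimension.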
Why physics-grounded features matter for AI
A neural embedding learned from data is a black box. Dimension 47 of a 128-dimensional HR embedding might correlate with fitness, or with device type, or with the time zone the user lives in. You cannot know, and the model cannot tell you. If the training population changes, the embedding changes. If the device changes, the embedding changes.
A physics-grounded feature is interpretable and portable. E = 0.75 means the same thing whether the data came from an Apple Watch, a Garmin chest strap, or a clinical ECG. It means the same thing for a 25-year-old runner and a 65-year-old walker. The physiology does not change with the device or the demographic — only the raw signal quality does.
This has three consequences for AI:
Interpretability. A model trained on TrueZone features can be audited. If it predicts metabolic syndrome risk, you can trace the prediction back to specific physiological features — MFI, RZI, substrate balance — and verify whether the reasoning is clinically sound. This matters for regulatory approval, clinical adoption, and trust.
Portability. A model trained on TrueZone features from one device works on data from another device. The feature layer absorbs the device-specific noise. The model sees physiology, not hardware.
Retroactivity. Any existing HR archive — millions of sessions already sitting in databases at Garmin, Apple, Strava, research institutions — can be processed through TrueZone to generate physiological labels after the fact. You do not need to collect new data. You need to label the data you already have.
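Retroactive labeling is just a batch pass over stored sessions. The sketch below uses a hypothetical `extract_features` stand-in, not the real TrueZone interface, and toy traces for illustration:

```python
# Hedged sketch of retroactive labeling: run an existing archive of raw HR
# sessions through a feature extractor after the fact. `extract_features`
# is a hypothetical placeholder, not the TrueZone API.

def extract_features(trace: list[int]) -> dict[str, float]:
    """Placeholder: summarize a raw trace into labeled features."""
    return {"mean_hr": sum(trace) / len(trace), "peak_hr": float(max(trace))}

archive = {
    "session_001": [72, 74, 95, 134, 152, 142, 94],
    "session_002": [68, 70, 88, 120, 131, 118, 85],
}

# Label data that was already collected: no new measurement required.
labeled = {sid: extract_features(trace) for sid, trace in archive.items()}
print(labeled["session_001"]["peak_hr"])  # 152.0
```

The archive itself is untouched; the feature layer is simply applied after the fact, which is what makes historical data usable for training.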
What this enables
With structured physiological features, several classes of models that do not exist today become possible:
Metabolic health prediction. Train models to predict glycemic response, insulin sensitivity, or metabolic syndrome risk from MFI, RZI, and substrate balance. In validation studies, MFI already correlates with glycemic response markers at R² = 0.99. That is a feature, not a coincidence — and it is a feature a model can learn from.
Personalized training AI. Recommendation engines that prescribe training based on individual endurance, threshold positions, and recovery state. Not "users like you also ran 5K today" but "your endurance is 0.62 and your lactate threshold is at 14.2 km/h — here is the session that will improve both."
Clinical digital biomarkers. Physiological features as continuous endpoints in clinical trials. Track metabolic flexibility decline in a drug trial without periodic lab visits. Detect early response to lifestyle interventions from the first week, not the first quarterly blood draw.
Population health surveillance. Process existing wearable data at scale into structured physiological profiles. Identify at-risk populations not from step counts but from metabolic fitness scores that correlate with actual clinical outcomes.
The infrastructure layer
TrueZone is not an AI model. It is the infrastructure that makes AI health models work. The tokenizer, not the language model. The feature layer, not the prediction layer.
Raw heart rate is noise. Structured physiology is signal. The difference between the two is what determines whether the next generation of health AI learns statistics or learns biology.
The data is already on a billion wrists. The labels have been missing. They are not missing anymore.