Model redesign
TimesFM is a patched decoder: it tokenizes every 32 consecutive timepoints (a patch) as an input token and applies a transformer stack on top of these input tokens to produce output tokens. A shared multilayer perceptron (MLP) then transforms each output token into a time series of 128 timepoints.
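To make the shapes concrete, here is a minimal numpy sketch of the patching and decoding described above. The patch sizes (32 in, 128 out) come from the text; the model width, random projection weights, and the placeholder transformer are illustrative stand-ins, not the real TimesFM implementation.

```python
import numpy as np

INPUT_PATCH = 32    # each input token covers 32 consecutive timepoints
OUTPUT_PATCH = 128  # each output token decodes to 128 timepoints
D_MODEL = 16        # illustrative model width (assumption)

rng = np.random.default_rng(0)
series = rng.normal(size=256)  # 256 timepoints -> 8 input patches

# 1) Patch the series into input tokens.
patches = series.reshape(-1, INPUT_PATCH)   # (8, 32)
w_in = rng.normal(size=(INPUT_PATCH, D_MODEL))
tokens = patches @ w_in                     # (8, D_MODEL)

# 2) A transformer stack would map input tokens to output tokens here
# (omitted; identity used as a placeholder).
output_tokens = tokens

# 3) A shared MLP (here a single random projection, for illustration)
# turns each output token into a 128-timepoint forecast patch.
w_out = rng.normal(size=(D_MODEL, OUTPUT_PATCH))
forecast = output_tokens @ w_out            # (8, 128)

print(patches.shape, tokens.shape, forecast.shape)
```

Note the asymmetry: each token summarizes 32 input timepoints but is decoded into 128 output timepoints, so the model can forecast further ahead per token than it reads in.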
To create TimesFM-ICF (in-context fine-tuning), we start from the base TimesFM model and continue pre-training it with a new context: the forecast history plus all in-context examples. The first step is to make sure the model does not conflate the forecast history with the examples in the context. Imagine giving the model a list of numbers representing several different things; for example, one store's sunglasses sales followed by another store's umbrella sales. Simply concatenating all these numbers can mislead the model into treating them as one continuous data stream. If the first store's sales are rising and the second store's sales are falling, the model may incorrectly read this as a single up-and-down pattern rather than two separate, smooth trends.
To fix this, we place a special learnable "common delimiter token" after each set of numbers, like a digital "stop sign" or "new paragraph" symbol. With these delimiters in place, as soon as the model reaches the delimiter token at the end of an earlier example, it no longer confuses that example with the data it is currently trying to forecast. This allows the model to learn from patterns in past examples and apply that knowledge to the current forecast. For instance, the model can learn that "sales for all stores have recently shown a consistent directional trend, so we should predict an upward trend in sunscreen sales for new stores."
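The delimiter mechanism can be sketched as follows. This is a hypothetical illustration, not TimesFM's actual code: the token dimension, the `build_context` helper, and the random embeddings are all assumptions; only the idea of inserting one shared learnable delimiter after each series comes from the text.

```python
import numpy as np

D_MODEL = 16  # illustrative token width (assumption)
rng = np.random.default_rng(0)

# One learnable delimiter embedding, shared across all positions
# (random here; it would be trained jointly with the model).
delimiter = rng.normal(size=(1, D_MODEL))

def build_context(example_tokens, history_tokens):
    """Concatenate in-context examples and the forecast history,
    inserting the shared delimiter token after every example series
    so the model never reads two series as one continuous stream."""
    parts = []
    for ex in example_tokens:
        parts.append(ex)
        parts.append(delimiter)   # the "stop sign" between series
    parts.append(history_tokens)  # the series we actually forecast
    return np.concatenate(parts, axis=0)

# Two in-context examples (3 and 5 tokens) plus a 4-token history:
examples = [rng.normal(size=(3, D_MODEL)), rng.normal(size=(5, D_MODEL))]
history = rng.normal(size=(4, D_MODEL))
context = build_context(examples, history)
print(context.shape)  # (3+1) + (5+1) + 4 = 14 tokens
```

Because the same delimiter embedding is reused everywhere, the model can learn a single, consistent "series boundary" signal rather than having to infer boundaries from the data itself.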


