I think there are better use cases for transformers in recommendation systems. A better and more widely accepted approach is to train a seq2seq transformer from scratch with a new vocabulary.
The new vocabulary (tokens) represents the UUID of each product. The input sequence for training then represents the purchase history of a customer, and the model has to predict the most likely next token (product). Depending on the transformer architecture you choose, the prediction target can be the last product the customer purchased (causal language modeling) or a randomly masked product from their history (masked language modeling). For that methodology you can either use a classic BERT-style model, which depends on the order of the input sequence, or XLNet, which is robust against the order of the input tokens.
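To make the causal variant concrete, here is a minimal sketch using Hugging Face `transformers` and PyTorch. The vocabulary size, model dimensions, and toy batch are all assumptions for illustration; the point is just that the model is initialized from scratch over a product-id vocabulary and trained to predict the next purchase:

```python
# Minimal sketch of next-product prediction with a product-id vocabulary.
# Assumes PyTorch and Hugging Face `transformers` are installed; the vocab
# size, model dimensions, and toy batch below are illustrative assumptions.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

NUM_PRODUCTS = 50_000  # one token id per product UUID (assumption)

config = GPT2Config(
    vocab_size=NUM_PRODUCTS,
    n_positions=64,                    # max purchase-history length
    n_embd=128, n_layer=4, n_head=4,   # orders of magnitude smaller than GPT-3
)
model = GPT2LMHeadModel(config)        # randomly initialized, trained from scratch

# Toy batch: each row is one customer's purchase history as product token ids.
histories = torch.tensor([[17, 942, 3, 3051, 17, 88],
                          [501, 501, 2, 77, 4096, 12]])

# For causal LM training the model shifts the labels internally, so the loss
# is exactly "predict the next product given the purchases so far".
out = model(input_ids=histories, labels=histories)
out.loss.backward()                    # one gradient step toward that objective
```

At inference you feed in a customer's history and take the top-k logits over the product vocabulary as the recommendations.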
NVIDIA has already created a complete package for training such models at scale (Transformers4Rec, part of the Merlin ecosystem). It works really well.
The approach you are proposing here is far less stable. Also, with the first approach you do not need a model the size of GPT-3 to be successful; a much smaller model suffices, which makes it far more efficient and reduces cost.
There is a lot of hype around LLMs, but they are by no means the go-to solution for everything. And with a little bit of thought you can easily come up with better ideas than throwing an LLM at every problem.