Evaluating the Bayesian performance of LLMs
Just as with people, for LLM–user interactions to be efficient, probabilistic estimates of user preferences have to be updated with every new interaction with the user. So, does the LLM behave as if it holds probabilistic estimates that are updated as expected under optimal Bayesian inference? And if the LLM's behavior deviates from the optimal Bayesian strategy, how can these deviations be minimized?
To test this, we used a simplified flight-recommendation task in which the LLM acted as an assistant and interacted with a simulated user for five rounds. In each round, both the user and the assistant were presented with three flight options. Each flight was defined by departure time, duration, number of stops, and price. Each simulated user was characterized by a set of preferences: for each feature, the user could have a strong or weak preference for high or low values (for example, preferring longer or shorter flights), or no preference regarding that feature at all.
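A minimal sketch of this setup might look as follows. The encoding of flights and preference levels here is illustrative (the names `Flight`, `sample_user`, and `sample_options`, the normalized feature values, and the -2…+2 preference scale are assumptions, not the paper's actual parameterization):

```python
import random
from dataclasses import dataclass

@dataclass
class Flight:
    departure_time: float  # hour of day, normalized to [0, 1]
    duration: float        # flight length, normalized to [0, 1]
    stops: int             # number of stops
    price: float           # ticket price, normalized to [0, 1]

# Per-feature preference: direction (high vs. low values) and strength
# (strong vs. weak), or indifference, encoded as -2, -1, 0, +1, +2.
FEATURES = ["departure_time", "duration", "stops", "price"]
LEVELS = [-2, -1, 0, 1, 2]

def sample_user(rng: random.Random) -> dict:
    """Draw a simulated user: one preference level per feature."""
    return {f: rng.choice(LEVELS) for f in FEATURES}

def sample_options(rng: random.Random, n: int = 3) -> list:
    """Draw the flight options shown in one round."""
    return [Flight(rng.random(), rng.random(), rng.randrange(3), rng.random())
            for _ in range(n)]
```

With this encoding, a user who strongly prefers cheap flights would have `price == -2`, and an indifferent feature carries level `0`.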
We compared the behavior of the LLM to that of a model following an optimal Bayesian strategy (the Bayesian Assistant). This model maintains a probability distribution reflecting estimates of the user's preferences and uses Bayes' rule to update this distribution as new information about the user's choices becomes available. Unlike many real-world scenarios where Bayesian strategies are difficult to specify and implement computationally, this controlled setting is straightforward to implement and allows us to estimate precisely how much the LLM deviates from the Bayesian strategy.
The assistant's goal was to recommend flights that matched the user's choices. At the end of each round, the user told the assistant whether the recommendation was correct and revealed the correct answer.
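The full interaction protocol, recommend, receive the correct answer, update, can be sketched as a loop. This is a self-contained toy version with a reduced two-feature space; the specific down-weighting rule, feature set, and deterministic tie-breaking are assumptions made for brevity:

```python
import itertools
import random

rng = random.Random(0)
FEATURES = ["duration", "price"]   # reduced feature set for brevity
LEVELS = [-1, 0, 1]                # prefer low / indifferent / prefer high

def utility(prefs, flight):
    return sum(prefs[f] * flight[f] for f in FEATURES)

# Hidden user and a uniform prior over preference hypotheses.
true_user = {"duration": -1, "price": -1}
hypotheses = [dict(zip(FEATURES, c))
              for c in itertools.product(LEVELS, repeat=len(FEATURES))]
posterior = {i: 1.0 / len(hypotheses) for i in range(len(hypotheses))}
true_idx = hypotheses.index(true_user)

for round_no in range(5):
    # Three random flight options per round.
    options = [{f: rng.random() for f in FEATURES} for _ in range(3)]
    # Assistant recommends the option with highest posterior-expected utility.
    expected = [sum(p * utility(hypotheses[i], o) for i, p in posterior.items())
                for o in options]
    recommendation = expected.index(max(expected))
    # The user reveals the correct answer: their own best option.
    correct = max(range(3), key=lambda j: utility(true_user, options[j]))
    # Update: soft-penalize hypotheses inconsistent with the revealed choice.
    for i in posterior:
        best_i = max(range(3), key=lambda j: utility(hypotheses[i], options[j]))
        if best_i != correct:
            posterior[i] *= 1e-3
    z = sum(posterior.values())
    posterior = {i: p / z for i, p in posterior.items()}
```

Over the five rounds the posterior concentrates on hypotheses that keep predicting the user's revealed choices, which is exactly the behavior the Bayesian Assistant baseline provides for comparison with the LLM.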


