🤖 AI Summary
The piece breaks down what LLMs actually do: given a text prefix, they predict the next token by assigning probabilities to many possible continuations (after "My name is", a model might give "John" 43% and other names smaller shares). The model itself is stateless; the conversation "state" is the transcript your application builds and feeds back on every call. How you sample from the model matters: greedy decoding (always pick the top token) yields deterministic replies, while sampling with temperature produces variety and surfaces multiple plausible continuations. You can prime behavior by seeding the transcript with a system prompt or role description, so the model produces a "character" with a particular style, and because the model treats all text the same, you can edit past transcript entries and re-complete from the modified history (see the sketches below).
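A minimal sketch of how the decoding strategy changes the output. The token list and logits here are made-up numbers for illustration, not taken from any real model:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into a probability distribution over next tokens.
    Lower temperature sharpens toward the top token; higher flattens it."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Illustrative logits for the prefix "My name is" (invented numbers).
tokens = ["John", "Mary", "Alex", "the"]
logits = [2.1, 1.3, 0.9, -0.5]

probs = softmax_with_temperature(logits)
print(dict(zip(tokens, probs.round(2))))  # distribution peaked at "John"

# Greedy decoding: always take the argmax, so the reply is deterministic.
print("greedy:", tokens[int(np.argmax(probs))])

# Temperature sampling: draw from the distribution, so replies vary
# from run to run unless the random seed is fixed.
rng = np.random.default_rng()
idx = rng.choice(len(tokens), p=softmax_with_temperature(logits, temperature=0.8))
print("sampled:", tokens[idx])
```

And a sketch of the "state lives in the transcript" idea: the application appends each turn to a list and feeds the whole thing back, so it can also fork the history or rewrite earlier turns before re-completing. Here `complete` is a hypothetical stand-in for whatever model call you actually use, and the role/content message format is an assumption modeled on common chat APIs:

```python
from typing import Dict, List

Message = Dict[str, str]

def complete(transcript: List[Message]) -> str:
    """Hypothetical stand-in for a real model call. A real implementation
    would serialize `transcript`, send it to an LLM, and return the
    sampled continuation; the model sees only what is passed in here."""
    return "<model continuation>"  # placeholder so the sketch runs

# The conversation "state" is just this list, owned by the application;
# the model itself remembers nothing between calls.
transcript: List[Message] = [
    # Seeding the transcript primes the "character" the model will play.
    {"role": "system", "content": "You are a terse pirate."},
    {"role": "user", "content": "What's your name?"},
]
transcript.append({"role": "assistant", "content": complete(transcript)})

# Because the history is plain text under our control, we can fork it,
# rewrite an earlier turn, and re-complete from the modified history.
forked = [dict(m) for m in transcript[:2]]         # independent copy, pre-reply
forked[0]["content"] = "You are a formal butler."  # retroactive edit
alternate_reply = complete(forked)
```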
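The same forking mechanism supports multi-response generation (complete the same transcript several times with sampling enabled and compare the candidates), which is one reason direct model access is more flexible than a fixed chat interface.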
For AI/ML practitioners this reframes what "the AI said" means: the model is a screenwriter predicting a character's lines, not an agent with intrinsic beliefs. That distinction matters for reproducibility, prompt engineering, safety, and tooling. Interfaces and higher-level training (RLHF, safety layers) mediate what users see, but direct access to LLMs enables multi-response generation, transcript forking, and retroactive edits, and it also opens risks such as manipulated histories and misleading attributions. Understanding token probabilities, decoding strategies, and external state management is essential for robust systems, reliable evaluation, and ethical deployment.