Definition
Top-K sampling limits the model's next-token choices to the K most-probable tokens, regardless of how much probability mass those tokens cover. With top-K = 50, the model samples from only the 50 most likely tokens at each step. It is less commonly tuned than top-p in modern APIs, but both bounding methods serve the same purpose: preventing the model from picking very-low-probability tokens that produce off-topic output.
Example
Top-K = 1 is equivalent to greedy decoding (always pick the single most-likely token); top-K = 100 gives a much wider sampling pool.
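The mechanism above can be sketched in a few lines. This is a minimal illustration, not any particular API's implementation: keep the k highest-logit tokens, renormalize with a softmax, and sample from the survivors. The function name and signature are invented for this example.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-logit candidates.

    logits: raw model scores, one per vocabulary token.
    k: number of top candidates to keep (k=1 reduces to greedy decoding).
    """
    # Keep only the indices of the k highest-scoring tokens.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (shifted by the max for stability).
    max_logit = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - max_logit) for i in top]
    # Sample one index in proportion to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]
```

With k=1 only the single highest-logit token survives, so the call is deterministic and reproduces greedy decoding; larger k widens the pool without ever admitting tokens outside the top K.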
When to use
Rarely tuned in practice. Default values work for most use cases.