UGLD

Uncertainty-Gated Lexical Decoding — logits processors for HuggingFace Transformers.

Install

UGLD is available on PyPI:

pip install ugld

How it works

UGLD modulates decoding-time lexical interventions according to the model's own uncertainty, so control is applied strongly when the model is unsure and backs off when it is confident. This keeps generated text fluent even at high intervention strengths.

Entropy gate

At each decoding step the model produces a probability distribution $\mathbf{p}$ over its vocabulary. The Shannon entropy of that distribution,

$$ H(\mathbf{p}) = -\sum_i p_i \log p_i $$

is used to drive a smooth sigmoid gate $\phi(\mathbf{p}) \in [0, 1]$:

$$ \phi(\mathbf{p}) = \sigma\!\left(\frac{H(\mathbf{p}) - \tau}{s}\right) $$

$\tau$ is an entropy threshold — the gate is nearly closed when $H(\mathbf{p}) \ll \tau$ (confident prediction) and fully open when $H(\mathbf{p}) \gg \tau$ (uncertain prediction). $s > 0$ controls how fast the transition happens.
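The gate can be sketched in a few lines of NumPy. This is an illustration of the formulas above, not the library's internal code; the default values for $\tau$ and $s$ are taken from the Quickstart below.

```python
import numpy as np

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log p_i, with 0 log 0 := 0
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def gate(p, tau=1.0, s=0.3):
    # sigma((H(p) - tau) / s): near 0 when the model is confident,
    # near 1 when the distribution is close to uniform
    h = entropy(p)
    return 1.0 / (1.0 + np.exp(-(h - tau) / s))

confident = [0.97, 0.01, 0.01, 0.01]  # low entropy -> gate nearly closed
uncertain = [0.25, 0.25, 0.25, 0.25]  # entropy log 4 ~ 1.39 -> gate opens
```

With $\tau = 1$ and $s = 0.3$, the confident distribution gives a gate value below 0.1 while the uniform one gives roughly 0.78.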

Conditioning towards a vocabulary (UGLD-t)

Given a set of green token ids $\mathcal{V}_\text{green}$, UGLD-t blends the model's distribution $\mathbf{p}$ with a conditioning prior $\mathbf{q}$ concentrated on those tokens:

$$ \mathbf{p}' = (1 - \alpha)\,\mathbf{p} + \alpha\,\mathbf{q}, \qquad \alpha = \phi(\mathbf{p})\,\alpha_\text{max} $$

Because $\alpha \in [0,1]$ this is always a valid convex combination. Three built-in priors are available for $\mathbf{q}$:

uniform spreads mass equally over all green tokens; top-k restricts to the top-$K$ green tokens by current probability; renorm re-normalises the model's own distribution over the green set (generally the strongest and most adaptive choice).
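A single UGLD-t step can be sketched as follows. This is an illustrative NumPy implementation of the blending formula, not the library's actual processor; it takes the gate value $\phi$ as a precomputed argument and implements only the uniform and renorm priors (top-k is omitted for brevity).

```python
import numpy as np

def ugld_t_step(p, green_ids, phi, alpha_max=0.5, prior="renorm"):
    # Blend the model distribution p with a prior q over the green set:
    # p' = (1 - alpha) p + alpha q, with alpha = phi * alpha_max
    p = np.asarray(p, dtype=float)
    q = np.zeros_like(p)
    if prior == "uniform":
        # equal mass on every green token
        q[green_ids] = 1.0 / len(green_ids)
    elif prior == "renorm":
        # the model's own distribution, re-normalised over the green set
        q[green_ids] = p[green_ids] / p[green_ids].sum()
    else:
        raise ValueError(f"unknown prior: {prior}")
    alpha = phi * alpha_max  # the gate scales the mixing weight
    return (1 - alpha) * p + alpha * q  # convex combination, still sums to 1
```

When the gate is closed ($\phi = 0$) the distribution is returned unchanged; when it is fully open, up to $\alpha_\text{max}$ of the mass is moved onto the green tokens.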

Conditioning against a vocabulary (UGLD-a)

Given a set of red token ids $\mathcal{V}_\text{red}$, UGLD-a subtracts a scaled penalty vector $\lambda \mathbf{r}$ from the raw logits $\mathbf{z}$ before softmax:

$$ \mathbf{z}' = \mathbf{z} - \lambda\,\mathbf{r}, \qquad \lambda = \phi(\mathbf{p})\,\lambda_\text{max} $$

Working in logit space avoids negative probabilities. Two weight-vector strategies are available:

fixed applies a uniform penalty of 1 to every red token; dynamic scales each red token's penalty by its current probability (via min-max normalisation to $[1, 2]$), concentrating pressure where it matters most.
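A single UGLD-a step can be sketched in the same style. Again, this is an illustration of the formulas rather than the library's code; the gate value $\phi$ is passed in precomputed, and `lambda_max=4.0` is an arbitrary choice for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ugld_a_step(z, red_ids, phi, lambda_max=4.0, strategy="fixed"):
    # Penalise red tokens in logit space: z' = z - lambda * r,
    # with lambda = phi * lambda_max
    z = np.asarray(z, dtype=float)
    r = np.zeros_like(z)
    if strategy == "fixed":
        # uniform penalty of 1 on every red token
        r[red_ids] = 1.0
    elif strategy == "dynamic":
        # min-max normalise the red tokens' probabilities into [1, 2],
        # so the most likely red tokens get the largest penalty
        pr = softmax(z)[red_ids]
        span = pr.max() - pr.min()
        r[red_ids] = 1.0 + (pr - pr.min()) / span if span > 0 else 1.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    lam = phi * lambda_max
    return z - lam * r
```

Because the penalty is applied before the softmax, the result is always a valid distribution: red tokens lose probability mass but never go negative.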


Quickstart

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from ugld import UGLD_Towards, UGLDTowardsConfig

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

simple_words = [
    " simple", " easy", " basic", " clear",
    " small", " big", " light", " heavy",
    " fast", " slow", " old", " new",
    " good", " bad", " near", " far",
    " start", " end", " help", " use",
]

green_ids = []
for w in simple_words:
    green_ids.extend(tok.encode(w, add_special_tokens=False))
green_ids = list(set(green_ids))

proc = LogitsProcessorList([
    UGLD_Towards(UGLDTowardsConfig(
        green_token_ids=green_ids,
        alpha_max=0.5,
        tau=1.0,
        s=0.3,
        prior="renorm",
    ))
])

inputs = tok("Explain gravity in", return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=50,
    logits_processor=proc,
)
print(tok.decode(out[0], skip_special_tokens=True))

For a more complete walkthrough, see the interactive notebook.


API Reference

The full API reference is available here.

Citation

If you use UGLD in your research, please cite:

@inproceedings{papucci-etal-2026-ugld,
    title     = {Lexical Conditioning of Model{'}s Distribution through
                 Uncertainty-gated Soft-Mixing of Probabilities},
    author    = {Papucci, Michele and Venturi, Giulia and Dell{'}Orletta, Felice},
    booktitle = {Proceedings of the Workshop READIxTSAR @ LREC 2026},
    year      = {2026},
    address   = {Palma, Spain},
}