Top-K

Optimal Cutoff Timestep (OCT)

Given an input \(\boldsymbol{X}\) with ground-truth label \(\boldsymbol{y}\) and classifier \(f\), the optimal cutoff timestep \(g(\boldsymbol{X})\) is the earliest timestep such that the prediction is correct at every later timestep:

\[ g(\boldsymbol{X}) = \arg\min_{\hat{t}}\{\, \forall \hat{t}_1 > \hat{t}: f(\boldsymbol{X}[\hat{t}_1]) = \boldsymbol{y} \,\} \]
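As a concrete illustration, here is a minimal NumPy sketch of \(g(\boldsymbol{X})\). The function name `optimal_cutoff_timestep`, the `[T, num_classes]` output layout, and the 0-indexed, inclusive scan are assumptions made for this example, not part of any documented API.

```python
import numpy as np

def optimal_cutoff_timestep(outputs: np.ndarray, label: int) -> int:
    """Earliest timestep (0-indexed) from which the predicted class
    stays equal to the ground-truth label until the last timestep.

    outputs: shape [T, num_classes], the output Y(t) at every timestep.
    Returns T when even the final prediction is wrong (no valid cutoff).
    """
    T = outputs.shape[0]
    correct = outputs.argmax(axis=1) == label  # per-timestep correctness
    oct_t = T                                  # sentinel: no cutoff possible
    for t in range(T - 1, -1, -1):             # scan backwards from the end
        if not correct[t]:
            break
        oct_t = t                              # still correct from t onwards
    return oct_t
```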

Top-K Gap for Cutoff Approximation

We define \(Top_k(\boldsymbol{Y}(t))\) as the \(k\)-th largest value among the neurons of the output layer at timestep \(t\), and set

\[ Y_{gap}= Top_1(\boldsymbol{Y}(t)) - Top_2(\boldsymbol{Y}(t)), \]

which denotes the gap between the top-1 and top-2 values of the output \(\boldsymbol{Y}(t)\). We then let \( D\{\cdot\}\) denote the subset of inputs in \(D\) that satisfy a given condition. With this, the confidence rate is defined as follows:

\[ \textit{Confidence rate: } C(\hat{t}, D\{Y_{gap}>\beta\}) = \frac{1}{|D\{Y_{gap}>\beta\}|}\sum_{\boldsymbol{X}\in D\{Y_{gap}>\beta\}} \mathbf{1}\big(g(\boldsymbol{X}) \leq \hat{t}\big). \]
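The sketch below shows how \(Y_{gap}\) and the confidence rate could be computed on pre-collected outputs; `topk_gap`, `confidence_rate`, and the array layouts are again illustrative assumptions rather than a definitive implementation.

```python
import numpy as np

def topk_gap(y_t: np.ndarray) -> float:
    """Y_gap = Top_1(Y(t)) - Top_2(Y(t)) for one output vector y_t."""
    top2 = np.sort(y_t)[-2:]                # two largest output values
    return float(top2[1] - top2[0])

def confidence_rate(gaps: np.ndarray, octs: np.ndarray,
                    t_hat: int, beta: float) -> float:
    """C(t_hat, D{Y_gap > beta}): fraction of the gap-filtered inputs
    whose optimal cutoff timestep g(X) is at most t_hat.

    gaps: Y_gap of each input, measured at timestep t_hat;
    octs: g(X) of each input (e.g. from optimal_cutoff_timestep above).
    """
    mask = gaps > beta                      # membership in D{Y_gap > beta}
    if not mask.any():                      # empty subset: vacuously confident
        return 1.0
    return float((octs[mask] <= t_hat).mean())
```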

The algorithm searches, at each specific \(\hat t\), for the minimum \(\beta \in \mathbb{R}^+\) satisfying the following optimization objective:

\[ \arg\min_{\beta}\; \beta \quad \text{s.t.} \quad C(\hat t, D\{Y_{gap} > \beta\}) \geq 1-\epsilon, \]

where \(\epsilon\) is a pre-specified constant such that \(1-\epsilon\) represents an acceptable level of confidence for activating cutoff. A set of \(\beta\) values, one per candidate \(\hat t\), is extracted from the training samples.
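One simple way to realize this search is a scan over the observed gap values, as in the following sketch, which reuses `confidence_rate` from above; `search_beta` and `gaps_at` are hypothetical names introduced only for this example.

```python
import numpy as np

def search_beta(gaps: np.ndarray, octs: np.ndarray,
                t_hat: int, epsilon: float) -> float:
    """Smallest observed gap value beta whose confidence rate at t_hat
    reaches 1 - epsilon.

    The largest observed gap always qualifies vacuously (it yields an
    empty subset, i.e. cutoff would simply never fire), so the scan is
    guaranteed to return a value.
    """
    for beta in np.unique(gaps):            # candidate thresholds, ascending
        if confidence_rate(gaps, octs, t_hat, beta) >= 1.0 - epsilon:
            return float(beta)
    return float(gaps.max())                # unreachable given the note above

# Example: one beta per candidate cutoff timestep on the training set,
# where gaps_at[t] holds Y_gap of every training input at timestep t:
# betas = {t: search_beta(gaps_at[t], octs, t, epsilon=0.05) for t in range(T)}
```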