Abstract: The digital revolution has created unprecedented opportunities in computing and communication, but it has also generated a data deluge and an urgent demand for new pattern recognition technology. Learning patterns in data requires extracting interesting, statistically significant regularities from (large) data sets, e.g., identifying connection patterns in the brain (connectomics) or detecting cancer cells in tissue microarrays and estimating their staining as a cancer severity score. Admissible solutions, or hypotheses, specify the context of pattern analysis problems, which have to cope with model mismatch and noise in the data. A statistical theory of discriminative learning is developed based on information theory, in which the precision of inferred solution sets is estimated in a noise-adapted way. The tradeoff between “informativeness” and “robustness” is mirrored by the balance between the high information content and the identifiability of solution sets, thereby giving rise to a new notion of context-sensitive information. Cost functions that rank solutions and, more abstractly, algorithms are treated as noisy channels with a data-dependent approximation capacity. The effectiveness of this concept is demonstrated by model validation for spectral clustering based on different variants of graph cuts. The concept also enables us to measure how many bits a sorting algorithm extracts when the input and the pairwise comparisons are subject to fluctuations.
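One way to picture the final claim is to treat a sorting procedure with unreliable comparisons as a noisy channel: with perfect comparisons it pins down one of n! rankings, extracting log2(n!) bits, while comparison noise spreads the output over many rankings and reduces the extracted information. The sketch below is purely illustrative and not the paper's construction: the choice of insertion sort, the symmetric flip probability `flip_p`, and the estimator `bits_extracted` (log2(n!) minus the empirical entropy of the output rankings) are all assumptions made for this example.

```python
import math
import random
from collections import Counter

def noisy_less(a, b, flip_p):
    """Pairwise comparison that returns the wrong answer with probability flip_p."""
    truth = a < b
    return truth if random.random() >= flip_p else not truth

def noisy_insertion_sort(items, flip_p):
    """Insertion sort driven entirely by noisy pairwise comparisons."""
    out = []
    for x in items:
        i = 0
        while i < len(out) and noisy_less(out[i], x, flip_p):
            i += 1
        out.insert(i, x)
    return tuple(out)

def bits_extracted(n, flip_p, trials=20000, seed=0):
    """Estimate the information extracted by the noisy sorter as
    log2(n!) minus the empirical entropy of the output rankings."""
    random.seed(seed)
    items = list(range(n))
    counts = Counter()
    for _ in range(trials):
        random.shuffle(items)          # random input order each trial
        counts[noisy_insertion_sort(items, flip_p)] += 1
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(math.factorial(n)) - h
```

With `flip_p=0` the output is always the sorted order, the empirical entropy is zero, and the estimate equals log2(n!); as `flip_p` grows toward 0.5, the output rankings spread out and the estimated number of extracted bits drops toward zero.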