Suppose an experiment is performed that produces a p-value of 1/10,000. This means that the result obtained is unlikely to have happened by chance. Specifically, it means that one would expect to perform about 10,000 such experiments before a result this extreme happened by chance alone. If one performs a single experiment and obtains such a significance level, one is justified in concluding that it did not happen by chance. Rather, something caused this unusual result to happen, e.g., the existence of a code. Suppose, however, that one did 10,000 Torah code experiments, each with the same protocol but with a different key word set, and suppose that one of them yielded a p-value of 1/10,000. Such a result is entirely expected, and we cannot conclude that anything more than chance (e.g., a code) is operating here.
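The point can be checked with a small simulation. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: that the 10,000 experiments are independent, and that when only chance is operating each p-value is uniformly distributed between 0 and 1. The function name and its parameters are hypothetical.

```python
import random

def fraction_of_runs_with_a_hit(n_experiments=10_000, alpha=1 / 10_000,
                                n_runs=200, seed=0):
    """Estimate how often a batch of chance-only experiments contains at
    least one p-value as small as `alpha`.

    Assumptions (for illustration): experiments are independent, and a
    chance-only p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    runs_with_hit = 0
    for _ in range(n_runs):
        # One "run" is a full batch of n_experiments chance-only experiments.
        if any(rng.random() <= alpha for _ in range(n_experiments)):
            runs_with_hit += 1
    return runs_with_hit / n_runs

if __name__ == "__main__":
    print(fraction_of_runs_with_a_hit())
```

Under these assumptions, well over half of the batches should contain at least one p-value of 1/10,000 or smaller, even though no code exists in the simulated data.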
Consider a simple example. Suppose you think of a random whole number between 1 and 100 and challenge a friend to guess it. If he succeeds on the first try, the result is startling, because by chance alone the probability of his doing so is 1/100. Suppose instead that he guesses the number on the second try. This is still interesting, but not quite as startling as guessing it on the first try. We can calculate the significance (mathematically, the expectation) by simply multiplying the probability of success with one guess by the number of guesses. In this case it is 2 * (1/100) = 1/50. Because the guesses are all different, this product is also the exact probability that one of them is correct. Suppose he takes 50 different guesses. Then his probability of success is 50 * (1/100) = 1/2, and this is not significant at all! If he takes 100 different guesses he is sure to guess the right number; his probability of success is 100 * (1/100) = 1.
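The arithmetic of the guessing game can be verified step by step. The following is a minimal sketch, assuming the secret number is drawn uniformly from 1 to 100 and that the friend never repeats a guess; the function name is hypothetical. It builds the probability from the chance of missing on every guess and compares it with the simple product used above.

```python
from fractions import Fraction

def prob_success(num_guesses, n=100):
    """Exact probability that one of `num_guesses` distinct guesses hits a
    secret number drawn uniformly from 1..n (assumed for illustration)."""
    if not 0 <= num_guesses <= n:
        raise ValueError("num_guesses must be between 0 and n")
    # Probability that every guess made so far has missed.
    p_all_miss = Fraction(1)
    for i in range(num_guesses):
        # Before guess i+1 there are n - i untried numbers,
        # and n - i - 1 of them are wrong.
        p_all_miss *= Fraction(n - i - 1, n - i)
    return 1 - p_all_miss

if __name__ == "__main__":
    for k in (1, 2, 50, 100):
        # Second column: step-by-step value; third: k * (1/100).
        print(k, prob_success(k), k * Fraction(1, 100))
```

The two columns agree for every number of guesses, which is why multiplying the single-guess probability by the number of guesses is exact in this game.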
The conclusion is that the number of experiments performed is just as important as the probability of success calculated, because the true measure of success is the product of these two numbers. Thus, if one claims to have done an experiment and obtained a probability of 1/10,000 (a probability that is valid only if exactly one experiment was performed) but has actually done, say, another 499 experiments in secret, then the experimental results have been misrepresented. What should be reported is that 500 experiments were done before obtaining the 1/10,000 result, in which case the significance of the experimental result can be calculated as 500 * (1/10,000) = 1/20, which by the Bonferroni inequality is an upper bound on the true p-value. If the experimenter does not report the 499 experiments and reports only the 1/10,000 result of the 500th experiment, the report is fraudulent, and the 500th experiment is said to contain hidden failures. In this case there were 499 hidden failures.
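The correction for hidden failures can be written out as a short calculation. This is a sketch under stated assumptions, not a prescribed procedure: the Bonferroni bound holds whatever the dependence between experiments, while the second function additionally assumes the 500 experiments are independent, which the text does not claim. Both function names are hypothetical.

```python
def bonferroni_bound(p_single, n_experiments):
    """Upper bound on the chance that at least one of `n_experiments`
    yields a p-value of `p_single` or smaller (capped at 1)."""
    return min(1.0, n_experiments * p_single)

def value_if_independent(p_single, n_experiments):
    """Same chance computed exactly, ASSUMING the experiments are
    independent (an illustrative assumption, not from the text)."""
    return 1.0 - (1.0 - p_single) ** n_experiments

if __name__ == "__main__":
    p, n = 1 / 10_000, 500  # reported p-value; 1 reported + 499 hidden experiments
    print(f"Bonferroni upper bound:       {bonferroni_bound(p, n):.4f}")
    print(f"Value assuming independence:  {value_if_independent(p, n):.4f}")
```

For the 500 experiments in the example, the bound is 1/20 = 0.05, and the value under independence is slightly smaller, consistent with the bound being an upper limit.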