I read your previous blog and thought your reasoning through variance was sound. I did not feel it needed such, uh, statistical jackhammering as a Monte Carlo simulation!
Anyhow, I think you can easily detach your hypothesis from the framework of Elo: just claim that the outcome of a game between players A and B is drawn from some distribution P(A|theta), where theta is a parameter containing all the things you do not want to select for, such as the opponent B, the map, the number of games played, lag, time of day, et cetera! The task of a tournament, according to you, is then to let player A advance if P(A|theta)P(theta) > P(B|theta)P(theta)*, that is, to advance the player who is more likely to win any given game (which would traditionally correspond to the higher Elo). And then you just argue as in your first blog post about this, that more samples imply smaller volatility. Or you can show that there is some kind of inequality à la P(wrong player advances) < 2^-M(n), where M is some function that is monotonically increasing in n, the number of sample points: just model the event 'wrong player advances' as coin flips with P(E|theta), where E is the wrong player.
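To make that inequality concrete, here is a minimal sketch. It assumes the games in a series are independent coin flips with one fixed win probability p > 0.5 for the better player, which is a simplification and not something the blog claims. The function names are made up for illustration. It computes the exact chance that the wrong player takes a best-of-n series and compares it with the Hoeffding bound exp(-2n(p - 1/2)^2), which has the form 2^-M(n) with M(n) = 2n(p - 1/2)^2 / ln 2.

```python
from math import comb, exp

def p_wrong_player_advances(p, n):
    """Exact probability that the weaker player wins a best-of-n series.

    Assumptions (illustrative only): n is odd, games are independent,
    and the stronger player wins each game with fixed probability p > 0.5.
    """
    q = 1 - p                  # weaker player's per-game win probability
    need = n // 2 + 1          # games needed to take the series
    # Playing out all n games gives the same series winner as stopping early.
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(need, n + 1))

def hoeffding_bound(p, n):
    """Hoeffding upper bound on the same event: exp(-2 n (p - 1/2)^2)."""
    return exp(-2 * n * (p - 0.5) ** 2)

if __name__ == "__main__":
    p = 0.6  # hypothetical per-game edge for the better player
    for n in (1, 3, 5, 7, 21, 101):
        exact = p_wrong_player_advances(p, n)
        bound = hoeffding_bound(p, n)
        print(f"Bo{n}: exact={exact:.4f}  bound={bound:.4f}")
```

The bound is loose for short series, but both numbers shrink as n grows, which is all the argument needs.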
However, in the end... it's as Solarsail writes: audience satisfaction is probably the driving factor behind tournament formats, and not the quest to "learn" the distributions P(X|theta)P(theta) for all players X as well as possible under a theta that was decided on by the organizers. Of course the two correlate, but they're not the same. :>
* Where P(theta) is basically the distribution of "outside" conditions such as opponent, map, game number, etc. I made a small leap of faith here, assuming that whatever distribution of theta the tournament had up until now is the same as it will have in later stages. That is obviously not true: really good players will save strategies for important opponents, and the number of games changes (e.g. the loser bracket plays more than the winner bracket, so doing well in a loser bracket set might require a different skill set than doing well in the winner bracket). But if we were to consider something like this, we would need to estimate theta for later stages of the tournament with historical data, which is ridiculous, as pointed out by SKC.
ps: Significance Value... for Monte Carlo Experiment... HOW DOES THAT EVEN WORK. I mean, either the model is a good approximation of reality or it is not. There is no real world data!