Your situation reminded me of the way IMDB sorts movies by rating, even though different movies may receive vastly different total number of votes. They use something called a credibility formula which is apparently a Bayesian statistics way of doing it, unlike the frequentist statistics with p-values and null hypotheses that you are looking for atm.