Methods, systems, and programming for evaluating the quality of query suggestions are disclosed. In one example, a plurality of query suggestions is provided to a user in a ranking. A user activity with respect to one of the plurality of query suggestions is detected. The position of that query suggestion in the ranking is determined. A quality measure of the plurality of query suggestions is calculated based, at least in part, on the user activity and the position of that query suggestion.
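The steps in this example can be illustrated with a minimal sketch. The abstract does not specify how the quality measure is computed, so the code below assumes a mean reciprocal rank: each impression in which the user selects a suggestion contributes 1/position, and impressions with no selection contribute 0. The names `SuggestionEvent`, `position_of`, and `quality_measure` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuggestionEvent:
    """One impression of a ranked suggestion list (hypothetical structure)."""
    suggestions: List[str]          # suggestions in ranked order, best first
    selected: Optional[str] = None  # suggestion the user acted on, if any


def position_of(event: SuggestionEvent) -> Optional[int]:
    """Return the 1-based rank of the selected suggestion, or None."""
    if event.selected is None or event.selected not in event.suggestions:
        return None
    return event.suggestions.index(event.selected) + 1


def quality_measure(events: List[SuggestionEvent]) -> float:
    """Mean reciprocal rank over all events.

    Assumption: reciprocal rank stands in for the quality measure,
    which the abstract leaves unspecified. Events without a selection
    contribute 0 to the average.
    """
    if not events:
        return 0.0
    total = 0.0
    for event in events:
        pos = position_of(event)
        if pos is not None:
            total += 1.0 / pos
    return total / len(events)


if __name__ == "__main__":
    ranking = ["weather today", "weather tomorrow", "weather radar"]
    events = [
        SuggestionEvent(ranking, "weather today"),   # selected at position 1
        SuggestionEvent(ranking, "weather radar"),   # selected at position 3
        SuggestionEvent(ranking, None),              # no selection
    ]
    print(round(quality_measure(events), 4))
```

Under this assumed measure, selections near the top of the ranking raise the score more than selections further down, which is one way to combine user activity with position. Other weightings, for example discounting by the logarithm of the position, would fit the same description.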