A method for teleconferencing includes receiving a first media stream that carries a first audio and a second media stream that carries a second audio, and receiving a first audio weight for weighting the first audio and a second audio weight for weighting the second audio. The first audio weight and the second audio weight are different from each other and are determined based on at least one of (i) content of the first audio and content of the second audio or (ii) received customization parameters. The method further includes generating a mixed audio by combining a weighted first audio, which is based on the first audio weight, and a weighted second audio, which is based on the second audio weight.
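
The combining step can be illustrated with a minimal sketch. It assumes details the abstract does not specify: each audio is a sequence of decoded samples normalized to the range [-1.0, 1.0], the combination is a per-sample linear weighted sum, the shorter audio is padded with silence, and the result is clipped back to that range. The function name `mix_audio` and its parameters are hypothetical and chosen only for illustration. The weights are taken as inputs, since the method receives them rather than computing them.

```python
from typing import List, Sequence


def mix_audio(
    first_audio: Sequence[float],
    second_audio: Sequence[float],
    first_weight: float,
    second_weight: float,
) -> List[float]:
    """Combine two audios into one mixed audio using received weights.

    Assumption: samples are floats in [-1.0, 1.0] and the mix is a
    per-sample linear weighted sum (the abstract does not fix either).
    """
    length = max(len(first_audio), len(second_audio))
    mixed: List[float] = []
    for i in range(length):
        # Pad the shorter audio with silence (0.0), an illustrative choice.
        a = first_audio[i] if i < len(first_audio) else 0.0
        b = second_audio[i] if i < len(second_audio) else 0.0
        # Weighted first audio plus weighted second audio.
        sample = first_weight * a + second_weight * b
        # Clip to the valid sample range to avoid overflow on playback.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Example: the first audio is emphasized over the second (0.75 vs 0.25).
print(mix_audio([0.5, -0.5], [0.25, 0.25], 0.75, 0.25))  # [0.4375, -0.3125]
```

In this sketch the unequal weights are what make one audio more prominent in the mixed audio. How the weights are derived, whether from the audio content or from customization parameters, is outside the scope of the example.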