Here are some suggestions for distinguishing music from voice. Music usually has melody, which moves across a wider range of sustained pitches than speech does. Polyphonic music also has harmony: several notes sound at once as chords, which a single voice cannot produce. A chord therefore contains several fundamentals at the same time, each with its own series of harmonics, while a voice has only one fundamental (and its harmonics) at any given moment.
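As a rough illustration of the chord idea, here is a minimal sketch that counts prominent peaks in the magnitude spectrum of one frame. Everything in it is my own assumption rather than an established method: the function names, the 8000 Hz sample rate, the 512-sample frame, and the 10% threshold are all hypothetical, and the naive DFT is only there to keep the example self-contained (a real-time version would use an FFT library).

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2. O(N^2), for illustration only."""
    n = len(frame)
    # Hann window to reduce spectral leakage between neighbouring bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc))
    return spectrum

def count_prominent_peaks(spectrum, rel_threshold=0.1):
    """Count local maxima at least rel_threshold times the largest magnitude."""
    floor = rel_threshold * max(spectrum)
    return sum(
        1
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > floor
        and spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
    )

def tone_mix(freqs, sample_rate=8000, n=512):
    """Synthetic test frame: a sum of pure sine tones (hypothetical test signal)."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]

single_tone = count_prominent_peaks(magnitude_spectrum(tone_mix([250.0])))
chord = count_prominent_peaks(magnitude_spectrum(tone_mix([250.0, 312.5, 375.0])))
print(single_tone, chord)
```

With these synthetic tones the single note gives one peak and the three-note mix gives three. Real recordings are much messier, since every note and every voiced sound brings its own harmonics, so a count like this is only useful when you compare it across your own music and voice samples.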
As general guidance, I would write tests based on real-time Fourier analysis and run them on samples covering the range of music and the range of voice that you want to distinguish. You have to decide on those ranges up front so that you know when you have succeeded. Each test yields a measurement of effectiveness that you can use to direct the evolution of your ideas and your program. Because there are only two possible answers, guessing already gets about 50% right on a balanced sample set, so a test program that gets 50% correct answers is 0% successful, while one that gets 100% correct answers is 100% successful for that set of samples.
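The scoring rule above can be sketched as a linear rescaling of accuracy. This is a minimal sketch under my own assumptions: the function name `chance_corrected_score` and the label strings are hypothetical, and it assumes exactly two classes with roughly equal numbers of music and voice samples.

```python
def chance_corrected_score(predictions, labels):
    """Map accuracy to a success score: 50% correct -> 0.0, 100% correct -> 1.0.

    Assumes two classes and a balanced sample set, so that chance level is 50%.
    A negative score means the classifier does worse than guessing.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    accuracy = correct / len(labels)
    return 2.0 * accuracy - 1.0

# Hypothetical ground truth and classifier output for four samples.
labels = ["music", "voice", "music", "voice"]
predictions = ["music", "voice", "music", "music"]
print(chance_corrected_score(predictions, labels))
```

Tracking this one number per test run makes it easy to see whether a change to the analysis actually helped on your sample set.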