Vocalizations, like any other phenotypic trait, can change over time. The causes of these changes can be broadly divided into drift (the random accumulation of mutations that are generally neutral in terms of fitness) and selection (changes that are under some directional pressure and tend to increase fitness). The forms of drift and selection fall into five major categories. In any given population, these categories can act independently of one another, act in concert, or oppose each other. The five categories are Cultural Drift, Genetic Drift, Cultural Selection, Natural Selection, and Sexual Selection.
Cultural Drift is the process by which changes in vocalizations occur by chance. These changes can come from imitation errors as young individuals attempt to copy the sounds produced by adults. They can also arise as innovations, where an adult incorporates a new sound element into its vocalization. The accumulation of these changes can eventually lead to the formation of new vocal types.
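The accumulation of imitation errors described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not a model drawn from the literature: a vocalization is represented as a list of integer syllable types, each learner copies its tutor one syllable at a time, and the names `copy_song`, `simulate_cultural_drift`, and `divergence`, as well as the error rate and the number of syllable types, are hypothetical choices made for illustration.

```python
import random


def copy_song(tutor_song, error_rate, n_syllable_types, rng):
    """Return a learner's copy of tutor_song with random imitation errors."""
    learner_song = []
    for syllable in tutor_song:
        if rng.random() < error_rate:
            # Imitation error: substitute a randomly chosen syllable type
            # (assumed here to be uniform over all types).
            learner_song.append(rng.randrange(n_syllable_types))
        else:
            learner_song.append(syllable)
    return learner_song


def simulate_cultural_drift(founder_song, generations, error_rate=0.05,
                            n_syllable_types=10, seed=0):
    """Copy a song down a single tutor-to-learner lineage.

    Returns a list of songs, one per generation, starting with the founder.
    """
    rng = random.Random(seed)
    lineage = [list(founder_song)]
    for _ in range(generations):
        lineage.append(
            copy_song(lineage[-1], error_rate, n_syllable_types, rng)
        )
    return lineage


def divergence(song_a, song_b):
    """Fraction of positions at which two equal-length songs differ."""
    return sum(a != b for a, b in zip(song_a, song_b)) / len(song_a)


if __name__ == "__main__":
    founder = [0, 1, 2, 3, 4, 5, 6, 7]
    lineage = simulate_cultural_drift(founder, generations=200)
    for generation in (0, 50, 100, 200):
        print(generation, divergence(founder, lineage[generation]))
```

No selective advantage is attached to any syllable in this sketch, so whatever divergence from the founder song appears is due to chance copying errors alone.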
Genetic Drift is the random accumulation of mutations at loci that regulate sound production. As these mutations accumulate, the physiological and mechanical abilities of an organism to make sounds may be altered. For this reason, genetic drift is likely to have a greater effect on the evolution of vocalizations when the mutations happen to affect the limits of an animal's performance.
Cultural Selection occurs when there is differential propagation of vocalizations across generations. This can occur in the form of vertical transmission from parents to offspring, horizontal transmission between peer groups or siblings, or oblique transmission from adults to unrelated young. Unlike the following two mechanisms, cultural selection is not directly driven by fitness. Instead, variations in vocalizations can spread through a population for other reasons. One example is when a dominant individual uses one particular variation and not others. Another is when a particular frequency transmits through a habitat better than others, such as how low frequency sounds travel farther through dense foliage than high frequency sounds. This would lead more young individuals to be exposed to low frequency sounds and so learn to imitate them.
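The habitat example above can be sketched as a simple biased-transmission recursion. This is an illustrative toy under stated assumptions, not a published model: there are only two variants (low and high frequency), each learner is assumed to adopt a variant in proportion to how often it is heard, and the function name `biased_transmission` and the audibility weights `w_low` and `w_high` are hypothetical.

```python
def biased_transmission(p_low, w_low, w_high, generations):
    """Track the share of singers using the low frequency variant.

    p_low   -- initial fraction of adults using the low frequency variant
    w_low   -- assumed relative audibility of the low frequency variant
    w_high  -- assumed relative audibility of the high frequency variant

    Each generation, learners adopt a variant in proportion to how often
    they hear it, which is its frequency among adults times its audibility.
    """
    trajectory = [p_low]
    for _ in range(generations):
        heard_low = p_low * w_low
        heard_high = (1.0 - p_low) * w_high
        p_low = heard_low / (heard_low + heard_high)
        trajectory.append(p_low)
    return trajectory


if __name__ == "__main__":
    # Assumption for illustration: low frequency sounds are heard twice
    # as often as high frequency sounds in dense foliage.
    for generation, share in enumerate(biased_transmission(0.1, 2.0, 1.0, 10)):
        print(generation, round(share, 3))
```

With equal weights the share of each variant stays constant, while any audibility advantage for the low frequency variant makes its share rise every generation, even though no fitness difference between singers is assumed.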
Natural Selection can influence vocalizations directly, because of some fitness benefit that a particular vocalization gives the signaler, or indirectly, by altering some physical structure that is used in sound production (bill morphology changing as it adapts to different seed sizes, for example). The most commonly discussed role of natural selection in vocal evolution is through the process known as reinforcement. Reinforcement occurs when two populations have diverged to the point where hybrids between them are less fit than purebred members of either population. This might be because the two populations have split to use foods of two different sizes. A hybrid might not be good at consuming either food size, and so be less fit. If such a hybrid disadvantage exists, natural selection is expected to favor individuals of each population that tend to avoid mating with individuals of the other population. Vocalizations are frequently the first form of contact that two individuals have, and so they are in a unique position to mediate interactions and will tend to evolve towards greater species-level specificity.
Sexual Selection can take the form of intersexual selection or intrasexual selection. Intersexual selection can drive the evolution of vocalizations in several ways: through the preference of one sex (usually the female) for particular vocalizations of the other sex (usually the male); through sensory bias, where one sex (usually the male) uses a vocalization that the other sex (usually the female) is predisposed to respond to; when a display can only be produced by individuals of high fitness; when the production of a display carries some fitness cost, such as increased risk of predation; or when a vocal display can inform the receiver of its likely genetic compatibility with the sender. Intrasexual selection on vocalizations comes in the form of members of the same sex (usually males) using vocalizations to compete with one another. Here, the evolution of vocalizations can occur when vocalizations contain information about the sender, such as the sender's size, strength, willingness to fight, or social status. Facets of vocalizations that are often favored include increased vocal complexity, high amplitude, low frequency, and high calling rate.
These mechanisms for the evolution of vocalizations have been most thoroughly studied in bird songs. However, bird calls may be susceptible to all of these types of evolution as well. This would be particularly true of calls that are learned rather than innate, of which more and more examples are being discovered.