Technology, AI and ethics.

Thinking AI’s Voices: Gender and Identity


by Aude Gouaux-Langlois and Belinda Sykora

Driving home from an appointment in town, you turn on your car’s GPS to help you find your way. “Arrived at destination”, says the voice. In the elevator, you send a voice message to your friend to confirm dinner plans: “I’ll be there at 8pm”. Even though you no longer pay attention to it, the announcement “Third floor” resonates in the elevator just before the doors open onto your apartment’s floor. Once home, you are greeted by Alexa, who immediately responds in the affirmative when you ask her to play “A Hard Day’s Night” by The Beatles. The speakers blast the song and you distinctly say “Alexa, softer”, which brings the volume down to a more comfortable level.

In our daily life we are surrounded by voices. This constant stream is made up of our inner voice, the voices of others (such as our coworkers, friends or people in our environment), recorded voices such as those of the elevator or voice mail, as well as computer-generated voices like Siri or Alexa. The voice acts as the medium and embodiment of artificial intelligence. When we deal with AI, we use our voice as a tool to interact with this kind of technology, which in turn manifests itself through sound. The question then arises as to which sound is given to AI, in other words, its materiality. What could AI sound like? Do variations exist? Because AI is computer-generated, its gender can be decided: masculine, feminine or gender-neutral. Is it possible to give AI a gender-neutral sound? In the modern age of programming, there is a noticeable feeling that we are starting from a blank page. But is that really so?

Before the invention of the phonograph, the voice was confronted with the written. The voice stood for the living, the immediate, the floating in contrast to writing, which embodied rigidity, stability, and permanence. When Edison recorded the voice and played it for the first time, people felt uncomfortable, unheimlich. Indeed, the voices of dead people could be heard and recalled. Bodiless voices were floating among the living. With the possibility of reproducing voice and its history, which can be traced back for over a hundred years, we are now confronted with a previously unknown phenomenon: a variety of bodiless voices constantly buzz around us and feed us with information. One of these voices is the voice of AI, which will be increasingly important in the future.

In order to give a brief overview of the dimension of voice and how it is perceived, the possibilities of the voice will be described first. As a performative phenomenon, the voice can be described through several characteristics of its appearance. It carries temporality and spatiality in itself. It is immediately ephemeral, a fleeting event, neither rigid nor reproducible, retrievable or repeatable. The voice represents itself in sound and is wrapped in sound. The fact that the voice can be described in terms of tonalities and the feelings it conveys does not mean that we can simply define it by a list of characteristics. It is perceived as sensual, psychological, physical, semiotic and mediating content. It has an intersubjective effect as a separating and connecting element. The voice not only creates identity for an individual, but at the same time forms community and conventions. Contradiction is therefore always an essential characteristic: the voice is a paradox par excellence. The AI’s voice is created in the style of the human voice, in which we perceive femininity and masculinity as well as the roles we attribute to both.

Judith Butler developed the concept of the “performativity of gender” at the beginning of the 1990s. She challenged biological gender as a category to define identity: gender is determined not only by biological sex but also through acts of speaking and doing. In Gender Trouble, she questions the binary gender categories “man” and “woman” and states that they are constructions of language. According to Butler, the construction of gender starts when the baby is born. At birth, it is immediately named and categorized, moving from being “it” to “she” or “he”.[1] From then on, the repetition starts and never ends. Being always named and described as “man” or “woman” also confines a person within the norm. Performance in Butler’s sense is not a one-time original event, but repetitive and constrained by norms and conventions: “Performativity is not a singular act, but a repetition and a ritual, which achieves its effects through its naturalization in the context of a body, understood, in part, as a culturally sustained temporal duration”.[2]

We expose ourselves, consciously or unconsciously, to repetitive auditory information encapsulated in the voice and process it unconsciously. As human beings, we learn through repetition: by repeatedly hearing reproducible bodiless voices, we also uncritically assimilate their sound characteristics. Accordingly, among other things, stereotypical gender roles can be spread, repeated, and perceived again and again.

The sound of the voice in relation to biological sex is subject to certain evaluations depending on the socio-cultural imprints of a society – and this applies in particular to the pitch of the voice. There are indeed tendencies to attribute certain characteristics to the female voice, such as “emotional”, “loving”, or “helpful”. In contrast, the male voice is attributed characteristics such as “dominance”, “assertiveness” or “competence”.

In a frequency range between 175 Hz and 262 Hz, we speak of the female voice. The male voice tends to lie in the range between 98 Hz and 131 Hz.[3] Depending on social norms, a male voice that speaks in the higher frequencies is associated with female characteristics; the same applies to a female voice in lower frequencies. A deep female voice, for example, is considered more competent than a high-pitched female voice, while a male voice with a high frequency is considered less competent. In order to avoid such evaluations and categorizations, attempts are now being made to develop a gender-neutral voice for AI. The seemingly logical conclusion is that a neutral voice must lie in the frequency range in between. But is that really the case? “Q” – a gender-neutral voice developed by the agency Virtue together with the Danish linguist Anna Jørgensen – lies in the frequency range from 145 to 175 Hertz.[4] It therefore touches the lowest part of the female frequency range but does not reach into the highest part of the male range. The reason for this particular frequency range lies in the method: five voices were chosen because they did not sound “typically” male or female. These were played to test subjects several times and adjusted again and again until the majority of the voices were perceived as gender-neutral. In this study, the research method is thus based on how people perceive the voice. In other words, perception is a key element in deciding how the neutral voice must sound. Thus, the way we deal with new technologies, such as programming the voice of AI, reflects the way we relate to voice in our daily life and in society.
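To make the relation between these ranges tangible, the figures quoted above can be written down as a small sketch. It only restates the numbers from the text; treating them as hard cut-offs, as well as the names `pitch_labels`, `MALE_RANGE`, `FEMALE_RANGE` and `Q_RANGE`, are assumptions made for illustration and not part of the Q project or of any speech library.

```python
# Pitch ranges in Hz as quoted in the text. Using them as hard cut-offs is an
# assumption for illustration; perceived gender depends on far more than pitch.
MALE_RANGE = (98.0, 131.0)
FEMALE_RANGE = (175.0, 262.0)
Q_RANGE = (145.0, 175.0)


def pitch_labels(f0_hz: float) -> list[str]:
    """Return the name of every quoted range that contains the given pitch."""
    ranges = {"male": MALE_RANGE, "female": FEMALE_RANGE, "Q": Q_RANGE}
    return [name for name, (low, high) in ranges.items() if low <= f0_hz <= high]


# A few sample pitches, including the gap between the male range and Q.
for f0 in (110.0, 138.0, 160.0, 175.0, 220.0):
    print(f0, pitch_labels(f0) or ["outside all quoted ranges"])
```

Under these assumptions, a pitch between 131 Hz and 145 Hz belongs to none of the quoted ranges, and only the boundary value of 175 Hz falls into both the female range and that of Q. The numbers alone therefore do not settle what sounds neutral, which is why the study relied on listeners’ perception.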

Two famous AI voices are those of Alexa and Siri – two AI-based tools that have an assistant function. Their clear service purpose, combined with their recognisably female voices, shows that the female voice still carries these outdated stereotypical ideas – even in new technologies such as the programming of voices for AI.

However, this view of course falls short if one assumes that voices are networked globally, since, as already mentioned, different assessments take place depending on the culture and development of a society. Therefore, the question arises whether the use of a neutral voice is necessary in order to break up old evaluation structures. On the one hand, the use of the gender-neutral voice would remove the opportunity to reinterpret and take advantage of the diversity of feminine and masculine voices. On the other hand, this neutral sound would dissolve the gender gap as well as enable a separation between a human being and a computer.

Is it even possible to find a global answer?

The work of the artist Holly Herndon points towards a direction of symbiosis, a fusion of the masculine and the feminine. Holly Herndon collaborated with the AI expert Jules LaPlace to create Spawn, a neural network contributing to her composition process. Holly Herndon writes a score, records it with her ensemble of five singers, and feeds it into Spawn. During Herndon’s concert at the Volksbühne in Berlin, the ensemble started a call and response with the audience. The recorded result was then fed into Spawn and contributed to “teaching her”.[5] Interestingly, the AI is fed with a multiplicity of singing voices, which results in a genderless mixed choir. This artistic approach can serve as inspiration for developing AI voices in a humanistic way.

The embodied identity of AI is carried within the voice. For this reason, it is very important to understand the consequences of the voice given to AI. We want to underline the importance of opening up a new way to think about voice, identity, and AI. As stated earlier, we perpetuate stereotypes in a new technology. Is the non-binary voice a solution? We consider it to be a step, but it is not yet the end of the thinking process. Referring to Butler, we see gender as a spectrum, and the AI voice can likewise take up this idea. Aware that we are entering a relatively new field, we like the idea of shaping a voice that integrates the feminine and the masculine as a spectrum in order to escape these binary role models. The feminine and the masculine are present in various forms in each human being, and we now have the opportunity to think further and sketch a new path that includes ethical thinking, artistic approaches, and a cultural dialogue.

  • [1]

    Butler, J. (1993). Bodies that matter: on the discursive limits of “sex”. New York: Routledge

  • [2]

    Butler, J. (1999). Gender Trouble: Feminism and the Subversion of Identity. New York: Routledge

  • [3]

    Habermann, G. (1978, 1985). Stimme und Sprache. Eine Einführung in ihre Psychologie und Hygiene. Stuttgart: Georg Thieme Verlag

  • [4]

  • [5]

    Holly Herndon uses the pronoun she/her referring to Spawn.

Aude Gouaux-Langlois

Aude Gouaux-Langlois is a composer, musician and sound artist from France, working with different sound sources that she removes from their context and combines with her own voice. Her work merges music, sound design and technology in an organic way.

Belinda Sykora

Belinda Sykora lives and works as an artist, musician and theorist in Berlin and Vienna. Her works deal with language and sound in an interdisciplinary way, using different technical means to play with the perception of the recipients. These include binaural sound walks, sound installations, radio plays and performances. Aude Gouaux-Langlois and Belinda Sykora founded the artist collective Ekheo in 2016 during their master's degree in "Sound Studies" at UdK Berlin. Their work about the voice includes artistic research in the field of auditory culture, performance, radio art and experimental music.
