sdf PhD





Exploring visual representation of sound
in computer music software through
programming and composition



Selected content from
a thesis submitted with a portfolio of works to the University of Huddersfield in partial fulfilment of the requirements for the degree of Doctor of Philosophy

December 2013

Samuel David Freeman

Minor amendments
April–June 2014

2.2 Looking at psychological aspects

Whereas the physical motions of sounding objects may give rise to visualisation by virtue of mechanical interaction between materials, the psychoacoustic aspects of sound, constructed as they are within the mind, require abstract systems of symbolic representation to bring them into the visual domain. Once in the visual domain, any representation of auditory-domain attributes must also undergo the physiological and psychological processes of visual perception in order to be comprehended by the human observer. Within such discussion one may ask to what extent perceptions in either domain result directly from human physiology, and to what extent cultural conditioning influences the interpretation of physical (light or sound) stimuli. The subject of synaesthesia between sound and visual perception may also spring to mind, but it is beyond the scope of this project to investigate these factors. It is, nonetheless, useful to give some context regarding a few of the differences and similarities between perception in these two sensory domains.

2.2.1 Morphophoric mediums and (in)dispensability

The following points of reference from Roger Shepard (1999) serve here as a contextualising primer for the exploration of strategies for representing musical pitch and organised time in the creation of the portfolio works.

Shepard explains that both pitch and time, like visual space, are 'morphophoric mediums' (p. 154, italics removed):

In their 1971 paper Attneave and Olson made the point that there is something fundamental about pitch, as there is about time. There are many other attributes to the organisation of sound – such as timbre, spatial location, and loudness – but the attributes of pitch and time have special importance. Attneave and Olson called pitch a morphophoric medium, meaning that it is a medium capable of bearing forms.
Visual space is also a morphophoric medium. For example, if a right triangle is presented in space, the triangle can be moved around and still be recognised as the same triangle. The medium of space is therefore capable of bearing a form that preserves its identity under transformation. Similarly, pitch patterns like simple melodies and harmonies can be moved up and down in pitch and still be recognized by musicians as being the same pattern. […]
Time is a powerful morphophoric medium. Other dimensions used in music (such as loudness, timbre, and spatial location) are not morphophoric. […]

Shepard continues with examples that contrast visual and auditory perception in terms of cognitive attributes that are either dispensable or indispensable (pp. 155–157). This concept of the '(in)dispensability of attributes' can be illustrated by an experiment, summarised as follows:

In the visual domain a pair of projectors are used to project spots of coloured light that have two main attributes: colour and location.[n2.8] Loudspeakers, taken as the auditory counterparts to the projectors, are used to produce simple (test tone) sound objects, such that the main attributes in the auditory domain are pitch and location.[n2.9]

[n2.8]   Different colours from those described by Shepard are used in this retelling of the situation. The size and shape of the projected spots are to be considered equal regardless of projector angle, etc.

[n2.9]   For this experiment, imagine the front three speakers of a standard 5.1 surround sound configuration.

First, in the visual domain, a red spot and a green spot are projected side by side, and the human observer perceives two discrete objects. If the two projectors are moved so that the spots of light overlap, thus dispensing with the location attribute, then the observer sees a single object: a yellow spot of light. Location is indispensable in visual perception because the two spots of light cannot be seen as separate without that attribute. For the auditory counterpart to this first part of the experiment, middle C is sounded from the left speaker while the right speaker plays the E above middle C, and the two different tones are heard coming from the two sides. To dispense with the location attribute, the centre speaker is used to play the same two tones together; the observer still correctly perceives the two pitches in the sound, and so the location attribute can be said to be dispensable in auditory perception.
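The visual half of this first step rests on additive colour mixing: overlapping red and green light is seen as yellow. A minimal sketch, assuming an 8-bit RGB model of the projected light. This is an illustrative simplification: it models the physical sum of the two lights, not the perceptual process by which that sum is seen as yellow, and the function name is hypothetical.

```python
def mix_additive(*lights):
    """Sum RGB light sources channel by channel, clipping at full (8-bit) intensity.

    Assumption: each light is an (r, g, b) tuple with channels in 0..255.
    """
    return tuple(min(255, sum(channel)) for channel in zip(*lights))

RED = (255, 0, 0)
GREEN = (0, 255, 0)

# Overlapping the two spots removes the location attribute; the summed light is yellow.
print(mix_additive(RED, GREEN))  # (255, 255, 0)
```

Nothing comparable happens to the two tones from the centre speaker: their frequencies remain distinct in the combined signal, which is why the pitches survive the loss of location.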

Next, the two spots of light are again projected side by side, but this time both spots are yellow, so as to dispense with the attribute of colour. For the auditory-domain equivalent, the attribute of pitch is dispensed with while that of location is maintained, by using both the left and right speakers to play a tone of the same pitch: the D above middle C. While the observer continues to see the two yellow spots as discrete objects (thus colour is dispensable), only one sound object is perceived, located between the two speakers (and thus pitch is indispensable).
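To make the auditory stimuli concrete, the frequencies of the three test tones can be computed. A minimal sketch, assuming twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbering in which middle C is note 60; the thought experiment itself specifies none of these, and `midi_to_hz` is a hypothetical helper.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Twelve-tone equal temperament: each semitone multiplies frequency by 2**(1/12).

    Assumes MIDI numbering, where A4 is note 69 and middle C (C4) is note 60.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12)

# Test tones used in the experiment (middle C, and the D and E above it).
for name, note in [("C4", 60), ("D4", 62), ("E4", 64)]:
    print(f"{name}: {midi_to_hz(note):.2f} Hz")
# C4: 261.63 Hz
# D4: 293.66 Hz
# E4: 329.63 Hz
```

In the second part of the experiment both speakers receive the same D4 frequency, so no pitch difference remains to separate the two sources.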

One might think that pitch in audition is analogous to color in vision because both are frequency-related phenomena. However, the experiment using tones and speakers shows that the two are not analogous, because pitch is indispensable and color is dispensable. […] The indispensability of space in vision and of pitch in audition are parallel to both of those attributes being morphophoric media. So the analog to visual space is not auditory space but auditory pitch. (Ibid., §13.7)

2.2.2 Paradox and pitch perception

One finds paradox at various junctures in the unpacking of concepts within this research, and this has been a source of both inspiration and frustration in the creative process. Continuing the theme of contrasting visual and auditory perceptions, and moving towards the discussion of pitch- and frequency-space representations – while also maintaining a connection to the works of Roger Shepard – here are the words that open David Benson's section on 'musical paradoxes' in Music: A Mathematical Offering (2007, p. 158):

One of the most famous paradoxes of musical perception was discovered by R. N. Shepard, and goes under the name of the Shepard scale. Listening to the Shepard scale, one has the impression of an ever-ascending scale where the end joins up with the beginning

Figure 2.4: Penrose stairs optical illusion

The Shepard scale illusion is a 'demonstration of pitch circularity'.[n2.10] 'Pitch is the subjective [psychoacoustic] variable corresponding most closely to the objective [physical] variable frequency' (Loy, 2006, p. 158), and the Shepard scale auditory illusion is achieved by playing on the human perception of octaves within the pitch attribute of sound.

[n2.10]   The ASA presents Auditory Demonstrations with links to sound files: 'The first is a discrete scale of Roger N. Shepard, the second is a continuous scale of Jean-Claude Risset.' (“Acoustical Society of America - Circularity in Pitch Judgement,” 1995)

[n2.11]   Public domain image, online at (accessed 20130310)

While Benson uses the visual analogy of an ever-ascending staircase (Penrose stairs, see Figure 2.4[n2.11]), Roger Shepard himself (in Cook, 1999, p. 158) compares the effect to the upward spiralling of a barber pole, in connection with the visualisation of pitch values on a helix. Shepard explains the significance of a helical visualisation as a psychologically founded representation of pitch (ibid., p. 157):

By examining the abilities of subjects to perceive and to produce pitch patterns under transformation, an appropriate representation of pitch can be found. [It has been] found that this representation corresponds to the log frequency scale, which is nice in that it preserves […] any interval under transformation, such as any arbitrary microtonal relationship. The log scale does not, however, represent the fact that certain intervals are special, such as octaves and perfect fifths. Ethnomusicological studies have found that octaves and perfect fifths seem to be culturally universal; that is, although musical systems and scales of all cultures differ in many respects, they almost always contain an octave relationship, and often the perfect fifth as well. […]
The German physicist Moritz Drobisch (1855) proposed that tones be represented on a helix. This helix, on the surface of a cylinder, places octaves immediately above and below each other.
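The claim that the log frequency scale 'preserves […] any interval under transformation' can be checked numerically. A minimal sketch, measuring intervals in cents (1200 × log2 of the frequency ratio); the helper names and the transposition ratio of 1.37 are arbitrary illustrative choices:

```python
import math

def cents(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents: 1200 * log2 of the frequency ratio."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def transpose(freqs_hz, ratio):
    """Transposing a pattern multiplies every frequency by the same ratio."""
    return [f * ratio for f in freqs_hz]

def intervals(freqs_hz):
    """Successive intervals of a pattern, in cents (log-frequency distances)."""
    return [cents(a, b) for a, b in zip(freqs_hz, freqs_hz[1:])]

def differences_hz(freqs_hz):
    """Successive differences of a pattern, in hertz (linear-frequency distances)."""
    return [b - a for a, b in zip(freqs_hz, freqs_hz[1:])]

pattern = [261.63, 293.66, 329.63]    # roughly C4, D4, E4
moved = transpose(pattern, 1.37)      # an arbitrary, microtonal transposition

# Log-frequency intervals survive transposition; linear differences are scaled.
print([round(c, 3) for c in intervals(pattern)])
print([round(c, 3) for c in intervals(moved)])
print(differences_hz(pattern), differences_hz(moved))
```

As the quotation notes, this linear-in-log representation treats every interval alike; nothing in it marks the octave as special, which is what the helix adds.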

As a three-dimensional model, the Drobisch type of helix readily manifests the pitch-class and octave-level aspects of perceived pitch in the visual domain.[n2.12] Shepard goes on to describe how extensions to the concept, employing double helix, toroid, and helical cylinder forms, may provide visualisation of the special relationships of both the octave and the fifth. However, given the two-dimensionality of both paper and screen, which are where my compositional practices take place, the three-, four-, and five-dimensional constructs described by Shepard seem impracticable. Of course, it is perfectly commonplace for three-dimensional objects to be represented on flat surfaces, but it was nevertheless decided early on in this project that the visual representations to be explored would all be two-dimensional constructs that may suggest, but do not require, higher dimensions. By working only on the plane, conflicts of perception and problems related to perspective and to judging adjacency between points of a viewed object (the very things on which the paradoxical illusion of the Penrose stairs depends) can be avoided.

[n2.12]   As an aside, it is noted that the date of origin for the Drobisch type of helix has been cited as the mid-nineteenth century.
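For reference, the mapping that the Drobisch helix embodies can be stated compactly, even though the portfolio works stay on the plane. A minimal sketch, not taken from Shepard: the angle encodes position within the octave and the height encodes log frequency, so one octave is one full turn. The function name, the middle-C reference frequency, and the unit rise per octave are hypothetical choices.

```python
import math

def helix_point(freq_hz, ref_hz=261.63, radius=1.0, rise_per_octave=1.0):
    """Map a frequency to (x, y, z) on a Drobisch-style pitch helix.

    The angle encodes position within the octave (pitch-class) and the
    height encodes log frequency, so one octave is one full turn.
    """
    octaves = math.log2(freq_hz / ref_hz)   # signed distance from ref_hz, in octaves
    pitch_class = octaves % 1.0             # fraction of a turn, 0 <= pitch_class < 1
    angle = 2.0 * math.pi * pitch_class
    return (radius * math.cos(angle),
            radius * math.sin(angle),
            rise_per_octave * octaves)

# Tones an octave apart share x and y; only the height z differs.
for f in (261.63, 523.26, 1046.52):
    x, y, z = helix_point(f)
    print(f"{f:8.2f} Hz -> x={x:+.3f}, y={y:+.3f}, z={z:+.3f}")
```

Discarding z leaves only the circle of pitch-class positions, on which octave-related tones coincide; the octave-level information then has to be carried some other way.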

Returning to Shepard describing the auditory demonstration of circularity in pitch perception (ibid., p. 158):

The inspiration for generating the circular Shepard tones came from the fact that Max Mathews at Bell Labs had created the first program for generating sounds using the computer. This aroused great excitement, and the author did an experiment using a subdivision of the octave into 10 equal steps instead of the traditional 12. A note's position within the octave is called its Chroma, and the chroma circle is the base of the helix [described above]. The tones were formed by using a symmetrical structure, so that after an octave had been traversed, the ending tone was identical to the beginning tone.
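The 'symmetrical structure' Shepard describes can be sketched in terms of the components of each tone. The sketch assumes the common construction of a Shepard tone as octave-spaced sinusoids under a fixed bell-shaped envelope over log frequency. The Gaussian envelope, the 20 Hz lowest component, and the nine-octave span are hypothetical choices, not values from Shepard's experiment; only the ten equal steps per octave come from the quotation.

```python
import math

def shepard_partials(step, steps_per_octave=10, f_min=20.0, n_octaves=9):
    """Return (frequency_hz, amplitude) pairs for one discrete Shepard tone.

    The tone is a stack of octave-spaced sinusoids; a fixed bell-shaped
    envelope over log frequency fades the lowest and highest components.
    """
    # Position within the octave; wraps so that step N is the same as step 0.
    pitch_class = (step % steps_per_octave) / steps_per_octave
    centre = n_octaves / 2          # envelope peak, in octaves above f_min
    width = n_octaves / 6           # envelope spread (hypothetical choice)
    partials = []
    for octave in range(n_octaves):
        x = octave + pitch_class    # log2(freq / f_min)
        amplitude = math.exp(-0.5 * ((x - centre) / width) ** 2)
        partials.append((f_min * 2.0 ** x, amplitude))
    return partials

# Ten steps traverse one octave and arrive back at an identical tone.
scale = [shepard_partials(k) for k in range(11)]
print(scale[10] == scale[0])  # True
```

Because the envelope is fixed while the components move up beneath it, each step sounds higher than the last, yet the tenth step reproduces the first exactly.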

It is, perhaps, worthy of note that computer music software, which was new at that time, played an important role in the creative practice of Shepard's research. A second observation here is that many of my own works have, both before and since my reading of the above quotation, used equal division of the octave into a non-duodecimal number of steps.[n2.13] There is, however, a more pressing motivation for including the above quotation, and that is to question its use of the word 'chroma':

Use of that word in this context is standard (see, for example, Mauch et al., 2009; Bertin-Mahieux et al., 2010; Sumi et al., 2012), and it has been attributed to Géza Révész[n2.14] (Loy, 2006, p. 163):

[n2.13]   See, for example, the nRadii work (§4.4.1) and some of my 2010 web audio work:

[n2.14]   Géza Révész is the author of Introduction to the Psychology of Music, which was published in 1954 as an English translation of Einführung in die Musikpsychologie, published in 1946.

Révész (1954) developed a two-component theory of tone, suggesting that there are at least two principal interlocking structures in pitch [perception] which he called tone height [and] chroma

Loy also writes that in developing the demonstration of pitch circularity, 'Shepard (1964) wanted to test Révész's theory' (ibid., p. 167). It seems, nevertheless, that using a colour-rooted word to label an attribute of pitch within a spatial representation is – if not paradoxical, then at least – contradictory to the consideration of the (in)dispensability of colour compared to pitch (as outlined in §2.2.1). To remove the association of pitch perception with conceptions of colour, the term 'pitch-class' is preferred here for describing position within the octave; thus, pitch-class-circle is written instead of 'chroma circle', and the fact that this is more cumbersome to write is accepted as a paradoxical quirk. The pitch-class-circle is addressed as an aspect of the spiroid-frequency-space (§6).
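The pitch-class-circle and the non-duodecimal divisions of the octave mentioned above can be related in a short sketch: an n-step equal division of the octave places its steps at n equally spaced angles on the circle, and steps an octave apart coincide. The function names and the middle-C reference frequency are hypothetical, and the sketch is an illustration only, not a description of the spiroid-frequency-space.

```python
import math

def pitch_class_angle(step, n_steps):
    """Angle (radians) of step k on a pitch-class-circle divided into n equal parts.

    Steps an octave apart (k and k + n) land on the same angle.
    """
    return 2.0 * math.pi * (step % n_steps) / n_steps

def edo_frequency(step, n_steps, ref_hz=261.63):
    """Frequency of step k in an n-step equal division of the octave.

    Each step multiplies frequency by 2**(1/n), i.e. spans 1200 / n cents
    (120 cents when n = 10, rather than 100 cents in twelve-step tuning).
    """
    return ref_hz * 2.0 ** (step / n_steps)

n = 10  # a non-duodecimal division, as in Shepard's experiment
for k in range(0, 2 * n + 1, 5):
    print(k, round(edo_frequency(k, n), 2),
          round(math.degrees(pitch_class_angle(k, n)), 1))
```

Frequency keeps rising with k while the angle cycles, so the circle alone records pitch-class and discards octave level.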

