




Exploring visual representation of sound
in computer music software through
programming and composition



Selected content from
a thesis submitted with a portfolio of works to the University of Huddersfield in partial fulfilment of the requirements for the degree of Doctor of Philosophy

December 2013

Samuel David Freeman

Minor amendments
April–June 2014

3.1 Pixels

For the 2009 Huddersfield Contemporary Music Festival (HCMF) I collaborated with Duncan Chapman as part of The Music of Electricity (MoE) learning and participation programme. The project was connected to the Raw Material exhibition[n3.1] of recent works by Tim Head. Chapman conducted a series of workshops with A-level students and Year 6 pupils, culminating in a performance[n3.2] during the festival. I was commissioned to produce an 'interactive sound installation' for the project: a software-based system through which sounds recorded during the workshop sessions could be heard online. Though I was at first hesitant to take on the work, a decisive factor in accepting the commission came from reading texts by, and about, Tim Head, which appealed to me as both a creator and an appreciator of things:

[n3.1]   At Huddersfield Art Gallery from Saturday 21 November 2009 until Saturday 9 January 2010.

[n3.2]   26 November 2009, at Huddersfield Art Gallery.

Tim Head's work is about instability and uncertainty: of images, of perception and of the individual's relationship with the wider world. […] His work might be characterised as a search for visual equivalents for the tension between what we perceive to be the truth and what we know to be the truth. (Tufnell, 2003)


Speculation on the elusive and contrary nature of the digital medium and its unsettled relationship with the physical world and with ourselves forms the basis for recent work. […]
By operating at the primary one to one scale of the medium's smallest visual element (the pixel or inkjet dot) and by treating each element as a separate individual entity the medium's conventional picture making role is bypassed. You are no longer looking at a representation of an imported image in virtual space but are looking directly at the raw grain of the digital medium itself. The medium is no longer transparent but opaque. The slippery weightless world of the virtual is edged out into the physical world. (Head, 2008)

In particular, the idea of 'treating each [pixel] as a separate individual entity' is something that had emerged in my own programming at Masters level, and which has continued since to feature in my approach to the software medium. I eventually gave the name pixels to the MoE interactive installation which resides online at

3.1.1 The sound of electricity

Participants of the workshops were introduced to the so-called twitching-speaker way of making sound that is described in Handmade electronic music: the art of hardware hacking (Collins, 2006, p. 20) as:

a beautiful electric instrument, evoking the spirit of nineteenth-century electrical experimentation (think twitching frogs legs and early telephones) out of nothing more than a speaker, some batteries, wire, and scrap metal.

When the positive and negative terminals of a (usually 9 volt) battery are connected to the terminals of a speaker, the electromagnetic interaction within the circuit causes the speaker-cone to be pushed forward (or pulled backward if the polarity is inverted). One of the connections in the simple circuit is left unfixed so that it can be carefully closed by hand, causing the speaker to twitch when the connection is made. A variety of buzzing noises can be achieved by finding positions in which the electrical connection fluctuates, and further modifications to the sound of a twitching-speaker are possible, for example by placing loose objects upon the speaker-cone. Participants recorded and edited a variety of sounds that were made in this way.

Lo-fi electronics such as the twitching-speaker are a feature of my own performance practice. While the decision was made to focus exclusively on software-based systems in my PhD composition, I have been interested in bringing concepts of that practice into the digital domain.

3.1.2 System architecture

From a technical perspective, the system that I designed for the MoE pixels installation comprises two main parts: (1) the front end is web-based and has an interactive audiovisual element created with ActionScript3 (AS3) in Flash; and (2) the back end is a maxpat[n3.3] that, via drag-and-drop of folders containing sound files, generates the XML entries used by the Flash element of the system. The system was constructed prior to the workshops (using a dummy set of sounds from my own archive). Chapman wanted the workshop participants to categorise the recorded sounds into different types, according to their own interpretation of the material; I suggested the use of six category groups that could then be visually identified using distinct colours on screen. Each pixel of a standard computer screen comprises red, green, and blue elements (RGB), and all available colours are produced by setting the levels of these elements in each pixel. The six colours used in the MoE pixels are those made of either two (in equal measure), or just one, of the three base elements, as shown in the following table:

[n3.3]   Definition: maxpat (noun) a patch made in MaxMSP and saved with the '.maxpat' file extension.

Name:      red    yellow   green   aqua    blue    magenta
RGB mask:  1 0 0  1 1 0    0 1 0   0 1 1   0 0 1   1 0 1
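The mask scheme above can be sketched in Python (my own illustrative translation of the logic, not the AS3 source; the brightness range is an assumption):

```python
import random

# The six category colours as RGB masks, from the table above.
MASKS = {
    "red":     (1, 0, 0),
    "yellow":  (1, 1, 0),
    "green":   (0, 1, 0),
    "aqua":    (0, 1, 1),
    "blue":    (0, 0, 1),
    "magenta": (1, 0, 1),
}

def random_colour(category):
    """Scale a random brightness level by the category's RGB mask.

    Returns an (r, g, b) tuple of 8-bit channel values: channels
    masked to 0 stay 0, so the hue is fixed per category while the
    brightness of each square varies randomly.
    """
    level = random.randint(64, 255)  # lower bound avoids near-black squares
    return tuple(level * m for m in MASKS[category])
```

Because only the unmasked channels carry the random level, every 'yellow' square has equal red and green components and no blue, and so on for the other five categories.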

Six folders are named to correspond with the six colour names, both on the local machine where the sound-files are edited and categorised, and then on the web hosting which is accessible via FTP connection. The original plan was to use the jit.uldl object to handle the uploading of all the sound-files and the generated XML file to the hosting, all from within a maxpat. That aspect of the design was dropped because of issues relating to Jitter objects causing problems within the MaxMSP Runtime on the Windows OS (which is how the back end of the system was to run, while its development was on the Mac OS). The workaround for this problem was to use separate FTP software for the uploads, which meant less automation of the process but more stability.

When the MoE pixels webpage is accessed by the public audience, the sound-files are loaded in Flash, one at a time, using information from an XML file which includes the filename and designated colour of each sound-file. Upon successful completion of each load, a visual representation of the sound-file is created on screen: a random number is multiplied by the appropriate 'RGB mask' (as in the table above); the resultant colour is given to a square that is placed at randomly selected Cartesian coordinates conforming to a 20-by-20 grid upon the Flash stage; the square itself is the size of one such grid cell. When the mouse-point passes over one of these coloured squares, the associated sound-file is played and the location of the square is changed, again to a randomly selected cell of the grid. The location selection algorithm is designed so that only an 'unoccupied' cell is given as the new location for a square. After all the sound-files have been loaded, the webpage will look something like what is seen in Figure 3.1:

Figure 3.1: MoE Pixels
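The placement and relocation logic can be sketched in Python (a hypothetical reconstruction of the behaviour described above; function names are my own):

```python
import random

GRID = 20  # the Flash stage is divided into a 20-by-20 grid of cells

def place_square(occupied):
    """Pick a random unoccupied cell on the grid.

    `occupied` is a set of (column, row) tuples for cells that already
    hold a square; the chosen cell is added to it before returning.
    """
    free = [(c, r) for c in range(GRID) for r in range(GRID)
            if (c, r) not in occupied]
    cell = random.choice(free)
    occupied.add(cell)
    return cell

def move_square(cell, occupied):
    """Relocate a square: free its old cell, then claim a new one."""
    occupied.discard(cell)
    return place_square(occupied)
```

Filtering the candidate list to free cells is what guarantees that two squares never share a grid location, however many times the mouse-point triggers a move.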

It was found that in some situations the mouse-point can be moved over a square without the 'play sound and move' behaviour being triggered. Although my initial reaction to this was to wonder where I had gone wrong in the AS3 programming, it was soon embraced as a feature, rather than a bug, and the 'mouse down' (md) and 'mouse up' (mu) functions were added to the code in order to access the expected movement and sound. There is also another, less often encountered, bug by which a square seems neither able to move nor to play its sound; to address this, the md function was further modified to also trigger one randomly selected sound-file from those loaded, thus avoiding a situation in which nothing at all happens in response to mouse input.

Adapting the code of a program, during its development, to embrace unexpected behaviours as features of the software is a commonly employed strategy in my practice. In many forms of creative art, it is often the 'happy accidents', as Bob Ross put it,[n3.4] that are most compelling and that enable a work to grow beyond its original conception.

[n3.4]   See for example, or as quoted by Bennett (2005, p.48).

3.1.3 Audience interaction

None of the above description of the software is provided to the audience of the MoE pixels work. When the work is [accessed online], only three words appear on screen: the title (punctuated with a colon) at the top; a hyperlink to 'reload' the webpage (thus resetting/re-randomising the Flash element) at the side; and, at the bottom, the word 'about...' hyperlinks to general information about the HCMF MoE project (text similar to that given at the start of §3.1). In withholding instruction and description of the work, the aim is to engage the imagination of the audience, who are invited (by implication) to explore the screen and discover the sounds. The specific connections that I have made between concepts, sound-files and visual representations may or may not be identified, and it is of no consequence either way. There is no 'correct' way to interact with the system, and so no one (who can see and/or hear, and is able to control a mouse-point on screen) ought to feel excluded from participation. Talking to Duncan Chapman and Heidi Johnson (of the HCMF) about the work during its development, I would often use the term 'Flash toy', but the word toy is in no way meant to belittle the work; the most serious and sophisticated electric instruments and software can equally be referred to as such, the emphasis being that playing of/with them is fun.

Correspondence from one visitor to the work included a snapshot of their screen showing 'the fruits of [their] labour' while 'enjoying [the online installation]' – an outcome that must have taken quite some time to achieve, testifying to the success of the non-directive approach in motivating individual creativity when interacting with the MoE pixels – see Figure 3.2 (Baldwin, personal communication by email, 2009):

Figure 3.2: MoE Pixels, image from Baldwin (2009)

Another use of the system that was unexpected during its development was instigated by Chapman, who suggested adaptation of the system as a software instrument for laptop ensemble. A number of non-internet-connected laptops each had a copy of the MoE pixels system (both its front and back end parts) running locally, and A-level student workshop participants created their own sets of organised sounds to load in and play via pixels during the Gallery performance. Under Chapman's direction the students devised an indeterminate score for the performance which involved dice marked with the six colours.

One of the great appeals of creating software-based works is the way that the software medium lends itself to re-appropriation. It is also of interest to observe that audience experience of software-based artworks tends to involve some sort of participation beyond observation. In the anticipated interaction with pixels, the audience are the online visitors who become (perhaps unwitting) participants in an improvised musical experience; free interpretation and self-determined duration and direction exist within the infinite but bounded possibilities of the piece. In the described case of pixels-as-instrument, the laptop-performers (by being aware of the underlying software architecture, and interacting also with the back end maxpat element) became a sort of 'super-user' audience to my work; the performance piece that they then created was entirely their own and had its own audience and aesthetics.

3.1.4 Inherent paradox

While engaging, to some extent, with the aesthetics of the Raw Material exhibition, pixels is knowingly abstracted from the true nature of the computer screen medium. Presentation of the work suggests that each coloured square is a magnified representation of a computer screen pixel (an implication reinforced by the colon after the title above the Flash stage). Under critical examination, however, the interactive behaviour of entity relocation exhibited in the work betrays that association. A computer screen consists of many pixels which, if one brings one's eye close enough to the screen, can be seen as individual squares that are arranged in the manner of a grid or matrix.[n3.5] The appearance of visual forms on screen is achieved by the differing colouration of many pixels across the surface; human perception is generally of the represented form(s), rather than of the picture's elements (pixels) within the medium.

[n3.5]   Not all pixels are square, but I have yet to work with a computer screen on which they are not.

In pixels, when a coloured square is seen to disappear from under the mouse-point at the same time as a square of that same colour appears in another location, the perception is (potentially/probably/intended to be) of a single entity that has relocated; this identification is supported in the auditory domain by the fact that the newly appeared square will trigger the very same sound as before should it be 'moused' again. These squares are, indeed, programmed into the system as 'sprite' entities, so that from the software perspective it really is the case that the same representation of a square is having its position on screen changed. It is thus that the suggested association of those squares with pixels is undermined: in the physical medium of a computer screen the pixels do not move; they only change colour.
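The distinction drawn here – a persistent sprite whose position changes, versus fixed pixels that only change colour – can be sketched as two contrasting models (illustrative Python of my own, not the work's actual code):

```python
# Sprite model (as in Flash): one entity persists and its position changes.
class Sprite:
    def __init__(self, x, y, colour):
        self.x, self.y, self.colour = x, y, colour

    def move_to(self, x, y):
        # The same object survives the move; only its coordinates change.
        self.x, self.y = x, y

# Pixel model (as on a physical screen): a fixed grid of cells that never
# move; 'relocation' is really two colour changes at fixed positions.
def relocate_on_grid(grid, old, new, colour, background=(0, 0, 0)):
    grid[old] = background  # the old cell merely changes colour...
    grid[new] = colour      # ...and so does the new one
```

In the first model the square genuinely travels; in the second nothing travels at all, which is precisely why the sprite behaviour of pixels undermines its suggested reading as magnified screen pixels.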

One is led to consider that visual representation in software, on screen, is illusory in every instance. 'There is no spoon' is a popular refrain from the film The Matrix (Wachowski and Wachowski, 1999), and as noted earlier in this thesis (§1.2.1) there is no sound on screen: only representations of it. Further, then, one may concede that even the representations of sound that appear on screen are only percepts stimulated in the eye of the beholder by the matrix of pixels that comprise the surface of a screen.

There are different approaches that software may take to GUI programming. Flash employs a paradigm of vector-based graphics in which shapes are described in code as objects with attributes such as size, position, and colour; the programmer need not think about the actual pixels that will be used to produce the described shapes on screen, and there are many advantages to this approach. There is, however, an aesthetic appeal – as expressed by Head in the Raw Material exhibition – to the direct control of individual pixels, and many of my works incorporate an approach of matrix cell value manipulation.

3.1.5 Towards programming with matrices

One seldom thinks of the individual pixels on screen, of how each pixel is changing colour at its fixed position; rather, one accepts as concrete the visual forms that appear upon the display. Similarly, one seldom thinks in terms of the additive RGB components that create a colour that is seen on screen. By conceptualising visual representations in software from the ground up, this project has sought to nurture a sense of connection to that aspect of the software medium, akin to the connection one may expect a composer to have with the sounds with which they compose. The influence on this project of Tim Head's aesthetic can perhaps be identified in this sentiment, but the motivation of my exploration in this area is to appreciate the nature of the screen medium as the material basis for visual representations of sound in the software medium.

Data matrices that correspond directly to the actual pixels on screen are a conceptual starting point from which software systems are programmed to set matrix cell values in order to manifest visual forms on screen. The RGB channels of the GUI domain can be thought of as discrete layers. At a coding level, the three values that describe the colour of a pixel in an image are often accompanied by a fourth value, known as alpha, that is used to describe transparency (or opacity). When working with Jitter matrices in MaxMSP the arrangement of those four layers follows the ARGB format in four planes; the 'char' data type is used because its 8-bit size is sufficient for the numeric values that comprise the four layers of each cell in a grid matrix that is to be displayed as an image on screen. For example, a Jitter matrix that is given the name 'm', and horizontal and vertical dimensions that match the classic VGA standard,[n3.6] would be declared in the code of a maxpat as jit.matrix m 4 char 640 480. 'The planes are numbered from 0 to 3, so the alpha channel is in plane 0, and the RGB channels are in planes 1, 2, and 3.'[n3.7] This project has been mostly interested in working with just the RGB planes from which the GUI is manifest on screen; awareness of the alpha plane is necessary in programming, but it is considered ancillary to the visual manifestation.

[n3.6]   The VGA standard resolution is commonly known to be 640 (wide) by 480 pixels; see for example (Christensson, n.d.)

[n3.7]   From 'What is a Matrix?' in Max 5 Help and Documentation accessed within MaxMSP, but also online at
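The plane layout of such a matrix can be modelled in Python (a minimal stdlib-only stand-in for jit.matrix m 4 char 640 480; the accessor functions are my own, not part of any Jitter API):

```python
# Four planes of 8-bit values over a 640-by-480 grid of cells.
# Plane order follows Jitter's ARGB format: 0=alpha, 1=R, 2=G, 3=B.
WIDTH, HEIGHT, PLANES = 640, 480, 4

# One flat list per plane, every cell initialised to 0.
matrix = [[0] * (WIDTH * HEIGHT) for _ in range(PLANES)]

def set_cell(x, y, argb):
    """Write one cell's (alpha, r, g, b) values into the four planes."""
    index = y * WIDTH + x
    for plane, value in enumerate(argb):
        matrix[plane][index] = value & 0xFF  # keep to 8 bits, as 'char' does

def get_cell(x, y):
    """Read one cell back as an (alpha, r, g, b) tuple."""
    index = y * WIDTH + x
    return tuple(matrix[plane][index] for plane in range(PLANES))
```

Writing a visual form into such a structure is then a matter of setting cell values plane by plane, which is the ground-up, per-pixel way of working that the following sections build upon.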

