A Smarter Guitar Instruction Program


Here is an academic theory paper I completed that discusses the research and design of a guitar instruction program.

Guitar Genius

Theory Paper by Melissa Pelletier


Together with a collaborator, I have designed a guitar instruction computer program for use in school classrooms. This program can be integrated into a school’s existing curriculum as a tool for beginning guitar players, used with the guidance and aid of their instructor. In order to learn to play the guitar, one must combine learning content, theory, and musical notation with playing the actual instrument. Inherent in the process is a separation between the learning source and the instrument to practice on. It is important to present the essential material in a format that will minimize the amount of extraneous cognitive load.

The aim of instruction should be to reduce extraneous cognitive load caused by inappropriate instructional procedures (Sweller, 2005), such as the load created by switching back and forth between the content and the instrument. Learning how to play an instrument can impose great demands on working memory, and cognitive load can be further exacerbated by poor instructional design.

We are designing our program based on the cognitive theory of multimedia learning, which states that people learn better from words (verbal material) and pictures than they do from words alone (Mayer, 2005, p. 31). Conventional music instruction programs have presented aspects of lessons (listening to music, explaining techniques, reading music) as discrete activities, sometimes in different media sources. While this approach may seem more organized and intuitive, it does not account for the limitations of working memory.

A better instructional design would simultaneously combine elements of listening to, learning how to play, and reading music in one technological source. We are using a computer-based platform that will provide a multimedia, multi-modal environment for guitar instruction. The flexibility and creativity that a computer-based program allows will let us provide all of the necessary components of effective instruction.

We have utilized a number of theories and principles in the creation of our computer program; however, not all of the concepts are mentioned in this paper. The student will be presented with narrated lessons heard concurrently with an on-screen animation instructing them how to play the guitar (modality principle; Low & Sweller, 2005). They will also be presented with content, notation, and animations on the same screen, to avoid the extraneous cognitive load caused by switching back and forth between content areas (spatial contiguity; Mayer, 2005, p. 184). Their learning will be further improved by giving them control over the pacing of the instruction (learner control of pacing; Plass et al., 2009). Furthermore, we base our design and selection of theories on dual coding theory (Paivio, 1986).

As in many studies, cognitive load theory suggests that many instructional designs are ineffective because they ignore universal and fundamental aspects of cognition (Sweller, 2005). In music, schemas represent the tendency for humans to find coherence in a continuous stream of auditory events (Deliege, 2001). A schema is “a data structure for representing the generic concepts stored in memory” (Rumelhart, 1980, p. 34).

Each individual’s schemata are unique, because schemata are based largely upon individual experience. People bring to tasks imprecise, partial, and idiosyncratic understandings that evolve with experience. Additionally, these understandings are utilitarian for the most part, rather than necessarily accurate (Driscoll, 2005, p. 130).

While everyone’s schemata may be unique and seemingly impossible to test empirically, there are still ways to explain musical schema acquisition.   The capacity of schemas to represent complex musical knowledge (Dowling and Harwood, 1986) helps us to understand and to anticipate what we see and hear, and therefore provides a scaffold for applying meaning to musical information (Boltz, 2001; Kessler, Hansen and Shepard, 1984).

Dual Coding Theory:

An individual’s musical schema may be developed through a number of models for the storage of information in long-term memory. One theory is the dual-code model of long-term memory, which holds that there are two systems of memory representation, one for verbal information and the other for nonverbal information (Paivio, 1971). Thus, for input such as concrete words, two codes are possible.

The meaning of the words can be represented by the verbal system, but images of the words can also be represented by the imaginal system.  Kosslyn (1980) suggested that images may be important to learning in enabling learners to represent what is not depicted in the instruction and then to transform these representations to facilitate comprehension and problem solving.

The abstract memory of both the sounds and images associated with playing music, and of musical notation, may be generated through the dual-code model. This model is incorporated into the cognitive theory of multimedia learning, which proposes that the human information-processing system contains an auditory/verbal channel and a visual/pictorial channel (Mayer, 2005, p. 33).

There are two ways of conceptualizing the differences between the two channels – one based on presentation modes and the other based on sensory modalities.

The presentation mode approach focuses on whether the presented stimulus is verbal (such as spoken or printed words) or nonverbal (such as pictures, video, animation, or background sounds). One channel processes verbal material and the other channel processes pictorial material and nonverbal sounds. This conceptualization is most consistent with Paivio’s (1986) distinction between verbal and nonverbal systems. In contrast, the sensory-modality approach focuses on whether learners initially process the presented materials through their eyes or ears.

According to the sensory-modality approach, one channel processes visually represented material and the other channel processes auditorily represented material.  This conceptualization is most consistent with Baddeley’s (1986, 1999) distinction between the visuospatial sketchpad and the phonological (or articulatory) loop.

Whereas the presentation mode approach focuses on the format of the stimulus-as-presented, the sensory-modality approach focuses on the stimulus-as-represented in working memory. The major difference concerning multimedia learning rests in the processing of printed words and background sounds. On-screen text is initially processed in the verbal channel in the presentation mode approach but in the visual channel in the sensory-modality approach. Background sounds, including nonverbal music, are initially processed in the nonverbal channel in the presentation mode approach but in the auditory channel in the sensory-modality approach (Mayer, 2005, p. 35).

There appears to be dual coding of auditory music and speech (Deutsch, 1970; Sharps and Pollitt, 1998). Berz (1995) proposed a modified conceptualization of Baddeley’s (1990) working memory model in which an additional slave system exists for the storage and processing of music. Salame and Baddeley (1989) suggested the possibility that a filter differentially allows access to a range of acoustic cues. What is more, because we can hear and remember sounds that are distinct from speech, they also reasoned that a separate acoustic store might be available for dealing with this type of material.

Modality Principle:

Under certain, well-defined conditions, presenting some information in visual mode and other information in auditory mode can expand effective working memory capacity and so reduce the effects of an excessive cognitive load (Low & Sweller, 2005, p. 147). This effect is called the modality effect or modality principle. The instructional version of the modality effect derives from the split attention effect, a phenomenon explicable by cognitive load theory.

It occurs when multiple sources of information that must be mentally integrated before they can be understood have written (and therefore visual) information presented in spoken (and therefore auditory) form. According to the modality principle, students learn better when the associated statements are narrated rather than presented visually (Low & Sweller, 2005, p. 147).

It may be possible to increase effective working memory capacity by presenting information in both the visual and auditory modes rather than in a single mode. This technique can be applied in a multimedia, computer-based guitar instruction program by narrating all content and providing the musical notation in auditory format, while simultaneously presenting a visual animation of how to physically play the instrument. If the content were presented in a visual format on screen, it would compete with the animation’s content for processing space in the learner’s visuospatial sketchpad.

The occurrence of increased working memory capacity due to the employment of a dual, rather than a single, mode of presentation is termed the modality effect (Low & Sweller, 2005, p. 150). Allport, Antonis, and Reynolds (1972) reported two experiments that demonstrated that effective memory capacity was increased when a dual, rather than a single, modality was used.

The results indicated that participants could repeat continuous speech while concurrently processing unrelated visual items.  In one experiment, participants were required to repeat an auditory prose passage (a task known as shadowing) while simultaneously committing to memory verbal or nonverbal material.  There were three sets of test items that required memorization: a list of 15 words presented orally, 15 words presented in a written form, and 15 photographs.

Results from the memorization task showed that, in the absence of shadowing, there was no significant difference in the memorization of orally presented words, visually presented words, or the photographs.  In contrast, when participants were required to shadow the auditory passage, memorization of the orally presented words declined significantly, while memorization of the visually presented words or photographs was not significantly affected.

Apparently, concurrent performance of two tasks is more impaired when the tasks are performed in the same modality than when they are performed in different modalities. This research is useful to us in designing a guitar instruction program in that it points to the need for a combination of narration and on-screen animation to make the most effective use of working memory during the lessons.

The modality effect has also been found in research where the task performed concurrently with shadowing does not involve memory. For example, Shaffer (1975) tested a skilled typist on typing a prose message while shadowing a different prose message. Relative to the no-shadowing condition, the skilled typist could type a visually presented prose passage and simultaneously shadow a different auditorily presented prose message without a decrement in typing accuracy.

In contrast, when the prose message to be typed was auditory rather than visual, both typing and shadowing performance declined significantly.  If humans are capable of processing a higher amount of information by utilizing the auditory and visual channels at the same time, taking advantage of this will hopefully make the complicated task of learning the guitar seemingly easier.

Music does not necessarily offer the same conditions under which dual-modal materials have been tested in the past. The nature of auditory music often compels at least one information source to be delivered aurally, in contrast to language, which can be presented in either an auditory or visual mode. This situation effectively reverses the experimental comparison between formats typically used when testing for the modality effect: from a visual-visual (uni-modal presentation) to visual-auditory (dual-modal presentation) comparison, to an auditory-auditory (uni-modal presentation) to auditory-visual (dual-modal presentation) comparison (Owens & Sweller, 2008).

Owens and Sweller (2008) conducted a study in which the principles of cognitive load theory were applied to the design of an alternative to conventional music instruction hypothesized to facilitate learning.  To test both the split attention and dual-modality hypotheses, three conditions were used in the experiment.

A conventional split-attention format placed musical notation above, and separate from, written explanations. In contrast, an integrated format placed each explanatory statement directly adjacent to its associated item of musical notation. A third, dual-modal condition displayed the musical notation in an identical visual format to the integrated and split-attention conditions; however, the explanatory statements were delivered on compact disc as auditory statements.

The experiment demonstrated that spatial integration of visual text and musical notation, and dual-modal delivery of auditory text and musical notation, were superior to the spatially separated placement of the same visual materials, demonstrating the modality effects as well as spatial contiguity.

Although these propositions are fully supported by the findings of the experiment, they may contradict intuitive or traditional practices in music: listening to music, explaining concepts, and pointing out score references are often undertaken as ostensibly discrete activities, cross-referencing where necessary to integrate relevant sources of information (Owens & Sweller, 2008). This is especially relevant where understanding is required, rather than simply the identification or recall of related musical information.

Regardless of how attractive or instructionally neat discrete sources of information might first appear, holding in working memory sufficient real-time auditory material to facilitate the integration of individual elements may quickly exhaust working memory resources.

Spatial Contiguity:

In addition to the modality effect mentioned in the above experiment, spatial contiguity was also utilized in order to improve the instructional effectiveness of the music lesson. This is another technique we utilized in the design of our guitar instruction computer program. The theoretical rationale for spatial contiguity is that it reduces the effort required to scan back and forth between the text and the graphic, a form of extraneous processing (Mayer, 2005, p. 186). Mayer (1989) had students read a paper-based lesson on how brakes work and then take a transfer test.

For some students (integrated group), the words describing an action – such as a piston moving forward in the master cylinder – were placed next to the corresponding part of an illustration – such as the master cylinder.  For other students (separated group), each picture was at the top of a page and the corresponding words were in a paragraph at the bottom of the page.  The integrated group performed much better than the separated group on a transfer test, yielding an effect size of above 1.

Another study by Sweller, Chandler, Tierney & Cooper (1990) had students learn to solve geometry problems by examining worked examples.  In the integrated booklet, the text and symbols describing each step were placed next to the corresponding part of the geometry diagram, whereas in the separated booklet, the text and symbols describing each step were placed below the geometry diagram.  The time to solve a transfer problem was less for students who learned with the integrated rather than the separated booklet.

There is strong and consistent support for the spatial contiguity principle: people learn more deeply from a multimedia message when corresponding text and pictures are presented near rather than far from each other on the page or screen. We are utilizing these findings in our program by integrating notation on or next to images of the corresponding parts of the guitar, and by presenting all content with the guitar shown from the player’s point of view.

Presenting content and animations separately requires the learner to hold onto information long enough to apply it to new content. If the activities are spatially integrated, especially with such a cognitively demanding activity as learning a musical instrument that requires the acquisition of a tactile skill, the extraneous cognitive load will be reduced. Many of the competing products on the market today for teaching beginners how to play the guitar are riddled with distractors that interfere with essential material.

Many programs use video tutorials that incorporate people playing the guitar as a model for the lesson to be learned. One problem we noted is that the guitar player is featured from a distance, which makes it difficult to see the actual notes and strings being played. One particular program attempts to counteract this by splitting the screen four ways: one panel shows the full image of the person playing from a distance, one shows a close-up of the chords or notes being played on the fretboard, one shows the guitar being strummed, and a fourth shows the name and logo of the program being used.

This is a nice idea in theory, but the images and content compete for visual attention. The lesson could be formatted differently, for example by removing the split screen and providing an image of the whole guitar from the student’s perspective. This image of the guitar could then zoom in to the fretboard to show the finger placement, then pan right to the strumming animation (which would not be very distinct from other strumming exercises), then zoom back out to the image of the entire guitar. The screen would never jump from one image to another, would not split the guitar up into separate images, and would keep the perspective the same at all times. All content would be visually integrated with these images.
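
As a rough illustration of the zoom-and-pan sequence described above, the following Python sketch models the view as a list of keyframes over a single guitar image and interpolates between them, so the picture moves continuously and never cuts. The names (Keyframe, PATH, camera_at), the timings, and the coordinates are all hypothetical and are not taken from our program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyframe:
    time: float  # seconds from the start of the lesson
    x: float     # horizontal centre of the view, 0.0 (headstock) to 1.0 (bridge)
    zoom: float  # 1.0 shows the whole guitar; larger values zoom in


# Hypothetical path: whole guitar -> fretboard -> strumming hand -> whole guitar.
PATH = [
    Keyframe(0.0, 0.5, 1.0),
    Keyframe(2.0, 0.3, 3.0),
    Keyframe(4.0, 0.7, 3.0),
    Keyframe(6.0, 0.5, 1.0),
]


def camera_at(t: float, path=PATH) -> tuple[float, float]:
    """Return (x, zoom) at time t by linear interpolation between keyframes."""
    if t <= path[0].time:
        return path[0].x, path[0].zoom
    for a, b in zip(path, path[1:]):
        if t <= b.time:
            f = (t - a.time) / (b.time - a.time)
            return a.x + f * (b.x - a.x), a.zoom + f * (b.zoom - a.zoom)
    # After the last keyframe the view rests on the whole guitar.
    return path[-1].x, path[-1].zoom
```

Because every frame is computed from the same image and the same viewpoint, the learner never has to re-orient to a new picture, which is the point of keeping the perspective constant.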

Learner Control – Pacing:

The emerging learner control of pacing principle states that learning is improved when learners are given control over the pacing of information through features such as start/pause/stop buttons (Plass et al., 2009). The high element interactivity inherent in music instruction taxes the limited capacity of working memory. Intrinsic cognitive load arises from the intrinsic complexity of information (Sweller, 2005, p. 27).

Information is complex if multiple elements must be simultaneously processed in working memory because they interact, resulting in a heavy intrinsic cognitive load (Sweller, 1994). Time signatures in music form an example of material that imposes a heavy intrinsic cognitive load (Owens & Sweller, 2008).

Comprehending finger placement on the fretboard in relation to the corresponding notes, and anticipating the next finger placement to create the next note or chord in a song, are activities with both high element interactivity and high intrinsic cognitive load in music instruction. Even though we have spatially integrated these activities in one technological source and on one screen, they may still put great demands on working memory.

Utilizing learner control of pacing is a method to soften these demands.  We have done this by designing every lesson in our program in video format that can be started/stopped/paused at the learner’s discretion.  On the bottom of each screen will be a conventional video control with a play/pause button and a progress bar.  The progress bar includes a small circle that designates the progress of the video lesson.
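
The controls described above can be sketched as a small state model. This is only an illustrative Python sketch under our own assumptions: the class and method names are hypothetical, and the ability to drag the progress circle to a new position (the seek method) is an assumption rather than a documented feature of the program.

```python
class LessonPlayer:
    """Minimal model of a lesson video's pacing controls (hypothetical names)."""

    def __init__(self, duration: float):
        self.duration = duration  # lesson length in seconds
        self.position = 0.0       # current playback position in seconds
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def pause(self) -> None:
        self.playing = False

    def stop(self) -> None:
        # Stopping halts playback and returns to the beginning.
        self.playing = False
        self.position = 0.0

    def seek(self, fraction: float) -> None:
        # Assumed feature: move the progress circle to a fraction of the bar.
        fraction = min(max(fraction, 0.0), 1.0)
        self.position = fraction * self.duration

    def tick(self, elapsed: float) -> None:
        # Advance only while playing; halt automatically at the end.
        if self.playing:
            self.position = min(self.position + elapsed, self.duration)
            if self.position >= self.duration:
                self.playing = False

    @property
    def progress(self) -> float:
        # Where the small circle sits on the progress bar, from 0.0 to 1.0.
        return self.position / self.duration
```

For example, a learner who plays a 120-second lesson for 30 seconds and then pauses would see the circle a quarter of the way along the bar, and it would stay there until play is pressed again.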

Some studies on learner control of pacing have indicated that even the feeling of being in control over one’s learning can improve comprehension of the animation (Plass et al., 2009). The quick pacing and rigid sequence of noninteractive dynamic representations place heavy demands on working memory, as information presented at earlier stages in the animation must be stored and then integrated with information presented at later stages (Hegarty, 2004).

Research support for this effect has been found in several studies. For example, Mayer and Chandler (2001) compared a continuous 140-s animation without user control to a version of the same animation that was divided into 16 parts, with a “continue” button that allowed users to advance the presentation from one part to the next.

The results showed that students with learner control outperformed those who did not have any control over the pacing of the animations, and that transfer test performance was higher, and cognitive load reduced, when learners first received the version of the animation with learner control and then the version without control, as compared to the reverse order.
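
The segmented presentation used in that study can be illustrated with a short Python sketch. It assumes, purely for illustration, that the 16 parts are of equal length (the study is not described here in enough detail to say so), and the names segment_bounds, SegmentedAnimation, and press_continue are hypothetical.

```python
def segment_bounds(total_seconds: float, parts: int) -> list[tuple[float, float]]:
    """Split a presentation into equal-length (start, end) segments, in seconds."""
    step = total_seconds / parts
    return [(i * step, (i + 1) * step) for i in range(parts)]


class SegmentedAnimation:
    """An animation that plays one segment, then waits for the learner."""

    def __init__(self, total_seconds: float, parts: int):
        self.segments = segment_bounds(total_seconds, parts)
        self.index = 0  # segment currently shown

    def current(self) -> tuple[float, float]:
        return self.segments[self.index]

    def press_continue(self) -> bool:
        """Move to the next segment; return False once the last one is reached."""
        if self.index + 1 < len(self.segments):
            self.index += 1
            return True
        return False
```

Under the equal-length assumption, SegmentedAnimation(140, 16) yields segments of 8.75 seconds each, and the learner decides when each new segment begins.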

Another benefit of applying this principle to our guitar instruction program is that it allows a learner to speed through or skip parts of a presentation that he or she perceives as easy, and to focus on the more difficult parts (Schwan and Riempp, 2004). Learner control can thereby help avoid a redundancy effect. The redundancy effect occurs when additional information presented to learners results in learning decrements compared to the presentation of less information (Sweller, 2005, p. 159).

We are assuming most if not all of our learners with this program are novices and have no guitar experience.  However, we take into account the possibility of our learners having some prior knowledge.  Allowing them to skip past sections of the lessons if they are already familiar with the content is an added benefit of utilizing learner control of pacing.


Programs intended to teach a musical instrument are intrinsically constrained by the learner’s working memory capacity. One must process content from an outside source not just so that it can be integrated into a musical schema, but in such a way that a musical, tactile skill is concurrently developed. This limitation points to the need to maximize the effectiveness of any medium used to teach a musical instrument. Utilizing a multimedia computer program allows for the simultaneous processing of visual and auditory data.

We are designing our program based on this dual-channel assumption by including the modality principle. Through the use of spatial contiguity, we are eliminating the need for the learner to integrate content from two or more locations, thereby reducing extraneous cognitive load. Furthermore, by creating a simple, streamlined design, free of clutter, confusing choices, or embellishments, we are utilizing the coherence principle. Combined, these principles should make the complicated task of learning to play the guitar in a classroom environment somewhat easier.


Allport, D.A., Antonis, B., & Reynolds, P. (1972). On the division of attention: A disproof of the single channel hypothesis. Quarterly Journal of Experimental Psychology, 24, 225-235.

Baddeley, A.D. (1986). Working Memory. Oxford, England: Oxford University Press

Baddeley, A. (1990). Human memory: Theory and practice. Boston, MA: Allyn & Bacon.

Baddeley, A.D. (1999). Human Memory. Boston: Allyn & Bacon.

Berz, W.L. (1995). Working memory in music: A theoretical model. Music Perception, 12, 353-364

Boltz, M. (2001). Musical soundtracks as a schematic influence on the cognitive processing of filmed events. Music Perception, 18, 427-454

Clark, R.E., & Paivio, A. (1991). Dual coding theory and education. Educational Psychology Review, 3, 149-210

Deliege, I. (2001). Introduction: Similarity perception, categorization, cue abstraction. Music Perception, 18, 233-243

Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science, 168, 1604-1605.

Dowling, W.J., & Harwood, D.L. (1986). Music cognition. Orlando, FL: Academic Press

Driscoll, M.P. (2005). Meaningful Learning and Schema Theory. In Psychology of Learning for Instruction, (p. 130). Boston: Pearson Education, Inc.

Hegarty, M. (2004). Dynamic visualizations and learning: Getting to the difficult questions. Learning and Instruction, 14(3), 343–351.

Kessler, E.J., Hansen, C., & Shepard, R.N. (1984). Tonal schemata in the perception of music in Bali and in the West. Music Perception, 2, 131-165

Kosslyn, S.M. (1980). Image and mind. Cambridge, MA: Harvard University Press.

Low, R., & Sweller, J. (2005). The Modality Principle in Multimedia Learning. In R.E. Mayer (Ed.), Cambridge Handbook of Multimedia Learning  (pp. 147-158). New York: Cambridge University Press.

Mayer, R.E. (1989). Systematic thinking fostered by illustrations in scientific text. Journal of Educational Psychology, 81, 240-246

Mayer, R.E. (2005). Cognitive Theory of Multimedia Learning. In Cambridge Handbook of Multimedia Learning (pp. 31-48). New York: Cambridge University Press.

Mayer, R.E. (2005). Principles for Reducing Extraneous Processing in Multimedia Learning: Coherence, Signaling, Redundancy, Spatial Contiguity, and Temporal Contiguity Principles. In Cambridge Handbook of Multimedia Learning (pp. 183-200). New York: Cambridge University Press.

Mayer, R.E., Bove, W., Bryman, A., Mars, R., & Tapangco, L. (1996). When less is more: Meaningful learning from visual and verbal summaries of science textbook lessons. Journal of Educational Psychology, 88, 64-73

Mayer, R. E., & Chandler, P. (2001). When learning is just a click away: Does simple user interaction foster deeper understanding of multimedia messages? Journal of Educational Psychology, 93(2), 390–397.

Owens, P., & Sweller, J. (2008). Cognitive load theory and music instruction. Educational Psychology, 28, 29-45

Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart & Winston.

Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press

Plass, J., Homer, B.D., & Hayward, E.O. (2009). Design factors for educationally effective animations and simulations. US Government, Springer.

Rumelhart, D.E. (1980). Schemata: The building blocks of cognition.  In R.J. Spiro, B.C. Bruce, & W.F. Brewer (Eds.), Theoretical issues in reading comprehension. Hillsdale, NJ: Erlbaum.

Salame, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. The Quarterly Journal of Experimental Psychology, 41A, 107-122.

Schwan, S., & Riempp, R. (2004). The cognitive benefits of interactive videos: Learning to tie nautical knots. Learning and Instruction, 14, 293–305.

Shaffer, L.H. (1975). Multiple attention in continuous verbal tasks. In P.M.A. Rabbitt & S. Dornic (Eds.), Attention and performance V (pp. 157-167). London: Academic Press.

Sharps, M.J., & Pollitt, B.K. (1998). Category superiority effects and the processing of auditory images. The Journal of General Psychology, 125, 109-116.

Sweller, J. (1994). Cognitive Load Theory, Learning Difficulty, and Instructional Design. Learning and Instruction, 4, 295-312

Sweller, J. (1999). Instructional design in technical areas. Camberwell, Australia: ACER Press.

Sweller, J. (2005). Implications of Cognitive Load Theory for Multimedia Learning. In R.E. Mayer (Ed.), Cambridge Handbook of Multimedia Learning (pp. 19-30). New York: Cambridge University Press.

Sweller, J. (2005). The Redundancy Principle in Multimedia Learning. In R.E. Mayer (Ed.), Cambridge Handbook of Multimedia Learning (pp. 159-167). New York: Cambridge University Press.

Sweller, J., Chandler, P., Tierney, P., & Cooper, M. (1990). Cognitive load and selective attention as factors in the structuring of technical material. Journal of Experimental Psychology: General, 119, 176-192
