Visual Analysis: An Ethnomethodological Approach

    2006-08-22

    Copyright notice: when reposting, please credit the original source, the author, and this notice via a hyperlink.
    http://dushijue.blogbus.com/logs/3114756.html

    Practices of Seeing:
    Visual Analysis: An Ethnomethodological Approach
    Charles Goodwin
    Applied Linguistics
    UCLA
    cgoodwin@humnet.ucla.edu
    Pp. 157-182 in
    Handbook of Visual Analysis
    edited by Theo van Leeuwen and Carey Jewitt
    London: Sage Publications
    2000
    © Charles Goodwin
    A primordial site for the analysis of human language, cognition and action
    consists of a situation in which multiple participants are attempting to carry out
    courses of action together while attending to each other, the larger activities that
    their current actions are embedded within, and relevant phenomena in their
    surround. Vision can be central to this process.1 The visible bodies of
    participants provide systematic, changing displays about relevant action and
    orientation. Seeable structure in the environment can not only constitute a locus
    for shared visual attention, but can also contribute crucial semiotic resources for
    the organization of current action (consider for example the use of graphs and
    charts in a scientific discussion). For the past thirty years both Conversation
    Analysis and Ethnomethodology have provided extensive analysis of how
    human vision is socially organized. Both fields investigate the practices that
    participants use to build and shape in concert with each other the structured
    events that constitute the lifeworld of a community of actors. Phenomena
    investigated in which vision plays a central role range from sequences of talk, to
    medical and legal encounters, to scientific knowledge.
    The approach taken by both ethnomethodology and conversation analysis to
    the study of visual phenomena is quite distinctive. At least since Saussure
    proposed studying langue as an analytically distinct subfield of a more
    encompassing science of signs, different kinds of semiotic phenomena (language,
    visual signs, etc.) have typically been analyzed in isolation from each other.
    However, in the work to be described here neither vision nor the images or other
    phenomena that participants look at are treated as coherent, self-contained
    domains that can be subjected to analysis in their own terms. Instead it quickly
    becomes apparent that visual phenomena can only be investigated by taking into
    account a diverse set of semiotic resources and meaning-making practices that
    participants deploy to build the social worlds that they inhabit and constitute
    through ongoing processes of action. Many of these, such as structure provided
    by current talk, are not in any sense visual, but the visible phenomena that the
    participants are attending to cannot be properly analyzed without them. The
    focus of analysis is thus not representations or vision per se, but instead the
    part played by visual phenomena in the production of meaningful action.
    1 Vision is not, however, essential, as both the competence of the blind and
    telephone conversations demonstrate. Below it will be argued that situated
    action is accomplished through the juxtaposition of multiple semiotic fields,
    only some of which make vision relevant.
    Both the methodology and the forms of analysis used in this approach can
    best be demonstrated through specific examples.
    Gaze between Speakers and Hearers
    In formulating the distinction between competence and performance Chomsky
    (1965: 3-4) argued that actual speech is so full of performance errors, such as
    sentence fragments, restarts and pauses, that both linguists and parties faced
    with the task of acquiring a language should ignore it. Investigating a corpus of
    conversation recorded on video Goodwin (1980a, 1981, Chapter 2) indeed found
    precisely the “false starts” and “changes of plan in mid-course” that Chomsky
    describes. In the following, instead of producing an unbroken grammatical
    sentence, the speaker says:2
    Cathy: En a couple of girls- One other girl from there,
    However, when the video is examined, it is found that the restart occurs at a
    specific place: precisely at the point where the speaker brings her gaze to her
    addressee and finds that her addressee is looking elsewhere:
    Pam: En a couple of girls- One other girl from the:re,
         ((Speaker brings gaze to recipient; restart))
    Ann: ((Hearer looking away; hearer starts moving gaze to speaker; gaze arrives))
    Moreover, the restart acts as a request for the hearer’s gaze. Thus immediately
    after the restart the hearer starts to move her gaze to the speaker.
    Paradoxically, if the speaker had not produced a restart at this point she
    could have said something that would appear to be an unbroken grammatical
    unit if one examined only the stream of speech (e.g., “En a couple of girls from
    there …”), but which would in fact be interactively a sentence fragment since her
    addressee attended to only part of it.
    2 Talk is transcribed using the system developed by Gail Jefferson (see Sacks,
    Schegloff and Jefferson 1974: 731-733). Talk receiving some form
    of emphasis is marked with underlining or bold italics. Punctuation is used to
    transcribe intonation: A period indicates falling pitch, a question mark rising
    pitch, and a comma a falling contour, as would be found for example after a
    non-terminal item in a list. A colon indicates lengthening of the current sound.
    A dash marks the sudden cut-off of the current sound (in English it is
    frequently realized as a glottal stop). Comments (e.g., descriptions of relevant
    nonvocal behavior) are printed in italics within double parentheses. Numbers
    within single parentheses mark silences in seconds and tenths of a second. A
    degree sign (°) indicates that the talk that follows is being spoken with low
    volume. Left brackets connecting talk by different speakers mark the point
    where overlap begins.
    The identities of speaker and hearer are the most generic participant
    categories relevant to the production of a strip of talk. The phenomena examined
    here (which occur pervasively in conversation) provide evidence that the work of
    being a hearer in face-to-face interaction requires situated use of the body, and
    gaze in particular, as a way of visibly displaying to others the focus of one’s
    orientation. Moreover speakers not only use their own gaze to see relevant action
    in the body of a silent hearer, but actively change the structure of their emerging
    talk in terms of what they see.
    What relevance do processes such as these have to the other issue raised by
    Chomsky (1965: 3), that of determining “from the data of performance the
    underlying system of rules mastered by the speaker-hearer”? Many repairs
    involve the repetition, with some significant change, of something said elsewhere
    in the utterance:
    We went t- I went to
    If he could- If you could
    Such repetition has the effect of delineating the boundaries and structure of
    many different units in the stream of speech. Thus, by analyzing what is the
    same and what is different in these examples one is able to discover: First, where
    the stream of speech can be divided into significant subunits; second, that
    alternatives are possible in a particular slot; third, what some of these
    alternatives are (here different pronouns); and fourth, that these alternatives
    contrast with each other in some significant fashion, or else the repair would not
    be warranted. Repairs in other examples not only delineate basic units in the
    stream of speech (noun phrases for example), but also demonstrate the different
    forms such units can take, and the types of operations that can be performed
    upon them (see Goodwin 1981: 170-173). Repairs further require that a listener
    learn to recognize that not all of the sequences within the stream of speech are
    possible sequences within the language, e.g., that “I” does not follow “to” in “We
    went t- I went to …”. In order to deal with such a repair a hearer is thus required
    to make one of the most basic distinctions posed for anyone attempting to
    decipher the structure of a language: to differentiate what are and are not
    possible sequences in the language, that is between grammatical and
    ungrammatical structures. The fact that this task is posed may be crucial to any
    learning process. If the party attempting to learn the language did not have to
    deal with ungrammatical possibilities, if for example she was exposed to only
    well-formed sentences, she might not have the data necessary to determine the
    boundaries, or even the structure of the system. Chomsky’s argument that the
    repairs found in natural speech so flaw it that a child is faced with data of very
    “degenerate quality” does not appear warranted. Rather it might be argued that
    if a child grew up in an ideal world where she heard only well-formed sentences
    she would not learn to produce sentences herself because she would lack the
    analysis of their structure provided by events such as repair. Crucial to this
    process is the way in which visual phenomena, such as dispreferred gaze states,
    can both lead to repair, and demonstrate that the participants are in fact
    attending in fine detail to what might appear to be quite ephemeral structure in
    the stream of speech.
    What has just been described provides one example of the methodologies and
    forms of analysis used to investigate visual phenomena within Conversation
    Analysis. Several observations can be made. First, the focus of analysis is not
    visual events in isolation, but instead the systematic practices used by
    participants in interaction to achieve courses of collaborative action with each
    other — in the present case the interactive construction of turns at talk, and the
    utterances that emerge with those turns. Visual events, such as gaze, play a
    central role in this process but their sense and relevance is established through
    their embeddedness in other meaning making tasks and practices, such as the
    production of a strip of talk that is in fact heard and attended to by its addressee.
    This links vision to a host of other phenomena including language and the visible
    body as an unfolding locus for the display of meaning and action. Second, what
    the analyst seeks to do is not to provide his or her own gloss on how visual
    phenomena might be meaningful, but instead to demonstrate how the
    participants themselves not only actively orient to particular kinds of visual
    events (such as states of gaze), but use them as a constitutive feature of the
    activities they are engaged in (for example by modifying their talk in terms of
    what they demonstrably see). Third, in addition to the spatial dimension that is
    naturally associated with vision, these processes also have an intrinsic temporal
    dimension as changes in visual events are marked by, and lead to, ongoing
    changes in the organization of emerging action. If one had only a static snapshot,
    or measured only a single structural possibility, such as mutual gaze, instead of
    looking at the temporally unfolding interplay of different combinations of
    participant gaze, the type of analysis being pursued here would be impossible.
    Fourth, such analysis requires data of a particular type, specifically a record that
    maintains as much information as possible about the setting, embodied displays
    and spatial organization of all relevant participants, their talk, and how events
    change through time. In practice no record is completely adequate. Every camera
    position excludes other views of what is happening. The choice of where to place
    the camera is but the first in a long series of crucial analytical decisions. Despite
    these limitations a video or film record does constitute a relevant data source,
    something that can be worked with in an imperfect world.
    Fifth, crucial problems of transcription are posed. The task of translating the
    situated, embodied practices used by participants in interaction to organize
    phenomena relevant to vision poses enormous theoretical and methodological
    problems. Our ability to transcribe talk is built upon a process of analyzing
    relevant structure in the stream of speech, and marking those distinctions with
    written symbols, that extends back thousands of years, and is still being modified
    today (for example the system developed by Gail Jefferson (Sacks, Schegloff and
    Jefferson 1974: 731-733) for transcribing the texture of talk-in-interaction,
    including phenomena, such as momentary restarts and sound stretches, that are
    crucial for the analysis being reported here). When it comes to the transcription
    of visual phenomena we are at the very beginning of such a process. The arrows
    and other symbols I’ve used to mark gaze on a transcript (see Goodwin 1981)
    capture only a small part of a larger complex constituted by bodies interacting
    together in a relevant setting. The decision to describe gaze in terms of the
    speaker-hearer framework is itself a major analytic one, and by no means simple,
    neutral description. Moreover a gazing head is embedded within a larger
    postural configuration, and indeed different parts of the body can
    simultaneously display orientation to different participants or regions (see
    Kendon 1990b, Schegloff 1998), creating participation frameworks of
    considerable complexity. Thus on occasion a transcriber wants some way of
    indicating on the printed page posture and alignment. In addition, not only the
    bodies of the participants, but also phenomena in their surround, can be crucial
    to the organization of their action. To try to make the phenomena I’m analyzing
    independently accessible to the reader so that she or he can evaluate my analysis,
    I’ve experimented with using transcription symbols, frame grabs, diagrams, and
    movies embedded in electronic versions of papers. Multiple issues are involved
    and no method is entirely successful. On the one hand the analyst needs
    materials that maintain as much of the original structure of the events being
    analyzed as possible, and which can be easily and repetitively replayed. On the
    other hand, just as a raw tape recording does not display the analysis of
    segmental structure in the stream of speech provided by transcription with a
    phonetic or alphabetic writing system, in itself a video, even one that can be
    embedded within a paper, does not provide an analysis of how visible events are
    being parsed by participants. The complexity of the phenomena involved
    requires multiple methods for rendering relevant distinctions (e.g., accurate
    transcription of speech, gaze notation, frame grabs, diagrams, etc., see also Ochs
    1979). Moreover, like the two-faced Roman god Janus, any transcription system
    must attend simultaneously to two separate fields, looking in one direction at
    how to accurately recover through a systematic notation the endogenous
    structure of the events being investigated, while simultaneously keeping another
    eye on the addressee/reader of the analysis by attempting to present relevant
    descriptions as clearly and vividly as possible. In many cases different stages of
    analysis and presentation will require multiple transcriptions. There is a
    recursive interplay between analysis and methods of description.
    Work in Conversation Analysis has provided extensive study of how the gaze
    of participants toward each other is consequential for the organization of action
    within talk-in-interaction. Phenomena investigated include the way in which
    speakers change the structure of an emerging utterance, and the sentence being
    constructed within it, as gaze is moved from one type of recipient to another, so
    that the utterance maintains its appropriateness for its addressee of the moment
    (Goodwin 1979, 1981); how speakers modify descriptions in terms of their
    hearer’s visible assessment of what is being said (M.H. Goodwin 1980b); how
    genres such as stories are constructed not by a speaker alone, but instead
    through the differentiated visible displays of a range of structurally different
    kinds of recipients (speaker, primary addressee, principal character, etc.; see
    Goodwin 1984); the organization of gaze and co-participation in medical
    encounters (Heath 1986, Robinson 1998); the interactive organization of
    assessments (Goodwin and Goodwin 1987), gesture (Goodwin in press, Streeck
    1993, 1994), the use of gaze in activities such as word searches (M.H. Goodwin
    and C. Goodwin 1986), etc. Though not strictly lodged within Conversation
    Analysis, the work of Kendon (1990a, 1994, 1997) on both the interactive
    organization of bodies as they frame states of talk, and on gesture, is central to
    the study of visible behavior in interaction. Haviland (1993) provides important
    analysis of the interactive organization of gesture within narration (for extensive
    analysis of gesture from a psychological perspective see McNeill 1992).
    Scientific Images
    The visible, gazing body, and the orientation of participants toward each
    other as they co-produce states of talk is central to the work in
    Conversation Analysis just examined. By way of contrast, much work within
    Ethnomethodology has focused not on the bodies of actors, but instead on the
    images, diagrams, graphs and other visual practices used by scientists to
    construct the crucial visual working environments of their disciplines. As noted
    by Lynch and Woolgar (1990:5):
    Manifestly, what scientists laboriously piece together, pick up in
    their hands, measure, show to one another, argue about, and
    circulate to others in their communities are not “natural objects”
    independent of cultural processes and literary forms. They are
    extracts, “tissue cultures,” and residues impressed within graphic
    matrices; ordered, shaped, and filtered samples; carefully aligned
    photographic traces and chart recordings; and verbal accounts.
    These are the proximal “things” taken into the laboratory and
    circulated in print and they are a rich repository of “social”
    actions.
    Despite important differences in subject matter and methodology both fields
    emphasize the importance of focusing not on representations or other visual
    phenomena as self-contained entities in their own right, but instead on how they
    are constructed, attended to, and used by participants as components of the
    endogenous activities that make up the lifeworld of a setting. Thus, in
    introducing their important volume on Representation in Scientific Practice Lynch
    and Woolgar (1990: 11) define their inquiry as follows:
    Instead of asking “what do we mean, in various contexts, by
    ‘representation’?” the studies begin by asking, “What do the participants,
    in this case, treat as representation?”
    Note that what must be investigated is specified both in terms of the orientation
    of the participants, and with respect to the features of the relevant local setting
    (e.g., “in this case”). This leads to a distinctive ethnomethodological perspective
    on reflexivity:
    “Reflexivity” in this usage means, not self-referential nor reflective
    awareness of representational practice, but the inseparability of a “theory”
    of representation from the heterogeneous social contexts in which
    representations are composed and used (Lynch and Woolgar 1990: 12).
    In a classic article Lynch (1990: 153-154) formulates the task of analyzing
    scientific representations as that of describing the publicly visible “externalized
    retina” that is the site for the practices implicated in the social constitution of the
    objects that are the focus of scientific work:
    This study is based on the premise that visual displays are more
    than a simple matter of supplying pictorial illustrations for
    scientific texts. They are essential to how scientific objects and
    orderly relationships are revealed and made analyzable. To
    appreciate this, we first need to wrest the idea of representation from
    an individualistic cognitive foundation, and to replace a
    preoccupation with images on the retina (or alternatively ‘mental
    images’ or ‘pictorial ideas’) with a focus on the ‘externalized retina’
    of the graphic and instrumental fields upon which the scientific
    image is impressed and circulated.
    Using as data images from scientific journal articles and books Lynch describes
    two families of practices used to constitute the visible scientific object: “selection”
    and “mathematization.” Selection, illustrated through double images in which a
    photograph and a diagram of entities visible in the photograph are presented
    side by side, is described as a host of practices that iteratively transform one
    image of an entity into another (e.g. the photograph to the diagram) while
    simultaneously structuring and shaping what it is that is being represented.
    Crucial to this process is the fact that different selective/shaping practices,
    including Filtering, Uniforming, Upgrading and Defining can be repetitively
    applied creating not just a single image, but a linked, directional chain of
    representations. Indeed much of the work of actually doing science consists in
    building and shaping what Latour (1986) (see also Latour and Woolgar 1979)
    have called inscriptions in this fashion. “Mathematization” refers not simply to
    the use of numbers, but instead to the host of practices used to transform
    recalcitrant events into mathematically tractable visual and graphic displays, e.g.,
    graphs, charts and diagrams. Thus an image showing a map of lizard territories
    is assembled through, among other operations, driving stakes into the lizards’
    environment to create a grid for measurement (and thus injecting a scientifically
    relevant Cartesian space into the very habitat being studied), repetitively
    capturing lizards, distinguishing them from each other by cutting off a different
    pattern of toes on each lizard, recording each capture on a paper map of the
    staked out territory, and finally drawing lines around collections of points to
    create the map. As noted by Lynch (1990: 171) the product of these practices, e.g.,
    the published map, “is a hybrid object that is demonstrably mathematical,
    natural and literary.” Note how in all of these cases the focus of analysis is on the
    contextually based practices of the participants who are assembling and using
    these images to accomplish the work that defines their profession.
    Though emerging from psychological anthropology, rather than
    ethnomethodology, Hutchins’ (1995) groundbreaking study of the cognitive
    practices required to navigate a ship outlines a major perspective for the analysis
    of both images and seeing as forms of work-relevant practice. Hutchins
    demonstrates how the practices required to navigate a ship are not situated
    within the mental life of a single individual, but are instead embedded within a
    distributed system that encompasses visual tools such as maps and instruments
    for juxtaposing a landmark and compass bearing within the same visual field,
    and actors in structurally different positions who use alternative tools and, in
    part because of this, perform different kinds of cognitive operations, many of
    which have a strong visual component (e.g., locating landmarks, plotting
    positions on a map, etc.).
    Images in Interaction
    All of the work discussed so far takes as its point of departure for the
    investigation of visual phenomena the task of describing and analyzing the
    practices used by participants to construct the actions and events that make up
    their lifeworld. Rather than standing alone as a self-contained analytic domain,
    visual phenomena are constituted and made meaningful through the way in
    which they are embedded within this larger set of practices. However, within
    this common focus, two quite different orders of visual practice have been
    examined. Research in science studies has investigated the images produced by
    scientists, and the way in which they visually and mathematically structure the
    world that is the focus of their inquiry, without however looking in much detail
    at how scientists attend to each other as living, meaningful bodies, or structure
    what they are seeing through the organization of talk-in-interaction. By way of
    contrast, studies of the interactive organization of vision in conversation looked
    in considerable detail at how participants treat the visual displays of each
    other’s bodies as consequential, and how this is relevant to the moment-by-moment
    production of talk, but did not focus much analysis on images in the
    environment. Clearly all of the phenomena noted — the visible body,
    participation, gesture, the details of talk and language use, visual structure in the
    surround, images, maps and other representational practices, the public
    organization of visual practice within the worklife of a profession, etc. — are
    relevant. The question arises as to whether it is possible to analyze such disparate
    phenomena within a coherent analytic framework.
    Before turning to studies that have probed such questions several issues must
    be noted. First, it is clearly not the case that the only acceptable analysis is one
    that includes this full range of all possible visual phenomena. Both participants
    and the structures that provide organization for action and events use visible
    phenomena selectively. Parties speaking over the telephone can see neither each
    other’s bodies nor events in a common surround. A scientific journal can be read
    in the absence of the parties who constructed its text and diagrams. More
    interestingly within face-to-face interaction participants can continuously shift
    between actions that invoke, and perhaps require, gaze toward specific events in
    the surround, and those that make relevant gaze toward no more than each other’s
    bodies, and even in this more limited case there may be a real issue as to whether
    it is relevant to attend to everything that a body does, e.g., some gestures made
    by a speaker may not require gaze toward them from an addressee. There is thus
    an essential contingency, not only for the analyst but more crucially for the
    participants themselves, as to what subset of possible visual events is in fact
    relevant to the organization of the actions of the moment. Moreover, this means
    that in addition to investigating how different kinds of visible phenomena are
    organized, the analyst must also take into account how participants show each
    other what kinds of events they are expected to take into account at a particular
    moment, for example to indicate that a participant, gesture, or entity in the
    surround should be gazed at. There is thus not only communication through
    vision, but also ongoing communication about relevant vision (Goodwin 1981,
    1986; in preparation, Streeck 1988).
    Second, visual events are quite heterogeneous, not only in what they make
    visible, but more crucially in their structure. Consider for example the issue of
    temporality. Both gestures and the displays of postural orientation used to build
    participation frameworks are performed by the body within interaction.
    However, while gestures, like the bits of talk they accompany, are typically brief
    (e.g. they frequently fall within the scope of a single utterance) and display
    semantic content relevant to the topic of the moment, participation displays
    frame extended strips of talk and typically provide information about the
    participants’ orientation rather than the specifics of what is being discussed.
    Bodily displays with one kind of temporal duration (and information content)
    are thus embedded within another class of visual displays being made by the
    body which have a quite different structure.
    Third, the structure of visual signs, including their possibilities for
    propagation through space and time, can be intimately tied to the medium used
    to construct them. A major theme of Shakespeare’s sonnets focuses on the
    contrast between the temporally constrained human body, condemned to
    inevitable decay, and the (limited) possibilities for transcending such corruption
    provided by language inscribed on the printed page which can remain fresh and
    alive long after its author and subject have passed into dust. This contrast
    between the temporal possibilities provided by alternative media (e.g., the body
    and documents) constitutes an ongoing resource for participants in vernacular
    settings as they build, through interaction with each other, the events that make
    up their lifeworld. In addition to the displays made by a fleeting gesture or local
    participation framework, participants also have access to images and documents
    which can encompass multiple interactions and quite diverse settings. This arises
    in part from the specific media used to constitute the signs they contain. Rather
    than being lodged within an ever changing human body, such documents
    constitute what Latour (1987: 223) has called immutable mobiles, portable
    material objects that can carry stable inscriptions of various types from place to
    place and through time.
    However, despite the way in which crucial aspects of the structure of images
    and documents remain constant in different environments, they are not
    self-contained visual artifacts that can be analyzed in isolation from the processes of
    interaction and work practices through which they are made relevant and
    meaningful. The same image or document can be construed in quite different
    ways in alternative settings. For example, a schedule listing all arriving and
    departing flights was a major tool for almost all workgroups at the airport
    studied by the Xerox PARC workplace project (Brun-Cottan et al. 1991, Goodwin
    and Goodwin 1996, Suchman 1992), and indeed it linked diverse workers
    throughout North America into a common web of activity. However while
    baggage loaders carefully structured their work to anticipate arriving flights, so
    that planes could be speedily unloaded, these same arrival times were almost
    ignored by gate agents looking at the same schedule, but concerned with the
    departure of passengers. Each work group highlighted the common document in
    ways relevant to the specific work tasks it faced. Similarly, on the oceanographic
    ship reported in Goodwin (1995) a map showing where samples would be taken
    in the Atlantic at the mouth of the Amazon, was a major document at all stages
    of the research project. Before the ship sailed, the places where samples could be
    taken were the focus of intense political debate between different groups of
    scientists and the Brazilian and American governments; after the project was
    completed the map provided an infrastructure for graphic displays that could be
    used in published journal articles to show what the scientists had found about
    how the waters of the Amazon and the Atlantic interacted with each other, i.e., a
    way of making visible relevant scientific phenomena; during the voyage itself the
    map not only provided a common framework for the quite different work of
    various teams of scientists and the crew navigating the ship, but could also be
    looked at by lab technicians not able to go to bed for days at a time because of the
    map’s incessant sampling demands, to locate places where stations were far
    apart and rest was possible. In brief, though the material form of images and
    documents gives them an extended temporal scope, and the ability to travel from
    setting to setting, they cannot be analyzed as self-contained fields of visually
    organized meaning, but instead stand in a reflexive relationship to the settings
    and processes of embodied human interaction through which they are
constituted as meaningful entities. To explicate such events, analysis must deal
    simultaneously with the quite different structure and temporal organization of
    both local embodied practice and enduring graphic displays.
Finally, the visual (and other) properties of settings structure environments
that shape, on an historical time scale, the activities systematically performed
within those settings. A very simple example is provided by the bridge of the
oceanographic ship, which not only had a window facing forward so the
    helmsman could steer the ship and watch for trouble, but also a window facing
    backwards. This was used by a winch operator who had the task of lifting heavy
    instrument packages in and out of the sea. Though being used here to do science,
    this arrangement is in fact a systematic solution to a repetitive problem faced by
sailors, such as fishermen using nets, who have to maneuver heavy objects while
at sea. Solutions found to these tasks, such as the rear-facing window with the
    visual access it provides (as well as the forward window facilitating navigation),
    are built into the tools that constitute the work environments used by subsequent
    actors faced with similar tasks. See Hutchins (1995) for illuminating analysis of
    this process, including tools that visually structure complex mathematical
    calculations, as well as maps. Both work environments and many of the tools
    used within them (computer displays, etc.) structure in quite specific ways the
    embodied visual practices of those who inhabit such settings.
    In an attempt to come to terms with such issues Goodwin (in press) has
    proposed that images in interaction are lodged within endogenous activity
    systems constituted through the ongoing, changing deployment of multiple
    semiotic fields which mutually elaborate each other. The term semiotic field is
intended to focus on signs-in-their-media, i.e., the way in which what is typically
being attended to are sign phenomena of various types (gestures, maps, displays
    of bodily orientation, etc.) which have variable structural properties that arise in
    part from the different kinds of materials used to make them visible (e.g., the
    body, talk, documents, etc.). Bringing signs lodged within different fields into a
    relationship of mutual elaboration produces locally relevant meaning and action
    that could not be accomplished by one sign system alone. Consider for example a
    place on a map indicated by a pointing finger which is being construed in a
specific fashion by the accompanying talk. Neither the map as a whole, that is, as a
self-sufficient representation, nor the pointing finger in isolation from a) its
    target (the spot on the map) and b) the construal being provided by the talk, nor
    the talk alone would be sufficient to constitute the action made visible by the
    conjoined use of the three semiotic fields, each of which provides resources for
    specifying how to relevantly see and understand the others (see the brief
    discussion of the Rodney King data below for a specific example; see Goodwin
    in preparation for more detailed analysis of pointing). The particular subset of
    semiotic fields available in a setting that participants orient to as relevant to the
    construction of the actions of the moment constitutes a contextual configuration.
    As interaction unfolds contextual configurations can change as new fields are
    added to, or dropped from, the specific mix being used to constitute the events of
the moment. Thus, as contextual configurations change, there is both unfolding
public semiotic structure and contingency (and indeed in some circumstances
    actions can misfire when addressees fail to take into account a relevant semiotic
    field, such as the sequential organization provided by a prior unheard utterance
    – see Goodwin in preparation for an example).
    Professional Vision
    Work settings provide one environment in which the interplay between situated,
    embodied interaction, and the use of visual images of different types, can be
    systematically investigated. In many work settings participants face the task of
    classifying visual phenomena in a way that is relevant to the work they are
    charged with performing. Frequently they must also construct different kinds of
    representations of visual structure in the environment that is the focus of their
    professional scrutiny. We will now briefly examine how such vision is socially
organized in two tasks faced by archaeologists: 1) color classification and 2) map
    making, and then look at how such professional vision was both constructed and
    contested in the trial of four policemen charged with beating an African
    American motorist, Mr. Rodney King. The key evidence at the trial was a
    videotape of the beating.
    Color Classification as Historically Structured Professional Practice
    As part of the work involved in excavating a site, archaeologists make maps
    showing relevant structure in the layers of dirt they uncover. In addition to
artifacts, such as stone tools, archaeologists are also interested in features, such as the
    remains of an old hearth or the outlines of the posts that held up a building. Such
    features are typically visible as color differences in the dirt being examined (e.g.,
    the remains of a cooking fire will be blacker than the surrounding soil, and the
    holes used for posts will also have a different color from the soil around them).
    Field archaeologists thus face the task of systematically classifying the color of
    the dirt they are excavating. The methods they use to accomplish this task
    constitute a form of professional visual practice. As demonstrated by the
    discussion of Lynch’s analysis of scientific representation, and the brief
    description of the oceanographers, crucial work in many different occupations
    takes the form of classifying and constructing visual phenomena in ways that
    help shape the objects of knowledge that are the focus of the work of a profession
    (e.g., architects, sailors plotting courses on charts, air traffic controllers,
    professors making graphs and overheads for talks and classes, etc.). Such
    professional vision constitutes a perspicuous site for systematic study of how
    different kinds of phenomena intersect to organize a community’s practices of
    seeing.
    Goodwin (1996, in press) describes how archaeologists code the color of the
    dirt they are excavating through use of a Munsell chart. The following shows two
    archaeologists performing this task, the Munsell page that they are using, and
    the coding form where they will record their classification:
    17 Pam: En this one. ((Points at color patch))
    18 (0.4) ((Jeff moves trowel))
    19 Jeff: nuhhh?
    20 (1.8)
    21 Pam: Or that one? ((Points at color patch))
    Within this scene are a number of different kinds of phenomena relevant to
    the organization of visual practice, including tools that structure the process of
    seeing and classification, and documents that organize cognition and interaction
    in the current setting while linking these processes to larger activities and other
    settings. These archaeologists are intently examining the color of a tiny sample of
    dirt because they have been given a coding form to fill out. That form ties their
    work at this site to a range of other settings, such as the offices and lab of the
    senior investigator, where the form being filled in here will eventually become
    part of the permanent record of the excavation, and a component of subsequent
    analysis. The multivocality of this form, the way in which it displays on a single
    surface the actions of multiple actors in structurally different positions, is shown
    visually in vivid fashion by the contrast between the printed coding categories,
    and the hand written entries of the field workers.
    The use of a coding form such as this to organize the perception of nature,
    events, or people within the discourse of a profession carries with it an array of
    perceptual and cognitive operations that have far reaching impact. Coding
    schemes distributed on forms allow a senior investigator to inscribe his or her
    perceptual distinctions into the work practices of the technicians who code the
    data. By using such a system a worker views the world from the perspective it
    establishes. Of all the possible ways that the earth could be looked at, the
    perceptual work of field workers using this form is focused on determining the
    exact color of a minute sample of dirt. They engage in active cognitive work, but
    the parameters of that work have been established by the classification system
    that is organizing their perception. In so far as the coding scheme establishes an
    orientation toward the world, a work-relevant way of seeing, it constitutes a
    structure of intentionality whose proper locus is not the isolated, Cartesian mind,
    but a much larger organizational system, one that is characteristically mediated
    through mundane bureaucratic documents such as this form.
    Rather than standing alone as self-explicating textual objects, forms are
    embedded within webs of socially organized, situated practices. In order to make
    an entry in the slot provided for color an archaeologist must make use of another
    tool, the set of standard color samples provided by a Munsell chart. This chart
    incorporates into a portable physical object the results of a long history of
    scientific investigation of the properties of color.
    The Munsell chart being used by the archaeologists contains not just one, but
    three different kinds of sign systems for describing each point in the color space
    it provides: 1) a set of carefully controlled color samples arranged in a grid to
    demonstrate the changes that result from systematic variation of the variables of
Hue, Chroma and Value used to define each color (each page displays an
ordered set of Value and Chroma variables for a single hue); 2) numeric
coordinates for each row and column, the intersection of which specifies each
square as a pair of numbers (e.g., 4/6 on the 10YR Hue page); and 3) standard
color names such as “dark yellowish brown” (these names are on the left-facing
page, which is not reproduced here). Moreover, these systems are not precisely
equivalent to each other. For example, several color squares can fall within the
scope of a single name.
Why does the Munsell page contain multiple, overlapping representations of
what is apparently the same visual entity (e.g., a particular choice within a larger
set of color categories)? The answer seems to lie in the way that each
representation, as a semiotic field with its own distinctive properties, makes
possible alternative operations and actions, and thus fits into different kinds of
activities. Both the names and numbered grid coordinates can be written, and
    thus easily transported from the actual excavation to the other work sites, such as
    laboratories and journals, that constitute archaeology as a profession. The
    numbers provide the most precise description, and do not require translation
from language to language. However, locating the color indexed by the
coordinates requires that the classification be read with a Munsell book at hand.
By way of contrast, the color names can be grasped in a way that is adequate for
    most practical purposes by any competent speaker of the language used to write
    the report. The outcome of the activity of color classification initiated by the
    empty square on the coding form is thus a set of portable linguistic objects that
    can easily be incorporated into the unfolding chains of inscription that lead step
    by step from the dirt at the site to reports in the archaeological literature.
However, as arbitrary linguistic signs produced in a medium that does not
actually make color visible, neither the color names nor the numbers allow direct
visual comparison between a sample of dirt and a reference color. This is
precisely what the color patches and viewing holes make possible. In brief, rather
than simply specifying unique points in a larger color space, the Munsell chart is
used in multiple overlapping activities (comparing a reference color and a patch
of dirt as part of the work of classification, transporting those results back to the
lab, comparing samples, publishing reports, etc.), and thus represents the
    “same” entity, a particular color, in multiple ways, each of which makes possible
    different kinds of operations because of the unique properties of each
    representational system.
In addition to its various sign systems, the chart also contains a set of circular holes,
positioned so that one is adjacent to each color patch. To classify color, the
archaeologist puts a small sample of dirt on the tip of a trowel, puts the trowel
    directly under the Munsell page and then moves it from hole to hole until the
    best match with an adjacent color sample is found. With elegant simplicity the
    Munsell page with its holes for viewing the sample of dirt on the trowel
    juxtaposes in a single visual field two quite different kinds of spaces: 1) actual
    dirt from the site at the archaeologists’ feet is framed by 2) a theoretical space for
    the rigorous, replicable classification of color. The latter is both a conceptual
    space, the product of considerable research into properties of color, and an actual
    physical space instantiated in the orderly modification of variables arranged in a
    grid on the Munsell page. The pages juxtaposing color patches and viewing holes
    that allow the dirt to be seen right next to the color sample provide an
    historically constituted architecture for perception, one that encapsulates in a
    material object theory and solutions developed by earlier workers at other sites
    faced with the task of color classification. By juxtaposing unlike spaces, but ones
    relevant to the accomplishment of a specific cognitive task, the chart creates a
    new, distinctively human, kind of space. It is precisely here, as bits of dirt are
    shaped into the work relevant categories of a specific social group, that “nature”
    is transformed into culture.
    How are the resources provided by the chart made visible and relevant
    within talk-in-interaction? At line 17 Pam moves her hand to the space above the
    Munsell chart and points to a particular color patch while saying “En this one.”
    Within the field of action created by the activity of color classification, what Pam
    does here is not simply an indexical gesture, but a proposal that the indicated
    color might be the one they are searching for. By virtue of such conditional
relevance (Schegloff 1968), it creates a new context in which a reply from Jeff is the
    expected next action. In line 19 Jeff rejects the proposed color. His move occurs
    after a noticeable silence in line 18. However that silence is not an empty space,
    but a place occupied by its own relevant activity. Before a competent answer to
    Pam’s proposal in line 17 can be made, the dirt being evaluated has to be placed
    under the viewing hole next to the sample she indicated, so that the two can be
    compared. During line 18 Jeff moves the trowel to this position. Because of the
    spatial organization of this activity, specific actions have to be performed before
    a relevant task, a color comparison, can be competently performed. In brief, in
    this activity the spatial organization of the tools being worked with, and the
    sequential organization of talk in interaction interact with each other in the
production of relevant action (e.g., getting to a place where one can make an expected
    answer requires rearrangement of the visual field being scrutinized so that the
    judgment being requested can be competently performed). Here socially
    organized vision requires embodied manipulation of the environment being
    scrutinized.
    It is common to talk about structures such as the Munsell chart as
    “representations.” However exclusive focus on the representational properties of
    such structures can seriously distort our understanding of how such entities are
embedded within the organization of human practice. With its viewing holes for
    scrutinizing samples, the page is not simply a perspicuous representation of
    current knowledge about the organization of color, but a space designed for the
    ongoing production of particular kinds of action.
    We will now look at how a group of archaeologists make a map. This process
    will allow us to examine the interface between seeing, writing practices, talk,
    human interaction and tool use (see Goodwin 1994 for more detailed analysis).
    Map Making and the Practices of Seeing it Requires
    Maps are central to archaeological practice. The professional seeing required to
    produce and make use of a visual document, such as a map, encompasses not
    only the image itself but also the ability to competently see relevant structure in
    the territory being mapped, mastery of appropriate tools, and on occasion the
    ability to analyze the work-relevant actions of another’s body. These different
    kinds of phenomena can be brought together within the temporally unfolding
    process of human interaction used to accomplish the activity of making a map. In
    the following, two archaeologists are making a map to record what they have
    found in a profile of the dirt on the side of one of the square holes they have
    excavated. Before actually setting pen to paper some relevant events in the dirt,
    such as the boundary between two different kinds of soil, are highlighted by
    outlining them with the tip of a trowel. The structure visible in the dirt is then
    mapped on a sheet of graph paper. Typically this task is done by two
    participants working together. One uses a pair of rulers (one laid horizontally on
    the surface, and the other a hand held tape measure used to measure depth
    beneath the surface) to measure the length and depth coordinates of the points in
the dirt that are to be transferred to the map, and then speaks these coordinates
as pairs of numbers (e.g., “at fifteen three point two”). The second person plots
    the points specified on the graph paper, and draws lines between successive
    measurements. What we find here is a small activity system that encompasses
    talk, writing, tools and distributed cognition as two parties collaborate to inscribe
events they see in the earth onto paper. Here Ann, the party drawing the map, is
the senior archaeologist at the site, and Sue, the person making measurements, is
her student:
    1 Ann: Give me the ground surface over here
    2 to about ninety.
    3 (1.6)
    4 Ann: No- No- Not at ninety.
    5 From you to about ninety.
    6 (1.0)
    7 Sue: °Oh.
    8 Ann: Wherever there's a change in slope.
    9 (0.6)
    10 Sue: °Mm kay.
11 Ann: See so if it's fairly flat
    12 I'll need one
    13 where it stops being fairly flat.
    14 Sue: Okay.
    15 Ann: Like right there.
[Figure: the excavation profile, with labels for Ann, Sue, the line drawn with the trowel, the surface, the tape measure, and the ruler]
    The sequence to be examined begins with a directive. Ann, the writer, tells Sue
    the measurer, to “Give me the ground surface over here to about ninety.”
    However before Sue has produced any numbers, indeed before she has said
    anything whatsoever, Ann in lines 4 and 5 challenges her, telling her that what
she is doing is wrong: “No- No- Not at ninety. From you to about ninety.”
    Directives are a classic form of speech action that sociolinguists have used to
    probe the relationship between language and social structure, and in particular
issues of power and gender. Here Ann formats both her directive and her
correction in very strong, direct “aggravated” fashion. No forms of mitigation are
found in either utterance, and Sue is not given an opportunity to find and
correct the trouble on her own. Directives formatted in this fashion have
frequently been argued to display a hierarchical relationship, i.e., Ann is treating
Sue as someone that she can give direct, unmitigated orders to. And indeed Ann
    is a professor and Sue is her student.
    Issues of power do not however exhaust the social phenomena visible in this
    sequence. Equally important are a range of cognitive processes that are as
socially organized as the relationships between the participants. For example, given
that Sue has not yet produced an answer to the directive, how can Ann see that there
is something wrong with a response that has not even occurred yet? Crucial to
    this process is the phenomenon of conditional relevance first described by
    Schegloff (1968). Basically a first utterance creates an interpretive environment
    that will be used to analyze whatever occurs after it. Here no subsequent talk has
    yet been produced. However, providing an answer in this activity system
    encompasses more than talk. Before speaking the set of numbers that counts as a
    proper next bit of talk, Sue must first locate a relevant point in the dirt and
    measure its coordinates. Both her movement through space, and her use of tools
such as a tape measure, are visible events. As Ann finishes her directive, Sue is
holding the tape measure against the dirt at the left or zero end of the profile.
However, just after hearing “ninety” Sue moves both her body and the tape
measure to the right, stopping near the “90” mark on the upper ruler. By virtue of
the field of interpretation opened up through conditional relevance, Sue’s
movement and tool use can now be analyzed by Ann as elements of the activity
she has been asked to perform, and found wanting. Sue has moved immediately
    to ninety instead of measuring the relevant points between zero and ninety. The
    sequential framework created by a directive in talk thus provides resources for
    analyzing and evaluating the visible activity of an addressee’s body interacting
    with a relevant environment.
    Additional elements of the cognitive operations and kinds of seeing that Ann
    requires from Sue in order to make her measurements are revealed as the
    sequence continues to unfold. Making the relevant measurements presupposes
    the ability to locate where in the dirt measurements should be made. However
    Sue’s response calls this presupposition into question and leads to Ann telling
    her explicitly, in several different ways, what she should look for in order to
    determine where to measure. After Ann tells Sue to measure points between zero
and ninety, Sue does not immediately move to points in that region but instead
hesitates for a full second before replying with a weak “°Oh” (line 7). Ann then
tells her what she should be looking for: “Wherever there’s a change in slope”
(line 8). This description of course presupposes Sue’s ability to find in the dirt
what will count as “a change in slope.” Sue again moves her tape measure far to
    the right. At this point, instead of relying upon talk alone to make explicit the
    phenomena that she wants Sue to locate, Ann moves into the space that Sue is
    attending to and points to one place that should be measured while describing
    more explicitly what constitutes a change in slope: “See so if it’s fairly flat I’ll
    need one where it stops being fairly flat like right there.”
    One of the things that is occurring within this sequence is a progressive
    expansion of Sue’s understanding as the distinctions she must make to carry out
    the task assigned to her are explicated and elaborated. In this process of
    socialization through language there is a growth in intersubjectivity as domains
    of ignorance that prevent the successful accomplishment of collaborative action
    are revealed and transformed into practical knowledge, a way of seeing, that is
    sufficient to get the job at hand done, such that Sue is finally able to understand
    what Ann is asking her to do (that is to see the scene in front of her in a manner
    that permits her to make an appropriate, competent response to the directive). It
would, however, be wrong to see the unit within which this intersubjectivity is
lodged as simply these two minds coming together in the work at hand. Instead,
the distinction being explicated, the ability to see in the very complex perceptual
field provided by the landscape they are attending to those few events that
count as points to be transferred to the map, is central to what it means to see
    the world as an archaeologist, and to use that seeing to build the artifacts, such as
    this map, which are constitutive of archaeology as a profession. Such seeing
    would be expected of any competent archaeologist. It is an essential part of what
    it means to be an archaeologist, and it is these professional practices of seeing
    that Sue is being held accountable to. The relevant unit for the analysis of the
    intersubjectivity at issue here — the ability of separate individuals to see a
    common scene in a congruent, work-relevant fashion — is thus not these
    individuals as isolated entities, but instead archaeology as a profession, a
    community of competent practitioners, most of whom have never met each
    other, but who nonetheless expect each other to be able to see and categorize the
    world in ways that are relevant to the work, scenes, tools and artifacts that
    constitute their profession.
    The phenomena examined so far provide some demonstration of how what is
    to be seen in a map, scene, human body or image stands in a reflexive
    relationship to other semiotic structures that participants are using to constitute
    visual phenomena as a relevant component of the events and activities that make
    up their lifeworld. These structures include language, the constitution of action
    and context provided by sequential organization, and ways of seeing events and
    using images of different types that are lodged within the practices of particular
    social communities, such as the profession of archaeology.
    Professional Vision in Court
    Parties who are not competent members of relevant social communities can lack
    the ability, and/or the social positioning, to see and articulate visual events in a
    consequential way. These issues were made dramatically visible in the trial of
    four Los Angeles policemen who were recorded on videotape administering a
    beating to an African American motorist, Mister Rodney King, whom they had
    stopped after a high speed pursuit triggered by a traffic violation. When the tape
    of the beating was shown on national television there was outrage, and even the
    head of the Los Angeles police department thought that conviction of the officers
    was almost automatic. However, at their first trial (they were later tried again in
federal rather than state court for violating Mister King’s civil rights), all four
    policemen were acquitted, a verdict that triggered an uprising in the city of Los
    Angeles, with neighborhoods being burned, federal troops being called in, etc.
    The crucial evidence at the trial was a visual document: the videotape of the
    beating. Rather than transparently proving the guilt of the policemen who were
    seen on it beating a man lying prone on the ground, the tape in fact provided the
    policemen’s lawyers with their evidence for convincing the jury that their clients
    were not guilty of any wrongdoing. They did this by using language, pointing
    and expert testimony to structure how the jury saw the events on the tape in a
way that exonerated the policemen. In essence, they used the tape of the beating to
    demonstrate that Mr. King was the aggressor, not the policemen, and that the
    policemen were following proper police practice for subduing a violent,
    dangerous suspect (see Goodwin 1994 for more detailed analysis of such
    professional vision). Crucial to their success was their use of another policeman,
Sergeant Duke, as an expert witness. It was argued that laymen could not
    properly see the events on the tape. Instead, the ability to legitimately see what
    the body of a suspect was doing, such as Mr. King’s as he lay on the ground
    being beaten, and specifically whether the suspect was being aggressive or
    compliant, was lodged within the work practices of the social group charged
    with arresting suspects: the police. The ability to see such a body, and code it in
    terms of its aggressiveness, was a component of the professional practices that
policemen use to code the events that are the focus of their work. In so far as such
    vision is a public component of the work practices of a particular social group,
    someone who wasn’t present but who is a member of the profession, a
    policeman, can make authoritative statements about what can be legitimately
    seen on the tape. However, while policemen constitute a socially organized
    profession, suspects and victims of beatings don’t. Therefore there is no one with
    the social standing, i.e., membership and mastery of the practices of a relevant
    social group, to act as an expert witness to articulate what was happening from
    Mister King’s perspective.
    What was to be seen on the tape was structured through the way in which
    different semiotic fields, such as structure in the stream of speech, pointing
    which highlighted specific places and phenomena in the image being looked at,
    and events in the image itself, mutually elaborated each other to provide a
    construal of events that served the purposes of the party articulating the image.
    The following provides an example. At the point where we enter this sequence
    the prosecutor has noted that Mr. King appears to be moving into a position
    appropriate for handcuffing him, and that one officer is in fact reaching for his
handcuffs, i.e., the suspect is being cooperative.
    1 Prosecutor: So uh would you,
    2 again consider this to be:
3 a nonaggressive, movement by Mr. King?
    4 Sgt. Duke: At this time no I wouldn't. (1.1)
    5 Prosecutor: It is aggressive.
    6 Sgt. Duke: Yes. It's starting to be. (0.9)
    7 This foo

