The hypotheses we make and the constraints that we recognize are the following:

  1. The traces are available in XML format, whose semantics is known, at least informally. This is not a strong hypothesis: many CSCL tools directly produce such formats. In other cases, if the representation and the semantics of the traces are known, they can be converted to XML without loss of information.
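To make this hypothesis concrete, the sketch below parses a small chat trace of the kind a CSCL tool might produce. The element and attribute names (`trace`, `event`, `time`, `actor`) are illustrative assumptions, not a prescribed format; the point is that a trace kept in its original XML form remains directly readable.

```python
import xml.etree.ElementTree as ET

# Hypothetical chat trace as a CSCL tool might record it; the element and
# attribute names are assumptions for illustration, not a common format.
trace = """
<trace tool="chat">
  <event time="00:01:12" actor="alice" type="message">OK, let's start</event>
  <event time="00:01:30" actor="bob" type="message">I agree</event>
</trace>
"""

# The trace is read as-is, without translating it into another format.
root = ET.fromstring(trace)
for event in root.findall("event"):
    print(event.get("time"), event.get("actor"), event.text)
```

Because the document is never converted, no information is lost: any attribute the original tool recorded stays available for later analyses.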
  2. The proposed approach does not prejudge the use of a specific tool or a compulsory format. It applies to the joint usage of different tools and methods of gathering traces, for example through one or more CSCL tools on the one hand, and by the manual transcription of audio or video, possibly with the help of an appropriate tool, on the other.

However, given the wide variety of CSCL tools and of the transcription conventions followed by researchers, it seems illusory to attempt to propose a common transcription/trace format, or even to hope to define a kind of “pivot format” that can represent human activity, whether through an exceedingly complex format that expresses all possible shades and variations or through a simplified format that expresses a lowest common denominator. It is easier and more reasonable to keep the XML trace documents unchanged, in their original form, as the researcher chooses to record them. Consequently, it becomes necessary to give the researcher a tool that lets him or her explore the collected corpus through a user-friendly interface. The minimal functionalities that should be supplied are:

  • The visualization of corpus extracts;
  • The “Replay” of the mediated interactions from the collected traces;
  • The possibility to annotate elements of the corpus;
  • A search mechanism for the corpus.

The method we propose here is designed to support researchers in three tasks: analyzing a corpus according to a given coding scheme, selecting analyses already performed in order to carry out further analyses, and comparing analyses carried out by different coders or with different methodologies.
We begin by defining the term “primary corpus” as the collection of all the documents gathered during the course of an experiment or observation. These typically consist of:

  • Audio or video documents that have been recorded during the experiment or during the observation of the situation;
  • Transcriptions of these recordings carried out by the researcher;
  • Traces of computer-mediated interactions;
  • Documents distributed to participants in the experiment/situation;
  • Notes taken before, during and after the experiment/situation;
  • All other documents judged to be relevant by the researcher.
These documents are finite in number and will not evolve a posteriori, as they represent all the data gathered during and about the experiment/situation. In practice, we are interested in documents that exist in digital format (having been originally generated in, or translated into, XML) and for which an informal semantics can be defined.
We make the hypothesis that this primary corpus will be considered fixed and unchangeable. All other documents created at a later date from this primary corpus will be an extract, a comment or an interpretation of the primary corpus. Any annotation of the documents in this corpus will be expressed through an intermediary document (the “anchors document”) that contains references to the primary corpus.
The methodology described above allows us to assemble a corpus that contains all of the available data, without any information loss, since no data is translated from one format to another. As mentioned previously, the researcher should be able to visualize and explore this corpus. He or she should also be able to designate particular elements, annotate them, and extract these elements or parts of them.
We use the term “point in a corpus” to designate a reference to an item in the corpus. Such an item is, for example, the location of a word, a sentence or a paragraph in a text, an element in an XML document, a spot or an area in a picture, an excerpt of a video/audio document, etc.
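A point in a corpus, together with the anchors document that attaches annotations to it, can be sketched as a pair of small data structures. This is only an illustration of the idea: the field names and the locator syntax (an XPath-like expression for XML, which could equally be a time range for video) are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CorpusPoint:
    """A reference to an item in the primary corpus (illustrative layout)."""
    document: str   # a document of the primary corpus, e.g. "chat1.xml"
    locator: str    # e.g. an XPath for XML, or a time range for audio/video

@dataclass
class Annotation:
    """An entry of the 'anchors document': the primary corpus is untouched."""
    point: CorpusPoint
    text: str

# The annotation lives outside the primary corpus and only refers into it.
anchor = Annotation(
    point=CorpusPoint(document="chat1.xml", locator="/trace/event[2]"),
    text="Participant agrees without discussion",
)
print(anchor.point.document, anchor.point.locator, "->", anchor.text)
```

Storing such entries in a separate document preserves the hypothesis that the primary corpus remains fixed and unchangeable.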
Let us give some examples of analysis situations. A researcher can ask the following questions about a corpus:
  1. In which parts of the corpus does a participant use the expression “OK”?
  2. Which participant intervenes the most in the chat?
  3. What is the category (using Rainbow analysis, for example) of each participant's actions?
  4. What mechanical knowledge is referred to in this discussion?
  5. How can comments be attached to particular points of the corpus?
Clearly, each question calls for a very different kind of analysis. While questions 1 and 2 can potentially be answered in a few minutes, perhaps through an automated procedure, the other questions may require hours, days or even weeks of work by the researcher.
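For instance, questions 1 and 2 above can be answered mechanically on an XML chat trace. The sketch below assumes a hypothetical trace structure (the same illustrative `trace`/`event` layout as before, not a prescribed format) and answers both questions in a few lines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical chat trace; structure and attribute names are assumptions.
trace = """
<trace tool="chat">
  <event time="00:01:12" actor="alice">OK, let's start</event>
  <event time="00:01:30" actor="bob">I agree</event>
  <event time="00:02:05" actor="alice">OK then</event>
</trace>
"""
root = ET.fromstring(trace)

# Question 1: in which parts of the corpus is the expression "OK" used?
ok_events = [e.get("time")
             for e in root.findall("event")
             if "OK" in (e.text or "")]

# Question 2: which participant intervenes the most in the chat?
counts = Counter(e.get("actor") for e in root.findall("event"))
most_active = counts.most_common(1)[0][0]

print(ok_events)     # times of the events containing "OK"
print(most_active)   # participant with the most interventions
```

Questions 3 to 5, by contrast, require human interpretation and cannot be automated in this way, which is precisely why the annotation and analysis tools described below are needed.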

However, we cannot expect the human and social sciences researcher to master the different representations used in specific software, even with the most user-friendly XML editors. We must therefore provide him or her with a tool that allows visualization of the corpus he or she wishes to analyze.
Following an initial analysis of research practices, needs and existing tools, we propose the following tentative solution:

  • The development of a generic browser, allowing the visualization and the mark-up of different documents that are part of the primary corpus.
  • The development of an annotation tool that allows users to link annotations to elements of the primary corpus.
  • The development of an analysis tool that allows users to create links between elements of the corpus (a given chat intervention for example) and elements of the analysis method (for example, the task management category in the Rainbow method).
The tool must also, of course, be able to control the Replayer that comes with the DSS, in an extremely detailed manner for researcher analysis, but also in a simpler version that allows teacher inspection.
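The core data of the analysis tool described above is a link between a corpus element and a category of an analysis method. The sketch below illustrates this with the example from the text (a chat intervention linked to the “task management” category of the Rainbow method); the tuple layout and locator syntax are assumptions for illustration only.

```python
# A minimal sketch of analysis links: each connects an element of the corpus
# (located in a primary-corpus document) to a category of an analysis method.
# "Task management" and "Rainbow" come from the text; the layout is assumed.
analysis_links = [
    # (document, locator within the document, method, category)
    ("chat1.xml", "/trace/event[3]", "Rainbow", "task management"),
]

for doc, locator, method, category in analysis_links:
    print(f"{doc}#{locator} -> {method}: {category}")
```

As with annotations, these links are stored apart from the primary corpus, so analyses by different coders or with different methods can coexist over the same fixed data.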
The tool must handle data from most CSCL and CSCW systems (e.g. LEAD tools, DREW, Digalo) and audio/video recording tools (e.g. Transana). The interface should enable access to raw trace files and to the replayer functionalities.
The Alpha prototype will be built according to the proposed model and will first be tested on a selection of computer-mediated human interaction traces by researchers using the Rainbow framework. Next, we will address a second analysis method and test its use by researchers. Our ultimate goal is to provide an observation base of primary corpora that, through the definition of anchors, allows researchers to annotate, analyze, validate analyses and visualize data using a single adaptive tool, with future re-use of the work done in mind.

From a technological point of view, the interface between the electronic discussion tool (DSS) and the analysis tool is limited to:

  • the interface of the “Replayer”;
  • the model of the trace file recorded during a mediated discussion.


Tatiana: Trace Analysis Tool for Interaction ANAlysts. This is the tool developed for the LEAD project, but it will also handle traces from different discussion tools (Coffee, Drew, DrewLite, Digalo, etc.).

Drew is a collaborative tool (developed during the previous SCALE project) that provides trace files of interactions, with a web browser interface.
You may test it here without any installation.

DrewLite is a simplified version of this tool that runs on a LAN, with a less elaborate environment for managing sessions and traces. A simple trace viewer and converter also comes with DrewLite.