The prosody of reported speech in spoken French


Many studies seem to confirm that reported speech is marked prosodically by perceptual variables such as pitch variation, pause duration, speech rate, higher pitch range, etc.

However, data extracted from the ORFEO and OFROM data sets, which bring together various corpora of spontaneous speech in French with various styles and geographic origins, show that while monologs are often characterized by a pause introducing the reported segments, this is not at all the case for informal dialogs, where introductory segments are mostly located at the beginning or in the middle of the sentence and integrated into the overall prosodic structure of the sentence.

The presence of a pause in monologs to signal the occurrence of a reported speech segment in a sentence reflects a “written language bias” linked to the presence of quotation marks in written representations of reported speech.

In traditional grammar, reported speech is differentiated from direct speech, e.g., Max said he is very hungry (reported speech) vs. Paul said “I am very hungry” (direct speech). In this paper, however, we deal with “direct” speech in oral discourse, commonly designated as “reported oral speech”, or simply “reported speech” when the object of study is oral speech rather than its written transcription, as in Paul said “I am very hungry”, which is normally transcribed orthographically with quotation marks. In fact, this orthographic convention may bring preconceived ideas about the oral realization of reported speech, since reading aloud imposes a pause at the beginning and at the end of the reported segment. But is this the case for spontaneous speech?


Common views on reported speech

A common linguistic description of reported speech asserts that speakers “do not always explicitly introduce different ‘voices’ with reporting verbs or quotative constructions. Instead, figures are often ‘brought on stage’ for the first time merely by being animated, without, for instance, a prefatory he said or she said. […] The figure’s ‘voice’ must be constructible different from the current speaker’s own ‘voice’”. However, “Prosodic and paralinguistic effects are in fact deictic to a certain extent: they involve speaking within a given range of relative loudness, pitch and tempo […] and with a given voice quality.” (Couper-Kuhlen, 1997).

Many studies seem to confirm this view, according to which specific prosodic features are expected to mark reported speech segments: Fónagy, 1986; Günther, 1998; Klewitz and Couper-Kuhlen, 1999; Calaresu, 2004; Oliveira et al., 2004; Contreras Roa, 2020, among others. In these studies, the phonetic data analyzed describe how the expected perceptual variables, such as pitch variation, pause duration, speech rate and higher pitch range, signal reported speech segments in discourse.

It may then be worthwhile to determine, from the analysis of samples extracted from a large corpus of spontaneous speech in French, whether reported speech is actually prosodically marked in discourse, as predicted by the doxa, considering two styles present in the corpus: storytelling (monologs) and conversations between friends (dialogs).

Prosodic parameters

Most of the studies quoted above investigate the possible correlation between reported speech segments and the acoustic correlates of perceptual variables, i.e., pitch variations (Hz, semitones), pause duration (s), speech rate (syllables or phones per second), and loudness variations (dB).
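
As a minimal sketch of how two of these acoustic measures are computed, the following Python snippet converts a pitch movement from Hz to semitones and derives a speech rate in syllables per second. The function names and the sample values are illustrative assumptions and are not taken from the studies cited.

```python
import math


def semitone_interval(f_start_hz, f_end_hz):
    """Signed pitch interval in semitones between two F0 values given in Hz."""
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("F0 values must be positive")
    # One octave (a doubling of frequency) corresponds to 12 semitones.
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def speech_rate(n_syllables, duration_s):
    """Speech rate in syllables per second over a stretch of speech."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / duration_s


# Hypothetical values: a rise from 100 Hz to 200 Hz spans one octave.
rise_st = semitone_interval(100.0, 200.0)
# Hypothetical values: 10 syllables produced in 2 seconds.
rate = speech_rate(10, 2.0)
```

Expressing pitch movements in semitones rather than Hz makes ranges comparable across speakers with different registers, which is why both units appear in the list above.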

The correlations observed by Oliveira & Cunha (2004), for instance, pertain to:

  1. Pitch variations: reported (direct) speech in French has a higher pitch range than indirect speech, with pitch reset (with respect to previous segments) and register change.
  2. Pause duration: reported segments are often separated from non-reported speech by pauses.
  3. Speech rate: changes in speech rate.

(Intensity variations are found to be harder to establish acoustically.)

These results support the idea that there is a need to put some distance, to introduce some change, when the section of reported speech starts. Perhaps the “perfect way” to do this would be to change the speaker's voice in order to imitate the speech style of the person quoted (see the example in Fig. 13 below), borrowing some idiosyncratic characteristics of that person's voice.

Some contradicting earlier work

However, analyzing recorded conversations in Quebec French, Demers (1998), for instance, appears less certain that specific prosodic markers are necessarily associated with reported speech, especially in informal conversation. She discusses examples such as:

Il m'a dit: « Ça été l'erreur de ta vie ». He told me: “That was the mistake of your life.”

Ben souvent le monde il dit: «Il a pas peur lui ». Well, people often say: “He’s not afraid.”

C'est pour ça je dis moi: « Il-y-a pas de femme heureuse comme moi ». That’s why I say to myself: “There is no happy woman like me.”

Demers concludes that the expected pause is indeed observed in reading, probably under the influence of the quotation marks, whereas it is much less frequent in spontaneous discourse, except perhaps in formal academic speech or in some experimental phonology settings.

Adding the prosodic structure

Most if not all existing research on the prosodic characteristics of reported speech uses global parameters, such as pitch range, speech rate, intensity variations, etc. We add here another potentially interesting property of reported speech segments, pertaining to the prosodic structure of the sentence in which the reported segment is embedded. Beyond the possible presence of pauses or changes in the global prosodic parameters, this may show in more detail how reported speech segments are integrated, or not, into the speaker's production.

This analysis is based on a model of the autonomous prosodic structure (Martin, 2018), where the hierarchical grouping of accentual phrases (AP, a group of words with only one stressed syllable, in final position in French) is defined by dependency relations indicated by pitch variations located on the vowels of stressed syllables. These pitch variations are categorized as falling or rising, above or below the glissando threshold (i.e., perceived as a melodic variation or as a static tone), and reaching a minimal or maximal pitch value (for terminal conclusive contours, declarative and interrogative); see Fig. 1.


Fig. 1. Classes of melodic contours in French.
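
The first two criteria of this categorization (direction of the movement, and position relative to the glissando threshold) can be sketched as follows. This is only an illustration under stated assumptions: the threshold is taken here to be a rate of pitch change in semitones per second supplied by the caller, its value of 16 is hypothetical, and the function name is not part of the model described above.

```python
import math


def classify_contour(f_start_hz, f_end_hz, duration_s, threshold_st_per_s):
    """Return (direction, above_threshold) for a pitch movement on a stressed vowel.

    threshold_st_per_s is a caller-supplied glissando threshold expressed in
    semitones per second; its value is an assumption of this sketch.
    """
    if f_start_hz <= 0 or f_end_hz <= 0 or duration_s <= 0:
        raise ValueError("F0 values and duration must be positive")
    interval_st = 12.0 * math.log2(f_end_hz / f_start_hz)
    if interval_st > 0:
        direction = "rising"
    elif interval_st < 0:
        direction = "falling"
    else:
        direction = "level"  # edge case added for this sketch only
    # A movement slower than the threshold is taken to be heard as a static tone.
    rate_st_per_s = abs(interval_st) / duration_s
    return direction, rate_st_per_s > threshold_st_per_s


THRESHOLD = 16.0  # hypothetical value, in semitones per second

# Hypothetical measurements on three stressed vowels.
examples = [
    classify_contour(200.0, 250.0, 0.10, THRESHOLD),  # fast rise
    classify_contour(220.0, 180.0, 0.15, THRESHOLD),  # fast fall
    classify_contour(200.0, 202.0, 0.20, THRESHOLD),  # very slow rise
]
```

The third criterion, reaching a minimal or maximal pitch value for terminal contours, depends on the speaker's overall range and is left out of the sketch.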

The dependency relations indicated by these melodic contours are given in Fig. 2:


Fig. 2. Dependency relations indicated by melodic contours located on the vowels of stressed syllables.

For instance, the falling contour Cfal, whose variation is above the glissando threshold, depends on the occurrence of either a falling contour before pause, Cfap#, or a rising contour Cris, also above the glissando threshold, in either case located later in the sentence (dependency “to the right”). These dependency relations determine the successive grouping of accentual phrases ending in Cfal with those ending in Cris to form a larger prosodic syntagm (aka IP, Intonation Phrase, in the Autosegmental-Metrical model).
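
The rule just stated for Cfal can be sketched in a few lines of Python. The sketch covers only this one dependency, assumes that a Cfal attaches to the nearest Cris or Cfap# to its right (the text says only "later in the sentence"), and uses a hypothetical contour sequence; it is not the full set of relations of Fig. 2.

```python
def right_dependencies(contours):
    """Map the index of each Cfal to the index of the contour it depends on.

    Only the rule stated above is implemented: a Cfal depends on a Cris or a
    Cfap# located later in the sentence. Choosing the nearest one is an
    assumption of this sketch. A Cfal with no such contour maps to None.
    """
    governors = {"Cris", "Cfap#"}
    deps = {}
    for i, label in enumerate(contours):
        if label == "Cfal":
            deps[i] = next(
                (j for j in range(i + 1, len(contours)) if contours[j] in governors),
                None,
            )
    return deps


# Hypothetical sequence of contours, one per accentual phrase.
sequence = ["Cfal", "Cris", "Cfal", "Cfal", "Cfap#"]
deps = right_dependencies(sequence)
```

In this hypothetical sequence the first accentual phrase groups with the second, and the third and fourth group with the fifth, which is the kind of successive grouping the dependency relations are meant to encode.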

Applied to the example of Fig. 3, the prosodic dependency rules define the prosodic structure of the sentence, displayed with orthogonal branches (the vowels of stressed syllables are in bold).

Fig. 3. [il a dit ben] [tu vois] [je suis content] [d'être comme ça]

“[he said well] [you see] [I'm happy] [to be like that]”

[Corpus ORFEO, ergotherapie_sch il a dit ben 299.898 s 302.131 s]

Fig. 3 gives an example of the prosodic annotation of a segment retrieved automatically by the WinPitch integrated concordancer (keyword il a dit): fundamental frequency, intensity, waveform, and an optional narrow-band spectrogram with its aligned frequency scale, used to verify visually the validity of the melodic curve. The automatic word and phone segmentation (in IPA) makes it easy to identify stressed vowels, which carry melodic contours above or below the glissando threshold.

The prosodic structure, represented with orthogonal branches, gives an account of how the accentual phrases are merged together in the course of time. In the example, [il a dit ben] is merged with the group formed by the three accentual phrases [tu vois], [je suis content] and [d'être comme ça], ended by a rising contour Cris, to define the prosodic structure of the sentence. The vowels of stressed syllables (in bold characters) define the right boundaries of the APs, according to French morphological rules.

Analysis of spontaneous speech data

The analyzed experimental data are extracted from the ORFEO + OFROM data set (4 302 930 words, 1273 files), which brings together various corpora of spontaneous speech in French. Examples were easily located thanks to the concordancer integrated in the WinPitch software. This software automatically displays the speech sound corresponding to the selected concordancer text, together with its acoustic analysis, spectrogram and melodic and intensity curves.

Fig. 4. Concordancer integrated in WinPitch, giving the list of occurrences of a keyword (here je lui dis vous). Selecting a line retrieves the corresponding speech segment, together with its acoustic analysis (fundamental frequency, intensity and narrow- or wide-band spectrogram).

Given the large amount of data usually considered (in the ORFEO + OFROM corpus: 4 302 230 words…), an efficient, ergonomic and easy-to-use research tool is essential. WinPitch addresses this problem by offering fast access to recorded data in a variety of transcription formats (json, TextGrid, TRS, UTF-8, XML…), and by integrating fast acoustic analysis and prosodic annotation tools for selected speech segments of interest. Prosodic annotation uses graphic drawing tools, operating either with the ToBI standard notation or with an automatic categorization of melodic contours based on their glissando levels.
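
The principle of a concordancer operating on time-aligned transcriptions can be illustrated with a short Python sketch. This is not WinPitch's implementation: the token format (word, start time, end time), the function name and the timings below are all assumptions made for the example.

```python
def find_phrase(tokens, phrase):
    """Return (start_s, end_s) for each occurrence of phrase in a transcript.

    tokens: list of (word, start_s, end_s) tuples, in temporal order.
    phrase: space-separated key phrase, matched case-insensitively.
    """
    target = phrase.lower().split()
    words = [word.lower() for word, _, _ in tokens]
    n = len(target)
    hits = []
    if n == 0:
        return hits
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            # Span from the onset of the first word to the offset of the last.
            hits.append((tokens[i][1], tokens[i + n - 1][2]))
    return hits


# Hypothetical time-aligned tokens (timings invented for the example).
tokens = [
    ("il", 0.00, 0.10), ("a", 0.10, 0.18), ("dit", 0.18, 0.40),
    ("ben", 0.40, 0.62), ("tu", 0.62, 0.75), ("vois", 0.75, 1.05),
]
hits = find_phrase(tokens, "il a dit")
```

Each returned time span can then be used to cut the corresponding stretch of the sound file for acoustic analysis.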

Some statistics…

Introductory segment   Gloss             Number of occurrences
Je dis                 I say             1134
Tu dis                 You say           401
Elle dit               She says          182
Nous disons            We say            9
Vous dites             You say           195
Elles disent           They say          5
Je lui dis             I tell him        231
Tu lui dis             You tell him      35
Il me dit              He tells me       280
Elle lui dit           She tells him     34
Elle me dit            She tells me      196
Nous lui disons        We tell him       1
Vous lui dites         You tell him      3
Ils lui disent         They tell him     1
Elles lui disent       They tell him     0

Table 1. Number of occurrences of some introductory segments found in the ORFEO corpus.


Table 1 shows the number of occurrences of some selected introductory segments found in the ORFEO + OFROM corpora. Je dis is the most frequent, followed by tu dis and il me dit.

Reported speech typology

Given the rather large number of occurrences of reported speech found in the ORFEO + OFROM corpora, the analyzed examples are classified according to the position of the introductory segment in the sentence:

Initial: [il m'a dit] # [c'est comme ça] [que ça se fait] “[he told me] # [that’s how it’s done]”

Embedded: [Et sa femme] [elle lui dit euh] [le Nord euh] [c'est-à-dire euh] [au-dessus d'Avignon] ? “[And his wife] [she said to him uh] [the North uh] [that is to say uh] [above Avignon]?”

Final (postnucleus):  [très difficile] [jouer violoncelle] # [lui aurait dit] [Rostro] “[very difficult] [play cello] # [reportedly told him] [Rostro]”

Furthermore, examples are classified as extracted from monologs (storytellers, radio news) or dialogs (informal conversations).
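
The positional typology above can be expressed as a small decision rule over the sequence of accentual phrases. The sketch below is a hypothetical formalization: the function name and the index-based representation of the introductory segment are assumptions made for illustration.

```python
def intro_position(n_phrases, intro_first, intro_last):
    """Classify the position of an introductory segment in a sentence.

    n_phrases: number of accentual phrases in the sentence.
    intro_first, intro_last: 0-based indices of the first and last accentual
    phrases of the introductory segment (it may span several phrases).
    """
    if not (0 <= intro_first <= intro_last < n_phrases):
        raise ValueError("introductory span must lie inside the sentence")
    if intro_first == 0:
        return "initial"
    if intro_last == n_phrases - 1:
        return "final"
    return "embedded"


# The three examples above, encoded by accentual phrase indices.
initial = intro_position(3, 0, 0)   # [il m'a dit] ...
embedded = intro_position(5, 1, 1)  # ... [elle lui dit euh] ...
final = intro_position(4, 2, 3)     # ... [lui aurait dit] [Rostro]
```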

Selection of examples

1. Introductory segment in initial position in the sentence

1.1 Monologs (Storytellers)

Fig. 5. [il m'a dit] # [c'est comme ça] [que ça se fait] “[he told me] # [that’s how it’s done]”

[Corpus ORFEO youcef_zerari_h_29_abdel_hachim_h_25_so  3943.447 s 1987.058 s]


Although the speaker is not a professional storyteller, she uses a didactic style in this monolog, instantiated by the presence of a 450 ms pause and a prosodic structure congruent with syntax, where the reported segment c'est comme ça que ça se fait constitutes a complete Intonation Phrase (IP). The presence of the pause can also be due to eurhythmicity, balancing the durations of the first two levels of the prosodic structure. There is no notable change in pitch or speech rate, as the encoding of the prosodic structure including the reported segment follows the general rules of prosodic grammar.

Fig. 6. [donc il dit] # [s'il me la rend pas] [je vais faire la guerre] [avec toi]

“[so he says] # [if he doesn't give it back] [I'm going to war] [with you]”

[ORFEO 01_og_nh_100222  4393.085 s 2203.267 s]


This is a typical example of storyteller reported speech, with a pause of 800 ms following the introductory segment donc il dit. The reported segment carries the same prosodic structure that would be found in an isolated version of the reported segment, with a rising contour Cris ending the segment s'il me la rend pas, and with no change in speech rate, pitch range or intensity.
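
Pause durations such as those reported for the two monolog examples can be derived from a time-aligned segmentation as the silent interval between the end of the introductory segment and the onset of the next word. The following Python sketch assumes a (word, start, end) token format and uses invented timings; it does not describe the measurement procedure actually used for the figures.

```python
def pause_after_intro_ms(tokens, intro_last_index):
    """Silent interval, in milliseconds, following the introductory segment.

    tokens: list of (word, start_s, end_s) tuples, in temporal order.
    intro_last_index: index of the last word of the introductory segment.
    Returns None when the introductory segment ends the token list.
    """
    if not 0 <= intro_last_index < len(tokens):
        raise IndexError("intro_last_index out of range")
    if intro_last_index == len(tokens) - 1:
        return None
    gap_s = tokens[intro_last_index + 1][1] - tokens[intro_last_index][2]
    # Overlapping or abutting words count as no pause.
    return max(0.0, gap_s) * 1000.0


# Hypothetical timings, chosen so that a long silence follows "donc il dit".
tokens = [
    ("donc", 0.00, 0.20), ("il", 0.20, 0.30), ("dit", 0.30, 0.50),
    ("s'il", 1.30, 1.45),
]
pause_ms = pause_after_intro_ms(tokens, 2)
```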

1.2 Dialogs (informal conversation)


Fig. 7. [il a dit ben] [tu vois] [je suis content] [d'être comme ça]

“[he said well] [you see] [I'm happy] [to be like that]”

[ORFEO ergotherapie_sch  595.824 s 302.332 s]


In this example, the accentual phrase of the introductory segment ends with the first word of the reported segment, ben. There is no pause, and the reported segment is prosodically integrated in a complete IP.