For individual words, a subset of phenomena was coded (see individual corpus descriptions for further details). The coding scheme is phonemic and specifies the following linguistic variables: target phoneme/grapheme; preceding and following phoneme/grapheme; stress; and position in the word. A complete list of the values for each of these variables is available in [RPD_Coding_Linguistic variables.pdf].
The following coding conventions were adopted:
Coding of 'Target phoneme' / 'Preceding phoneme' / 'Following phoneme': coding is based on phonemic (i.e. dictionary) transcription. For example, French <r> is coded /R/ even though non-uvular fricative realizations may be encountered (NB. SAMPA symbols [SAMPA.pdf] are used here and elsewhere). Similarly, Spanish <b> is coded phonemically /b/ in spite of the possibility of approximant realizations in some environments.
Pauses between target phoneme and preceding/following phoneme: If the target sound is preceded or followed by a pause in the particular sound file, 'Preceding phoneme'/'Following phoneme' is coded as '#' (pause).
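As an illustration only, the two conventions above can be sketched in Python: the target and its neighbours come from the phonemic (dictionary) transcription, and a pause is coded as '#'. The sketch assumes the transcription is already available as a list of SAMPA symbols with pauses written as '#', and it treats the start and end of the recording as pauses. The function name phoneme_context is hypothetical and not part of the RPD.

```python
PAUSE = "#"  # value used for 'Preceding/Following phoneme' when a pause intervenes


def phoneme_context(segments: list[str], i: int) -> tuple[str, str, str]:
    """Return ('Preceding phoneme', 'Target phoneme', 'Following phoneme').

    `segments` is the phonemic (dictionary) transcription of the recording in
    SAMPA, one symbol per item, with pauses in the sound file written as "#".
    Assumption: the start and end of the recording count as pauses.
    """
    target = segments[i]
    if target == PAUSE:
        raise ValueError("the target must be a phoneme, not a pause")
    preceding = segments[i - 1] if i > 0 else PAUSE
    following = segments[i + 1] if i + 1 < len(segments) else PAUSE
    return preceding, target, following


# Spanish "la boca": <b> is coded /b/ even if realized as an approximant.
print(phoneme_context(["l", "a", "b", "o", "k", "a"], 2))  # ('a', 'b', 'o')
# French "rouge" preceded by a pause: the preceding phoneme is coded "#".
print(phoneme_context(["#", "R", "u", "Z", "#"], 1))  # ('#', 'R', 'u')
```

Because the input is phonemic, allophonic variation (e.g. Spanish approximant [B]) never reaches the coding.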
Coding 'Target / Preceding / Following Grapheme':
Complex graphemes: Some sounds are spelled with two or more letters (e.g. French [J]=<gn>, [u]=<ou>; Spanish [rr]=<rr>), which are coded as a single grapheme. In the case of double consonants (e.g. <bb>, <tt>), with the exception of <rr>, coding is based on the first grapheme of the pair (i.e. in the coding, 'Following grapheme' will be the second of the two identical graphemes).
Initial/final graphemes: If the grapheme in question is word-initial or word-final, 'Preceding grapheme' / 'Following grapheme' is coded as '#'. The grapheme of the preceding/following word is not indicated.
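A minimal sketch of the grapheme conventions, assuming a hypothetical, hand-supplied inventory of complex graphemes (here French <gn>, <ou> and Spanish <rr>) and a naive longest-match segmentation that ignores exceptions such as French <gn> pronounced [gn]. Because <rr> is listed as a complex grapheme while other doubled consonants are not, <tt> in French botte is split into two graphemes, and the first <t> then has the second as its 'Following grapheme'. The names segment_graphemes and grapheme_context are illustrative only.

```python
WORD_EDGE = "#"  # value used when the grapheme is word-initial or word-final

# Hypothetical inventory for illustration; the real lists are language-specific.
COMPLEX_GRAPHEMES = ("gn", "ou", "rr")


def segment_graphemes(word: str, complex_graphemes: tuple[str, ...] = COMPLEX_GRAPHEMES) -> list[str]:
    """Split an orthographic word into graphemes by naive longest match.

    Listed complex graphemes (including <rr>) become one grapheme; any other
    letter, including each half of a doubled consonant such as <tt>, is kept
    as a separate grapheme.
    """
    ordered = sorted(complex_graphemes, key=len, reverse=True)
    graphemes: list[str] = []
    i = 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                graphemes.append(g)
                i += len(g)
                break
        else:  # no complex grapheme starts here: take a single letter
            graphemes.append(word[i])
            i += 1
    return graphemes


def grapheme_context(graphemes: list[str], i: int) -> tuple[str, str, str]:
    """Return (preceding, target, following); '#' marks a word edge.

    Graphemes of neighbouring words are never consulted.
    """
    preceding = graphemes[i - 1] if i > 0 else WORD_EDGE
    following = graphemes[i + 1] if i + 1 < len(graphemes) else WORD_EDGE
    return preceding, graphemes[i], following


print(segment_graphemes("perro"))  # ['p', 'e', 'rr', 'o']
botte = segment_graphemes("botte")  # ['b', 'o', 't', 't', 'e']
print(grapheme_context(botte, 2))  # ('o', 't', 't'): first <t> is the target
```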
Stress: stress is coded based on (i) the syllable in which the sound occurs and (ii) the position of this syllable relative to the syllable bearing main (tonic) stress. In the examples below, the syllable for which the coding is valid is given in square brackets.
Ante Pre-tonic: two syllables before the main stress (e.g. [i] in French illisible; [kO~] in Portuguese competir)
Pre-tonic: syllable preceding main stress (e.g. [ku] in Romanian culeg; [rre] in Spanish reloj)
Tonic: syllable receiving main stress (e.g. [mul] in Romanian multă; [ku] in French beaucoup)
Post-tonic: syllable following main stress (e.g. [do] in Portuguese mundo; [Do] in Spanish pasado)
Post post-tonic: two syllables following main stress (e.g. [is] in Portuguese móveis; [d@] in Romanian hlamidă)
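The stress values depend only on the distance between the coded syllable and the tonic syllable. A sketch, assuming syllabification and the position of main stress are supplied as input (they are not computed here); offsets of more than two syllables are not listed in the scheme above, so the hypothetical helper stress_code returns None for them.

```python
from typing import Optional

# Offset of the coded syllable from the tonic syllable -> 'Stress' value.
STRESS_LABELS = {
    -2: "Ante Pre-tonic",
    -1: "Pre-tonic",
    0: "Tonic",
    1: "Post-tonic",
    2: "Post post-tonic",
}


def stress_code(syllable_index: int, tonic_index: int) -> Optional[str]:
    """Code 'Stress' for the syllable at `syllable_index` (0-based) in a word
    whose main stress falls on syllable `tonic_index`.

    Returns None for offsets not listed in the scheme (e.g. three syllables
    before the tonic).
    """
    return STRESS_LABELS.get(syllable_index - tonic_index)


# Portuguese competir = com.pe.'tir: [kO~] is two syllables before the tonic.
print(stress_code(0, 2))  # Ante Pre-tonic
# Spanish pasado = pa.'sa.do: [Do] follows the tonic.
print(stress_code(2, 1))  # Post-tonic
```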
Position in word: this is based on the phonetic transcription and not the orthographic form.
Initial: all consonants at the beginning of words (e.g. [l] in Portuguese longe), including the second or third member of a cluster (e.g. [l] in French bloquer)
Medial: all consonants between two pronounced vowels, whether singletons (e.g. [m] in Romanian vreme) or in clusters (e.g. [vg] in French sauvegarde; [nd] in Spanish candado)
Final: all consonants at the end of words (e.g. [S] in Romanian căluș), including the first member of a cluster (e.g. [R] in French plateforme).
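The consonant rules amount to checking whether a pronounced vowel precedes and/or follows the consonant within the word's phonetic transcription. A sketch, assuming a one-symbol-per-segment SAMPA transcription and a caller-supplied vowel set; the set used in the example is illustrative, not the RPD inventory, and consonant_position is a hypothetical name.

```python
def consonant_position(segments: list[str], i: int, vowels: set[str]) -> str:
    """Code 'Position in word' for the consonant at segments[i].

    `segments` is the phonetic transcription of a single word (SAMPA, one
    symbol per item) and `vowels` the set of symbols counted as pronounced
    vowels; both are supplied by the caller.
    """
    if segments[i] in vowels:
        raise ValueError("vowels are coded by syllable, not by this rule")
    vowel_before = any(s in vowels for s in segments[:i])
    vowel_after = any(s in vowels for s in segments[i + 1:])
    if not vowel_before:
        return "Initial"  # word-initial, including later members of a cluster
    if not vowel_after:
        return "Final"  # word-final, including earlier members of a cluster
    return "Medial"  # a pronounced vowel on both sides


# Illustrative vowel set only, not the RPD inventory.
VOWELS = {"a", "e", "i", "o", "u", "E", "O", "@"}
bloquer = ["b", "l", "O", "k", "e"]
plateforme = ["p", "l", "a", "t", "f", "O", "R", "m"]
print(consonant_position(bloquer, 1, VOWELS))  # Initial
print(consonant_position(plateforme, 6, VOWELS))  # Final
print(consonant_position(plateforme, 3, VOWELS))  # Medial
```

Working on the phonetic rather than the orthographic form is what makes [vg] in sauvegarde medial: the unpronounced <e> is simply absent from the transcription.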
Vowels: vowels are coded based on the syllable in which they occur.
Initial: first syllable, whether preceded by a consonant (e.g. [a] in Romanian lapte) or not (e.g. [o] in Portuguese opina).
Medial: in words of three syllables or more, any vowel in neither the first nor the last syllable (e.g. [e] in Spanish ahuecar and Romanian teatral)
Final: last syllable (e.g. [a~] in French demande; [i] in Spanish videoclip).
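Vowel position depends only on the index of the syllable and the number of syllables in the word. A sketch under the assumption that the syllable count comes from the phonetic transcription (so French demande counts as two syllables, [d@] + [ma~d]). Since the rules as stated do not say how monosyllables are coded, the hypothetical helper vowel_position returns None for them.

```python
from typing import Optional


def vowel_position(syllable_index: int, n_syllables: int) -> Optional[str]:
    """Code 'Position in word' for a vowel in syllable `syllable_index`
    (0-based) of a word with `n_syllables` pronounced syllables.

    Assumption: monosyllables are not covered by the rules as stated, so
    None is returned for them.
    """
    if not 0 <= syllable_index < n_syllables:
        raise ValueError("syllable_index out of range")
    if n_syllables == 1:
        return None
    if syllable_index == 0:
        return "Initial"
    if syllable_index == n_syllables - 1:
        return "Final"
    return "Medial"  # only possible in words of three or more syllables


print(vowel_position(1, 3))  # Medial: [e] in Spanish a.hue.car
print(vowel_position(1, 2))  # Final: [a~] in French demande, [d@.ma~d]
```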