This site holds the conversational corpora assembled by the ESRC Centre for Research on Bilingualism in Theory & Practice at University of Wales Bangor.
We are seeking to gain a greater understanding of how bilingual individuals in a variety of communities manage both their languages within the same conversation.
The questions we consider include:
To date, we have assembled three corpora: Siarad, Miami and Patagonia.
Summary data for each corpus:
A number of publications and presentations have resulted from mining the corpora for the linguistic information they contain.
The researchers have received input and assistance from a variety of collaborators around the world. We have also received help in translating the Miami corpus from a number of people, listed on this page.
Our corpus material is transcribed and annotated using the CHAT and CLAN applications developed by Prof Brian MacWhinney and Leonid Spektor at Carnegie Mellon University. Our Siarad data is also available via the Talkbank portal, although the version there differs slightly from the one on this website.
To gloss the Miami and Patagonia corpora we are using autoglossing software we have developed in-house. To mine all three corpora we are using a variety of techniques, including the output from the autoglosser.
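As a rough illustration of what mining a CHAT transcript can involve, the Python sketch below counts words per language on the speaker lines of a small sample. It is not the Centre's autoglosser and does not use CLAN. The sample text, the function name `count_languages`, the default language and the `@s:eng` word marker are all assumptions made for the example, so check the documentation of each corpus for the tagging convention it actually uses.

```python
import re
from collections import Counter

# Invented sample in a CHAT-like layout; not taken from any of the corpora.
# Lines starting with "@" are headers, "*" are speaker (main) tiers,
# and "%" are dependent tiers such as translations.
SAMPLE = """\
@Begin
@Languages:\tcym, eng
*ABC:\tmae hi 'n really@s:eng neis .
%eng:\tshe is really nice .
*DEF:\tyeah@s:eng , dw i 'n cytuno .
@End
"""

# A word, optionally followed by an "@s:<code>" language marker.
# Assumption: words without a marker belong to the default language.
TOKEN = re.compile(r"([^\s@]+)(?:@s:([a-z&+]+))?")


def count_languages(chat_text, default="cym"):
    """Count words per language on main-tier (speaker) lines only."""
    counts = Counter()
    for line in chat_text.splitlines():
        if not line.startswith("*"):
            continue  # skip @headers and %dependent tiers
        _, _, utterance = line.partition("\t")
        for word, lang in TOKEN.findall(utterance):
            if not any(ch.isalpha() for ch in word):
                continue  # skip punctuation-only tokens
            counts[lang or default] += 1
    return counts


if __name__ == "__main__":
    for lang, n in sorted(count_languages(SAMPLE).items()):
        print(lang, n)
```

Restricting the count to lines that begin with `*` keeps translations and other dependent tiers out of the totals, which matters when the translation tier is in one of the two languages being counted.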
The ESRC Centre has collected these materials following the ethical guidelines set out in the Talkbank Code of Ethics.
The material on Talkbank and on this site is available under the GNU General Public License published by the Free Software Foundation. This means that you can access it freely and use it under the terms of that license. We would be grateful, however, if any such use could also acknowledge the ESRC Centre.
The support of the Arts and Humanities Research Council (AHRC), the Economic and Social Research Council (ESRC), the Higher Education Funding Council for Wales (HEFCW) and the Welsh Government is gratefully acknowledged.