A conversation analytic transcription of a spoken subcorpus of the BNC
The CABNC corpus is an open-licensed, detailed conversation analytic re-transcription of naturalistic conversations from a subcorpus of the British National Corpus, amounting to around 4.2 million words in 1436 separate conversations.
The project aims to produce transcripts usable for both computational and detailed qualitative analysis. If you are a CA transcriptionist and you use the data, please make sure you re-submit your updated transcripts to help improve the corpus over time.
To edit transcripts in CLAN, place both the transcript .cha file and the audio .wav file in the same directory. Check the CLAN manual for details of how to use the CLAN editor.
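Before opening a batch of transcripts, it can help to check that each one has its audio alongside it. The following is a minimal sketch, not part of CLAN or of this project: the function names `media_name` and `missing_audio` are hypothetical, and it assumes each transcript declares its recording in a CHAT `@Media:` header whose first comma-separated field is the audio file name without its extension. When no such header is found, it falls back on the assumption that the .wav file shares the transcript's base name.

```python
from pathlib import Path
from typing import List, Optional


def media_name(cha_path: Path) -> Optional[str]:
    """Return the media name declared in a CHAT @Media header, or None."""
    with cha_path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("@Media:"):
                # Assumed header shape: "@Media:<tab>name, audio"
                value = line[len("@Media:"):].strip()
                name = value.split(",")[0].strip()
                return name or None
    return None


def missing_audio(directory: Path) -> List[str]:
    """List .cha files whose .wav file is not in the same directory."""
    problems = []
    for cha in sorted(directory.glob("*.cha")):
        # Assumption: with no @Media header, the audio shares the base name.
        name = media_name(cha) or cha.stem
        if not (directory / f"{name}.wav").is_file():
            problems.append(cha.name)
    return problems
```

Calling `missing_audio(Path("transcripts"))`, where `transcripts` is a placeholder directory name, returns the names of the transcripts that still need their audio file copied in.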
Transcriptions use Jeffersonian CA transcription conventions, together with the CHAT-CA file format and transcription symbols provided by the CLAN transcription system.
A set of guidelines for transcribers is currently being devised to help with standardisation. The guidelines adhere as closely as possible to current standards in CA without sacrificing machine readability.
To use or contribute to these transcripts:
The data on which this project builds is available here:
If you want to perform complex searches on BNC data:
The Audio BNC contains about 7.5 million words of recorded speech, all of it already roughly transcribed, with audio recordings of sufficient quality for automated phonetic transcription. Full Praat TextGrid files aligning audio to transcriptions are available for the entire corpus, along with comprehensive wordclass and part-of-speech tag annotations. Within the overall BNC corpus, this project focuses on a subcorpus of more naturalistic conversations from informal contexts. These include 152 rough transcripts of audio files, labelled by the original BNC transcribers with the following tags:
These are conversations around water-coolers, in corridors, at bus-stops, in homes and so on, and as such are most useful for analysing natural talk-in-interaction. There are 4,228,314 words in this subcorpus.