|
The Cambridge English Profile Corpus (CEPC) is a corpus of learner English produced by students worldwide. It is being built by Cambridge University Press and Cambridge ESOL in collaboration with a network of participants across the world. These participants include schools, universities, and private language schools, along with research centres, government bodies (such as ministries of education) and individual education professionals.
The CEPC aims to provide 10 million words of data, covering both spoken (20%) and written (80%) language. Both General English (60%) and English for Specific Purposes (40%) are included. Written data is being collected via the online English Profile data collection portal; more details about this process can be found on our corpus FAQ page. The corpus covers levels A1-C2 and attempts to maintain a balance across a number of variables, including CEF level, first language, and educational context.

The CEPC allows a number of filtering options:
- educational contexts (e.g. primary or secondary, monolingual or bilingual)
- task type for written data (e.g. letter, email, report, essay)
- type of interaction for spoken data (e.g. casual conversation, formal presentation, oral exam, classroom discourse, role play)
- specific domains (e.g. medical English, business English)
- first language of learners
- age range of learners, and other demographic information
- country of data collection
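
As a rough illustration of how the filtering options listed above might be combined, here is a minimal Python sketch. It is not the CEPC interface: the `Sample` record, its field names, the `filter_samples` helper, and the three example records are all hypothetical and invented for this illustration. Only the 10 million word target and the 20%/80% spoken/written split come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    """One learner text or transcript. All field names are hypothetical."""
    mode: str            # "written" or "spoken"
    cef_level: str       # "A1" through "C2"
    first_language: str
    country: str         # country of data collection
    task_type: str       # e.g. "essay" (written) or "oral exam" (spoken)
    domain: str          # e.g. "general", "business", "medical"
    learner_age: int
    word_count: int


def filter_samples(samples: Iterable[Sample], **criteria: object) -> Iterator[Sample]:
    """Yield the samples whose fields equal every given criterion.

    An unknown field name raises AttributeError rather than matching nothing.
    """
    for sample in samples:
        if all(getattr(sample, name) == value for name, value in criteria.items()):
            yield sample


# Target sizes implied by the stated proportions (10 million words overall).
TARGET_WORDS = 10_000_000
target_by_mode = {
    "spoken": TARGET_WORDS * 20 // 100,   # 2,000,000 words
    "written": TARGET_WORDS * 80 // 100,  # 8,000,000 words
}

# Invented records, for illustration only.
samples = [
    Sample("written", "B1", "Spanish", "Spain", "essay", "general", 16, 180),
    Sample("spoken", "B2", "Japanese", "Japan", "oral exam", "general", 21, 420),
    Sample("written", "C1", "Spanish", "Mexico", "report", "business", 34, 650),
]

# Combine two filters: written data from learners whose first language is Spanish.
spanish_written = list(filter_samples(samples, mode="written", first_language="Spanish"))
spanish_written_words = sum(s.word_count for s in spanish_written)  # 180 + 650 = 830
```

Keyword arguments keep the sketch short, since each filter option maps to one field. A range-based filter such as learner age would need a separate predicate rather than an equality test.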
|