Hong Kong Corpus of Spoken English (HKCSE)
(Part of speech search)

Welcome to the HKCSE, hosted by the Research Centre for Professional Communication in English of the Hong Kong Polytechnic University. The HKCSE is a large collection of texts representing spoken English in Hong Kong. This is the orthographic version. If you would like to purchase or know more about the prosodic version (A corpus-driven study of discourse intonation, with a CD), click here to go to the John Benjamins website.

Please cite the HKCSE using the following reference:

  Cheng W, Greaves C, Warren M (2005). The creation of a prosodically transcribed intercultural corpus: The Hong Kong Corpus of Spoken English (prosodic). ICAME Journal, vol. 29 (pp. 47-68), April 2005.

There are currently 907,657 words in the HKCSE.


Remarks: (Click here for detailed instructions)

(Mozilla Firefox or Apple® Safari is recommended for searching. Best viewed at a screen resolution of 1680x1050.)

1. The query word/phrase accepts English letters and hyphens only (i.e. A-Z, a-z, -) and is case-insensitive.
2. Search string syntax: word^pos (e.g. book^nn, book^vb ^dt, ^vb ^dt ^nn).
  • ^tag indicates the POS tag to be searched for. The string in front of the caret symbol ^ is the query word.
  • A maximum of 5 units of 'word', 'word^pos' or '^pos' can be used in a query.
  • A caret followed by a number (e.g. ^2) in the middle of the search string instructs the search engine to skip that many words between the two neighbouring query units (e.g. book ^2 advance skips two words between book and advance).
  • Head and/or tail partial search of the query word/phrase is possible by adding '-' to the front and/or end of the query word/phrase (e.g. -ment, advan-, -fine-).
  • Tail partial search for the POS tag is possible by adding '-' at the end of the POS tag (e.g. ^NN- will return ^NN, ^NNS, ^NNP and ^NNPS).
  • Units are separated by a space. Punctuation is not accepted.
3. Part-of-speech tag symbols in alphabetical order*

^CC    Coordinating conjunction                   ^PRPS  Possessive pronoun
^CD    Cardinal number                            ^RB    Adverb
^DT    Determiner                                 ^RBR   Adverb, comparative
^EX    Existential 'there'                        ^RBS   Adverb, superlative
^FW    Foreign word                               ^RP    Particle
^IN    Preposition or subordinating conjunction   ^TO    'to'
^JJ    Adjective                                  ^UH    Interjection
^JJR   Adjective, comparative                     ^VB    Verb, base form
^JJS   Adjective, superlative                     ^VBD   Verb, past tense
^MD    Modal                                      ^VBG   Verb, gerund or present participle
^NN    Noun, singular or mass                     ^VBN   Verb, past participle
^NNS   Noun, plural                               ^VBP   Verb, non-3rd person singular present
^NNP   Proper noun, singular                      ^VBZ   Verb, 3rd person singular present
^NNPS  Proper noun, plural                        ^WDT   Wh-determiner
^PDT   Predeterminer                              ^WP    Wh-pronoun
^POS   Possessive ending                          ^WPS   Possessive wh-pronoun
^PRP   Personal pronoun                           ^WRB   Wh-adverb

4. A maximum of 10,000 concordance lines can be listed.

* Source: Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Retrieved December 20, 2016 from http://repository.upenn.edu/cgi/viewcontent.cgi?article=1603&context=cis_reports

* The corpus is tagged with Stanford Part-of-speech Tagger v.3.6.0.
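
To check a search string before submitting it, the syntax rules in the Remarks above can be sketched in Python as follows. This is only an illustration, not the code used by the HKCSE search engine: the names Unit, expand_tag and parse_query are hypothetical, and two points the Remarks leave open are treated as assumptions (a skip such as ^2 is not counted towards the 5-unit limit, and POS tags are matched case-insensitively).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Tags copied from the table in Remark 3 (with this page's spellings PRPS and WPS).
TAGS = [
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "MD", "NN", "NNS",
    "NNP", "NNPS", "PDT", "POS", "PRP", "PRPS", "RB", "RBR", "RBS", "RP", "TO",
    "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WPS", "WRB",
]
MAX_UNITS = 5  # Remark 2: at most 5 units per query


@dataclass
class Unit:
    word: Optional[str] = None        # lower-cased query word, may start/end with '-'
    tags: Optional[List[str]] = None  # tags this unit may match, None if no tag given
    skip: int = 0                     # words to skip between the previous unit and this one


def expand_tag(tag: str) -> List[str]:
    """Return the tags matched by `tag`; a trailing '-' is a tail wildcard."""
    tag = tag.upper()  # assumption: tags are case-insensitive
    if tag.endswith("-") and len(tag) > 1:
        matches = [t for t in TAGS if t.startswith(tag[:-1])]
    else:
        matches = [t for t in TAGS if t == tag]
    if not matches:
        raise ValueError(f"unknown POS tag: ^{tag}")
    return matches


def parse_query(query: str) -> List[Unit]:
    """Split a search string into units, validating it against the Remarks."""
    units: List[Unit] = []
    pending_skip = 0
    for token in query.split():
        word, caret, tag = token.partition("^")
        if caret and not word and tag.isascii() and tag.isdigit():
            # '^2' style skip: only meaningful between two query units
            if not units or pending_skip:
                raise ValueError("a skip must sit between two query units")
            pending_skip = int(tag)
            continue
        if word and not (re.fullmatch(r"[A-Za-z-]+", word) and re.search(r"[A-Za-z]", word)):
            raise ValueError(f"invalid query word: {word!r}")
        units.append(Unit(
            word=word.lower() or None,
            tags=expand_tag(tag) if caret else None,
            skip=pending_skip,
        ))
        pending_skip = 0
    if pending_skip or not units:
        raise ValueError("query must contain units, and cannot end with a skip")
    if len(units) > MAX_UNITS:
        raise ValueError(f"at most {MAX_UNITS} units are allowed")
    return units
```

For example, parse_query("book ^2 advance") yields two units, the second with skip set to 2, and expand_tag("NN-") returns the four noun tags NN, NNS, NNP and NNPS.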

       


Contents:

  1. Click here for the details of the contents of the HKCSE.

  2. Please note that the contents in the HKCSE do not represent the views of the organisation and/or writer.

  3. The work to compile the HKCSE was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. G-YE86). This support is gratefully acknowledged.