Hong Kong Corpus of Corporate Governance Reports
(Part of speech search)

Welcome to the HKCCGR, developed by the Research Centre for Professional Communication in English of The Hong Kong Polytechnic University.

The one-million-word HKCCGR consists of the corporate governance reports of 217 companies listed on the Hong Kong Stock Exchange. These companies were carefully chosen to reflect the weighting of the four main sectors listed on the exchange (i.e. finance, utilities, property, and commercial and industrial). The moves that make up the corporate governance reports were identified, and 25 sub-corpora were compiled on the basis of these moves.

There are currently 1,034,673 words in the HKCCGR.

Please cite this corpus using the following information:

  Warren, M. (2017). Corpus-driven investigation of corporate governance reports. In E. Friginal (Ed.), Studies in corpus-based sociolinguistics (pp. 275–292). New York, NY: Routledge. doi: http://hdl.handle.net/10397/71191

 

Remarks:

(Mozilla Firefox or Apple® Safari is recommended for searching. Best viewed at a screen resolution of 1680x1050.)

1. The query word/phrase accepts only English letters and the dash (i.e. A-Z, a-z, -) and is case-insensitive.
2. Search string syntax: word^pos (e.g. book^nn, book^vb ^dt, ^vb ^dt ^nn).
  • The symbol ^tag indicates the tag to be searched for. The string in front of the caret symbol ^ is the query word.
  • A maximum of 5 units of 'word', 'word^pos' or '^pos' can be used in a query.
  • A caret followed by a number (e.g. ^2) in the middle of the search string instructs the search engine to skip that number of words between the two adjacent query units (e.g. book ^2 advance skips two words between book and advance).
  • Partial matching at the head and/or tail of the query word/phrase is possible by adding '-' to the front and/or end of the query word/phrase (e.g. -ment, advan-, -fine-).
  • Tail partial search for the POS tag is possible by adding '-' at the end of the POS tag (e.g. ^NN- will return ^NN, ^NNS, ^NNP, ^NNPS)
  • Units are separated by a space. Punctuation is not accepted.
3. Part-of-speech tag symbols in alphabetical order*

Tag    Description                               Tag    Description
^CC    Coordinating conjunction                  ^PRPS  Possessive pronoun
^CD    Cardinal number                           ^RB    Adverb
^DT    Determiner                                ^RBR   Adverb, comparative
^EX    Existential 'there'                       ^RBS   Adverb, superlative
^FW    Foreign word                              ^RP    Particle
^IN    Preposition or subordinating conjunction  ^TO    'to'
^JJ    Adjective                                 ^UH    Interjection
^JJR   Adjective, comparative                    ^VB    Verb, base form
^JJS   Adjective, superlative                    ^VBD   Verb, past tense
^MD    Modal                                     ^VBG   Verb, gerund or present participle
^NN    Noun, singular or mass                    ^VBN   Verb, past participle
^NNS   Noun, plural                              ^VBP   Verb, non-3rd person singular present
^NNP   Proper noun, singular                     ^VBZ   Verb, 3rd person singular present
^NNPS  Proper noun, plural                       ^WDT   Wh-determiner
^PDT   Predeterminer                             ^WP    Wh-pronoun
^POS   Possessive ending                         ^WPS   Possessive wh-pronoun
^PRP   Personal pronoun                          ^WRB   Wh-adverb

4. A maximum of 10,000 concordance lines can be listed.

* Source: Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Retrieved December 20, 2016, from http://repository.upenn.edu/cgi/viewcontent.cgi?article=1603&context=cis_reports
* The corpus is tagged with the Stanford Part-of-Speech Tagger v3.6.0.
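
As a rough illustration of the search string syntax described in remark 2, the Python sketch below splits a query into units, checks the allowed characters, and expands a tail partial tag such as ^NN- against the tag list in remark 3. It is not the search engine's own code: the names parse_query and expand_tag are hypothetical, and two behaviours are assumptions rather than documented rules, namely that a skip marker such as ^2 does not count toward the five-unit limit and that an unknown tag is rejected.

```python
import re

# Tag set as listed in remark 3 (PRPS and WPS as printed there).
TAGS = (
    "CC CD DT EX FW IN JJ JJR JJS MD NN NNS NNP NNPS PDT POS PRP PRPS "
    "RB RBR RBS RP TO UH VB VBD VBG VBN VBP VBZ WDT WP WPS WRB"
).split()

MAX_UNITS = 5  # limit stated in remark 2

# Letters and dashes only; a leading/trailing dash marks a partial search.
WORD_RE = re.compile(r"-?[A-Za-z]+(?:-[A-Za-z]+)*-?")
TAG_RE = re.compile(r"[A-Za-z]+-?")
SKIP_RE = re.compile(r"[0-9]+")


def expand_tag(pattern):
    """Return the tags matched by a POS pattern such as 'nn' or 'nn-'."""
    pattern = pattern.upper()  # queries are case-insensitive
    if pattern.endswith("-"):
        prefix = pattern[:-1]
        return [t for t in TAGS if t.startswith(prefix)]
    return [t for t in TAGS if t == pattern]


def parse_query(query):
    """Split a search string into word, tag and skip units (hypothetical helper)."""
    units = []
    for token in query.split():  # units are separated by a space
        word, caret, tag = token.partition("^")
        if caret and not word and SKIP_RE.fullmatch(tag):
            units.append({"skip": int(tag)})  # e.g. ^2
            continue
        unit = {}
        if word:
            if not WORD_RE.fullmatch(word):
                raise ValueError(f"invalid query word: {word!r}")
            unit["word"] = word.lower()
        if caret:
            tags = expand_tag(tag) if TAG_RE.fullmatch(tag) else []
            if not tags:
                raise ValueError(f"unknown POS tag: {tag!r}")
            unit["tags"] = tags
        units.append(unit)
    if not units:
        raise ValueError("empty query")
    if "skip" in units[0] or "skip" in units[-1]:
        raise ValueError("a skip marker must sit between two query units")
    # Assumption: skip markers are not counted toward the five-unit limit.
    if sum("skip" not in u for u in units) > MAX_UNITS:
        raise ValueError(f"at most {MAX_UNITS} units are allowed")
    return units


print(expand_tag("NN-"))            # ['NN', 'NNS', 'NNP', 'NNPS']
print(parse_query("book^vb ^dt"))   # [{'word': 'book', 'tags': ['VB']}, {'tags': ['DT']}]
print(parse_query("book ^2 advance"))
```

A query such as "book, advance" raises ValueError in this sketch because punctuation is not accepted; how the actual search page reports such input is not stated here.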


The compilation of the HKCCGR was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. PolyU 5440/13H).