Hong Kong Corpus of Corporate Governance Reports (HKCCGR)

(Part of speech search)

Welcome to the HKCCGR developed by the Research Centre for Professional Communication in English of the Hong Kong Polytechnic University.

The one-million-word HKCCGR consists of the corporate governance reports of 217 companies listed on the Hong Kong Stock Exchange. These companies were carefully chosen to reflect the weighting of the four main sectors listed on the exchange (i.e. finance, utilities, property, and commercial and industrial). The moves that make up the corporate governance reports were identified, and 25 sub-corpora were compiled based on each move.

There are currently 1,034,673 words in the HKCCGR.

Please cite this corpus with the following information:

Warren, M. (2017). Corpus-driven investigation of corporate governance reports. In E. Friginal (Ed.), Studies in corpus-based sociolinguistics (pp. 275–292). New York, NY: Routledge. doi: http://hdl.handle.net/10397/71191

Remarks:

(Recommended browsers: Mozilla Firefox or Apple® Safari. Best viewed at a screen resolution of 1680x1050.)

1. The query word/phrase accepts English letters and hyphens only (i.e. A-Z, a-z, -) and is case-insensitive.
2. Search string syntax: word^pos [e.g. book^nn, book^vb ^dt, ^vb ^dt ^nn, etc.]

The symbol ^tag indicates the tag to be searched for. The string in front of the caret symbol ^ is the query word.
A maximum of 5 units of 'word', 'word^pos' or '^pos' can be used in a query.
A caret followed by a number (e.g. ^2) in the middle of the search string instructs the search engine to skip that number of words between the two neighbouring query units (e.g. book ^2 advance skips two words between 'book' and 'advance').
Head and/or tail partial search of the query word/phrase is possible by adding '-' to the front or end of the query word/phrase (e.g. -ment, advan-, -fine-).
Tail partial search for the POS tag is possible by adding '-' at the end of the POS tag (e.g. ^NN- will return ^NN, ^NNS, ^NNP, ^NNPS).
Units are separated by a space. Punctuation is not accepted.
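
The syntax rules above can be sketched as a small query validator. This is only an illustration, not the search engine's actual parser, which is not published here. The names parse_query, Unit and MAX_UNITS are hypothetical, and the sketch assumes that a skip marker such as ^2 does not count towards the five-unit limit, which the remarks do not state either way.

```python
import re
from typing import List, NamedTuple, Optional

MAX_UNITS = 5  # limit stated in the remarks; the name is hypothetical

# word: letters with optional inner hyphens, optional leading/trailing '-'
# pos:  letters with an optional trailing '-' for tail partial search
UNIT_RE = re.compile(
    r"^(?P<word>-?[A-Za-z]+(?:-[A-Za-z]+)*-?)?(?:\^(?P<pos>[A-Za-z]+-?))?$"
)
SKIP_RE = re.compile(r"^\^([1-9]\d*)$")  # e.g. ^2


class Unit(NamedTuple):
    word: Optional[str]   # lower-cased query word, or None for a bare ^pos
    pos: Optional[str]    # upper-cased tag, or None for a bare word
    skip_before: int = 0  # words to skip between the previous unit and this one


def parse_query(query: str) -> List[Unit]:
    """Split a query into units, raising ValueError on malformed input."""
    units: List[Unit] = []
    pending_skip = 0
    for token in query.split():
        skip = SKIP_RE.match(token)
        if skip:
            # Assumption: a skip marker must sit between two query units.
            if not units or pending_skip:
                raise ValueError("skip marker must follow a query unit")
            pending_skip = int(skip.group(1))
            continue
        match = UNIT_RE.match(token)
        if match is None:
            raise ValueError(f"invalid unit: {token!r}")
        word, pos = match.group("word"), match.group("pos")
        units.append(Unit(word.lower() if word else None,
                          pos.upper() if pos else None,
                          pending_skip))
        pending_skip = 0
    if pending_skip:
        raise ValueError("skip marker must be followed by a query unit")
    if not 1 <= len(units) <= MAX_UNITS:
        raise ValueError(f"a query needs 1 to {MAX_UNITS} units")
    return units


if __name__ == "__main__":
    for example in ("book^nn", "^vb ^dt ^nn", "book ^2 advance", "-ment ^NN-"):
        print(example, "->", parse_query(example))
```

Lower-casing words and upper-casing tags reflects the case-insensitive matching described in remark 1; punctuation such as a comma fails the unit pattern and is rejected.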

3. Part-of-speech tag symbols in alphabetical order*

^CC	Coordinating conjunction	^PRPS	Possessive pronoun
^CD	Cardinal number	^RB	Adverb
^DT	Determiner	^RBR	Adverb, comparative
^EX	Existential 'there'	^RBS	Adverb, superlative
^FW	Foreign word	^RP	Particle
^IN	Preposition or subordinating conjunction	^TO	'to'
^JJ	Adjective	^UH	Interjection
^JJR	Adjective, comparative	^VB	Verb, base form
^JJS	Adjective, superlative	^VBD	Verb, past tense
^MD	Modal	^VBG	Verb, gerund or present participle
^NN	Noun, singular or mass	^VBN	Verb, past participle
^NNS	Noun, plural	^VBP	Verb, non-3rd person singular present
^NNP	Proper noun, singular	^VBZ	Verb, 3rd person singular present
^NNPS	Proper noun, plural	^WDT	Wh-determiner
^PDT	Predeterminer	^WP	Wh-pronoun
^POS	Possessive ending	^WPS	Possessive wh-pronoun
^PRP	Personal pronoun	^WRB	Wh-adverb
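
As a rough illustration of tail partial search on tags, the sketch below expands a pattern such as ^NN- into the tags from the table above that begin with that prefix. The function name expand_tag and the TAGS list are hypothetical helpers written for this example; how the search engine itself resolves partial tags is an assumption.

```python
from typing import List

# Tag symbols copied from the table above (left column, then right column).
TAGS = [
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "MD",
    "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP",
    "PRPS", "RB", "RBR", "RBS", "RP", "TO", "UH", "VB", "VBD",
    "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WPS", "WRB",
]


def expand_tag(pattern: str) -> List[str]:
    """Return the tags matched by a pattern such as '^NN' or '^NN-'."""
    tag = pattern.lstrip("^").upper()  # tags are treated case-insensitively
    if tag.endswith("-"):
        prefix = tag[:-1]
        return [t for t in TAGS if t.startswith(prefix)]
    # Without a trailing '-', only an exact tag matches.
    return [tag] if tag in TAGS else []


if __name__ == "__main__":
    print(expand_tag("^NN-"))  # ['NN', 'NNS', 'NNP', 'NNPS']
    print(expand_tag("^vb-"))  # ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
    print(expand_tag("^PRP"))  # ['PRP']
```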

4. A maximum of 10,000 concordance lines can be listed.

* Source: Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Retrieved December 20, 2016 from http://repository.upenn.edu/cgi/viewcontent.cgi?article=1603&context=cis_reports
* The corpus is tagged with Stanford Part-of-speech Tagger v.3.6.0.


The compilation of the HKCCGR was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. PolyU 5440/13H).