Corpus of Research Articles 2007
(Part of speech search)

Welcome to the Corpus of Research Articles (CRA) 2007. The CRA is a large collection of research articles from 39 disciplines.

There are currently 5,609,407 words in the corpus.


Remarks:

(We recommend using Mozilla Firefox or Apple® Safari for searching. The page is best viewed at a screen resolution of 1680x1050.)

1. The query word/phrase accepts English letters and the hyphen only (i.e. A-Z, a-z, -) and is case-insensitive.
2. Search string syntax: word^pos [e.g. book^nn, book^vb ^dt, ^vb ^dt ^nn].
  • The symbol ^tag indicates the tag to be searched for. The string in front of the caret symbol ^ is the query word.
  • A maximum of 5 units of 'word', 'word^pos' or '^pos' can be used for query.
  • A caret followed by a number (e.g. ^2) in the middle of the search string tells the search engine to skip that many words between the two adjacent query units (e.g. book ^2 advance skips two words between book and advance).
  • Head and/or tail partial search of the query word/phrase is possible by adding '-' to the front and/or end of the query word/phrase (e.g. -ment, advan-, -fine-).
  • Tail partial search for the POS tag is possible by adding '-' at the end of the POS tag (e.g. ^NN- will return ^NN, ^NNS, ^NNP, ^NNPS)
  • Units are separated by a space. Punctuation is not accepted.
3. Part-of-speech tag symbols in alphabetical order*

^CC      Coordinating conjunction
^CD      Cardinal number
^DT      Determiner
^EX      Existential 'there'
^FW      Foreign word
^IN      Preposition or subordinating conjunction
^JJ      Adjective
^JJR     Adjective, comparative
^JJS     Adjective, superlative
^MD      Modal
^NN      Noun, singular or mass
^NNS     Noun, plural
^NNP     Proper noun, singular
^NNPS    Proper noun, plural
^PDT     Predeterminer
^POS     Possessive ending
^PRP     Personal pronoun
^PRPS    Possessive pronoun
^RB      Adverb
^RBR     Adverb, comparative
^RBS     Adverb, superlative
^RP      Particle
^TO      'to'
^UH      Interjection
^VB      Verb, base form
^VBD     Verb, past tense
^VBG     Verb, gerund or present participle
^VBN     Verb, past participle
^VBP     Verb, non-3rd person singular present
^VBZ     Verb, 3rd person singular present
^WDT     Wh-determiner
^WP      Wh-pronoun
^WPS     Possessive wh-pronoun
^WRB     Wh-adverb

4. A maximum of 10,000 concordance lines can be listed.

* Source: Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Retrieved December 20, 2016 from
* The corpus was tagged with the Stanford Part-of-Speech Tagger, version 3.6.0.
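The query syntax described in item 2 can be illustrated with a small matcher. The sketch below is a hypothetical Python model, not the CRA search engine's actual implementation. It assumes the corpus is available as a list of (word, tag) pairs that use the tag names listed above, treats a ^N skip as exactly N intervening words, and does not count skip units toward the five-unit limit. The names parse_query, search and SAMPLE_TOKENS are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

MAX_UNITS = 5        # at most 5 'word', 'word^pos' or '^pos' units per query
MAX_LINES = 10_000   # at most 10,000 concordance lines are listed
WORD_RE = re.compile(r"[A-Za-z-]*")  # English letters and the hyphen only
TAG_RE = re.compile(r"[A-Za-z]+-?")  # a tag, optionally ending in '-'
Token = Tuple[str, str]              # (word, tag), e.g. ("book", "NN")

@dataclass
class Unit:
    word: Optional[str] = None  # lower-cased word pattern ('-' = wildcard at an edge)
    tag: Optional[str] = None   # upper-cased tag pattern (may end in '-')
    skip: Optional[int] = None  # set only for a '^N' skip unit (hypothetical model)

def parse_query(query: str) -> List[Unit]:
    """Split a query on spaces and turn each piece into a Unit."""
    units: List[Unit] = []
    for raw in query.split():
        word, caret, tag = raw.partition("^")
        if caret and not word and tag.isdecimal():
            units.append(Unit(skip=int(tag)))  # '^2' -> skip two words
            continue
        if (not WORD_RE.fullmatch(word)                  # punctuation is rejected
                or (word and not word.strip("-"))        # a bare '-' is rejected
                or (caret and not TAG_RE.fullmatch(tag))):
            raise ValueError(f"unsupported unit: {raw!r}")
        units.append(Unit(word=word.lower() or None,
                          tag=tag.upper() if caret else None))
    if not 1 <= sum(u.skip is None for u in units) <= MAX_UNITS:
        raise ValueError(f"a query needs 1 to {MAX_UNITS} units")
    if units[0].skip is not None or units[-1].skip is not None:
        raise ValueError("a ^N skip must sit between two units")
    return units

def word_matches(pattern: str, word: str) -> bool:
    """'-ment' = ends with, 'advan-' = starts with, '-fine-' = contains."""
    w, core = word.lower(), pattern.strip("-")
    head, tail = pattern.startswith("-"), pattern.endswith("-")
    if head and tail:
        return core in w
    if head:
        return w.endswith(core)
    if tail:
        return w.startswith(core)
    return w == core

def tag_matches(pattern: str, tag: str) -> bool:
    """'NN-' matches NN, NNS, NNP, NNPS; otherwise the tag must be equal."""
    if pattern.endswith("-"):
        return tag.upper().startswith(pattern[:-1])
    return tag.upper() == pattern

def unit_matches(unit: Unit, token: Token) -> bool:
    word, tag = token
    if unit.word is not None and not word_matches(unit.word, word):
        return False
    return unit.tag is None or tag_matches(unit.tag, tag)

def search(query: str, tokens: Sequence[Token], limit: int = MAX_LINES) -> List[int]:
    """Return start indices of matches, stopping after `limit` hits."""
    units = parse_query(query)
    hits: List[int] = []
    for start in range(len(tokens)):
        pos, ok = start, True
        for unit in units:
            if unit.skip is not None:
                pos += unit.skip  # jump over exactly N words (assumption)
                continue
            if pos >= len(tokens) or not unit_matches(unit, tokens[pos]):
                ok = False
                break
            pos += 1
        if ok:
            hits.append(start)
            if len(hits) >= limit:
                break
    return hits

# Hypothetical tagged sentence used only for demonstration.
SAMPLE_TOKENS: List[Token] = [
    ("This", "DT"), ("book", "NN"), ("will", "MD"), ("further", "RB"),
    ("advance", "VB"), ("the", "DT"), ("argument", "NN"), ("and", "CC"),
    ("these", "DT"), ("books", "NNS"), ("help", "VBP"),
]

if __name__ == "__main__":
    for q in ["book^nn", "book ^2 advance", "-ment", "^dt ^nn-"]:
        print(q, "->", search(q, SAMPLE_TOKENS))
```

With the sample tokens, book ^2 advance matches only at index 1 (book, then two skipped words, then advance), and ^dt ^nn- matches "This book", "the argument" and "these books". Matching on the position of each unit, rather than on raw text, is what lets word, tag and skip units be combined freely in one query.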




The compilation of the CRA was substantially supported by generous funding from the Dean of the Faculty of Humanities (Projects No. 1-87TT and 1-87SP). This support is gratefully acknowledged.