Corpus of Journal Articles 2014
(Part of speech search)

Welcome to the Corpus of Journal Articles (CJA) 2014. The CJA 2014 is a large collection of articles from 721 high-impact journals in 38 disciplines, all listed in Journal Citation Reports (JCR) or SCImago Journal Rank (SJR).

The CJA 2014 consists of 760 articles with 6,140,708 words and contains three sub-corpora:

1. The research articles corpus contains 2,864,188 tokens in 18 sections.
2. The review articles corpus contains 2,354,110 tokens in 5 sections.
3. The theoretical articles corpus contains 922,410 tokens in 4 sections.
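
As a quick consistency check on the figures above, the following sketch sums the three sub-corpora and computes each one's share of the whole. It assumes that the "words" in the overall total and the "tokens" in the sub-corpus counts were counted the same way; the variable names are illustrative only.

```python
# Token and section counts exactly as stated on this page.
SUB_CORPORA = {
    "research": {"tokens": 2_864_188, "sections": 18},
    "review": {"tokens": 2_354_110, "sections": 5},
    "theoretical": {"tokens": 922_410, "sections": 4},
}
STATED_TOTAL = 6_140_708  # overall word count given for the CJA 2014

# Assumption: "words" and "tokens" refer to the same unit of counting.
total_tokens = sum(c["tokens"] for c in SUB_CORPORA.values())
shares = {name: c["tokens"] / total_tokens for name, c in SUB_CORPORA.items()}

for name, share in shares.items():
    print(f"{name:<12} {share:6.1%}")
print("sum matches stated total:", total_tokens == STATED_TOTAL)
```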

Remarks:

(Mozilla Firefox or Apple® Safari is recommended for searching. Best viewed at a screen resolution of 1680x1050.)

1. The query word/phrase accepts English letters and hyphens only (i.e. A-Z, a-z, -) and is case-insensitive.
2. The search string syntax is word^pos (e.g. book^nn, book^vb ^dt, ^vb ^dt ^nn).
  • ^tag indicates the part-of-speech tag to be searched for. The string in front of the caret symbol ^ is the query word.
  • A query can contain a maximum of 5 units, each of the form 'word', 'word^pos' or '^pos'.
  • A caret followed by a number in the middle of the search string tells the search engine to skip that many words between two query units (e.g. book ^2 advance skips two words between book and advance).
  • Head and/or tail partial search of the query word/phrase is possible by adding '-' to the front or end of the query word/phrase (e.g. -ment, advan-, -fine-).
  • Tail partial search for the POS tag is possible by adding '-' at the end of the POS tag (e.g. ^NN- will return ^NN, ^NNS, ^NNP, ^NNPS).
  • Units are separated by a space. Punctuation is not accepted.
3. Part-of-speech tag symbols in alphabetical order*

Tag    Description                               Tag    Description
^CC    Coordinating conjunction                  ^PRPS  Possessive pronoun
^CD    Cardinal number                           ^RB    Adverb
^DT    Determiner                                ^RBR   Adverb, comparative
^EX    Existential 'there'                       ^RBS   Adverb, superlative
^FW    Foreign word                              ^RP    Particle
^IN    Preposition or subordinating conjunction  ^TO    'to'
^JJ    Adjective                                 ^UH    Interjection
^JJR   Adjective, comparative                    ^VB    Verb, base form
^JJS   Adjective, superlative                    ^VBD   Verb, past tense
^MD    Modal                                     ^VBG   Verb, gerund or present participle
^NN    Noun, singular or mass                    ^VBN   Verb, past participle
^NNS   Noun, plural                              ^VBP   Verb, non-3rd person singular present
^NNP   Proper noun, singular                     ^VBZ   Verb, 3rd person singular present
^NNPS  Proper noun, plural                       ^WDT   Wh-determiner
^PDT   Predeterminer                             ^WP    Wh-pronoun
^POS   Possessive ending                         ^WPS   Possessive wh-pronoun
^PRP   Personal pronoun                          ^WRB   Wh-adverb

4. A maximum of 10,000 concordance lines can be listed.
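
The query syntax in remarks 1 and 2 can be illustrated with a small parser. This is only a sketch of how such a search string might be validated and split into units, not the search engine's actual implementation. The names `Unit` and `parse_query` are hypothetical. It also assumes that a skip such as ^2 does not count toward the 5-unit limit, that consecutive skips add up, and that a hyphen inside a word (e.g. high-impact) is allowed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

MAX_UNITS = 5  # limit stated in the remarks

# Letters and hyphens only; an optional leading/trailing '-' marks a partial search.
WORD = r"-?[A-Za-z]+(?:-[A-Za-z]+)*-?"
UNIT_RE = re.compile(rf"(?P<word>{WORD})?(?:\^(?P<pos>[A-Za-z]+-?))?")
SKIP_RE = re.compile(r"\^(?P<n>[0-9]+)")


@dataclass
class Unit:
    word: Optional[str] = None  # query word, lowercased, partial-search hyphens removed
    open_start: bool = False    # leading '-' was given (e.g. -ment)
    open_end: bool = False      # trailing '-' was given (e.g. advan-)
    pos: Optional[str] = None   # POS tag, uppercased, without the caret
    pos_prefix: bool = False    # trailing '-' on the tag (e.g. ^NN-)
    skip_before: int = 0        # words to skip between the previous unit and this one


def parse_query(query: str) -> List[Unit]:
    """Split a search string into units, raising ValueError on invalid input."""
    tokens = query.split()
    units: List[Unit] = []
    pending_skip = 0
    for i, tok in enumerate(tokens):
        skip = SKIP_RE.fullmatch(tok)
        if skip:
            # A skip is only meaningful in the middle of the search string.
            if not units or i == len(tokens) - 1:
                raise ValueError("a skip such as ^2 must sit between two units")
            pending_skip += int(skip.group("n"))  # assumption: consecutive skips add up
            continue
        m = UNIT_RE.fullmatch(tok)
        if m is None:
            raise ValueError(f"invalid unit: {tok!r}")
        word, pos = m.group("word"), m.group("pos")
        unit = Unit(skip_before=pending_skip)
        pending_skip = 0
        if word:
            unit.open_start = word.startswith("-")
            unit.open_end = word.endswith("-")
            unit.word = word.strip("-").lower()  # queries are case-insensitive
        if pos:
            unit.pos_prefix = pos.endswith("-")
            unit.pos = pos.rstrip("-").upper()
        units.append(unit)
    if not units:
        raise ValueError("empty query")
    if len(units) > MAX_UNITS:  # assumption: skips do not count as units
        raise ValueError(f"at most {MAX_UNITS} units are allowed")
    return units


if __name__ == "__main__":
    for q in ["book^nn", "book^vb ^dt", "book ^2 advance", "-fine-", "^NN-"]:
        print(q, "->", parse_query(q))
```

Under these assumptions, `book ^2 advance` yields two units, the second carrying `skip_before=2`, while a string containing punctuation such as `book!` is rejected with a `ValueError`.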

* Source: Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Retrieved December 20, 2016 from http://repository.upenn.edu/cgi/viewcontent.cgi?article=1603&context=cis_reports
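
The tail partial search on POS tags (e.g. ^NN-) amounts to a prefix match over the tag list in remark 3. The sketch below shows one way this expansion could work; `PAGE_TAGS` simply copies the tags listed on this page, and `expand_tag` is a hypothetical helper, not part of the search tool.

```python
from typing import List

# Tags in the order they are listed in remark 3 (left column, then right column).
PAGE_TAGS = [
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "MD",
    "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP",
    "PRPS", "RB", "RBR", "RBS", "RP", "TO", "UH", "VB", "VBD", "VBG",
    "VBN", "VBP", "VBZ", "WDT", "WP", "WPS", "WRB",
]


def expand_tag(pattern: str) -> List[str]:
    """Return the tags matched by a pattern such as '^NN' or '^NN-'."""
    tag = pattern.lstrip("^").upper()  # tags are treated as case-insensitive here
    if tag.endswith("-"):
        prefix = tag[:-1]
        return [t for t in PAGE_TAGS if t.startswith(prefix)]
    return [tag] if tag in PAGE_TAGS else []


if __name__ == "__main__":
    print(expand_tag("^NN-"))
    print(expand_tag("^vb-"))
```

With the list above, `^NN-` expands to NN, NNS, NNP and NNPS, which matches the example given in remark 2.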

The work to compile the CJA 2014 was substantially supported by funding from the Departmental Teaching and Learning Grant, Department of English (Project No. 88BY).
