Part-of-speech tag search tutorial
Key features of the part-of-speech search engine
1. Support a search string of up to four query parameters.
What is a query parameter?
A query parameter is a unit of text that consists of a word and/or a part-of-speech tag. Parameters are separated from one another by a space. For example, the following query string contains four query parameters:
word1 word2 word3 word4
A parameter contains two parts: a word and a part-of-speech (POS) tag.
For example, cat^nn is one parameter: cat is the word and nn is the POS tag. The caret ^ separates the word from the POS tag.
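The word^tag format described above can be illustrated with a short Python sketch. This is not the search engine's own code; the names QueryParam, parse_query and MAX_PARAMS are hypothetical, and treating a token such as ^nn as a tag-only parameter is an assumption.

```python
from typing import List, NamedTuple, Optional

MAX_PARAMS = 4  # the tutorial states a limit of four query parameters


class QueryParam(NamedTuple):
    word: Optional[str]  # None when the parameter has no word part
    tag: Optional[str]   # None when the parameter has no POS tag


def parse_query(query: str) -> List[QueryParam]:
    """Split a query string on spaces, then split each parameter at the caret."""
    params = []
    for token in query.split():
        word, _, tag = token.partition("^")
        # Lower-case both parts because the search is case-insensitive.
        params.append(QueryParam(word.lower() or None, tag.lower() or None))
    if len(params) > MAX_PARAMS:
        raise ValueError(f"at most {MAX_PARAMS} query parameters are supported")
    return params


parsed = parse_query("cat^nn sat")
```

Here parsed holds two parameters: the word cat with the tag nn, and the word sat with no tag.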
The Stanford Part-of-Speech tagger was used to tag the corpora; Penn Treebank project part-of-speech tags are used for query.
The search engine supports partial search. You can put a dash at the head and/or the tail of the query word or phrase, and at the tail of the part-of-speech tag, to match words or part-of-speech tags that fit the pattern. For example, a query word written with a leading dash, -ment,
will list all words that end with ment (e.g. development, advancement).
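One way to picture the dash patterns is to translate them into regular expressions. The Python sketch below is only an illustration under stated assumptions: the function name pattern_to_regex is hypothetical, and it assumes that a dash stands for zero or more word characters, which the tutorial does not specify.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a dash pattern such as '-ment' or 'develop-' into a compiled regex."""
    open_head = pattern.startswith("-")  # dash at the head: anything may precede
    open_tail = pattern.endswith("-")    # dash at the tail: anything may follow
    core = pattern.strip("-")            # the literal part of the pattern
    body = (r"\w*" if open_head else "") + re.escape(core) + (r"\w*" if open_tail else "")
    # IGNORECASE mirrors the case-insensitive search described in the tutorial.
    return re.compile(body, re.IGNORECASE)


words = ["development", "advancement", "mention"]
ending_in_ment = [w for w in words if pattern_to_regex("-ment").fullmatch(w)]
```

With this sketch, ending_in_ment keeps development and advancement but not mention, because fullmatch requires the whole word to fit the pattern.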
The search engine is case-insensitive. Punctuation is not accepted.
To search for an n-gram, just type the words in the query parameter box and click query.
You can insert a skipgram parameter in the query string to perform a skipgram search. The skipgram parameter starts with a caret ^, followed by a one-digit number that specifies the number of words to skip. For example:
He ^3 work
matches He is going to work, because three words (is, going, to) lie between He and work.
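The skipgram search can be sketched in Python as follows. This is a minimal illustration, not the engine's implementation: the function name find_skipgram is hypothetical, and it assumes that ^3 means exactly three skipped words, which is consistent with the example but not stated outright.

```python
from typing import List


def find_skipgram(tokens: List[str], first: str, skip: int, last: str) -> List[int]:
    """Return the start indices where `first` is followed by `last`
    with exactly `skip` words in between (case-insensitive)."""
    lowered = [t.lower() for t in tokens]
    first, last = first.lower(), last.lower()
    gap = skip + 1  # distance between the two matched words
    return [
        i
        for i in range(len(lowered) - gap)
        if lowered[i] == first and lowered[i + gap] == last
    ]


tokens = "He is going to work".split()
hits = find_skipgram(tokens, "He", 3, "work")
```

For the query He ^3 work, hits contains the single index 0, the position of He in the sentence.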
How to use:
On the corpus page, type in the query parameters and select the number of instances to be listed. Click the 'Click here to query' button to start the search.
The top left of the search result list shows the total number of matched occurrences in the corpus. If the list is truncated to the maximum number of instances allowed by the system or specified in the query, a reminder 'Showing XX randomised instances' is displayed.
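The truncation behaviour described above can be pictured with a small Python sketch. It is an assumption-laden illustration only: the function name select_instances and its seed parameter are hypothetical, and the engine's actual sampling method is not documented here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def select_instances(
    matches: Sequence[str], limit: int, seed: Optional[int] = None
) -> Tuple[List[str], int, bool]:
    """Return (instances to show, total match count, whether the list was truncated)."""
    total = len(matches)
    if total <= limit:
        return list(matches), total, False
    rng = random.Random(seed)
    # Draw `limit` distinct instances at random, as in 'Showing XX randomised instances'.
    return rng.sample(list(matches), limit), total, True


all_matches = [f"concordance line {i}" for i in range(10)]
shown, total, truncated = select_instances(all_matches, limit=3, seed=0)
```

In this example total is 10, shown holds 3 randomly chosen lines, and truncated is True, which is the case in which the reminder would be displayed.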
If the number of instances set in the query option is smaller than the number of matches in the corpus, a link appears on the top right side of the query result.
You can click the link to show all instances.
Download the concordance list
On the top right side of the search result list, a blue download button appears after the full list is completely loaded. The downloaded Rich Text Format file retains the colours of the concordance list as first displayed. You can use Microsoft Word, or free software such as WordPad for Windows and CopyWriter, to open a Rich Text Format file.
Sort the concordance list
In the header of the concordance list, you can click a sorting option to sort the list. The Rich Text Format file retains the original concordance list and is not affected by sorting.
Show the full description of a part-of-speech tag
You can place the mouse cursor on any part-of-speech tag to show the full description of the tag. A list of all part-of-speech tags used is attached to the end of the concordance list.
Show the expanded section of a concordance line
You can click on the first query word (i.e. the word underlined and in blue) to load the expanded section of the concordance line.
The expanded section contains the original text and the part-of-speech tagged text. The first query word is highlighted in blue.