ConcGramCore Frequently Asked Questions

This FAQ page provides answers to some common questions.


Q.

What are the licence terms?

A.

ConcGramCore is free software released under the MIT License. The full licence text is as follows:

Copyright 2018 RCPCE, Department of English, The Hong Kong Polytechnic University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Q.

What hardware do I need to run ConcGramCore?

A.

ConcGramCore runs best on a computer with 64-bit Microsoft Windows, 4GB of RAM and 250GB of free hard disk space. A computer bought within the last three years is probably good enough to run the program, and older computers should also work: the code has been tested on a computer with 2GB of memory and a 32GB hard drive.

The SQL engine swaps data to the hard drive only after it has exhausted the computer's memory.
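Before processing a large corpus, you may want to confirm that the drive has the recommended free space. The Python sketch below uses the standard library's `shutil.disk_usage` to do this. It is not part of ConcGramCore: the function names are hypothetical, 1GB is assumed to mean 10**9 bytes, and it checks only disk space, not RAM or the Windows edition.

```python
import shutil

# Recommended free space from this FAQ; 1 GB is assumed to be 10**9 bytes.
RECOMMENDED_FREE_GB = 250


def free_space_gb(path: str = ".") -> float:
    """Return the free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 10**9


def meets_disk_recommendation(path: str = ".", required_gb: float = RECOMMENDED_FREE_GB) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_space_gb(".")
    status = "meets" if meets_disk_recommendation(".") else "is below"
    print(f"{free:.1f} GB free: {status} the {RECOMMENDED_FREE_GB} GB recommendation")
```

Run it from the folder where ConcGramCore writes its working files, since the check applies to the drive holding the given path.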

Q.

Why does the program use so much memory and hard drive space?

A.

Concgrams capture the positional variations of all word combinations within a predetermined word span. The larger the word span and the corpus, the more combinations there are, and the total can reach billions.
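To give a sense of the scale, the Python sketch below counts the sets of k token positions whose first and last words are at most `span` tokens apart. It is a rough illustration under stated assumptions, not ConcGramCore's actual counting: a candidate combination is treated simply as a set of token positions, and the function name and the span of 12 tokens in the example are hypothetical.

```python
from math import comb


def count_candidate_combinations(n_tokens: int, span: int, size: int) -> int:
    """Count sets of `size` token positions whose first and last positions are at most `span` apart.

    Each set is counted once, via its leftmost position i: the other size-1 positions
    are chosen from the min(span, n_tokens - 1 - i) tokens to the right of i.
    """
    if size < 2:
        raise ValueError("a concgram has at least two words")
    if span < 1 or n_tokens < 0:
        raise ValueError("span must be positive and n_tokens non-negative")
    # Positions with a full window of `span` tokens to their right each contribute comb(span, size - 1).
    full_windows = max(0, n_tokens - span)
    # The last m = min(span, n_tokens) positions have 0, 1, ..., m-1 tokens to their right;
    # summing comb(j, size - 1) over j = 0..m-1 gives comb(m, size) (hockey-stick identity).
    tail = comb(min(span, n_tokens), size)
    return full_windows * comb(span, size - 1) + tail


if __name__ == "__main__":
    # Hypothetical corpus sizes with a hypothetical span of 12 tokens.
    print(count_candidate_combinations(1_000_000, 12, 2))   # 11999922
    print(count_candidate_combinations(10_000_000, 12, 5))  # 4949994852
```

Under these assumptions, five-word combinations alone in a ten-million-token corpus come to nearly five billion, before any positional variants are stored.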

Q.

Why does the program run fast on a small corpus but slowly on a large one?

A.

The SQLite engine stores temporary data in the computer's memory. When the memory is exhausted, it swaps the data to the hard drive. A larger corpus is more likely to exhaust the memory and force this swapping, and because a hard drive is much slower than memory, the program slows down noticeably on larger corpora.

Although processing slows down once the program swaps data to the hard drive, the program is unlikely to crash. Just leave it running, and it will produce the results when processing finishes.
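ConcGramCore's own SQLite settings are not documented in this FAQ, so the following Python sketch only illustrates, outside the program, the two SQLite settings behind this behaviour. It uses the standard `sqlite3` module; the helper name is hypothetical. `cache_size` caps how much memory a connection uses for its page cache, beyond which SQLite has to write pages out to disk, and `temp_store` chooses whether temporary tables and indices are kept in memory or in temporary files.

```python
import sqlite3


def open_tuned_connection(path: str, cache_kib: int, temp_in_memory: bool) -> sqlite3.Connection:
    """Open an SQLite connection with an explicit page-cache size and temp-storage mode."""
    conn = sqlite3.connect(path)
    # A negative cache_size is interpreted by SQLite as a size in KiB rather than a page count.
    conn.execute(f"PRAGMA cache_size = -{int(cache_kib)}")
    # MEMORY keeps temporary tables and indices in RAM; FILE puts them in temporary files.
    conn.execute(f"PRAGMA temp_store = {'MEMORY' if temp_in_memory else 'FILE'}")
    return conn


if __name__ == "__main__":
    conn = open_tuned_connection(":memory:", cache_kib=65536, temp_in_memory=False)
    print(conn.execute("PRAGMA cache_size").fetchone()[0])  # -65536
    print(conn.execute("PRAGMA temp_store").fetchone()[0])  # 1, meaning FILE
    conn.close()
```

A larger cache delays the point at which SQLite starts using the disk, but it cannot exceed the physical memory available, which is why a large corpus eventually falls back to the slower hard drive.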

Q.

How large a corpus can the program handle?

A.

It depends on the corpus size, the word span, the size of the concgrams and whether the program is running on 64-bit Windows. I expect your hard drive to run out of space before the segmentation process exhausts the available memory. The program should handle a corpus of several million words without problems if there is enough free space on the hard drive (note: I have not tested the upper limit).

Q.

I selected the Stanford POS tagger for segmentation, but the program output is an empty file. Why?

A.

The Stanford POS tagger has its own memory requirements. If your computer runs 32-bit Windows or has less than 4GB of memory installed, the tagger may well hit its memory limit and crash. If the default simple segmentation method produces normal output on the same corpus, you have probably hit the Stanford POS tagger's memory limit.

Q.

Do you recommend running the program on a solid-state drive (SSD)?

A.

Yes, an SSD is highly recommended if you are working with a corpus of several million words. An SSD can make the search roughly ten times faster than a conventional hard disk drive.


Last updated on 19 June 2018