A language character counting tool

According to the latest Utopian rule, any contribution in 'translation' category must translate over a number of words. If the translator works in Crowdin, it is easy to have the word count number. However, if the translator works on Github project directly, it is a bit difficult to count the characters in a particular language as for translation work, ususlly multiple language characters exist in the same file. I have implemented a tool to do this job. It is written in Python and has been tested on Ubuntu 16.

Image source: pixabay.com

Implementation

The basic idea is to analysis the text and check each character against the unicode values for each language. In principle, the script works with any language - just edit the configuration file. Also, to make it handy for both translators and moderators, the tool support counting for both individual files and all files contained in a folder.

Test

I have written a couple of test to validate if the tool works and all tests pass.

$ python test.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

How to use

First, clone this repository to your PC.

Then modify the first line of wordcounter.py to get your python folder right:

#!/home/yuxi/environments/myenv/bin/python

To count individual file, run:
/YOUR_FOLDER/wordcounter.py FILENAME locale

To count all files within a folder, run:
/YOUR_FOLDER/wordcounter.py FOLDER locale

For example, if samples/1.yml has the following content:

zh-CN:
  File: 文件
  Edit：编辑
  Help：帮助

Then run command:

./wordcounter.py samples/1.yml   zh-CN

It returns 6

To run the following command to count Chinese characters within a folder:

./wordcounter.py samples/   zh-CN

It returns:

The tool is available here: https://github.com/yuxir/wordcounter

To prove it is the work I have done, I have changed the README in github repository, e.g. put my steemit URL in:

Posted on Utopian.io - Rewarding Open Source Contributors