* force extensions to avoid clash between example scripts
* fix arg order and default file encoding
* add example config for conllu script
* newline
* move extension definitions to main function
* few more encodings fixes
* Update imports for `bin/`
* Add all currently supported languages
* Update subtok merger for new Matcher validation
* Modify blinded check to look at tokens instead of lemmas (for corpora
with tokens but not lemmas like Telugu)