spaCy/bin
Wolfgang Seeker eae35e9b27 add tokenizer files for German, add/change code to train German pos tagger
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/

- init_model.py
	- change doc freq threshold to 0
- add train_german_tagger.py
	- expects conll09-formatted input
2016-02-18 13:24:20 +01:00
parser * Fancy up the CLI for the conll train script 2016-02-02 22:58:06 +01:00
tagger add tokenizer files for German, add/change code to train German pos tagger 2016-02-18 13:24:20 +01:00
cythonize.py fix cythonize 2016-02-05 16:17:23 +01:00
gather_freqs.py * Fix gather_freqs.py 2016-02-04 20:21:58 +01:00
get_freqs.py * Import io in get_freqs 2015-10-19 12:56:18 +11:00
init_model.py add tokenizer files for German, add/change code to train German pos tagger 2016-02-18 13:24:20 +01:00
munge_ewtb.py * Upd munge_ewtb for the new json format 2015-06-06 02:10:33 +02:00
ner_tag.py caught more codecs.open -> io.open 2015-09-30 20:20:09 +02:00
prepare_treebank.py caught more codecs.open -> io.open 2015-09-30 20:20:09 +02:00
prepare_vecs.py Remove trailing whitespace 2015-04-19 13:01:38 -07:00