spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-10 03:34:21 +03:00

Paul O'Leary McCann 6e9e686568 Sample implementation of Japanese Tagger (ref #1214)		2017-08-08 01:27:15 +09:00

This is far from complete, but it should be enough to check some things.

1. MeCab transition. Janome doesn't support UniDic, only IPAdic, but the UD tag mappings are based on UniDic. This replaces Janome with MeCab to get around that.
2. Raw tag extension. A simple tag map can't meet the specifications for the UD tag mappings, so this adds an extra field to the tag in ambiguous cases. For this demo it only deals with the simplest case, which only needs to look at the literal token. (In reality it may be necessary to look at the whole sentence, but that's another issue.)
3. General code structure. It seems nobody else has implemented a custom Tagger yet, so I'm still not sure this is the correct way to pass the vocabulary around, for example.

Any feedback would be greatly appreciated. -POLM
__init__.py	Add basic Japanese tokenizer test	2017-06-28 01:24:25 +09:00
test_tagger.py	Sample implementation of Japanese Tagger (ref #1214)	2017-08-08 01:27:15 +09:00
test_tokenizer.py	Parametrize and extend Japanese tokenizer tests	2017-06-29 00:09:40 +09:00
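
The raw tag extension described in the commit message (point 2) can be illustrated with a minimal sketch. This is not the code from the commit: the names `TAG_MAP`, `AMBIGUOUS_TAG`, `resolve_tag` and `coarse_pos` are hypothetical, the tag strings and their UD values are illustrative assumptions, and only the simplest case (deciding from the literal token) is covered.

```python
import re

# Hypothetical tag map. Keys are raw UniDic-style tags; for the ambiguous
# tag, an extra trailing field selects the intended UD part of speech.
# The values here are illustrative assumptions, not a verified UD mapping.
TAG_MAP = {
    "名詞,普通名詞,一般,*": "NOUN",
    "連体詞,*,*,*,DET": "DET",
    "連体詞,*,*,*,ADJ": "ADJ",
}

# Raw tag assumed to be ambiguous between DET and ADJ.
AMBIGUOUS_TAG = "連体詞,*,*,*"


def resolve_tag(surface, raw_tag):
    """Append an extra field to an ambiguous raw tag, using only the token text."""
    if raw_tag != AMBIGUOUS_TAG:
        return raw_tag
    # Assumption: demonstratives such as この / その / あの / どの are determiners.
    if re.match("[こそあど]の", surface):
        return raw_tag + ",DET"
    return raw_tag + ",ADJ"


def coarse_pos(surface, raw_tag):
    """Look up the coarse tag after extending the raw tag where needed."""
    return TAG_MAP[resolve_tag(surface, raw_tag)]
```

With this sketch, `coarse_pos("この", "連体詞,*,*,*")` resolves through the extended key `連体詞,*,*,*,DET`, while unambiguous tags are looked up unchanged. A case that needs the whole sentence, as the commit message notes, would require more context than this function receives.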