Mirror of https://github.com/explosion/spaCy.git, synced 2024-11-14 13:47:13 +03:00
61ef0739b8
List created by taking the top 2,000 words from a Wikipedia dump and removing everything that wasn't hiragana. I tried going through the kanji words and deciding which to keep, but there were too many obvious non-stopwords (東京 was in the top 500) and many other words where it wasn't clear whether they should be included.
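The filtering step described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the script actually used for this commit: it assumes the Wikipedia dump has already been tokenized into words, and the names `is_hiragana` and `hiragana_stop_word_candidates` are hypothetical. Here a word counts as "hiragana" only if every character falls in the Unicode Hiragana block (U+3041 to U+309F), so kanji words such as 東京 and katakana words are both dropped.

```python
from collections import Counter
from typing import Iterable, List

# Hiragana Unicode block is U+3040 to U+309F; U+3040 itself is unassigned.
HIRAGANA_START = 0x3041
HIRAGANA_END = 0x309F


def is_hiragana(word: str) -> bool:
    """Return True if the word is non-empty and every character is hiragana."""
    return bool(word) and all(HIRAGANA_START <= ord(ch) <= HIRAGANA_END for ch in word)


def hiragana_stop_word_candidates(tokens: Iterable[str], top_n: int = 2000) -> List[str]:
    """Hypothetical helper: keep only all-hiragana words among the top_n most frequent.

    Assumes `tokens` is an already-tokenized corpus (tokenization not shown).
    """
    counts = Counter(tokens)
    # most_common returns words in descending frequency order.
    top_words = [word for word, _ in counts.most_common(top_n)]
    return [word for word in top_words if is_hiragana(word)]


if __name__ == "__main__":
    # Toy corpus: 東京 (kanji) and カメラ (katakana) should be filtered out.
    sample = ["の", "に", "東京", "の", "は", "東京", "する", "カメラ", "の"]
    print(hiragana_stop_word_candidates(sample, top_n=5))
```

A looser rule (for example, also allowing the long-vowel mark ー, U+30FC) would change which words survive; the commit message does not specify the exact character test used.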
Files:

- `__init__.py`
- `examples.py`
- `stop_words.py`
- `tag_map.py`