Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-01 08:27:44 +03:00)
List created by taking the top 2,000 words from a Wikipedia dump and removing everything that wasn't hiragana. I tried going through the kanji words to decide what to keep, but there were too many obvious non-stopwords (東京 was in the top 500) and many other words where it wasn't clear whether they should be included.

| Name |
|---|
| __init__.py |
| examples.py |
| stop_words.py |
| tag_map.py |
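
The filtering step described in the commit message can be sketched roughly as below. This is an illustration only, not the script that was actually used: the function name `hiragana_candidates`, the pre-tokenized input, the reading of "top" as "most frequent", and the exact hiragana range (U+3041 to U+3096) are all assumptions.

```python
import re
from collections import Counter

# Assumption: "hiragana" means the basic hiragana letters U+3041..U+3096.
# Iteration marks and the prolonged sound mark are deliberately excluded.
HIRAGANA_ONLY = re.compile(r"[\u3041-\u3096]+")


def hiragana_candidates(tokens, top_n=2000):
    """Return the top_n most frequent tokens that consist only of hiragana.

    `tokens` is assumed to be an iterable of already-segmented words,
    e.g. produced by running a tokenizer over a Wikipedia dump.
    """
    counts = Counter(tokens)
    # most_common() orders by descending count; ties keep first-seen order.
    top_words = [word for word, _ in counts.most_common(top_n)]
    # Keep a word only if every character falls in the hiragana range.
    return [word for word in top_words if HIRAGANA_ONLY.fullmatch(word)]


if __name__ == "__main__":
    sample = ["の", "の", "の", "東京", "東京", "これ", "これ", "は", "カタカナ"]
    print(hiragana_candidates(sample))
```

With this filter, a frequent kanji word such as 東京 is dropped automatically, which matches the motivation given in the commit message. The resulting list would still need manual review before being used as a stop word list.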