* Switch from mecab-python3 to fugashi
mecab-python3 has been the best MeCab binding for a long time, but it's
not very actively maintained, and since it's based on old SWIG code
distributed with MeCab, there's a limit to how effectively it can be
maintained.
Fugashi is a new Cython-based MeCab wrapper I wrote. Since it's not
based on the old SWIG code it's easier to keep it current and make small
deviations from the MeCab C/C++ API where that makes sense.
* Change mecab-python3 to fugashi in setup.cfg
* Change "mecab tags" to "unidic tags"
The tags come from MeCab, but the tag schema is specified by UniDic, so
it's more accurate to refer to them that way.
* Update conftest
* Add fugashi link to external deps list for Japanese
* document token ent_kb_id
* document span kb_id
* update pipeline documentation
* prior and context weights as bools instead
* entitylinker api documentation
* drop for both models
* finish entitylinker documentation
* small fixes
* documentation for KB
* candidate documentation
* links to api pages in code
* small fix
* frequency examples as counts for consistency
* consistent documentation about tensors returned by predict
* add entity linking to usage 101
* add entity linking infobox and KB section to 101
* entity-linking in linguistic features
* small typo corrections
* training example and docs for entity_linker
* predefined nlp and kb
* revert back to similarity encodings for simplicity (for now)
* set prior probabilities to 0 when excluded
* code clean up
* bugfix: deleting kb ID from tokens when entities were removed
* refactor train el example to use either model or vocab
* pretrain_kb example for example kb generation
* add to training docs for KB + EL example scripts
* small fixes
* error numbering
* ensure the language of vocab and nlp stay consistent across serialization
* equality with =
* avoid conflict in errors file
* add error 151
* final adjustments to the train scripts - consistency
* update of goldparse documentation
* small corrections
* push commit
* typo fix
* add candidate API to kb documentation
* update API sidebar with EntityLinker and KnowledgeBase
* remove EL from 101 docs
* remove entity linker from 101 pipelines / rephrase
* custom el model instead of existing model
* set version to 2.2 for EL functionality
* update documentation for 2 CLI scripts
* Added RONEC to spaCy Universe
* Added contributor file
* Corrected date in .github/contributors/avramandrei.md
* Convert tabs to spaces
* Remove duplicate keys
Can only have one GitHub link unfortunately
* Also add models category
* Adjust ID
This is used to generate the URL, so a simpler string is better
* Add entry for Blackstone in universe.json
Add an entry for the Blackstone project. Checked JSON is valid.
* Create ICLRandD.md
* Fix indentation (tabs to spaces)
It looks like the spaces in the JSON file were automatically converted to tabs during validation. This caused the diff to show *everything* as changed, which is obviously not true. This hopefully fixes that.
* Try to fix formatting for diff
* Fix diff
Co-authored-by: Ines Montani <ines@ines.io>
* Typo fix for AllenAI URL
Changed incorrect home page URL for AllenAI from appenai.org to allenai.org
* Sign contributor agreement
* Change date format