spaCy/spacy/cli
Sofie Van Landeghem 2d249a9502 KB extensions and better parsing of WikiData (#4375)
* fix overflow error on windows

* more documentation & logging fixes

* md fix

* 3 different limit parameters to play with execution time

* bug fixes directory locations

* small fixes

* exclude dev test articles from prior probabilities stats

* small fixes

* filtering wikidata entities, removing numeric and meta items

* adding aliases from wikidata also to the KB

* fix adding WD aliases

* adding also new aliases to previously added entities

* fixing comma's

* small doc fixes

* adding subclassof filtering

* append alias functionality in KB

* prevent appending the same entity-alias pair

* fix for appending WD aliases

* remove date filter

* remove unnecessary import

* small corrections and reformatting

* remove WD aliases for now (too slow)

* removing numeric entities from training and evaluation

* small fixes

* shortcut during prediction if there is only one candidate

* add counts and fscore logging, remove FP NER from evaluation

* fix entity_linker.predict to take docs instead of single sentences

* remove enumeration sentences from the WP dataset

* entity_linker.update to process full doc instead of single sentence

* spelling corrections and dump locations in readme

* NLP IO fix

* reading KB is unnecessary at the end of the pipeline

* small logging fix

* remove empty files
2019-10-14 12:28:53 +02:00
converters Fix ner_jsonl2json converter (fix #4389) (#4394) 2019-10-08 00:52:45 +02:00
__init__.py Move UD scripts to bin 2019-03-20 01:19:34 +01:00
_schemas.py Store JSON schemas in Python and tidy up (#3235) 2019-02-07 19:44:31 +11:00
convert.py Tidy up and auto-format [ci skip] 2019-08-31 13:39:06 +02:00
debug_data.py Initialize low data warning for debug-data parser (#4331) 2019-09-27 20:56:49 +02:00
download.py Improve usage of pkg_resources and handling of entry points (#4387) 2019-10-07 17:22:09 +02:00
evaluate.py Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
info.py Small CLI improvements (#3030) 2018-12-08 11:49:43 +01:00
init_model.py Support model name in init-model 2019-09-26 03:01:32 +02:00
link.py Small CLI improvements (#3030) 2018-12-08 11:49:43 +01:00
package.py Also support "requirements" in model.json 2019-07-27 13:34:57 +02:00
pretrain.py KB extensions and better parsing of WikiData (#4375) 2019-10-14 12:28:53 +02:00
profile.py pulling tqdm imports in functions to avoid bug (tmp fix) (#4263) 2019-09-09 16:32:11 +02:00
train.py Use consistent spelling 2019-10-02 10:37:39 +02:00
validate.py Improve usage of pkg_resources and handling of entry points (#4387) 2019-10-07 17:22:09 +02:00