spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-05 20:31:30 +03:00

Sofie Van Landeghem 2d249a9502 KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
* fix overflow error on windows
* more documentation & logging fixes
* md fix
* 3 different limit parameters to play with execution time
* bug fixes directory locations
* small fixes
* exclude dev test articles from prior probabilities stats
* small fixes
* filtering wikidata entities, removing numeric and meta items
* adding aliases from wikidata also to the KB
* fix adding WD aliases
* adding also new aliases to previously added entities
* fixing comma's
* small doc fixes
* adding subclassof filtering
* append alias functionality in KB
* prevent appending the same entity-alias pair
* fix for appending WD aliases
* remove date filter
* remove unnecessary import
* small corrections and reformatting
* remove WD aliases for now (too slow)
* removing numeric entities from training and evaluation
* small fixes
* shortcut during prediction if there is only one candidate
* add counts and fscore logging, remove FP NER from evaluation
* fix entity_linker.predict to take docs instead of single sentences
* remove enumeration sentences from the WP dataset
* entity_linker.update to process full doc instead of single sentence
* spelling corrections and dump locations in readme
* NLP IO fix
* reading KB is unnecessary at the end of the pipeline
* small logging fix
* remove empty files
cli	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
data	Make spacy/data a package	2017-03-18 20:04:22 +01:00
displacy	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
lang	Initial commit: New language Luxembourgish (lb) (#4424)	2019-10-14 12:27:50 +02:00
matcher	Fix PhraseMatcher.remove for overlapping patterns (#4437)	2019-10-14 12:19:51 +02:00
pipeline	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
syntax	Ensure the NER remains consistent after resizing (#4330)	2019-09-27 20:57:13 +02:00
tests	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
tokens	Bugfix initializing DocBin with attributes (#4368)	2019-10-03 14:48:45 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Add registry for model creation functions ('architectures') (#4395)	2019-10-08 12:21:03 +02:00
__main__.py	Update __main__.py	2019-03-20 09:43:26 +01:00
_align.pyx	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
_ml.py	Improve spacy pretrain (#4393)	2019-10-07 23:34:58 +02:00
about.py	Set version to v2.2.1	2019-10-03 14:50:39 +02:00
attrs.pxd	Fix attrs alignment	2019-07-12 17:59:47 +02:00
attrs.pyx	Bugfix initializing DocBin with attributes (#4368)	2019-10-03 14:48:45 +02:00
compat.py	Improve usage of pkg_resources and handling of entry points (#4387)	2019-10-07 17:22:09 +02:00
errors.py	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
glossary.py	Include Norwegian NER entity types in glossary [ci skip]	2019-09-15 17:16:21 +02:00
gold.pxd	Merge changes from master	2019-08-21 14:18:52 +02:00
gold.pyx	Fix orth replacement	2019-09-19 00:03:24 +02:00
kb.pxd	rename entity frequency	2019-07-19 17:40:28 +02:00
kb.pyx	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
language.py	KB extensions and better parsing of WikiData (#4375)	2019-10-14 12:28:53 +02:00
lemmatizer.py	Refactor lemmatizer and data table integration (#4353)	2019-10-01 21:36:03 +02:00
lexeme.pxd	💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)	2019-02-24 21:13:51 +01:00
lexeme.pyx	Alphanumeric -> alphabetic [ci skip]	2019-10-06 13:30:01 +02:00
lookups.py	Refactor lemmatizer and data table integration (#4353)	2019-10-01 21:36:03 +02:00
morphology.pxd	annotate kb_id through ents in doc	2019-03-22 11:36:44 +01:00
morphology.pyx	Improve Morphology errors (#4314)	2019-09-21 14:37:06 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
scorer.py	Make except more explicit	2019-09-18 19:57:08 +02:00
strings.pxd	Try to fix StringStore clean up (see #1506)	2017-11-11 03:11:27 +03:00
strings.pyx	Merge branch 'master' into feature/lemmatizer	2019-03-16 13:44:22 +01:00
structs.pxd	Merge changes from master	2019-08-21 14:18:52 +02:00
symbols.pxd	Fix symbol alignment	2019-07-12 17:48:38 +02:00
symbols.pyx	ensure Span.as_doc keeps the entity links + unit test	2019-06-25 15:28:51 +02:00
tokenizer.pxd	Flush tokenizer cache when necessary (#4258)	2019-09-08 20:52:46 +02:00
tokenizer.pyx	Improve URL_PATTERN and handling in tokenizer (#4374)	2019-10-05 13:00:09 +02:00
typedefs.pxd	Work on changing StringStore to return hashes.	2017-05-28 12:36:27 +02:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Fix util.filter_spans() to prefer first span in overlapping sam… (#4414)	2019-10-10 17:00:03 +02:00
vectors.pyx	Consider batch_size when sorting similar vectors (#4388)	2019-10-07 13:38:35 +02:00
vocab.pxd	💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167)	2019-08-22 14:21:32 +02:00
vocab.pyx	most_similar() return the k most similar vectors (#4364)	2019-10-03 14:09:44 +02:00