15677 Commits

Author SHA1 Message Date
AJ Rader
2f3648700c Correction of default lemmatizer lookup in English (Issue # 4104) (#4110)
* pytest file for issue4104 established

* edited default lookup english lemmatizer for spun; fixes issue 4102

* eliminated parameterization and sorted dictionary dependency in issue 4104 test

* added contributor agreement
2019-08-15 11:39:10 +02:00
Ines Montani
1711b5eb62 💫 Support displaCy user colors via entry point (#4113) 2019-08-13 15:59:55 +02:00
Sofie Van Landeghem
0ba1b5eebc CLI scripts for entity linking (wikipedia & generic) (#4091)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bools instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustments to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* turn kb_creator into CLI script (wip)

* proper parameters for training entity vectors

* wikidata pipeline split up into two executable scripts

* remove context_width

* move wikidata scripts in bin directory, remove old dummy script

* refine KB script with logs and preprocessing options

* small edits

* small improvements to logging of EL CLI script
2019-08-13 15:38:59 +02:00
Ines Montani
5196dbd89d Delete wip.yml [ci skip] 2019-08-13 13:31:21 +02:00
Ines Montani
35c865024b Fix file name [ci skip] 2019-08-12 18:39:54 +02:00
Ines Montani
3a39154804 Create wip.yaml [ci skip] 2019-08-12 17:26:31 +02:00
黎谢鹏
250a54414b update lang/zh (#4103)
* update lang/zh

* update lang/zh
2019-08-12 10:37:48 +02:00
Ines Montani
f653e1bbea Merge branch 'master' into spacy.io 2019-08-11 11:14:10 +02:00
Ines Montani
1362f793cf Improve docs on phrase pattern attributes (closes #4100) [ci skip] 2019-08-11 11:13:49 +02:00
Ines Montani
0df13a829c Merge branch 'master' into spacy.io 2019-08-09 17:42:46 +02:00
Ines Montani
1f4d8bf77e Update universe.json [ci skip] 2019-08-09 17:42:37 +02:00
Ines Montani
c1bd7094dc Merge branch 'master' into spacy.io 2019-08-09 17:22:27 +02:00
Ines Montani
f2516177dd Merge branch 'master' into spacy.io 2019-08-09 17:17:01 +02:00
ICLR&D
87e40b17a0 Add entry for Blackstone in universe.json (#4101)
* Add entry for Blackstone in universe.json

Add an entry for the Blackstone project. Checked JSON is valid.

* Create ICLRandD.md

* Fix indentation (tabs to spaces)

It looks like during validation, the JSON file automatically changed spaces to tabs. This caused the diff to show *everything* as changed, which is obviously not true. This hopefully fixes that.

* Try to fix formatting for diff

* Fix diff


Co-authored-by: Ines Montani <ines@ines.io>
2019-08-09 17:16:51 +02:00
Sofie Van Landeghem
963ea5e8d0 Update lemma and vector information after splitting a token (#4097)
* fixing vector and lemma attributes after retokenizer.split

* fixing unit test with mockup tensor

* xp instead of numpy
2019-08-08 15:09:44 +02:00
Ines Montani
dbde9cd0f2 Merge branch 'master' into spacy.io 2019-08-08 13:03:57 +02:00
Ines Montani
a2ac2e873f Update Binder version [ci skip] 2019-08-08 13:03:45 +02:00
Ines Montani
f623b579d2 Merge branch 'master' into spacy.io 2019-08-08 11:21:01 +02:00
Matthew Honnibal
04113a844d Set version to v2.1.8 2019-08-07 13:53:58 +02:00
Ines Montani
36ac044937 Update README.md [ci skip] 2019-08-07 13:38:59 +02:00
Ines Montani
3e60afacf9 Add Serbian to languages [ci skip] 2019-08-07 13:38:25 +02:00
Ines Montani
1dc28a9ecb Update Binder version [ci skip] 2019-08-07 13:38:12 +02:00
Ines Montani
6bec24cdd0 Require downloaded model in pkg_resources (#4090) 2019-08-07 13:18:11 +02:00
Ines Montani
95d63c74b4 Update site.json 2019-08-07 00:47:40 +02:00
Ines Montani
8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd
69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani
25a7a5fbdc Merge branch 'master' into spacy.io 2019-08-06 12:20:34 +02:00
Ines Montani
4ae320e5c2 Use consistent casing for entity ruler patterns (see #4063) [ci skip] 2019-08-06 12:20:22 +02:00
Ines Montani
f023175ca3 Merge branch 'master' into spacy.io 2019-08-06 12:13:53 +02:00
Ines Montani
223bde5cf6 Improve docs on matcher attributes [ci skip] (closes #4063) 2019-08-06 12:13:42 +02:00
Ines Montani
2bfae0b167 Auto-format 2019-08-06 12:13:31 +02:00
Jeno
15be09ceb0 Raise error if annotation dict in simple training style has unexpected keys #4074 (#4079)
* adding enhancement #4074.

* modified behavior to strictly require top level dictionary keys - issue #4074

* pass expected keys to error message and add links as expected top level key
2019-08-06 11:01:25 +02:00
Sofie Van Landeghem
ad09b0d6f3 fetch norm from lex if necessary for matching (#4080) 2019-08-05 23:51:04 +02:00
Ines Montani
7f3212e2f5 💫 Sync branches (#4084) [ci skip]
* Update from master

* Re-added Universe readme (#3688) (closes #3680)

* Fix typo

* Add version tag to `--base-model` argument (closes #3720)

* fixing regex matcher examples (#3708) (#3719)

* Improve Token.prob and Lexeme.prob docs (resolves #3701)

* Fix DependencyParser.predict docs (resolves #3561)

* Update languages.json


Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>
2019-08-05 14:32:54 +02:00
Ines Montani
84a00ce55e Merge branch 'master' into spacy.io 2019-08-05 14:30:22 +02:00
Ines Montani
0f740fad1a Update universe.json [ci skip] 2019-08-05 14:30:07 +02:00
Pavle Vidanović
e1a935d71c Stopwords for Serbian language. (#4078)
* Serbian stopwords added. (cyrillic alphabet)

* spaCy Contribution agreement included.

* Test initialize updated
2019-08-05 10:22:27 +02:00
Sebastian Jordan
878302a55d Fix typo in requirements section of pyproject.toml (#4081) 2019-08-05 10:21:14 +02:00
veer-bains
874bd8c8dd Fixed syntax error in lang/ko when using python 2 (#4082) (closes #4068)
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py

* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py

* Update __init__.py

* Create veer-bains.md

* Update __init__.py

fixed syntax errors in variable datatype assignment when calling spacy.blank("ko") with python 2.7
2019-08-05 10:19:32 +02:00
Ines Montani
87ddbdc33e Fix handling of kwargs in Language.evaluate
Makes it consistent with other methods
2019-08-04 13:44:21 +02:00
Muhammad Irfan
d1d30b0442 added missing punctuation following conventions. (#4066) 2019-08-04 13:41:18 +02:00
Ines Montani
0e680046ac Update languages.json 2019-08-02 21:44:26 +02:00
Anastassia
33b14724a5 Update gold corpus code to properly ingest a directory of jsonl… (#4067)
* Update gold corpus code to properly ingest a directory of jsonlines files

In response to: https://github.com/explosion/spaCy/issues/3975

* Update spacy/gold.pyx

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-02 09:58:51 +02:00
Ines Montani
dcad9a14c5 Merge branch 'master' into spacy.io 2019-08-01 18:37:20 +02:00
Ines Montani
0f76e0022d Update .tensor docs [ci skip] 2019-08-01 18:37:09 +02:00
Ines Montani
d8fcebf386 Merge branch 'master' into spacy.io 2019-08-01 18:33:23 +02:00
Ines Montani
3072eb28c2 Support and render Markdown in model meta [ci skip] 2019-08-01 18:33:10 +02:00
Matthew Honnibal
944a66c326 Add span.tensor and token.tensor attributes 2019-08-01 18:30:50 +02:00
Matthew Honnibal
d3071ecdbc Set version to v2.1.7 2019-08-01 18:09:19 +02:00
Matthew Honnibal
97c51ef93b Set version to v2.1.7.dev1 2019-08-01 17:29:25 +02:00