15700 Commits

Author SHA1 Message Date
Ines Montani
fe230c8776 Fix typo [ci skip] 2019-08-20 13:02:05 +02:00
Daniel Bourke
b0a28fd0de fix PhraseMatcher link typo (#4150)
/api/phtasematcher -> /api/phrasematcher
2019-08-20 13:01:43 +02:00
Ines Montani
22a31b78d6 Merge branch 'master' into spacy.io 2019-08-19 19:22:19 +02:00
Ines Montani
ce4c3e5204 Document force flag on set_extension (closes #4148) 2019-08-19 19:22:07 +02:00
Ines Montani
af3dd786b1 Merge branch 'master' into spacy.io 2019-08-19 13:59:51 +02:00
Ines Montani
66aba2d676 Improve regex matching docs [ci skip] 2019-08-19 13:59:41 +02:00
Ines Montani
50b117c072 Merge branch 'master' into spacy.io 2019-08-19 11:54:53 +02:00
Ines Montani
8b738a9f35 Update .gitignore [ci skip] 2019-08-19 11:54:42 +02:00
Sofie Van Landeghem
cc66f47893 Make enabling/disabling jupyter mode more explicit (#4144)
* make enabling/disabling jupyter mode more explicit

* markup fix
2019-08-19 11:53:34 +02:00
Ivan Šarić
434f6fa6c1 Issue #1107 - adds examples.py for Croatian language (#4143)
* adds contributor agreement for isaric

* adds examples.py for croatian language
2019-08-18 23:04:41 +02:00
Ines Montani
e520eb3f6c Make visualized NER examples more clear (closes #4104) [ci skip] 2019-08-18 16:29:29 +02:00
Paul O'Leary McCann
7f82a1fe1b Make the emoticon list a raw string (#4139)
While working on an unrelated task I got warnings about an unsupported
escape sequence (`"\("`) in the tokenizer exceptions. Making the
tokenizer exceptions a raw string makes this warning go away.

The specific string that triggered this is `¯\(ツ)/¯`.
2019-08-18 15:17:13 +02:00
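The fix in 7f82a1fe1b relies on standard Python string-literal behavior. The following is a minimal sketch of that behavior, not code from the spaCy source; the variable names are illustrative assumptions.

```python
# In a raw string literal the backslash is kept as-is, so "\(" is not
# parsed as an (unsupported) escape sequence and no warning is raised.
raw_emoticon = r"¯\(ツ)/¯"

# The equivalent non-raw literal has to double the backslash.
escaped_emoticon = "¯\\(ツ)/¯"

# Both literals describe the same string.
same = raw_emoticon == escaped_emoticon
print(same, len(raw_emoticon))
```

Writing the literal as a raw string keeps the emoticon readable and avoids having to escape each backslash by hand.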
Ines Montani
009280fbc5 Tidy up and auto-format 2019-08-18 15:09:16 +02:00
Ines Montani
89f2b87266 Open file as utf-8 (closes #4138) 2019-08-18 13:55:34 +02:00
Ines Montani
f35a8221d8 Move generation of parses out of with blocks 2019-08-18 13:54:26 +02:00
yanaiela
ec0beccaf1 Custom entity render (#4117)
* customizable template for entity display, allowing additional parameters to be passed along with each entity

* contributor agreement

* simpler naming for the additional parameters given to the span entities renderer

Co-Authored-By: Ines Montani <ines@ines.io>

* change of default parameter, as suggested

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-16 18:39:25 +02:00
Ines Montani
5a8a39c9b0 Merge branch 'master' into spacy.io 2019-08-16 17:48:40 +02:00
Jeno
91441f169c Update universe.json to include negspacy (#4132) 2019-08-16 17:48:17 +02:00
Jeno Pizarro
2e6e0321dd Update universe.json to include negspacy 2019-08-16 10:24:09 -04:00
Ines Montani
e5c7e19e82 Fix typo and auto-format [ci skip] 2019-08-16 10:53:38 +02:00
adrianeboyd
a58cb023d7 WIP: Extending debug-data (#4114)
* Extending debug-data with dependency checks, etc.

* Modify debug-data to load with GoldCorpus to iterate over .json/.jsonl
files within directories

* Add GoldCorpus iterator train_docs_without_preprocessing to load
original train docs without shuffling and projectivizing

* Report number of misaligned tokens

* Add more dependency checks and messages

* Update spacy/cli/debug_data.py

Co-Authored-By: Ines Montani <ines@ines.io>

* Fixed conflict

* Move counts to _compile_gold()

* Move all dependency nonproj/sent/head/cycle counting to
_compile_gold()

* Unclobber previous merges

* Update variable names

* Update more variable names, fix misspelling

* Don't clobber loading error messages

* Only warn about misaligned tokens if present
2019-08-16 10:52:46 +02:00
Ziming He
eea7d4f4a8 biluo_tags_from_offsets throw exception for overlapping entities (#4021)
* Check whether two entities overlap

- biluo_gold_biluo_overlap now throws an exception when the entities passed in overlap
- added unit test

* SCA agreement
2019-08-15 18:13:32 +02:00
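The kind of overlap check described in eea7d4f4a8 can be sketched in plain Python. This is a hedged illustration only: the function names `find_overlapping_entities` and `check_no_overlap` are hypothetical, the `(start, end, label)` tuple layout with an exclusive end offset is an assumption, and the code does not reproduce spaCy's implementation or its error message.

```python
def find_overlapping_entities(entities):
    """Return pairs of (start, end, label) spans that overlap.

    Offsets are assumed to be character offsets with an exclusive end,
    so spans that merely touch do not count as overlapping.
    """
    ordered = sorted(entities, key=lambda ent: (ent[0], ent[1]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] >= first[1]:
                # Sorted by start: no later span can overlap `first`.
                break
            overlaps.append((first, second))
    return overlaps


def check_no_overlap(entities):
    """Raise ValueError if any two entity spans overlap (hypothetical helper)."""
    overlaps = find_overlapping_entities(entities)
    if overlaps:
        raise ValueError("Overlapping entities: {}".format(overlaps))


# Usage: the first call passes silently, the second raises ValueError.
check_no_overlap([(0, 5, "PERSON"), (6, 10, "ORG")])
try:
    check_no_overlap([(0, 5, "PERSON"), (3, 8, "ORG")])
except ValueError as err:
    print(err)
```

Failing early on overlapping spans is useful because a BILUO tag sequence assigns one tag per token and cannot represent two entities covering the same token.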
adrianeboyd
2f9b28c218 Provide more info in cycle error message E069 (#4123)
Provide the tokens in the cycle and the first 50 tokens from document in
the error message so it's easier to track down the location of the cycle
in the data.

Addresses feature request in #3698.
2019-08-15 18:08:28 +02:00
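To illustrate the idea behind 2f9b28c218, here is a rough sketch of locating a cycle in a list of head indices and reporting the tokens involved together with the first 50 tokens of the document. The helper names `find_cycle` and `cycle_error_message`, the convention that a root token points to itself, and the message wording are all assumptions made for this example; they are not spaCy's actual E069 code or text.

```python
def find_cycle(heads):
    """Return the token indices forming a cycle in `heads`, or None.

    `heads[i]` is the index of token i's head; a root is assumed to
    point to itself.
    """
    for start in range(len(heads)):
        path = []
        seen = set()
        current = start
        while current not in seen:
            seen.add(current)
            path.append(current)
            nxt = heads[current]
            if nxt == current:
                # Reached a root, so this path contains no cycle.
                break
            current = nxt
        else:
            # The loop ended because `current` was revisited: a cycle.
            return path[path.index(current):]
    return None


def cycle_error_message(words, heads, context_size=50):
    """Build a message naming the cycle tokens plus leading context."""
    cycle = find_cycle(heads)
    if cycle is None:
        return None
    cycle_tokens = [words[i] for i in cycle]
    context = words[:context_size]
    return "Cycle among tokens {}. Document starts with: {}".format(
        cycle_tokens, context
    )


# Usage: tokens "b" and "c" point at each other.
print(cycle_error_message(["a", "b", "c", "d"], [1, 2, 1, 3]))
```

Including both the cycle tokens and a slice of the document gives enough context to find the offending sentence in a large training file.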
AJ Rader
2f3648700c Correction of default lemmatizer lookup in English (Issue # 4104) (#4110)
* pytest file for issue4104 established

* edited default lookup english lemmatizer for spun; fixes issue 4102

* eliminated parameterization and sorted dictionary dependency in issue 4104 test

* added contributor agreement
2019-08-15 11:39:10 +02:00
Ines Montani
1711b5eb62 💫 Support displaCy user colors via entry point (#4113) 2019-08-13 15:59:55 +02:00
Sofie Van Landeghem
0ba1b5eebc CLI scripts for entity linking (wikipedia & generic) (#4091)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bools instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustments to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* turn kb_creator into CLI script (wip)

* proper parameters for training entity vectors

* wikidata pipeline split up into two executable scripts

* remove context_width

* move wikidata scripts in bin directory, remove old dummy script

* refine KB script with logs and preprocessing options

* small edits

* small improvements to logging of EL CLI script
2019-08-13 15:38:59 +02:00
Ines Montani
5196dbd89d Delete wip.yml [ci skip] 2019-08-13 13:31:21 +02:00
Ines Montani
35c865024b Fix file name [ci skip] 2019-08-12 18:39:54 +02:00
Ines Montani
3a39154804 Create wip.yaml [ci skip] 2019-08-12 17:26:31 +02:00
黎谢鹏
250a54414b update lang/zh (#4103)
* update lang/zh

* update lang/zh
2019-08-12 10:37:48 +02:00
Ines Montani
f653e1bbea Merge branch 'master' into spacy.io 2019-08-11 11:14:10 +02:00
Ines Montani
1362f793cf Improve docs on phrase pattern attributes (closes #4100) [ci skip] 2019-08-11 11:13:49 +02:00
Ines Montani
0df13a829c Merge branch 'master' into spacy.io 2019-08-09 17:42:46 +02:00
Ines Montani
1f4d8bf77e Update universe.json [ci skip] 2019-08-09 17:42:37 +02:00
Ines Montani
c1bd7094dc Merge branch 'master' into spacy.io 2019-08-09 17:22:27 +02:00
Ines Montani
f2516177dd Merge branch 'master' into spacy.io 2019-08-09 17:17:01 +02:00
ICLR&D
87e40b17a0 Add entry for Blackstone in universe.json (#4101)
* Add entry for Blackstone in universe.json

Add an entry for the Blackstone project. Checked JSON is valid.

* Create ICLRandD.md

* Fix indentation (tabs to spaces)

It looks like during validation, the JSON file automatically changed spaces to tabs. This caused the diff to show *everything* as changed, which is obviously not true. This hopefully fixes that.

* Try to fix formatting for diff

* Fix diff


Co-authored-by: Ines Montani <ines@ines.io>
2019-08-09 17:16:51 +02:00
Sofie Van Landeghem
963ea5e8d0 Update lemma and vector information after splitting a token (#4097)
* fixing vector and lemma attributes after retokenizer.split

* fixing unit test with mockup tensor

* xp instead of numpy
2019-08-08 15:09:44 +02:00
Ines Montani
dbde9cd0f2 Merge branch 'master' into spacy.io 2019-08-08 13:03:57 +02:00
Ines Montani
a2ac2e873f Update Binder version [ci skip] 2019-08-08 13:03:45 +02:00
Ines Montani
f623b579d2 Merge branch 'master' into spacy.io 2019-08-08 11:21:01 +02:00
Matthew Honnibal
04113a844d Set version to v2.1.8 2019-08-07 13:53:58 +02:00
Ines Montani
36ac044937 Update README.md [ci skip] 2019-08-07 13:38:59 +02:00
Ines Montani
3e60afacf9 Add Serbian to languages [ci skip] 2019-08-07 13:38:25 +02:00
Ines Montani
1dc28a9ecb Update Binder version [ci skip] 2019-08-07 13:38:12 +02:00
Ines Montani
6bec24cdd0 Require downloaded model in pkg_resources (#4090) 2019-08-07 13:18:11 +02:00
Ines Montani
95d63c74b4 Update site.json 2019-08-07 00:47:40 +02:00
Ines Montani
8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd
69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani
25a7a5fbdc Merge branch 'master' into spacy.io 2019-08-06 12:20:34 +02:00