Commit Graph

378 Commits

Author SHA1 Message Date
Ines Montani
b544dcb3c5 Document debug-data [ci skip] 2019-09-12 15:26:20 +02:00
Ines Montani
c0a4cab178 Update "Adding languages" docs [ci skip] 2019-09-12 14:53:06 +02:00
Ines Montani
e7c20ad1d2 Update colors entry points docs [ci skip] 2019-09-12 12:59:10 +02:00
Ines Montani
7b59a919e6 Update entry points docs [ci skip] 2019-09-12 12:52:06 +02:00
Sofie Van Landeghem
0b4b4f1819 Documentation for Entity Linking (#4065)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bools instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustments to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* typo fix

* add candidate API to kb documentation

* update API sidebar with EntityLinker and KnowledgeBase

* remove EL from 101 docs

* remove entity linker from 101 pipelines / rephrase

* custom el model instead of existing model

* set version to 2.2 for EL functionality

* update documentation for 2 CLI scripts
2019-09-12 11:38:34 +02:00
Sofie Van Landeghem
6b012cebff Make pos/tag distinction more clear in docs (#4246)
* make distinction between tag and pos more prominent in docs

* out of the 101
2019-09-06 10:31:21 +02:00
adrianeboyd
8fe7bdd0fa Improve token pattern checking without validation (#4105)
* Fix typo in rule-based matching docs

* Improve token pattern checking without validation

Add more detailed token pattern checks without full JSON pattern validation and
provide more detailed error messages.

Addresses #4070 (also related: #4063, #4100).

* Check whether top-level attributes in patterns and attr for PhraseMatcher are
  in token pattern schema

* Check whether attribute value types are supported in general (as opposed to
  per attribute with full validation)

* Report various internal error types (OverflowError, AttributeError, KeyError)
  as ValueError with standard error messages

* Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS,
  LEMMA, and DEP

* Add error messages with relevant details on how to use validate=True or nlp()
  instead of nlp.make_doc()

* Support attr=TEXT for PhraseMatcher

* Add NORM to schema

* Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler

* Remove unnecessary .keys()

* Rephrase error messages

* Add another type check to Matcher

Add another type check to Matcher for more understandable error messages
in some rare cases.

* Support phrase_matcher_attr=TEXT for EntityRuler

* Don't use spacy.errors in examples and bin scripts

* Fix error code

* Auto-format

Also try to get Azure Pipelines to finally start a build :(

* Update errors.py


Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2019-08-21 14:00:37 +02:00
Ines Montani
3134a9b6e0 Add section on expanding regex match to token boundaries (see #4158) [ci skip] 2019-08-21 12:53:31 +02:00
Ines Montani
66aba2d676 Improve regex matching docs [ci skip] 2019-08-19 13:59:41 +02:00
Sofie Van Landeghem
cc66f47893 Make enabling/disabling jupyter mode more explicit (#4144)
* make enabling/disabling jupyter mode more explicit

* markup fix
2019-08-19 11:53:34 +02:00
Ines Montani
e520eb3f6c Make visualized NER examples more clear (closes #4104) [ci skip] 2019-08-18 16:29:29 +02:00
Ines Montani
1362f793cf Improve docs on phrase pattern attributes (closes #4100) [ci skip] 2019-08-11 11:13:49 +02:00
Ines Montani
8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd
69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani
4ae320e5c2 Use consistent casing for entity ruler patterns (see #4063) [ci skip] 2019-08-06 12:20:22 +02:00
Ines Montani
223bde5cf6 Improve docs on matcher attributes [ci skip] (closes #4063) 2019-08-06 12:13:42 +02:00
Ines Montani
2bfae0b167 Auto-format 2019-08-06 12:13:31 +02:00
Ines Montani
bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani
a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani
02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani
1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
Ines Montani
1167c303a0 Fix typos [ci skip] 2019-07-19 13:08:18 +02:00
Ines Montani
c3ead02ea5 Adjust wording [ci skip] 2019-07-17 16:06:25 +02:00
Ines Montani
1d5ff3e455 Add infobox 2019-07-17 15:29:36 +02:00
Ines Montani
114cb18892 Improve wording 2019-07-17 15:27:53 +02:00
Ines Montani
7522beef9e Add "Things to try" prompts 2019-07-17 15:25:02 +02:00
Ines Montani
9f02e3c027 Adjust example
Not actually supported in this alignment interpretation
2019-07-17 15:13:50 +02:00
Ines Montani
1ea472468a Add usage docs for aligning tokenization 2019-07-17 15:08:33 +02:00
pmbaumgartner
9a86d95ea2 fix custom attribute links 2019-07-14 20:23:54 -04:00
Ines Montani
ebe58e7fa1 Document gold.docs_to_json [ci skip] 2019-07-10 10:27:33 +02:00
Ines Montani
881f5bc401 Auto-format 2019-07-10 10:27:29 +02:00
Ines Montani
d361e380b8 Fix matcher callback example (closes #3862) 2019-06-26 14:47:26 +02:00
Alejandro Alcalde
4866a7ee9e Changed learning rate by its param name. (#3855)
* Changed learning rate by its param name.

I've been searching for a while for how the learning rate parameter is named. With `beta1` and `beta2` it's easy, as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful.

* Signing SCA
2019-06-20 10:29:20 +02:00
Ramanan Balakrishnan
eb12703d10 minor fix to broken link in documentation (#3819) [ci skip] 2019-06-04 11:15:35 +02:00
Ines Montani
0c74506c9c Fix typos in docs (closes #3802) [ci skip] 2019-06-01 11:35:01 +02:00
mak
89379a7fa4 Corrected example model URL in requirements.txt (#3786)
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
Aaron Kub
719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
张晓飞
ba1ff00370 update response after calling add_pipe (#3661)
* update response after calling add_pipe

The print_info component is appended last, so it should be shown at the end of the pipeline

* Create henry860916.md
2019-05-01 12:02:18 +02:00
Ramiro Gómez
8ee4100f8f Remove dangling M (#3657)
I assume this is a typo. Sorry if it has a meaning that I'm not aware of.
2019-04-29 19:44:43 +02:00
Amit Chaudhary
167d63af31 Fix broken link to Dive Into Python 3 website (#3656)
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ivan Tham
fa94f83697 Improve redundant variable name (#3643)
* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani
0dce4585b1 Add course to 101 2019-04-19 15:59:51 +02:00
Ines Montani
38395d9518 Merge branch 'spacy.io' 2019-04-19 15:26:20 +02:00
Ines Montani
7ac5bb0a7b Update landing and feature overview 2019-04-19 15:23:08 +02:00
fizban99
f2f2df6e78 entity types for colors should be in uppercase (#3599)
Although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
Ines Montani
9e7deeaf48 Remove Datacamp 2019-04-13 17:46:32 +02:00
Ines Montani
2f0f439c54 Remove non-existent example (closes #3533) 2019-04-03 09:59:17 +02:00
Ines Montani
200d8bdb3c Merge branch 'spacy.io' [ci skip] 2019-03-23 16:46:34 +01:00
Ines Montani
06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani
b532386a60 Fix typo [ci skip] 2019-03-22 18:36:17 +01:00