spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-03-03 19:08:06 +03:00

Author	SHA1	Message	Date
Ines Montani	25b2b3ff45	Remove LEMMA from exception examples [ci skip]	2019-09-12 16:26:27 +02:00
Ines Montani	82c16b7943	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
Ines Montani	a31e9e1cd5	Update training docs [ci skip]	2019-09-12 15:32:39 +02:00
Ines Montani	b544dcb3c5	Document debug-data [ci skip]	2019-09-12 15:26:20 +02:00
Ines Montani	c0a4cab178	Update "Adding languages" docs [ci skip]	2019-09-12 14:53:06 +02:00
Ines Montani	10257f3131	Document Lookups [ci skip]	2019-09-12 14:00:14 +02:00
Ines Montani	aa4ff0baa1	Auto-format [ci skip]	2019-09-12 13:05:53 +02:00
Ines Montani	625ce2db8e	Update Language docs [ci skip]	2019-09-12 13:03:38 +02:00
Ines Montani	cb41a33d14	Update displaCy API docs [ci skip]	2019-09-12 12:59:20 +02:00
Ines Montani	e7c20ad1d2	Update colors entry points docs [ci skip]	2019-09-12 12:59:10 +02:00
Ines Montani	7b59a919e6	Update entry points docs [ci skip]	2019-09-12 12:52:06 +02:00
Sofie Van Landeghem	0b4b4f1819	Documentation for Entity Linking (#4065) * document token ent_kb_id * document span kb_id * update pipeline documentation * prior and context weights as bool's instead * entitylinker api documentation * drop for both models * finish entitylinker documentation * small fixes * documentation for KB * candidate documentation * links to api pages in code * small fix * frequency examples as counts for consistency * consistent documentation about tensors returned by predict * add entity linking to usage 101 * add entity linking infobox and KB section to 101 * entity-linking in linguistic features * small typo corrections * training example and docs for entity_linker * predefined nlp and kb * revert back to similarity encodings for simplicity (for now) * set prior probabilities to 0 when excluded * code clean up * bugfix: deleting kb ID from tokens when entities were removed * refactor train el example to use either model or vocab * pretrain_kb example for example kb generation * add to training docs for KB + EL example scripts * small fixes * error numbering * ensure the language of vocab and nlp stay consistent across serialization * equality with = * avoid conflict in errors file * add error 151 * final adjustements to the train scripts - consistency * update of goldparse documentation * small corrections * push commit * typo fix * add candidate API to kb documentation * update API sidebar with EntityLinker and KnowledgeBase * remove EL from 101 docs * remove entity linker from 101 pipelines / rephrase * custom el model instead of existing model * set version to 2.2 for EL functionality * update documentation for 2 CLI scripts	2019-09-12 11:38:34 +02:00
Sofie Van Landeghem	53a9ca45c9	Docs: bufsize instead of buffsize (#4247)	2019-09-06 11:11:54 +02:00
Sofie Van Landeghem	6b012cebff	Make pos/tag distinction more clear in docs (#4246) * make distinction between tag and pos more prominent in docs * out of the 101	2019-09-06 10:31:21 +02:00
adrianeboyd	82159b5c19	Updates/bugfixes for NER/IOB converters (#4186) * Updates/bugfixes for NER/IOB converters * Converter formats `ner` and `iob` use autodetect to choose a converter if possible * `iob2json` is reverted to handle sentence-per-line data like `word1|pos1|ent1 word2|pos2|ent2` * Fix bug in `merge_sentences()` so the second sentence in each batch isn't skipped * `conll_ner2json` is made more general so it can handle more formats with whitespace-separated columns * Supports all formats where the first column is the token and the final column is the IOB tag; if present, the second column is the POS tag * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O` separates documents * Add option for segmenting sentences (new flag `-s`) * Parser-based sentence segmentation with a provided model, otherwise with sentencizer (new option `-b` to specify model) * Can group sentences into documents with `n_sents` as long as sentence segmentation is available * Only applies automatic segmentation when there are no existing delimiters in the data * Provide info about settings applied during conversion with warnings and suggestions if settings conflict or might not be not optimal. * Add tests for common formats * Add '(default)' back to docs for -c auto * Add document count back to output * Revert changes to converter output message * Use explicit tabs in convert CLI test data * Adjust/add messages for n_sents=1 default * Add sample NER data to training examples * Update README * Add links in docs to example NER data * Define msg within converters	2019-08-29 12:04:01 +02:00
Björn Böing	bae0455f91	Fix visualizer options linking for displaCy. (#4202)	2019-08-27 14:04:28 +02:00
Christos Aridas	61f5c007a0	DOC Fix pipeline functions examples (#4189)	2019-08-23 19:15:32 +02:00
adrianeboyd	8fe7bdd0fa	Improve token pattern checking without validation (#4105) * Fix typo in rule-based matching docs * Improve token pattern checking without validation Add more detailed token pattern checks without full JSON pattern validation and provide more detailed error messages. Addresses #4070 (also related: #4063, #4100). * Check whether top-level attributes in patterns and attr for PhraseMatcher are in token pattern schema * Check whether attribute value types are supported in general (as opposed to per attribute with full validation) * Report various internal error types (OverflowError, AttributeError, KeyError) as ValueError with standard error messages * Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS, LEMMA, and DEP * Add error messages with relevant details on how to use validate=True or nlp() instead of nlp.make_doc() * Support attr=TEXT for PhraseMatcher * Add NORM to schema * Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler * Remove unnecessary .keys() * Rephrase error messages * Add another type check to Matcher Add another type check to Matcher for more understandable error messages in some rare cases. * Support phrase_matcher_attr=TEXT for EntityRuler * Don't use spacy.errors in examples and bin scripts * Fix error code * Auto-format Also try get Azure pipelines to finally start a build :( * Update errors.py Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2019-08-21 14:00:37 +02:00
Ines Montani	3134a9b6e0	Add section on expanding regex match to token boundaries (see #4158) [ci skip]	2019-08-21 12:53:31 +02:00
Ines Montani	fe230c8776	Fix typo [ci skip]	2019-08-20 13:02:05 +02:00
Daniel Bourke	b0a28fd0de	fix PhraseMatcher link typo (#4150) /api/phtasematcher -> /api/phrasematcher	2019-08-20 13:01:43 +02:00
Ines Montani	ce4c3e5204	Document force flag on set_extension (closes #4148)	2019-08-19 19:22:07 +02:00
Ines Montani	66aba2d676	Improve regex matching docs [ci skip]	2019-08-19 13:59:41 +02:00
Sofie Van Landeghem	cc66f47893	Make enabling/disabling jupyter mode more explicit (#4144) * make enabling/disabling jupyter mode more explicit * markup fix	2019-08-19 11:53:34 +02:00
Ines Montani	e520eb3f6c	Make visualized NER examples more clear (closes #4104) [ci skip]	2019-08-18 16:29:29 +02:00
Ines Montani	1362f793cf	Improve docs on phrase pattern attributes (closes #4100) [ci skip]	2019-08-11 11:13:49 +02:00
Ines Montani	8b4a0fabbb	Adjust docs example [ci skip]	2019-08-07 00:46:47 +02:00
adrianeboyd	69aca7d839	Add validate option to EntityRuler (#4089) * Add validate option to EntityRuler * Add validate to EntityRuler, passed to Matcher and PhraseMatcher * Add validate to usage and API docs * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io> * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-07 00:40:53 +02:00
Ines Montani	4ae320e5c2	Use consistent casing for entity ruler patterns (see #4063) [ci skip]	2019-08-06 12:20:22 +02:00
Ines Montani	223bde5cf6	Improve docs on matcher attributes [ci skip] (closes #4063)	2019-08-06 12:13:42 +02:00
Ines Montani	2bfae0b167	Auto-format	2019-08-06 12:13:31 +02:00
Ines Montani	0f76e0022d	Update .tensor docs [ci skip]	2019-08-01 18:37:09 +02:00
Björn Böing	a83c0add2e	Add links to tokenizer API docs to refer relevant information. (#4064) * Add links to tokenizer API docs to refer relevant information. * Add suggested changes Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-01 14:28:38 +02:00
Ejar	2cdf7d39e7	Corrected imported fucntion (#4062) The example showed an incorrected import	2019-08-01 12:43:36 +02:00
Ines Montani	fcd2f7f656	Fix version introducing Span.ents (closes #4045) [ci skip]	2019-07-30 10:32:33 +02:00
Ines Montani	fc69da0acb	💫 Support simple training format in nlp.evaluate and add tests (#4033) * Support simple training format in nlp.evaluate and add tests * Update docs [ci skip]	2019-07-27 17:30:18 +02:00
Ines Montani	bd39e5e630	Add "Processing text" section [ci skip]	2019-07-25 17:38:03 +02:00
Ines Montani	a5e3d2f318	Improve section on disabling pipes [ci skip]	2019-07-25 14:25:34 +02:00
Ines Montani	02e444ec7c	Add section on special tokenizer component [ci skip]	2019-07-25 14:25:03 +02:00
Ines Montani	1fa6d6ba55	Improve consistency of docs examples [ci skip]	2019-07-25 14:24:56 +02:00
adrianeboyd	784a5f4284	Update GoldParse attributes in API docs (#4023) * add `words` * update name of entity list to `ner` I think it might be a bit more consistent to have `ner` named `entities` or `ents` (and `ents` is actually set somewhere to `None`, which is a bit confusing), but it looks like renaming it would be a non-trivial decision.	2019-07-25 12:14:02 +02:00
Adriane Boyd	6c5044ed2a	Update annotation docs for German - minor formatting fixes - remove STTS tags not used in Tiger - update list of dependency relations to match tiger2dep	2019-07-22 11:59:03 +02:00
adrianeboyd	d2c474cbb7	Fix initial example in EntityRuler API docs (#3999)	2019-07-22 11:18:55 +02:00
Ines Montani	1167c303a0	Fix typos [ci skip]	2019-07-19 13:08:18 +02:00
BreakBB	6d9a7c0749	Add '--silent' argument to bash example of CLI Info	2019-07-19 10:00:45 +02:00
BreakBB	c8ba0f690d	Fix --force parameter of CLI package	2019-07-19 10:00:45 +02:00
Ines Montani	a0acb1b3cd	Also add infobox to API docs [ci skip]	2019-07-17 16:26:41 +02:00
Ines Montani	c3ead02ea5	Adjust wording [ci skip]	2019-07-17 16:06:25 +02:00
Ines Montani	1d5ff3e455	Add infobox	2019-07-17 15:29:36 +02:00
Ines Montani	114cb18892	Improve wording	2019-07-17 15:27:53 +02:00


610 Commits