spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-10 19:54:17 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	6b012cebff	Make pos/tag distinction more clear in docs (#4246) * make distinction between tag and pos more prominent in docs * out of the 101	2019-09-06 10:31:21 +02:00
adrianeboyd	8fe7bdd0fa	Improve token pattern checking without validation (#4105) * Fix typo in rule-based matching docs * Improve token pattern checking without validation Add more detailed token pattern checks without full JSON pattern validation and provide more detailed error messages. Addresses #4070 (also related: #4063, #4100). * Check whether top-level attributes in patterns and attr for PhraseMatcher are in token pattern schema * Check whether attribute value types are supported in general (as opposed to per attribute with full validation) * Report various internal error types (OverflowError, AttributeError, KeyError) as ValueError with standard error messages * Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS, LEMMA, and DEP * Add error messages with relevant details on how to use validate=True or nlp() instead of nlp.make_doc() * Support attr=TEXT for PhraseMatcher * Add NORM to schema * Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler * Remove unnecessary .keys() * Rephrase error messages * Add another type check to Matcher Add another type check to Matcher for more understandable error messages in some rare cases. * Support phrase_matcher_attr=TEXT for EntityRuler * Don't use spacy.errors in examples and bin scripts * Fix error code * Auto-format Also try to get Azure pipelines to finally start a build :( * Update errors.py Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2019-08-21 14:00:37 +02:00
Ines Montani	3134a9b6e0	Add section on expanding regex match to token boundaries (see #4158) [ci skip]	2019-08-21 12:53:31 +02:00
Ines Montani	66aba2d676	Improve regex matching docs [ci skip]	2019-08-19 13:59:41 +02:00
Sofie Van Landeghem	cc66f47893	Make enabling/disabling jupyter mode more explicit (#4144) * make enabling/disabling jupyter mode more explicit * markup fix	2019-08-19 11:53:34 +02:00
Ines Montani	e520eb3f6c	Make visualized NER examples more clear (closes #4104) [ci skip]	2019-08-18 16:29:29 +02:00
Ines Montani	1362f793cf	Improve docs on phrase pattern attributes (closes #4100) [ci skip]	2019-08-11 11:13:49 +02:00
Ines Montani	8b4a0fabbb	Adjust docs example [ci skip]	2019-08-07 00:46:47 +02:00
adrianeboyd	69aca7d839	Add validate option to EntityRuler (#4089) * Add validate option to EntityRuler * Add validate to EntityRuler, passed to Matcher and PhraseMatcher * Add validate to usage and API docs * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io> * Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-07 00:40:53 +02:00
Ines Montani	4ae320e5c2	Use consistent casing for entity ruler patterns (see #4063) [ci skip]	2019-08-06 12:20:22 +02:00
Ines Montani	223bde5cf6	Improve docs on matcher attributes [ci skip] (closes #4063)	2019-08-06 12:13:42 +02:00
Ines Montani	2bfae0b167	Auto-format	2019-08-06 12:13:31 +02:00
Ines Montani	bd39e5e630	Add "Processing text" section [ci skip]	2019-07-25 17:38:03 +02:00
Ines Montani	a5e3d2f318	Improve section on disabling pipes [ci skip]	2019-07-25 14:25:34 +02:00
Ines Montani	02e444ec7c	Add section on special tokenizer component [ci skip]	2019-07-25 14:25:03 +02:00
Ines Montani	1fa6d6ba55	Improve consistency of docs examples [ci skip]	2019-07-25 14:24:56 +02:00
Ines Montani	1167c303a0	Fix typos [ci skip]	2019-07-19 13:08:18 +02:00
Ines Montani	c3ead02ea5	Adjust wording [ci skip]	2019-07-17 16:06:25 +02:00
Ines Montani	1d5ff3e455	Add infobox	2019-07-17 15:29:36 +02:00
Ines Montani	114cb18892	Improve wording	2019-07-17 15:27:53 +02:00
Ines Montani	7522beef9e	Add "Things to try" prompts	2019-07-17 15:25:02 +02:00
Ines Montani	9f02e3c027	Adjust example Not actually supported in this alignment interpretation	2019-07-17 15:13:50 +02:00
Ines Montani	1ea472468a	Add usage docs for aligning tokenization	2019-07-17 15:08:33 +02:00
pmbaumgartner	9a86d95ea2	fix custom attribute links	2019-07-14 20:23:54 -04:00
Ines Montani	ebe58e7fa1	Document gold.docs_to_json [ci skip]	2019-07-10 10:27:33 +02:00
Ines Montani	881f5bc401	Auto-format	2019-07-10 10:27:29 +02:00
Ines Montani	d361e380b8	Fix matcher callback example (closes #3862)	2019-06-26 14:47:26 +02:00
Alejandro Alcalde	4866a7ee9e	Changed learning rate by its param name. (#3855) * Changed learning rate by its param name. I've been searching for a while how the parameter learning rate was named, with `beta1` and `beta2` it's easy as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful. * Signing SCA	2019-06-20 10:29:20 +02:00
Ramanan Balakrishnan	eb12703d10	minor fix to broken link in documentation (#3819) [ci skip]	2019-06-04 11:15:35 +02:00
Ines Montani	0c74506c9c	Fix typos in docs (closes #3802) [ci skip]	2019-06-01 11:35:01 +02:00
mak	89379a7fa4	Corrected example model URL in requirements.txt (#3786) The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).	2019-05-29 10:51:55 +02:00
Aaron Kub	719a15f23d	fixing regex matcher examples (#3708) (#3719)	2019-05-10 14:23:52 +02:00
张晓飞	ba1ff00370	update response after calling add_pipe (#3661) * update response after calling add_pipe The print_info component is appended last, so it needs to be shown at the end of the pipeline * Create henry860916.md	2019-05-01 12:02:18 +02:00
Ramiro Gómez	8ee4100f8f	Remove dangling M (#3657) I assume this is a typo. Sorry if it has a meaning that I'm not aware of.	2019-04-29 19:44:43 +02:00
Amit Chaudhary	167d63af31	Fix broken link to Dive Into Python 3 website (#3656) * Fix broken link to Dive Into Python 3 website * Sign spaCy Contributor Agreement	2019-04-29 19:44:00 +02:00
Ivan Tham	fa94f83697	Improve redundant variable name (#3643) * Improve redundant variable name * Apply suggestions from code review Co-Authored-By: pickfire <pickfire@riseup.net>	2019-04-26 16:50:14 +02:00
Ines Montani	0dce4585b1	Add course to 101	2019-04-19 15:59:51 +02:00
Ines Montani	38395d9518	Merge branch 'spacy.io'	2019-04-19 15:26:20 +02:00
Ines Montani	7ac5bb0a7b	Update landing and feature overview	2019-04-19 15:23:08 +02:00
fizban99	f2f2df6e78	entity types for colors should be in uppercase (#3599) although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.	2019-04-17 11:22:56 +02:00
Ines Montani	9e7deeaf48	Remove Datacamp	2019-04-13 17:46:32 +02:00
Ines Montani	2f0f439c54	Remove non-existent example (closes #3533)	2019-04-03 09:59:17 +02:00
Ines Montani	200d8bdb3c	Merge branch 'spacy.io' [ci skip]	2019-03-23 16:46:34 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
Ines Montani	b532386a60	Fix typo [ci skip]	2019-03-22 18:36:17 +01:00
Ines Montani	5073ce63fd	Merge branch 'spacy.io' [ci skip]	2019-03-22 15:17:11 +01:00
Ines Montani	0712efc6b3	Update version requirements [ci skip]	2019-03-21 10:23:54 +01:00
Ines Montani	d4eed4a84f	Add note on unicode build to troubleshooting guide (see #3421) [ci skip]	2019-03-19 10:27:02 +01:00
Ines Montani	a611b32fbf	Update model docs [ci skip]	2019-03-17 11:48:18 +01:00
Ines Montani	cbcba699dd	Fix missing ids	2019-03-14 17:56:53 +01:00
Ines Montani	4cfe4aa224	Fix small issues in the docs [ci skip]	2019-03-12 22:57:15 +01:00
Ines Montani	ba7eb2d131	Update section [ci skip]	2019-03-12 16:18:34 +01:00
Ines Montani	cecc31b765	Don't auto-slugify accordion links [ci skip]	2019-03-12 15:30:49 +01:00
Ines Montani	72fb324d95	Add vector training script to bin [ci skip]	2019-03-12 12:07:56 +01:00
Ines Montani	3abf0e6b9f	Replace dev-resources links with real examples	2019-03-12 12:07:40 +01:00
Ines Montani	59c0620487	Auto-format	2019-03-12 12:07:11 +01:00
Ines Montani	7c05ca01e8	💫 Support mutable default values for extension attributes (#3389) * Support mutable default values in extensions * Update documentation	2019-03-11 12:50:44 +01:00
Ines Montani	8dbf1e9037	Also fix #3387 on develop	2019-03-10 23:36:28 +01:00
Ines Montani	9a8f169e5c	Update v2-1.md	2019-03-10 18:58:51 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Ines Montani	48a206a95f	Fix displaCy visualizations in docs (closes #3357) [ci skip]	2019-03-06 13:20:44 +01:00
Ines Montani	c478a2ccb6	Update backwards incompat [ci skip]	2019-02-27 11:56:56 +01:00
Ines Montani	1b6238101a	Add table explaining training metrics [closes #2644]	2019-02-25 10:03:43 +01:00
Ines Montani	62b558ab72	💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325) * Fix formatting and whitespace * Add support for lexical attributes (closes #2390) * Document lexical attribute setting during retokenization * Assign variable outside of nested loop	2019-02-24 21:13:51 +01:00
Ines Montani	aa52305461	Improve pipeline model and meta example [ci skip]	2019-02-24 18:45:39 +01:00
Ines Montani	df19e2bff6	💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324) ## Description This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter and a setter implemented. ```python Token.set_extension('is_musician', default=False) doc = nlp("I like David Bowie.") with doc.retokenize() as retokenizer: attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}} retokenizer.merge(doc[2:4], attrs=attrs) assert doc[2].text == "David Bowie" assert doc[2].lemma_ == "David Bowie" assert doc[2]._.is_musician ``` ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-24 18:38:47 +01:00
Ines Montani	403b9cd58b	Add docs on adding to existing tokenizer rules [ci skip]	2019-02-24 18:35:19 +01:00
Ines Montani	383e2e1f12	Update Python versions [ci skip]	2019-02-24 11:49:45 +01:00
Ines Montani	b624cb4b89	Update v2-1.md	2019-02-24 11:49:27 +01:00
Ines Montani	0fc908d7a5	Add note on merging speed in v2.1 (see #3300) [ci skip]	2019-02-21 12:34:18 +01:00
Ines Montani	236aa94ded	Update v2-1.md	2019-02-21 12:33:56 +01:00
Sofie	9a478b6db8	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) * splitting up latin unicode interval * removing hyphen as infix for French * adding failing test for issue 1235 * test for issue #3002 which now works * partial fix for issue #2070 * keep the hyphen as infix for French (as it was) * restore french expressions with hyphen as infix (as it was) * added succeeding unit test for Issue #2656 * Fix issue #2822 with custom Italian exception * Fix issue #2926 by allowing numbers right before infix / * remove duplicate * remove xfail for Issue #2179 fixed by Matt * adjust documentation and remove reference to regex lib	2019-02-20 22:10:13 +01:00
Ines Montani	57ae71ea95	Add docs on serializing the pipeline (see #3289) [ci skip]	2019-02-18 14:13:29 +01:00
Ines Montani	38e4422c0d	Improve matcher example (resolves #3287)	2019-02-18 13:26:37 +01:00
Ines Montani	660cfe44c5	Fix formatting	2019-02-18 13:26:22 +01:00
Ines Montani	212ff359ef	Fix links [ci skip]	2019-02-17 22:25:50 +01:00
Ines Montani	e597110d31	💫 Update website (#3285) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
ines	3f4fd2c5d5	Update usage documentation	2017-10-03 14:26:20 +02:00
Reza Gharibi	0461b82158	Fix typos	2017-09-27 03:56:20 +03:30
Reza Gharibi	fa1844b132	Fix typo	2017-09-27 03:55:54 +03:30
Reza Gharibi	b5dd7e7cc4	Fix typo	2017-09-27 03:55:28 +03:30
Ines Montani	b8e81daccf	Fix typo (closes #1312)	2017-09-14 12:49:59 +02:00
ines	d15775c3ad	Fix typos and commands in alpha docs	2017-08-21 13:40:11 +02:00
ines	3c33003078	Port over typo corrections from #1245	2017-08-20 12:00:17 +02:00
ines	a29f132ffd	Change python -m spacy to spacy Reflects latest change to entry point or auto-alias	2017-08-14 13:04:48 +02:00
Nikolai Kruglikov	08e443e083	Fix small typo in documentation	2017-08-14 12:19:04 +02:00
ines	ab8ffbaab7	Add text classification to v2 overview	2017-07-22 17:56:51 +02:00
ines	0fb89dd204	Add text classification usage guide template	2017-07-22 17:56:07 +02:00
ines	d05ab1b3a0	Add text classification to 101 overview and change order	2017-07-22 17:55:53 +02:00
Jarle Mathiesen	f20533ec0c	fix small typo	2017-06-24 12:31:33 +02:00
Savva Kolbachev	800a8faff4	Changed the capital of Lithuania to Vilnius Hi, There is a typo about the capital of Lithuania. Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana	2017-06-12 23:27:00 +03:00
Ines Montani	57f64b9e1c	Merge pull request #1124 from v3t3a/patch-3 docs - Fix url error for Displacy Ent visualizer	2017-06-12 21:20:32 +02:00
Ines Montani	b2a28028cf	Merge pull request #1115 from v3t3a/patch-2 docs - Add read() method when opening file (Lightning tour)	2017-06-12 21:19:25 +02:00
Vetea	eae1f7b19c	Fix url error for Displacy Ent visualizer	2017-06-12 14:30:02 +02:00
ines	49026a1346	Fix typos in example (see #1105)	2017-06-08 19:15:50 +02:00
Vetea	cc3aee1189	Add read() method when opening file Add read() method to avoid: ```TypeError: Argument 'string' has incorrect type (expected str, got _io.TextIOWrapper)``` Tested with: spaCy : v2.0.0 Alpha python : 3.5.2+ (default, Sep 22 2016, 12:18:14)	2017-06-08 11:27:09 +02:00
ines	6b799bac54	Fix formatting and details	2017-06-06 14:37:49 +02:00
ines	fd9ae0f0e0	Update v2 comparison table	2017-06-05 16:39:11 +02:00
ines	a3f9745a14	Update similarity usage guide and examples	2017-06-05 15:37:33 +02:00
ines	fd35d910b8	Update v2 docs and benchmarks	2017-06-05 14:13:38 +02:00
ines	040553ca59	Update architecture and features table	2017-06-05 13:33:01 +02:00
ines	505d43b832	Update norms example	2017-06-04 23:33:26 +02:00
ines	f8e93b6d0a	Update norms example	2017-06-04 23:24:29 +02:00
ines	a857b2b511	Update norms example	2017-06-04 23:21:37 +02:00
ines	47d066b293	Add under construction	2017-06-04 23:17:54 +02:00
ines	e9816daa6a	Add details on syntax iterators	2017-06-04 23:16:33 +02:00
ines	990cb81556	Add info on syntax iterators	2017-06-04 21:47:22 +02:00
ines	e4eb33daf7	Add links to production use guide	2017-06-04 20:56:58 +02:00
ines	63cd539d04	Add more details on model packages and requirements.txt (see #1099)	2017-06-04 20:52:10 +02:00
ines	97ff83d163	Fix docs on model loading	2017-06-04 20:44:59 +02:00
ines	b6002db797	Add v2 label	2017-06-04 18:53:03 +02:00
ines	468ff1a7dd	Update v2 docs and add benchmarks stub	2017-06-04 15:34:28 +02:00
Matthew Honnibal	23fd6b1782	Add intro narrative for v2	2017-06-04 15:10:37 +02:00
ines	3419ecbfdd	Update docs on model shortcut links	2017-06-04 13:55:00 +02:00
ines	586e901143	Add v2 intro stub	2017-06-04 13:42:37 +02:00
ines	4f8f62d9b3	Merge branch 'v2-docs-edits' into develop	2017-06-04 13:40:58 +02:00
ines	809903dcad	Fix link and update wording	2017-06-04 13:29:20 +02:00
ines	22dd18c364	Remove redundant CPU commands	2017-06-04 13:29:13 +02:00
ines	1d6377218a	Update architecture blurb and move other info	2017-06-04 13:28:58 +02:00
ines	7a66c9f039	Fix formatting	2017-06-04 13:14:00 +02:00
Matthew Honnibal	f2c4a9f690	Edits to spacy-101 page	2017-06-04 13:10:27 +02:00
Matthew Honnibal	aca53b95e1	Link architecture blurb	2017-06-04 13:10:06 +02:00
Matthew Honnibal	64ca5123bb	Add Architecture 101 blurb	2017-06-04 13:09:19 +02:00
Matthew Honnibal	e77ed953f4	Update GPU instructions	2017-06-04 12:03:22 +02:00
ines	1d3b012e56	Update adding languages docs and add 101	2017-06-03 23:54:23 +02:00
ines	a3715a81d5	Update adding languages guide	2017-06-03 22:16:38 +02:00
ines	ec6d2bc81d	Add table of contents mixin	2017-06-03 22:16:26 +02:00
ines	9acf8686f7	Update note on compact mode issues	2017-06-03 13:31:16 +02:00
ines	c60431357d	Port over docs typo corrections	2017-06-03 11:31:30 +02:00
ines	c6dc2fafc0	Add Spanish and move example sentences to meta	2017-06-01 17:49:56 +02:00
ines	b577ed79ee	Move social image logic out to function and move files	2017-06-01 14:27:44 +02:00
ines	5e60b09dcd	Fix custom tokenizer example	2017-06-01 13:02:50 +02:00
ines	8274dffad6	Update NER training draft	2017-06-01 12:51:36 +02:00
ines	04fac3f52a	Add NER training example code	2017-06-01 12:47:47 +02:00
ines	7f5e7e7320	Fix typo	2017-06-01 12:47:36 +02:00
ines	4a927154d8	Update v2 docs	2017-06-01 11:56:32 +02:00
ines	03bbb96db8	Remove outdated examples	2017-06-01 11:56:02 +02:00
ines	789e69b73f	Update training guide	2017-06-01 11:53:23 +02:00
ines	2f40d6e7e7	Add training 101	2017-06-01 11:53:16 +02:00
ines	abed463bbb	Update serialization 101	2017-06-01 11:52:58 +02:00
ines	72380c952a	Update training section in NER guide and add links	2017-06-01 11:52:49 +02:00
ines	22b1f72870	Add spaCy 101 intro	2017-05-31 12:44:09 +02:00
ines	a18b95ca12	Update docs on testing	2017-05-31 12:43:40 +02:00
ines	981196c181	Fix typo	2017-05-31 11:34:31 +02:00
ines	f86289566a	Update new in v2 section and add note on Matcher acceptors	2017-05-30 13:53:06 +02:00
ines	ce4e45d0bb	Update 101 intro	2017-05-29 22:15:06 +02:00
ines	687ed28340	Update processing pipelines guide	2017-05-29 14:21:00 +02:00
ines	d5992f408f	Update note on vocab consistency	2017-05-29 14:14:26 +02:00
ines	a2134951f2	Update 101 and add note on pipeline order and tensors	2017-05-29 11:45:32 +02:00
ines	17b635eaab	Update alpha docs note and fix typo	2017-05-29 11:09:24 +02:00

473 Commits