spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-01 23:36:02 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	98acf5ffe4	💫 Allow passing of config parameters to specific pipeline components (#3386) * Add component_cfg kwarg to begin_training * Document component_cfg arg to begin_training * Update docs and auto-format * Support component_cfg across Language * Format * Update docs and docstrings [ci skip] * Fix begin_training	2019-03-10 23:36:47 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	0426689db8	💫 Improve Doc.to_json and add Doc.is_nered (#3381) * Use default return instead of else * Add Doc.is_nered to indicate if entities have been set * Add properties in Doc.to_json if they were set, not if they're available This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.	2019-03-10 15:24:34 +01:00
Ines Montani	76764fcf59	💫 Improve converters and training data file formats (#3374) * Populate converter argument info automatically * Add conversion option for msgpack * Update docs * Allow reading training data from JSONL	2019-03-08 23:15:23 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Ines Montani	fa7314b221	Clarify train_path and dev_path format (see #3366) [ci skip]	2019-03-07 12:23:27 +01:00
Ines Montani	e9babd9973	Update hyperparameters section (see #3352)	2019-03-06 14:40:30 +01:00
Ines Montani	5eadf61327	Update pretraining docs on file format (closes #3354)	2019-03-04 16:30:13 +00:00
Ines Montani	1d4ba7678f	Auto-format [ci skip]	2019-02-27 12:07:35 +01:00
Matthew Honnibal	f1d77eb140	💫 Improve handling of missing NER tags (closes #2603) (#3341) * Improve handling of missing NER tags GoldParse can accept missing NER tags, if entities is provided in BILUO format (rather than as spans). Missing tags can be provided as None values. Fix bug that occurred when first tag was a None value. Closes #2603. * Document specification of missing NER tags.	2019-02-27 12:06:32 +01:00
Matthew Honnibal	4a3371acd5	Make doc[0].is_sent_start == True (closes #2869) (#3340) * Make doc[0] have sent_start True. Closes #2869 * Document that doc[0].is_sent_start defaults True.	2019-02-27 11:17:17 +01:00
Ines Montani	d0b3af9222	Fix remaining inaccuracies in API docs (closes #2329)	2019-02-24 22:21:25 +01:00
Ines Montani	62b558ab72	💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325) * Fix formatting and whitespace * Add support for lexical attributes (closes #2390) * Document lexical attribute setting during retokenization * Assign variable outside of nested loop	2019-02-24 21:13:51 +01:00
Ines Montani	df19e2bff6	💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324) ## Description This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter and a setter implemented. ```python Token.set_extension('is_musician', default=False) doc = nlp("I like David Bowie.") with doc.retokenize() as retokenizer: attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}} retokenizer.merge(doc[2:4], attrs=attrs) assert doc[2].text == "David Bowie" assert doc[2].lemma_ == "David Bowie" assert doc[2]._.is_musician ``` ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-24 18:38:47 +01:00
Ines Montani	1ea1bc98e7	Document regex utilities [ci skip]	2019-02-24 18:34:10 +01:00
Ines Montani	46ec5cdccc	Update TextCategorizer docs	2019-02-24 13:11:57 +01:00
Ines Montani	c03cb1cc63	Improve built-in component API docs	2019-02-24 13:11:49 +01:00
Ines Montani	250e88ef55	Fix docs example (see #2728)	2019-02-21 14:22:06 +01:00
Ines Montani	04b4df0ec9	Remove n_threads	2019-02-17 22:25:42 +01:00
Ines Montani	e597110d31	💫 Update website (#3285) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
ines	808f7ee417	Update API documentation	2017-10-03 14:27:22 +02:00
ines	d15775c3ad	Fix typos and commands in alpha docs	2017-08-21 13:40:11 +02:00
ines	3c33003078	Port over typo corrections from #1245	2017-08-20 12:00:17 +02:00
ines	1261b01e46	Update Doc.char_span docs	2017-08-19 16:34:32 +02:00
ines	5cb0200e63	Document new Span.to_array() method	2017-08-19 12:45:28 +02:00
ines	471eed4126	Add example to Span.merge()	2017-08-19 12:45:16 +02:00
ines	404d3067b8	Document new Doc.char_span() method	2017-08-19 12:45:00 +02:00
ines	d53cbf369f	Document as_tuples kwarg on Language.pipe()	2017-08-19 12:44:50 +02:00
ines	6a37c93311	Update argument type	2017-08-19 12:44:33 +02:00
ines	4731d50220	Add break utility for long nowrap items (e.g. code)	2017-08-19 12:44:23 +02:00
ines	0aba11b64b	Update package command docs	2017-08-14 16:45:44 +02:00
ines	a29f132ffd	Change python -m spacy to spacy Reflects latest change to entry point or auto-alias	2017-08-14 13:04:48 +02:00
ines	f085b88f9d	Add TextCategorizer API docs stub	2017-07-22 17:56:33 +02:00
ines	ab1a4e8b3c	Add Tensorizer API docs stub	2017-07-22 17:56:25 +02:00
ines	d2a7e5b8e5	Add GoldParse.cats attribute	2017-07-22 17:55:35 +02:00
ines	23d976ed00	Add Doc.cats attribute and missing v2 tag	2017-07-22 17:55:14 +02:00
Ines Montani	1ddbeddca2	Fix typo	2017-07-22 15:00:58 +02:00
Vetea	8e20cf6368	Update doc.jade Just remove a duplicate 'doc ='	2017-06-08 10:35:58 +02:00
ines	9f55c0d4f6	Add Vectors class	2017-06-05 13:33:11 +02:00
ines	e204788c30	Add docs for util.load_model_from_path	2017-06-05 13:18:22 +02:00
ines	efc37ea3de	Update train CLI	2017-06-04 23:45:14 +02:00
ines	3419ecbfdd	Update docs on model shortcut links	2017-06-04 13:55:00 +02:00
ines	b0225183c2	Update displaCy defaults	2017-06-03 13:27:06 +02:00
ines	c60431357d	Port over docs typo corrections	2017-06-03 11:31:30 +02:00
ines	1bebc6392c	Add source files to pipeline components	2017-06-01 17:38:06 +02:00
ines	706cec6d58	Move annotation specs up	2017-06-01 13:02:43 +02:00
ines	77dca25c7f	Update Language API docs	2017-06-01 11:51:31 +02:00
ines	f86289566a	Update new in v2 section and add note on Matcher acceptors	2017-05-30 13:53:06 +02:00
ines	b5bfab8699	Add description	2017-05-29 15:27:16 +02:00
ines	567485a818	Fix and document model loading with pipeline and overrides	2017-05-29 14:10:10 +02:00
ines	00b2094dc3	Fix typos, long integers and tests	2017-05-29 01:09:52 +02:00
ines	606879b217	Update hash strings examples	2017-05-28 19:42:44 +02:00
ines	c7b57ea314	Update docs and change integer IDs to hash values	2017-05-28 19:25:34 +02:00
ines	0ea31d1e31	Add under construction note to pipeline components	2017-05-28 18:44:07 +02:00
ines	414193e9ba	Update docs to reflect StringStore changes	2017-05-28 18:19:11 +02:00
ines	69bda9aed7	Update text, examples, typos, wording and formatting	2017-05-28 16:41:01 +02:00
ines	eb5a8be9ad	Update language overview and add section on 'xx' lang class	2017-05-28 01:15:44 +02:00
ines	eb703f7656	Update API docs	2017-05-28 00:32:43 +02:00
ines	c1983621fb	Update util functions for model loading	2017-05-28 00:22:40 +02:00
ines	70afcfec3e	Update defaults and example	2017-05-26 14:04:31 +02:00
ines	1b982f0838	Update train command and add docs on hyperparameters	2017-05-26 14:02:38 +02:00
ines	1b9c6ded71	Update API docs and add "source" button to GH source	2017-05-26 13:40:32 +02:00
ines	d48530835a	Update API docs and fix typos	2017-05-26 12:43:16 +02:00
ines	ea9474f71c	Add version tag mixin to label new features	2017-05-26 12:42:36 +02:00
ines	353f0ef8d7	Use disable argument (list) for serialization	2017-05-26 12:33:54 +02:00
ines	0f48fb1f97	Rename processing text to production use and remove linear feature scheme	2017-05-25 00:10:33 +02:00
ines	8b86b08bed	Update usage workflows	2017-05-24 11:59:08 +02:00
ines	66088851dc	Add Doc.to_disk() and Doc.from_disk() methods	2017-05-24 11:58:17 +02:00
ines	10afb3c796	Tidy up and merge usage pages	2017-05-24 00:37:47 +02:00
ines	697d3d7cb3	Fix links to CLI docs	2017-05-24 00:36:38 +02:00
ines	a38393e2f6	Update annotation docs	2017-05-23 23:16:17 +02:00
ines	786af87ffb	Update IOB docs	2017-05-23 23:15:50 +02:00
ines	c8bde2161c	Add kwargs to spacy.load	2017-05-23 23:14:02 +02:00
ines	0a8a2d2f6d	Remove tip infoboxes from annotation docs	2017-05-23 23:13:51 +02:00
ines	e6acd3bbf2	Fix matcher tests and matcher docs	2017-05-23 11:36:02 +02:00
ines	f497cf60b2	Update formatting	2017-05-23 11:32:25 +02:00
ines	a23f487b06	Tidy up displaCy and add "manual" option Also don't require title in EntityRenderer	2017-05-22 18:48:20 +02:00
ines	dddad5bf26	Update util.prints docs	2017-05-22 13:54:52 +02:00
ines	d5a6a9a6a9	Use string values for attrs in Matcher docs	2017-05-22 13:54:45 +02:00
ines	54f04a9fe0	Update API docs with changes in spacy.gold and spacy.language	2017-05-22 12:29:30 +02:00
ines	fc3ec733ea	Reduce complexity in CLI Remove now redundant model command and move plac annotations to cli files	2017-05-22 12:28:58 +02:00
ines	2c5cfe8bbf	Update docstrings and API docs for StringStore	2017-05-21 14:18:58 +02:00
ines	251346b59f	Fix typos and formatting	2017-05-21 14:18:46 +02:00
ines	075f5ff87a	Update docstrings and API docs for GoldParse	2017-05-21 13:53:46 +02:00
ines	465a1dd710	Add BILUO scheme to annotation docs	2017-05-21 13:53:34 +02:00
ines	c9f04f3cd0	Add note on automated processes to download command	2017-05-21 13:23:39 +02:00
ines	8ab59515b2	Fix typo and use consistent description for from_bytes	2017-05-21 13:18:39 +02:00
ines	c5a653fa48	Update docstrings and API docs for Tokenizer	2017-05-21 13:18:14 +02:00
ines	d82ae9a585	Change "function" to "callable" in docs	2017-05-21 13:17:40 +02:00
ines	ee3fdffffb	Move attributes and remove deprecated methods	2017-05-21 01:18:31 +02:00
ines	1cb2c86f9a	Update CLI docs	2017-05-21 01:13:05 +02:00
ines	272a8981c3	Add model tag to spacy.load API docs	2017-05-21 01:12:43 +02:00
ines	3871157d84	Update spacy.util documentation	2017-05-21 01:12:09 +02:00
ines	da12aee0c1	Update spacy.load with note on get_lang_class	2017-05-21 00:19:26 +02:00
ines	27de0834b2	Update docstrings and API docs for Lexeme	2017-05-20 15:13:42 +02:00
ines	7ed8a92ed1	Update docstrings and API docs for Token	2017-05-20 15:13:33 +02:00
ines	4ed6a36622	Update docstrings and API docs for Matcher	2017-05-20 14:43:10 +02:00
ines	39f36539f6	Update docstrings and API docs for Matcher	2017-05-20 14:32:34 +02:00
ines	c00ff257be	Update docstrings and API docs for Matcher	2017-05-20 14:26:10 +02:00
ines	463e3cc80f	Remove resize_vectors and vectors_length	2017-05-20 14:02:14 +02:00
ines	f0cc642bb9	Update docstrings and API docs for Vocab	2017-05-20 14:00:41 +02:00
Matthew Honnibal	a93276bb78	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-05-20 13:55:12 +02:00
Matthew Honnibal	ce9234f593	Update Matcher API	2017-05-20 13:54:53 +02:00
ines	8b14476253	Fix typo	2017-05-20 13:00:13 +02:00
ines	6557ff9e85	Update example	2017-05-20 13:00:07 +02:00
ines	fea4925f41	Reorganise API docs navigation	2017-05-20 12:59:57 +02:00
ines	b2678372c7	Add API docs for top-level spaCy functions i.e. spacy.load(), spacy.info(), spacy.explain()	2017-05-20 12:59:44 +02:00
ines	797f10ab16	Update formatting	2017-05-20 12:59:16 +02:00
ines	e10c48210d	Update Matcher API and workflow to reflect new API on_match is now the second positional argument, to easily allow a variable number of patterns while keeping the method clean and readable.	2017-05-20 12:59:03 +02:00
ines	eb521af267	Fix formatting	2017-05-20 12:58:15 +02:00
ines	7973912114	Update CLI docs	2017-05-20 12:58:05 +02:00
ines	5163a4513e	Update API docs	2017-05-20 01:43:48 +02:00
ines	e3256e7406	Update Matcher API docs	2017-05-20 01:38:34 +02:00
ines	0cabf9e13f	Fix model tag	2017-05-20 01:38:14 +02:00
ines	fe5d8819ea	Update Matcher docstrings and API docs	2017-05-19 21:47:06 +02:00
ines	c8580da686	Update "requires model" tags	2017-05-19 20:24:46 +02:00
ines	c3e903e4c2	Update examples and API docs	2017-05-19 19:59:02 +02:00
ines	e9e62b01b0	Update docstrings and API docs for Token	2017-05-19 18:47:56 +02:00
ines	62ceec4fc6	Update docstrings and API docs for Span	2017-05-19 18:47:46 +02:00
ines	23f9a3ccc8	Update docstrings and API docs for Doc	2017-05-19 18:47:39 +02:00
ines	2c8c9dc0c9	Update docstrings and API docs for Language	2017-05-19 18:47:24 +02:00
ines	0791f0aae6	Update docstrings and API docs for Span class	2017-05-19 00:31:31 +02:00
ines	5b68579eb8	Use returns/yields instead of return/yield	2017-05-19 00:02:34 +02:00
ines	b687ad109d	Update docstrings and API docs for Doc class	2017-05-18 23:59:44 +02:00
ines	d42bc16868	Update docstrings and API docs for Language class	2017-05-18 23:57:38 +02:00
ines	b87066ff10	Update docstrings and API docs for Doc class	2017-05-18 22:17:41 +02:00
ines	476b8209fe	Update docs with new Jupyter auto-detection	2017-05-18 14:58:17 +02:00
ines	02a4841e7b	Move CLI docs to API reference	2017-05-17 12:04:03 +02:00
ines	d7244ae72d	Add docs on collapse_punct option	2017-05-15 13:51:33 +02:00
ines	c33bdeb564	Use uppercase for entity types	2017-05-15 01:24:57 +02:00
ines	cf7e5ed534	Use American spelling for "visualizers" Kinda sucks because we normally use British spelling, but it just looks weird and confusing otherwise... same with tokenizer and all other library internals. So this is sort of the "official policy" for now.	2017-05-14 23:29:36 +02:00
ines	fe5a5086e1	Fix typo	2017-05-14 23:27:56 +02:00
ines	1ae07da18f	Add API docs for spacy.displacy (see #1058)	2017-05-14 19:31:23 +02:00
ines	b462076d80	Merge load_lang_class and get_lang_class	2017-05-14 01:31:10 +02:00
ines	1465c6c221	Add API docs for util functions	2017-05-13 21:23:12 +02:00
ines	19879cb693	Update alpha support docs	2017-05-12 15:57:49 +02:00
ines	63d79947c8	Update title in navigation	2017-05-12 15:40:43 +02:00
ines	531ee1373b	Rename "Language models" to "Languages" in API	2017-05-12 15:38:56 +02:00
ines	fac3566aac	Add descriptions to POS tagging scheme	2017-05-03 20:11:02 +02:00
ines	1570b83ee5	Add spacy.explain() note to NER annotation scheme	2017-05-03 20:11:02 +02:00
ines	219369bb7d	Add detailed docs for dependency label annotations	2017-05-03 20:11:02 +02:00
ines	f9384b0fbd	Update alpha languages and add aside for tokenizer dependencies	2017-05-03 09:58:31 +02:00
Yasuaki Uechi	0e7a9b9fac	Add Japanese to 'Alpha support' section	2017-05-03 13:56:45 +09:00
ines	034ec5710b	Fix typo and add Norwegian to alpha languages	2017-04-27 11:24:21 +02:00
ines	375edf0bb5	Add list of models and include French	2017-04-26 20:50:27 +02:00
ines	ddd5194088	Update Language docs and docstrings	2017-04-17 01:52:13 +02:00
ines	aad80a291f	Add save_to_directory method to API docs	2017-04-17 01:40:34 +02:00
ines	13df2d6a60	Add documentation for spaCy's JSON format	2017-03-26 15:56:15 +02:00
ines	a5fc5fb0db	Add Hebrew to list of alpha languages	2017-03-25 10:22:46 +01:00
ines	9600cd1b9e	Fix download commands	2017-03-25 10:22:05 +01:00

275 Commits