spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 02:46:35 +03:00

Author	SHA1	Message	Date
Ines Montani	c6ee030721	Fix docsearch	2019-03-19 14:38:49 +01:00
Ines Montani	0155083e01	Update netlify.toml	2019-03-19 14:07:00 +01:00
Ines Montani	d4eed4a84f	Add note on unicode build to troubleshooting guide (see #3421 ) [ci skip]	2019-03-19 10:27:02 +01:00
Ines Montani	42d4b818e4	Redirect Netlify URL	2019-03-19 10:17:56 +01:00
Ines Montani	1ee97bc282	Add page title fallback, just in case	2019-03-18 18:58:55 +01:00
Ines Montani	728ae7651b	Fix universe page titles if no separate title is set	2019-03-18 18:58:46 +01:00
Ines Montani	a20d3772fd	Fix responsive landing	2019-03-18 16:24:52 +01:00
Ines Montani	08284f3a11	💫 v2.1.0 launch updates (only merge on launch!) (#3414 ) * Update README.md * Use production docsearch [ci skip] * Add option to exclude pages from search	2019-03-18 16:07:26 +01:00
Ines Montani	a611b32fbf	Update model docs [ci skip]	2019-03-17 11:48:18 +01:00
Matthew Honnibal	62afa64a8d	Expose batch size and length caps on CLI for pretrain (#3417 ) Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`. Also improve CLI output. Closes #3216	2019-03-16 21:38:45 +01:00
Ines Montani	2c5dd4d602	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
Ines Montani	fa0f501165	Use dev DocSearch index	2019-03-15 14:48:38 +01:00
Ines Montani	8af7d01382	Fix general-purpose IDs	2019-03-15 14:48:26 +01:00
Ines Montani	cbcba699dd	Fix missing ids	2019-03-14 17:56:53 +01:00
Ines Montani	cffe63ea24	Fix :target padding for ids	2019-03-14 17:41:02 +01:00
Ines Montani	51b7b88acf	Generate active sidebar heading (h0) at compile time	2019-03-14 17:20:51 +01:00
Ines Montani	4ab1871a75	Add search-exclude classes	2019-03-14 16:51:29 +01:00
Ines Montani	59bbf85986	Add id to body	2019-03-14 16:51:18 +01:00
Ines Montani	6e07750dd8	Fix class name	2019-03-14 11:52:31 +01:00
Ines Montani	a0813b93e0	Server-side render is-active for crawler	2019-03-14 11:46:27 +01:00
Ines Montani	39ace04b55	Fix active style	2019-03-14 11:46:13 +01:00
Ines Montani	4cfe4aa224	Fix small issues in the docs [ci skip]	2019-03-12 22:57:15 +01:00
Ines Montani	ba7eb2d131	Update section [ci skip]	2019-03-12 16:18:34 +01:00
Ines Montani	cecc31b765	Don't auto-slugify accordion links [ci skip]	2019-03-12 15:30:49 +01:00
Ines Montani	d842d5698e	Tidy up website and add eslint config [ci skip]	2019-03-12 15:21:58 +01:00
Ines Montani	72fb324d95	Add vector training script to bin [ci skip]	2019-03-12 12:07:56 +01:00
Ines Montani	3abf0e6b9f	Replace dev-resources links with real examples	2019-03-12 12:07:40 +01:00
Ines Montani	59c0620487	Auto-format	2019-03-12 12:07:11 +01:00
Ines Montani	1664d1fa62	Update universe [ci skip]	2019-03-12 11:13:03 +01:00
Ines Montani	cdd418b93e	Auto-format [ci skip]	2019-03-11 17:10:50 +01:00
Matthew Honnibal	b0b990e405	Fix token.conjuncts (closes #795 ) (#3392 ) * Implement conjuncts method * Add span.conjuncts property * Un-xfail token.conjuncts tests * Update docs for token.conjuncts and span.conjuncts * Fix merge error in token.conjuncts	2019-03-11 17:05:45 +01:00
Ines Montani	25cb764e64	Document new API [ci skip]	2019-03-11 15:23:53 +01:00
Ines Montani	ebcf2bb1c3	Add Doc.lang and Doc.lang_	2019-03-11 14:21:40 +01:00
Ines Montani	7c05ca01e8	💫 Support mutable default values for extension attributes (#3389 ) * Support mutable default values in extensions * Update documentation	2019-03-11 12:50:44 +01:00
Matthew Honnibal	98acf5ffe4	💫 Allow passing of config parameters to specific pipeline components (#3386 ) * Add component_cfg kwarg to begin_training * Document component_cfg arg to begin_training * Update docs and auto-format * Support component_cfg across Language * Format * Update docs and docstrings [ci skip] * Fix begin_training	2019-03-10 23:36:47 +01:00
Ines Montani	8dbf1e9037	Also fix #3387 on develop	2019-03-10 23:36:28 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385 ) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	9a8f169e5c	Update v2-1.md	2019-03-10 18:58:51 +01:00
Ines Montani	0426689db8	💫 Improve Doc.to_json and add Doc.is_nered (#3381 ) * Use default return instead of else * Add Doc.is_nered to indicate if entities have been set * Add properties in Doc.to_json if they were set, not if they're available This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.	2019-03-10 15:24:34 +01:00
Ines Montani	76764fcf59	💫 Improve converters and training data file formats (#3374 ) * Populate converter argument info automatically * Add conversion option for msgpack * Update docs * Allow reading training data from JSONL	2019-03-08 23:15:23 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370 ) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs	2019-03-08 11:42:26 +01:00
Ines Montani	fa7314b221	Clarify train_path and dev_path format (see #3366 ) [ci skip]	2019-03-07 12:23:27 +01:00
Ines Montani	e9babd9973	Update hyperparameters section (see #3352 )	2019-03-06 14:40:30 +01:00
Ines Montani	48a206a95f	Fix displaCy visualizations in docs (closes #3357 ) [ci skip]	2019-03-06 13:20:44 +01:00
Ines Montani	5eadf61327	Update pretraining docs on file format (closes #3354 )	2019-03-04 16:30:13 +00:00
Ines Montani	1d4ba7678f	Auto-format [ci skip]	2019-02-27 12:07:35 +01:00
Matthew Honnibal	f1d77eb140	💫 Improve handling of missing NER tags (closes #2603 ) (#3341 ) * Improve handling of missing NER tags GoldParse can accept missing NER tags, if entities is provided in BILUO format (rather than as spans). Missing tags can be provided as None values. Fix bug that occurred when first tag was a None value. Closes #2603. * Document specification of missing NER tags.	2019-02-27 12:06:32 +01:00
Ines Montani	c478a2ccb6	Update backwards incompat [ci skip]	2019-02-27 11:56:56 +01:00
Ines Montani	d7217513c9	Merge branch 'spacy.io' into develop [ci skip]	2019-02-27 11:42:10 +01:00
Matthew Honnibal	4a3371acd5	Make doc[0].is_sent_start == True (closes #2869 ) (#3340 ) * Make doc[0] have sent_start True. Closes #2869 * Document that doc[0].is_sent_start defaults True.	2019-02-27 11:17:17 +01:00
Ines Montani	cb481aa1fe	Merge branch 'spacy.io' into develop [ci skip]	2019-02-26 16:51:22 +01:00
Ines Montani	2579ecbb63	Merge branch 'spacy.io' into develop [ci skip]	2019-02-25 21:41:51 +01:00
Ines Montani	3379ebcaa4	Fix default prop [ci skip]	2019-02-25 20:29:11 +01:00
Ines Montani	e711969e3b	Add more human-readable class names [ci skip]	2019-02-25 20:22:40 +01:00
Ines Montani	162bd4d75b	💫 Add Algolia DocSearch (#3332 ) * Add Algolia DocSearch * Add human-readable selector for teaser	2019-02-25 20:11:11 +01:00
Ines Montani	1b6238101a	Add table explaining training metrics [closes #2644 ]	2019-02-25 10:03:43 +01:00
Ines Montani	1981b194cc	Fix recomputing of :target [ci skip] Prevents additional history entry	2019-02-25 10:03:20 +01:00
Ines Montani	d0b3af9222	Fix remaining inaccuracies in API docs (closes #2329 )	2019-02-24 22:21:25 +01:00
Ines Montani	49d0938038	Update version [ci skip]	2019-02-24 22:01:47 +01:00
Ines Montani	62b558ab72	💫 Support lexical attributes in retokenizer attrs (closes #2390 ) (#3325 ) * Fix formatting and whitespace * Add support for lexical attributes (closes #2390) * Document lexical attribute setting during retokenization * Assign variable outside of nested loop	2019-02-24 21:13:51 +01:00
Ines Montani	aa52305461	Improve pipeline model and meta example [ci skip]	2019-02-24 18:45:39 +01:00
Ines Montani	df19e2bff6	💫 Allow setting of custom attributes during retokenization (closes #3314 ) (#3324 ) ## Description This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter and a setter implemented. ```python Token.set_extension('is_musician', default=False) doc = nlp("I like David Bowie.") with doc.retokenize() as retokenizer: attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}} retokenizer.merge(doc[2:4], attrs=attrs) assert doc[2].text == "David Bowie" assert doc[2].lemma_ == "David Bowie" assert doc[2]._.is_musician ``` ### Types of change enhancement	2019-02-24 18:38:47 +01:00
Ines Montani	403b9cd58b	Add docs on adding to existing tokenizer rules [ci skip]	2019-02-24 18:35:19 +01:00
Ines Montani	1ea1bc98e7	Document regex utilities [ci skip]	2019-02-24 18:34:10 +01:00
Ines Montani	09bf08b3c3	Update redirects [ci skip]	2019-02-24 13:37:50 +01:00
Ines Montani	dceca3264d	Tidy up package.json [ci skip]	2019-02-24 13:37:41 +01:00
Ines Montani	46ec5cdccc	Update TextCategorizer docs	2019-02-24 13:11:57 +01:00
Ines Montani	c03cb1cc63	Improve built-in component API docs	2019-02-24 13:11:49 +01:00
Ines Montani	383e2e1f12	Update Python versions [ci skip]	2019-02-24 11:49:45 +01:00
Ines Montani	b624cb4b89	Update v2-1.md	2019-02-24 11:49:27 +01:00
Ines Montani	250e88ef55	Fix docs example (see #2728 )	2019-02-21 14:22:06 +01:00
Ines Montani	0fc908d7a5	Add note on merging speed in v2.1 (see #3300 ) [ci skip]	2019-02-21 12:34:18 +01:00
Ines Montani	236aa94ded	Update v2-1.md	2019-02-21 12:33:56 +01:00
Sofie	9a478b6db8	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293 ) * splitting up latin unicode interval * removing hyphen as infix for French * adding failing test for issue 1235 * test for issue #3002 which now works * partial fix for issue #2070 * keep the hyphen as infix for French (as it was) * restore french expressions with hyphen as infix (as it was) * added succeeding unit test for Issue #2656 * Fix issue #2822 with custom Italian exception * Fix issue #2926 by allowing numbers right before infix / * remove duplicate * remove xfail for Issue #2179 fixed by Matt * adjust documentation and remove reference to regex lib	2019-02-20 22:10:13 +01:00
Ines Montani	f73d01aa32	Update netlify.toml [ci skip]	2019-02-20 14:33:32 +01:00
Ines Montani	da5edbe434	Tidy up	2019-02-20 14:33:23 +01:00
Ines Montani	57ae71ea95	Add docs on serializing the pipeline (see #3289 ) [ci skip]	2019-02-18 14:13:29 +01:00
Ines Montani	38e4422c0d	Improve matcher example (resolves #3287 )	2019-02-18 13:26:37 +01:00
Ines Montani	660cfe44c5	Fix formatting	2019-02-18 13:26:22 +01:00
Ines Montani	c5476bd75b	Update languages.json	2019-02-18 10:03:35 +01:00
Ines Montani	212ff359ef	Fix links [ci skip]	2019-02-17 22:25:50 +01:00
Ines Montani	04b4df0ec9	Remove n_threads	2019-02-17 22:25:42 +01:00
Ines Montani	4c7ab7620a	Update README.md	2019-02-17 22:16:17 +01:00
Ines Montani	8a8523d8c1	Update README.md	2019-02-17 21:59:52 +01:00
Ines Montani	e597110d31	💫 Update website (#3285 ) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement	2019-02-17 19:31:19 +01:00
Ines Montani	0184a95340	Merge branch 'master' into develop	2019-02-12 18:29:24 +01:00
Ines Montani	5dd39d8697	Update universe.json	2019-02-12 18:05:51 +01:00
Abhijit Balaji	75a40f56fc	added spacy-langdetect to universe.json (#3266 )	2019-02-12 18:04:38 +01:00
Ines Montani	8ad15a2377	Fix typo [ci skip]	2019-02-08 17:29:53 +01:00
Ines Montani	7a985cba24	Fix typo (closes #3232 ) [ci skip]	2019-02-08 17:29:18 +01:00
Ines Montani	5d0b60999d	Merge branch 'master' into develop	2019-02-07 20:54:07 +01:00
PierreMonico	114d64c4b5	Fix typo (#3223 )	2019-02-04 11:37:29 +01:00
adrianeboyd	03d58f9feb	Update TIGER/German dependency relations in documentation (#3204 ) * Add missing dependency relations for TIGER/German * Contributor agreement for adrianeboyd	2019-01-30 14:23:12 +01:00
Bram Vanroy	11cee62644	Updated spacy_conll information (#3158 )	2019-01-16 13:46:16 +01:00
Álvaro Abella Bascarán	1cd8f9823f	Correct docs of `Token.subtree` and `Span.subtree` (issue #3122 ) (#3124 ) * solve inconsistency between docs and Span.subtree (issue #3122) * solve inconsistency between docs and Token.subtree (issue #3122)	2019-01-09 03:11:15 +01:00
Mathieu Morey	f07b577fbd	Support CUDA 10 (#3126 ) * ENH support CUDA 10 * Update _instructions.jade	2019-01-09 03:10:45 +01:00
alvations	f43338a4c5	Joblib site has moved. (#3118 )	2019-01-05 13:10:54 +01:00
Matthew Honnibal	63b7accd74	💫 Make span.as_doc() return a copy, not a view. Closes #1537 (#3107 ) Initially span.as_doc() was designed to return a view of the span's contents, as a Doc object. This was a nice idea, but it fails due to the token.idx property, which refers to the character offset within the string. In a span, the idx of the first token might not be 0. Because this data is different, we can't have a view --- it'll be inconsistent. This patch changes span.as_doc() to instead return a copy. The docs are updated accordingly. Closes #1537 * Update test for span.as_doc() * Make span.as_doc() return a copy. Closes #1537 * Document change to Span.as_doc()	2018-12-30 15:17:46 +01:00
Sofie	b7916fffcf	Fixing few typos in the documentation (#3103 ) * few typos / small grammatical errors corrected in documentation * one more typo * one last typo	2018-12-28 15:52:26 +01:00
Ines Montani	2dc6c52ccc	Update displayed Binder version (see #3077 ) [ci skip]	2018-12-20 17:36:19 +01:00
Ines Montani	ca244f5f84	Small fixes to displaCy (#3076 ) ## Description - [x] fix auto-detection of Jupyter notebooks (even if `jupyter=True` isn't set) - [x] add `displacy.set_render_wrapper` method to define a custom function called around the HTML markup generated in all calls to `displacy.render` (can be used to allow custom integrations, callbacks and page formatting) - [x] add option to customise host for web server - [x] show warning if `displacy.serve` is called from within Jupyter notebooks - [x] move error message to `spacy.errors.Errors`. ### Types of change enhancement	2018-12-20 17:32:04 +01:00
Ines Montani	61d09c481b	Merge branch 'master' into develop	2018-12-18 13:48:10 +01:00
Ines Montani	8c0f0f50bc	Use nlp.make_doc instead of nlp for patterns [ci skip]	2018-12-08 11:56:01 +01:00
Aki Ariga	7fcd6419ff	Update the document for Unidic link with latest version URL (#3022 ) * Update Unidic link for latest version in document This patch improves #3017 . The link for Unidic pointed to an old version, so it now points to the latest version. * Add contributor agreement * Use more specific link for unidic-cwj	2018-12-07 17:24:48 +01:00
Ines Montani	27905a7b14	Remove reference to cuda10 in docs (closes #2894 ) [ci skip]	2018-12-06 16:05:37 +01:00
Gavriel Loria	9c8c4287bf	Accept iob2 and allow generic whitespace (#2999 ) * accept non-pipe whitespace as delimiter; allow iob2 filename * added small documentation note for IOB2 allowance * added contributor agreement	2018-12-06 15:50:25 +01:00
Paul O'Leary McCann	b36f6eabfb	Add note that Unidic is required for Japanese (#3017 ) This addresses #3001. -POLM	2018-12-06 15:14:10 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003 ) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson, msgpack, msgpack-numpy, cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Gavriel Loria	919729d38c	replace user-facing references to "sbd" with "sentencizer" (#2985 ) ## Description Fixes #2693 Previously, the tokens `sbd` and `sentencizer` would create the same nlp pipe. Internally, both would be called `sbd`. This setup became problematic because it was hard for a user relying on the `sentencizer` pipe name to realize that their pipe's name would be `sbd` for all functions other than creating a pipe. This PR intends to change the API and API documentation to fully support `sentencizer` and drop any user-facing references to `sbd`. ### Types of change end-user API bug	2018-11-30 21:22:40 +01:00
Ines Montani	add6469225	Add "new in v2.0.12" note to Span.ents (closes #2986 )	2018-11-30 20:50:55 +01:00
Ines Montani	37c7c85a86	💫 New JSON helpers, training data internals & CLI rewrite (#2932 ) * Support nowrap setting in util.prints * Tidy up and fix whitespace * Simplify script and use read_jsonl helper * Add JSON schemas (see #2928) * Deprecate Doc.print_tree Will be replaced with Doc.to_json, which will produce a unified format * Add Doc.to_json() method (see #2928) Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space. * Remove outdated test * Add write_json and write_jsonl helpers * WIP: Update spacy train * Tidy up spacy train * WIP: Use wasabi for formatting * Add GoldParse helpers for JSON format * WIP: add debug-data command * Fix typo * Add missing import * Update wasabi pin * Add missing import * 💫 Refactor CLI (#2943) To be merged into #2932. ## Description - [x] refactor CLI to use [`wasabi`](https://github.com/ines/wasabi) - [x] use [`black`](https://github.com/ambv/black) for auto-formatting - [x] add `flake8` config - [x] move all messy UD-related scripts to `cli.ud` - [x] make converters functions that take the opened file and return the converted data (instead of having them handle the IO) ### Types of change enhancement 
* Update wasabi pin * Delete old test * Update errors * Fix typo * Tidy up and format remaining code * Fix formatting * Improve formatting of messages * Auto-format remaining code * Add tok2vec stuff to spacy.train * Fix typo * Update wasabi pin * Fix path checks for when train() is called as function * Reformat and tidy up pretrain script * Update argument annotations * Raise error if model language doesn't match lang * Document new train command	2018-11-30 20:16:14 +01:00
wxv	06820ef6e7	Fix is_ascii documentation and create contributor file (#2988 ) Proposed in #2933	2018-11-30 15:57:58 +01:00
Ben Batorsky	658f7e0dc8	OntoNotes url fix (#2981 ) The website for OntoNotes 5 is: https://catalog.ldc.upenn.edu/LDC2013T19, currently the named entity section has it as https://catalog.ldc.upenn.edu/ldc2013T19.	2018-11-29 19:34:30 +01:00
Ines Montani	d33953037e	💫 Port master changes over to develop (#2979 ) * Create aryaprabhudesai.md (#2681) * Update _install.jade (#2688) Typo fix: "models" -> "model" * Add FAC to spacy.explain (resolves #2706) * Remove docstrings for deprecated arguments (see #2703) * When calling getoption() in conftest.py, pass a default option (#2709) * When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy * Add contributor agreement * update bengali token rules for hyphen and digits (#2731) * Less norm computations in token similarity (#2730) * Less norm computations in token similarity * Contributor agreement * Remove ')' for clarity (#2737) Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know. * added contributor agreement for mbkupfer (#2738) * Basic support for Telugu language (#2751) * Lex _attrs for polish language (#2750) * Signed spaCy contributor agreement * Added polish version of english lex_attrs * Introduces a bulk merge function, in order to solve issue #653 (#2696) * Fix comment * Introduce bulk merge to increase performance on many span merges * Sign contributor agreement * Implement pull request suggestions * Describe converters more explicitly (see #2643) * Add multi-threading note to Language.pipe (resolves #2582) [ci skip] * Fix formatting * Fix dependency scheme docs (closes #2705) [ci skip] * Don't set stop word in example (closes #2657) [ci skip] * Add words to portuguese language _num_words (#2759) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Update Indonesian model (#2752) * adding e-KTP in tokenizer exceptions list * add exception token * removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception * add tokenizer exceptions 
list * combining base_norms with norm_exceptions * adding norm_exception * fix double key in lemmatizer * remove unused import on punctuation.py * reformat stop_words to reduce number of lines, improve readibility * updating tokenizer exception * implement is_currency for lang/id * adding orth_first_upper in tokenizer_exceptions * update the norm_exception list * remove bunch of abbreviations * adding contributors file * Fixed spaCy+Keras example (#2763) * bug fixes in keras example * created contributor agreement * Adding French hyphenated first name (#2786) * Fix typo (closes #2784) * Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer" * Adding basic support for Sinhala language. (#2788) * adding Sinhala language package, stop words, examples and lex_attrs. * Adding contributor agreement * Updating contributor agreement * Also include lowercase norm exceptions * Fix error (#2802) * Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function * added spaCy Contributor Agreement * Add charlax's contributor agreement (#2805) * agreement of contributor, may I introduce a tiny pl languge contribution (#2799) * Contributors agreement * Contributors agreement * Contributors agreement * Add jupyter=True to displacy.render in documentation (#2806) * Revert "Also include lowercase norm exceptions" This reverts commit `70f4e8adf3`. * Remove deprecated encoding argument to msgpack * Set up dependency tree pattern matching skeleton (#2732) * Fix bug when too many entity types. 
Fixes #2800 * Fix Python 2 test failure * Require older msgpack-numpy * Restore encoding arg on msgpack-numpy * Try to fix version pin for msgpack-numpy * Update Portuguese Language (#2790) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols * Extended punctuation and norm_exceptions in the Portuguese language * Correct error in spacy universe docs concerning spacy-lookup (#2814) * Update Keras Example for (Parikh et al, 2016) implementation (#2803) * bug fixes in keras example * created contributor agreement * baseline for Parikh model * initial version of parikh 2016 implemented * tested asymmetric models * fixed grevious error in normalization * use standard SNLI test file * begin to rework parikh example * initial version of running example * start to document the new version * start to document the new version * Update Decompositional Attention.ipynb * fixed calls to similarity * updated the README * import sys package duh * simplified indexing on mapping word to IDs * stupid python indent error * added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround * Fix typo (closes #2815) [ci skip] * Update regex version dependency * Set version to 2.0.13.dev3 * Skip seemingly problematic test * Remove problematic test * Try previous version of regex * Revert "Remove problematic test" This reverts commit `bdebbef455`. * Unskip test * Try older version of regex * 💫 Update training examples and use minibatching (#2830) <!--- Provide a general summary of your changes in the title. --> ## Description Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). 
The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results. ### Types of change enhancements * Visual C++ link updated (#2842) (closes #2841) [ci skip] * New landing page * Add contribution agreement * Correcting lang/ru/examples.py (#2845) * Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement * Correct some grammatical inaccuracies in lang\ru\examples.py * Move contributor agreement to separate file * Set version to 2.0.13.dev4 * Add Persian(Farsi) language support (#2797) * Also include lowercase norm exceptions * Remove in favour of https://github.com/explosion/spaCy/graphs/contributors * Rule-based French Lemmatizer (#2818) ## Description Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class. ### Types of change 
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version. - Add several files containing exhaustive list of words for each part of speech - Add some lemma rules - Add POS that are not checked in the standard Lemmatizer, i.e. PRON, DET, ADV and AUX - Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentioned - Modify the lemmatize function to check in lookup table as a last resort - Init files are updated so the model can support all the functionalities mentioned above - Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py 
* Set version to 2.0.13 * Fix formatting and consistency * Update docs for new version [ci skip] * Increment version [ci skip] * Add info on wheels [ci skip] * Adding "This is a sentence" example to Sinhala (#2846) * Add wheels badge * Update badge [ci skip] * Update README.rst [ci skip] * Update murmurhash pin * Increment version to 2.0.14.dev0 * Update GPU docs for v2.0.14 * Add wheel to setup_requires * Import prefer_gpu and require_gpu functions from Thinc * Add tests for prefer_gpu() and require_gpu() * Update requirements and setup.py * Workaround bug in thinc require_gpu * Set version to v2.0.14 * Update push-tag script * Unhack prefer_gpu * Require thinc 6.10.6 * Update prefer_gpu and require_gpu docs [ci skip] * Fix specifiers for GPU * Set version to 2.0.14.dev1 * Set version to 2.0.14 * Update Thinc version pin * Increment version * Fix msgpack-numpy version pin * Increment version * Update version to 2.0.16 * Update version [ci skip] * Redundant ')' in the Stop words' example (#2856) <!--- Provide a general summary of your changes in the title. --> ## Description <!--- Use this section to describe your changes. If your changes required testing, include information about the testing environment and the tests you ran. If your test fixes a bug reported in an issue, don't forget to include the issue number. If your PR is still a work in progress, that's totally fine – just include a note to let us know. --> ### Types of change <!-- What type of change does your PR cover? Is it a bug fix, an enhancement or new feature, or a change to the documentation? --> ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [ ] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Documentation improvement regarding joblib and SO (#2867) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * raise error when setting overlapping entities as doc.ents (#2880) * Fix out-of-bounds access in NER training The helper method state.B(1) gets the index of the first token of the buffer, or -1 if no such token exists. Normally this is safe because we pass this to functions like state.safe_get(), which returns an empty token. Here we used it directly as an array index, which is not okay! This error may have been the cause of out-of-bounds access errors during training. Similar errors may still be around, so much be hunted down. Hunting this one down took a long time...I printed out values across training runs and diffed, looking for points of divergence between runs, when no randomness should be allowed. * Change PyThaiNLP Url (#2876) * Fix missing comma * Add example showing a fix-up rule for space entities * Set version to 2.0.17.dev0 * Update regex version * Revert "Update regex version" This reverts commit `62358dd867`. 
* Try setting older regex version, to align with conda * Set version to 2.0.17 * Add spacy-js to universe [ci-skip] * Add spacy-raspberry to universe (closes #2889) * Add script to validate universe json [ci skip] * Removed space in docs + added contributor indo (#2909) * - removed unneeded space in documentation * - added contributor info * Allow input text of length up to max_length, inclusive (#2922) * Include universe spec for spacy-wordnet component (#2919) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement * Minor formatting changes [ci skip] * Fix image [ci skip] Twitter URL doesn't work on live site * Check if the word is in one of the regular lists specific to each POS (#2886) * 💫 Create random IDs for SVGs to prevent ID clashes (#2927) Resolves #2924. ## Description Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.) ### Types of change bug fix ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Fix typo [ci skip] * fixes symbolic link on py3 and windows (#2949) * fixes symbolic link on py3 and windows during setup of spacy using command python -m spacy link en_core_web_sm en closes #2948 * Update spacy/compat.py Co-Authored-By: cicorias <cicorias@users.noreply.github.com> * Fix formatting * Update universe [ci skip] * Catalan Language Support (#2940) * Catalan language Support * Ddding Catalan to documentation * Sort languages alphabetically [ci skip] * Update tests for pytest 4.x (#2965) <!--- Provide a general summary of your changes in the title. --> ## Description - [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize)) - [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here) ### Types of change <!-- What type of change does your PR cover? Is it a bug fix, an enhancement or new feature, or a change to the documentation? --> ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Fix regex pin to harmonize with conda (#2964) * Update README.rst * Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977) Fixes #2976 * Fix typo * Fix typo * Remove duplicate file * Require thinc 7.0.0.dev2 Fixes bug in gpu_ops that would use cupy instead of numpy on CPU * Add missing import * Fix error IDs * Fix tests	2018-11-29 16:30:29 +01:00
Ines Montani	c80c20e1ec	Sort languages alphabetically [ci skip]	2018-11-26 15:37:53 +01:00
Marc Puig	98fe1ab259	Catalan Language Support (#2940 ) * Catalan language Support * Ddding Catalan to documentation	2018-11-26 15:25:47 +01:00
Ines Montani	1844bc238a	Update universe [ci skip]	2018-11-26 14:16:22 +01:00
Ines Montani	696acb0f92	Fix typo [ci skip]	2018-11-24 15:20:57 +01:00
Ines Montani	dfcc8f02af	Fix image [ci skip] Twitter URL doesn't work on live site	2018-11-14 01:01:33 +01:00
Ines Montani	1aa91e926f	Minor formatting changes [ci skip]	2018-11-13 23:59:59 +01:00
Francisco Aranda	be99f1cac5	Include universe spec for spacy-wordnet component (#2919 ) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement	2018-11-13 23:54:46 +01:00
mikelibg	75e7d503b7	Removed space in docs + added contributor indo (#2909 ) * - removed unneeded space in documentation * - added contributor info	2018-11-08 14:18:25 +01:00
Ines Montani	11db4d2f27	Add script to validate universe json [ci skip]	2018-11-06 12:50:41 +01:00
Ines Montani	a9fda638a9	Add spacy-raspberry to universe (closes #2889 )	2018-11-06 12:45:50 +01:00
Ines Montani	c235ddf44f	Add spacy-js to universe [ci-skip]	2018-11-06 12:45:03 +01:00
Bram Vanroy	071789467e	Documentation improvement regarding joblib and SO (#2867 ) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-24 15:19:17 +02:00
Roman	5766d09a5b	Redundant ')' in the Stop words' example (#2856 ) ## Description ### Types of change ## Checklist - [ ] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-18 10:21:16 +02:00
Ines Montani	c6a320cad4	Update version [ci skip]	2018-10-15 16:42:35 +02:00
Ines Montani	f02bb08f39	Update prefer_gpu and require_gpu docs [ci skip]	2018-10-14 23:30:44 +02:00
Ines Montani	5a4c5b78a8	Update GPU docs for v2.0.14	2018-10-14 16:38:12 +02:00
Ines Montani	ac4cadd31d	Add info on wheels [ci skip]	2018-10-14 00:04:37 +02:00
Ines Montani	30aa7f8b20	Increment version [ci skip]	2018-10-13 23:55:50 +02:00
Ines Montani	23d5b4ff5b	Update docs for new version [ci skip]	2018-10-13 23:53:33 +02:00
Ines Montani	f0e7da6478	Fix formatting and consistency	2018-10-13 23:53:26 +02:00
Jacopo Farina	42c42376a3	Visual C++ link updated (#2842 ) (closes #2841 ) [ci skip] * New landing page * Add contribution agreement	2018-10-12 14:59:45 +02:00
Ines Montani	7806deceb4	Fix typo (closes #2815 ) [ci skip]	2018-10-01 10:49:29 +02:00
Ioannis Daras	405a826436	Correct error in spacy universe docs concerning spacy-lookup (#2814 )	2018-10-01 10:24:50 +02:00
Charles-Axel Dein	014dd47c70	Add jupyter=True to displacy.render in documentation (#2806 )	2018-09-27 12:28:04 +02:00
Pranshu Jethmalani	9fd27d777e	Fix typo (#2795 ) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer"	2018-09-25 12:12:40 +02:00
Ines Montani	3c4e3ade30	Fix typo (closes #2784 )	2018-09-21 10:45:11 +02:00
Ines Montani	5001d31be6	Don't set stop word in example (closes #2657 ) [ci skip]	2018-09-12 15:36:51 +02:00
Ines Montani	4e89cfaae1	Fix dependency scheme docs (closes #2705 ) [ci skip]	2018-09-12 15:32:26 +02:00
Ines Montani	0729d1edca	Fix formatting	2018-09-12 15:32:08 +02:00
Ines Montani	907df53904	Add multi-threading note to Language.pipe (resolves #2582 ) [ci skip]	2018-09-12 15:03:30 +02:00
Ines Montani	885691a7ab	Describe converters more explicitly (see #2643 )	2018-09-12 14:53:03 +02:00
Steve Sharp	ca747f58a4	Update _install.jade (#2688 ) Typo fix: "models" -> "model"	2018-08-22 13:16:04 +02:00
Ines Montani	aeb49eb625	Update version [ci skip]	2018-08-16 16:56:02 +02:00
Ines Montani	a0eacd3293	Merge branch 'master' into develop	2018-08-16 16:55:05 +02:00
Ines Montani	c0fa9903f4	Update model directory JS [ci skip] Prevent the default release URL from being overwritten and add license type	2018-08-16 16:54:50 +02:00
Ines Montani	03f661fefb	Add Greek to models directory [ci skip]	2018-08-16 16:51:56 +02:00
Ines Montani	fd9d175a53	Update live code [ci skip]	2018-08-15 15:28:48 +02:00
Matthew Honnibal	4336397ecb	Update develop from master	2018-08-14 03:04:28 +02:00
Wojciech Łukasiewicz	3953e967a0	User correct variable name in the examples (#2664 ) * correct naming * add contributor agreement	2018-08-13 22:21:24 +02:00
Ines Montani	71723cece1	Add note on visualizing long texts ans sentences (see #2636 ) [ci skip]	2018-08-08 15:28:21 +02:00
Ines Montani	6147bd3eb4	Fix link target (closes #2645 ) [ci skip]	2018-08-08 15:03:52 +02:00
Ines Montani	8c47da1f19	Update Language serialization docs (see #2628 ) [ci skip] Add note on using from_disk and from_bytes via subclasses and add example	2018-08-07 14:17:57 +02:00
Matthew Honnibal	664cfc29bc	Merge branch 'master' of https://github.com/explosion/spaCy	2018-08-07 10:49:39 +02:00
Matthew Honnibal	2278c9734e	Fix spelling error #2640	2018-08-07 10:49:21 +02:00
Xiaoquan Kong	f0c9652ed1	New Feature: display more detail when Error E067 (#2639 ) * Fix off-by-one error * Add verbose option * Update verbose option * Update documents for verbose option	2018-08-07 10:45:29 +02:00
Ines Montani	6a4360e425	Update universe [ci skip]	2018-08-02 17:33:08 +02:00
Sami	dbc993f5b3	Updating description and code snippet spacy-lefff (#2623 ) * updating description and code snippet spacy-lefff * contributors agreement	2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav	d3e21aad64	Update _benchmarks.jade (#2618 )	2018-08-02 00:28:28 +02:00
Brian Phillips	8227de0099	Update language.jade (#2616 )	2018-07-31 12:34:42 +02:00
Ioannis Daras	055cc0de44	Bug fix to pseudocode for tokenizer customization (#2604 )	2018-07-27 11:04:12 +02:00
Andriy Mulyar	e9ef51137d	Fixed typo (#2596 ) Changed 'The index of the first character after the span.' to The index of the last character after the span' in description of doc.char_span	2018-07-25 22:17:15 +02:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568 ) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
kororo	b1ec827ee0	Fix typo (#2579 ) Update slogan, desc and code snippet to latest version	2018-07-24 22:47:33 +02:00
ines	cd687091fb	Remove nl examples from widget for now [ci skip] Restore for next spaCy version when path to example sentences is fixed	2018-07-24 22:41:20 +02:00
ines	2d8ffb8bcd	Fix formatting	2018-07-24 22:40:49 +02:00
ines	1b3da8d2ae	Update website for v2.0.12 [ci skip]	2018-07-24 21:04:22 +02:00
ines	ae5ed2d698	Update docs for v2.0.12 [ci skip]	2018-07-21 15:51:44 +02:00
ines	d517dd4297	Document remove_extension methods	2018-07-21 15:51:28 +02:00
ines	153f41a5cc	Use better examples for Doc extension methods	2018-07-21 15:51:11 +02:00
ines	3c30d1763c	Merge branch 'master' into develop	2018-07-21 15:34:18 +02:00
kororo	2784babef9	Add ExcelCy into Universe list (#2572 ) Hi guys, This is my first spaCy extension. I am excited to able to do this. Please do let me know if there is any suggestions or modifications I need to do. Feel free to use/contribute the repo that I made. ## Description ExcelCy is a SpaCy toolkit to help improve the data training experiences. It provides easy annotation using Excel file format. It has helper to pre-train entity annotation with phrase and regex matcher pipe. ### Types of change Update to Universe list in website. ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-19 19:28:33 +02:00
ines	80e7485630	Merge branch 'master' into develop	2018-07-18 17:28:47 +02:00
Xiang Ji	19a5ef1c58	Fix venv command examples (#2560 ) [ci skip] * Fix venv command examples The documentation refers to `venv`, which is native to Python3. However, the command examples are as if they were still `virtualenv`, which is a package independent of `venv`: - It doesn't need to be installed via `pip`. In fact `pip install venv` would return an error. - The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would return command not found. See https://docs.python.org/3/library/venv.html I suspect the documentation simply replaced all occurrences of `virtualenv` with `venv`. However they are different modules and are used differently. * Update comment [ci skip]	2018-07-18 10:31:24 +02:00
ines	50c367ee96	Update meta [ci skip]	2018-07-10 13:51:45 +02:00
ines	3a321e79ac	Merge branch 'master' into develop	2018-07-10 13:49:08 +02:00
ines	71bfc92913	Exclude models for non-stable versions [ci skip]	2018-07-10 13:44:55 +02:00
ines	b5200962c0	Adjust formatting [ci skip]	2018-07-09 18:35:46 +02:00
Alex Villarreal	bd35bf7f09	Guidance to handle binary files in git in Windows (#2526 ) Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.	2018-07-09 18:31:37 +02:00
ines	f575b01595	Update language and license meta [ci skip]	2018-07-04 15:09:36 +02:00
ines	63666af328	Merge branch 'master' into develop	2018-07-04 14:52:25 +02:00
Matthew Honnibal	a85620a731	Note CoreNLP tokenizer correction on website	2018-07-02 11:35:31 +02:00
ines	06c6dc6fbc	Update Juniper [ci skip]	2018-06-28 11:48:17 +02:00
Nipun Sadvilkar	741ba80bd5	Train model command n_iteration 20 -> 30 (#2454 ) In source code `train.py` default Number of iterations is 30	2018-06-18 11:57:08 +02:00
ines	53a2bc8c8d	Only scroll sidebar item into view if needed [ci skip]	2018-06-12 10:58:50 +02:00
ines	65713a6593	Increment versions [ci skip]	2018-06-12 10:49:50 +02:00
Ines Montani	968f6f0bda	💫 Document Cython API (#2433 ) ## Description This PR adds the most relevant documentation of spaCy's Cython API. (Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.) ### Types of change docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-06-11 17:47:46 +02:00
GolanLevy	72d7e80f94	adding a missing apostrophe (#2436 )	2018-06-11 17:47:24 +02:00
ines	778e5f4da3	Merge branch 'master' into develop	2018-06-11 00:38:04 +02:00
himkt	57311d5d47	replace janome with mecab in the documentation and the test (#2415 ) * Add links to Reddit data (see #2401) * replace janome with mecab in the documentation and the test * add the assignment	2018-06-11 00:33:13 +02:00
ines	effb55d591	Adjust formatting [ci skip]	2018-06-11 00:29:13 +02:00
Nathan Breit	ba6d2cf393	Add EpiTator to Universe (#2429 )	2018-06-11 00:24:13 +02:00
himkt	1a568f2e08	fix wrong documentations (#2423 )	2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi	d66292f767	fix UD data file extensions (#2425 ) * fix UD data files extension * add contributor agreement for msklvsk	2018-06-08 14:26:11 +02:00
ines	a0017e4909	Merge branch 'master' into develop	2018-05-30 14:10:47 +02:00
ines	0baaf836cf	Update formatting [ci skip]	2018-05-30 13:32:49 +02:00
ines	3913e18201	Add self-attentive-parser to universe (see #59 )	2018-05-30 13:31:28 +02:00


1401 Commits