spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-25 15:39:46 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	2402ef498b	Remove unused import	2018-12-03 02:19:23 +01:00
Matthew Honnibal	1c71fdb805	Remove cytoolz usage from spaCy	2018-12-03 02:19:12 +01:00
Ines Montani	5b2741f751	Remove unused cytoolz / itertools imports	2018-12-03 02:12:07 +01:00
Ines Montani	ee4733b48c	Update srsly version pin	2018-12-03 02:10:37 +01:00
Matthew Honnibal	a7b085ae46	Set version back to 2.1.0a4	2018-12-03 02:03:26 +01:00
Matthew Honnibal	8e9a4d2f5e	Increment version to 2.1.0a5	2018-12-03 01:59:50 +01:00
Gavriel Loria	ae5601beae	Initialize trues to 0.0 in training example (#3004) * added contributor agreement * if there are no true positives, precision should be 0.0	2018-12-03 01:33:22 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson, msgpack, msgpack-numpy, cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Justin DuJardin	33fca8672f	fix issue compiling the latest spacy on MacOS 10.3.6 (#2998)	2018-12-02 05:51:11 +01:00
Ines Montani	40b57ea4ac	Format example	2018-12-02 04:28:34 +01:00
Ines Montani	45798cc53e	Auto-format examples	2018-12-02 04:26:26 +01:00
Ines Montani	6f2d3c863a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-12-02 04:22:25 +01:00
Ines Montani	db7d250924	Update README.md	2018-12-02 04:22:23 +01:00
Matthew Honnibal	b47bd6a27f	Update thinc version	2018-12-02 03:57:19 +01:00
Matthew Honnibal	512ba48217	Revert "Allow binary deps when building pex" This reverts commit `2d0c366101`.	2018-12-01 17:37:27 +01:00
Matthew Honnibal	2d0c366101	Allow binary deps when building pex	2018-12-01 15:51:57 +01:00
Matthew Honnibal	fa617997de	Fix Thinc pin	2018-12-01 15:27:44 +01:00
Matthew Honnibal	78afc696b2	Fix push-tag script	2018-12-01 14:48:02 +01:00
Matthew Honnibal	40a273245c	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-12-01 14:43:29 +01:00
Matthew Honnibal	d9d339186b	Fix dropout and batch-size defaults	2018-12-01 13:42:35 +00:00
Matthew Honnibal	9536ee787c	Add comma deletion to data noising	2018-12-01 13:42:18 +00:00
Matthew Honnibal	21ee1c7a17	Improve parser multi-task objective	2018-12-01 13:41:24 +00:00
Matthew Honnibal	fe7d6f36b1	Fix parser default	2018-12-01 13:41:04 +00:00
Matthew Honnibal	a31d557f2d	Set version to v2.1.0a4	2018-12-01 14:40:03 +01:00
Ines Montani	5c966d0874	Simplify function	2018-12-01 04:59:12 +01:00
Ines Montani	ce7eec846b	Move CLI-specific Markdown helper to CLI	2018-12-01 04:55:48 +01:00
Ines Montani	40ae499f32	Remove unused helper function Now imported from wasabi	2018-12-01 04:54:46 +01:00
Ines Montani	e4f8bed3d2	Change order of requirements [ci skip]	2018-12-01 04:28:51 +01:00
Matthew Honnibal	bbaca991ba	Set version to v2.0.18	2018-12-01 03:35:09 +01:00
Matthew Honnibal	05b2336ffa	Try again to fix OSX build	2018-12-01 03:12:21 +01:00
Matthew Honnibal	e1a4b0d7f7	Set version to v2.0.18.dev1	2018-12-01 03:12:12 +01:00
Matthew Honnibal	413530b269	Set version to 2.0.18	2018-12-01 03:00:27 +01:00
Matthew Honnibal	24d52876e1	Set version to v2.0.18.dev0	2018-12-01 02:38:04 +01:00
Matthew Honnibal	4895b2e830	Merge branch 'master' of https://github.com/explosion/spaCy	2018-12-01 02:37:21 +01:00
Matthew Honnibal	3f16af123e	Try to fix OSX build error	2018-12-01 02:36:56 +01:00
Matthew Honnibal	61abb1ef70	Remove msgpack dependency, to try to fix #2995	2018-12-01 02:36:41 +01:00
Matthew Honnibal	3139b020b5	Fix train script	2018-11-30 22:17:08 +00:00
Matthew Honnibal	4aa1002546	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-11-30 20:58:51 +00:00
Matthew Honnibal	6bd1cc57ee	Increase length limit for pretrain	2018-11-30 20:58:18 +00:00
Gavriel Loria	919729d38c	replace user-facing references to "sbd" with "sentencizer" (#2985) ## Description Fixes #2693 Previously, the tokens `sbd` and `sentencizer` would create the same nlp pipe. Internally, both would be called `sbd`. This setup became problematic because it was hard for a user relying on the `sentencizer` pipe name to realize that their pipe's name would be `sbd` for all functions other than creating a pipe. This PR intends to change the API and API documentation to fully support `sentencizer` and drop any user-facing references to `sbd`. ### Types of change end-user API bug ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-30 21:22:40 +01:00
Ines Montani	add6469225	Add "new in v2.0.12" note to Span.ents (closes #2986)	2018-11-30 20:50:55 +01:00
Ines Montani	37c7c85a86	💫 New JSON helpers, training data internals & CLI rewrite (#2932) * Support nowrap setting in util.prints * Tidy up and fix whitespace * Simplify script and use read_jsonl helper * Add JSON schemas (see #2928) * Deprecate Doc.print_tree Will be replaced with Doc.to_json, which will produce a unified format * Add Doc.to_json() method (see #2928) Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space. * Remove outdated test * Add write_json and write_jsonl helpers * WIP: Update spacy train * Tidy up spacy train * WIP: Use wasabi for formatting * Add GoldParse helpers for JSON format * WIP: add debug-data command * Fix typo * Add missing import * Update wasabi pin * Add missing import * 💫 Refactor CLI (#2943) To be merged into #2932. ## Description - [x] refactor CLI To use [`wasabi`](https://github.com/ines/wasabi) - [x] use [`black`](https://github.com/ambv/black) for auto-formatting - [x] add `flake8` config - [x] move all messy UD-related scripts to `cli.ud` - [x] make converters function that take the opened file and return the converted data (instead of having them handle the IO) ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Update wasabi pin * Delete old test * Update errors * Fix typo * Tidy up and format remaining code * Fix formatting * Improve formatting of messages * Auto-format remaining code * Add tok2vec stuff to spacy.train * Fix typo * Update wasabi pin * Fix path checks for when train() is called as function * Reformat and tidy up pretrain script * Update argument annotations * Raise error if model language doesn't match lang * Document new train command	2018-11-30 20:16:14 +01:00
Matthew Honnibal	0369db75c1	Fix support for parser multi-task objectives	2018-11-30 19:53:59 +01:00
Ines Montani	323fc26880	Tidy up and format remaining files	2018-11-30 17:43:08 +01:00
Matthew Honnibal	1b240f2119	Fix default token_vector_width	2018-11-30 16:40:11 +00:00
Ines Montani	2a95133138	Remove black from dev requirements for now	2018-11-30 17:16:49 +01:00
Ines Montani	eddeb36c96	💫 Tidy up and auto-format .py files (#2983) ## Description - [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files. - [x] Update flake8 config to exclude very large files (lemmatization tables etc.) - [x] Update code to be compatible with flake8 rules - [x] Fix various small bugs, inconsistencies and messy stuff in the language data - [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means) Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results. At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information. ### Types of change enhancement, code style ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-30 17:03:03 +01:00
Ines Montani	c9bdeafbc7	Don't run weird failing test for now	2018-11-30 16:13:40 +01:00
wxv	06820ef6e7	Fix is_ascii documentation and create contributor file (#2988) Proposed in #2933	2018-11-30 15:57:58 +01:00
Sofie	585de273cd	Fix small typo bug in French regexp + relevant unit test (#2980) * additional unit test for new entr word not in other lists * bugfix - unit test works * use _latin_lower instead of alpha_lower for french * revert back to ALPHA_LOWER (following the code for languages) * contributor agreement	2018-11-29 20:16:13 +01:00
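
Commit ae5601beae changes the training example so that precision comes out as 0.0 when there are no true positives. The following is a minimal stdlib sketch of that rule, not the code from the spaCy example. The function name `precision_recall_f1` is hypothetical, and the sketch assumes the counts are plain floats starting at 0.0. It treats an empty denominator as 0.0 instead of seeding the counts with a small epsilon. For contrast, an assumed value such as 1e-8 used as a seed would report a precision of 0.5 for a model that made no predictions at all.

```python
from typing import Tuple


def precision_recall_f1(tp: float, fp: float, fn: float) -> Tuple[float, float, float]:
    """Hypothetical helper: return (precision, recall, f_score).

    Counts are assumed to start from 0.0. An empty denominator yields 0.0
    instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0.0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# No true positives: every score is 0.0.
print(precision_recall_f1(0.0, 0.0, 5.0))  # (0.0, 0.0, 0.0)

# For contrast (assumed epsilon seeding): with no predictions at all,
# tp = fp = 1e-8 yields a misleading precision of 0.5.
eps = 1e-8
print(eps / (eps + eps))  # 0.5
```

Guarding the division keeps the reported numbers honest for degenerate batches, at the cost of one explicit branch per metric.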

9346 Commits
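
Commit 37c7c85a86 adds `write_json` and `write_jsonl` helpers and switches a script to a `read_jsonl` helper, but the log does not show their signatures. The stdlib-only functions below are a rough, hypothetical sketch of what newline-delimited JSON helpers of that kind usually do: write one JSON value per line and read the values back lazily. The names mirror the commit message. The signatures and behavior are assumptions, not spaCy's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator, Union

PathLike = Union[str, Path]


def write_jsonl(path: PathLike, records: Iterable[Any]) -> None:
    """Hypothetical helper: write one JSON value per line as UTF-8."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text human-readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: PathLike) -> Iterator[Any]:
    """Hypothetical helper: lazily yield one parsed value per non-blank line."""
    with Path(path).open("r", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train.jsonl"
        write_jsonl(path, [{"text": "naïve café"}, {"text": "second example"}])
        for record in read_jsonl(path):
            print(record["text"])
```

Reading lazily keeps memory flat for large training files. srsly, adopted in commit f37863093a, also provides `srsly.read_jsonl` and `srsly.write_jsonl`, which would normally be preferred over a hand-rolled version.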