spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 20:28:20 +03:00

Author	SHA1	Message	Date
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Matthew Honnibal	bdc77848f5	Add helper method to apply a transition in parser/NER	2019-03-10 13:00:00 +01:00
Matthew Honnibal	d74dbde828	Fix order of actions when labels added to parser When labels were added to the parser or NER, we weren't loading back the classes in the correct order. Re issue #3189	2019-02-24 16:36:29 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson msgpack msgpack-numpy cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Matthew Honnibal	5080760288	Add extra comment on 'add label' in parser	2018-08-15 15:37:24 +02:00
Matthew Honnibal	8661218fe8	Refactor parser (#2308) * Work on refactoring greedy parser * Compile updated parser * Fix refactored parser * Update test * Fix refactored parser * Fix refactored parser * Readd beam search after refactor * Fix beam search after refactor * Fix parser * Fix beam parsing * Support oracle segmentation in ud-train CLI command * Avoid relying on final gold check in beam search * Add a keyword argument sink to GoldParse * Bug fixes to beam search after refactor * Avoid importing fused token symbol in ud-run-test, until that's added * Avoid importing fused token symbol in ud-run-test, until that's added * Don't modify Token in global scope * Fix error in beam gradient calculation * Default to beam_update_prob 1 * Set a more aggressive threshold on the max violn update * Disable some tests to figure out why CI fails * Disable some tests to figure out why CI fails * Add some diagnostics to travis.yml to try to figure out why build fails * Tell Thinc to link against system blas on Travis * Point thinc to libblas on Travis * Try running sudo=true for travis * Unhack travis.sh * Restore beam_density argument for parser beam * Require thinc 6.11.1.dev16 * Revert hacks to tests * Revert hacks to travis.yml * Update thinc requirement * Fix parser model loading * Fix size limits in training data * Add missing name attribute for parser * Fix appveyor for Windows	2018-05-15 22:17:29 +02:00
Matthew Honnibal	2c4a6d66fa	Merge master into develop. Big merge, many conflicts -- need to review	2018-04-29 14:49:26 +02:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Matthew Honnibal	1f7229f40f	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" This reverts commit `c9ba3d3c2d`, reversing changes made to `92c26a35d4`.	2018-03-27 19:23:02 +02:00
Matthew Honnibal	2512ea9eeb	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
ines	b4d226a3f1	Tidy up syntax	2017-10-27 19:45:57 +02:00
Matthew Honnibal	f018f2030c	Try optimized parser forward loop	2017-10-18 21:48:00 +02:00
Matthew Honnibal	dd9cab0faf	Fix type-check for int/long	2017-09-06 19:03:05 +02:00
Matthew Honnibal	c307a0ffb8	Restore patches from nn-beam-parser to spacy/syntax	2017-08-18 22:38:59 +02:00
Matthew Honnibal	5f81d700ff	Restore patches from nn-beam-parser to spacy/syntax	2017-08-18 22:23:03 +02:00
Matthew Honnibal	426f84937f	Resolve conflicts when merging new beam parsing stuff	2017-08-18 13:38:32 -05:00
Matthew Honnibal	a6d8d7c82e	Add is_gold_parse method to transition system	2017-08-16 18:24:09 -05:00
Matthew Honnibal	52c180ecf5	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" This reverts commit `ea8de11ad5`, reversing changes made to `08e443e083`.	2017-08-14 13:00:23 +02:00
Matthew Honnibal	78498a072d	Return Transition for missing actions in lookup_action	2017-08-06 14:16:36 +02:00
Matthew Honnibal	3da1063b36	Add beam decoding to parser, to allow NER uncertainties	2017-07-20 15:02:55 +02:00
Matthew Honnibal	097ab9c6e4	Fix transition system to/from disk	2017-05-31 13:44:00 +02:00
Matthew Honnibal	ff26aa6c37	Work on to/from bytes/disk serialization methods	2017-05-29 11:45:45 +02:00
Matthew Honnibal	7996d21717	Fixes for new StringStore	2017-05-28 11:09:27 -05:00
Matthew Honnibal	84e66ca6d4	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
Matthew Honnibal	7ebd26b8aa	Use ordered dict to specify transitions	2017-05-27 15:52:20 -05:00
Matthew Honnibal	3d5a536eaa	Improve efficiency of parser batching	2017-05-26 11:31:23 -05:00
Matthew Honnibal	e2136232f9	Exclude states with no matching gold annotations from parsing	2017-05-22 10:30:12 -05:00
Matthew Honnibal	8b04b0af9f	Remove freqs from transition_system	2017-05-20 02:20:48 -05:00
Matthew Honnibal	a9edb3aa1d	Improve integration of NN parser, to support unified training API	2017-05-15 21:53:27 +02:00
Matthew Honnibal	45464d065e	Remove print statement	2017-04-15 16:11:43 +02:00
ines	0739ae7b76	Tidy up and fix formatting and imports	2017-04-15 13:05:15 +02:00
Matthew Honnibal	354458484c	WIP on add_label bug during NER training Currently when a new label is introduced to NER during training, it causes the labels to be read in in an unexpected order. This invalidates the model.	2017-04-14 23:52:17 +02:00
Matthew Honnibal	c90dc7ac29	Clean up state initialisation in transition system	2017-03-16 11:59:11 -05:00
Matthew Honnibal	931feb3360	Allow beam parsing for NER	2017-03-11 11:12:01 -06:00
Matthew Honnibal	708ea22208	Infer types in transition_system.pyx	2016-10-27 18:08:13 +02:00
Matthew Honnibal	508fd1f6dc	* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.	2016-05-02 14:25:10 +02:00
Matthew Honnibal	bcf8f7ba40	* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.	2016-02-01 08:34:55 +01:00
Matthew Honnibal	28e5ad62bc	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 03:00:15 +01:00
Matthew Honnibal	a47f00901b	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 02:58:14 +01:00
Matthew Honnibal	10877a7791	* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser	2016-01-30 14:31:36 +01:00
Matthew Honnibal	04d0686b26	* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.	2016-01-19 20:10:04 +01:00
Matthew Honnibal	151aa0b0e2	* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model	2016-01-19 19:09:33 +01:00
Matthew Honnibal	20fd36a0f7	* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.	2015-10-13 13:44:41 +11:00
Matthew Honnibal	cc9deae960	* Add is_valid method to transition_system	2015-08-08 23:36:18 +02:00
Matthew Honnibal	a8bbd7312c	* Hackishly patch long dependencies problem	2015-07-28 00:14:29 +02:00
Matthew Honnibal	bb583f7f09	* Hackishly patch long dependencies problem	2015-07-27 23:14:33 +02:00
Matthew Honnibal	12699a1152	* Set initial freqs, to avoid missing values in serializer	2015-07-23 01:16:27 +02:00
Matthew Honnibal	317cbbc015	* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.	2015-07-19 15:18:17 +02:00
Matthew Honnibal	e29daea85f	* Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool, but in C it means int. So, type-casting to bint* is unsafe.	2015-07-17 22:37:24 +02:00
Matthew Honnibal	9a8db9743c	* Remove gil from parser.call	2015-07-14 23:47:33 +02:00

73 Commits