Commit Graph

10940 Commits

Author SHA1 Message Date
Matthew Honnibal
633a75c7e0 Break parser batches into sub-batches, sorted by length. 2017-10-18 21:45:01 +02:00
demfier
772c8035f7 Sign SCA 2017-10-18 23:12:24 +05:30
demfier
0b9e1d3660 Merge branch 'master' of https://github.com/explosion/spaCy into readme_update 2017-10-18 22:33:42 +05:30
demfier
f39fc34c95 Add minor update in README 2017-10-18 22:32:58 +05:30
Ines Montani
e7b78370d9 Add note on origin of manually moved agreement
See 8a2d22222d
2017-10-18 14:41:38 +02:00
Ines Montani
3357588b9f Create honnibal.md 2017-10-18 14:41:31 +02:00
Ines Montani
0b239ee646 Create ines.md 2017-10-18 14:37:08 +02:00
Ines Montani
9162ecb43f Update CONTRIBUTOR_AGREEMENT.md 2017-10-18 14:36:19 +02:00
Matthew Honnibal
e787045cf5 Revert "filled up CONTRIBUTOR_AGREEMENT.md"
This reverts commit 8a2d22222d.
2017-10-18 14:31:57 +02:00
Ines Montani
5a4b5b362c Create shuvanon.md 2017-10-18 14:29:10 +02:00
Ines Montani
8bd9b05fdc Update CONTRIBUTING.md 2017-10-18 14:13:36 +02:00
Ines Montani
f0d577e460 Merge pull request #1425 from explosion/feature/hindi-tokenizer
💫 Basic Hindi tokenization support
2017-10-18 13:34:52 +02:00
Ramanan Balakrishnan
b47b4e2654
Support single value for attribute list in doc.to_scalar conversion 2017-10-18 14:43:47 +05:30
Matthew Honnibal
394633efce Make doc pickling support hooks 2017-10-17 19:44:09 +02:00
Matthew Honnibal
fe844148f6 Test pickling hooks 2017-10-17 19:43:52 +02:00
Matthew Honnibal
cdb0c426d8 Improve deserialization of user_data, esp. for Underscore 2017-10-17 19:29:20 +02:00
Matthew Honnibal
374819edf8 Test user_data deserialization, re #1085 2017-10-17 19:28:54 +02:00
Matthew Honnibal
e35a83d142 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-17 18:22:06 +02:00
Matthew Honnibal
f45973848c Rename 'tokens' variable 'doc' in tokenizer 2017-10-17 18:21:41 +02:00
Matthew Honnibal
839de87ca9 Make lambda func a named function, for pickling 2017-10-17 18:21:20 +02:00
Matthew Honnibal
9baa8fe7ec Convert closure to functools.partial, to promote pickling 2017-10-17 18:20:52 +02:00
Matthew Honnibal
32a8564c79 Fix doc pickling 2017-10-17 18:20:24 +02:00
Matthew Honnibal
8ca97f32a3 Fix doc pickling test 2017-10-17 18:19:57 +02:00
Matthew Honnibal
9ce7d6af87 Make lex attr functions top-level functions, to promote pickling 2017-10-17 18:19:18 +02:00
Matthew Honnibal
1cc85a89ef Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes(). 2017-10-17 18:18:49 +02:00
Matthew Honnibal
0d57b9748a Serialize lex_attr_getters with dill, for better pickle support 2017-10-17 18:17:45 +02:00
Matthew Honnibal
45d1dd90b1 Add tests for pickling doc 2017-10-17 17:20:58 +02:00
Ines Montani
afa67de7ee Merge pull request #1428 from roanuz/develop
Fix trailing whitespace and Language.from_disk overwrites
2017-10-17 16:29:15 +02:00
ines
a74cba2ffa Remove Binder from docs (now covered by Doc API) 2017-10-17 16:27:19 +02:00
Matthew Honnibal
92c1eb2d6f Fix Doc pickling. This also removes need for Binder class 2017-10-17 16:11:13 +02:00
Matthew Honnibal
ed8da9b11f Add missing return statement in SentenceSegmenter 2017-10-17 15:32:56 +02:00
Ines Montani
aab299c8ae Merge pull request #1429 from vishnunekkanti/develop
fix syntax error in zh
2017-10-17 14:45:02 +02:00
Anto Binish Kaspar
534240648e Fix trailing whitespace on morphology features 2017-10-17 17:15:58 +05:30
Anto Binish Kaspar
8f5b60c168 Fix Language.from_disk overwrites the meta.json file. 2017-10-17 17:15:32 +05:30
ines
8ca344712d Add Language.has_pipe method 2017-10-17 11:20:07 +02:00
ines
485c4f6df5 Add Hungarian examples (see #1107) 2017-10-17 02:37:45 +02:00
Matthew Honnibal
fc797a58de Merge pull request #1424 from explosion/feature/streaming-data-memory-growth
💫 Fix streaming data memory growth (!!)
2017-10-16 23:08:18 +02:00
Matthew Honnibal
19531bad4c Merge branch 'develop' into feature/streaming-data-memory-growth 2017-10-16 21:44:11 +02:00
Matthew Honnibal
df488274b1 Fix deserialization of vectors 2017-10-16 20:55:00 +02:00
Matthew Honnibal
4018486d31 Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth 2017-10-16 20:49:48 +02:00
ines
4cfe259266 Fix formatting 2017-10-16 20:36:41 +02:00
ines
18793efef1 Remove Russian from v2.0 docs for now 2017-10-16 20:36:36 +02:00
ines
d383612225 Add note about word vectors in example (see #1117) 2017-10-16 20:31:58 +02:00
Matthew Honnibal
4174477161 Fix equality check in test 2017-10-16 19:50:35 +02:00
Matthew Honnibal
2bc06e4b22 Bump rolling buffer size to 10k 2017-10-16 19:38:29 +02:00
Matthew Honnibal
66e2eb8f39 Clean up remnant of frozen in StringStore 2017-10-16 19:34:41 +02:00
Matthew Honnibal
a002264fec Remove caching of Token in Doc, as caused cycle. 2017-10-16 19:34:21 +02:00
Matthew Honnibal
3e037054c8 Remove obsolete is_frozen functionality from StringStore 2017-10-16 19:23:10 +02:00
Matthew Honnibal
5c14f3f033 Create a rolling buffer for the StringStore in Language.pipe() 2017-10-16 19:22:40 +02:00
Matthew Honnibal
59c216196c Allow weakrefs on Doc objects 2017-10-16 19:22:11 +02:00