Matthew Honnibal
490ad3eaf0
Check that empty strings are handled. Closes #1242
2017-10-21 00:52:14 +02:00
Matthew Honnibal
8f8bccecb9
Patch deserialisation for invalid loads, to avoid model failure
2017-10-21 00:51:42 +02:00
Matthew Honnibal
d8391b1c4d
Fix #1434: Matcher failed on ending ? if no token
2017-10-20 16:49:36 +02:00
Matthew Honnibal
fec53f09f7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-20 16:28:34 +02:00
Matthew Honnibal
f111b228e0
Fix re-parsing of previously parsed text
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased.
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
Closes #1253.
2017-10-20 16:27:36 +02:00
ines
108f1f786e
Update symbols and document missing token attributes (see #1439)
2017-10-20 13:08:44 +02:00
ines
4acab77a8a
Add missing symbol for LAW entities (resolves #1427)
2017-10-20 13:07:57 +02:00
Matthew Honnibal
61bc203f3f
Merge pull request #1438 from explosion/feature/fast-parser
💫 Improve runtime CPU efficiency of parser/NER
2017-10-19 02:42:21 +02:00
Matthew Honnibal
15e5a04a8d
Clean up more depth=0 conditional code
2017-10-19 01:48:43 +02:00
Matthew Honnibal
906c50ac59
Fix loop typing that caused an error on Windows
2017-10-19 01:48:39 +02:00
ines
24512420b1
Show error if data_path does not exist or is None (see #1102)
2017-10-19 00:53:49 +02:00
ines
bf415fd778
Add test for serializing extension attrs (see #1085)
2017-10-19 00:53:08 +02:00
Matthew Honnibal
d4cfff0476
Comment out currently hard-coded hyper-params
2017-10-19 00:47:24 +02:00
Matthew Honnibal
960788aaa2
Eliminate dead code in parser, and raise errors for obsolete options
2017-10-19 00:42:34 +02:00
Matthew Honnibal
bbfd7d8d5d
Clean up parser multi-threading
2017-10-19 00:25:21 +02:00
Matthew Honnibal
f018f2030c
Try optimized parser forward loop
2017-10-18 21:48:00 +02:00
Matthew Honnibal
79fcf8576a
Compile with march=native
2017-10-18 21:46:34 +02:00
Matthew Honnibal
65bf5e85bd
Improve piping in language.pipe
2017-10-18 21:46:12 +02:00
Matthew Honnibal
633a75c7e0
Break parser batches into sub-batches, sorted by length.
2017-10-18 21:45:01 +02:00
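The sub-batching idea named in 633a75c7e0 can be sketched roughly as below. This is an illustration only, not the code from that commit: the helper `minibatch_by_length`, its `size` parameter, and the use of `len()` as the length measure are all hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T", bound=Sequence)


def minibatch_by_length(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield sub-batches of at most `size` items, grouped by similar length.

    Hypothetical helper: sorting first means each sub-batch holds items of
    similar length, so less work is wasted on padding within a sub-batch.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    by_length = sorted(items, key=len)  # stable sort, shortest first
    for start in range(0, len(by_length), size):
        yield by_length[start:start + size]


# Token lists stand in for documents here (an assumption for the example).
docs = [["a", "b", "c"], ["a"], ["a", "b"], ["a", "b", "c", "d"], []]
sub_batches = list(minibatch_by_length(docs, 2))
```

Note that sorting changes the order of the items, so a caller that needs results in the original order would have to keep track of the original indices.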
Ines Montani
f0d577e460
Merge pull request #1425 from explosion/feature/hindi-tokenizer
💫 Basic Hindi tokenization support
2017-10-18 13:34:52 +02:00
Matthew Honnibal
394633efce
Make doc pickling support hooks
2017-10-17 19:44:09 +02:00
Matthew Honnibal
fe844148f6
Test pickling hooks
2017-10-17 19:43:52 +02:00
Matthew Honnibal
cdb0c426d8
Improve deserialization of user_data, esp. for Underscore
2017-10-17 19:29:20 +02:00
Matthew Honnibal
374819edf8
Test user_data deserialization, re #1085
2017-10-17 19:28:54 +02:00
Matthew Honnibal
e35a83d142
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-17 18:22:06 +02:00
Matthew Honnibal
f45973848c
Rename 'tokens' variable 'doc' in tokenizer
2017-10-17 18:21:41 +02:00
Matthew Honnibal
839de87ca9
Make lambda func a named function, for pickling
2017-10-17 18:21:20 +02:00
Matthew Honnibal
9baa8fe7ec
Convert closure to functools.partial, to promote pickling
2017-10-17 18:20:52 +02:00
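As a rough illustration of why replacing a closure with functools.partial helps pickling (9baa8fe7ec), consider the sketch below. The functions `has_suffix`, `make_checker_closure` and `make_checker_partial` are hypothetical and are not taken from the spaCy source; only the standard-library `pickle` and `functools` behaviour is relied on.

```python
import functools
import pickle


def has_suffix(suffix, text):
    """Top-level function, so pickle can look it up by module and name."""
    return text.endswith(suffix)


def make_checker_closure(suffix):
    # A nested function captures `suffix` in a closure; the standard
    # pickle module cannot serialize such a local function.
    def check(text):
        return text.endswith(suffix)
    return check


def make_checker_partial(suffix):
    # functools.partial stores a reference to the top-level function plus
    # the bound argument, and pickle can handle both.
    return functools.partial(has_suffix, suffix)


checker = make_checker_partial("ing")
restored = pickle.loads(pickle.dumps(checker))

try:
    pickle.dumps(make_checker_closure("ing"))
    closure_pickled = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type differs between Python versions.
    closure_pickled = False
```

The same reasoning applies to the neighbouring commits that turn a lambda into a named function and move lex attr functions to the top level: pickle stores functions by reference, so they must be importable by name.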
Matthew Honnibal
32a8564c79
Fix doc pickling
2017-10-17 18:20:24 +02:00
Matthew Honnibal
8ca97f32a3
Fix doc pickling test
2017-10-17 18:19:57 +02:00
Matthew Honnibal
9ce7d6af87
Make lex attr functions top-level functions, to promote pickling
2017-10-17 18:19:18 +02:00
Matthew Honnibal
1cc85a89ef
Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().
2017-10-17 18:18:49 +02:00
Matthew Honnibal
0d57b9748a
Serialize lex_attr_getters with dill, for better pickle support
2017-10-17 18:17:45 +02:00
Matthew Honnibal
45d1dd90b1
Add tests for pickling doc
2017-10-17 17:20:58 +02:00
Ines Montani
afa67de7ee
Merge pull request #1428 from roanuz/develop
Fix trailing whitespace and Language.from_disk overwrites
2017-10-17 16:29:15 +02:00
ines
a74cba2ffa
Remove Binder from docs (now covered by Doc API)
2017-10-17 16:27:19 +02:00
Matthew Honnibal
92c1eb2d6f
Fix Doc pickling. This also removes need for Binder class
2017-10-17 16:11:13 +02:00
Matthew Honnibal
ed8da9b11f
Add missing return statement in SentenceSegmenter
2017-10-17 15:32:56 +02:00
Ines Montani
aab299c8ae
Merge pull request #1429 from vishnunekkanti/develop
fix syntax error in zh
2017-10-17 14:45:02 +02:00
Anto Binish Kaspar
534240648e
Fix trailing whitespace on morphology features
2017-10-17 17:15:58 +05:30
Anto Binish Kaspar
8f5b60c168
Fix Language.from_disk overwriting the meta.json file.
2017-10-17 17:15:32 +05:30
ines
8ca344712d
Add Language.has_pipe method
2017-10-17 11:20:07 +02:00
ines
485c4f6df5
Add Hungarian examples (see #1107)
2017-10-17 02:37:45 +02:00
Matthew Honnibal
fc797a58de
Merge pull request #1424 from explosion/feature/streaming-data-memory-growth
💫 Fix streaming data memory growth (!!)
2017-10-16 23:08:18 +02:00
Matthew Honnibal
19531bad4c
Merge branch 'develop' into feature/streaming-data-memory-growth
2017-10-16 21:44:11 +02:00
Matthew Honnibal
df488274b1
Fix deserialization of vectors
2017-10-16 20:55:00 +02:00
Matthew Honnibal
4018486d31
Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth
2017-10-16 20:49:48 +02:00
ines
4cfe259266
Fix formatting
2017-10-16 20:36:41 +02:00
ines
18793efef1
Remove Russian from v2.0 docs for now
2017-10-16 20:36:36 +02:00
ines
d383612225
Add note about word vectors in example (see #1117)
2017-10-16 20:31:58 +02:00