Matthew Honnibal
|
6218af0105
|
Remove cpdef enum, to avoid too much code generation
|
2017-10-20 13:59:57 +02:00 |
|
Matthew Honnibal
|
92ac9316b5
|
Fix initialization of vectors, to address serialization problem
|
2017-10-20 13:59:24 +02:00 |
|
Ramanan Balakrishnan
|
0726946563
|
cleanup to_array implementation using fixes on master
|
2017-10-20 17:09:37 +05:30 |
|
ines
|
108f1f786e
|
Update symbols and document missing token attributes (see #1439)
|
2017-10-20 13:08:44 +02:00 |
|
ines
|
4acab77a8a
|
Add missing symbol for LAW entities (resolves #1427)
|
2017-10-20 13:07:57 +02:00 |
|
Ramanan Balakrishnan
|
b3ab124fc5
|
Support strings for attribute list in doc.to_array
|
2017-10-20 11:46:57 +05:30 |
|
Ramanan Balakrishnan
|
7b9b1be44c
|
Support single value for attribute list in doc.to_array
|
2017-10-19 17:00:41 +05:30 |
|
Matthew Honnibal
|
61bc203f3f
|
Merge pull request #1438 from explosion/feature/fast-parser
💫 Improve runtime CPU efficiency of parser/NER
|
2017-10-19 02:42:21 +02:00 |
|
Matthew Honnibal
|
15e5a04a8d
|
Clean up more depth=0 conditional code
|
2017-10-19 01:48:43 +02:00 |
|
Matthew Honnibal
|
906c50ac59
|
Fix loop typing that caused error on Windows
|
2017-10-19 01:48:39 +02:00 |
|
ines
|
24512420b1
|
Show error if data_path does not exist or is None (see #1102)
|
2017-10-19 00:53:49 +02:00 |
|
ines
|
bf415fd778
|
Add test for serializing extension attrs (see #1085)
|
2017-10-19 00:53:08 +02:00 |
|
Matthew Honnibal
|
960788aaa2
|
Eliminate dead code in parser, and raise errors for obsolete options
|
2017-10-19 00:42:34 +02:00 |
|
Matthew Honnibal
|
bbfd7d8d5d
|
Clean up parser multi-threading
|
2017-10-19 00:25:21 +02:00 |
|
Matthew Honnibal
|
f018f2030c
|
Try optimized parser forward loop
|
2017-10-18 21:48:00 +02:00 |
|
Matthew Honnibal
|
65bf5e85bd
|
Improve piping in language.pipe
|
2017-10-18 21:46:12 +02:00 |
|
Matthew Honnibal
|
633a75c7e0
|
Break parser batches into sub-batches, sorted by length.
|
2017-10-18 21:45:01 +02:00 |
|
Ines Montani
|
f0d577e460
|
Merge pull request #1425 from explosion/feature/hindi-tokenizer
💫 Basic Hindi tokenization support
|
2017-10-18 13:34:52 +02:00 |
|
Matthew Honnibal
|
394633efce
|
Make doc pickling support hooks
|
2017-10-17 19:44:09 +02:00 |
|
Matthew Honnibal
|
fe844148f6
|
Test pickling hooks
|
2017-10-17 19:43:52 +02:00 |
|
Matthew Honnibal
|
cdb0c426d8
|
Improve deserialization of user_data, esp. for Underscore
|
2017-10-17 19:29:20 +02:00 |
|
Matthew Honnibal
|
374819edf8
|
Test user_data deserialization, re #1085
|
2017-10-17 19:28:54 +02:00 |
|
Matthew Honnibal
|
e35a83d142
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-10-17 18:22:06 +02:00 |
|
Matthew Honnibal
|
f45973848c
|
Rename 'tokens' variable 'doc' in tokenizer
|
2017-10-17 18:21:41 +02:00 |
|
Matthew Honnibal
|
839de87ca9
|
Make lambda func a named function, for pickling
|
2017-10-17 18:21:20 +02:00 |
|
Matthew Honnibal
|
9baa8fe7ec
|
Convert closure to functools.partial, to promote pickling
|
2017-10-17 18:20:52 +02:00 |
|
Matthew Honnibal
|
32a8564c79
|
Fix doc pickling
|
2017-10-17 18:20:24 +02:00 |
|
Matthew Honnibal
|
8ca97f32a3
|
Fix doc pickling test
|
2017-10-17 18:19:57 +02:00 |
|
Matthew Honnibal
|
9ce7d6af87
|
Make lex attr functions top-level functions, to promote pickling
|
2017-10-17 18:19:18 +02:00 |
|
Matthew Honnibal
|
1cc85a89ef
|
Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().
|
2017-10-17 18:18:49 +02:00 |
|
Matthew Honnibal
|
0d57b9748a
|
Serialize lex_attr_getters with dill, for better pickle support
|
2017-10-17 18:17:45 +02:00 |
|
Matthew Honnibal
|
45d1dd90b1
|
Add tests for pickling doc
|
2017-10-17 17:20:58 +02:00 |
|
Ines Montani
|
afa67de7ee
|
Merge pull request #1428 from roanuz/develop
Fix trailing whitespace and Language.from_disk overwrites
|
2017-10-17 16:29:15 +02:00 |
|
Matthew Honnibal
|
92c1eb2d6f
|
Fix Doc pickling. This also removes need for Binder class
|
2017-10-17 16:11:13 +02:00 |
|
Matthew Honnibal
|
ed8da9b11f
|
Add missing return statement in SentenceSegmenter
|
2017-10-17 15:32:56 +02:00 |
|
Ines Montani
|
aab299c8ae
|
Merge pull request #1429 from vishnunekkanti/develop
fix syntax error in zh
|
2017-10-17 14:45:02 +02:00 |
|
Anto Binish Kaspar
|
534240648e
|
Fix trailing whitespace on morphology features
|
2017-10-17 17:15:58 +05:30 |
|
Anto Binish Kaspar
|
8f5b60c168
|
Fix Language.from_disk overwriting the meta.json file.
|
2017-10-17 17:15:32 +05:30 |
|
ines
|
8ca344712d
|
Add Language.has_pipe method
|
2017-10-17 11:20:07 +02:00 |
|
ines
|
485c4f6df5
|
Add Hungarian examples (see #1107)
|
2017-10-17 02:37:45 +02:00 |
|
Matthew Honnibal
|
19531bad4c
|
Merge branch 'develop' into feature/streaming-data-memory-growth
|
2017-10-16 21:44:11 +02:00 |
|
Matthew Honnibal
|
df488274b1
|
Fix deserialization of vectors
|
2017-10-16 20:55:00 +02:00 |
|
Matthew Honnibal
|
4018486d31
|
Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth
|
2017-10-16 20:49:48 +02:00 |
|
Matthew Honnibal
|
4174477161
|
Fix equality check in test
|
2017-10-16 19:50:35 +02:00 |
|
Matthew Honnibal
|
2bc06e4b22
|
Bump rolling buffer size to 10k
|
2017-10-16 19:38:29 +02:00 |
|
Matthew Honnibal
|
66e2eb8f39
|
Clean up remnant of frozen in StringStore
|
2017-10-16 19:34:41 +02:00 |
|
Matthew Honnibal
|
a002264fec
|
Remove caching of Token in Doc, as it caused a cycle.
|
2017-10-16 19:34:21 +02:00 |
|
Matthew Honnibal
|
3e037054c8
|
Remove obsolete is_frozen functionality from StringStore
|
2017-10-16 19:23:10 +02:00 |
|
Matthew Honnibal
|
5c14f3f033
|
Create a rolling buffer for the StringStore in Language.pipe()
|
2017-10-16 19:22:40 +02:00 |
|
Matthew Honnibal
|
59c216196c
|
Allow weakrefs on Doc objects
|
2017-10-16 19:22:11 +02:00 |
|
ines
|
d5418553eb
|
Fix whitespace
|
2017-10-16 18:30:04 +02:00 |
|
ines
|
6ceadcdb5c
|
Make sure from_disk passes string to numpy (see #1421)
If path is a WindowsPath, numpy does not recognise it as a path and as
a result, doesn't open the file.
https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369
|
2017-10-16 18:29:56 +02:00 |
|
Matthew Honnibal
|
010a7309ff
|
Merge pull request #1402 from explosion/feature/fix-matcher-operators
💫 Fix Matcher variable-length operators
|
2017-10-16 17:53:19 +02:00 |
|
Matthew Honnibal
|
c29927d2e7
|
Fix matcher test
|
2017-10-16 17:22:18 +02:00 |
|
Vishnu Kumar Nekkanti
|
d3c54cf39a
|
fixed SyntaxError while checking for jieba
|
2017-10-16 18:51:33 +05:30 |
|
Matthew Honnibal
|
a928ae2f35
|
Merge branch 'develop' into feature/fix-matcher-operators
|
2017-10-16 13:38:36 +02:00 |
|
Matthew Honnibal
|
56aa42cc5d
|
Fix and document matcher operator 'shadowing' behaviour
|
2017-10-16 13:38:20 +02:00 |
|
Matthew Honnibal
|
748d525801
|
Add more matcher operator tests
|
2017-10-16 13:38:01 +02:00 |
|
Matthew Honnibal
|
0433181658
|
Document operator semantics in Matcher docstring
|
2017-10-16 12:06:33 +02:00 |
|
ines
|
266e7180a7
|
Add Language class, stop words and basic stemmer that sets NORM
|
2017-10-14 14:59:52 +02:00 |
|
ines
|
e85e1d571b
|
Update base punctuation
|
2017-10-14 14:59:23 +02:00 |
|
ines
|
9d6c8eaa49
|
Update base norm exceptions with more unicode characters
e.g. unicode variations of punctuation used in Chinese
|
2017-10-14 14:58:52 +02:00 |
|
ines
|
3516aa0cea
|
Port over changes from #1389
|
2017-10-14 13:32:55 +02:00 |
|
ines
|
cd6a29dce7
|
Port over changes from #1294
|
2017-10-14 13:28:46 +02:00 |
|
ines
|
38c756fd85
|
Port over changes from #1287
|
2017-10-14 13:16:21 +02:00 |
|
ines
|
612224c10d
|
Port over changes from #1157
|
2017-10-14 13:11:39 +02:00 |
|
ines
|
9b3f8f9ec3
|
Fix formatting and add comment on languages
|
2017-10-14 13:11:18 +02:00 |
|
ines
|
a4d974d97b
|
Port over URL pattern changes from #1411
|
2017-10-14 12:58:07 +02:00 |
|
ines
|
09aed58140
|
Port over changes from #1333 and add comments
|
2017-10-14 12:52:59 +02:00 |
|
Matthew Honnibal
|
cf6da9301a
|
Update lemmatizer test
|
2017-10-12 22:50:52 +02:00 |
|
Matthew Honnibal
|
9b90d235d1
|
Fix tag check in lemmatizer
|
2017-10-12 22:50:43 +02:00 |
|
Matthew Honnibal
|
dc01acd821
|
Escape encoding in validate function
|
2017-10-12 22:23:21 +02:00 |
|
Matthew Honnibal
|
27b927259a
|
Add locale_escape compat function
|
2017-10-12 22:22:04 +02:00 |
|
ines
|
9c6de3dcfa
|
Merge branch 'develop' into feature/cli-validate
|
2017-10-12 21:44:28 +02:00 |
|
Matthew Honnibal
|
462caf835a
|
Fix SBD test
|
2017-10-12 21:18:22 +02:00 |
|
ines
|
fff1028391
|
Add validate CLI command
|
2017-10-12 20:05:06 +02:00 |
|
Matthew Honnibal
|
908f44c3fe
|
Disable history features by default
|
2017-10-12 14:56:11 +02:00 |
|
Matthew Honnibal
|
a955843684
|
Increase default number of epochs
|
2017-10-12 13:13:01 +02:00 |
|
Matthew Honnibal
|
cecfcc7711
|
Set default hyper params back to 'slow' settings
|
2017-10-12 13:12:26 +02:00 |
|
Ines Montani
|
37aa523a8e
|
Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
|
2017-10-11 18:35:56 +02:00 |
|
ines
|
8ce6f96180
|
Don't make copies of language data components
|
2017-10-11 15:34:55 +02:00 |
|
ines
|
51519251c2
|
Fix underscore method test
|
2017-10-11 13:34:19 +02:00 |
|
ines
|
c6ae49e8bf
|
Fix formatting
|
2017-10-11 13:34:11 +02:00 |
|
ines
|
453c47ca24
|
Add German lemmatizer tests
|
2017-10-11 13:27:26 +02:00 |
|
ines
|
15fe0fd82d
|
Fix tests
|
2017-10-11 13:27:18 +02:00 |
|
ines
|
6dd14dc342
|
Add lookup lemmas to tokens without POS tags
|
2017-10-11 13:27:10 +02:00 |
|
ines
|
9620c1a640
|
Add lemma_lookup to Language defaults
|
2017-10-11 13:26:05 +02:00 |
|
ines
|
9fd471372a
|
Add lookup lemmatizer to lemmatizer as lookup() method
|
2017-10-11 13:25:51 +02:00 |
|
ines
|
e0ff145a8b
|
Merge branch 'develop' into feature/dot-underscore
|
2017-10-11 11:57:05 +02:00 |
|
ines
|
c1d6d43c83
|
Merge branch 'develop' into feature/lemmatizer
|
2017-10-11 11:56:35 +02:00 |
|
Matthew Honnibal
|
17c467e0ab
|
Avoid clobbering existing lemmas
|
2017-10-11 03:33:06 -05:00 |
|
Matthew Honnibal
|
807e109f2b
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-10-11 02:47:59 -05:00 |
|
Matthew Honnibal
|
6e552c9d83
|
Prune number of non-projective labels more aggressively
|
2017-10-11 02:46:44 -05:00 |
|
Matthew Honnibal
|
76fe24f44d
|
Improve embedding defaults
|
2017-10-11 09:44:17 +02:00 |
|
Matthew Honnibal
|
188f620046
|
Improve parser defaults
|
2017-10-11 09:43:48 +02:00 |
|
Matthew Honnibal
|
acba2e1051
|
Fix metadata in training
|
2017-10-11 08:55:52 +02:00 |
|
Matthew Honnibal
|
74c2c6a58c
|
Add default name and lang to meta
|
2017-10-11 08:49:12 +02:00 |
|
Matthew Honnibal
|
3814a161e6
|
Avoid clobbering preset lemmas
|
2017-10-11 08:41:03 +02:00 |
|
Matthew Honnibal
|
fd47f8e89f
|
Fix failing test
|
2017-10-11 08:38:34 +02:00 |
|
Matthew Honnibal
|
462b2e26b4
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-10-11 08:23:04 +02:00 |
|