Matthew Honnibal
|
8eb0b7b779
|
Add docstrings for Pipe API
|
2017-09-25 16:22:07 +02:00 |
|
Matthew Honnibal
|
39f390dba7
|
Add docstrings for Pipe API
|
2017-09-25 16:20:49 +02:00 |
|
Matthew Honnibal
|
8716ffe57d
|
Serialize vocab last
|
2017-09-24 05:01:45 -05:00 |
|
Matthew Honnibal
|
72bbcc0871
|
Handle lemmatization for unknown string IDs
|
2017-09-24 05:01:31 -05:00 |
|
Matthew Honnibal
|
204b58c864
|
Fix evaluation during training
|
2017-09-24 05:01:03 -05:00 |
|
Matthew Honnibal
|
dc3a623d00
|
Remove unused update_shared argument
|
2017-09-24 05:00:37 -05:00 |
|
Matthew Honnibal
|
63bd87508d
|
Don't use iterated convolutions
|
2017-09-23 04:39:17 -05:00 |
|
Matthew Honnibal
|
5a7fd0fd36
|
Fix vector linkage
|
2017-09-22 20:11:52 -05:00 |
|
Matthew Honnibal
|
4348c479fc
|
Merge pre-trained vectors and noshare patches
|
2017-09-22 20:07:28 -05:00 |
|
Matthew Honnibal
|
7dc61b3f43
|
Whitespace
|
2017-09-22 20:00:50 -05:00 |
|
Matthew Honnibal
|
e93d43a43a
|
Fix training with preset vectors
|
2017-09-22 20:00:40 -05:00 |
|
Matthew Honnibal
|
0795857dcb
|
Fix beam parsing
|
2017-09-23 02:59:53 +02:00 |
|
Matthew Honnibal
|
4bd6a12b1f
|
Fix Tok2Vec
|
2017-09-23 02:58:54 +02:00 |
|
Matthew Honnibal
|
386c1a5bd8
|
Fix tagger training
|
2017-09-23 02:58:06 +02:00 |
|
Matthew Honnibal
|
a2357cce3f
|
Set random seed in train script
|
2017-09-23 02:57:31 +02:00 |
|
Matthew Honnibal
|
05596159bf
|
Fix serialization with pre-trained vectors
|
2017-09-22 15:33:27 -05:00 |
|
Matthew Honnibal
|
980fb6e854
|
Refactor Tok2Vec
|
2017-09-22 09:38:36 -05:00 |
|
Matthew Honnibal
|
d9124f1aa3
|
Add link_vectors_to_models function
|
2017-09-22 09:38:22 -05:00 |
|
Matthew Honnibal
|
a186596307
|
Add 'reapply' combinator, for iterated CNN
|
2017-09-22 09:37:03 -05:00 |
|
Matthew Honnibal
|
40a4873b70
|
Fix serialization of model options
|
2017-09-21 13:07:26 -05:00 |
|
Matthew Honnibal
|
0a9016cade
|
Fix serialization during training
|
2017-09-21 13:06:45 -05:00 |
|
Matthew Honnibal
|
20193371f5
|
Don't share CNN, to reduce complexities
|
2017-09-21 14:59:48 +02:00 |
|
Matthew Honnibal
|
1d73dec8b1
|
Refactor train script
|
2017-09-20 19:17:10 -05:00 |
|
Matthew Honnibal
|
ffda38356a
|
Add util function to enable GPU
|
2017-09-20 19:16:35 -05:00 |
|
Matthew Honnibal
|
24e85c2048
|
Pass values for CNN maxout pieces option
|
2017-09-20 19:16:12 -05:00 |
|
Matthew Honnibal
|
b832f89ff8
|
Add resume_training function
|
2017-09-20 19:15:20 -05:00 |
|
Matthew Honnibal
|
f5144f04be
|
Add argument for CNN maxout pieces
|
2017-09-20 19:14:41 -05:00 |
|
Matthew Honnibal
|
842e21de9f
|
Fix int type error for Python 2
|
2017-09-20 23:55:30 +02:00 |
|
Matthew Honnibal
|
0c93c73e49
|
Add __reduce__ method for PhraseMatcher
|
2017-09-20 22:26:40 +02:00 |
|
Matthew Honnibal
|
cc408fc189
|
Make PhraseMatcher API like Matcher API
|
2017-09-20 22:20:35 +02:00 |
|
Matthew Honnibal
|
43ad250dd5
|
Update matcher tests
|
2017-09-20 21:54:49 +02:00 |
|
Matthew Honnibal
|
828cc91545
|
Fix PhraseMatcher for spaCy 2
|
2017-09-20 21:54:31 +02:00 |
|
Matthew Honnibal
|
78301b2d29
|
Avoid comparison to None in Tok2Vec
|
2017-09-20 00:19:34 +02:00 |
|
Matthew Honnibal
|
b36a38f63d
|
Fix serialization of pretrained_dims property
|
2017-09-19 23:42:27 +02:00 |
|
Matthew Honnibal
|
2489dcaccf
|
Fix serialization of parser
|
2017-09-19 23:42:12 +02:00 |
|
Matthew Honnibal
|
40837b275d
|
Fix tensorizer with pretrained vectors
|
2017-09-18 18:05:38 -05:00 |
|
Matthew Honnibal
|
a0c4b33d03
|
Support resuming a model during spacy train
|
2017-09-18 18:04:47 -05:00 |
|
Matthew Honnibal
|
c858927271
|
Copy vectors to GPU on begin training
|
2017-09-18 18:04:16 -05:00 |
|
Matthew Honnibal
|
3fa76c17d1
|
Refactor Tok2Vec
|
2017-09-18 15:00:05 -05:00 |
|
Matthew Honnibal
|
217e7891cd
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-18 11:36:21 -05:00 |
|
Matthew Honnibal
|
7b3f391f80
|
Try dropping the Affine layer, conditionally
|
2017-09-18 11:35:59 -05:00 |
|
ines
|
2480f8f521
|
Add missing return in Doc.from_disk() (closes #1330)
|
2017-09-18 15:32:00 +02:00 |
|
Matthew Honnibal
|
2148ae605b
|
Don't use iterated convolutions
|
2017-09-17 17:36:04 -05:00 |
|
Matthew Honnibal
|
c013e5996f
|
Fix parser test
|
2017-09-17 13:13:20 -05:00 |
|
Matthew Honnibal
|
8f42f8d305
|
Remove unused 'preprocess' argument in Tok2Vec
|
2017-09-17 12:30:16 -05:00 |
|
Matthew Honnibal
|
039d609362
|
Remove hard-coded default vectors width
|
2017-09-17 12:29:39 -05:00 |
|
Matthew Honnibal
|
4f38a67a89
|
Make width default to 0 in vectors.pyx
|
2017-09-17 12:29:14 -05:00 |
|
Matthew Honnibal
|
16122f566e
|
Fix cpdef enum in attrs.pyx
|
2017-09-17 12:28:53 -05:00 |
|
Matthew Honnibal
|
b159e0eb50
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-17 05:47:50 -05:00 |
|
Matthew Honnibal
|
2b0efc77ae
|
Fix wiring of pre-trained vectors in parser loading
|
2017-09-17 05:47:34 -05:00 |
|
Matthew Honnibal
|
31c2e91c35
|
Fix wiring of pre-trained vectors in parser loading
|
2017-09-17 05:46:55 -05:00 |
|
Matthew Honnibal
|
8f913a74ca
|
Fix defaults and args to build_tagger_model
|
2017-09-17 05:46:36 -05:00 |
|
Matthew Honnibal
|
c003c561c3
|
Revert NER action loading change, for model compatibility
|
2017-09-17 05:46:03 -05:00 |
|
Matthew Honnibal
|
43210abacc
|
Resolve fine-tuning conflict
|
2017-09-17 05:30:04 -05:00 |
|
ines
|
ece30c28a8
|
Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
|
2017-09-16 20:40:15 +02:00 |
|
ines
|
68f66aebf8
|
Use pkg_resources instead of pip for is_package (resolves #1293)
|
2017-09-16 20:27:59 +02:00 |
|
Matthew Honnibal
|
5ff2491f24
|
Pass option for pre-trained vectors in parser
|
2017-09-16 12:47:21 -05:00 |
|
Matthew Honnibal
|
8665a77f48
|
Fix feature error in NER
|
2017-09-16 12:46:57 -05:00 |
|
Matthew Honnibal
|
e37a50a436
|
Pass documents to tensorizer, not 'features'
|
2017-09-16 12:46:36 -05:00 |
|
Matthew Honnibal
|
84e637e2e6
|
Pass option for pretrained vectors in pipeline
|
2017-09-16 12:46:02 -05:00 |
|
Matthew Honnibal
|
2a93404da6
|
Support optional pre-trained vectors in tensorizer model
|
2017-09-16 12:45:37 -05:00 |
|
Matthew Honnibal
|
e0a2aa9289
|
Support having word vectors data on GPU
|
2017-09-16 12:45:09 -05:00 |
|
Matthew Honnibal
|
ebf8942564
|
Fix test for Python3
|
2017-09-16 16:22:38 +02:00 |
|
Matthew Honnibal
|
8c945310fb
|
Excuse emoji failure on narrow unicode builds
|
2017-09-16 16:21:13 +02:00 |
|
Matthew Honnibal
|
11f2a05ede
|
Fix code explosion from long enum in Python 3, Cython 0.24+
|
2017-09-16 12:20:04 +02:00 |
|
Matthew Honnibal
|
3fa5b40b5c
|
Add test for hash consistency
|
2017-09-16 11:21:35 +02:00 |
|
Matthew Honnibal
|
f730d07e4e
|
Fix prange error for Windows
|
2017-09-16 00:25:33 +02:00 |
|
Matthew Honnibal
|
4b2065430e
|
Merge branch 'feature/parser-history' into develop
|
2017-09-15 10:42:20 +02:00 |
|
Matthew Honnibal
|
2f08489694
|
Remove AddHistory layer -- didn't work as planned
|
2017-09-15 10:41:40 +02:00 |
|
Matthew Honnibal
|
8b481e0465
|
Remove redundant brackets
|
2017-09-15 10:38:08 +02:00 |
|
Matthew Honnibal
|
d84607f6bb
|
Vectorize update in AddHistory
|
2017-09-14 20:34:40 +02:00 |
|
Ines Montani
|
bd3da3d6fb
|
Port over change from #1323 and tidy up
|
2017-09-14 19:23:13 +02:00 |
|
Matthew Honnibal
|
18347ab69c
|
Implement AddHistory layer wrapper
|
2017-09-14 19:07:35 +02:00 |
|
Matthew Honnibal
|
d4ca6cef9e
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-14 17:00:07 +02:00 |
|
Matthew Honnibal
|
8c503487af
|
Fix lookup of missing NER actions
|
2017-09-14 16:59:45 +02:00 |
|
Matthew Honnibal
|
664c5af745
|
Revert padding in parser
|
2017-09-14 16:59:25 +02:00 |
|
Matthew Honnibal
|
8496d76224
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-14 09:21:20 -05:00 |
|
Matthew Honnibal
|
d1518027a9
|
Increment version
|
2017-09-14 16:18:46 +02:00 |
|
Matthew Honnibal
|
70da88a3a7
|
Update comment on Language.begin_training
|
2017-09-14 16:18:30 +02:00 |
|
Matthew Honnibal
|
c6395b057a
|
Improve parser feature extraction, for missing values
|
2017-09-14 16:18:02 +02:00 |
|
Matthew Honnibal
|
daf869ab3b
|
Fix add_action for NER, so labelled 'O' actions aren't added
|
2017-09-14 16:16:41 +02:00 |
|
Matthew Honnibal
|
9cb2aef587
|
Remove print statement
|
2017-09-14 13:38:28 +02:00 |
|
Matthew Honnibal
|
ba23d63c35
|
Fix minibatch function, for fixed batch size
|
2017-09-14 13:37:41 +02:00 |
|
Matthew Honnibal
|
456bb8a74c
|
Unxfail and close #1305
|
2017-09-06 19:14:17 +02:00 |
|
Matthew Honnibal
|
99e44fbdbb
|
Update regression test
|
2017-09-06 19:13:51 +02:00 |
|
Matthew Honnibal
|
5c3ff06924
|
Fix lemmatizer rules
|
2017-09-06 19:13:24 +02:00 |
|
Matthew Honnibal
|
dd9cab0faf
|
Fix type-check for int/long
|
2017-09-06 19:03:05 +02:00 |
|
Matthew Honnibal
|
497a9308a8
|
Xfail new lemmatizer test
|
2017-09-06 18:41:22 +02:00 |
|
Matthew Honnibal
|
dcbf866970
|
Merge parser changes
|
2017-09-06 18:41:05 +02:00 |
|
Matthew Honnibal
|
5384fff5ce
|
Add test for 1305: Incorrect lemmatization of VBZ for English
|
2017-09-06 18:40:18 +02:00 |
|
Matthew Honnibal
|
24ff6b0ad9
|
Fix parsing and tok2vec models
|
2017-09-06 05:50:58 -05:00 |
|
Matthew Honnibal
|
1b65115bc2
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-04 20:02:53 -05:00 |
|
Matthew Honnibal
|
33fa91feb7
|
Restore correctness of parser model
|
2017-09-04 21:19:30 +02:00 |
|
Matthew Honnibal
|
e88a42e460
|
Increment version
|
2017-09-04 21:14:39 +02:00 |
|
Matthew Honnibal
|
9d65d67985
|
Preserve model compatibility in parser, for now
|
2017-09-04 16:46:22 +02:00 |
|
Matthew Honnibal
|
d5fbf27335
|
Fix test
|
2017-09-04 16:45:11 +02:00 |
|
Matthew Honnibal
|
7fdafcc4c4
|
Fix config loading in tagger
|
2017-09-04 16:38:49 +02:00 |
|
Matthew Honnibal
|
058372d120
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-04 16:27:53 +02:00 |
|
Matthew Honnibal
|
16e25ce3b5
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-04 09:26:53 -05:00 |
|
Matthew Honnibal
|
9f512e657a
|
Fix drop_layer calculation
|
2017-09-04 09:26:38 -05:00 |
|
Matthew Honnibal
|
cb4839033c
|
Fix loader for EN tests
|
2017-09-04 15:19:18 +02:00 |
|
Matthew Honnibal
|
382ce566eb
|
Fix deserialization bug
|
2017-09-04 15:19:01 +02:00 |
|
Matthew Honnibal
|
bfddf50081
|
Fix #1296: Incorrect lemmatization of base form verbs
|
2017-09-04 15:18:41 +02:00 |
|
Matthew Honnibal
|
b29e6bff46
|
Improve lemmatization rule for am|VBP
|
2017-09-04 15:18:10 +02:00 |
|
Matthew Honnibal
|
644d6c9e1a
|
Improve lemmatization tests, re #1296
|
2017-09-04 15:17:44 +02:00 |
|
Matthew Honnibal
|
3cf3fa1704
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-02 12:46:11 -05:00 |
|
Matthew Honnibal
|
e920885676
|
Fix pickle during train
|
2017-09-02 12:46:01 -05:00 |
|
Matthew Honnibal
|
c0eaba8b28
|
Fix low-data textcat
|
2017-09-02 15:17:32 +02:00 |
|
Matthew Honnibal
|
9e378bdac5
|
Fix textcat serialization
|
2017-09-02 15:17:20 +02:00 |
|
Matthew Honnibal
|
e3ea6ee02b
|
Increment version
|
2017-09-02 15:17:01 +02:00 |
|
Matthew Honnibal
|
a3b69bcb3d
|
Add low_data mode in textcat
|
2017-09-02 14:56:30 +02:00 |
|
Matthew Honnibal
|
ead78c7b9b
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-09-02 12:55:25 +02:00 |
|
Matthew Honnibal
|
5e6a9e7dcc
|
Add rule-based SBD
|
2017-09-02 12:53:38 +02:00 |
|
Matthew Honnibal
|
a824cf8f9a
|
Adjust text classification model
|
2017-09-02 11:41:00 +02:00 |
|
Matthew Honnibal
|
ac040b99bb
|
Add support for pre-trained vectors in text classifier
|
2017-09-01 16:39:55 +02:00 |
|
Matthew Honnibal
|
7742a6d559
|
Add GloVe vectors reader
|
2017-09-01 16:39:22 +02:00 |
|
Matthew Honnibal
|
789e1a3980
|
Use 13 parser features, not 8
|
2017-08-31 14:13:00 -05:00 |
|
Matthew Honnibal
|
30e35d9666
|
Fix syntax error
|
2017-08-30 17:35:39 -05:00 |
|
Matthew Honnibal
|
4ceebde523
|
Fix gradient bug in parser
|
2017-08-30 17:32:56 -05:00 |
|
ines
|
173089a45a
|
Add more validation for model meta
|
2017-08-29 11:21:46 +02:00 |
|
Matthew Honnibal
|
2e28982e28
|
Merge pull request #1288 from geovedi/indonesian
Indonesian language support
|
2017-08-26 21:31:13 +02:00 |
|
ines
|
7e04b7f89c
|
Fix info text on pipeline in package cli
|
2017-08-26 18:30:59 +02:00 |
|
ines
|
40afa13a8a
|
Increment version
|
2017-08-26 18:30:49 +02:00 |
|
Matthew Honnibal
|
876f38c548
|
Merge pull request #1279 from oroszgy/model_cli_v2
Added vector loading to model cli
|
2017-08-26 15:57:50 +02:00 |
|
Matthew Honnibal
|
cfc055734e
|
Split % in units, for compatibility with corpus
|
2017-08-25 20:03:37 -05:00 |
|
Matthew Honnibal
|
4bb6bc3f9e
|
Add support for sent_start to GoldParse
|
2017-08-25 20:03:14 -05:00 |
|
Matthew Honnibal
|
44589fb38c
|
Fix Break oracle
|
2017-08-25 19:50:55 -05:00 |
|
Matthew Honnibal
|
6d4e8e14ca
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-08-25 12:37:16 -05:00 |
|
Matthew Honnibal
|
4ce5531389
|
Use layer norm instead of batch norm
|
2017-08-25 12:37:10 -05:00 |
|
Matthew Honnibal
|
20dd66ddc2
|
Constrain sentence boundaries to IS_PUNCT and IS_SPACE tokens
|
2017-08-25 19:35:47 +02:00 |
|
Jim Geovedi
|
58d8078971
|
Merge remote-tracking branch 'upstream/develop' into indonesian
|
2017-08-25 09:21:49 +08:00 |
|
Matthew Honnibal
|
6ceb0f0518
|
Allow Lexeme.rank to be set
|
2017-08-24 21:43:00 +02:00 |
|
Matthew Honnibal
|
44a1fa80d3
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-08-23 13:02:16 +02:00 |
|
ines
|
bb1abbeba5
|
Only link model if download was successful
|
2017-08-23 12:36:31 +02:00 |
|
Matthew Honnibal
|
bb2541ffd3
|
Fix PROB attr for OOV words
|
2017-08-23 12:11:52 +02:00 |
|
Matthew Honnibal
|
1c5c256e58
|
Fix fine_tune when optimizer is None
|
2017-08-23 10:51:33 +02:00 |
|
Matthew Honnibal
|
9c580ad28a
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-08-22 17:02:04 -05:00 |
|
Matthew Honnibal
|
a4633fff6f
|
Restore use of batch norm in model
|
2017-08-22 17:01:58 -05:00 |
|
Matthew Honnibal
|
03b5b9727a
|
Fix Doc.vector for empty doc objects
|
2017-08-22 19:52:19 +02:00 |
|
Matthew Honnibal
|
0551b7b03a
|
Fix doc.vector
|
2017-08-22 19:46:52 +02:00 |
|
Matthew Honnibal
|
83f8e98450
|
Fix retrieval of OOV vectors
|
2017-08-22 19:46:35 +02:00 |
|
Matthew Honnibal
|
df2745eb08
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-08-22 19:00:43 +02:00 |
|
Matthew Honnibal
|
5b329acbf2
|
Fix vectors_length property in vocab
|
2017-08-22 19:00:27 +02:00 |
|
Matthew Honnibal
|
1fe605dfe5
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-08-21 19:18:31 -05:00 |
|
Matthew Honnibal
|
18b64e79ec
|
Fix fine tuning
|
2017-08-21 19:18:26 -05:00 |
|
Matthew Honnibal
|
682346dd66
|
Restore optimized hidden_depth=0 for parser
|
2017-08-21 19:18:04 -05:00 |
|
Matthew Honnibal
|
a21d8f3f0b
|
Add predict paths to _ml models
|
2017-08-21 23:23:45 +02:00 |
|
Matthew Honnibal
|
cec76801dc
|
Add profile command to CLI
|
2017-08-21 23:23:05 +02:00 |
|
Matthew Honnibal
|
7be5f30f17
|
Add profile function
|
2017-08-21 23:22:49 +02:00 |
|
ines
|
a68dc891ea
|
Port over changes from #1281
|
2017-08-21 23:19:18 +02:00 |
|