Ines Montani
959c46eabe
Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Matthew Honnibal
1ef4236f8e
Merge pull request #1343 from explosion/feature/phrasematcher
Update PhraseMatcher for spaCy 2
2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4
fix thai test
2017-09-26 23:54:15 +07:00
ines
1ff62eaee7
Fix option shortcut to avoid conflict
2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun
3d5046c499
fix import in th
2017-09-26 22:41:20 +07:00
ines
7fdfb78141
Add version option to cli.train
2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun
a63f790b8c
fix thai tag_map
2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun
2ea27d07f4
fix tokenizer_exceptions in thai
2017-09-26 22:14:47 +07:00
Matthew Honnibal
41cc5c4c17
Merge branch 'develop' into feature/phrasematcher
2017-09-26 09:59:17 -05:00
Matthew Honnibal
c2e2f81773
Merge pull request #1355 from explosion/feature/noshare
Make pipeline components independent
2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun
a2bf4cc7bf
fix newline in file
2017-09-26 21:49:43 +07:00
ines
bb5c631402
Implement like_num getter for French (via #1161)
2017-09-26 16:47:45 +02:00
ines
15479b3bae
Add comment to like_num re: future work
2017-09-26 16:43:28 +02:00
ines
adda08fe14
Implement like_num getter for Dutch (via #1177)
2017-09-26 16:39:15 +02:00
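Both like_num commits add a lexical attribute that flags tokens whose text looks like a number. Below is a minimal sketch of that kind of getter. It assumes a short, hypothetical Dutch word list; the real list in spaCy is longer and is not reproduced here. The separator stripping and the fraction check are illustrative choices and may not match what the commits implement.

```python
# Hypothetical, abbreviated list of Dutch number words (illustration only).
_NUM_WORDS = {"nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven",
              "acht", "negen", "tien", "honderd", "duizend"}


def like_num(text):
    """Return True if `text` looks like a number: digits, a simple fraction or a number word."""
    # Drop thousands/decimal separators so "1.000" and "3,5" count as digits.
    stripped = text.replace(",", "").replace(".", "")
    if stripped.isdigit():
        return True
    # Accept simple fractions such as "3/4".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Fall back to the (hypothetical) word list, case-insensitively.
    return text.lower() in _NUM_WORDS
```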
ines
5ee10379db
Port over changes from #1340
2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun
5cba67146c
add thai in spacy2
2017-09-26 21:36:27 +07:00
ines
10d291f129
Port over change from #1351
2017-09-26 16:11:41 +02:00
Matthew Honnibal
3274b46a0d
Try to fix compile error on Windows
2017-09-26 09:05:53 -05:00
Matthew Honnibal
19c7c09bf7
Fix PhraseMatcher.__contains__
2017-09-26 08:35:53 -05:00
Matthew Honnibal
d02a41a8c9
Merge remote-tracking branch 'origin/develop' into feature/phrasematcher
2017-09-26 08:32:55 -05:00
Matthew Honnibal
698fc0d016
Remove merge artefact
2017-09-26 08:31:37 -05:00
Matthew Honnibal
defb68e94f
Update feature/noshare with recent develop changes
2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd
Use dep and ent multi-task objectives for parser
2017-09-26 08:13:52 -05:00
Matthew Honnibal
9bfd585a11
Fix parameter name in .pxd file
2017-09-26 07:28:50 -05:00
Matthew Honnibal
74f08e1ad5
Update test
2017-09-26 06:45:56 -05:00
Matthew Honnibal
5aaef3e7b8
Don't link vectors in vocab deserialization
2017-09-26 06:45:47 -05:00
Matthew Honnibal
18a27c7579
Fix typo in tensorizer serialization
2017-09-26 06:45:14 -05:00
Matthew Honnibal
5056743ad5
Fix parser serialization
2017-09-26 06:44:56 -05:00
Ines Montani
7123139b2b
Add __contains__ to PhraseMatcher
2017-09-26 13:13:27 +02:00
Ines Montani
50ad50f96a
Update matcher.pyx
2017-09-26 13:11:17 +02:00
Matthew Honnibal
e34e70673f
Allow tagger models to be built with pre-defined tok2vec layer
2017-09-26 05:51:52 -05:00
Matthew Honnibal
bf917225ab
Allow multi-task objectives during training
2017-09-26 05:42:52 -05:00
Matthew Honnibal
4ae9ea7684
Remove unused argument in Language
2017-09-26 05:41:35 -05:00
ines
edf7e4881d
Add meta.json option to cli.train and add relevant properties
Add accuracy scores to meta.json instead of accuracy.json, and overwrite
relevant properties such as lang, pipeline and spacy_version in an existing
meta.json. If they are not present, also add name and version placeholders
to make the model packageable.
2017-09-25 19:00:47 +02:00
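The meta.json change above merges training results into any existing meta.json. Below is a minimal sketch of that merge on a plain dict. It assumes a hypothetical `update_meta` helper, an `accuracy` key for the scores, and the placeholder values `"model"` and `"0.0.0"`. It is not the code in `cli.train`.

```python
import json


def update_meta(meta, lang, pipeline, spacy_version, accuracy):
    """Return a copy of `meta` updated with the results of a training run (hypothetical helper)."""
    meta = dict(meta)  # leave the caller's dict untouched
    # Always overwrite properties that must describe the newly trained model.
    meta["lang"] = lang
    meta["pipeline"] = list(pipeline)
    meta["spacy_version"] = spacy_version
    meta["accuracy"] = accuracy  # scores that previously went to accuracy.json
    # Only add placeholders when missing, so existing values are preserved.
    meta.setdefault("name", "model")
    meta.setdefault("version", "0.0.0")
    return meta


existing = {"name": "my_model", "lang": "xx"}
print(json.dumps(update_meta(existing, "en", ["tagger", "parser"], ">=2.0.0", {"uas": 91.2}), indent=2))
```

Using `setdefault` for name and version keeps any values the user already set. The trained-model properties are always replaced, because a stale `lang` or `pipeline` would describe the wrong model.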
ines
d2d35b63b7
Fix formatting
2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779
Add docstrings for Pipe API
2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7
Add docstrings for Pipe API
2017-09-25 16:20:49 +02:00
Matthew Honnibal
8716ffe57d
Serialize vocab last
2017-09-24 05:01:45 -05:00
Matthew Honnibal
72bbcc0871
Handle lemmatization for unknown string IDs
2017-09-24 05:01:31 -05:00
Matthew Honnibal
204b58c864
Fix evaluation during training
2017-09-24 05:01:03 -05:00
Matthew Honnibal
dc3a623d00
Remove unused update_shared argument
2017-09-24 05:00:37 -05:00
Matthew Honnibal
63bd87508d
Don't use iterated convolutions
2017-09-23 04:39:17 -05:00
Matthew Honnibal
5a7fd0fd36
Fix vector linkage
2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43
Whitespace
2017-09-22 20:00:50 -05:00
Matthew Honnibal
e93d43a43a
Fix training with preset vectors
2017-09-22 20:00:40 -05:00
Matthew Honnibal
0795857dcb
Fix beam parsing
2017-09-23 02:59:53 +02:00
Matthew Honnibal
4bd6a12b1f
Fix Tok2Vec
2017-09-23 02:58:54 +02:00
Matthew Honnibal
386c1a5bd8
Fix tagger training
2017-09-23 02:58:06 +02:00
Matthew Honnibal
a2357cce3f
Set random seed in train script
2017-09-23 02:57:31 +02:00
Matthew Honnibal
05596159bf
Fix serialization when using pre-trained vectors
2017-09-22 15:33:27 -05:00
Matthew Honnibal
980fb6e854
Refactor Tok2Vec
2017-09-22 09:38:36 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
a186596307
Add 'reapply' combinator, for iterated CNN
2017-09-22 09:37:03 -05:00
Matthew Honnibal
40a4873b70
Fix serialization of model options
2017-09-21 13:07:26 -05:00
Matthew Honnibal
0a9016cade
Fix serialization during training
2017-09-21 13:06:45 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
1d73dec8b1
Refactor train script
2017-09-20 19:17:10 -05:00
Matthew Honnibal
ffda38356a
Add util function to enable GPU
2017-09-20 19:16:35 -05:00
Matthew Honnibal
24e85c2048
Pass values for CNN maxout pieces option
2017-09-20 19:16:12 -05:00
Matthew Honnibal
b832f89ff8
Add resume_training function
2017-09-20 19:15:20 -05:00
Matthew Honnibal
f5144f04be
Add argument for CNN maxout pieces
2017-09-20 19:14:41 -05:00
Matthew Honnibal
842e21de9f
Fix int type error for Python 2
2017-09-20 23:55:30 +02:00
Matthew Honnibal
0c93c73e49
Add __reduce__ method for PhraseMatcher
2017-09-20 22:26:40 +02:00
Matthew Honnibal
cc408fc189
Make PhraseMatcher API like Matcher API
2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5
Update matcher tests
2017-09-20 21:54:49 +02:00
Matthew Honnibal
828cc91545
Fix PhraseMatcher for spaCy 2
2017-09-20 21:54:31 +02:00
Matthew Honnibal
78301b2d29
Avoid comparison to None in Tok2Vec
2017-09-20 00:19:34 +02:00
Matthew Honnibal
b36a38f63d
Fix serialization of pretrained_dims property
2017-09-19 23:42:27 +02:00
Matthew Honnibal
2489dcaccf
Fix serialization of parser
2017-09-19 23:42:12 +02:00
Matthew Honnibal
40837b275d
Fix tensorizer with pretrained vectors
2017-09-18 18:05:38 -05:00
Matthew Honnibal
a0c4b33d03
Support resuming a model during spacy train
2017-09-18 18:04:47 -05:00
Matthew Honnibal
c858927271
Copy vectors to GPU on begin training
2017-09-18 18:04:16 -05:00
Matthew Honnibal
3fa76c17d1
Refactor Tok2Vec
2017-09-18 15:00:05 -05:00
Matthew Honnibal
217e7891cd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-18 11:36:21 -05:00
Matthew Honnibal
7b3f391f80
Try dropping the Affine layer, conditionally
2017-09-18 11:35:59 -05:00
ines
2480f8f521
Add missing return in Doc.from_disk() (closes #1330)
2017-09-18 15:32:00 +02:00
Matthew Honnibal
2148ae605b
Don't use iterated convolutions
2017-09-17 17:36:04 -05:00
Matthew Honnibal
c013e5996f
Fix parser test
2017-09-17 13:13:20 -05:00
Matthew Honnibal
8f42f8d305
Remove unused 'preprocess' argument in Tok2Vec
2017-09-17 12:30:16 -05:00
Matthew Honnibal
039d609362
Remove hard-coded default vectors width
2017-09-17 12:29:39 -05:00
Matthew Honnibal
4f38a67a89
Make width default to 0 in vectors.pyx
2017-09-17 12:29:14 -05:00
Matthew Honnibal
16122f566e
Fix cpdef enum in attrs.pyx
2017-09-17 12:28:53 -05:00
Matthew Honnibal
b159e0eb50
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-17 05:47:50 -05:00
Matthew Honnibal
2b0efc77ae
Fix wiring of pre-trained vectors in parser loading
2017-09-17 05:47:34 -05:00
Matthew Honnibal
31c2e91c35
Fix wiring of pre-trained vectors in parser loading
2017-09-17 05:46:55 -05:00
Matthew Honnibal
8f913a74ca
Fix defaults and args to build_tagger_model
2017-09-17 05:46:36 -05:00
Matthew Honnibal
c003c561c3
Revert NER action loading change, for model compatibility
2017-09-17 05:46:03 -05:00
Matthew Honnibal
43210abacc
Resolve fine-tuning conflict
2017-09-17 05:30:04 -05:00
ines
ece30c28a8
Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
ines
68f66aebf8
Use pkg_resources instead of pip for is_package (resolves #1293)
2017-09-16 20:27:59 +02:00
Matthew Honnibal
5ff2491f24
Pass option for pre-trained vectors in parser
2017-09-16 12:47:21 -05:00
Matthew Honnibal
8665a77f48
Fix feature error in NER
2017-09-16 12:46:57 -05:00
Matthew Honnibal
e37a50a436
Pass documents to tensorizer, not 'features'
2017-09-16 12:46:36 -05:00
Matthew Honnibal
84e637e2e6
Pass option for pretrained vectors in pipeline
2017-09-16 12:46:02 -05:00
Matthew Honnibal
2a93404da6
Support optional pre-trained vectors in tensorizer model
2017-09-16 12:45:37 -05:00
Matthew Honnibal
e0a2aa9289
Support having word vectors data on GPU
2017-09-16 12:45:09 -05:00
Matthew Honnibal
ebf8942564
Fix test for Python3
2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb
Excuse emoji failure on narrow unicode builds
2017-09-16 16:21:13 +02:00
Matthew Honnibal
11f2a05ede
Fix code explosion from long enum in Python 3, Cython 0.24+
2017-09-16 12:20:04 +02:00
Matthew Honnibal
3fa5b40b5c
Add test for hash consistency
2017-09-16 11:21:35 +02:00
Matthew Honnibal
f730d07e4e
Fix prange error for Windows
2017-09-16 00:25:33 +02:00
Matthew Honnibal
4b2065430e
Merge branch 'feature/parser-history' into develop
2017-09-15 10:42:20 +02:00
Matthew Honnibal
2f08489694
Remove AddHistory layer -- didn't work as planned
2017-09-15 10:41:40 +02:00
Matthew Honnibal
8b481e0465
Remove redundant brackets
2017-09-15 10:38:08 +02:00
Matthew Honnibal
d84607f6bb
Vectorize update in AddHistory
2017-09-14 20:34:40 +02:00
Ines Montani
bd3da3d6fb
Port over change from #1323 and tidy up
2017-09-14 19:23:13 +02:00
Matthew Honnibal
18347ab69c
Implement AddHistory layer wrapper
2017-09-14 19:07:35 +02:00
Matthew Honnibal
d4ca6cef9e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-14 17:00:07 +02:00
Matthew Honnibal
8c503487af
Fix lookup of missing NER actions
2017-09-14 16:59:45 +02:00
Matthew Honnibal
664c5af745
Revert padding in parser
2017-09-14 16:59:25 +02:00
Matthew Honnibal
8496d76224
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-14 09:21:20 -05:00
Matthew Honnibal
d1518027a9
Increment version
2017-09-14 16:18:46 +02:00
Matthew Honnibal
70da88a3a7
Update comment on Language.begin_training
2017-09-14 16:18:30 +02:00
Matthew Honnibal
c6395b057a
Improve parser feature extraction, for missing values
2017-09-14 16:18:02 +02:00
Matthew Honnibal
daf869ab3b
Fix add_action for NER, so labelled 'O' actions aren't added
2017-09-14 16:16:41 +02:00
Matthew Honnibal
9cb2aef587
Remove print statement
2017-09-14 13:38:28 +02:00
Matthew Honnibal
ba23d63c35
Fix minibatch function, for fixed batch size
2017-09-14 13:37:41 +02:00
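With a fixed batch size, every batch except possibly the last has the same length. Below is a minimal sketch of a minibatch generator that accepts either an int or an iterable of sizes. It illustrates the idea and is not the fixed spaCy function itself. An iterable size schedule is assumed to be infinite.

```python
import itertools


def minibatch(items, size=8):
    """Yield lists of items; `size` is an int or an (assumed infinite) iterable of sizes."""
    if isinstance(size, int):
        # Fixed batch size: repeat the same int forever.
        sizes = itertools.repeat(size)
    else:
        sizes = iter(size)
    items = iter(items)
    while True:
        # Take the next n items; the final batch may be shorter.
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch
```

For example, `list(minibatch(range(5), size=2))` gives `[[0, 1], [2, 3], [4]]`.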
Matthew Honnibal
456bb8a74c
Unxfail and close #1305
2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb
Update regression test
2017-09-06 19:13:51 +02:00
Matthew Honnibal
5c3ff06924
Fix lemmatizer rules
2017-09-06 19:13:24 +02:00
Matthew Honnibal
dd9cab0faf
Fix type-check for int/long
2017-09-06 19:03:05 +02:00
Matthew Honnibal
497a9308a8
Xfail new lemmatizer test
2017-09-06 18:41:22 +02:00
Matthew Honnibal
dcbf866970
Merge parser changes
2017-09-06 18:41:05 +02:00
Matthew Honnibal
5384fff5ce
Add test for 1305: Incorrect lemmatization of VBZ for English
2017-09-06 18:40:18 +02:00
Matthew Honnibal
24ff6b0ad9
Fix parsing and tok2vec models
2017-09-06 05:50:58 -05:00
Matthew Honnibal
1b65115bc2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 20:02:53 -05:00
Matthew Honnibal
33fa91feb7
Restore correctness of parser model
2017-09-04 21:19:30 +02:00
Matthew Honnibal
e88a42e460
Increment version
2017-09-04 21:14:39 +02:00
Matthew Honnibal
9d65d67985
Preserve model compatibility in parser, for now
2017-09-04 16:46:22 +02:00
Matthew Honnibal
d5fbf27335
Fix test
2017-09-04 16:45:11 +02:00
Matthew Honnibal
7fdafcc4c4
Fix config loading in tagger
2017-09-04 16:38:49 +02:00
Matthew Honnibal
058372d120
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 16:27:53 +02:00
Matthew Honnibal
16e25ce3b5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 09:26:53 -05:00
Matthew Honnibal
9f512e657a
Fix drop_layer calculation
2017-09-04 09:26:38 -05:00
Matthew Honnibal
cb4839033c
Fix loader for EN tests
2017-09-04 15:19:18 +02:00
Matthew Honnibal
382ce566eb
Fix deserialization bug
2017-09-04 15:19:01 +02:00
Matthew Honnibal
bfddf50081
Fix #1296: Incorrect lemmatization of base form verbs
2017-09-04 15:18:41 +02:00
Matthew Honnibal
b29e6bff46
Improve lemmatization rule for am|VBP
2017-09-04 15:18:10 +02:00
Matthew Honnibal
644d6c9e1a
Improve lemmatization tests, re #1296
2017-09-04 15:17:44 +02:00
Matthew Honnibal
3cf3fa1704
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-02 12:46:11 -05:00
Matthew Honnibal
e920885676
Fix pickle during train
2017-09-02 12:46:01 -05:00
Matthew Honnibal
c0eaba8b28
Fix low-data textcat
2017-09-02 15:17:32 +02:00
Matthew Honnibal
9e378bdac5
Fix textcat serialization
2017-09-02 15:17:20 +02:00
Matthew Honnibal
e3ea6ee02b
Increment version
2017-09-02 15:17:01 +02:00
Matthew Honnibal
a3b69bcb3d
Add low_data mode in textcat
2017-09-02 14:56:30 +02:00
Matthew Honnibal
ead78c7b9b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-02 12:55:25 +02:00
Matthew Honnibal
5e6a9e7dcc
Add rule-based SBD
2017-09-02 12:53:38 +02:00
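Rule-based sentence boundary detection starts a new sentence after sentence-final punctuation instead of relying on the parser. Below is a minimal sketch over a plain list of token strings. It assumes a hypothetical `split_sentences` helper and the punctuation set `.`, `!`, `?`. It does not use spaCy's `Doc` API and is not the component added in this commit.

```python
def split_sentences(tokens, punct_chars=(".", "!", "?")):
    """Group token strings into sentences, ending each one at sentence-final punctuation."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        # Close the current sentence when we hit a boundary token.
        if token in punct_chars:
            sentences.append(current)
            current = []
    # Keep any trailing tokens that lack final punctuation.
    if current:
        sentences.append(current)
    return sentences
```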
Matthew Honnibal
a824cf8f9a
Adjust text classification model
2017-09-02 11:41:00 +02:00
Matthew Honnibal
ac040b99bb
Add support for pre-trained vectors in text classifier
2017-09-01 16:39:55 +02:00