Matthew Honnibal
1a59db1c86
Fix dropout and learn rate in parser
2017-08-12 05:44:39 -05:00
Matthew Honnibal
d01dc3704a
Adjust parser model
2017-08-09 20:06:33 -05:00
Matthew Honnibal
f37528ef58
Pass embed size for parser fine-tune. Use SELU
2017-08-09 17:52:53 -05:00
Matthew Honnibal
f93f2bed58
Revert use of layer normalization in Tok2Vec
2017-08-09 17:47:03 -05:00
Matthew Honnibal
20944dd8aa
Fix conflict in parser fine-tuning
2017-08-09 16:43:05 -05:00
Matthew Honnibal
ac2de6dced
Switch to ReLU layers in Tok2Vec
2017-08-09 16:41:25 -05:00
Matthew Honnibal
bbace204be
Gate parser fine-tuning behind feature flag
2017-08-09 16:40:42 -05:00
Matthew Honnibal
a59a1deac4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-09 16:23:19 -05:00
Matthew Honnibal
bcce6f7de0
Fix parser fine tuning
2017-08-09 16:23:12 -05:00
ines
28e2fec23b
Fix autolinking failure on fresh model install (resolves #1138)
...
On fresh install via subprocess, pip.get_installed_distributions()
won't show new model, so is_package check in link command fails.
Solution for now is to get model package path explicitly and pass it to
link command.
2017-08-09 11:52:38 +02:00
Jim Geovedi
c62b49b7cc
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-09 09:17:46 +07:00
Matthew Honnibal
dbdd8afc4b
Fix parser fine-tune training
2017-08-08 15:46:07 -05:00
Matthew Honnibal
88bf1cf87c
Update parser for fine tuning
2017-08-08 15:34:17 -05:00
Matthew Honnibal
5d837c3776
Add mix weights on fine_tune
2017-08-07 06:32:59 -05:00
Matthew Honnibal
42bd26f6f3
Give parser its own tok2vec weights
2017-08-06 18:33:46 +02:00
Matthew Honnibal
3ed203de25
Use LayerNorm and SELU in Tok2Vec
2017-08-06 18:33:18 +02:00
Matthew Honnibal
78498a072d
Return Transition for missing actions in lookup_action
2017-08-06 14:16:36 +02:00
Matthew Honnibal
4a5cc89138
Fix tagger 'fine_tune', to keep private CNN weights
2017-08-06 14:15:48 +02:00
Matthew Honnibal
3cb8f06881
Fix NeuralLabeller
2017-08-06 14:15:14 +02:00
Matthew Honnibal
0acce0521b
Fix Language.update for pipeline
2017-08-06 14:13:03 +02:00
Matthew Honnibal
bfffdeabb2
Fix parser batch-size bug introduced during cleanup
2017-08-06 14:10:48 +02:00
Matthew Honnibal
0eec7c9e9b
Fix Language.evaluate
2017-08-06 02:18:31 +02:00
Matthew Honnibal
0a566dc320
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:18:12 +02:00
Matthew Honnibal
cc19ea0e7c
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:17:10 +02:00
Matthew Honnibal
4cfb7a54e7
Fix tagger
2017-08-06 01:53:31 +02:00
Matthew Honnibal
e9ab800e15
Fix tagging model
2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3
WIP: Add fine-tuning logic to tagger model, re #1182
2017-08-06 01:13:23 +02:00
Matthew Honnibal
7f876a7a82
Clean up some unused code in parser
2017-08-06 00:00:21 +02:00
Matthew Honnibal
ae1ad81069
Increment version
2017-08-05 18:09:32 +02:00
Jim Geovedi
cc4772cac2
reworks
2017-08-03 13:08:38 +07:00
Jim Geovedi
37f19f5ed2
added more currencies based on corpus data
2017-08-03 13:03:25 +07:00
Jim Geovedi
30fd068d42
hashtag prefix should be handled somewhere else
2017-08-03 13:03:02 +07:00
Jim Geovedi
4705ae19ba
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-03 12:40:19 +07:00
Jim Geovedi
ba07e23c87
added USD in currency rules
2017-08-02 22:42:47 +07:00
Matthew Honnibal
5c323daa1a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-01 22:10:37 +02:00
Matthew Honnibal
2e00361522
Fix update when 0 docs
2017-08-01 22:10:17 +02:00
Matthew Honnibal
8fce187de4
Fix ArcEager for missing values
2017-08-01 22:10:05 +02:00
ines
78e262140f
Add workaround for displaCy server on Python 2/3 (resolves #1227)
...
Make sure status and headers are bytes on Python 2 and strings on
Python 3
2017-08-01 01:11:35 +02:00
Jim Geovedi
2572a9ddf0
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-07-30 21:24:16 +07:00
Jim Geovedi
bb08d696f9
added hashtag rule and fixed currency rules
2017-07-30 21:23:28 +07:00
Jim Geovedi
e9af79a803
added u-\d+ rules (sports team)
2017-07-30 21:23:01 +07:00
Matthew Honnibal
27abc56e98
Add method to get beam entities
2017-07-29 21:59:02 +02:00
Matthew Honnibal
ec63f4fe7b
Add option to control how missing entities are handled when getting NER tags
2017-07-29 21:58:37 +02:00
Jim Geovedi
e5adc26c72
simplified rules
2017-07-29 18:21:32 +07:00
Jim Geovedi
783f7d8b86
added test set for Indonesian language
2017-07-29 18:21:07 +07:00
Jim Geovedi
4d04898dea
updated regexp
2017-07-29 17:44:57 +07:00
Jim Geovedi
7d96d477ea
updated like_num
2017-07-29 17:44:46 +07:00
Jim Geovedi
3cca4ed798
added lex attrs rules
2017-07-29 17:22:21 +07:00
Jim Geovedi
8b814c63f1
more exceptions
2017-07-27 19:46:30 +07:00
Jim Geovedi
6c725e8dcf
updated lemma
2017-07-27 19:46:21 +07:00
Jim Geovedi
c194f7ae26
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-07-27 10:55:34 +07:00
Jim Geovedi
547973b92a
wip syntax iterators
2017-07-27 10:51:34 +07:00
Jim Geovedi
bbc75da38d
enable syntax iterator and lemma lookup
2017-07-27 10:51:15 +07:00
Jim Geovedi
24a8c8bf28
added wip lemma dict
2017-07-26 21:39:54 +07:00
Jim Geovedi
63f14ba46b
added hyphen-suffix rules
2017-07-26 19:28:57 +07:00
Jim Geovedi
f288964441
removed -el from suffix rules
2017-07-26 19:28:38 +07:00
Jim Geovedi
6eee7a7411
updated tokenizer exceptions
2017-07-26 19:13:47 +07:00
Jim Geovedi
edec51b1b1
update punctuation rules
2017-07-26 19:13:36 +07:00
Jim Geovedi
62443d495a
enable token match
2017-07-26 19:13:14 +07:00
Jim Geovedi
c97f5ae0bb
updated tokenizer exceptions
2017-07-26 19:12:52 +07:00
Matthew Honnibal
aff325b7e0
Increment version
2017-07-25 19:41:20 +02:00
Matthew Honnibal
6780132821
Fix tagger loading
2017-07-25 19:41:11 +02:00
Matthew Honnibal
fd20a4af55
Increment version
2017-07-25 18:58:34 +02:00
Matthew Honnibal
523b0df2c9
Update text classification model
2017-07-25 18:57:59 +02:00
Matthew Honnibal
7c7fac9337
Add spacy.blank() loading function
2017-07-25 18:56:37 +02:00
Jim Geovedi
73f6ac9d9b
added hyphen
2017-07-24 15:56:31 +07:00
Jim Geovedi
68454c40bf
added missing import
2017-07-24 14:12:34 +07:00
Jim Geovedi
eaf9cbd708
curse of copy & paste
2017-07-24 14:11:51 +07:00
Jim Geovedi
7aad6718bc
enable tokenizer exceptions
2017-07-24 14:11:10 +07:00
Jim Geovedi
ad56c9179a
added tokenizer exceptions list
2017-07-24 14:10:16 +07:00
Jim Geovedi
c1f3fe99fe
updated punctuation rules
2017-07-24 13:57:21 +07:00
Jim Geovedi
37fa2c8c80
punctuation rules
2017-07-24 06:17:18 +07:00
Jim Geovedi
082e94ac1c
added infix rules
2017-07-24 06:17:07 +07:00
Jim Geovedi
d0ec484725
reverted
2017-07-24 06:16:29 +07:00
Jim Geovedi
0e590c711f
added prefix & suffix rules
2017-07-23 23:46:40 +07:00
Jim Geovedi
ba922e30e8
added ampere hour unit
2017-07-23 23:46:18 +07:00
Jim Geovedi
3b17eba27b
added frequency units
2017-07-23 23:10:52 +07:00
Jim Geovedi
d5fd32a572
added known currencies
2017-07-23 22:56:48 +07:00
Jim Geovedi
f6f15678fb
added lex_attrs
2017-07-23 22:55:22 +07:00
Jim Geovedi
bed8162d00
added tokenizer_exceptions
2017-07-23 22:55:05 +07:00
Jim Geovedi
b80c35bc9a
added norm_exceptions
2017-07-23 22:54:49 +07:00
Jim Geovedi
b5de329ea3
added norm_exceptions
2017-07-23 22:54:19 +07:00
Jim Geovedi
082e9ade46
fixed typo
2017-07-23 21:30:34 +07:00
Jim Geovedi
e2efeb186e
added stopwords
2017-07-23 20:52:37 +07:00
Jim Geovedi
da98676839
use template
2017-07-23 20:51:31 +07:00
Jim Geovedi
c2b4dd7809
start working on Indonesian language
2017-07-23 20:50:56 +07:00
Matthew Honnibal
5771bd1ff8
Increment version
2017-07-23 14:18:38 +02:00
Matthew Honnibal
c4a81a47a4
Fix deserialization
2017-07-23 14:11:07 +02:00
Matthew Honnibal
2df563ad24
Remove optimization for textcat that caused loading problem
2017-07-23 14:10:51 +02:00
Matthew Honnibal
4fe77bced2
Add cfg attr to pipeline components
2017-07-23 00:52:47 +02:00
Matthew Honnibal
d8aa721664
Compute Language.meta with a property
2017-07-23 00:50:18 +02:00
Matthew Honnibal
a88a7deffe
Fix save/load of textcat config
2017-07-23 00:33:43 +02:00
Matthew Honnibal
9bae0ddc50
Fix minibatching
2017-07-22 20:14:49 +02:00
Matthew Honnibal
ded0df5e2f
Expose hyper-param as keyword arg
2017-07-22 20:14:37 +02:00
Matthew Honnibal
f5de8deeec
Increment version
2017-07-22 20:04:53 +02:00
Matthew Honnibal
b55714d5d1
Make gold_tuples arg optional in begin_training
2017-07-22 20:04:43 +02:00
Matthew Honnibal
ed6c85fa3c
Fix loading of text categories in GoldParse
2017-07-22 20:04:03 +02:00
Matthew Honnibal
6ffec9dfea
Update _ml, for textcat model
2017-07-22 20:03:40 +02:00
Matthew Honnibal
d6a5c2c85a
Add test for NER
2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da
Add test for beam parsing
2017-07-22 01:48:35 +02:00
Matthew Honnibal
c86445bdfd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-07-22 01:14:28 +02:00
Matthew Honnibal
b3a749610e
Fix name of TextCategorizer
2017-07-22 01:14:07 +02:00
Matthew Honnibal
2424493970
Remove unnecessary import of Mock
2017-07-22 01:13:54 +02:00
Matthew Honnibal
baa3d81c35
Add text categorizer to Language
2017-07-22 01:13:36 +02:00
Matthew Honnibal
a6a2159969
Add slot for text categories to Doc
2017-07-22 00:34:15 +02:00
Matthew Honnibal
374ab3ecfb
Increment alpha version
2017-07-22 00:32:49 +02:00
Matthew Honnibal
289f23df51
Test beam parsing
2017-07-20 15:03:10 +02:00
Matthew Honnibal
3da1063b36
Add beam decoding to parser, to allow NER uncertainties
2017-07-20 15:02:55 +02:00
Matthew Honnibal
0ca5832427
Improve negative example handling in NER oracle
2017-07-20 00:18:49 +02:00
Matthew Honnibal
a231b56d40
Add text-classification hook to pipeline
2017-07-20 00:18:15 +02:00
Matthew Honnibal
7ea50182a5
Add support for text-classification labels to GoldParse
2017-07-20 00:17:47 +02:00
Matthew Honnibal
727481377e
Add text-classifier thinc models
2017-07-20 00:17:17 +02:00
Matthew Honnibal
f014138c11
Fix parser tests
2017-07-20 00:16:52 +02:00
Ines Montani
c91642efd5
Port over changes from #1168
2017-07-01 11:43:54 +02:00
Jim Regan
d81ceb0cd5
Merge branch 'develop' into polish
2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585
a start
2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672
reference
2017-06-26 22:38:28 +01:00
Matthew Honnibal
91e52543ef
Merge pull request #1118 from Gregory-Howard/patch-2
...
Update _tokenizer_exceptions_list (adding cities)
2017-06-20 11:16:07 +02:00
Matthew Honnibal
8ea785e01a
Merge pull request #1119 from oroszgy/patch-3
...
Fixed conllu converter
2017-06-20 11:14:41 +02:00
Tpt
7745b3ae04
Adds noun chunks to French syntax iterators
2017-06-12 15:29:58 +02:00
Tpt
57e8254f63
Adds function to extract French noun chunks
2017-06-12 15:20:49 +02:00
György Orosz
62dbf9025c
Fixed conllu converter
2017-06-09 22:53:56 +02:00
Grégory Howard
cd974b32b7
Update _tokenizer_exceptions_list (adding cities)
2017-06-09 17:58:18 +02:00
ines
34a2eecb17
Add simple "naughty strings" test (see #1107)
2017-06-06 17:43:51 +02:00
ines
045574a936
Update package name and increment version
2017-06-05 20:41:30 +02:00
Matthew Honnibal
1f5874a927
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 20:20:00 +02:00
ines
03db56f48c
Detect spaCy version and add package title
...
Package title allows customised package names (like spacy-nightly)
2017-06-05 20:11:02 +02:00
Matthew Honnibal
c0d90f52f7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 19:20:13 +02:00
ines
cc9c5dc7a3
Fix noun chunks test
2017-06-05 16:39:04 +02:00
Matthew Honnibal
836bfa2d0f
Add factory for experimental SimilarityHook component
2017-06-05 15:40:22 +02:00
Matthew Honnibal
d59fa32df1
Add experimental SimilarityHook component
2017-06-05 15:40:03 +02:00
Matthew Honnibal
5489b49203
Remove print statement
2017-06-05 13:20:41 +02:00
Matthew Honnibal
fc4204a12a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 13:13:23 +02:00
Matthew Honnibal
2479cde446
Support disable keyword in Language.__init__
2017-06-05 13:13:07 +02:00
ines
ea167e14db
Fix model package loading from link
2017-06-05 13:10:49 +02:00
ines
dd6dc4c120
Update spacy.load() helper functions
2017-06-05 13:02:31 +02:00
Matthew Honnibal
b4cdd05466
Add vectors.pyx in setup
2017-06-05 12:45:29 +02:00
Matthew Honnibal
280d419529
Add pickle method for vectors
2017-06-05 12:36:04 +02:00
Matthew Honnibal
30369d580f
Start testing Vectors class
2017-06-05 12:32:49 +02:00
Matthew Honnibal
eb7cbb62c2
Flesh out Vectors class
2017-06-05 12:32:08 +02:00
ines
51d7414e94
Make sure sents are a list
2017-06-05 12:30:13 +02:00
Matthew Honnibal
ebb6c49cd5
Make alignment case-insensitive for gold
2017-06-04 20:26:42 -05:00
Matthew Honnibal
fc4dd62e84
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 20:19:05 -05:00
Matthew Honnibal
8f8f90b46b
Disable labeller if not parsing
2017-06-04 20:18:54 -05:00
Matthew Honnibal
c52fde40f4
Improve train CLI
2017-06-04 20:18:37 -05:00
Matthew Honnibal
a053b1218e
Fix item counting during training
2017-06-04 20:18:20 -05:00
Matthew Honnibal
b3b5521625
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 20:17:18 -05:00
Matthew Honnibal
9bc4a26213
Add option of data augmentation noise
2017-06-04 20:16:57 -05:00
Matthew Honnibal
7b2ede783d
Add SP tag to tag map if missing
2017-06-04 20:16:30 -05:00
ines
a0f4592f0a
Update tests
2017-06-05 02:26:13 +02:00
ines
3e105bcd36
Update tests
2017-06-05 02:09:27 +02:00
Matthew Honnibal
516798e9fc
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 01:35:21 +02:00
Matthew Honnibal
193bf913c0
Set is_tagged=True after tagging
2017-06-05 01:35:07 +02:00
ines
078232932c
Fix tokenizer fixture scope
2017-06-05 01:06:34 +02:00
Matthew Honnibal
58be0e1f6f
Update tests
2017-06-04 16:35:06 -05:00
Matthew Honnibal
b78cc318c3
Fix loading of morphology exceptions
2017-06-04 16:34:32 -05:00
Matthew Honnibal
bb98d45a63
Fix tests
2017-06-04 16:00:44 -05:00
Matthew Honnibal
55d0621532
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 15:53:25 -05:00
Matthew Honnibal
5b9f116aca
Update tests
2017-06-04 15:53:17 -05:00
Matthew Honnibal
2a3bd5ee90
Fix fetching of noun chunk iterator
2017-06-04 15:53:05 -05:00
Matthew Honnibal
3680c51b8f
Avoid clobbering preset POS tags
2017-06-04 15:52:42 -05:00
Matthew Honnibal
939e8ed567
Add lookup properties for components in Language
2017-06-04 15:52:09 -05:00
Matthew Honnibal
e28f90b672
Fix syntax iterators
2017-06-04 15:51:50 -05:00
ines
8a29308d0b
Remove unused imports
2017-06-04 22:39:29 +02:00
Ines Montani
112c5787eb
Merge pull request #1101 from oroszgy/hu_tokenizer_fix
...
More robust Hungarian tokenizer.
2017-06-04 22:37:51 +02:00
ines
96867a24ae
Fix typo
2017-06-04 22:36:40 +02:00
ines
f432bb4b48
Fix fixture scopes
2017-06-04 22:34:31 +02:00
Matthew Honnibal
6d0356e6cc
Whitespace
2017-06-04 14:55:24 -05:00
Matthew Honnibal
8a683a4494
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 21:53:56 +02:00
Matthew Honnibal
92ae36f84e
Improve way noun chunks iterator is looked up
2017-06-04 21:53:39 +02:00
ines
9254a3dd78
Import and add Spanish syntax iterators
2017-06-04 21:42:15 +02:00
ines
7db1a0e83e
Make sure printed values are always strings
2017-06-04 21:27:20 +02:00
Matthew Honnibal
51e1541ddb
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 14:26:29 -05:00
Matthew Honnibal
add9a33782
Return False for vocab.has_vector
2017-06-04 14:26:14 -05:00
Matthew Honnibal
675f448313
Fix vector linkage on Doc
2017-06-04 14:25:30 -05:00
Matthew Honnibal
f4662e9218
Fix vector linkage for token
2017-06-04 14:19:58 -05:00
ines
070e026ed9
Ensure path on read_json
2017-06-04 20:44:37 +02:00
ines
e1e73936b1
Raise correct error
2017-06-04 20:44:27 +02:00
ines
848e47669e
Fix typo
2017-06-04 20:44:15 +02:00
ines
c4614c02a2
Fix dev resources URL
2017-06-04 15:45:50 +02:00
ines
a66cf24ee8
xfail tokenizer serialization tests for now
...
Tests pass locally, but not on Travis – needs more investigation
2017-06-04 13:58:20 +02:00
ines
7b7d46b64e
Fix typo and success message
2017-06-04 13:45:50 +02:00
ines
90d117f378
Update version
2017-06-04 13:41:16 +02:00
Matthew Honnibal
7ca215bc26
Resolve lex_attr_getters conflict
2017-06-03 16:12:01 -05:00
Matthew Honnibal
21eef90dbc
Support specifying which GPU
2017-06-03 16:10:23 -05:00
Matthew Honnibal
d0e42f9275
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 15:30:32 -05:00
Matthew Honnibal
8a17b99b1c
Use NORM attribute, not LOWER
2017-06-03 15:30:16 -05:00
ines
4c643d74c5
Add norm exceptions to other Language classes
2017-06-03 22:29:21 +02:00
ines
fa7e576c57
Change order of exception dicts
2017-06-03 21:52:06 +02:00
Matthew Honnibal
3f5c85d8de
Reorder setting of lex attrs, to avoid clobbering
2017-06-03 14:47:55 -05:00
Matthew Honnibal
aeb7520133
Make norm use lower-case
2017-06-03 14:47:38 -05:00
Matthew Honnibal
de3954843e
Populate norm exceptions with lower-case
2017-06-03 14:47:12 -05:00
Matthew Honnibal
f6955a459c
Fix prev commit
2017-06-03 14:38:37 -05:00
Matthew Honnibal
468ca6c760
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 14:33:51 -05:00
Matthew Honnibal
c647a0d33e
Fix training counter for gold preprocessing
2017-06-03 14:33:39 -05:00
ines
e47eef5e03
Update German tokenizer exceptions and tests
2017-06-03 21:07:44 +02:00
ines
d77c2cc8bb
Add tests for English norm exceptions
2017-06-03 20:59:50 +02:00
ines
0d6fa8b241
Add German norm exceptions
2017-06-03 20:54:18 +02:00
ines
5bd311c77e
Fix update of norm exceptions
2017-06-03 20:54:09 +02:00
Matthew Honnibal
94e063ae2a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-03 13:31:40 -05:00
Matthew Honnibal
fea1144e6d
Set max batch size in evaluate
2017-06-03 13:31:33 -05:00
Matthew Honnibal
805495af27
Fix off-by-one in number of tags
2017-06-03 13:29:23 -05:00
Matthew Honnibal
e62f46d39f
Clarify gold.pyx slightly
2017-06-03 13:28:52 -05:00
Matthew Honnibal
43353b5413
Improve train CLI script
2017-06-03 13:28:20 -05:00
ines
746653880c
Add English norm exceptions to lex_attrs
2017-06-03 20:27:28 +02:00
ines
095eeeb12f
Update English tokenizer exceptions and add norms
2017-06-03 20:27:16 +02:00
ines
e5d426406a
Add base norm exceptions
2017-06-03 20:27:05 +02:00
ines
4c2bbc3ccc
Add add_lookups util function
2017-06-03 19:44:47 +02:00
ines
05fe6758a7
Set lexeme attributes for tokenizer special cases
2017-06-03 19:44:39 +02:00
ines
3152ee5ca2
Update serialization tests for tokenizer
2017-06-03 17:05:28 +02:00
ines
7c919aeb09
Make sure serializers and deserializers are ordered
2017-06-03 17:05:09 +02:00
ines
1ebd0d3f27
Add assert_packed_msg_equal util function
2017-06-03 17:04:30 +02:00
ines
de974f7bef
Add serializer tests for tokenizer
2017-06-03 13:26:34 +02:00
ines
0153b66a86
Return self in Tokenizer.from_bytes
2017-06-03 13:26:13 +02:00
ines
82154a1861
Add letter spacing to arrow label
2017-06-03 13:25:41 +02:00
ines
32c6f05de9
Adjust spacing and sizing in compact mode
2017-06-03 13:25:32 +02:00
ines
cc8c8617a4
Shut down displaCy server on KeyboardInterrupt
2017-06-03 13:24:56 +02:00
ines
70fbba7d08
Clone Doc to never merge punctuation on original Doc
2017-06-03 13:24:43 +02:00
ines
459a1e8470
Fix whitespace
2017-06-03 11:31:18 +02:00
ines
5109bba910
Port over fix from #1070
2017-06-03 11:31:11 +02:00
ines
d21459f87d
Update serializer tests
2017-06-02 21:42:26 +02:00
ines
6669583f4e
Use OrderedDict
2017-06-02 21:07:56 +02:00
ines
2f1025a94c
Port over Spanish changes from #1096
2017-06-02 19:09:58 +02:00
ines
d86e7cde93
Add entity recognizer to parser serialization tests
2017-06-02 18:40:06 +02:00
ines
0051c05964
Add tests for serializing parser
2017-06-02 18:37:19 +02:00
ines
fdd0923be4
Translate model=True in exclude to lower_model and upper_model
2017-06-02 18:37:07 +02:00
ines
cef547a9f0
Add serialization tests for tensorizer
2017-06-02 18:18:30 +02:00
ines
924c58bde3
Fix serialization of optional elements
2017-06-02 18:18:17 +02:00
ines
f74a45c1fe
Remove unnecessary argument
2017-06-02 18:17:46 +02:00
ines
43b4d63f85
Add serialization tests for tagger
2017-06-02 17:29:34 +02:00
ines
1b593bbd6d
Fix encoding on tagger serialization
2017-06-02 17:29:21 +02:00
Matthew Honnibal
5f4d328e2c
Fix serialization of tag_map in NeuralTagger
2017-06-02 10:18:37 -05:00
Matthew Honnibal
ed6f575e06
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-02 04:26:39 -05:00
ines
acd65c00f6
Add serialization tests for StringStore and Vocab
2017-06-02 10:57:42 +02:00
ines
41a6adf1f6
Initialise Vocab length correctly
2017-06-02 10:57:25 +02:00
ines
53b82f972a
Add strings to Vocab in init, instead of StringStore
2017-06-02 10:57:06 +02:00
ines
023f38bdd4
Fix return value of Vocab.from_bytes
2017-06-02 10:56:40 +02:00
ines
9692c98f57
Add test utils for temp file and temp dir
2017-06-02 10:56:09 +02:00
Matthew Honnibal
c650bc481c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-01 13:03:57 -05:00
Matthew Honnibal
307d615c5f
Fix serialization for tagger when tag_map has changed
2017-06-01 12:18:36 -05:00
Matthew Honnibal
1d18cedae8
Fiddle with msgpack bytes vs unicode
2017-06-01 10:48:43 -05:00
ines
7a2380f617
Rename "nn_tagger" to "tagger"
2017-06-01 17:37:53 +02:00
ines
e5ae6ccf4e
Fix typo
2017-06-01 16:46:15 +02:00
ines
a3e4f91f4a
Only load vocab if it exists
2017-06-01 14:38:35 +02:00
Matthew Honnibal
d310b0aab3
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-01 04:58:03 -05:00
Matthew Honnibal
3ff7d7fcef
Merge for updated requirements
2017-06-01 04:57:47 -05:00
Matthew Honnibal
5eae3b9a1e
Fix to/from disk in tagger
2017-06-01 04:55:49 -05:00
ines
d5c8d2f5fd
Update about.py and increment version
2017-06-01 11:52:24 +02:00
Matthew Honnibal
4c97371051
Fixes for thinc 6.7
2017-06-01 04:22:16 -05:00
Matthew Honnibal
53d00a0371
Move weight serialization to Thinc
2017-06-01 03:04:36 -05:00
Matthew Honnibal
ae8010b526
Move weight serialization to Thinc
2017-06-01 02:56:12 -05:00
Gyorgy Orosz
f0c3b09242
More robust Hungarian tokenizer.
2017-05-31 22:28:40 +02:00
Matthew Honnibal
c8a58cfcf8
Fix Python2/3 load bug
2017-05-31 15:21:44 -05:00
Matthew Honnibal
99982684b0
Fix normalize_string_keys function
2017-05-31 14:08:16 -05:00
Matthew Honnibal
67ade63fc4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 08:28:42 -05:00
Matthew Honnibal
490b38e6bb
Fix reference to thinc copy_array util
2017-05-31 08:25:21 -05:00
Matthew Honnibal
9805e0e369
Fix vocab pickling
2017-05-31 08:25:01 -05:00
Matthew Honnibal
6c51cd77b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 15:06:56 +02:00
Matthew Honnibal
8dfb9546f0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 07:21:14 -05:00
Matthew Honnibal
480ef8bfc8
Add compat function to normalize dict keys
2017-05-31 07:14:29 -05:00
Matthew Honnibal
92f9e5cc9a
Silence env_opt, and fix serialization for GPU
2017-05-31 07:14:11 -05:00
Matthew Honnibal
0561df2a9d
Fix tokenizer serialization
2017-05-31 14:12:38 +02:00
Matthew Honnibal
4a398c15b7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 13:44:16 +02:00
Matthew Honnibal
097ab9c6e4
Fix transition system to/from disk
2017-05-31 13:44:00 +02:00
Matthew Honnibal
b1469d3360
Fix string serialisation
2017-05-31 13:43:44 +02:00
Matthew Honnibal
e9419072e7
Fix tokenizer serialisation
2017-05-31 13:43:31 +02:00
Matthew Honnibal
33e5ec737f
Fix to/from disk methods
2017-05-31 13:43:10 +02:00
ines
5e1c361270
Update tests README with info on model tests
2017-05-31 12:22:58 +02:00
Matthew Honnibal
fe28602f2e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 11:43:56 +02:00
Matthew Honnibal
66af019d5d
Fix serialization of tokenizer
2017-05-31 11:43:40 +02:00
Ines Montani
e6cf3c7e1c
Merge pull request #1093 from oroszgy/hu_emoji_fix
...
Fixed emoji handling for Hungarian
2017-05-31 11:33:24 +02:00
Matthew Honnibal
e98eff275d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-31 10:29:15 +02:00
Matthew Honnibal
53a3824334
Fix mistake in ner feature
2017-05-31 03:01:02 +02:00
Matthew Honnibal
8a693c2605
Write binary file during training
2017-05-31 02:59:18 +02:00
Matthew Honnibal
498ad85309
Try using tensor for vector/similarity methods
2017-05-30 23:35:17 +02:00
Matthew Honnibal
a131981f3b
Work on vectors
2017-05-30 23:34:50 +02:00
Matthew Honnibal
6937e311a4
Update doc tests
2017-05-30 23:34:23 +02:00
Matthew Honnibal
cc911feab2
Fix bug in NER state
2017-05-30 22:12:19 +02:00
Gyorgy Orosz
8c0b4b850e
Fixed emoji handling for Hungarian
2017-05-30 21:34:46 +02:00
Matthew Honnibal
be4a640f0c
Fix arc eager label costs for uint64
2017-05-30 20:37:58 +02:00
Matthew Honnibal
b127645afc
Fix test_misc merge conflict
2017-05-29 18:31:44 -05:00
Matthew Honnibal
e0e8eae7c7
Tweak package test
2017-05-29 18:30:42 -05:00
Matthew Honnibal
11840ff5dd
Store tag map before normalizing props
2017-05-29 17:53:48 -05:00
Matthew Honnibal
b92a89f87b
Make it easier to reference embedding tables
2017-05-29 17:53:29 -05:00
Matthew Honnibal
293d1b425b
Serialize in consistent order
2017-05-29 17:53:06 -05:00
Matthew Honnibal
9bf22a94aa
Fix tag set serialisation
2017-05-29 17:52:36 -05:00
Matthew Honnibal
2a061e2777
Fix serialisation, for reals this time
2017-05-29 17:52:08 -05:00
ines
20a7003c0d
Update model fixtures and reorganise tests
2017-05-29 22:14:31 +02:00
ines
795fe43a4d
Add load_test_model function with importorskip()
...
Loads model only if it can be imported, i.e. if it's installed as a
package.
2017-05-29 22:11:31 +02:00
ines
ad3c8b3ad9
Fix formatting
2017-05-29 22:10:50 +02:00
ines
6e3937efc5
Check for arguments of model markers to specify models to test
...
Lets user set --models --en for only English models
2017-05-29 22:10:16 +02:00
Matthew Honnibal
35d981241f
Fix model deserialization
2017-05-29 14:46:31 -05:00
Matthew Honnibal
5b29f227ae
Fix serialization
2017-05-29 14:35:53 -05:00
Matthew Honnibal
1e6df0a2a1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 14:30:12 -05:00
ines
08382f21e3
Pass model meta to nlp object in load_model
2017-05-29 20:44:11 +02:00
ines
6145fe6a93
Catch all kwargs on Language
2017-05-29 20:43:48 +02:00
ines
0d7d50fe22
Add __version__ to __init__.py
2017-05-29 20:43:24 +02:00
Matthew Honnibal
6522ea6c8b
More serialization fixes. Still broken
2017-05-29 13:23:47 -05:00
Matthew Honnibal
9c9ee24411
Fix broken lambda scoping in Python 2
2017-05-29 13:23:28 -05:00
Matthew Honnibal
f1acdaab55
Fix serialization of weight offsets
2017-05-29 13:23:11 -05:00
Matthew Honnibal
c044e9c21c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 08:41:02 -05:00
Matthew Honnibal
aa4c33914b
Work on serialization
2017-05-29 08:40:45 -05:00
ines
9e83a17e95
Use new model templates
2017-05-29 15:27:24 +02:00
ines
567485a818
Fix and document model loading with pipeline and overrides
2017-05-29 14:10:10 +02:00
Matthew Honnibal
deac7eb01c
Fix for serialization
2017-05-29 13:54:18 +02:00
Matthew Honnibal
04c32aa091
Fix for serialization
2017-05-29 13:53:32 +02:00
Matthew Honnibal
a1960c2d09
Fix for serialization
2017-05-29 13:47:42 +02:00
Matthew Honnibal
7b06bb896e
Fix for serialization
2017-05-29 13:42:55 +02:00
Matthew Honnibal
74235587ef
Fix to serialization
2017-05-29 13:40:31 +02:00
Matthew Honnibal
59f355d525
Fixes for serialization
2017-05-29 13:38:20 +02:00
Matthew Honnibal
920887f4e4
Specify order of vocab deserialization
2017-05-29 13:04:40 +02:00
Matthew Honnibal
f4aafca222
Merge changes to test_misc
2017-05-29 12:26:02 +02:00
Matthew Honnibal
a318f0cae1
Add to/from disk/bytes methods for tokenizer
2017-05-29 12:24:41 +02:00
Matthew Honnibal
ff26aa6c37
Work on to/from bytes/disk serialization methods
2017-05-29 11:45:45 +02:00
ines
df920ba0e7
Add tests for displaCy and util functions and fix util typo
2017-05-29 10:51:19 +02:00
ines
c5714d4fb2
xfail matcher test for now until setting norm via Span.merge works
2017-05-29 10:51:02 +02:00
Matthew Honnibal
6b019b0540
Update to/from bytes methods
2017-05-29 10:14:20 +02:00
Matthew Honnibal
c91b121aeb
Move serialization functions to util
2017-05-29 10:13:42 +02:00
Matthew Honnibal
1fa2bfb600
Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc.
2017-05-29 09:27:04 +02:00
Matthew Honnibal
6dad4117ad
Work on serialization for models
2017-05-29 01:37:57 +02:00
ines
7b1ddcc04d
Add test for vocab serialization
2017-05-29 01:09:52 +02:00
ines
00b2094dc3
Fix typos, long integers and tests
2017-05-29 01:09:52 +02:00
ines
804dbb8d25
Add StringStore test for API docs
2017-05-29 01:09:52 +02:00
Matthew Honnibal
6cd5730ee7
Fix lex struct setters for strings
2017-05-29 01:05:09 +02:00
Matthew Honnibal
2edd96ce47
Draft Vocab to/from disk/bytes
2017-05-28 23:34:12 +02:00
Matthew Honnibal
4ddff020c3
Fix compile error
2017-05-28 23:30:40 +02:00
Matthew Honnibal
6d3caeadd2
Fix type check for long
2017-05-28 23:22:45 +02:00
Matthew Honnibal
92dbf28c1e
Hack a fixture in the vectors tests, for xfail
2017-05-28 20:28:32 +02:00
Matthew Honnibal
9239f06ed3
Fix German noun chunks iterator
2017-05-28 20:13:03 +02:00
Matthew Honnibal
fd9b6722a9
Fix noun chunks iterator for new stringstore
2017-05-28 20:12:10 +02:00
ines
414193e9ba
Update docs to reflect StringStore changes
2017-05-28 18:19:11 +02:00
Matthew Honnibal
7996d21717
Fixes for new StringStore
2017-05-28 11:09:27 -05:00
Matthew Honnibal
8a24c60c1e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 08:12:05 -05:00
Matthew Honnibal
bc97bc292c
Fix __call__ method
2017-05-28 08:11:58 -05:00
Matthew Honnibal
5cf47b847b
Handle iob with no tag in converter
2017-05-28 08:11:39 -05:00
Matthew Honnibal
fe11564b8e
Finish stringstore change. Also xfail vectors tests
2017-05-28 15:10:22 +02:00
Matthew Honnibal
b007a2b0d3
Update stringstore tests
2017-05-28 14:08:09 +02:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
Matthew Honnibal
fe4a746300
Accommodate symbols in new string scheme
2017-05-28 13:03:16 +02:00
Matthew Honnibal
f51e6a6c16
Adjust lexeme sizing for attr_t being 64 bit
2017-05-28 12:51:09 +02:00
Matthew Honnibal
a5606c3eda
Work on changing StringStore to return hashes.
2017-05-28 12:36:27 +02:00
Matthew Honnibal
39293ab2ee
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 11:46:57 +02:00
Matthew Honnibal
dd052572d4
Update arc eager for SBD changes
2017-05-28 11:46:51 +02:00
Matthew Honnibal
3ea98e2043
Remove vector member from lexeme
2017-05-28 11:46:24 +02:00
Matthew Honnibal
2445707f3c
Re-delegate vectors to vocab
2017-05-28 11:46:10 +02:00
Matthew Honnibal
6863d01361
Remove vectors from lexeme
2017-05-28 11:45:48 +02:00
Matthew Honnibal
15f6efc127
Remove vectors from vocab
2017-05-28 11:45:32 +02:00
Matthew Honnibal
c1263a844b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-27 18:32:57 -05:00
Matthew Honnibal
9e711c3476
Divide d_loss by batch size
2017-05-27 18:32:46 -05:00
Matthew Honnibal
b082f76494
Randomize pipeline order during training
2017-05-27 18:32:21 -05:00