3424 Commits

Author SHA1 Message Date
ines
8a29308d0b Remove unused imports 2017-06-04 22:39:29 +02:00
Ines Montani
112c5787eb Merge pull request #1101 from oroszgy/hu_tokenizer_fix
More robust Hungarian tokenizer.
2017-06-04 22:37:51 +02:00
ines
96867a24ae Fix typo 2017-06-04 22:36:40 +02:00
ines
f432bb4b48 Fix fixture scopes 2017-06-04 22:34:31 +02:00
Matthew Honnibal
6d0356e6cc Whitespace 2017-06-04 14:55:24 -05:00
Matthew Honnibal
8a683a4494 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 21:53:56 +02:00
Matthew Honnibal
92ae36f84e Improve way noun chunks iterator is looked up 2017-06-04 21:53:39 +02:00
ines
9254a3dd78 Import and add Spanish syntax iterators 2017-06-04 21:42:15 +02:00
ines
7db1a0e83e Make sure printed values are always strings 2017-06-04 21:27:20 +02:00
Matthew Honnibal
51e1541ddb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 14:26:29 -05:00
Matthew Honnibal
add9a33782 Return False for vocab.has_vector 2017-06-04 14:26:14 -05:00
Matthew Honnibal
675f448313 Fix vector linkage on Doc 2017-06-04 14:25:30 -05:00
Matthew Honnibal
f4662e9218 Fix vector linkage for token 2017-06-04 14:19:58 -05:00
ines
070e026ed9 Ensure path on read_json 2017-06-04 20:44:37 +02:00
ines
e1e73936b1 Raise correct error 2017-06-04 20:44:27 +02:00
ines
848e47669e Fix typo 2017-06-04 20:44:15 +02:00
ines
c4614c02a2 Fix dev resources URL 2017-06-04 15:45:50 +02:00
ines
a66cf24ee8 xfail tokenizer serialization tests for now
Tests pass locally, but not on Travis – needs more investigation
2017-06-04 13:58:20 +02:00
ines
7b7d46b64e Fix typo and success message 2017-06-04 13:45:50 +02:00
ines
90d117f378 Update version 2017-06-04 13:41:16 +02:00
Matthew Honnibal
7ca215bc26 Resolve lex_attr_getters conflict 2017-06-03 16:12:01 -05:00
Matthew Honnibal
21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal
d0e42f9275 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 15:30:32 -05:00
Matthew Honnibal
8a17b99b1c Use NORM attribute, not LOWER 2017-06-03 15:30:16 -05:00
ines
4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines
fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
Matthew Honnibal
3f5c85d8de Reorder setting of lex attrs, to avoid clobbering 2017-06-03 14:47:55 -05:00
Matthew Honnibal
aeb7520133 Make norm use lower-case 2017-06-03 14:47:38 -05:00
Matthew Honnibal
de3954843e Populate norm exceptions with lower-case 2017-06-03 14:47:12 -05:00
Matthew Honnibal
f6955a459c Fix prev commit 2017-06-03 14:38:37 -05:00
Matthew Honnibal
468ca6c760 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 14:33:51 -05:00
Matthew Honnibal
c647a0d33e Fix training counter for gold preprocessing 2017-06-03 14:33:39 -05:00
ines
e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines
d77c2cc8bb Add tests for English norm exceptions 2017-06-03 20:59:50 +02:00
ines
0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines
5bd311c77e Fix update of norm exceptions 2017-06-03 20:54:09 +02:00
Matthew Honnibal
94e063ae2a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-03 13:31:40 -05:00
Matthew Honnibal
fea1144e6d Set max batch size in evaluate 2017-06-03 13:31:33 -05:00
Matthew Honnibal
805495af27 Fix off-by-one in number of tags 2017-06-03 13:29:23 -05:00
Matthew Honnibal
e62f46d39f Clarify gold.pyx slightly 2017-06-03 13:28:52 -05:00
Matthew Honnibal
43353b5413 Improve train CLI script 2017-06-03 13:28:20 -05:00
ines
746653880c Add English norm exceptions to lex_attrs 2017-06-03 20:27:28 +02:00
ines
095eeeb12f Update English tokenizer exceptions and add norms 2017-06-03 20:27:16 +02:00
ines
e5d426406a Add base norm exceptions 2017-06-03 20:27:05 +02:00
ines
4c2bbc3ccc Add add_lookups util function 2017-06-03 19:44:47 +02:00
ines
05fe6758a7 Set lexeme attributes for tokenizer special cases 2017-06-03 19:44:39 +02:00
ines
3152ee5ca2 Update serialization tests for tokenizer 2017-06-03 17:05:28 +02:00
ines
7c919aeb09 Make sure serializers and deserializers are ordered 2017-06-03 17:05:09 +02:00
ines
1ebd0d3f27 Add assert_packed_msg_equal util function 2017-06-03 17:04:30 +02:00
ines
de974f7bef Add serializer tests for tokenizer 2017-06-03 13:26:34 +02:00
ines
0153b66a86 Return self in Tokenizer.from_bytes 2017-06-03 13:26:13 +02:00
ines
82154a1861 Add letter spacing to arrow label 2017-06-03 13:25:41 +02:00
ines
32c6f05de9 Adjust spacing and sizing in compact mode 2017-06-03 13:25:32 +02:00
ines
cc8c8617a4 Shut down displaCy server on KeyboardInterrupt 2017-06-03 13:24:56 +02:00
ines
70fbba7d08 Clone Doc to never merge punctuation on original Doc 2017-06-03 13:24:43 +02:00
ines
459a1e8470 Fix whitespace 2017-06-03 11:31:18 +02:00
ines
5109bba910 Port over fix from #1070 2017-06-03 11:31:11 +02:00
ines
d21459f87d Update serializer tests 2017-06-02 21:42:26 +02:00
ines
6669583f4e Use OrderedDict 2017-06-02 21:07:56 +02:00
ines
2f1025a94c Port over Spanish changes from #1096 2017-06-02 19:09:58 +02:00
ines
d86e7cde93 Add entity recognizer to parser serialization tests 2017-06-02 18:40:06 +02:00
ines
0051c05964 Add tests for serializing parser 2017-06-02 18:37:19 +02:00
ines
fdd0923be4 Translate model=True in exclude to lower_model and upper_model 2017-06-02 18:37:07 +02:00
ines
cef547a9f0 Add serialization tests for tensorizer 2017-06-02 18:18:30 +02:00
ines
924c58bde3 Fix serialization of optional elements 2017-06-02 18:18:17 +02:00
ines
f74a45c1fe Remove unnecessary argument 2017-06-02 18:17:46 +02:00
ines
43b4d63f85 Add serialization tests for tagger 2017-06-02 17:29:34 +02:00
ines
1b593bbd6d Fix encoding on tagger serialization 2017-06-02 17:29:21 +02:00
Matthew Honnibal
5f4d328e2c Fix serialization of tag_map in NeuralTagger 2017-06-02 10:18:37 -05:00
Matthew Honnibal
ed6f575e06 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-02 04:26:39 -05:00
ines
acd65c00f6 Add serialization tests for StringStore and Vocab 2017-06-02 10:57:42 +02:00
ines
41a6adf1f6 Initialise Vocab length correctly 2017-06-02 10:57:25 +02:00
ines
53b82f972a Add strings to Vocab in init, instead of StringStore 2017-06-02 10:57:06 +02:00
ines
023f38bdd4 Fix return value of Vocab.from_bytes 2017-06-02 10:56:40 +02:00
ines
9692c98f57 Add test utils for temp file and temp dir 2017-06-02 10:56:09 +02:00
Matthew Honnibal
c650bc481c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-01 13:03:57 -05:00
Matthew Honnibal
307d615c5f Fix serialization for tagger when tag_map has changed 2017-06-01 12:18:36 -05:00
Matthew Honnibal
1d18cedae8 Fiddle with msgpack bytes vs unicode 2017-06-01 10:48:43 -05:00
ines
7a2380f617 Rename "nn_tagger" to "tagger" 2017-06-01 17:37:53 +02:00
ines
e5ae6ccf4e Fix typo 2017-06-01 16:46:15 +02:00
ines
a3e4f91f4a Only load vocab if it exists 2017-06-01 14:38:35 +02:00
Matthew Honnibal
d310b0aab3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-01 04:58:03 -05:00
Matthew Honnibal
3ff7d7fcef Merge for updated requirements 2017-06-01 04:57:47 -05:00
Matthew Honnibal
5eae3b9a1e Fix to/from disk in tagger 2017-06-01 04:55:49 -05:00
ines
d5c8d2f5fd Update about.py and increment version 2017-06-01 11:52:24 +02:00
Matthew Honnibal
4c97371051 Fixes for thinc 6.7 2017-06-01 04:22:16 -05:00
Matthew Honnibal
53d00a0371 Move weight serialization to Thinc 2017-06-01 03:04:36 -05:00
Matthew Honnibal
ae8010b526 Move weight serialization to Thinc 2017-06-01 02:56:12 -05:00
Gyorgy Orosz
f0c3b09242 More robust Hungarian tokenizer. 2017-05-31 22:28:40 +02:00
Matthew Honnibal
c8a58cfcf8 Fix Python2/3 load bug 2017-05-31 15:21:44 -05:00
Matthew Honnibal
99982684b0 Fix normalize_string_keys function' 2017-05-31 14:08:16 -05:00
Matthew Honnibal
67ade63fc4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 08:28:42 -05:00
Matthew Honnibal
490b38e6bb Fix reference to thinc copy_array util 2017-05-31 08:25:21 -05:00
Matthew Honnibal
9805e0e369 Fix vocab pickling 2017-05-31 08:25:01 -05:00
Matthew Honnibal
6c51cd77b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 15:06:56 +02:00
Matthew Honnibal
8dfb9546f0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 07:21:14 -05:00
Matthew Honnibal
480ef8bfc8 Add compat function to normalize dict keys 2017-05-31 07:14:29 -05:00
Matthew Honnibal
92f9e5cc9a Silence env_opt, and fix serialization for GPU 2017-05-31 07:14:11 -05:00
Matthew Honnibal
0561df2a9d Fix tokenizer serialization 2017-05-31 14:12:38 +02:00
Matthew Honnibal
4a398c15b7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-31 13:44:16 +02:00