Commit Graph

4573 Commits

Author SHA1 Message Date
Mathias Deschamps
288298ead9 Add norm exception for ing verbs
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4 Create a preprocess function that gets bigrams 2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3 Edit comment 2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3 Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
35653bef3a Add missing import (fixes #1546) 2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5 Re-add python -m to commands, too brittle :( (see #1536) 2017-11-10 02:30:55 +01:00
ines
123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f Set version for 2.0.2 release 2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7 Increment version 2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe Strip dev suffixes from version for compatibility check 2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e Exclude .devN versioning from compatibility check 2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44 Fix typo in message 2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516 Work on Windows test failure 2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9 Fix serialization 2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28 Fix dtype 2017-11-08 12:18:32 +01:00
Matthew Honnibal
fa7fdd0d9b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 12:11:31 +01:00
Matthew Honnibal
072ff38a01 Try to fix python3.5 serialization 2017-11-08 12:10:49 +01:00
Ines Montani
3a0f34d567 Merge pull request #1509 from abhi18av/patch-1
Create examples.py for Hindi language
2017-11-08 11:37:19 +01:00
Ines Montani
42b241ccd0 Update language code in usage example in comment 2017-11-08 11:36:38 +01:00
Matthew Honnibal
e262e8d942 Increment version to v2.0.2.dev0 2017-11-08 11:25:47 +01:00
Matthew Honnibal
a8b592783b Make a dtype more specific, to fix a windows build 2017-11-08 11:24:35 +01:00
Abhinav Sharma
84edade82d Create examples.py
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
Matthew Honnibal
d725aee4e2 Increment version to 2.0.1 2017-11-08 02:14:47 +01:00
Matthew Honnibal
8d6f68f1df Increment version 2017-11-08 01:12:34 +01:00
ines
bcf42b8846 Fix typo 2017-11-08 01:06:37 +01:00
Matthew Honnibal
bbd2a3dee1 Fix title in about.py 2017-11-07 14:02:58 +01:00
Matthew Honnibal
4efaf9306c Set version to spacy-nightly rc2 2017-11-07 13:27:26 +01:00
Matthew Honnibal
bf1ec2965f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 13:20:29 +01:00
Matthew Honnibal
726f689da4 Fix missing import 2017-11-07 13:20:12 +01:00
ines
834f9c1aab Update about.py 2017-11-07 13:11:33 +01:00
ines
a4662a31a9 Move model package templates to cli.package and update docs 2017-11-07 12:15:35 +01:00
ines
a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
Matthew Honnibal
9a88e66103 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 02:00:06 +01:00
Matthew Honnibal
174abe4677 Increment to 2.0.0rc1 2017-11-07 01:59:46 +01:00
ines
42a0fbf291 Fix textcat simple train example 2017-11-07 01:25:54 +01:00
ines
8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal
1cab703bba Move minibatch function to util 2017-11-06 23:45:36 +01:00
ines
5f43953536 Move test 2017-11-06 23:14:10 +01:00
Matthew Honnibal
dd90fe09f5 Remove extraneous label from textcat class 2017-11-06 22:09:02 +01:00
Matthew Honnibal
45e0617e61 Allow Language.update to take unicode text and dict objects 2017-11-06 22:07:38 +01:00
Matthew Honnibal
1831dbd065 Add test of simple textcat workflow 2017-11-06 22:04:29 +01:00
Matthew Honnibal
ffb9101f3f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-06 19:20:41 +01:00
Matthew Honnibal
8fea512ac8 Don't set tensor in textcat 2017-11-06 19:20:14 +01:00
ines
acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
Matthew Honnibal
7d46793dd7 Add PRON_LEMMA to spacy.symbols 2017-11-06 17:38:25 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
Matthew Honnibal
75e1618ec3 Fix lemma clobbering 2017-11-06 16:56:19 +01:00
Matthew Honnibal
6fdffd7246 Merge pull request #1497 from explosion/feature/improve-optimizer-handling
💫 Improve optimizer handling
2017-11-06 16:41:15 +01:00
Matthew Honnibal
8e6795437b Set release=True 2017-11-06 16:39:32 +01:00
Matthew Honnibal
5c85bf3791 Fix missing import 2017-11-06 15:06:27 +01:00
Matthew Honnibal
25859dbb48 Return optimizer from begin_training, creating if necessary 2017-11-06 14:26:49 +01:00
Matthew Honnibal
465adfee94 Remove unused resume_training method, and pass optimizer through 2017-11-06 14:26:00 +01:00
Matthew Honnibal
13336a6197 Fix Adam import 2017-11-06 14:25:37 +01:00
Matthew Honnibal
2eb11d60f2 Add function create_default_optimizer to spacy._ml 2017-11-06 14:11:59 +01:00
Matthew Honnibal
31babe3c3f Fix non-clobbering lemmatization 2017-11-06 12:36:05 +01:00
Matthew Honnibal
63c6ae4191 Fix lemmatizer test 2017-11-06 11:57:06 +01:00
Matthew Honnibal
a86a0181b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 22:19:10 +01:00
Matthew Honnibal
134d3b8143 Fix morphology 2017-11-05 22:18:22 +01:00
ines
08d1cf850a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 21:41:58 +01:00
ines
baa231745c Fix Dutch tag map 2017-11-05 21:41:50 +01:00
Matthew Honnibal
46e62ad747 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 19:40:00 +01:00
Matthew Honnibal
bb25cb0f76 Avoid clobbering preset lemmas 2017-11-05 19:39:38 +01:00
ines
507ecb67af Fix Spanish tag map 2017-11-05 19:23:34 +01:00
Matthew Honnibal
320008352b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 18:46:15 +01:00
Matthew Honnibal
38109a0e4a Register SentenceSegmenter in Language.factories 2017-11-05 18:45:57 +01:00
ines
975e1042ff Fix Italian tag map 2017-11-05 18:34:09 +01:00
ines
6b2d6e4937 Fix Portuguese tag map 2017-11-05 18:31:00 +01:00
ines
fa2687fded Fix Dutch tag map 2017-11-05 17:57:59 +01:00
ines
fb8990d916 Fix Spanish tag map 2017-11-05 17:48:46 +01:00
ines
9d13288f73 Fix French tag map 2017-11-05 17:47:59 +01:00
ines
54579805c5 Fix French tag map 2017-11-05 17:44:05 +01:00
Matthew Honnibal
2b35bb76ad Fix tensorizer on GPU 2017-11-05 15:34:40 +01:00
Matthew Honnibal
6e5181bbaa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 15:33:56 +01:00
Matthew Honnibal
6f438b17c1 Increment version to v2.0.0a19 2017-11-05 14:43:36 +01:00
Matthew Honnibal
225cc249c9 Pass string path to numpy, to fix #1479 2017-11-05 14:42:46 +01:00
Matthew Honnibal
00435d8f0c Add extra beam parsing test 2017-11-05 14:39:57 +01:00
Matthew Honnibal
e777ea25bb Merge pull request #1492 from uwol/develop
TextCategorizer return parameter fix
2017-11-05 14:13:04 +01:00
Matthew Honnibal
0d4bd6414e Fix Italian tag map 2017-11-05 14:11:03 +01:00
ines
ef597622a6 Add Portuguese tag map 2017-11-05 13:58:34 +01:00
ines
793c62dfda Add Dutch tag map 2017-11-05 13:48:07 +01:00
ines
f7485a09c8 Fix Italian tag map 2017-11-05 13:12:58 +01:00
uwol
a2162b8908 tensorizer return parameter fix 2017-11-05 12:25:10 +01:00
ines
0a27afbf86 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:32:52 +01:00
ines
3cef901834 Add tag map for French and Italian 2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998 Undo harmful pickling hacks on Language class 2017-11-04 23:07:03 +01:00
ines
6c15aafebd Fix formatting 2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948 Fix parser test 2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912 Add padding vector in parser, to make gradient more correct 2017-11-04 00:23:23 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f Expose parser's tok2vec model component 2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9 Update tensorizer component 2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29 Update model after optimising it instead of waiting 2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667 Make test less flakey 2017-11-03 14:36:08 +01:00
Matthew Honnibal
7fea845374 Remove print statement 2017-11-03 14:04:51 +01:00
Matthew Honnibal
0a534ae96a Fix test for backprop d_pad 2017-11-03 14:04:16 +01:00
Matthew Honnibal
33bd2428db Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 13:29:56 +01:00
Matthew Honnibal
6681058abd Fix tensor extending in tagger 2017-11-03 13:29:36 +01:00
Matthew Honnibal
bd2cbdfa85 Make Morphology not fail on unknown tags 2017-11-03 13:29:09 +01:00
Matthew Honnibal
c9b118a7e9 Set softmax attr in tagger model 2017-11-03 11:22:01 +01:00
Matthew Honnibal
a5b05f85f0 Set Doc.tensor attribute in parser 2017-11-03 11:21:00 +01:00
Matthew Honnibal
62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
Matthew Honnibal
d6fc39c8a6 Set Doc.tensor from Tagger 2017-11-03 11:20:05 +01:00
Matthew Honnibal
b3264aa5f0 Expose the softmax layer in the tagger model, to allow setting tensors 2017-11-03 11:19:51 +01:00
Matthew Honnibal
c2bbf076a4 Add document length cap for training 2017-11-03 01:54:54 +01:00
Matthew Honnibal
6771780d3f Fix backprop of padding variable 2017-11-03 01:54:34 +01:00
Matthew Honnibal
54a716f2ec Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 00:55:20 +01:00
Matthew Honnibal
260e6ee3fb Improve efficiency of backprop of padding variable 2017-11-03 00:49:11 +01:00
Matthew Honnibal
a22f96c3f1 Add test for backpropagating padding 2017-11-03 00:48:54 +01:00
ines
9baab241b4 Add skeleton language data for Turkish 2017-11-02 16:32:24 +01:00
ines
c6fea3e5f6 Add Romanian and Croatian skeletons (experimental)
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines
18c859500b Add missing imports 2017-11-01 23:02:51 +01:00
ines
819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines
3af281a334 Update test model name 2017-11-01 23:02:00 +01:00
Matthew Honnibal
b30dd36179 Allow Tagger.add_label() before training 2017-11-01 21:49:24 +01:00
Matthew Honnibal
eca41f0cf6 Fix filename conversion for conllu 2017-11-01 21:26:49 +01:00
Matthew Honnibal
e237472cdc Fix tag and filename conversion for conllu 2017-11-01 21:25:33 +01:00
Matthew Honnibal
b84d99b281 Revert tagger.add_label() changes, to fix model 2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b Fix tagger model loading 2017-11-01 20:42:36 +01:00
Matthew Honnibal
624644adfe Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 20:26:41 +01:00
ines
5f661a1b3a Remove tensorizer from pre-set pipe_names 2017-11-01 19:48:33 +01:00
Matthew Honnibal
190522efd3 Fix tagger when some tags aren't in Morphology 2017-11-01 19:27:49 +01:00
Matthew Honnibal
e85e31cfbd Fix backprop of d_pad 2017-11-01 19:27:26 +01:00
Matthew Honnibal
759cc79185 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 19:00:19 +01:00
Matthew Honnibal
1ae40b50b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 17:07:02 +01:00
Matthew Honnibal
7ae1aacdb8 Fix add_label methods 2017-11-01 17:06:43 +01:00
ines
8c2260e18c Move span tests to /doc 2017-11-01 16:56:35 +01:00
Matthew Honnibal
2ef7b59eb0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:51:41 +01:00
ines
1d1f91a041 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:49:44 +01:00
ines
9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines
260cb37224 Catch deprecation warning 2017-11-01 16:49:18 +01:00