Commit Graph

4563 Commits

Author SHA1 Message Date
Felix Sonntag
33b0f86de3 Changed tokenizer to add infix when infix_start is offset 2017-11-19 16:32:10 +01:00
Felix Sonntag
8be3392302 Added regression test for #1494 2017-11-19 16:30:35 +01:00
Motoki Wu
a52e195a0a Fixes Issue #1207 where noun_chunks of Span gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
Motoki Wu
b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator works for `Doc` but fails for `Span`.
2017-11-17 17:04:27 -08:00
Vadim Mazaev
a0739a06d4 Restored Russian support from v1.10 branch 2017-11-17 17:06:15 +03:00
yuukos
7401152289 updated Russian tokenizer
moved the attempt to import pymorph into __init__
2017-11-17 17:04:50 +03:00
yuukos
3aad66cf00 added russian language support 2017-11-17 17:04:22 +03:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
61d28d03e4 Try again to do selective remove cache 2017-11-15 19:11:12 +03:00
Roman Domrachev
b3311100c7 Merge branch 'master' of github.com:explosion/spaCy 2017-11-15 18:30:04 +03:00
Matthew Honnibal
b60d92aca8 Increment version 2017-11-15 16:14:46 +01:00
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have keys other than strings

This modification can slow down the tokenizer and needs to be measured
2017-11-15 17:55:48 +03:00
Matthew Honnibal
cf0be62096 Increment version 2017-11-15 15:00:18 +01:00
ines
97a4f9362b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 14:24:00 +01:00
ines
8e65247886 Fix lex.id if vectors is None 2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852 Merge pull request #1570 from explosion/feature/fix-beam-leak
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a Set lex ID correctly for new tokens in Vocab 2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b Fix caching in tokenizer 2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6 Improve profiling 2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 13:11:43 +01:00
ines
c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
Matthew Honnibal
d274d3a3b9 Let beam forward use minibatches 2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872 Remove state hashing 2017-11-14 23:36:46 +01:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
a33d5a068d Try to hold the original data instead of restoring it 2017-11-14 22:40:03 +03:00
Roman Domrachev
91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0 Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman
caae77f72d Update strings.pyx 2017-11-14 19:44:40 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
870defa815 Swap keys in proper place
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev
86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
2512ea9eeb Fix memory leak in beam parser 2017-11-14 02:11:40 +01:00
Matthew Honnibal
86ddf692a1 Fix bug in limit calculation on dev data 2017-11-14 01:37:10 +01:00
Ines Montani
ea6c85c67a Merge pull request #1566 from MathiasDesch/master (resolves #1248)
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe Cleanup states after beam parsing, explicitly 2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73 Remove __dealloc__ from ParserBeam 2017-11-13 18:18:08 +01:00
Mathias Deschamps
c0691b2ab4 Add tokenizer exceptions for -ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9 Add norm exception for -ing verbs
Some -ing verbs are sometimes written with in or in' instead. Make the NORM form correct.
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4 Create a preprocess function that gets bigrams 2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3 Edit comment 2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3 Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
35653bef3a Add missing import (fixes #1546) 2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5 Re-add python -m to commands, too brittle :( (see #1536) 2017-11-10 02:30:55 +01:00
ines
123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f Set version for 2.0.2 release 2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7 Increment version 2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe Strip dev suffixes from version for compatibility check 2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e Exclude .devN versioning from compatibility check 2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44 Fix typo in message 2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516 Work on Windows test failure 2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9 Fix serialization 2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28 Fix dtype 2017-11-08 12:18:32 +01:00
Matthew Honnibal
fa7fdd0d9b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 12:11:31 +01:00
Matthew Honnibal
072ff38a01 Try to fix python3.5 serialization 2017-11-08 12:10:49 +01:00
Ines Montani
3a0f34d567 Merge pull request #1509 from abhi18av/patch-1
Create examples.py for Hindi language
2017-11-08 11:37:19 +01:00
Ines Montani
42b241ccd0 Update language code in usage example in comment 2017-11-08 11:36:38 +01:00
Matthew Honnibal
e262e8d942 Increment version to v2.0.2.dev0 2017-11-08 11:25:47 +01:00
Matthew Honnibal
a8b592783b Make a dtype more specific, to fix a windows build 2017-11-08 11:24:35 +01:00
Abhinav Sharma
84edade82d Create examples.py
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
Matthew Honnibal
d725aee4e2 Increment version to 2.0.1 2017-11-08 02:14:47 +01:00
Matthew Honnibal
8d6f68f1df Increment version 2017-11-08 01:12:34 +01:00
ines
bcf42b8846 Fix typo 2017-11-08 01:06:37 +01:00
Matthew Honnibal
bbd2a3dee1 Fix title in about.py 2017-11-07 14:02:58 +01:00
Matthew Honnibal
4efaf9306c Set version to spacy-nightly rc2 2017-11-07 13:27:26 +01:00
Matthew Honnibal
bf1ec2965f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 13:20:29 +01:00
Matthew Honnibal
726f689da4 Fix missing import 2017-11-07 13:20:12 +01:00
ines
834f9c1aab Update about.py 2017-11-07 13:11:33 +01:00
ines
a4662a31a9 Move model package templates to cli.package and update docs 2017-11-07 12:15:35 +01:00
ines
a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
Matthew Honnibal
9a88e66103 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 02:00:06 +01:00
Matthew Honnibal
174abe4677 Increment to 2.0.0rc1 2017-11-07 01:59:46 +01:00
ines
42a0fbf291 Fix textcat simple train example 2017-11-07 01:25:54 +01:00
ines
8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal
1cab703bba Move minibatch function to util 2017-11-06 23:45:36 +01:00
ines
5f43953536 Move test 2017-11-06 23:14:10 +01:00
Matthew Honnibal
dd90fe09f5 Remove extraneous label from textcat class 2017-11-06 22:09:02 +01:00
Matthew Honnibal
45e0617e61 Allow Language.update to take unicode text and dict objects 2017-11-06 22:07:38 +01:00
Matthew Honnibal
1831dbd065 Add test of simple textcat workflow 2017-11-06 22:04:29 +01:00
Matthew Honnibal
ffb9101f3f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-06 19:20:41 +01:00
Matthew Honnibal
8fea512ac8 Don't set tensor in textcat 2017-11-06 19:20:14 +01:00
ines
acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
Matthew Honnibal
7d46793dd7 Add PRON_LEMMA to spacy.symbols 2017-11-06 17:38:25 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
Matthew Honnibal
75e1618ec3 Fix lemma clobbering 2017-11-06 16:56:19 +01:00
Matthew Honnibal
6fdffd7246 Merge pull request #1497 from explosion/feature/improve-optimizer-handling
💫 Improve optimizer handling
2017-11-06 16:41:15 +01:00