Commit Graph

11857 Commits

Author SHA1 Message Date
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have keys other than strings

This modification may slow down the tokenizer and needs to be measured
2017-11-15 17:55:48 +03:00
Matthew Honnibal
cf0be62096 Increment version 2017-11-15 15:00:18 +01:00
Matthew Honnibal
716ccbb71e Require thinc 6.10.1 2017-11-15 14:59:34 +01:00
ines
97a4f9362b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 14:24:00 +01:00
ines
8e65247886 Fix lex.id if vectors is None 2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852
Merge pull request #1570 from explosion/feature/fix-beam-leak
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a Set lex ID correctly for new tokens in Vocab 2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b Fix caching in tokenizer 2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6 Improve profiling 2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 13:11:43 +01:00
Ines Montani
9177c7d7aa
Merge pull request #1583 from yogendrasoni/master (resolves #1582)
Add rstrip after reading line from vec file #1582
2017-11-15 12:02:48 +00:00
ines
c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
yogendrasoni
334ed433b2
rstrip line before rsplit
Loading English fastText vectors gave an error because each line contains a newline at the end, so rsplit split it incorrectly
2017-11-15 13:55:08 +05:30
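A minimal Python sketch of why `rstrip` has to run before `rsplit`, assuming each line of the `.vec` file has the form `word v1 ... vN` followed by trailing whitespace (here a space and a newline). `parse_vec_line` is a hypothetical helper for illustration, not spaCy's actual loader code.

```python
def parse_vec_line(line, nr_dim):
    """Split one line of a text word-vector file into (word, vector).

    Hypothetical helper for illustration; not spaCy's actual loader.
    """
    # Strip trailing whitespace (including the newline) first, so the
    # right-hand split only sees the nr_dim numeric fields.
    pieces = line.rstrip().rsplit(" ", nr_dim)
    word = pieces[0]
    vector = [float(x) for x in pieces[1:]]
    if len(vector) != nr_dim:
        raise ValueError(f"expected {nr_dim} values, got {len(vector)}")
    return word, vector


# Without rstrip, the trailing " \n" becomes its own field and the
# first vector value stays glued to the word.
print("the 0.1 0.2 0.3 \n".rsplit(" ", 3))     # ['the 0.1', '0.2', '0.3', '\n']
print(parse_vec_line("the 0.1 0.2 0.3 \n", 3))  # ('the', [0.1, 0.2, 0.3])
```

Splitting from the right with a fixed count keeps the numeric fields separate even if the word itself contains spaces.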
Matthew Honnibal
d274d3a3b9 Let beam forward use minibatches 2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872 Remove state hashing 2017-11-14 23:36:46 +01:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
a33d5a068d Try to hold origin data instead of restore it 2017-11-14 22:40:03 +03:00
ines
40c4e8fc09 Remove "optional" from dev_data arg and add more info (see #1578) 2017-11-14 20:26:05 +01:00
Roman Domrachev
91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman
caae77f72d
Update strings.pyx 2017-11-14 19:44:40 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
870defa815 Swap keys in proper place
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev
86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
2512ea9eeb Fix memory leak in beam parser 2017-11-14 02:11:40 +01:00
Ines Montani
48b6cfe59e
Merge pull request #1569 from KMLDS/patch-1
trivial typo in docs
2017-11-14 01:46:34 +01:00
Matthew Honnibal
86ddf692a1 Fix bug in limit calculation on dev data 2017-11-14 01:37:10 +01:00
KMLDS
d5b20ac3b6
Update span.jade 2017-11-13 19:27:20 -05:00
Ines Montani
ea6c85c67a
Merge pull request #1566 from MathiasDesch/master (resolves #1248)
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe Cleanup states after beam parsing, explicitly 2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73 Remove __dealloc__ from ParserBeam 2017-11-13 18:18:08 +01:00
Mathias Deschamps
d82f868e1c Ignore pycharm project files 2017-11-13 17:46:05 +01:00
Mathias Deschamps
c0691b2ab4 Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9 Add norm exception for ing verbs
Some -ing verbs are sometimes written with -in or -in' endings. Make the NORM form correct
2017-11-13 17:46:05 +01:00
ines
0e5642593e Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-13 17:00:07 +01:00
ines
bc79274706 Fix typo 2017-11-13 17:00:03 +01:00
Ines Montani
339675c9fb
Merge pull request #1565 from DuyguA/patch-2
added contributor agreement for DuyguA
2017-11-13 16:21:50 +01:00
Ines Montani
0cf0fca174
Merge pull request #1564 from DuyguA/patch-1
cleaned encoding problems
2017-11-13 16:21:24 +01:00
Ines Montani
6ef702f79f
Merge pull request #1563 from abhi18av/patch-2
improved upon the list of included stop_words
2017-11-13 16:14:28 +01:00
Duygu Altinok
c263c3acce
added contributor agreement for DuyguA 2017-11-13 15:45:13 +01:00
Duygu Altinok
c0ceb775fb
cleaned encoding problems
Some Turkish-only letters had encoding problems. For instance, "altmýþ" on line 13 is actually "altmış" on line 14. I cleaned up the duplicates caused by this problem and also went over the word list once.
2017-11-13 14:53:21 +01:00
Abhinav Sharma
4dd34058a2
Create abhi18av.md 2017-11-13 17:23:05 +05:30
Abhinav Sharma
59f5740ede
improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
ines
7a7b01feb1 Update links 2017-11-13 08:30:06 +01:00
ines
b3e502a076 Add videos section to resources 2017-11-13 08:29:57 +01:00
ines
f2b6b98b75 Fix typo in code example (resolves #1556) 2017-11-13 08:29:16 +01:00
Matthew Honnibal
f0e28e8ae5
Make fasttext reader accommodate whitespace 2017-11-12 12:07:13 +01:00