Commit Graph

4498 Commits

Author SHA1 Message Date
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Vadim Mazaev
52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex
attrs
2017-11-21 11:44:46 +03:00
Vadim Mazaev
a0739a06d4 Returned Russian support from v1.10 branch 2017-11-17 17:06:15 +03:00
yuukos
7401152289 updated Russian tokenizer
moved the attempt to import pymorph into __init__
2017-11-17 17:04:50 +03:00
yuukos
3aad66cf00 added Russian language support 2017-11-17 17:04:22 +03:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Matthew Honnibal
b60d92aca8 Increment version 2017-11-15 16:14:46 +01:00
Matthew Honnibal
cf0be62096 Increment version 2017-11-15 15:00:18 +01:00
ines
97a4f9362b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 14:24:00 +01:00
ines
8e65247886 Fix lex.id if vectors is None 2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852 Merge pull request #1570 from explosion/feature/fix-beam-leak
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a Set lex ID correctly for new tokens in Vocab 2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b Fix caching in tokenizer 2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6 Improve profiling 2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 13:11:43 +01:00
ines
c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
Matthew Honnibal
d274d3a3b9 Let beam forward use minibatches 2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872 Remove state hashing 2017-11-14 23:36:46 +01:00
Matthew Honnibal
2512ea9eeb Fix memory leak in beam parser 2017-11-14 02:11:40 +01:00
Matthew Honnibal
86ddf692a1 Fix bug in limit calculation on dev data 2017-11-14 01:37:10 +01:00
Ines Montani
ea6c85c67a Merge pull request #1566 from MathiasDesch/master (resolves #1248)
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe Cleanup states after beam parsing, explicitly 2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73 Remove __dealloc__ from ParserBeam 2017-11-13 18:18:08 +01:00
Mathias Deschamps
c0691b2ab4 Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9 Add norm exception for ing verbs
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4 Create a preprocess function that gets bigrams 2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3 Edit comment 2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3 Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
35653bef3a Add missing import (fixes #1546) 2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5 Re-add python -m to commands, too brittle :( (see #1536) 2017-11-10 02:30:55 +01:00
ines
123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f Set version for 2.0.2 release 2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7 Increment version 2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe Strip dev suffixes from version for compatibility check 2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e Exclude .devN versioning from compatibility check 2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44 Fix typo in message 2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516 Work on Windows test failure 2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9 Fix serialization 2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28 Fix dtype 2017-11-08 12:18:32 +01:00