Commit Graph

15224 Commits

Author SHA1 Message Date
Matthew Honnibal
5ce02c1b17
Merge pull request #5470 from svlandeg/bugfix/noun-chunks
Bugfix in noun chunks
2020-05-21 20:51:31 +02:00
Ines Montani
32c2bb3d99 Add course to landing [ci skip] 2020-05-21 20:50:17 +02:00
Matthw Honnibal
25b51f4fc8 Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
Matthw Honnibal
bc94fdabd0 Fix begin_training 2020-05-21 20:46:21 +02:00
Matthw Honnibal
d507ac28d8 Fix shape inference 2020-05-21 20:46:10 +02:00
Ines Montani
53da6bd672 Add course to landing [ci skip] 2020-05-21 20:45:33 +02:00
Ines Montani
cb02bff0eb Add blank:{lang} shortcut to util.load_mode 2020-05-21 20:24:07 +02:00
Matthw Honnibal
df87c32a40 Pass smaller doc sample into model initialize 2020-05-21 20:17:24 +02:00
Ines Montani
581bda9f98 Update senter test and auto-format 2020-05-21 20:17:14 +02:00
Ines Montani
0f1beb5ff2 Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
svlandeg
51715b9f72 span / noun chunk has +1 because end is exclusive 2020-05-21 19:56:56 +02:00
Adriane Boyd
132b2a6898 Merge remote-tracking branch 'upstream/master-tmp' into HEAD 2020-05-21 19:50:30 +02:00
Adriane Boyd
17ee9ab53a Fix _SP/POS=SPACE in strings serialization tests 2020-05-21 19:49:08 +02:00
Ines Montani
245f91df78 Fix merge issues 2020-05-21 19:42:13 +02:00
Matthw Honnibal
3b5cfec1fc Tweak memory management in train_from_config 2020-05-21 19:32:04 +02:00
Matthw Honnibal
f075655deb Fix shape inference in begin_training 2020-05-21 19:26:29 +02:00
svlandeg
84d5b7ad0a Merge remote-tracking branch 'upstream/master' into bugfix/noun-chunks
# Conflicts:
#	spacy/lang/el/syntax_iterators.py
#	spacy/lang/en/syntax_iterators.py
#	spacy/lang/fa/syntax_iterators.py
#	spacy/lang/fr/syntax_iterators.py
#	spacy/lang/id/syntax_iterators.py
#	spacy/lang/nb/syntax_iterators.py
#	spacy/lang/sv/syntax_iterators.py
2020-05-21 19:19:50 +02:00
svlandeg
f7d10da555 avoid unnecessary loop to check overlapping noun chunks 2020-05-21 19:15:57 +02:00
Matthw Honnibal
1729165e90 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-21 19:11:08 +02:00
Ines Montani
631e20d0c6 Fix test and schemas 2020-05-21 19:01:02 +02:00
Ines Montani
d34fc0915e Remove serialization getter 2020-05-21 18:48:21 +02:00
Ines Montani
f44897e4c6 Update warning IDs 2020-05-21 18:39:11 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Ines Montani
c6ec19c844 Add missing declaration 2020-05-21 17:30:05 +02:00
Matthew Honnibal
884d9b060d
Merge pull request #5466 from adrianeboyd/feature/omit-extra-lexeme-info
Add option to omit extra lexeme tables in CLI
2020-05-21 16:40:02 +02:00
Matthew Honnibal
e6c4c1a507
Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner
Improve handling of NER in CoNLL-U MISC
2020-05-21 16:39:46 +02:00
Matthew Honnibal
26cd6a0229
Merge pull request #5462 from adrianeboyd/feature/lemmatizer-all-upos
Extend lemmatizer rules for all UPOS tags
2020-05-21 16:05:31 +02:00
Matthew Honnibal
cad9b290a2
Merge branch 'master' into feature/omit-extra-lexeme-info 2020-05-21 16:04:24 +02:00
Matthew Honnibal
1f572ce89b
Merge pull request #5473 from explosion/fix/travis-tests
Fix Python 2.7 compat
2020-05-21 15:56:16 +02:00
Matthew Honnibal
7902ebc63c
Rename argument: doc_or_span/obj -> doclike (#5463)
* doc_or_span -> obj

* Revert "doc_or_span -> obj"

This reverts commit 78bb9ff5e0.

* obj -> doclike

* Refer to correct object
2020-05-21 15:17:54 +02:00
Ines Montani
a9cb2882cb
Rename argument: doc_or_span/obj -> doclike (#5463)
* doc_or_span -> obj

* Revert "doc_or_span -> obj"

This reverts commit 78bb9ff5e0.

* obj -> doclike

* Refer to correct object
2020-05-21 15:17:39 +02:00
Ines Montani
bea863acd2 Fix naming conflict and formatting 2020-05-21 14:24:38 +02:00
Ines Montani
bd6353715a Merge branch 'master' into fix/travis-tests 2020-05-21 14:23:04 +02:00
Ines Montani
e2fe83e35d Refer to correct object 2020-05-21 14:20:29 +02:00
Ines Montani
b1f45c9da3 obj -> doclike 2020-05-21 14:19:58 +02:00
Ines Montani
69fb4bedf2 Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
2020-05-21 14:14:28 +02:00
Ines Montani
d8f3190c0a Tidy up and auto-format 2020-05-21 14:14:01 +02:00
Ines Montani
56de520afd Try to fix tests on Travis (2.7) 2020-05-21 14:04:57 +02:00
Kevin Lu
a3b7ae4f98 Update universe.json 2020-05-21 13:59:09 +02:00
Ines Montani
f2a131bd9a
Merge pull request #5461 from kevinlu1248/master 2020-05-21 13:53:10 +02:00
adrianeboyd
d45602bc11
Merge branch 'master' into feature/omit-extra-lexeme-info 2020-05-21 10:26:01 +02:00
svlandeg
b221bcf1ba fixing all languages 2020-05-21 00:17:28 +02:00
svlandeg
b509a3e7fc fix: use actual range in 'seen' instead of subtree 2020-05-20 23:06:39 +02:00
svlandeg
36a94c409a failing test to reproduce overlapping spans problem 2020-05-20 23:06:03 +02:00
adrianeboyd
49ef06d793
Add option for base model in init-model CLI (#5467)
Intended for languages like Chinese with a custom tokenizer.
2020-05-20 18:49:11 +02:00
Adriane Boyd
4b229bfc22 Improve handling of NER in CoNLL-U MISC 2020-05-20 18:48:51 +02:00
Matthew Honnibal
609c0ba557
Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit

* Fix accidentally quadratic perf in TokenAnnotation.brackets

When we're reading in the gold data, we had a nested loop where
we looped over the brackets for each token, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags). So we had
an O(N**2) behaviour here that ended up being pretty slow.

To solve this I'm indexing the brackets by their starting word
on the TokenAnnotations object, and having a property to provide
the previous view.

* Fixes
2020-05-20 18:48:18 +02:00
Kevin Lu
c7c4cd5fe1
Changed pyate code example in universe.json 2020-05-20 09:11:32 -07:00
Adriane Boyd
daaa7bf451 Add option to omit extra lexeme tables in CLI 2020-05-20 15:51:44 +02:00
Adriane Boyd
8cba0e41d8 Return lowercase form as default except for PROPN 2020-05-20 15:35:08 +02:00