Matthw Honnibal
d507ac28d8
Fix shape inference
2020-05-21 20:46:10 +02:00
Ines Montani
53da6bd672
Add course to landing [ci skip]
2020-05-21 20:45:33 +02:00
Ines Montani
cb02bff0eb
Add blank:{lang} shortcut to util.load_model
2020-05-21 20:24:07 +02:00
Matthw Honnibal
df87c32a40
Pass smaller doc sample into model initialize
2020-05-21 20:17:24 +02:00
Ines Montani
581bda9f98
Update senter test and auto-format
2020-05-21 20:17:14 +02:00
Ines Montani
0f1beb5ff2
Tidy up and avoid absolute spacy imports in core
2020-05-21 20:05:03 +02:00
svlandeg
51715b9f72
span / noun chunk has +1 because end is exclusive
2020-05-21 19:56:56 +02:00
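
The "+1 because end is exclusive" note above can be illustrated with a small sketch. It assumes the commit refers to the usual half-open convention, where a span covers `[start, end)`, so a chunk whose last token has index `right_i` needs `end = right_i + 1`. The helper `chunk_bounds` and the sample token list are hypothetical and not taken from the spaCy source.

```python
# Hypothetical illustration; plain lists stand in for a spaCy Doc.
tokens = ["the", "quick", "fox", "jumps"]


def chunk_bounds(left_i, right_i):
    """Convert inclusive token indices into a half-open (start, end) pair."""
    # The end index is exclusive, so the last token needs right_i + 1.
    return left_i, right_i + 1


start, end = chunk_bounds(0, 2)
chunk = tokens[start:end]  # slice covers indices 0, 1 and 2
```

Without the `+ 1`, the slice would silently drop the final token of the chunk.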
Adriane Boyd
132b2a6898
Merge remote-tracking branch 'upstream/master-tmp' into HEAD
2020-05-21 19:50:30 +02:00
Adriane Boyd
17ee9ab53a
Fix _SP/POS=SPACE in strings serialization tests
2020-05-21 19:49:08 +02:00
Ines Montani
245f91df78
Fix merge issues
2020-05-21 19:42:13 +02:00
Matthw Honnibal
3b5cfec1fc
Tweak memory management in train_from_config
2020-05-21 19:32:04 +02:00
Matthw Honnibal
f075655deb
Fix shape inference in begin_training
2020-05-21 19:26:29 +02:00
svlandeg
84d5b7ad0a
Merge remote-tracking branch 'upstream/master' into bugfix/noun-chunks
# Conflicts:
# spacy/lang/el/syntax_iterators.py
# spacy/lang/en/syntax_iterators.py
# spacy/lang/fa/syntax_iterators.py
# spacy/lang/fr/syntax_iterators.py
# spacy/lang/id/syntax_iterators.py
# spacy/lang/nb/syntax_iterators.py
# spacy/lang/sv/syntax_iterators.py
2020-05-21 19:19:50 +02:00
svlandeg
f7d10da555
avoid unnecessary loop to check overlapping noun chunks
2020-05-21 19:15:57 +02:00
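
One way to avoid a separate loop for overlap checks, sketched under assumptions: if candidate chunks arrive in document order, remembering the last accepted right edge is enough to reject overlaps in a single pass. The function name `non_overlapping` and the tuple representation are hypothetical; this is not claimed to be the code in the commit.

```python
def non_overlapping(candidates):
    """Keep candidates that start after the previously kept one ends.

    `candidates` is a list of inclusive (left, right) token index pairs,
    assumed to be sorted in document order.
    """
    prev_end = -1  # right edge of the last kept chunk
    kept = []
    for left, right in candidates:
        if left <= prev_end:
            # Overlaps the previously kept chunk, so skip it.
            continue
        kept.append((left, right))
        prev_end = right
    return kept
```

Each candidate is compared against one integer rather than against every chunk seen so far.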
Matthw Honnibal
1729165e90
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-21 19:11:08 +02:00
Ines Montani
631e20d0c6
Fix test and schemas
2020-05-21 19:01:02 +02:00
Ines Montani
d34fc0915e
Remove serialization getter
2020-05-21 18:48:21 +02:00
Ines Montani
f44897e4c6
Update warning IDs
2020-05-21 18:39:11 +02:00
Ines Montani
24f72c669c
Merge branch 'develop' into master-tmp
2020-05-21 18:39:06 +02:00
Ines Montani
c6ec19c844
Add missing declaration
2020-05-21 17:30:05 +02:00
Matthew Honnibal
884d9b060d
Merge pull request #5466 from adrianeboyd/feature/omit-extra-lexeme-info
Add option to omit extra lexeme tables in CLI
2020-05-21 16:40:02 +02:00
Matthew Honnibal
e6c4c1a507
Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner
Improve handling of NER in CoNLL-U MISC
2020-05-21 16:39:46 +02:00
Matthew Honnibal
26cd6a0229
Merge pull request #5462 from adrianeboyd/feature/lemmatizer-all-upos
Extend lemmatizer rules for all UPOS tags
2020-05-21 16:05:31 +02:00
Matthew Honnibal
cad9b290a2
Merge branch 'master' into feature/omit-extra-lexeme-info
2020-05-21 16:04:24 +02:00
Matthew Honnibal
1f572ce89b
Merge pull request #5473 from explosion/fix/travis-tests
Fix Python 2.7 compat
2020-05-21 15:56:16 +02:00
Matthew Honnibal
7902ebc63c
Rename argument: doc_or_span/obj -> doclike (#5463)
* doc_or_span -> obj
* Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
* obj -> doclike
* Refer to correct object
2020-05-21 15:17:54 +02:00
Ines Montani
a9cb2882cb
Rename argument: doc_or_span/obj -> doclike (#5463)
* doc_or_span -> obj
* Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
* obj -> doclike
* Refer to correct object
2020-05-21 15:17:39 +02:00
Ines Montani
bea863acd2
Fix naming conflict and formatting
2020-05-21 14:24:38 +02:00
Ines Montani
bd6353715a
Merge branch 'master' into fix/travis-tests
2020-05-21 14:23:04 +02:00
Ines Montani
e2fe83e35d
Refer to correct object
2020-05-21 14:20:29 +02:00
Ines Montani
b1f45c9da3
obj -> doclike
2020-05-21 14:19:58 +02:00
Ines Montani
69fb4bedf2
Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
2020-05-21 14:14:28 +02:00
Ines Montani
d8f3190c0a
Tidy up and auto-format
2020-05-21 14:14:01 +02:00
Ines Montani
56de520afd
Try to fix tests on Travis (2.7)
2020-05-21 14:04:57 +02:00
Kevin Lu
a3b7ae4f98
Update universe.json
2020-05-21 13:59:09 +02:00
Ines Montani
f2a131bd9a
Merge pull request #5461 from kevinlu1248/master
2020-05-21 13:53:10 +02:00
adrianeboyd
d45602bc11
Merge branch 'master' into feature/omit-extra-lexeme-info
2020-05-21 10:26:01 +02:00
svlandeg
b221bcf1ba
fixing all languages
2020-05-21 00:17:28 +02:00
svlandeg
b509a3e7fc
fix: use actual range in 'seen' instead of subtree
2020-05-20 23:06:39 +02:00
svlandeg
36a94c409a
failing test to reproduce overlapping spans problem
2020-05-20 23:06:03 +02:00
adrianeboyd
49ef06d793
Add option for base model in init-model CLI (#5467)
Intended for languages like Chinese with a custom tokenizer.
2020-05-20 18:49:11 +02:00
Adriane Boyd
4b229bfc22
Improve handling of NER in CoNLL-U MISC
2020-05-20 18:48:51 +02:00
Matthew Honnibal
609c0ba557
Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit
* Fix accidentally quadratic perf in TokenAnnotation.brackets
When we're reading in the gold data, we had a nested loop where
we looped over the brackets for each token, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags). So we had
an O(N**2) behaviour here that ended up being pretty slow.
To solve this I'm indexing the brackets by their starting word
on the TokenAnnotations object, and having a property to provide
the previous view.
* Fixes
2020-05-20 18:48:18 +02:00
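
The fix described in the commit message above (indexing brackets by their starting word instead of scanning all brackets for every token) can be sketched as follows. This is a minimal illustration under assumptions: brackets are modelled as `(start, end, label)` tuples, and all function names are hypothetical rather than the actual `TokenAnnotation` API.

```python
from collections import defaultdict


def brackets_per_token_quadratic(n_tokens, brackets):
    """Nested loop: every token scans every bracket, O(N * B)."""
    return [
        [(end, label) for start, end, label in brackets if start == i]
        for i in range(n_tokens)
    ]


def index_brackets_by_start(brackets):
    """Group brackets by starting word in one pass, O(B)."""
    by_start = defaultdict(list)
    for start, end, label in brackets:
        by_start[start].append((end, label))
    return by_start


def brackets_per_token_indexed(n_tokens, brackets):
    """Same result as the quadratic version, using the index, O(N + B)."""
    by_start = index_brackets_by_start(brackets)
    return [by_start.get(i, []) for i in range(n_tokens)]


def flatten_brackets(by_start):
    """Recover a flat (start, end, label) list from the index."""
    return [
        (start, end, label)
        for start in sorted(by_start)
        for end, label in by_start[start]
    ]
```

With roughly one bracket per word, the nested loop grows quadratically with sentence length, while the indexed version stays linear. `flatten_brackets` plays the role of the property that provides the previous flat view, although it orders brackets by start index rather than preserving the original input order.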
Kevin Lu
c7c4cd5fe1
Changed pyate code example in universe.json
2020-05-20 09:11:32 -07:00
Adriane Boyd
daaa7bf451
Add option to omit extra lexeme tables in CLI
2020-05-20 15:51:44 +02:00
Adriane Boyd
8cba0e41d8
Return lowercase form as default except for PROPN
2020-05-20 15:35:08 +02:00
adrianeboyd
9393253b66
Remove peeking from Parser.begin_training (#5456)
Inspect all instances in `Parser.begin_training` rather than only the
first 1000.
2020-05-20 15:18:06 +02:00
Ines Montani
78bb9ff5e0
doc_or_span -> obj
2020-05-20 14:56:52 +02:00
Matthw Honnibal
60e8da4813
Tidy up train-from-config a bit
2020-05-20 12:56:27 +02:00
Matthw Honnibal
fda7355508
Fix train-from-config
2020-05-20 12:30:21 +02:00