Ines Montani
c685ee734a
Fix compat for v2.x branch
2020-05-22 14:22:36 +02:00
Ines Montani
65c7e82de2
Auto-format and remove 2.3 feature [ci skip]
2020-05-22 13:50:30 +02:00
Matthew Honnibal
8cb16c7120
Merge pull request #5485 from adrianeboyd/bugfix/retokenizer-merge-0-length-5450
...
Disallow merging 0-length spans
2020-05-22 13:28:35 +02:00
Adriane Boyd
e4a1b5dab1
Rename to url_match
...
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd
730fa493a4
Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match
2020-05-22 12:18:00 +02:00
Adriane Boyd
71fe61fdcd
Disallow merging 0-length spans
2020-05-22 10:14:34 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
...
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Matthew Honnibal
e1cb7e838b
Merge pull request #5481 from explosion/feature/blank-shortcut-v2
...
Add blank:{lang} shortcut support to util.load_model
2020-05-22 00:08:23 +02:00
Ines Montani
ee027de032
Update universe and display of videos [ci skip]
2020-05-21 21:54:23 +02:00
Ines Montani
2250380816
Merge pull request #5482 from explosion/fix/backwards-compat-super
2020-05-21 21:51:46 +02:00
Ines Montani
891fa59009
Use backwards-compatible super()
2020-05-21 20:52:48 +02:00
Matthew Honnibal
5ce02c1b17
Merge pull request #5470 from svlandeg/bugfix/noun-chunks
...
Bugfix in noun chunks
2020-05-21 20:51:31 +02:00
Ines Montani
53da6bd672
Add course to landing [ci skip]
2020-05-21 20:45:33 +02:00
Ines Montani
cb02bff0eb
Add blank:{lang} shortcut to util.load_model
2020-05-21 20:24:07 +02:00
Ines Montani
0f1beb5ff2
Tidy up and avoid absolute spacy imports in core
2020-05-21 20:05:03 +02:00
svlandeg
51715b9f72
span / noun chunk has +1 because end is exclusive
2020-05-21 19:56:56 +02:00
svlandeg
84d5b7ad0a
Merge remote-tracking branch 'upstream/master' into bugfix/noun-chunks
...
# Conflicts:
# spacy/lang/el/syntax_iterators.py
# spacy/lang/en/syntax_iterators.py
# spacy/lang/fa/syntax_iterators.py
# spacy/lang/fr/syntax_iterators.py
# spacy/lang/id/syntax_iterators.py
# spacy/lang/nb/syntax_iterators.py
# spacy/lang/sv/syntax_iterators.py
2020-05-21 19:19:50 +02:00
svlandeg
f7d10da555
avoid unnecessary loop to check overlapping noun chunks
2020-05-21 19:15:57 +02:00
Ines Montani
c6ec19c844
Add missing declaration
2020-05-21 17:30:05 +02:00
Matthew Honnibal
884d9b060d
Merge pull request #5466 from adrianeboyd/feature/omit-extra-lexeme-info
...
Add option to omit extra lexeme tables in CLI
2020-05-21 16:40:02 +02:00
Matthew Honnibal
26cd6a0229
Merge pull request #5462 from adrianeboyd/feature/lemmatizer-all-upos
...
Extend lemmatizer rules for all UPOS tags
2020-05-21 16:05:31 +02:00
Matthew Honnibal
cad9b290a2
Merge branch 'master' into feature/omit-extra-lexeme-info
2020-05-21 16:04:24 +02:00
Matthew Honnibal
1f572ce89b
Merge pull request #5473 from explosion/fix/travis-tests
...
Fix Python 2.7 compat
2020-05-21 15:56:16 +02:00
Matthew Honnibal
7902ebc63c
Rename argument: doc_or_span/obj -> doclike (#5463)
...
* doc_or_span -> obj
* Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
* obj -> doclike
* Refer to correct object
2020-05-21 15:17:54 +02:00
Ines Montani
a9cb2882cb
Rename argument: doc_or_span/obj -> doclike (#5463)
...
* doc_or_span -> obj
* Revert "doc_or_span -> obj"
This reverts commit 78bb9ff5e0.
* obj -> doclike
* Refer to correct object
2020-05-21 15:17:39 +02:00
Ines Montani
bea863acd2
Fix naming conflict and formatting
2020-05-21 14:24:38 +02:00
Ines Montani
bd6353715a
Merge branch 'master' into fix/travis-tests
2020-05-21 14:23:04 +02:00
Ines Montani
e2fe83e35d
Refer to correct object
2020-05-21 14:20:29 +02:00
Ines Montani
b1f45c9da3
obj -> doclike
2020-05-21 14:19:58 +02:00
Ines Montani
69fb4bedf2
Revert "doc_or_span -> obj"
...
This reverts commit 78bb9ff5e0.
2020-05-21 14:14:28 +02:00
Ines Montani
d8f3190c0a
Tidy up and auto-format
2020-05-21 14:14:01 +02:00
Ines Montani
56de520afd
Try to fix tests on Travis (2.7)
2020-05-21 14:04:57 +02:00
Ines Montani
f2a131bd9a
Merge pull request #5461 from kevinlu1248/master
2020-05-21 13:53:10 +02:00
adrianeboyd
d45602bc11
Merge branch 'master' into feature/omit-extra-lexeme-info
2020-05-21 10:26:01 +02:00
svlandeg
b221bcf1ba
fixing all languages
2020-05-21 00:17:28 +02:00
svlandeg
b509a3e7fc
fix: use actual range in 'seen' instead of subtree
2020-05-20 23:06:39 +02:00
svlandeg
36a94c409a
failing test to reproduce overlapping spans problem
2020-05-20 23:06:03 +02:00
adrianeboyd
49ef06d793
Add option for base model in init-model CLI (#5467)
...
Intended for languages like Chinese with a custom tokenizer.
2020-05-20 18:49:11 +02:00
Kevin Lu
c7c4cd5fe1
Changed pyate code example in universe.json
2020-05-20 09:11:32 -07:00
Adriane Boyd
daaa7bf451
Add option to omit extra lexeme tables in CLI
2020-05-20 15:51:44 +02:00
Adriane Boyd
8cba0e41d8
Return lowercase form as default except for PROPN
2020-05-20 15:35:08 +02:00
adrianeboyd
9393253b66
Remove peeking from Parser.begin_training (#5456)
...
Inspect all instances in `Parser.begin_training` rather than only the
first 1000.
2020-05-20 15:18:06 +02:00
Ines Montani
78bb9ff5e0
doc_or_span -> obj
2020-05-20 14:56:52 +02:00
Adriane Boyd
4fa9670537
Extend lemmatizer rules for all UPOS tags
2020-05-20 10:15:43 +02:00
Kevin Lu
291b9ad7b9
Update CONTRIBUTOR_AGREEMENT.md
2020-05-19 20:29:53 -07:00
Kevin Lu
9a1a535215
Create kevinlu1248.md
2020-05-19 20:25:45 -07:00
Kevin Lu
a23b3a5a50
Update CONTRIBUTOR_AGREEMENT.md
2020-05-19 20:24:24 -07:00
Kevin Lu
0a5b140235
Update universe.json
2020-05-19 20:12:21 -07:00
adrianeboyd
40e65d6f63
Fix most_similar for vectors with unused rows (#5348)
...
* Fix most_similar for vectors with unused rows
Address issues related to the unused rows in the vector table and
`most_similar`:
* Update `most_similar()` to search only through rows that are in use
according to `key2row`.
* Raise an error when `most_similar(n=n)` is larger than the number of
vectors in the table.
* Set and restore `_unset` correctly when vectors are added or
deserialized so that new vectors are added in the correct row.
* Set data and keys to the same length in `Vocab.prune_vectors()` to
avoid spurious entries in `key2row`.
* Fix regression test using `most_similar`
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-19 16:41:26 +02:00
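The first two points of the `most_similar` fix above (search only rows that `key2row` marks as in use, and reject an `n` larger than the number of vectors) can be illustrated with a minimal pure-Python sketch. This is not spaCy's implementation: the function name `most_similar_rows`, the list-of-lists table, and counting distinct in-use rows as "the number of vectors" are assumptions made for illustration.

```python
import math


def most_similar_rows(query, table, key2row, n=1):
    """Return the n in-use rows most cosine-similar to `query`.

    Hypothetical helper: `table` is a list of vectors and `key2row` maps
    keys to row indices. Rows not referenced by `key2row` are treated as
    unused and are never searched.
    """
    rows_in_use = sorted(set(key2row.values()))
    if n > len(rows_in_use):
        raise ValueError(
            f"n={n} is larger than the number of vectors in use ({len(rows_in_use)})"
        )

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    # Score only the rows that are in use, then keep the n best.
    scored = [(row, cosine(query, table[row])) for row in rows_in_use]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

In this sketch an unused row is skipped even if it would have been the closest match, which is the behaviour the first bullet describes.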
adrianeboyd
70da1fd2d6
Add warning for misaligned character offset spans (#5007)
...
* Add warning for misaligned character offset spans
* Resolve conflict
* Filter warnings in example scripts
Filter warnings in example scripts to show warnings once, in particular
warnings about misaligned entities.
Co-authored-by: Ines Montani <ines@ines.io>
2020-05-19 16:01:18 +02:00
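A minimal sketch of the "show warnings once" filtering mentioned above, using only the standard-library `warnings` module. The helper name `count_shown_warnings` and the warning text are hypothetical, and the filter arguments actually used in the example scripts are not given in this log.

```python
import warnings


def count_shown_warnings(message, repeats):
    """Emit `message` `repeats` times under a "once" filter.

    Returns how many of those warnings were actually shown.
    """
    with warnings.catch_warnings(record=True) as shown:
        # "once" lets the first occurrence through and suppresses repeats
        # of the same warning text and category.
        warnings.filterwarnings("once", category=UserWarning)
        for _ in range(repeats):
            warnings.warn(message, UserWarning)
    return len(shown)
```

Calling the helper with a fresh message and several repeats should report a single shown warning, which is the effect a script would want for repeated misaligned-entity warnings.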