Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
svlandeg
02d87a8b2b
fix showing dep arcs in streamlit script
2020-03-19 10:30:20 +01:00
Ines Montani
15af22d3f0
Merge branch 'master' into spacy.io
2020-03-17 22:22:17 +01:00
Ines Montani
80e7e1347e
Update universe.json [ci skip]
2020-03-17 22:21:34 +01:00
Ines Montani
a39fc5c7a8
Merge branch 'master' into spacy.io
2020-03-17 22:19:38 +01:00
Ines Montani
eda6eff8b1
Update universe.json [ci skip]
2020-03-17 22:19:29 +01:00
Ines Montani
16e7301d34
Merge pull request #5161 from pmbaumgartner/master
...
add gobbli to spacy-universe 🥳
2020-03-17 22:18:30 +01:00
Peter B
b04057c204
add mentions of spaCy use
2020-03-17 15:03:43 -04:00
Ines Montani
a6d99af811
Merge branch 'master' into spacy.io
2020-03-17 19:53:41 +01:00
Ines Montani
b2b01a5c8b
Update universe.json [ci skip]
2020-03-17 19:53:31 +01:00
Peter B
d2ffb406ad
add gobbli to spacy-universe 🥳
2020-03-17 08:30:29 -04:00
Ines Montani
766db5bfa5
Merge branch 'master' into spacy.io
2020-03-16 15:05:35 +01:00
Ines Montani
558032017e
Merge pull request #5157 from svlandeg/bugfix/language
...
remove unnecessary itertools call
2020-03-16 15:04:25 +01:00
Ines Montani
3944c1a65d
Merge pull request #5148 from svlandeg/fix/empty-docbin
...
Fix serialization of empty doc
2020-03-16 15:03:54 +01:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
...
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Ines Montani
2044216bd5
Merge pull request #5150 from sloev/master
...
add spacy_syllables to universe
2020-03-16 15:02:12 +01:00
Ines Montani
fb74679559
Merge pull request #5147 from mabraham/master
...
Fix broken link in docs
2020-03-16 14:59:52 +01:00
Ines Montani
c68f20b398
Merge pull request #5146 from adrianeboyd/bugfix/assert-docs-equal-sents
...
Fix sents comparison in test util
2020-03-16 14:59:32 +01:00
svlandeg
fba219f737
remove unnecessary itertools call
2020-03-16 08:31:36 +01:00
Alan Chan
1ae01684cf
Fill in contributor agreement
2020-03-15 03:45:20 +08:00
Alan Chan
2124be100d
Tweak run-on sentence
2020-03-15 03:45:20 +08:00
Alan Chan
7c3a4ce933
Missing word in api/cli doc
2020-03-15 03:45:20 +08:00
Alan Chan
36e3532475
Remove unfinished sentence
2020-03-15 03:45:17 +08:00
nihil
9cde7eb08c
add spacy_syllables to universe + sign contributor agreement
2020-03-13 18:09:42 +01:00
svlandeg
59000ee21d
fix serialization of empty doc + unit test
2020-03-13 16:07:56 +01:00
Mark Abraham
a0ffa346c0
Fix broken link in docs
2020-03-13 14:07:26 +01:00
Adriane Boyd
423849f94a
Fix sents comparison in test util
...
Due to changes to `Span` (#5005), spans from different documents are now
never equal. Check `Token.is_sent_start` values instead.
2020-03-13 09:25:23 +01:00
Ines Montani
353f8486f5
Merge branch 'master' into spacy.io
2020-03-12 14:45:33 +01:00
Matthew Honnibal
26a90f011b
Set version to v2.2.4
2020-03-12 11:30:41 +01:00
Ines Montani
c669435c62
Merge pull request #5125 from renaud/patch-1
...
small typo in code sample
2020-03-12 11:19:12 +01:00
Ines Montani
4130fef4ec
Merge pull request #5127 from svlandeg/docs/empty-doc
...
is_XXX is True if doc is empty
2020-03-12 11:18:10 +01:00
Ines Montani
3497b2973d
Merge pull request #5130 from merrcury/patch-1
...
DOC : Update LICENSE Year
2020-03-12 11:17:38 +01:00
Himanshu Garg
27d1300bdb
Create merrcury.md
2020-03-10 15:11:07 +05:30
Himanshu Garg
ba47d5a5cb
Update LICENSE Year
2020-03-10 15:03:29 +05:30
svlandeg
c4d030dbf6
remove accidental commit
2020-03-09 18:10:54 +01:00
svlandeg
1724a4f75b
additional information if doc is empty
2020-03-09 18:08:18 +01:00
Renaud Richardet
eccf6b1686
small typo in code sample
2020-03-09 14:49:11 +01:00
Adriane Boyd
0c31f03ec5
Update docs [ci skip]
2020-03-09 13:41:17 +01:00
Adriane Boyd
1139247532
Revert changes to token_match priority from #4374
...
* Revert changes to priority of `token_match` so that it has priority
over all other tokenizer patterns
* Add lookahead and potentially slow lookbehind back to the default URL
pattern
* Expand character classes in URL pattern to improve matching around
lookaheads and lookbehinds related to #4882
* Revert changes to Hungarian tokenizer
* Revert (xfail) several URL tests to their status before #4374
* Update `tokenizer.explain()` and docs accordingly
2020-03-09 12:09:41 +01:00
Ines Montani
1d6aec805d
Fix formatting and update docs for v2.2.4
2020-03-09 11:17:20 +01:00
Ines Montani
5f68004264
Port over gitignore changes from develop
...
Prevents stale files when switching branches
2020-03-09 11:05:00 +01:00
Mark Abraham
0345135167
Tokenizer to_disk and from_disk now ensure paths (#5116)
...
* Tokenizer to_disk and from_disk now ensure strings are converted to paths
Fixes #5115
* Sign contributor agreement
2020-03-08 13:25:56 +01:00
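Note on the change above: a minimal sketch of converting string arguments to paths before any file access. It is not taken from the commit; the names `ensure_path` and `save` are written here for illustration only, and `save` is a hypothetical stand-in for a `to_disk`-style method.

```python
from pathlib import Path


def ensure_path(path):
    # Illustrative helper: wrap a str in Path, pass anything else through unchanged.
    if isinstance(path, str):
        return Path(path)
    return path


def save(data, path):
    # Hypothetical to_disk-style function: normalize the argument first,
    # so callers may pass either a str or a Path.
    path = ensure_path(path)
    with path.open("wb") as file_:
        file_.write(data)
    return path
```

Normalizing at the top of the function keeps the rest of the body free of type checks, since `path.open` is then always available.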
Yohei Tamura
31755630a7
fix typ (#5106)
2020-03-08 13:24:38 +01:00
adrianeboyd
9dd98a4b27
Improve Makefile (#5105)
...
* Explicitly upgrade pip
* Include spacy-lookups-data in pex
2020-03-08 13:24:19 +01:00
Sofie Van Landeghem
5847be6022
Tok2Vec: extract-embed-encode (#5102)
...
* avoid changing original config
* fix elif structure, batch with just int crashes otherwise
* tok2vec example with doc2feats, encode and embed architectures
* further clean up MultiHashEmbed
* further generalize Tok2Vec to work with extract-embed-encode parts
* avoid initializing the charembed layer with Docs (for now ?)
* small fixes for bilstm config (still does not run)
* rename to core layer
* move new configs
* walk model to set nI instead of using core ref
* fix senter overfitting test to be more similar to the training data (avoid flakey behaviour)
2020-03-08 13:23:18 +01:00
adrianeboyd
993758c58f
Remove unnecessary iterator in Language.pipe (#5101)
...
Remove iterator over `raw_texts` with `itertools.tee()` in
`Language.pipe` that is never consumed and consumes memory
unnecessarily.
2020-03-08 13:22:25 +01:00
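Note on the change above: a small standard-library sketch of why a tee branch that is never consumed costs memory. It does not reproduce the `Language.pipe` code; the function name `items_held_by_unconsumed_branch` is hypothetical and exists only for this illustration.

```python
import itertools


def items_held_by_unconsumed_branch(items):
    # Split one iterator into two branches that share a single source.
    consumed, untouched = itertools.tee(items)
    # Exhaust only the first branch.
    for _ in consumed:
        pass
    # The second branch can still replay every item, which means tee had to
    # buffer all of them while the first branch ran ahead.
    return sum(1 for _ in untouched)
```

If the second branch is never read, that buffer holds the items without benefit, so dropping the `tee()` call removes the overhead.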
Ines Montani
cd79c7bd26
Merge pull request #5110 from dhpollack/dhp/fix-minor-svg-error
...
fix typo in svg file - caused documentation build error
2020-03-06 15:32:43 +01:00
Sofie Van Landeghem
1a2b8fc264
set vector of merged entity (#5085)
...
* merge_entities sets the vector in the vocab for the merged token
* add unit test
* import unicode_literals
* move code to _merge function
* only set vector if vocab has non-zero vectors
2020-03-06 14:45:28 +01:00
adrianeboyd
c95ce96c44
Update sentence recognizer (#5109)
...
* Update sentence recognizer
* rename `sentrec` to `senter`
* use `spacy.HashEmbedCNN.v1` by default
* update to follow `Tagger` modifications
* remove component methods that can be inherited from `Tagger`
* add simple initialization and overfitting pipeline tests
* Update serialization test for senter
2020-03-06 14:45:02 +01:00
Sofie Van Landeghem
6ac9fc0619
Unit test for NEL functionality (#5114)
...
* empty begin_training for sentencizer
* overfitting unit test for entity linker
* fixed NEL IO by storing the entity_vector_length in the cfg
2020-03-06 14:42:23 +01:00