oculusrepairo
03ab518f28
Update examples.py (#5820)
* Update examples.py
adding factual sentences to the list
* Add missing comma separators
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-29 10:28:56 +02:00
graue70
b97dbab998
Fix typo in unit tests (#5823)
2020-07-27 20:18:48 +02:00
Adriane Boyd
315d7d611f
Normalize spelling for spaCy (#5822)
2020-07-27 10:11:23 +02:00
Martino Mensio
b57b994d38
Sentence transformers added to spaCy universe (#5814)
* fix details for spacy-universal-sentence-encoder
* added sentence-transformers
2020-07-27 10:11:15 +02:00
Nipun Sadvilkar
1bfc177b10
✏️ typo in pysbd code example (#5821)
2020-07-27 10:11:09 +02:00
Adriane Boyd
2880d8a555
Normalize spelling for spaCy (#5822)
2020-07-27 10:09:33 +02:00
Martino Mensio
2f6b8132ef
Sentence transformers added to spaCy universe (#5814)
* fix details for spacy-universal-sentence-encoder
* added sentence-transformers
2020-07-27 09:44:33 +02:00
Nipun Sadvilkar
a66ad89fcb
✏️ typo in pysbd code example (#5821)
2020-07-27 09:43:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file
change the #101 url hash path to #language-data
* filled in the spaCy Contributor Agreement
filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Adriane Boyd
19dc42776a
Remove hard-coded GPU ID from pretrain (#5808)
2020-07-24 09:26:26 +02:00
Joshua Olson
6d4d5c074c
Mark Japanese documents as tagged. (#5803)
Mark the document as tagged before returning it to the user from the JapaneseTokenizer.
Fixes #5802
2020-07-23 08:57:01 +02:00
Adriane Boyd
038ff1a811
Improve warnings around normalization tables (#5794)
Provide more customized normalization table warnings when training a new
model. Only suggest installing `spacy-lookups-data` if it's not already
installed and it includes a table for this language (currently checked
in a hard-coded list).
2020-07-22 16:04:58 +02:00
Adriane Boyd
bf24f7f672
Update invalid tag maps (#5796)
* Remove copy of (old?) PTB tag map for: bn, eu
* Remove unsupported features from: hy, pl, ro, ru
2020-07-22 16:02:51 +02:00
Alec Chapman
199f3ff7de
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json
Add cov-bsv to "resources"
* Update universe.json
* add contributor agreement
2020-07-19 13:38:42 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json
Add cov-bsv to "resources"
* Update universe.json
* add contributor agreement
2020-07-19 13:35:31 +02:00
Adriane Boyd
597bcc629e
Improve tag map initialization and updating (#5768)
* Improve tag map initialization and updating
Generalize tag map initialization and updating so that a provided tag
map can be loaded correctly in the CLI.
* normalize provided tag map as necessary
* use the same method for initializing and overwriting the tag map
* Reinitialize cache after loading new tag map
Reinitialize the cache with the right size after loading a new tag map.
2020-07-19 11:13:39 +02:00
Adriane Boyd
7e14272096
Lower upper pin for cupy to 8.0.0 (#5773)
2020-07-19 11:10:11 +02:00
Adriane Boyd
cd5af72c9a
Update pkuseg version (#5774)
* Update pkuseg version in Chinese tokenizer warnings
* Update pkuseg version in `Makefile`
* Remove warning about python3.8 wheels in docs
2020-07-19 11:09:49 +02:00
Ines Montani
73098dbaf6
Add Plausible
2020-07-18 23:53:27 +02:00
Ines Montani
6f4e4aceb3
Add Plausible [ci skip]
2020-07-18 23:50:29 +02:00
Adriane Boyd
5228920e2f
Clarify warning W030 for misaligned BILUO tags (#5761)
2020-07-14 14:09:48 +02:00
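Warning W030 concerns entity annotations whose character offsets do not line up with token boundaries. The simplified, pure-Python sketch below illustrates what "misaligned" means; the helper `biluo_tags` and its (start, end) offset tuples are hypothetical, not spaCy's API (spaCy 2.x provides this conversion as `spacy.gold.biluo_tags_from_offsets`). As in spaCy, tokens covered by an entity that cannot be aligned are tagged `-`.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def biluo_tags(tokens: List[Span], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Hypothetical helper: a simplified sketch, not spaCy's implementation.

    Converts character-offset entities to one BILUO tag per token. Tokens
    covered by an entity whose start or end does not fall on a token
    boundary are tagged "-" (misaligned).
    """
    tags = ["O"] * len(tokens)
    token_at_start = {start: i for i, (start, _) in enumerate(tokens)}
    token_at_end = {end: i for i, (_, end) in enumerate(tokens)}
    for start, end, label in entities:
        first = token_at_start.get(start)
        last = token_at_end.get(end)
        if first is None or last is None:
            # An entity boundary falls inside a token: mark overlapping tokens "-".
            for i, (tok_start, tok_end) in enumerate(tokens):
                if tok_start < end and tok_end > start:
                    tags[i] = "-"
        elif first == last:
            tags[first] = "U-" + label  # single-token entity
        else:
            tags[first] = "B-" + label  # first token of a multi-token entity
            for i in range(first + 1, last):
                tags[i] = "I-" + label  # inside tokens
            tags[last] = "L-" + label  # last token


    return tags


# "I like London." tokenized as I / like / London / .
tokens = [(0, 1), (2, 6), (7, 13), (13, 14)]
print(biluo_tags(tokens, [(7, 13, "GPE")]))  # ['O', 'O', 'U-GPE', 'O']
print(biluo_tags(tokens, [(7, 11, "GPE")]))  # ['O', 'O', '-', 'O']
```

In the second call the entity ends at character 11, in the middle of the token "London", so no token boundary matches and the token is marked as misaligned.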
Adriane Boyd
7ea2cc7650
Set version to 2.3.2 (#5756)
2020-07-13 14:55:56 +02:00
Mark Neumann
27a1cd3c63
fix meta serialization in train (#5751)
Co-authored-by: Mark Neumann <markng@allenai.org>
2020-07-12 22:06:46 +02:00
Adriane Boyd
0a62098c5f
Fix lemmatizer is_base_form for python2.7 (#5734)
* Fix lemmatizer init args for python2.7
* Move English is_base_form to a class method
* Skip test pickling PhraseMatcher for python2
2020-07-09 22:11:24 +02:00
Adriane Boyd
923affd091
Remove is_base_form from French lemmatizer (#5733)
Remove English-specific is_base_form from French lemmatizer.
2020-07-09 22:11:13 +02:00
gandersen101
9ce10207bf
Fix quote issue in spaczz universe.json
2020-07-08 11:36:18 +02:00
Ines Montani
3d83721551
Merge pull request #5723 from gandersen101/fix-spaczz-universe-typo
2020-07-08 11:35:40 +02:00
gandersen101
893133873d
Fix quote issue in spaczz universe.json
2020-07-07 19:16:28 -05:00
Ines Montani
f1653d281f
Fix and update universe.json [ci skip]
2020-07-07 21:12:56 +02:00
Ines Montani
109849bd31
Fix and update universe.json [ci skip]
2020-07-07 21:12:28 +02:00
Jonathan Besomi
f904f1f361
Add texthero to universe.json (#5716)
* Add texthero to universe.json
* Add spaCy contributor Agreement
2020-07-07 20:57:45 +02:00
gandersen101
9cfd294e59
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json
* Adding contributor agreement.
2020-07-07 20:57:36 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json
* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
* Add texthero to universe.json
* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701)
* speed up Korean nlp 100x by stopping mecab from reloading on each doc
* add contributor agreement
* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
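The #5701 fix stops MeCab from being reloaded for every document. The general pattern, constructing an expensive analyzer once and reusing it across calls, can be sketched in plain Python. `Analyzer`, `PerCallTokenizer` and `CachedTokenizer` are hypothetical stand-ins, not spaCy's Korean tokenizer or the actual patch.

```python
class Analyzer:
    """Hypothetical stand-in for an expensive analyzer such as a MeCab binding."""

    loads = 0  # counts how many times an analyzer has been constructed

    def __init__(self):
        # In a real binding, this is where dictionaries would be loaded from disk.
        Analyzer.loads += 1

    def parse(self, text):
        # Placeholder analysis: plain whitespace splitting.
        return text.split()


class PerCallTokenizer:
    """Slow pattern: a new analyzer is built for every document."""

    def __call__(self, text):
        return Analyzer().parse(text)


class CachedTokenizer:
    """Faster pattern: the analyzer is built once and reused for every document."""

    def __init__(self):
        self.analyzer = Analyzer()

    def __call__(self, text):
        return self.analyzer.parse(text)
```

With `PerCallTokenizer`, the load cost is paid once per document; with `CachedTokenizer`, it is paid once per tokenizer instance, which is where a large speedup on many short documents would come from.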
graue70
9860b8399e
Fix typo in test function docstring (#5696)
2020-07-05 15:49:06 +02:00
Matthew Honnibal
3e78e82a83
Experimental character-based pretraining (#5700)
* Use cosine loss in Cloze multitask
* Fix char_embed for gpu
* Call resume_training for base model in train CLI
* Fix bilstm_depth default in pretrain command
* Implement character-based pretraining objective
* Use chars loss in ClozeMultitask
* Add method to decode predicted characters
* Fix number characters
* Rescale gradients for mlm
* Fix char embed+vectors in ml
* Fix pipes
* Fix pretrain args
* Move get_characters_loss
* Fix import
* Fix import
* Mention characters loss option in pretrain
* Remove broken 'self attention' option in pretrain
* Revert "Remove broken 'self attention' option in pretrain"
This reverts commit 56b820f6af.
* Document 'characters' objective of pretrain
2020-07-05 15:48:39 +02:00
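#5700 adds a character-based pretraining objective. Assuming, as spaCy's later documentation describes for its characters objective, that the model learns to predict the first and last few characters of each word (an assumption; this log does not spell it out), the prediction targets could be built roughly as below. `char_targets` and the `<pad>` symbol are hypothetical and this is not spaCy's implementation.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding symbol for words shorter than n


def char_targets(words: List[str], n: int) -> List[List[str]]:
    """Hypothetical helper: the first n and last n characters of each word.

    Each word yields a list of length 2 * n: the leading characters
    (right-padded) followed by the trailing characters (left-padded).
    """
    if n < 1:
        # word[-0:] would return the whole word, so require a positive window.
        raise ValueError("n must be at least 1")
    targets = []
    for word in words:
        head = list(word[:n])
        tail = list(word[-n:])
        head = head + [PAD] * (n - len(head))
        tail = [PAD] * (n - len(tail)) + tail
        targets.append(head + tail)
    return targets
```

Middle characters of longer words are dropped, so the window `n` controls how much of each word the model has to reconstruct.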
Adriane Boyd
86d13a9fb8
Set version to 2.3.1 (#5705)
2020-07-03 13:38:41 +02:00
Matthias Hertel
2fb9bd795d
Fixed vocabulary in the entity linker training example (#5676)
* entity linker training example: model loading changed according to issue 5668 (https://github.com/explosion/spaCy/issues/5668) + vocab_path is a required argument
* contributor agreement
2020-07-03 10:24:02 +02:00
Adriane Boyd
a77c4c3465
Add strings and ENT_KB_ID to Doc serialization (#5691)
* Add strings for all writeable Token attributes to `Doc.to/from_bytes()`.
* Add ENT_KB_ID to default attributes.
2020-07-02 17:11:57 +02:00
Adriane Boyd
971826a96d
Include git commit in package and model meta (#5694)
* Include git commit in package and model meta
* Rewrite to read file in setup
* Fix file handle
2020-07-02 17:10:27 +02:00
Adriane Boyd
2bd78c39e3
Fix multiple context managers in examples (#5690)
2020-07-02 10:36:07 +02:00
Ines Montani
295279f74b
Update netlify.toml [ci skip]
2020-07-01 22:06:43 +02:00
Ines Montani
6bc643d2e2
Update netlify.toml [ci skip]
2020-07-01 21:34:17 +02:00
Ines Montani
0e28edd2cb
Update netlify.toml [ci skip]
2020-07-01 13:34:52 +02:00
Ines Montani
f2a932a60c
Update netlify.toml [ci skip]
2020-07-01 13:34:35 +02:00
Álvaro Abella Bascarán
7111b9de2e
Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:
```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```
`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:01:12 +02:00
Álvaro Abella Bascarán
ff0dbe5c64
Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:
```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```
`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:00:50 +02:00
Matthias Hertel
305221f3e5
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:23 +02:00