Commit Graph

1694 Commits

Author SHA1 Message Date
delzac
15ea401b39
Reflect on usage doc that IS_SENT_START attribute exist (#6114)
* Reflect on usage doc that IS_SENT_START attribute exist

* Create delzac.md
2020-10-06 15:11:01 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Ines Montani
27c5795ea5 Fix version check in models directory [ci skip] 2020-09-25 09:23:29 +02:00
Marek Grzenkowicz
a26f864ed3
Clarify how to choose pretrained weights files (closes #6027) [ci skip] (#6039) 2020-09-08 21:13:50 +02:00
Ines Montani
33d9c64977 Fix outbound link and update package lock [ci skip] 2020-09-04 14:44:38 +02:00
Ines Montani
ba6cf9821f Replace docs analytics [ci skip] 2020-09-04 14:28:28 +02:00
Brad Jascob
2160aafec6
Updates spaCy Universe for amrlib (#6020)
* Updates spaCy Universe for amrlib

* Updates to doc based on feedback
2020-09-04 10:03:35 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
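Note on the entry above: the quoted `TypeError` is what Python raises whenever a tuple is added to a list with `+`. The sketch below reproduces it with plain Python values. `default_suffixes` and `extend_suffixes` are hypothetical names used only for illustration; they are not spaCy's data or API, and the patterns are placeholders.

```python
# Hypothetical stand-in for a list of default suffix patterns (not spaCy's real data).
default_suffixes = [r"\.$", r",$"]


def extend_suffixes(defaults, extra):
    """Return a new list of patterns, whatever sequence type each argument is."""
    return list(defaults) + list(extra)


# Adding a tuple to a list fails with the error quoted in the commit message.
try:
    default_suffixes + (r"-+$",)
except TypeError as err:
    error_message = str(err)

# Converting both sides to lists first avoids the type mismatch.
suffixes = extend_suffixes(default_suffixes, (r"-+$",))
```

Converting both operands to the same sequence type before concatenating works whether the defaults are stored as a list or as a tuple.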
Bram Vanroy
9e45d064bb
Update universe details spacy_conll (#5871) 2020-08-05 14:34:12 +02:00
Adriane Boyd
c62fd878a3
Allow Doc.char_span to snap to token boundaries (#5849)
* Allow Doc.char_span to snap to token boundaries

Add a `mode` option to allow `Doc.char_span` to snap to token
boundaries. The `mode` options:

* `strict`: character offsets must match token boundaries (default, same as
before)
* `inside`: all tokens completely within the character span
* `outside`: all tokens at least partially covered by the character span

Add a new helper function `token_by_char` that returns the token
corresponding to a character position in the text. Update
`token_by_start` and `token_by_end` to use `token_by_char` for more
efficient searching.

* Remove unused import

* Rename mode to alignment_mode

Rename `mode` to `alignment_mode` with the options
`strict`/`contract`/`expand`. Any unrecognized modes are silently
converted to `strict`.
2020-08-04 13:36:32 +02:00
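Note on the entry above: the three alignment modes can be pictured with a small pure-Python sketch over token character offsets. This is an illustration of the behaviour the commit message describes, not spaCy's implementation. The function `snap_char_span` and the `(start, end)` offset tuples are assumptions made for the example, and it returns token indices rather than a `Span`.

```python
from typing import List, Optional, Tuple


def snap_char_span(
    token_offsets: List[Tuple[int, int]],
    start: int,
    end: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Map a character span to (first_token, last_token_exclusive), or None."""
    # Unrecognized modes fall back to strict, as the commit message describes.
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"

    if alignment_mode == "strict":
        # Both character offsets must coincide with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start in starts and end in ends and starts[start] <= ends[end]:
            return starts[start], ends[end] + 1
        return None

    if alignment_mode == "contract":
        # Keep only tokens that lie completely within the character span.
        hits = [i for i, (s, e) in enumerate(token_offsets) if s >= start and e <= end]
    else:
        # expand: keep every token at least partially covered by the span.
        hits = [i for i, (s, e) in enumerate(token_offsets) if s < end and e > start]

    if not hits:
        return None
    return hits[0], hits[-1] + 1


# Offsets for the tokens of "Hello world again".
tokens = [(0, 5), (6, 11), (12, 17)]
strict_hit = snap_char_span(tokens, 6, 11)
strict_miss = snap_char_span(tokens, 7, 11)
contracted = snap_char_span(tokens, 3, 13, "contract")
expanded = snap_char_span(tokens, 3, 13, "expand")
fallback = snap_char_span(tokens, 7, 11, "unknown-mode")
```

With the offsets 3 to 13, `contract` keeps only "world", while `expand` covers all three tokens.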
Adriane Boyd
2880d8a555
Normalize spelling for spaCy (#5822) 2020-07-27 10:09:33 +02:00
Martino Mensio
2f6b8132ef
Sentence transformers added to spaCy universe (#5814)
* fix details for spacy-universal-sentence-encoder

* added sentence-transformers
2020-07-27 09:44:33 +02:00
Nipun Sadvilkar
a66ad89fcb
✏️ typo in pysbd code example (#5821) 2020-07-27 09:43:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file

change the #101 url hash path to #language-data

* filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json

Add cov-bsv to "resources"

* Update universe.json

* add contributor agreement
2020-07-19 13:35:31 +02:00
Adriane Boyd
cd5af72c9a
Update pkuseg version (#5774)
* Update pkuseg version in Chinese tokenizer warnings
* Update pkuseg version in `Makefile`
* Remove warning about python3.8 wheels in docs
2020-07-19 11:09:49 +02:00
Ines Montani
6f4e4aceb3 Add Plausible [ci skip] 2020-07-18 23:50:29 +02:00
gandersen101
893133873d Fix quote issue in spaczz universe.json 2020-07-07 19:16:28 -05:00
Ines Montani
109849bd31 Fix and update universe.json [ci skip] 2020-07-07 21:12:28 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json

* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
* Add texthero to universe.json

* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Matthew Honnibal
3e78e82a83
Experimental character-based pretraining (#5700)
* Use cosine loss in Cloze multitask

* Fix char_embed for gpu

* Call resume_training for base model in train CLI

* Fix bilstm_depth default in pretrain command

* Implement character-based pretraining objective

* Use chars loss in ClozeMultitask

* Add method to decode predicted characters

* Fix number characters

* Rescale gradients for mlm

* Fix char embed+vectors in ml

* Fix pipes

* Fix pretrain args

* Move get_characters_loss

* Fix import

* Fix import

* Mention characters loss option in pretrain

* Remove broken 'self attention' option in pretrain

* Revert "Remove broken 'self attention' option in pretrain"

This reverts commit 56b820f6af.

* Document 'characters' objective of pretrain
2020-07-05 15:48:39 +02:00
Álvaro Abella Bascarán
ff0dbe5c64
Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:

```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```

`texts` suggests the input is an iterable of strings. I replaced it for `docs`.
2020-06-30 20:00:50 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:23 +02:00
Adriane Boyd
c4d0209472
Extend v2.3 migration guide (#5653)
* Extend preloaded vocab section

* Add section on tag maps
2020-06-26 14:12:29 +02:00
Adriane Boyd
fd4287c178
Fix backslashes in warnings config diff (#5640)
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:12 +02:00
Adriane Boyd
7ce451c211
Extend what's new in v2.3 with vocab / is_oov (#5635) 2020-06-23 16:48:59 +02:00
Adriane Boyd
bc1cb30b21
Add warnings example in v2.3 migration guide (#5627) 2020-06-22 14:37:24 +02:00
Adriane Boyd
931d80de72
Warning for sudachipy 0.4.5 (#5611) 2020-06-19 12:43:41 +02:00
Ines Montani
6d712f3e06
Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor 2020-06-16 13:49:25 -07:00
Adriane Boyd
02369f91d3 Fix spacy convert argument 2020-06-16 20:41:17 +02:00
Adriane Boyd
f0fd77648f Change example title to Dr.
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd
a6abdfbc3c Fix numpy.zeros() dtype for Doc.from_array 2020-06-16 20:35:45 +02:00
Adriane Boyd
9aff317ca7 Update POS in tagging example 2020-06-16 20:26:57 +02:00
Adriane Boyd
457babfa0c Update alignment example for new gold.align 2020-06-16 20:22:03 +02:00
Ines Montani
41003a5117 Update Binder version [ci skip] 2020-06-16 17:41:23 +02:00
Ines Montani
fd89f44c0c Update Binder URL [ci skip] 2020-06-16 17:34:26 +02:00
Ines Montani
44af53bdd9 Add pkuseg warnings and auto-format [ci skip] 2020-06-16 17:13:35 +02:00
Ines Montani
a9e5b840ee Fix typos and auto-format [ci skip] 2020-06-16 16:38:45 +02:00
Ines Montani
e9d3e177f0 Merge branch 'master' into v2.3.x 2020-06-16 16:31:38 +02:00
Ines Montani
bb54f54369 Fix model accuracy table [ci skip] 2020-06-16 16:10:12 +02:00
Adriane Boyd
d5110ffbf2
Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:26:30 +02:00
Sofie Van Landeghem
4d1ba6feb4
add tag variant for 2.3 (#5542) 2020-06-04 19:16:33 +02:00
svlandeg
5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Sofie Van Landeghem
ae1c179f3a
Remove the nested quote 2020-05-23 17:58:19 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributer agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal
f6078d866a
Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match
Revert token_match priority changes from #4374 and extend token match options
2020-05-22 14:42:51 +02:00
Ines Montani
65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00