Adriane Boyd
971826a96d
Include git commit in package and model meta ( #5694 )
...
* Include git commit in package and model meta
* Rewrite to read file in setup
* Fix file handle
2020-07-02 17:10:27 +02:00
Adriane Boyd
2bd78c39e3
Fix multiple context managers in examples ( #5690 )
2020-07-02 10:36:07 +02:00
Ines Montani
295279f74b
Update netlify.toml [ci skip]
2020-07-01 22:06:43 +02:00
Ines Montani
6bc643d2e2
Update netlify.toml [ci skip]
2020-07-01 21:34:17 +02:00
Ines Montani
0e28edd2cb
Update netlify.toml [ci skip]
2020-07-01 13:34:52 +02:00
Ines Montani
f2a932a60c
Update netlify.toml [ci skip]
2020-07-01 13:34:35 +02:00
Álvaro Abella Bascarán
7111b9de2e
Fix in docs: pipe(docs) instead of pipe(texts) ( #5680 )
...
Very minor fix in docs, specifically in this part:
```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```
`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:01:12 +02:00
Álvaro Abella Bascarán
ff0dbe5c64
Fix in docs: pipe(docs) instead of pipe(texts) ( #5680 )
...
Very minor fix in docs, specifically in this part:
```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```
`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:00:50 +02:00
Matthias Hertel
305221f3e5
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:23 +02:00
Matthew Honnibal
2d715451a2
Revert "Convert custom user_data to token extension format for Japanese tokenizer ( #5652 )" ( #5665 )
...
This reverts commit 1dd38191ec.
2020-06-29 14:34:15 +02:00
Adriane Boyd
1dd38191ec
Convert custom user_data to token extension format for Japanese tokenizer ( #5652 )
...
* Convert custom user_data to token extension format
Convert the user_data values so that they can be loaded as custom token
extensions for `inflection`, `reading_form`, `sub_tokens`, and `lemma`.
* Reset Underscore state in ja tokenizer tests
2020-06-29 14:20:26 +02:00
Adriane Boyd
167df42cb6
Move lemmatizer is_base_form to language settings ( #5663 )
...
Move `Lemmatizer.is_base_form` to the language settings so that each
language can provide a language-specific method as
`LanguageDefaults.is_base_form`.
The existing English-specific `Lemmatizer.is_base_form` is moved to
`EnglishDefaults`.
2020-06-29 14:16:57 +02:00
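The idea behind 167df42cb6 ( #5663 ), a per-language `is_base_form` hook, can be sketched in plain Python. This is only an illustration and does not import spaCy: the class names `SketchLanguageDefaults` and `SketchEnglishDefaults`, the helper `needs_lemmatization`, and the single toy rule are all assumptions made up here, not the rule set that the commit moved to `EnglishDefaults`.

```python
class SketchLanguageDefaults:
    """Hypothetical stand-in for a language-independent defaults class."""

    @classmethod
    def is_base_form(cls, univ_pos, morphology=None):
        # Generic fallback: never assume the word is already a base form.
        return False


class SketchEnglishDefaults(SketchLanguageDefaults):
    """Hypothetical stand-in for an English-specific override."""

    @classmethod
    def is_base_form(cls, univ_pos, morphology=None):
        # Toy rule (an assumption, not spaCy's rule set):
        # treat infinitive verbs as base forms.
        morphology = morphology or {}
        return univ_pos == "verb" and morphology.get("VerbForm") == "inf"


def needs_lemmatization(defaults, univ_pos, morphology=None):
    # The caller asks the language settings instead of hard-coding
    # English-specific logic.
    return not defaults.is_base_form(univ_pos, morphology)


print(needs_lemmatization(SketchEnglishDefaults, "verb", {"VerbForm": "inf"}))
print(needs_lemmatization(SketchLanguageDefaults, "verb", {"VerbForm": "inf"}))
```

With this layout, a language without special rules inherits the generic fallback, and only languages that need it override the method.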
Adriane Boyd
d777d9cc38
Extend v2.3 migration guide ( #5653 )
...
* Extend preloaded vocab section
* Add section on tag maps
2020-06-26 14:13:01 +02:00
Adriane Boyd
c4d0209472
Extend v2.3 migration guide ( #5653 )
...
* Extend preloaded vocab section
* Add section on tag maps
2020-06-26 14:12:29 +02:00
PluieElectrique
90c7eb0e2f
Reduce memory usage of Lookup's BloomFilter ( #5606 )
...
* Reduce memory usage of Lookup's BloomFilter
* Remove extra Table update
2020-06-26 14:09:10 +02:00
Adriane Boyd
b7107ac89f
Disregard special tag _SP in check for new tag map ( #5641 )
...
* Skip special tag _SP in check for new tag map
In `Tagger.begin_training()` check for new tags aside from `_SP` in the
new tag map initialized from the provided gold tuples when determining
whether to reinitialize the morphology with the new tag map.
* Simplify _SP check
2020-06-26 09:23:21 +02:00
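The `_SP` check described in b7107ac89f ( #5641 ) can be illustrated with a minimal plain-Python sketch. It does not use spaCy: `has_new_tags` and the example tag maps are hypothetical names invented for illustration, and the actual logic lives inside `Tagger.begin_training()`.

```python
def has_new_tags(new_tag_map, existing_tag_map, ignored=("_SP",)):
    """Return True if new_tag_map contains tags missing from existing_tag_map,
    disregarding the special tags listed in `ignored`."""
    candidate_tags = set(new_tag_map) - set(ignored)
    return any(tag not in existing_tag_map for tag in candidate_tags)


# Hypothetical tag maps for illustration only.
existing = {"NN": {"pos": "NOUN"}, "VB": {"pos": "VERB"}}
only_sp_added = {"NN": {"pos": "NOUN"}, "_SP": {"pos": "SPACE"}}
real_new_tag = {"NN": {"pos": "NOUN"}, "JJ": {"pos": "ADJ"}, "_SP": {"pos": "SPACE"}}

# Only `_SP` differs, so the morphology would not need to be reinitialized.
print(has_new_tags(only_sp_added, existing))
# `JJ` is new, so reinitialization would be warranted.
print(has_new_tags(real_new_tag, existing))
```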
Adriane Boyd
a2660bd9c6
Fix backslashes in warnings config diff ( #5640 )
...
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:57 +02:00
Adriane Boyd
fd4287c178
Fix backslashes in warnings config diff ( #5640 )
...
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:12 +02:00
Adriane Boyd
6fe6e761de
Skip vocab in component config overrides ( #5624 )
2020-06-23 23:21:11 +02:00
Adriane Boyd
4f73ced914
Extend what's new in v2.3 with vocab / is_oov ( #5635 )
2020-06-23 16:50:43 +02:00
Adriane Boyd
7ce451c211
Extend what's new in v2.3 with vocab / is_oov ( #5635 )
2020-06-23 16:48:59 +02:00
Adriane Boyd
d94e961f14
Fix polarity of Token.is_oov and Lexeme.is_oov ( #5634 )
...
Fix `Token.is_oov` and `Lexeme.is_oov` so they return `True` when the
lexeme does **not** have a vector.
2020-06-23 13:29:51 +02:00
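The corrected polarity from d94e961f14 ( #5634 ) can be shown with a small plain-Python sketch. `ToyLexeme` and the `vectors` dictionary are hypothetical stand-ins invented here, not spaCy classes; the only behavior taken from the commit message is that `is_oov` is `True` when the lexeme does not have a vector.

```python
class ToyLexeme:
    """Hypothetical stand-in for a lexeme backed by a vectors table."""

    def __init__(self, text, vectors):
        self.text = text
        self._vectors = vectors

    @property
    def has_vector(self):
        return self.text in self._vectors

    @property
    def is_oov(self):
        # Corrected polarity: out-of-vocabulary means "no vector".
        return not self.has_vector


# Hypothetical vectors table with a single entry.
vectors = {"apple": [0.1, 0.2, 0.3]}

known = ToyLexeme("apple", vectors)
unknown = ToyLexeme("appple", vectors)

print(known.is_oov, unknown.is_oov)
```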
Richard Liaw
0ef78bad93
contribute ( #5632 )
2020-06-23 08:53:58 +02:00
Adriane Boyd
fcdecefacf
Add warnings example in v2.3 migration guide ( #5627 )
2020-06-22 14:38:06 +02:00
Adriane Boyd
bc1cb30b21
Add warnings example in v2.3 migration guide ( #5627 )
2020-06-22 14:37:24 +02:00
Hiroshi Matsuda
150a39ccca
Japanese model: add user_dict entries and small refactor ( #5573 )
...
* user_dict fields: adding inflections, reading_forms, sub_tokens
deleting: unidic_tags
improve code readability around the token alignment procedure
* add test cases, replace fugashi with sudachipy in conftest
* move bunsetu.py to spaCy Universe as a pipeline component BunsetuRecognizer
* tag is space -> both surface and tag are spaces
* consider len(text)==0
2020-06-22 14:32:25 +02:00
Rameshh
c34420794a
Add Nepali Language ( #5622 )
...
* added support for nepali lang
* added examples and test files
* added spacy contributor agreement
2020-06-22 10:25:46 +02:00
Karen Hambardzumyan
66a4834e56
Some changes for Armenian ( #5616 )
...
* Fixing numericals
* We need an Armenian question mark to make the sentence a question
2020-06-22 08:50:34 +02:00
Karen Hambardzumyan
ff6a084e9c
Create mahnerak.md ( #5615 )
2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan
8120b641cc
Update lex_attrs.py ( #5608 )
2020-06-19 20:00:34 +02:00
Marat M. Yavrumyan
ccd7edf04b
Create myavrum.md ( #5612 )
2020-06-19 18:34:27 +02:00
Adriane Boyd
66889de166
Warning for sudachipy 0.4.5 ( #5611 )
2020-06-19 13:45:23 +02:00
Adriane Boyd
931d80de72
Warning for sudachipy 0.4.5 ( #5611 )
2020-06-19 12:43:41 +02:00
Ines Montani
959bc616dd
Merge branch 'master' into spacy.io
2020-06-16 22:50:11 +02:00
Ines Montani
6d712f3e06
Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor
2020-06-16 13:49:25 -07:00
Adriane Boyd
02369f91d3
Fix spacy convert argument
2020-06-16 20:41:17 +02:00
Adriane Boyd
f0fd77648f
Change example title to Dr.
...
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd
a6abdfbc3c
Fix numpy.zeros() dtype for Doc.from_array
2020-06-16 20:35:45 +02:00
Adriane Boyd
9aff317ca7
Update POS in tagging example
2020-06-16 20:26:57 +02:00
Adriane Boyd
457babfa0c
Update alignment example for new gold.align
2020-06-16 20:22:03 +02:00
Ines Montani
19b9ea0436
Fix languages.json
2020-06-16 18:34:11 +02:00
Ines Montani
ed240458f6
Try and upgrade gatsby
2020-06-16 18:28:24 +02:00
Ines Montani
0faabf3325
Merge branch 'master' into spacy.io
2020-06-16 18:13:44 +02:00
Ines Montani
41003a5117
Update Binder version [ci skip]
2020-06-16 17:41:23 +02:00
Ines Montani
19be89b2ce
Merge branch 'master' into spacy.io
2020-06-16 17:36:14 +02:00
Ines Montani
fd89f44c0c
Update Binder URL [ci skip]
2020-06-16 17:34:26 +02:00
Ines Montani
ec6e35c1c2
Merge branch 'master' into spacy.io
2020-06-16 17:13:49 +02:00
Ines Montani
44af53bdd9
Add pkuseg warnings and auto-format [ci skip]
2020-06-16 17:13:35 +02:00
Ines Montani
ec26180b8f
Merge branch 'master' into spacy.io
2020-06-16 16:38:55 +02:00