
814 Commits

Author SHA1 Message Date
Ines Montani
7bcf9f7cfb Document new features 2020-07-09 21:10:36 +02:00
Ines Montani
ea01831f6a Update projects docs etc. 2020-07-09 19:43:25 +02:00
Ines Montani
175d34d8f9 Update sidebar menu 2020-07-09 11:44:09 +02:00
Ines Montani
9ee5b71412 Update cli.md 2020-07-09 11:44:00 +02:00
Ines Montani
9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00
svlandeg
c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
svlandeg
90b100c39f remove component.Model, update constructor, losses is return value of update 2020-07-08 12:14:30 +02:00
Ines Montani
2298e129e6 Update example and training docs 2020-07-07 20:30:12 +02:00
svlandeg
2b60e894cb fix component constructors, update, begin_training, reference to GoldParse 2020-07-07 19:17:19 +02:00
svlandeg
14a796e3f9 add Example API with examples of Example usage 2020-07-07 14:46:41 +02:00
Ines Montani
bb3ee38cf9 Update WIP 2020-07-06 22:22:37 +02:00
Ines Montani
44da24ddd0 Update doc.md 2020-07-06 18:17:00 +02:00
Ines Montani
44790c1c32 Update docs and add keyword-only tag 2020-07-06 18:14:57 +02:00
Ines Montani
a35236e5f0 Update v3 docs WIP [ci skip] 2020-07-06 15:57:44 +02:00
Ines Montani
63247cbe87 Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
Ines Montani
dc8c9d912f Update docs [ci skip] 2020-07-04 16:47:24 +02:00
Ines Montani
4498dfe99d Update docs 2020-07-04 16:25:30 +02:00
Ines Montani
1e0d54edd1 Update docs 2020-07-04 14:23:10 +02:00
Ines Montani
fe224dc2dd Merge branch 'develop' into nightly.spacy.io 2020-07-03 16:48:27 +02:00
Ines Montani
06f1ecb308 Update v3 docs 2020-07-03 16:48:21 +02:00
Ines Montani
cdf9ee1716 Add stub for Example API docs [ci skip] 2020-07-03 15:46:10 +02:00
Ines Montani
fa8e097c04 Update convert docs [ci skip] 2020-07-03 15:42:04 +02:00
Jan Jessewitsch
e4dcac4a4b Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove out-dated error from merge

* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
Adriane Boyd
a723fa02a1 DocBin: add version number, missing attributes and strings (#5685)
* Add version number to DocBin

Add a version number to DocBin for future use.

* Add POS to all attributes in DocBin

* Add morph string to strings in DocBin

* Update DocBin API

* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani
b5268955d7 Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
Ines Montani
a4cfe9fc33 Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
Ines Montani
fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
Ines Montani
26df4efa94 Add new in v3.0 2020-07-01 13:02:17 +02:00
Ines Montani
18a900abc2 Fix markup 2020-07-01 13:02:07 +02:00
Ines Montani
414dc7ace1 Merge branch 'spacy.io' into spacy.io-develop 2020-07-01 11:47:47 +02:00
Álvaro Abella Bascarán
7111b9de2e Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:

```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```

`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:01:12 +02:00
Matthias Hertel
305221f3e5 Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:55 +02:00
Adriane Boyd
d777d9cc38 Extend v2.3 migration guide (#5653)
* Extend preloaded vocab section

* Add section on tag maps
2020-06-26 14:13:01 +02:00
Adriane Boyd
a2660bd9c6 Fix backslashes in warnings config diff (#5640)
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:57 +02:00
Adriane Boyd
4f73ced914 Extend what's new in v2.3 with vocab / is_oov (#5635) 2020-06-23 16:50:43 +02:00
Adriane Boyd
fcdecefacf Add warnings example in v2.3 migration guide (#5627) 2020-06-22 14:38:06 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Adriane Boyd
66889de166 Warning for sudachipy 0.4.5 (#5611) 2020-06-19 13:45:23 +02:00
Adriane Boyd
931d80de72 Warning for sudachipy 0.4.5 (#5611) 2020-06-19 12:43:41 +02:00
Ines Montani
6d712f3e06 Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor 2020-06-16 13:49:25 -07:00
Adriane Boyd
02369f91d3 Fix spacy convert argument 2020-06-16 20:41:17 +02:00
Adriane Boyd
f0fd77648f Change example title to Dr.
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd
a6abdfbc3c Fix numpy.zeros() dtype for Doc.from_array 2020-06-16 20:35:45 +02:00
Adriane Boyd
9aff317ca7 Update POS in tagging example 2020-06-16 20:26:57 +02:00
Adriane Boyd
457babfa0c Update alignment example for new gold.align 2020-06-16 20:22:03 +02:00
Ines Montani
44af53bdd9 Add pkuseg warnings and auto-format [ci skip] 2020-06-16 17:13:35 +02:00
Ines Montani
a9e5b840ee Fix typos and auto-format [ci skip] 2020-06-16 16:38:45 +02:00
Adriane Boyd
d5110ffbf2 Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Sofie Van Landeghem
c0f4a1e43b train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
Sofie Van Landeghem
4d1ba6feb4 add tag variant for 2.3 (#5542) 2020-06-04 19:16:33 +02:00