Matthew Honnibal
b0b990e405
Fix token.conjuncts (closes #795) (#3392)
...
* Implement conjuncts method
* Add span.conjuncts property
* Un-xfail token.conjuncts tests
* Update docs for token.conjuncts and span.conjuncts
* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani
25cb764e64
Document new API [ci skip]
2019-03-11 15:23:53 +01:00
Ines Montani
ebcf2bb1c3
Add Doc.lang and Doc.lang_
2019-03-11 14:21:40 +01:00
Matthew Honnibal
98acf5ffe4
💫 Allow passing of config parameters to specific pipeline components (#3386)
...
* Add component_cfg kwarg to begin_training
* Document component_cfg arg to begin_training
* Update docs and auto-format
* Support component_cfg across Language
* Format
* Update docs and docstrings [ci skip]
* Fix begin_training
2019-03-10 23:36:47 +01:00
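The idea behind `component_cfg` in commit 98acf5ffe4 can be sketched without spaCy. The snippet below is only an illustration: `begin_training`, `init_tagger`, `init_parser` and their parameters are hypothetical stand-ins, not the library's code. It shows how a dictionary keyed by component name can route settings to one specific component.

```python
def begin_training(pipeline, component_cfg=None):
    """Toy stand-in: initialise each component with its own settings."""
    component_cfg = component_cfg or {}
    initialised = {}
    for name, init in pipeline:
        # Components without an entry simply receive no extra settings.
        initialised[name] = init(**component_cfg.get(name, {}))
    return initialised


def init_tagger(dropout=0.2):
    # Hypothetical component initialiser.
    return {"dropout": dropout}


def init_parser(beam_width=1):
    # Hypothetical component initialiser.
    return {"beam_width": beam_width}


pipeline = [("tagger", init_tagger), ("parser", init_parser)]
result = begin_training(pipeline, component_cfg={"parser": {"beam_width": 4}})
```

Only the parser receives the override here; the tagger keeps its defaults.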
Ines Montani
7ba3a5d95c
💫 Make serialization methods consistent (#3385)
...
* Make serialization methods consistent
Use an `exclude` keyword argument instead of randomly named keyword arguments and deprecation handling
* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Ines Montani
0426689db8
💫 Improve Doc.to_json and add Doc.is_nered (#3381)
...
* Use default return instead of else
* Add Doc.is_nered to indicate if entities have been set
* Add properties in Doc.to_json if they were set, not if they're available
This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.
2019-03-10 15:24:34 +01:00
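The "set, not merely available" distinction described in commit 0426689db8 can be illustrated with a small stand-alone sketch. The `to_json` function and the `annotations` dictionary below are hypothetical and are not spaCy's implementation; they only show why an exported empty list carries different information from a missing key.

```python
def to_json(text, annotations):
    """Toy exporter: include a key only if that annotation was set."""
    data = {"text": text}
    for key in ("ents", "pos"):
        # "Was set" includes empty or None values; absent keys are skipped.
        if key in annotations:
            data[key] = annotations[key]
    return data


# Entities were annotated, but this text contains none.
no_entities = to_json("Hello world", {"ents": []})
# Entities were never annotated at all.
not_annotated = to_json("Hello world", {})
```

A consumer such as a training script can then tell "no entities here" apart from "entities unknown".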
Ines Montani
76764fcf59
💫 Improve converters and training data file formats (#3374)
...
* Populate converter argument info automatically
* Add conversion option for msgpack
* Update docs
* Allow reading training data from JSONL
2019-03-08 23:15:23 +01:00
Ines Montani
296446a1c8
Tidy up and improve docs and docstrings (#3370)
...
## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs
### Types of change
enhancement, docs
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani
fa7314b221
Clarify train_path and dev_path format (see #3366) [ci skip]
2019-03-07 12:23:27 +01:00
Ines Montani
e9babd9973
Update hyperparameters section (see #3352)
2019-03-06 14:40:30 +01:00
Ines Montani
5eadf61327
Update pretraining docs on file format (closes #3354)
2019-03-04 16:30:13 +00:00
Ines Montani
1d4ba7678f
Auto-format [ci skip]
2019-02-27 12:07:35 +01:00
Matthew Honnibal
f1d77eb140
💫 Improve handling of missing NER tags (closes #2603) (#3341)
...
* Improve handling of missing NER tags
GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.
Fix bug that occurred when first tag was a None value. Closes #2603.
* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
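To make the BILUO-with-missing-values format from commit f1d77eb140 concrete, here is a minimal stand-alone sketch. `biluo_to_spans` is a hypothetical helper written for this illustration, not part of spaCy; it reads a per-token tag list in which `None` marks a token whose NER tag is unknown, including the first token.

```python
def biluo_to_spans(tags):
    """Collect (start, end, label) token spans, skipping unknown (None) tags."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag is None or tag == "O":
            # None means "no information", O means "known to be outside".
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            spans.append((i, i + 1, label))
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))
            start = None
    return spans


tags = [None, "B-PERSON", "L-PERSON", "O", "U-GPE", None]
spans = biluo_to_spans(tags)
```

The leading `None` is simply skipped, which is the case the bug fix in this commit addresses.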
Matthew Honnibal
4a3371acd5
Make doc[0].is_sent_start == True (closes #2869) (#3340)
...
* Make doc[0] have sent_start True. Closes #2869
* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani
d0b3af9222
Fix remaining inaccuracies in API docs (closes #2329)
2019-02-24 22:21:25 +01:00
Ines Montani
62b558ab72
💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
...
* Fix formatting and whitespace
* Add support for lexical attributes (closes #2390)
* Document lexical attribute setting during retokenization
* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani
df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
...
## Description
This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter *and* a setter implemented.
```python
from spacy.tokens import Token

# Assumes `nlp` is an already loaded spaCy pipeline.
Token.set_extension('is_musician', default=False)
doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)
assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani
1ea1bc98e7
Document regex utilities [ci skip]
2019-02-24 18:34:10 +01:00
Ines Montani
46ec5cdccc
Update TextCategorizer docs
2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63
Improve built-in component API docs
2019-02-24 13:11:49 +01:00
Ines Montani
250e88ef55
Fix docs example (see #2728)
2019-02-21 14:22:06 +01:00
Ines Montani
04b4df0ec9
Remove n_threads
2019-02-17 22:25:42 +01:00
Ines Montani
e597110d31
💫 Update website (#3285)
...
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
ines
808f7ee417
Update API documentation
2017-10-03 14:27:22 +02:00
ines
d15775c3ad
Fix typos and commands in alpha docs
2017-08-21 13:40:11 +02:00
ines
3c33003078
Port over typo corrections from #1245
2017-08-20 12:00:17 +02:00
ines
1261b01e46
Update Doc.char_span docs
2017-08-19 16:34:32 +02:00
ines
5cb0200e63
Document new Span.to_array() method
2017-08-19 12:45:28 +02:00
ines
471eed4126
Add example to Span.merge()
2017-08-19 12:45:16 +02:00
ines
404d3067b8
Document new Doc.char_span() method
2017-08-19 12:45:00 +02:00
ines
d53cbf369f
Document as_tuples kwarg on Language.pipe()
2017-08-19 12:44:50 +02:00
ines
6a37c93311
Update argument type
2017-08-19 12:44:33 +02:00
ines
4731d50220
Add break utility for long nowrap items (e.g. code)
2017-08-19 12:44:23 +02:00
ines
0aba11b64b
Update package command docs
2017-08-14 16:45:44 +02:00
ines
a29f132ffd
Change python -m spacy to spacy
...
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines
f085b88f9d
Add TextCategorizer API docs stub
2017-07-22 17:56:33 +02:00
ines
ab1a4e8b3c
Add Tensorizer API docs stub
2017-07-22 17:56:25 +02:00
ines
d2a7e5b8e5
Add GoldParse.cats attribute
2017-07-22 17:55:35 +02:00
ines
23d976ed00
Add Doc.cats attribute and missing v2 tag
2017-07-22 17:55:14 +02:00
Ines Montani
1ddbeddca2
Fix typo
2017-07-22 15:00:58 +02:00
Vetea
8e20cf6368
Update doc.jade
...
Just remove a duplicate 'doc ='
2017-06-08 10:35:58 +02:00
ines
9f55c0d4f6
Add Vectors class
2017-06-05 13:33:11 +02:00
ines
e204788c30
Add docs for util.load_model_from_path
2017-06-05 13:18:22 +02:00
ines
efc37ea3de
Update train CLI
2017-06-04 23:45:14 +02:00
ines
3419ecbfdd
Update docs on model shortcut links
2017-06-04 13:55:00 +02:00
ines
b0225183c2
Update displaCy defaults
2017-06-03 13:27:06 +02:00
ines
c60431357d
Port over docs typo corrections
2017-06-03 11:31:30 +02:00
ines
1bebc6392c
Add source files to pipeline components
2017-06-01 17:38:06 +02:00
ines
706cec6d58
Move annotation specs up
2017-06-01 13:02:43 +02:00
ines
77dca25c7f
Update Language API docs
2017-06-01 11:51:31 +02:00
ines
f86289566a
Update new in v2 section and add note on Matcher acceptors
2017-05-30 13:53:06 +02:00
ines
b5bfab8699
Add description
2017-05-29 15:27:16 +02:00
ines
567485a818
Fix and document model loading with pipeline and overrides
2017-05-29 14:10:10 +02:00
ines
00b2094dc3
Fix typos, long integers and tests
2017-05-29 01:09:52 +02:00
ines
606879b217
Update hash strings examples
2017-05-28 19:42:44 +02:00
ines
c7b57ea314
Update docs and change integer IDs to hash values
2017-05-28 19:25:34 +02:00
ines
0ea31d1e31
Add under construction note to pipeline components
2017-05-28 18:44:07 +02:00
ines
414193e9ba
Update docs to reflect StringStore changes
2017-05-28 18:19:11 +02:00
ines
69bda9aed7
Update text, examples, typos, wording and formatting
2017-05-28 16:41:01 +02:00
ines
eb5a8be9ad
Update language overview and add section on 'xx' lang class
2017-05-28 01:15:44 +02:00
ines
eb703f7656
Update API docs
2017-05-28 00:32:43 +02:00
ines
c1983621fb
Update util functions for model loading
2017-05-28 00:22:40 +02:00
ines
70afcfec3e
Update defaults and example
2017-05-26 14:04:31 +02:00
ines
1b982f0838
Update train command and add docs on hyperparameters
2017-05-26 14:02:38 +02:00
ines
1b9c6ded71
Update API docs and add "source" button to GH source
2017-05-26 13:40:32 +02:00
ines
d48530835a
Update API docs and fix typos
2017-05-26 12:43:16 +02:00
ines
ea9474f71c
Add version tag mixin to label new features
2017-05-26 12:42:36 +02:00
ines
353f0ef8d7
Use disable argument (list) for serialization
2017-05-26 12:33:54 +02:00
ines
0f48fb1f97
Rename processing text to production use and remove linear feature scheme
2017-05-25 00:10:33 +02:00
ines
8b86b08bed
Update usage workflows
2017-05-24 11:59:08 +02:00
ines
66088851dc
Add Doc.to_disk() and Doc.from_disk() methods
2017-05-24 11:58:17 +02:00
ines
10afb3c796
Tidy up and merge usage pages
2017-05-24 00:37:47 +02:00
ines
697d3d7cb3
Fix links to CLI docs
2017-05-24 00:36:38 +02:00
ines
a38393e2f6
Update annotation docs
2017-05-23 23:16:17 +02:00
ines
786af87ffb
Update IOB docs
2017-05-23 23:15:50 +02:00
ines
c8bde2161c
Add kwargs to spacy.load
2017-05-23 23:14:02 +02:00
ines
0a8a2d2f6d
Remove tip infoboxes from annotation docs
2017-05-23 23:13:51 +02:00
ines
e6acd3bbf2
Fix matcher tests and matcher docs
2017-05-23 11:36:02 +02:00
ines
f497cf60b2
Update formatting
2017-05-23 11:32:25 +02:00
ines
a23f487b06
Tidy up displaCy and add "manual" option
...
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
ines
dddad5bf26
Update util.prints docs
2017-05-22 13:54:52 +02:00
ines
d5a6a9a6a9
Use string values for attrs in Matcher docs
2017-05-22 13:54:45 +02:00
ines
54f04a9fe0
Update API docs with changes in spacy.gold and spacy.language
2017-05-22 12:29:30 +02:00
ines
fc3ec733ea
Reduce complexity in CLI
...
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
ines
2c5cfe8bbf
Update docstrings and API docs for StringStore
2017-05-21 14:18:58 +02:00
ines
251346b59f
Fix typos and formatting
2017-05-21 14:18:46 +02:00
ines
075f5ff87a
Update docstrings and API docs for GoldParse
2017-05-21 13:53:46 +02:00
ines
465a1dd710
Add BILUO scheme to annotation docs
2017-05-21 13:53:34 +02:00
ines
c9f04f3cd0
Add note on automated processes to download command
2017-05-21 13:23:39 +02:00
ines
8ab59515b2
Fix typo and use consistent description for from_bytes
2017-05-21 13:18:39 +02:00
ines
c5a653fa48
Update docstrings and API docs for Tokenizer
2017-05-21 13:18:14 +02:00
ines
d82ae9a585
Change "function" to "callable" in docs
2017-05-21 13:17:40 +02:00
ines
ee3fdffffb
Move attributes and remove deprecated methods
2017-05-21 01:18:31 +02:00
ines
1cb2c86f9a
Update CLI docs
2017-05-21 01:13:05 +02:00
ines
272a8981c3
Add model tag to spacy.load API docs
2017-05-21 01:12:43 +02:00
ines
3871157d84
Update spacy.util documentation
2017-05-21 01:12:09 +02:00
ines
da12aee0c1
Update spacy.load with note on get_lang_class
2017-05-21 00:19:26 +02:00
ines
27de0834b2
Update docstrings and API docs for Lexeme
2017-05-20 15:13:42 +02:00
ines
7ed8a92ed1
Update docstrings and API docs for Token
2017-05-20 15:13:33 +02:00
ines
4ed6a36622
Update docstrings and API docs for Matcher
2017-05-20 14:43:10 +02:00
ines
39f36539f6
Update docstrings and API docs for Matcher
2017-05-20 14:32:34 +02:00
ines
c00ff257be
Update docstrings and API docs for Matcher
2017-05-20 14:26:10 +02:00
ines
463e3cc80f
Remove resize_vectors and vectors_length
2017-05-20 14:02:14 +02:00
ines
f0cc642bb9
Update docstrings and API docs for Vocab
2017-05-20 14:00:41 +02:00
Matthew Honnibal
a93276bb78
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-20 13:55:12 +02:00
Matthew Honnibal
ce9234f593
Update Matcher API
2017-05-20 13:54:53 +02:00
ines
8b14476253
Fix typo
2017-05-20 13:00:13 +02:00
ines
6557ff9e85
Update example
2017-05-20 13:00:07 +02:00
ines
fea4925f41
Reorganise API docs navigation
2017-05-20 12:59:57 +02:00
ines
b2678372c7
Add API docs for top-level spaCy functions
...
i.e. spacy.load(), spacy.info(), spacy.explain()
2017-05-20 12:59:44 +02:00
ines
797f10ab16
Update formatting
2017-05-20 12:59:16 +02:00
ines
e10c48210d
Update Matcher API and workflow to reflect new API
...
on_match is now the second positional argument, to easily allow a
variable number of patterns while keeping the method clean and readable.
2017-05-20 12:59:03 +02:00
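The argument order described in commit e10c48210d can be sketched with a toy class. `ToyMatcher` below is a hypothetical stand-in rather than spaCy's `Matcher`; it only demonstrates why placing `on_match` second lets the method accept any number of patterns through `*patterns`.

```python
class ToyMatcher:
    """Hypothetical stand-in that only records what was added."""

    def __init__(self):
        self.callbacks = {}
        self.patterns = {}

    def add(self, key, on_match, *patterns):
        # on_match comes second, so every remaining positional
        # argument can be treated as a pattern.
        self.callbacks[key] = on_match
        self.patterns.setdefault(key, []).extend(patterns)


matcher = ToyMatcher()
matcher.add(
    "HelloWorld",
    None,  # no callback
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}],
)
```

Passing `None` as the callback keeps the call readable when only the patterns matter.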
ines
eb521af267
Fix formatting
2017-05-20 12:58:15 +02:00
ines
7973912114
Update CLI docs
2017-05-20 12:58:05 +02:00
ines
5163a4513e
Update API docs
2017-05-20 01:43:48 +02:00
ines
e3256e7406
Update Matcher API docs
2017-05-20 01:38:34 +02:00
ines
0cabf9e13f
Fix model tag
2017-05-20 01:38:14 +02:00
ines
fe5d8819ea
Update Matcher docstrings and API docs
2017-05-19 21:47:06 +02:00
ines
c8580da686
Update "requires model" tags
2017-05-19 20:24:46 +02:00
ines
c3e903e4c2
Update examples and API docs
2017-05-19 19:59:02 +02:00
ines
e9e62b01b0
Update docstrings and API docs for Token
2017-05-19 18:47:56 +02:00
ines
62ceec4fc6
Update docstrings and API docs for Span
2017-05-19 18:47:46 +02:00
ines
23f9a3ccc8
Update docstrings and API docs for Doc
2017-05-19 18:47:39 +02:00
ines
2c8c9dc0c9
Update docstrings and API docs for Language
2017-05-19 18:47:24 +02:00
ines
0791f0aae6
Update docstrings and API docs for Span class
2017-05-19 00:31:31 +02:00
ines
5b68579eb8
Use returns/yields instead of return/yield
2017-05-19 00:02:34 +02:00
ines
b687ad109d
Update docstrings and API docs for Doc class
2017-05-18 23:59:44 +02:00
ines
d42bc16868
Update docstrings and API docs for Language class
2017-05-18 23:57:38 +02:00
ines
b87066ff10
Update docstrings and API docs for Doc class
2017-05-18 22:17:41 +02:00
ines
476b8209fe
Update docs with new Jupyter auto-detection
2017-05-18 14:58:17 +02:00
ines
02a4841e7b
Move CLI docs to API reference
2017-05-17 12:04:03 +02:00
ines
d7244ae72d
Add docs on collapse_punct option
2017-05-15 13:51:33 +02:00
ines
c33bdeb564
Use uppercase for entity types
2017-05-15 01:24:57 +02:00
ines
cf7e5ed534
Use American spelling for "visualizers"
...
Kinda sucks because we normally use British spelling, but it just looks
weird and confusing otherwise... same with tokenizer and all other
library internals. So this is sort of the "official policy" for now.
2017-05-14 23:29:36 +02:00
ines
fe5a5086e1
Fix typo
2017-05-14 23:27:56 +02:00
ines
1ae07da18f
Add API docs for spacy.displacy (see #1058)
2017-05-14 19:31:23 +02:00
ines
b462076d80
Merge load_lang_class and get_lang_class
2017-05-14 01:31:10 +02:00
ines
1465c6c221
Add API docs for util functions
2017-05-13 21:23:12 +02:00
ines
19879cb693
Update alpha support docs
2017-05-12 15:57:49 +02:00
ines
63d79947c8
Update title in navigation
2017-05-12 15:40:43 +02:00
ines
531ee1373b
Rename "Language models" to "Languages" in API
2017-05-12 15:38:56 +02:00
ines
fac3566aac
Add descriptions to POS tagging scheme
2017-05-03 20:11:02 +02:00
ines
1570b83ee5
Add spacy.explain() note to NER annotation scheme
2017-05-03 20:11:02 +02:00
ines
219369bb7d
Add detailed docs for dependency label annotations
2017-05-03 20:11:02 +02:00
ines
f9384b0fbd
Update alpha languages and add aside for tokenizer dependencies
2017-05-03 09:58:31 +02:00
Yasuaki Uechi
0e7a9b9fac
Add Japanese to 'Alpha support' section
2017-05-03 13:56:45 +09:00
ines
034ec5710b
Fix typo and add Norwegian to alpha languages
2017-04-27 11:24:21 +02:00
ines
375edf0bb5
Add list of models and include French
2017-04-26 20:50:27 +02:00
ines
ddd5194088
Update Language docs and docstrings
2017-04-17 01:52:13 +02:00
ines
aad80a291f
Add save_to_directory method to API docs
2017-04-17 01:40:34 +02:00
ines
13df2d6a60
Add documentation for spaCy's JSON format
2017-03-26 15:56:15 +02:00
ines
a5fc5fb0db
Add Hebrew to list of alpha languages
2017-03-25 10:22:46 +01:00
ines
9600cd1b9e
Fix download commands
2017-03-25 10:22:05 +01:00
ines
d25f17f139
Add Bengali to list of languages (see #865)
2017-03-01 15:59:21 +01:00
ines
2b07ab7db4
Add feature scheme to API docs (see #857, #739)
2017-02-24 18:26:32 +01:00
Ines Montani
49a102aff3
Merge pull request #841 from jondoughty/patch-1
...
Updated Token class documentation
2017-02-16 23:47:51 +01:00
Jon Doughty
12a8757343
Update token.jade
2017-02-16 10:55:33 -08:00
nycmonkey
8946a2a496
Fix typo in IOB integer to letter map
...
ent_iob value for an ent.iob_ value of 'B' should be 3, not B
2017-02-16 13:49:57 -05:00
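For reference, the integer-to-letter map that commit 8946a2a496 corrects can be written out as a small lookup. This is a sketch based on the codes documented for `Token.ent_iob` (0 for no tag set, 1 for inside, 2 for outside, 3 for begin); the names `IOB_INT_TO_LETTER` and `iob_letter` are made up for this illustration.

```python
# Integer codes as documented for Token.ent_iob, mapped to the
# letters exposed by Token.ent_iob_. An empty string means no tag is set.
IOB_INT_TO_LETTER = {0: "", 1: "I", 2: "O", 3: "B"}


def iob_letter(code):
    """Return the IOB letter for an integer code (hypothetical helper)."""
    return IOB_INT_TO_LETTER[code]


letters = [iob_letter(code) for code in (3, 1, 2)]
```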
ines
a44da8fb34
Update language models and alpha support overview
2017-02-04 13:49:05 +01:00
Hidekazu Oiwa
7806ebafd2
Fix the span doc typo
...
Fix the typo in the span API doc.
It explains the `end` of the span as the `start_char` description.
2017-01-17 20:37:14 -08:00
jktong
df0aeff379
Correct typo "chldren" in doc.jade
2017-01-16 09:34:59 -05:00
Ines Montani
d677db6277
Change "Multi-language support" to amber for spaCy
2017-01-03 21:24:35 +01:00
Ines Montani
d1585959d9
Add Hungarian to alpha support overview
2016-12-27 22:31:41 +01:00
Ines Montani
71c00db8a5
Update language models page
2016-12-21 00:54:54 +01:00
Ines Montani
ddf5c5bb61
Generalise dependency parsing annotation specs beyond English (closes #657)
2016-12-19 13:42:44 +01:00
Ines Montani
6a793251c8
Add aside on spaCy's custom pronoun lemma
2016-12-19 13:41:47 +01:00
Ines Montani
614ca6fb41
Split annotation specs into files so they can be included in different places
2016-12-18 17:42:10 +01:00
David Edwards
278199dd2c
Update index.jade
2016-12-15 13:40:53 -08:00
Ines Montani
ada007cb73
Fix formatting for consistency
2016-11-25 15:53:40 +01:00
Ines Montani
19f27cc6ef
Use consistent entity tables across docs
2016-11-25 15:48:50 +01:00
Mark Amery
b4e1dc0e3f
Fix a bunch of missing spaces on the website
2016-11-20 17:02:45 +00:00
Ines Montani
5e4e5b600f
Update language models docs
2016-11-05 02:50:55 +01:00
Ines Montani
c748474a9e
Fix formatting
2016-11-03 01:52:31 +01:00
Ines Montani
2515b32a74
Add documentation for Tokenizer API (see #600)
2016-11-02 23:18:02 +01:00
Ines Montani
2c65c15d7a
Fix typo
2016-11-02 11:25:09 +01:00
Ines Montani
823e47d946
Add language models to API docs (fixes #598)
2016-11-02 11:24:13 +01:00
Ines Montani
201445b3b8
Fix benchmarks intro
2016-10-31 20:55:59 +01:00
Ines Montani
7615b41bff
Update to new website
2016-10-31 19:04:15 +01:00