Ines Montani
114cb18892
Improve wording
2019-07-17 15:27:53 +02:00
Ines Montani
7522beef9e
Add "Things to try" prompts
2019-07-17 15:25:02 +02:00
Ines Montani
9f02e3c027
Adjust example
...
Not actually supported in this alignment interpretation
2019-07-17 15:13:50 +02:00
Ines Montani
1ea472468a
Add usage docs for aligning tokenization
2019-07-17 15:08:33 +02:00
Ines Montani
f97a555445
Add API documentation
2019-07-17 14:30:04 +02:00
pmbaumgartner
9a86d95ea2
fix custom attribute links
2019-07-14 20:23:54 -04:00
Ines Montani
40cd03fc35
Improve EntityRuler serialization
2019-07-10 12:25:45 +02:00
Ines Montani
8721849423
Update Scorer.ents_per_type
2019-07-10 11:19:28 +02:00
Ines Montani
ebe58e7fa1
Document gold.docs_to_json [ci skip]
2019-07-10 10:27:33 +02:00
Ines Montani
881f5bc401
Auto-format
2019-07-10 10:27:29 +02:00
Björn Böing
205c73a589
Update tokenizer and doc init example ( #3939 )
...
* Fix Doc.to_json hyperlink
* Update tokenizer and doc init examples
* Change "matchin rules" to "punctuation rules"
* Auto-format
2019-07-10 10:16:48 +02:00
Björn Böing
04982ccc40
Update pretrain to prevent unintended overwriting of weight fil… ( #3902 )
...
* Update pretrain to prevent unintended overwriting of weight files for #3859
* Add '--epoch-start' to pretrain docs
* Add missing pretrain arguments to bash example
* Update doc tag for v2.1.5
2019-07-09 21:48:30 +02:00
Joshua Smith
2eb925bd05
Added an argument to EntityRuler constructor to pass attrs to… ( #3919 )
...
* Preserve flags in EntityRuler
The EntityRuler (explosion/spaCy#3526 ) does not preserve
overwrite flags (or `ent_id_sep`) when serialized. This
commit adds support for serialization/deserialization preserving
overwrite and ent_id_sep flags.
* add signed contributor agreement
* flake8 cleanup
mostly blank line issues.
* mark test from the issue as needing a model
The test from the issue needs some language model for serialization
but the test wasn't originally marked correctly.
* Adds `phrase_matcher_attr` to allow args to PhraseMatcher
This is an added arg to pass to the `PhraseMatcher`. For example,
this allows creation of a case insensitive phrase matcher when the
`EntityRuler` is created. References explosion/spaCy#3822
* remove unneeded model loading
The model didn't need to be loaded, and I replaced it with
a change that doesn't require it (using existing fixtures)
* updated docstring for new argument
* updated docs to reflect new argument to the EntityRuler constructor
* change tempdir handling to be compatible with python 2.7
* return conflicted code to entityruler
Some stuff got cut out because of merge conflicts, this
returns that code for the phrase_matcher_attr.
* fixed typo in the code added back after conflicts
* flake8 compliance
When I deconflicted the branch there were some flake8 issues
introduced. This resolves the spacing problems.
* test changes: attempts to fix flaky test in python3.5
These tests seem to be a little flaky in 3.5, so I changed the check to avoid
the comparisons that seem to fail sometimes.
2019-07-09 20:09:17 +02:00
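The idea behind `phrase_matcher_attr` can be illustrated without spaCy: phrases are compared on a chosen token attribute instead of the raw text. Below is a minimal pure-Python sketch that assumes pre-tokenized input. `find_phrases` and `ATTR_GETTERS` are hypothetical names and are not part of spaCy's API. Only `ORTH` and `LOWER` are modeled.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical attribute getters standing in for spaCy's ORTH and LOWER.
ATTR_GETTERS: Dict[str, Callable[[str], str]] = {
    "ORTH": lambda tok: tok,           # exact text
    "LOWER": lambda tok: tok.lower(),  # lowercased text, i.e. case-insensitive
}


def find_phrases(
    tokens: List[str],
    patterns: Dict[str, List[List[str]]],
    attr: str = "ORTH",
) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for every pattern phrase found in tokens,
    comparing tokens on the chosen attribute. End offsets are exclusive."""
    get = ATTR_GETTERS[attr]
    values = [get(t) for t in tokens]
    matches = []
    for label, phrases in patterns.items():
        for phrase in phrases:
            target = [get(t) for t in phrase]
            n = len(target)
            if n == 0:
                continue  # an empty phrase would match at every position
            for start in range(len(values) - n + 1):
                if values[start:start + n] == target:
                    matches.append((label, start, start + n))
    return sorted(matches, key=lambda m: (m[1], m[2]))
```

Passing `"LOWER"` is what makes matching case-insensitive in this sketch. According to the commit, the real option is the `phrase_matcher_attr` argument to the `EntityRuler` constructor, which is passed on to the `PhraseMatcher`.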
Ines Montani
d361e380b8
Fix matcher callback example ( closes #3862 )
2019-06-26 14:47:26 +02:00
Guillaume Claret
d7a519a922
Typo ( #3865 )
...
* Typo
* Add contributor agreement
2019-06-20 10:31:19 +02:00
Björn Böing
ebf5a04d6c
Update pretrain docs and add unsupported loss_func error ( #3860 )
...
* Add error to `get_vectors_loss` for unsupported loss function of `pretrain`
* Add missing "--loss-func" argument to pretrain docs. Update pretrain plac annotations to match docs.
* Add missing quotation marks
2019-06-20 10:30:44 +02:00
Alejandro Alcalde
4866a7ee9e
Changed learning rate by its param name. ( #3855 )
...
* Changed learning rate by its param name.
I spent a while searching for the name of the learning rate parameter. `beta1` and `beta2` are easy to find because they are marked as code, but the learning rate wasn't. I think writing the actual parameter name would be helpful.
* Signing SCA
2019-06-20 10:29:20 +02:00
Ines Montani
81c12640ab
Auto-format [ci skip]
2019-06-16 14:33:20 +02:00
Greg Werner
9041a72d7f
Update tokenizer.md for construction example ( #3790 )
...
* Update tokenizer.md for construction example
Self contained example. You should really say what nlp is so that the example will work as is
* Update CONTRIBUTOR_AGREEMENT.md
* Restore contributor agreement
* Adjust construction examples
2019-06-16 14:32:56 +02:00
BreakBB
d8573ee715
Update error raising for CLI pretrain to fix #3840 ( #3843 )
...
* Add check for empty input file to CLI pretrain
* Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key
* Skip empty values for correct pretrain keys and log a counter as warning
* Add tests for CLI pretrain core function make_docs.
* Add a short hint for the `tokens` key to the CLI pretrain docs
* Add success message to CLI pretrain
* Update model loading to fix the tests
* Skip empty values and do not create docs out of them
2019-06-16 13:22:57 +02:00
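The validation rules in this commit can be sketched in pure Python, together with the later change that makes `text` optional when `tokens` is present. This is a minimal sketch and not spaCy's `make_docs`. The function name `read_pretrain_records`, the error messages, and the choice to prefer `tokens` over `text` when both are present are all assumptions.

```python
import json
from typing import Iterable, List, Tuple


def read_pretrain_records(lines: Iterable[str]) -> Tuple[List[dict], int]:
    """Hypothetical validator for pretraining JSONL lines.

    Returns the usable records and a count of skipped empty values."""
    records: List[dict] = []
    skipped = 0
    seen_any = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        seen_any = True
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(
                "Each JSONL line must be an object, got " + type(record).__name__
            )
        # Assumption: "tokens" takes precedence, so "text" is optional when it exists
        if "tokens" in record:
            value = record["tokens"]
        elif "text" in record:
            value = record["text"]
        else:
            raise ValueError("Each record needs a 'tokens' or 'text' key")
        if not value:
            skipped += 1  # empty value: skip it and count it for a warning
            continue
        records.append(record)
    if not seen_any:
        raise ValueError("Input file is empty")
    return records, skipped
```

In this sketch, the skip counter is returned rather than logged so the caller can decide how to report it.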
Motoki Wu
9c064e6ad9
Add resume logic to spacy pretrain ( #3652 )
...
* Added ability to resume training
* Add to README
* Remove duplicate entry
2019-06-12 13:29:23 +02:00
Ramanan Balakrishnan
eb12703d10
minor fix to broken link in documentation ( #3819 ) [ci skip]
2019-06-04 11:15:35 +02:00
Ines Montani
0c74506c9c
Fix typos in docs ( closes #3802 ) [ci skip]
2019-06-01 11:35:01 +02:00
Nipun Sadvilkar
1f13005751
Incorrect Token attribute ent_iob_ description ( #3800 )
...
* Incorrect Token attribute ent_iob_ description
* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan
26c37c5a4d
fix all references to BILUO annotation format ( #3797 )
2019-05-31 12:19:19 +02:00
mak
89379a7fa4
Corrected example model URL in requirements.txt ( #3786 )
...
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
Ines Montani
7634812172
Document Language.evaluate
2019-05-24 14:06:36 +02:00
Ines Montani
45e6855550
Update Language.update docs
2019-05-24 14:06:26 +02:00
Ines Montani
b78a8dc1d2
Update Scorer and add API docs
2019-05-24 14:06:04 +02:00
Ines Montani
321c9f5acc
Fix lex_id docs ( closes #3743 )
2019-05-16 23:15:58 +02:00
Ines Montani
f96af8526a
Merge branch 'spacy.io' [ci skip]
2019-05-11 23:03:56 +02:00
Ines Montani
7534f7cb44
Fix return value of Language.update ( closes #3692 )
2019-05-11 18:40:19 +02:00
devforfu
21af12eb53
Make "text" key in JSONL format optional when "tokens" key is provided ( #3721 )
...
* Fix issue with forcing text key when it is not required
* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
Ines Montani
6cfa1e1f47
Fix DependencyParser.predict docs ( resolves #3561 )
2019-05-11 15:37:54 +02:00
Ines Montani
25f5592d57
Improve Token.prob and Lexeme.prob docs ( resolves #3701 )
2019-05-11 15:23:41 +02:00
Aaron Kub
719a15f23d
fixing regex matcher examples ( #3708 ) ( #3719 )
2019-05-10 14:23:52 +02:00
Ines Montani
65b55f1aaa
Add version tag to --base-model argument ( closes #3720 )
2019-05-10 14:06:47 +02:00
Ines Montani
505c9e0e19
Add util.filter_spans helper ( #3686 )
2019-05-08 02:33:40 +02:00
张晓飞
ba1ff00370
update response after calling add_pipe ( #3661 )
...
* update response after calling add_pipe
The print_info component is appended last, so it needs to be shown at the end of the pipeline.
* Create henry860916.md
2019-05-01 12:02:18 +02:00
Ramiro Gómez
8ee4100f8f
Remove dangling M ( #3657 )
...
I assume this is a typo. Sorry if it has a meaning that I'm not aware of.
2019-04-29 19:44:43 +02:00
Amit Chaudhary
167d63af31
Fix broken link to Dive Into Python 3 website ( #3656 )
...
* Fix broken link to Dive Into Python 3 website
* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ivan Tham
fa94f83697
Improve redundant variable name ( #3643 )
...
* Improve redundant variable name
* Apply suggestions from code review
Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani
ec0d840ab5
Document early stopping
2019-04-22 14:31:32 +02:00
Ines Montani
1d567913f9
Update spacy evaluate example
2019-04-22 14:28:42 +02:00
Ines Montani
7917ce2f73
Make flag shortcut consistent and document
2019-04-22 14:23:44 +02:00
Ines Montani
52658c80d5
Allow jupyter=False to override Jupyter mode ( closes #3598 )
2019-04-22 14:18:32 +02:00
Motoki Wu
8e2cef49f3
Add save after --save-every batches for spacy pretrain ( #3510 )
...
When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches.
## Description
To test...
Save this file to `sample_sents.jsonl`
```
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
```
Then run `--save-every 2` when pretraining.
```bash
spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2
```
And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name.
At the end of the training, you should see these files (`ls here/`):
```bash
config.json model2.bin model5.bin model8.bin
log.jsonl model2.temp.bin model5.temp.bin model8.temp.bin
model0.bin model3.bin model6.bin model9.bin
model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin
model1.bin model4.bin model7.bin
model1.temp.bin model4.temp.bin model7.temp.bin
```
### Types of change
This is a new feature to `spacy pretrain`.
🌵 **Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error).**
```
Processing matcher.pyx
[Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx'
Traceback (most recent call last):
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module>
    run(args.root)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run
    process(base, filename, db)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process
    preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd
    func(*args)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx
    raise Exception("Cython failed")
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 276, in <module>
    setup_package()
  File "setup.py", line 209, in setup_package
    generate_cython(root, "spacy")
  File "setup.py", line 132, in generate_cython
    raise RuntimeError("Running cythonize failed")
RuntimeError: Running cythonize failed
```
Edit: Fixed after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm`
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-22 14:10:16 +02:00
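The file listing in this PR follows a simple rule: write a temporary checkpoint every `--save-every` batches, and write a final one at the end of each epoch. The sketch below reproduces that naming pattern. `checkpoint_names` is a hypothetical helper. Counting batches per epoch (rather than globally) is an assumption that the PR does not state.

```python
from typing import List


def checkpoint_names(n_epochs: int, batches_per_epoch: int, save_every: int) -> List[str]:
    """Hypothetical: list the file names written, in order, by a loop that
    saves a temporary checkpoint every `save_every` batches and a final
    checkpoint at the end of each epoch. A falsy `save_every` disables
    the intermediate saves."""
    written: List[str] = []
    for epoch in range(n_epochs):
        for batch_id in range(1, batches_per_epoch + 1):
            # ... the model would be updated on this batch here ...
            if save_every and batch_id % save_every == 0:
                # The same temp file is overwritten within an epoch
                written.append("model{}.temp.bin".format(epoch))
        written.append("model{}.bin".format(epoch))
    return written
```

The test data has 16 sentences and uses `-bs 1`, so each epoch has 16 batches. In this sketch, `--save-every 2` then rewrites `model{N}.temp.bin` eight times per epoch. Across `-i 10` epochs, that leaves the 20 model files shown in the listing.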
Ines Montani
0dce4585b1
Add course to 101
2019-04-19 15:59:51 +02:00
Ines Montani
2efc87c382
Remove unused image
2019-04-19 15:48:12 +02:00
Ines Montani
38395d9518
Merge branch 'spacy.io'
2019-04-19 15:26:20 +02:00
Ines Montani
7ac5bb0a7b
Update landing and feature overview
2019-04-19 15:23:08 +02:00
fizban99
f2f2df6e78
entity types for colors should be in uppercase ( #3599 )
...
although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
Ines Montani
5289dd1356
Fix formatting
2019-04-13 17:58:26 +02:00
Ines Montani
9e7deeaf48
Remove Datacamp
2019-04-13 17:46:32 +02:00
Santiago Castro
86e4b68aa9
Fix website docs for Vectors.from_glove ( #3565 )
...
* Fix website docs for Vectors.from_glove
* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Bharat Raghunathan
72820896d4
Fix typo in web docs cli.md ( #3559 )
2019-04-09 11:40:03 +02:00
pierremonico
0d26bfe677
Removes duplicate in table ( #3550 )
...
* Removes duplicate in table
Just fixing typos.
* Remove newline
Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Ines Montani
2f0f439c54
Remove non-existent example ( closes #3533 )
2019-04-03 09:59:17 +02:00
Samuel Kane
06a1846379
fix(util): fix decaying function output ( #3495 )
...
* fix(util): fix decaying function output
* fix(util): better test and adhere to code standards
* fix(util): correct variable name, pytestify test, update website text
2019-03-28 13:24:47 +01:00
Bharat Raghunathan
1db3e47509
DOC: Update tokenizer docs to include default value for batch_size in pipe ( #3492 )
2019-03-28 12:48:02 +01:00
Ines Montani
200d8bdb3c
Merge branch 'spacy.io' [ci skip]
2019-03-23 16:46:34 +01:00
Ines Montani
1e5b917d75
Fix formatting [ci skip]
2019-03-23 16:45:50 +01:00
Matthew Honnibal
6c783f8045
Bug fixes and options for TextCategorizer ( #3472 )
...
* Fix code for bag-of-words feature extraction
The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).
* Support 'bow' architecture for TextCategorizer
This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.
* Fix size limits in train_textcat example
* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
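The new feature extraction counts ngrams of every size up to `ngram_size` over a chosen attribute. It can be sketched as follows. This is a pure-Python illustration, not the `_ml.py` code, and `bow_ngrams` is hypothetical. The attribute is modeled as a callable: identity stands in for something like ORTH, and `str.lower` for LOWER. Real spaCy features are hashed IDs rather than string tuples.

```python
from collections import Counter
from typing import Callable, List


def bow_ngrams(
    tokens: List[str],
    ngram_size: int = 1,
    attr: Callable[[str], str] = str.lower,
) -> Counter:
    """Hypothetical: count all n-grams of length 1..ngram_size, where each
    token is first mapped through `attr`. ngram_size=2 yields unigrams
    and bigrams."""
    values = [attr(t) for t in tokens]
    counts: Counter = Counter()
    for n in range(1, ngram_size + 1):
        for i in range(len(values) - n + 1):
            counts[tuple(values[i:i + n])] += 1
    return counts
```

With the default lowercasing, "The" and "the" collapse into the same unigram. That choice of attribute is the control the commit describes.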
Ines Montani
06bf130890
💫 Add better and serializable sentencizer ( #3471 )
...
* Add better serializable sentencizer component
* Replace default factory
* Add tests
* Tidy up
* Pass test
* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani
b532386a60
Fix typo [ci skip]
2019-03-22 18:36:17 +01:00
Ines Montani
5073ce63fd
Merge branch 'spacy.io' [ci skip]
2019-03-22 15:17:11 +01:00
Ines Montani
0712efc6b3
Update version requirements [ci skip]
2019-03-21 10:23:54 +01:00
Ines Montani
dac8f8ff99
Update Span.__init__ docs (see #3445 ) [ci skip]
2019-03-20 17:24:17 +01:00
Ines Montani
d4eed4a84f
Add note on unicode build to troubleshooting guide (see #3421 ) [ci skip]
2019-03-19 10:27:02 +01:00
Ines Montani
08284f3a11
💫 v2.1.0 launch updates (only merge on launch!) ( #3414 )
...
* Update README.md
* Use production docsearch [ci skip]
* Add option to exclude pages from search
2019-03-18 16:07:26 +01:00
Ines Montani
a611b32fbf
Update model docs [ci skip]
2019-03-17 11:48:18 +01:00
Matthew Honnibal
62afa64a8d
Expose batch size and length caps on CLI for pretrain ( #3417 )
...
Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`.
Also improve CLI output.
Closes #3216
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
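The length caps and the batch size interact in a simple way: documents outside the length bounds are skipped before batching. Below is a minimal sketch using a hypothetical `length_filtered_batches` helper. Treating `min_length` as inclusive and `max_length` as exclusive is an assumption; the commit does not say how the bounds behave.

```python
from typing import Iterable, Iterator, List

Doc = List[str]  # a document as a list of words, for this sketch only


def length_filtered_batches(
    docs: Iterable[Doc], batch_size: int, min_length: int, max_length: int
) -> Iterator[List[Doc]]:
    """Hypothetical: drop docs whose length is outside [min_length, max_length)
    and yield the rest in batches of `batch_size` (the last may be smaller)."""
    batch: List[Doc] = []
    for doc in docs:
        if not (min_length <= len(doc) < max_length):
            continue  # outside the length caps: skip
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```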
Ines Montani
2c5dd4d602
Update Vectors.find docs [ci skip]
2019-03-16 17:10:57 +01:00
Ines Montani
cbcba699dd
Fix missing ids
2019-03-14 17:56:53 +01:00
Ines Montani
4cfe4aa224
Fix small issues in the docs [ci skip]
2019-03-12 22:57:15 +01:00
Ines Montani
ba7eb2d131
Update section [ci skip]
2019-03-12 16:18:34 +01:00
Ines Montani
cecc31b765
Don't auto-slugify accordion links [ci skip]
2019-03-12 15:30:49 +01:00
Ines Montani
72fb324d95
Add vector training script to bin [ci skip]
2019-03-12 12:07:56 +01:00
Ines Montani
3abf0e6b9f
Replace dev-resources links with real examples
2019-03-12 12:07:40 +01:00
Ines Montani
59c0620487
Auto-format
2019-03-12 12:07:11 +01:00
Ines Montani
cdd418b93e
Auto-format [ci skip]
2019-03-11 17:10:50 +01:00
Matthew Honnibal
b0b990e405
Fix token.conjuncts ( closes #795 ) ( #3392 )
...
* Implement conjuncts method
* Add span.conjuncts property
* Un-xfail token.conjuncts tests
* Update docs for token.conjuncts and span.conjuncts
* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani
25cb764e64
Document new API [ci skip]
2019-03-11 15:23:53 +01:00
Ines Montani
ebcf2bb1c3
Add Doc.lang and Doc.lang_
2019-03-11 14:21:40 +01:00
Ines Montani
7c05ca01e8
💫 Support mutable default values for extension attributes ( #3389 )
...
* Support mutable default values in extensions
* Update documentation
2019-03-11 12:50:44 +01:00
Matthew Honnibal
98acf5ffe4
💫 Allow passing of config parameters to specific pipeline components ( #3386 )
...
* Add component_cfg kwarg to begin_training
* Document component_cfg arg to begin_training
* Update docs and auto-format
* Support component_cfg across Language
* Format
* Update docs and docstrings [ci skip]
* Fix begin_training
2019-03-10 23:36:47 +01:00
Ines Montani
8dbf1e9037
Also fix #3387 on develop
2019-03-10 23:36:28 +01:00
Ines Montani
7ba3a5d95c
💫 Make serialization methods consistent ( #3385 )
...
* Make serialization methods consistent
Use an `exclude` keyword argument instead of arbitrary named keyword arguments, plus deprecation handling
* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
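The pattern behind a single `exclude` argument is that every serializable field has a name, and callers pass the names of the fields to skip. Below is a minimal sketch, assuming a hypothetical `Component` with two JSON-serializable fields. It is not spaCy's implementation and includes no deprecation handling.

```python
import json
from typing import Iterable, List, Optional


class Component:
    """Hypothetical component whose fields are serialized by name."""

    def __init__(self, cfg: Optional[dict] = None, labels: Optional[List[str]] = None):
        self.cfg = cfg or {}
        self.labels = labels or []

    def to_bytes(self, exclude: Iterable[str] = ()) -> bytes:
        exclude = set(exclude)
        fields = {"cfg": self.cfg, "labels": self.labels}
        # Only fields not named in `exclude` are written
        data = {k: v for k, v in fields.items() if k not in exclude}
        return json.dumps(data).encode("utf8")

    def from_bytes(self, bytes_data: bytes, exclude: Iterable[str] = ()) -> "Component":
        exclude = set(exclude)
        data = json.loads(bytes_data.decode("utf8"))
        for key in ("cfg", "labels"):
            # Excluded or missing fields keep their current values
            if key in data and key not in exclude:
                setattr(self, key, data[key])
        return self
```

The same `exclude` names work for both directions, so callers don't need a different keyword for each field.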
Ines Montani
9a8f169e5c
Update v2-1.md
2019-03-10 18:58:51 +01:00
Ines Montani
0426689db8
💫 Improve Doc.to_json and add Doc.is_nered ( #3381 )
...
* Use default return instead of else
* Add Doc.is_nered to indicate if entities have been set
* Add properties in Doc.to_json if they were set, not if they're available
This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.
2019-03-10 15:24:34 +01:00
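This commit distinguishes between annotation that was never set and annotation that was set but is empty. A small sketch shows the difference. `doc_to_json` and its `ents` argument are hypothetical: `None` stands in for `Doc.is_nered` being false, and entities are given as character offsets.

```python
from typing import List, Optional, Tuple

Ent = Tuple[int, int, str]  # (start_char, end_char, label), for this sketch only


def doc_to_json(text: str, ents: Optional[List[Ent]] = None) -> dict:
    """Hypothetical: include "ents" only if entity annotation was set.

    ents=None -> NER never ran, so the key is omitted.
    ents=[]   -> NER ran and found nothing, so "ents" is an empty list."""
    data = {"text": text}
    is_nered = ents is not None  # stand-in for Doc.is_nered
    if is_nered:
        data["ents"] = [{"start": s, "end": e, "label": l} for s, e, l in ents]
    return data
```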
Ines Montani
76764fcf59
💫 Improve converters and training data file formats ( #3374 )
...
* Populate converter argument info automatically
* Add conversion option for msgpack
* Update docs
* Allow reading training data from JSONL
2019-03-08 23:15:23 +01:00
Ines Montani
296446a1c8
Tidy up and improve docs and docstrings ( #3370 )
...
## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs
### Types of change
enhancement, docs
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani
fa7314b221
Clarify train_path and dev_path format (see #3366 ) [ci skip]
2019-03-07 12:23:27 +01:00
Ines Montani
e9babd9973
Update hyperparameters section (see #3352 )
2019-03-06 14:40:30 +01:00
Ines Montani
48a206a95f
Fix displaCy visualizations in docs ( closes #3357 ) [ci skip]
2019-03-06 13:20:44 +01:00
Ines Montani
5eadf61327
Update pretraining docs on file format ( closes #3354 )
2019-03-04 16:30:13 +00:00
Ines Montani
1d4ba7678f
Auto-format [ci skip]
2019-02-27 12:07:35 +01:00
Matthew Honnibal
f1d77eb140
💫 Improve handling of missing NER tags ( closes #2603 ) ( #3341 )
...
* Improve handling of missing NER tags
GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.
Fix bug that occurred when first tag was a None value. Closes #2603 .
* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
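Here is a minimal decoding sketch showing how `None` can mark a missing tag in a BILUO sequence, including in the first position. `biluo_to_spans` is hypothetical and is not spaCy's `GoldParse` logic. Dropping an entity that is interrupted by a missing tag is an assumption.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, label) with an exclusive end


def biluo_to_spans(tags: List[Optional[str]]) -> Tuple[List[Span], List[int]]:
    """Hypothetical: decode BILUO tags into entity spans, treating None as a
    missing (unknown) tag rather than as "O". Returns the spans and the
    indices of the missing tags."""
    spans: List[Span] = []
    missing: List[int] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag is None:
            missing.append(i)
            start = None  # assumption: an open entity cannot span a missing tag
            continue
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            spans.append((i, i + 1, label))
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        # "I" simply continues an open entity
    return spans, missing
```

A leading `None` needs no special case here. It is recorded as missing, and decoding continues normally.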
Ines Montani
c478a2ccb6
Update backwards incompat [ci skip]
2019-02-27 11:56:56 +01:00
Matthew Honnibal
4a3371acd5
Make doc[0].is_sent_start == True ( closes #2869 ) ( #3340 )
...
* Make doc[0] have sent_start True. Closes #2869
* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani
1b6238101a
Add table explaining training metrics [ closes #2644 ]
2019-02-25 10:03:43 +01:00
Ines Montani
d0b3af9222
Fix remaining inaccuracies in API docs ( closes #2329 )
2019-02-24 22:21:25 +01:00
Ines Montani
62b558ab72
💫 Support lexical attributes in retokenizer attrs ( closes #2390 ) ( #3325 )
...
* Fix formatting and whitespace
* Add support for lexical attributes (closes #2390 )
* Document lexical attribute setting during retokenization
* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani
aa52305461
Improve pipeline model and meta example [ci skip]
2019-02-24 18:45:39 +01:00
Ines Montani
df19e2bff6
💫 Allow setting of custom attributes during retokenization ( closes #3314 ) ( #3324 )
...
## Description
This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter *and* a setter implemented.
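```python
import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")
# Registered with a default, so the attribute is writable
Token.set_extension("is_musician", default=False)
doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    # Set a token attribute and a custom attribute on the merged token
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)
assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```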
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani
403b9cd58b
Add docs on adding to existing tokenizer rules [ci skip]
2019-02-24 18:35:19 +01:00
Ines Montani
1ea1bc98e7
Document regex utilities [ci skip]
2019-02-24 18:34:10 +01:00
Ines Montani
46ec5cdccc
Update TextCategorizer docs
2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63
Improve built-in component API docs
2019-02-24 13:11:49 +01:00
Ines Montani
383e2e1f12
Update Python versions [ci skip]
2019-02-24 11:49:45 +01:00
Ines Montani
b624cb4b89
Update v2-1.md
2019-02-24 11:49:27 +01:00
Ines Montani
250e88ef55
Fix docs example (see #2728 )
2019-02-21 14:22:06 +01:00
Ines Montani
0fc908d7a5
Add note on merging speed in v2.1 (see #3300 ) [ci skip]
2019-02-21 12:34:18 +01:00
Ines Montani
236aa94ded
Update v2-1.md
2019-02-21 12:33:56 +01:00
Sofie
9a478b6db8
Clean up of char classes, few tokenizer fixes and faster default French tokenizer ( #3293 )
...
* splitting up latin unicode interval
* removing hyphen as infix for French
* adding failing test for issue 1235
* test for issue #3002 which now works
* partial fix for issue #2070
* keep the hyphen as infix for French (as it was)
* restore french expressions with hyphen as infix (as it was)
* added succeeding unit test for Issue #2656
* Fix issue #2822 with custom Italian exception
* Fix issue #2926 by allowing numbers right before infix /
* remove duplicate
* remove xfail for Issue #2179 fixed by Matt
* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani
57ae71ea95
Add docs on serializing the pipeline (see #3289 ) [ci skip]
2019-02-18 14:13:29 +01:00
Ines Montani
38e4422c0d
Improve matcher example ( resolves #3287 )
2019-02-18 13:26:37 +01:00
Ines Montani
660cfe44c5
Fix formatting
2019-02-18 13:26:22 +01:00
Ines Montani
212ff359ef
Fix links [ci skip]
2019-02-17 22:25:50 +01:00
Ines Montani
04b4df0ec9
Remove n_threads
2019-02-17 22:25:42 +01:00
Ines Montani
e597110d31
💫 Update website ( #3285 )
...
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org ) with [Remark](https://github.com/remarkjs/remark ) and [MDX](https://mdxjs.com/ ). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/ ) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com ) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270 . Resolves #3222 . Resolves #2947 . Resolves #2837 .
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
ines
808f7ee417
Update API documentation
2017-10-03 14:27:22 +02:00
ines
3f4fd2c5d5
Update usage documentation
2017-10-03 14:26:20 +02:00
Reza Gharibi
0461b82158
Fix typos
2017-09-27 03:56:20 +03:30
Reza Gharibi
fa1844b132
Fix typo
2017-09-27 03:55:54 +03:30
Reza Gharibi
b5dd7e7cc4
Fix typo
2017-09-27 03:55:28 +03:30
Ines Montani
b8e81daccf
Fix typo ( closes #1312 )
2017-09-14 12:49:59 +02:00
ines
d15775c3ad
Fix typos and commands in alpha docs
2017-08-21 13:40:11 +02:00
ines
3c33003078
Port over typo corrections from #1245
2017-08-20 12:00:17 +02:00
ines
1261b01e46
Update Doc.char_span docs
2017-08-19 16:34:32 +02:00
ines
5cb0200e63
Document new Span.to_array() method
2017-08-19 12:45:28 +02:00
ines
471eed4126
Add example to Span.merge()
2017-08-19 12:45:16 +02:00
ines
404d3067b8
Document new Doc.char_span() method
2017-08-19 12:45:00 +02:00
ines
d53cbf369f
Document as_tuples kwarg on Language.pipe()
2017-08-19 12:44:50 +02:00
ines
6a37c93311
Update argument type
2017-08-19 12:44:33 +02:00
ines
4731d50220
Add break utility for long nowrap items (e.g. code)
2017-08-19 12:44:23 +02:00
ines
0aba11b64b
Update package command docs
2017-08-14 16:45:44 +02:00
ines
a29f132ffd
Change python -m spacy to spacy
...
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
Nikolai Kruglikov
08e443e083
Fix small typo in documentation
2017-08-14 12:19:04 +02:00
ines
ab8ffbaab7
Add text classification to v2 overview
2017-07-22 17:56:51 +02:00
ines
f085b88f9d
Add TextCategorizer API docs stub
2017-07-22 17:56:33 +02:00
ines
ab1a4e8b3c
Add Tensorizer API docs stub
2017-07-22 17:56:25 +02:00
ines
0fb89dd204
Add text classification usage guide template
2017-07-22 17:56:07 +02:00
ines
d05ab1b3a0
Add text classification to 101 overview and change order
2017-07-22 17:55:53 +02:00
ines
d2a7e5b8e5
Add GoldParse.cats attribute
2017-07-22 17:55:35 +02:00
ines
23d976ed00
Add Doc.cats attribute and missing v2 tag
2017-07-22 17:55:14 +02:00
Ines Montani
1ddbeddca2
Fix typo
2017-07-22 15:00:58 +02:00
Jarle Mathiesen
f20533ec0c
fix small typo
2017-06-24 12:31:33 +02:00
Savva Kolbachev
800a8faff4
Changed the capital of Lithuania to Vilnius
...
Hi,
There is a typo about the capital of Lithuania.
Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius
Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana
2017-06-12 23:27:00 +03:00
Ines Montani
57f64b9e1c
Merge pull request #1124 from v3t3a/patch-3
...
docs - Fix url error for Displacy Ent visualizer
2017-06-12 21:20:32 +02:00
Ines Montani
b2a28028cf
Merge pull request #1115 from v3t3a/patch-2
...
docs - Add read() method when opening file (Lightning tour)
2017-06-12 21:19:25 +02:00
Ines Montani
fe8d136ae0
Merge pull request #1114 from v3t3a/patch-1
...
docs - Update doc.jade (Just remove a duplicate 'doc =')
2017-06-12 21:19:02 +02:00
Vetea
eae1f7b19c
Fix url error for Displacy Ent visualizer
2017-06-12 14:30:02 +02:00
ines
49026a1346
Fix typos in example (see #1105 )
2017-06-08 19:15:50 +02:00
Vetea
cc3aee1189
Add read() method when opening file
...
Add read() method to avoid:
```TypeError: Argument 'string' has incorrect type (expected str, got _io.TextIOWrapper)```
Test with:
spaCy : v2.0.0 Alpha
python : 3.5.2+ (default, Sep 22 2016, 12:18:14)
2017-06-08 11:27:09 +02:00
Vetea
8e20cf6368
Update doc.jade
...
Just remove a duplicate 'doc ='
2017-06-08 10:35:58 +02:00
ines
6b799bac54
Fix formatting and details
2017-06-06 14:37:49 +02:00
ines
fd9ae0f0e0
Update v2 comparison table
2017-06-05 16:39:11 +02:00
ines
a3f9745a14
Update similarity usage guide and examples
2017-06-05 15:37:33 +02:00
ines
fd35d910b8
Update v2 docs and benchmarks
2017-06-05 14:13:38 +02:00
ines
9f55c0d4f6
Add Vectors class
2017-06-05 13:33:11 +02:00
ines
040553ca59
Update architecture and features table
2017-06-05 13:33:01 +02:00
ines
e204788c30
Add docs for util.load_model_from_path
2017-06-05 13:18:22 +02:00
ines
efc37ea3de
Update train CLI
2017-06-04 23:45:14 +02:00
ines
505d43b832
Update norms example
2017-06-04 23:33:26 +02:00
ines
f8e93b6d0a
Update norms example
2017-06-04 23:24:29 +02:00
ines
a857b2b511
Update norms example
2017-06-04 23:21:37 +02:00
ines
47d066b293
Add under construction
2017-06-04 23:17:54 +02:00
ines
e9816daa6a
Add details on syntax iterators
2017-06-04 23:16:33 +02:00
ines
990cb81556
Add info on syntax iterators
2017-06-04 21:47:22 +02:00
ines
e4eb33daf7
Add links to production use guide
2017-06-04 20:56:58 +02:00
ines
63cd539d04
Add more details on model packages and requirements.txt (see #1099 )
2017-06-04 20:52:10 +02:00
ines
97ff83d163
Fix docs on model loading
2017-06-04 20:44:59 +02:00
ines
b6002db797
Add v2 label
2017-06-04 18:53:03 +02:00
ines
468ff1a7dd
Update v2 docs and add benchmarks stub
2017-06-04 15:34:28 +02:00
Matthew Honnibal
23fd6b1782
Add intro narrative for v2
2017-06-04 15:10:37 +02:00
ines
3419ecbfdd
Update docs on model shortcut links
2017-06-04 13:55:00 +02:00
ines
586e901143
Add v2 intro stub
2017-06-04 13:42:37 +02:00
ines
4f8f62d9b3
Merge branch 'v2-docs-edits' into develop
2017-06-04 13:40:58 +02:00
ines
809903dcad
Fix link and update wording
2017-06-04 13:29:20 +02:00
ines
22dd18c364
Remove redundant CPU commands
2017-06-04 13:29:13 +02:00
ines
1d6377218a
Update architecture blurb and move other info
2017-06-04 13:28:58 +02:00
ines
7a66c9f039
Fix formatting
2017-06-04 13:14:00 +02:00
Matthew Honnibal
f2c4a9f690
Edits to spacy-101 page
2017-06-04 13:10:27 +02:00
Matthew Honnibal
aca53b95e1
Link architecture blurb
2017-06-04 13:10:06 +02:00
Matthew Honnibal
64ca5123bb
Add Architecture 101 blurb
2017-06-04 13:09:19 +02:00
Matthew Honnibal
e77ed953f4
Update GPU instructions
2017-06-04 12:03:22 +02:00
ines
1d3b012e56
Update adding languages docs and add 101
2017-06-03 23:54:23 +02:00
ines
a3715a81d5
Update adding languages guide
2017-06-03 22:16:38 +02:00
ines
ec6d2bc81d
Add table of contents mixin
2017-06-03 22:16:26 +02:00
ines
9acf8686f7
Update note on compact mode issues
2017-06-03 13:31:16 +02:00
ines
b0225183c2
Update displaCy defaults
2017-06-03 13:27:06 +02:00
ines
c60431357d
Port over docs typo corrections
2017-06-03 11:31:30 +02:00
ines
c6dc2fafc0
Add Spanish and move example sentences to meta
2017-06-01 17:49:56 +02:00
ines
1bebc6392c
Add source files to pipeline components
2017-06-01 17:38:06 +02:00
ines
b577ed79ee
Move social image logic out to function and move files
2017-06-01 14:27:44 +02:00
ines
5e60b09dcd
Fix custom tokenizer example
2017-06-01 13:02:50 +02:00
ines
706cec6d58
Move annotation specs up
2017-06-01 13:02:43 +02:00
ines
8274dffad6
Update NER training draft
2017-06-01 12:51:36 +02:00
ines
04fac3f52a
Add NER training example code
2017-06-01 12:47:47 +02:00
ines
7f5e7e7320
Fix typo
2017-06-01 12:47:36 +02:00
ines
4a927154d8
Update v2 docs
2017-06-01 11:56:32 +02:00
ines
03bbb96db8
Remove outdated examples
2017-06-01 11:56:02 +02:00
ines
789e69b73f
Update training guide
2017-06-01 11:53:23 +02:00
ines
2f40d6e7e7
Add training 101
2017-06-01 11:53:16 +02:00
ines
abed463bbb
Update serialization 101
2017-06-01 11:52:58 +02:00
ines
72380c952a
Update training section in NER guide and add links
2017-06-01 11:52:49 +02:00
ines
77dca25c7f
Update Language API docs
2017-06-01 11:51:31 +02:00
ines
22b1f72870
Add spaCy 101 intro
2017-05-31 12:44:09 +02:00
ines
a18b95ca12
Update docs on testing
2017-05-31 12:43:40 +02:00
ines
981196c181
Fix typo
2017-05-31 11:34:31 +02:00
ines
f86289566a
Update new in v2 section and add note on Matcher acceptors
2017-05-30 13:53:06 +02:00
ines
ce4e45d0bb
Update 101 intro
2017-05-29 22:15:06 +02:00
ines
b5bfab8699
Add description
2017-05-29 15:27:16 +02:00
ines
687ed28340
Update processing pipelines guide
2017-05-29 14:21:00 +02:00
ines
d5992f408f
Update note on vocab consistency
2017-05-29 14:14:26 +02:00
ines
567485a818
Fix and document model loading with pipeline and overrides
2017-05-29 14:10:10 +02:00
ines
a2134951f2
Update 101 and add note on pipeline order and tensors
2017-05-29 11:45:32 +02:00
ines
17b635eaab
Update alpha docs note and fix typo
2017-05-29 11:09:24 +02:00
ines
fbe105f1eb
Add note on L in long integers in Python 2
2017-05-29 11:05:05 +02:00
ines
9d74810f6f
Update examples
2017-05-29 01:09:52 +02:00
ines
42cf414138
Update Matcher example
2017-05-29 01:09:52 +02:00
ines
00b2094dc3
Fix typos, long integers and tests
2017-05-29 01:09:52 +02:00
ines
d71c6db76e
Add missing Chainer install for GPU if building spaCy from source
2017-05-28 23:34:59 +02:00
ines
e0f9ccdaa3
Update texts and rename vectorizer to tensorizer
2017-05-28 23:26:13 +02:00
ines
606879b217
Update hash strings examples
2017-05-28 19:42:44 +02:00
ines
c7b57ea314
Update docs and change integer IDs to hash values
2017-05-28 19:25:34 +02:00
ines
738b4f7187
Add quickstart options and docs for GPU
2017-05-28 19:20:11 +02:00
ines
4c00cb8c8b
Update 101 and add community/FAQ and table of contents
2017-05-28 18:45:49 +02:00
ines
0ea31d1e31
Add under construction note to pipeline components
2017-05-28 18:44:07 +02:00
ines
8a148b6563
Fix code, links and formatting
2017-05-28 18:29:16 +02:00
ines
414193e9ba
Update docs to reflect StringStore changes
2017-05-28 18:19:11 +02:00
ines
69bda9aed7
Update text, examples, typos, wording and formatting
2017-05-28 16:41:01 +02:00
ines
f8185b8e11
Rename vocab-stringstore to vocab
2017-05-28 16:37:14 +02:00
ines
10d05c2b92
Fix typos, wording and formatting
2017-05-28 01:30:12 +02:00
ines
eb5a8be9ad
Update language overview and add section on 'xx' lang class
2017-05-28 01:15:44 +02:00
ines
eb703f7656
Update API docs
2017-05-28 00:32:43 +02:00
ines
c1983621fb
Update util functions for model loading
2017-05-28 00:22:40 +02:00
ines
db116cbeda
Update tokenization 101 and add illustration
2017-05-28 00:22:40 +02:00
ines
b03fb2d7b0
Update 101 and usage docs
2017-05-28 00:22:40 +02:00
ines
ae11c8d60f
Add emoji sentiment to lightning tour matcher example
2017-05-27 20:02:20 +02:00
ines
22bf5f63bf
Update Matcher docs and add social media analysis example
2017-05-27 17:58:18 +02:00
ines
0d33ead507
Fix initialisation of Doc in lightning tour example
2017-05-27 17:58:06 +02:00
ines
e05bcd6aa8
Update docs to reflect flattened model meta.json
Don't use "setup" key and instead, keep "lang" on root level and add
"pipeline".
2017-05-27 17:57:46 +02:00
ines
70afcfec3e
Update defaults and example
2017-05-26 14:04:31 +02:00
ines
1b982f0838
Update train command and add docs on hyperparameters
2017-05-26 14:02:38 +02:00
ines
1b9c6ded71
Update API docs and add "source" button to GH source
2017-05-26 13:40:32 +02:00
ines
93ee5c4a52
Update serialization info
2017-05-26 13:22:45 +02:00
ines
f122d82f29
Update usage docs and add "under construction"
2017-05-26 13:17:48 +02:00
ines
286c3d0719
Update usage and 101 docs
2017-05-26 12:46:29 +02:00
ines
6d76c1ea16
Add 101 for Vocab, Lexeme and StringStore
2017-05-26 12:45:01 +02:00
ines
d48530835a
Update API docs and fix typos
2017-05-26 12:43:16 +02:00
ines
ea9474f71c
Add version tag mixin to label new features
2017-05-26 12:42:36 +02:00
ines
353f0ef8d7
Use disable argument (list) for serialization
2017-05-26 12:33:54 +02:00
ines
9063654a1a
Add Training 101 stub
2017-05-25 11:18:02 +02:00
ines
b2324be3e9
Fix typos, text, examples and formatting
2017-05-25 11:17:21 +02:00
ines
dcb10da615
Update and fix lightning tour examples
2017-05-25 11:15:56 +02:00
ines
4b5540cc63
Rewrite examples in lightning tour
2017-05-25 01:58:33 +02:00
ines
87c976e04c
Update model tag
2017-05-25 01:58:22 +02:00
ines
fe2b0b8b8d
Update migrating docs
2017-05-25 00:56:35 +02:00
ines
709ea58990
Tidy up workflows
2017-05-25 00:56:16 +02:00
ines
d122bbc908
Rewrite custom tokenizer docs
2017-05-25 00:30:21 +02:00
ines
0f48fb1f97
Rename processing text to production use and remove linear feature scheme
2017-05-25 00:10:33 +02:00
ines
419d265ff0
Add section on disabling pipeline components
2017-05-25 00:10:06 +02:00
ines
9efa662345
Update dependency parse docs and add note on disabling parser
2017-05-25 00:09:51 +02:00
ines
9337866dae
Add aside to pipeline 101 table
2017-05-24 22:46:18 +02:00
ines
c25f3133ca
Update section on new v2.0 features
2017-05-24 20:54:37 +02:00
ines
f4658ff053
Rewrite usage workflow on saving and loading
2017-05-24 20:54:02 +02:00
ines
764bfa3239
Add section on using displaCy in a web app
2017-05-24 20:53:43 +02:00
ines
4f396236f6
Update saving and loading docs
2017-05-24 19:25:49 +02:00
ines
8aaed8bea7
Add pipelines 101 and rewrite pipelines workflow
2017-05-24 19:25:13 +02:00
ines
54885b5e88
Add serialization 101
2017-05-24 19:24:40 +02:00
ines
8b86b08bed
Update usage workflows
2017-05-24 11:59:08 +02:00
ines
66088851dc
Add Doc.to_disk() and Doc.from_disk() methods
2017-05-24 11:58:17 +02:00
ines
10afb3c796
Tidy up and merge usage pages
2017-05-24 00:37:47 +02:00
ines
990a70732a
Move installation troubleshooting to installation docs
2017-05-24 00:37:21 +02:00
ines
697d3d7cb3
Fix links to CLI docs
2017-05-24 00:36:38 +02:00
ines
4fb5fb7218
Update v2 docs
2017-05-23 23:40:04 +02:00
ines
e6d88dfe08
Add features table to 101
2017-05-23 23:38:33 +02:00
ines
7ef7f0b42c
Add linguistic annotations 101 content
2017-05-23 23:37:51 +02:00
ines
9ed6b48a49
Update dependency parse workflow
2017-05-23 23:34:39 +02:00
ines
fe24267948
Update usage docs meta and navigation
2017-05-23 23:19:20 +02:00
ines
af348025ec
Update word vectors & similarity workflow
2017-05-23 23:19:09 +02:00
ines
b6c62baab3
Update What's new in v2 docs
2017-05-23 23:18:53 +02:00
ines
b6209e2427
Update POS tagging workflow
2017-05-23 23:18:08 +02:00
ines
43258d6b0a
Update NER workflow
2017-05-23 23:17:57 +02:00
ines
61cf2bba55
Fix code example
2017-05-23 23:17:37 +02:00
ines
1c06ef3542
Update spaCy architecture
2017-05-23 23:17:25 +02:00
ines
a433e5012a
Update adding languages docs
2017-05-23 23:16:44 +02:00
ines
3523715d52
Add spaCy 101 components
2017-05-23 23:16:31 +02:00
ines
a38393e2f6
Update annotation docs
2017-05-23 23:16:17 +02:00
ines
786af87ffb
Update IOB docs
2017-05-23 23:15:50 +02:00
ines
3aff883434
Add displaCy examples to lightning tour
2017-05-23 23:15:39 +02:00
ines
6ef09d7ed8
Change save_to_directory to to_disk
2017-05-23 23:15:31 +02:00
ines
c8bde2161c
Add kwargs to spacy.load
2017-05-23 23:14:02 +02:00
ines
0a8a2d2f6d
Remove tip infoboxes from annotation docs
2017-05-23 23:13:51 +02:00
ines
e6acd3bbf2
Fix matcher tests and matcher docs
2017-05-23 11:36:02 +02:00
ines
f497cf60b2
Update formatting
2017-05-23 11:32:25 +02:00
ines
4cd26bcb83
Update docs on rule-based matching and add examples
2017-05-22 19:04:02 +02:00
ines
701cba1524
Update models documentation with notes
2017-05-22 18:53:14 +02:00
ines
a23f487b06
Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
ines
aa9c3bd464
Fix formatting
2017-05-22 13:55:01 +02:00
ines
dddad5bf26
Update util.prints docs
2017-05-22 13:54:52 +02:00
ines
d5a6a9a6a9
Use string values for attrs in Matcher docs
2017-05-22 13:54:45 +02:00
ines
54f04a9fe0
Update API docs with changes in spacy.gold and spacy.language
2017-05-22 12:29:30 +02:00
ines
fc3ec733ea
Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
ines
cc569a348d
Add quickstart widget to models and update docs
Add global variable for models and generate all model listings
programmatically
2017-05-21 20:55:52 +02:00
ines
2c5cfe8bbf
Update docstrings and API docs for StringStore
2017-05-21 14:18:58 +02:00
ines
251346b59f
Fix typos and formatting
2017-05-21 14:18:46 +02:00
ines
075f5ff87a
Update docstrings and API docs for GoldParse
2017-05-21 13:53:46 +02:00
ines
465a1dd710
Add BILUO scheme to annotation docs
2017-05-21 13:53:34 +02:00
ines
c9f04f3cd0
Add note on automated processes to download command
2017-05-21 13:23:39 +02:00
ines
8ab59515b2
Fix typo and use consistent description for from_bytes
2017-05-21 13:18:39 +02:00
ines
c5a653fa48
Update docstrings and API docs for Tokenizer
2017-05-21 13:18:14 +02:00
ines
d82ae9a585
Change "function" to "callable" in docs
2017-05-21 13:17:40 +02:00
ines
ee3fdffffb
Move attributes and remove deprecated methods
2017-05-21 01:18:31 +02:00
ines
1cb2c86f9a
Update CLI docs
2017-05-21 01:13:05 +02:00
ines
272a8981c3
Add model tag to spacy.load API docs
2017-05-21 01:12:43 +02:00
ines
3871157d84
Update spacy.util documentation
2017-05-21 01:12:09 +02:00
ines
da12aee0c1
Update spacy.load with note on get_lang_class
2017-05-21 00:19:26 +02:00
ines
924e8506de
Move Defaults subclass to module scope (necessary for pickling)
2017-05-20 19:02:27 +02:00
ines
27de0834b2
Update docstrings and API docs for Lexeme
2017-05-20 15:13:42 +02:00
ines
7ed8a92ed1
Update docstrings and API docs for Token
2017-05-20 15:13:33 +02:00
ines
4ed6a36622
Update docstrings and API docs for Matcher
2017-05-20 14:43:10 +02:00
ines
39f36539f6
Update docstrings and API docs for Matcher
2017-05-20 14:32:34 +02:00
ines
c00ff257be
Update docstrings and API docs for Matcher
2017-05-20 14:26:10 +02:00
ines
463e3cc80f
Remove resize_vectors and vectors_length
2017-05-20 14:02:14 +02:00
ines
b218c1964a
Update "What's new in v2.0" docs
2017-05-20 14:00:41 +02:00
ines
f0cc642bb9
Update docstrings and API docs for Vocab
2017-05-20 14:00:41 +02:00
Matthew Honnibal
a93276bb78
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-20 13:55:12 +02:00
Matthew Honnibal
ce9234f593
Update Matcher API
2017-05-20 13:54:53 +02:00
ines
8b14476253
Fix typo
2017-05-20 13:00:13 +02:00
ines
6557ff9e85
Update example
2017-05-20 13:00:07 +02:00
ines
fea4925f41
Reorganise API docs navigation
2017-05-20 12:59:57 +02:00
ines
b2678372c7
Add API docs for top-level spaCy functions
i.e. spacy.load(), spacy.info(), spacy.explain()
2017-05-20 12:59:44 +02:00
ines
797f10ab16
Update formatting
2017-05-20 12:59:16 +02:00
ines
e10c48210d
Update Matcher API and workflow to reflect new API
on_match is now the second positional argument, to easily allow a
variable number of patterns while keeping the method clean and readable.
2017-05-20 12:59:03 +02:00
ines
eb521af267
Fix formatting
2017-05-20 12:58:15 +02:00
ines
7973912114
Update CLI docs
2017-05-20 12:58:05 +02:00
ines
9edc7fb0ba
Update Matcher API docs
2017-05-20 12:27:22 +02:00
ines
5163a4513e
Update API docs
2017-05-20 01:43:48 +02:00
ines
784347160d
Rewrite rule-based matching workflow
2017-05-20 01:38:55 +02:00
ines
7f9539da27
Fix old download command and formatting
2017-05-20 01:38:43 +02:00
ines
e3256e7406
Update Matcher API docs
2017-05-20 01:38:34 +02:00
ines
0cabf9e13f
Fix model tag
2017-05-20 01:38:14 +02:00
ines
fe5d8819ea
Update Matcher docstrings and API docs
2017-05-19 21:47:06 +02:00
ines
c8580da686
Update "requires model" tags
2017-05-19 20:24:46 +02:00
ines
c3e903e4c2
Update examples and API docs
2017-05-19 19:59:02 +02:00
ines
e9e62b01b0
Update docstrings and API docs for Token
2017-05-19 18:47:56 +02:00
ines
62ceec4fc6
Update docstrings and API docs for Span
2017-05-19 18:47:46 +02:00
ines
23f9a3ccc8
Update docstrings and API docs for Doc
2017-05-19 18:47:39 +02:00
ines
2c8c9dc0c9
Update docstrings and API docs for Language
2017-05-19 18:47:24 +02:00
ines
0791f0aae6
Update docstrings and API docs for Span class
2017-05-19 00:31:31 +02:00
ines
5b68579eb8
Use returns/yields instead of return/yield
2017-05-19 00:02:34 +02:00
ines
b687ad109d
Update docstrings and API docs for Doc class
2017-05-18 23:59:44 +02:00
ines
d42bc16868
Update docstrings and API docs for Language class
2017-05-18 23:57:38 +02:00
ines
b87066ff10
Update docstrings and API docs for Doc class
2017-05-18 22:17:41 +02:00
ines
476b8209fe
Update docs with new Jupyter auto-detection
2017-05-18 14:58:17 +02:00
ines
11f52b8b83
Add headline to installation details and move aside
2017-05-17 12:04:03 +02:00
ines
533bb63816
Implement quickstart widget
2017-05-17 12:04:03 +02:00
ines
9df9a87d03
Add visualizer usage example
2017-05-17 12:04:03 +02:00
ines
6364a9be9d
Add What's new and spaCy 101 stubs
2017-05-17 12:04:03 +02:00
ines
f4ae1e8750
Add section on adding titles to documents
2017-05-17 12:04:03 +02:00
ines
02a4841e7b
Move CLI docs to API reference
2017-05-17 12:04:03 +02:00
ines
accf05b0a9
Update visualizers docs
2017-05-15 14:37:01 +02:00
ines
d7244ae72d
Add docs on collapse_punct option
2017-05-15 13:51:33 +02:00
ines
6d7986b7bc
Update docs
2017-05-15 01:46:33 +02:00
ines
c6e8d55dcb
Update NER workflow with new displaCy
2017-05-15 01:42:11 +02:00
ines
860a60e251
Fix explanation
2017-05-15 01:31:11 +02:00
ines
5c044cb670
Add visualizers usage docs
2017-05-15 01:25:18 +02:00
ines
c33bdeb564
Use uppercase for entity types
2017-05-15 01:24:57 +02:00
ines
3d37564a09
Remove resources from navigation for now
Not sure what to do with this page... maybe merge it with something
else?
2017-05-14 23:29:58 +02:00
ines
cf7e5ed534
Use American spelling for "visualizers"
Kinda sucks because we normally use British spelling, but it just looks
weird and confusing otherwise... same with tokenizer and all other
library internals. So this is sort of the "official policy" for now.
2017-05-14 23:29:36 +02:00
ines
fe5a5086e1
Fix typo
2017-05-14 23:27:56 +02:00
ines
1ae07da18f
Add API docs for spacy.displacy (see #1058)
2017-05-14 19:31:23 +02:00
ines
b462076d80
Merge load_lang_class and get_lang_class
2017-05-14 01:31:10 +02:00
ines
1465c6c221
Add API docs for util functions
2017-05-13 21:23:12 +02:00
ines
144161c58c
Update links to dev resources
2017-05-13 21:23:02 +02:00
ines
0095d5322b
Update adding languages docs
2017-05-13 18:54:10 +02:00
ines
1d94c0e98a
Update table of contents
2017-05-13 15:42:51 +02:00
ines
a48e21755e
Add section on testing language tokenizers
2017-05-13 15:39:27 +02:00
ines
2f54fefb5d
Update adding languages docs
2017-05-13 14:54:58 +02:00
ines
3665acc0de
Update adding languages docs
2017-05-13 12:39:36 +02:00
ines
3454f2aca8
Update showcase
2017-05-13 03:32:03 +02:00
ines
67726d1837
Update data model docs
2017-05-13 03:10:56 +02:00
ines
915b50c736
Update adding languages docs
2017-05-13 03:10:50 +02:00
ines
19879cb693
Update alpha support docs
2017-05-12 15:57:49 +02:00
ines
63d79947c8
Update title in navigation
2017-05-12 15:40:43 +02:00
ines
531ee1373b
Rename "Language models" to "Languages" in API
2017-05-12 15:38:56 +02:00
ines
c4d2c3cac7
Update adding languages docs
2017-05-12 15:38:17 +02:00
ines
fac3566aac
Add descriptions to POS tagging scheme
2017-05-03 20:11:02 +02:00
ines
1570b83ee5
Add spacy.explain() note to NER annotation scheme
2017-05-03 20:11:02 +02:00
ines
219369bb7d
Add detailed docs for dependency label annotations
2017-05-03 20:11:02 +02:00
ines
f9384b0fbd
Update alpha languages and add aside for tokenizer dependencies
2017-05-03 09:58:31 +02:00
Yasuaki Uechi
0e7a9b9fac
Add Japanese to 'Alpha support' section
2017-05-03 13:56:45 +09:00
Ines Montani
fb96f88b59
Update info on CoNLL format and include link
2017-04-27 14:36:08 +02:00
M. Z. Ferdous (Imran)
c9f9203d5f
fix typo, CONLL format
Tried to search for the "connlu" format and found that the format is called CoNLL, not "connlu".
2017-04-27 16:48:54 +06:00
ines
5aa49971f9
Add French example to models docs
2017-04-27 12:08:47 +02:00
ines
034ec5710b
Fix typo and add Norwegian to alpha languages
2017-04-27 11:24:21 +02:00
ines
100846bed3
Fix typo in model list
2017-04-26 21:40:17 +02:00
ines
375edf0bb5
Add list of models and include French
2017-04-26 20:50:27 +02:00
ines
4eacd72bc3
Move list of models to own file
2017-04-26 20:50:27 +02:00
ines
c2006166d3
Update list of available models and info
2017-04-26 16:03:41 +02:00
ines
e6bdf5bc5c
Update adding language / training docs (see #966)
Add data examples and more info on training and CLI commands
2017-04-26 14:01:19 +02:00
ines
ae2b77db1b
Fix info on naming conventions
2017-04-26 14:01:19 +02:00
Julien Chaumond
f997bceb07
Make object of the deep learning tutorial clearer
This is a great tutorial, but I think its current framing is confusing. Most of the code implements the actual sentiment analysis model rather than counting entities (which is not even present in the `deep_learning_keras.py` script in `examples`).
2017-04-24 11:55:41 +02:00
ines
2bfec1a4f8
Add note on languages with non-Latin characters (see #996)
2017-04-23 15:58:40 +02:00
ines
ddd5194088
Update Language docs and docstrings
2017-04-17 01:52:13 +02:00
ines
2ab394d655
Fix whitespace
2017-04-17 01:45:00 +02:00
ines
7f776258f0
Add link to API docs
2017-04-17 01:41:46 +02:00
ines
aad80a291f
Add save_to_directory method to API docs
2017-04-17 01:40:34 +02:00
ines
c6c3162c50
Fix lightning tour example (closes #889)
2017-04-17 00:00:30 +02:00
ines
de5062711b
Update adding languages workflow to reflect changes in __init__.py
2017-04-16 22:26:46 +02:00
ines
e4dd645c37
Update link
2017-04-16 20:37:46 +02:00
ines
dea79224ed
Remove saving & loading docs and link to new workflow
2017-04-16 20:37:45 +02:00
ines
c365795bf6
Update navigation
2017-04-16 20:37:45 +02:00
ines
5bbbb7674b
Add training examples to tutorials
2017-04-16 20:37:45 +02:00
ines
17e9743388
Add saving & loading models docs
2017-04-16 20:37:45 +02:00
ines
b15bdb5279
Update training docs
2017-04-16 20:37:45 +02:00
ines
5cb17b9f33
Add NER training docs
2017-04-16 20:37:45 +02:00
ines
d29c825ca4
Update docs for package command
2017-04-16 13:37:24 +02:00
ines
cf558e37c3
Update adding languages docs with new commands
2017-04-13 13:52:11 +02:00
Sohil
328678c7e9
Extra brace ")" creating error
There is an extra closing brace `)` that causes an error when running the example.
2017-04-13 17:12:28 +05:30
ines
1f501af602
Add file name shadowing module issue to troubleshooting guide (see #953)
2017-04-07 16:21:32 +02:00
ines
2f38c1d77f
Add documentation for new convert and model commands
2017-04-07 13:27:55 +02:00
ines
f33c4cbae1
Add --no-cache-dir error to troubleshooting docs (see #958)
2017-04-07 10:22:18 +02:00
ines
d6bbc3ffcd
Fix formatting
2017-04-07 10:22:18 +02:00
ines
2c36a61ec5
Add spacyr to libraries
2017-04-03 18:12:38 +02:00
ines
e210496f78
Update Windows compiler docs
2017-03-29 10:35:20 +02:00
ines
13df2d6a60
Add documentation for spaCy's JSON format
2017-03-26 15:56:15 +02:00
ines
5901c8f7f0
Update spacy train CLI documentation
2017-03-26 15:33:48 +02:00
ines
afd839f64b
Add pip and conda badges to installation docs
2017-03-26 14:11:31 +02:00
ines
9a481c9f42
Add "Troubleshooting" section
2017-03-26 13:42:36 +02:00
ines
d4a86b6394
Update formatting
2017-03-26 13:42:19 +02:00
ines
1dae97b2f6
Fix typos
2017-03-26 11:14:44 +02:00
ines
a5fc5fb0db
Add Hebrew to list of alpha languages
2017-03-25 10:22:46 +01:00
ines
9600cd1b9e
Fix download commands
2017-03-25 10:22:05 +01:00
ines
fa6e3cefbb
Simplify package command docs
2017-03-21 11:35:29 +01:00
ines
49bbfdaac1
Add info on CLI to docs on own models
2017-03-21 11:25:01 +01:00
ines
09b24bc5a9
Add docs for package command
2017-03-21 11:19:21 +01:00
ines
81b28ca606
Update models docs with info on retraining own models
2017-03-20 18:01:55 +01:00
ines
ef5e261387
Add spacy_api project by @kootenpv to showcase
2017-03-19 12:49:40 +01:00
ines
fa1f2040a5
Use correct code block language
2017-03-18 18:19:50 +01:00
ines
ff277140f9
Add CLI docs
2017-03-18 15:24:50 +01:00
ines
e635e1f6f4
Update docs to reflect new commands
2017-03-18 15:24:42 +01:00
ines
e9d8d756fc
Fix typo in pytest flags
2017-03-18 15:24:20 +01:00
ines
3926ffdb70
Update models docs
2017-03-17 19:26:37 +01:00
ines
76c0ea6cc6
Update models docs
2017-03-17 17:01:16 +01:00
ines
b322f31521
Update models docs
2017-03-17 16:09:56 +01:00
ines
7f25f64acc
Update lightning tour
2017-03-17 13:11:00 +01:00
ines
e461fafd14
Update example
2017-03-16 23:23:35 +01:00
ines
f4df9463f2
Fix wording
2017-03-16 22:21:46 +01:00
ines
08b0fb62cc
Update models docs
2017-03-16 22:09:43 +01:00
ines
0b5c664b04
Update resources
2017-03-16 21:59:26 +01:00
ines
807139ae61
Update installation docs and add models quickstart aside
2017-03-16 21:53:44 +01:00
ines
ec75c781b9
Add docs page for models
2017-03-16 21:53:31 +01:00
ines
4c53eed35a
Remove sputnik from dependencies and docs
2017-03-15 17:39:25 +01:00
ines
758335452d
Update installation instructions and fix formatting
2017-03-08 11:36:00 +01:00
ines
004c4c9566
Update installation docs
Include conda and virtualenv info for pip, add instructions for
downloading models manually and add details and fab commands to
"Compile from source" section.
2017-03-07 18:52:22 +01:00
yalei
27c0e6226b
Edit example code
The original code forgot to import `random` and `EntityRecognizer`.
2017-03-07 18:07:40 +08:00
ines
d25f17f139
Add Bengali to list of languages (see #865)
2017-03-01 15:59:21 +01:00
ines
2b07ab7db4
Add feature scheme to API docs (see #857, #739)
2017-02-24 18:26:32 +01:00
ines
8ddad178f6
Add book and tutorial
2017-02-24 18:26:32 +01:00
Ines Montani
49a102aff3
Merge pull request #841 from jondoughty/patch-1
Updated Token class documentation
2017-02-16 23:47:51 +01:00
Jon Doughty
12a8757343
Update token.jade
2017-02-16 10:55:33 -08:00
nycmonkey
8946a2a496
Fix typo in IOB integer to letter map
ent_iob value for an ent.iob_ value of 'B' should be 3, not B
2017-02-16 13:49:57 -05:00
John Gamboa
e31894b800
Fixes example 3 of entity recognition (see issue #832)
2017-02-16 11:19:53 +01:00
Stefan Bunk
2bf19d4735
Fix error in pipeline loading documentation
The cell for the `vocab` parameter is not displayed, making it seem as if the explanation belongs to the previous param.
2017-02-10 12:06:55 +01:00
Stefan Bunk
e972b2fa87
Fix error in matching documentation
LOWER and IS_PUNCT are members of `spacy` and not of the `Matcher` class.
2017-02-07 16:52:01 +01:00
Matthew Honnibal
9aaa2c5633
Fix entity recognition example (closes #803)
2017-02-05 11:23:12 +01:00
ines
a44da8fb34
Update language models and alpha support overview
2017-02-04 13:49:05 +01:00
Ines Montani
651bf411e0
Add tutorial
2017-01-26 13:48:38 +01:00
Ines Montani
da3aca4020
Fix formatting
2017-01-26 13:48:29 +01:00
Hidekazu Oiwa
7806ebafd2
Fix the span doc typo
Fix the typo in the span API doc.
It described the `end` of the span using the `start_char` description.
2017-01-17 20:37:14 -08:00
Kevin Gao
7ec710af0e
Fix Custom Tokenizer docs
- Fix mismatched quotations
- Make it more clear where ORTH, LEMMA, and POS symbols come from
- Make strings consistent
- Fix lemma_ assertion s/-PRON-/me/
2017-01-17 10:38:14 -08:00
Jason Kessler
9fa6f9fb40
Origin of spacy.matcher attributes
Make it clear that Matcher attributes live in spacy.matcher.attrs.
2017-01-16 13:31:35 -06:00
jktong
df0aeff379
Correct typo "chldren" in doc.jade
2017-01-16 09:34:59 -05:00
Ines Montani
57919566b8
Add Jupyter notebooks repo to resources list
2017-01-05 20:50:08 +01:00
Ines Montani
d677db6277
Change "Multi-language support" to amber for spaCy
2017-01-03 21:24:35 +01:00
Ines Montani
1b82756cc7
Tidy up and fix formatting and consistency
2017-01-02 00:29:24 +01:00
Ines Montani
e3d84572f2
Fix ents input format example
2017-01-01 12:28:37 +01:00
Guy Rosin
acdd2fc9a6
Tiny code typo
2016-12-31 14:53:05 +02:00
Ines Montani
d1585959d9
Add Hungarian to alpha support overview
2016-12-27 22:31:41 +01:00
Ines Montani
b7becaec85
Fix typo
2016-12-25 15:23:32 +01:00
Ines Montani
207555fae7
Fix spelling
2016-12-23 21:36:01 +01:00
Ines Montani
48b03b4001
Fix formatting and wording
2016-12-23 14:36:03 +01:00
Ines Montani
cc051ddc15
Add resources page to usage docs
2016-12-23 14:36:03 +01:00
Ines Montani
d1a2846750
Document DET_LEMMA
2016-12-21 18:18:35 +01:00
Ines Montani
71c00db8a5
Update language models page
2016-12-21 00:54:54 +01:00
aikramer2
349143faa2
update to training doc
2016-12-20 12:01:16 -08:00
Ines Montani
a2525c76ee
Reformat word frequencies section in "adding languages" workflow
2016-12-19 17:18:38 +01:00
Ines Montani
ddf5c5bb61
Generalise dependency parsing annotation specs beyond English (closes #657)
2016-12-19 13:42:44 +01:00
Ines Montani
6a793251c8
Add aside on spaCy's custom pronoun lemma
2016-12-19 13:41:47 +01:00
Ines Montani
d0c15730c4
Fix link
2016-12-19 13:09:45 +01:00
Ines Montani
a9c0e77b80
Fix typo
2016-12-19 13:09:45 +01:00
Ines Montani
fa65c6b54c
Add "Adding languages" workflow (closes #562)
2016-12-18 23:54:19 +01:00
Ines Montani
1cddb7da36
Add "Part-of-speech tagging" workflow (closes #581)
2016-12-18 23:54:19 +01:00
Ines Montani
ac597b58f6
Update showcase
2016-12-18 23:54:18 +01:00
Ines Montani
614ca6fb41
Split annotation specs into files so they can be included in different places
2016-12-18 17:42:10 +01:00
Ines Montani
ce8bf08223
Fix formatting
2016-12-18 17:40:20 +01:00