Ines Montani
48a206a95f
Fix displaCy visualizations in docs (closes #3357) [ci skip]
2019-03-06 13:20:44 +01:00
Ines Montani
5eadf61327
Update pretraining docs on file format (closes #3354)
2019-03-04 16:30:13 +00:00
Ines Montani
1d4ba7678f
Auto-format [ci skip]
2019-02-27 12:07:35 +01:00
Matthew Honnibal
f1d77eb140
💫 Improve handling of missing NER tags (closes #2603) (#3341)
...
* Improve handling of missing NER tags
GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.
Fix bug that occurred when first tag was a None value. Closes #2603.
* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
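To illustrate the commit above, here is a minimal pure-Python sketch of a per-token BILUO tag list in which unannotated tokens are left as `None`. It does not call spaCy; the helper name `biluo_with_missing` and its parameters are hypothetical and only show the shape of the list that the commit message says can be provided as `entities`.

```python
from typing import List, Optional, Sequence, Tuple


def biluo_with_missing(
    n_tokens: int,
    spans: Sequence[Tuple[int, int, str]],
    known_outside: Sequence[int] = (),
) -> List[Optional[str]]:
    """Build per-token BILUO tags; tokens with no annotation stay None.

    spans holds (start, end, label) token indices with an exclusive end.
    known_outside lists token indices known to be outside any entity.
    """
    tags: List[Optional[str]] = [None] * n_tokens
    for i in known_outside:
        tags[i] = "O"
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "U-" + label  # single-token entity
        else:
            tags[start] = "B-" + label  # first token
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label  # inner tokens
            tags[end - 1] = "L-" + label  # last token
    return tags


words = ["I", "like", "David", "Bowie", "."]
# Only the entity and the final period are annotated; the rest is unknown.
tags = biluo_with_missing(len(words), [(2, 4, "PERSON")], known_outside=[4])
print(tags)
```

The first two tokens remain `None`, which marks them as missing rather than as known non-entities (`"O"`).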
Ines Montani
c478a2ccb6
Update backwards incompat [ci skip]
2019-02-27 11:56:56 +01:00
Matthew Honnibal
4a3371acd5
Make doc[0].is_sent_start == True (closes #2869) (#3340)
...
* Make doc[0] have sent_start True. Closes #2869
* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani
1b6238101a
Add table explaining training metrics [closes #2644]
2019-02-25 10:03:43 +01:00
Ines Montani
d0b3af9222
Fix remaining inaccuracies in API docs (closes #2329)
2019-02-24 22:21:25 +01:00
Ines Montani
62b558ab72
💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
...
* Fix formatting and whitespace
* Add support for lexical attributes (closes #2390)
* Document lexical attribute setting during retokenization
* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani
aa52305461
Improve pipeline model and meta example [ci skip]
2019-02-24 18:45:39 +01:00
Ines Montani
df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
...
## Description
This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter *and* a setter implemented.
```python
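import spacy
from spacy.tokens import Token

# Assumption: a blank English pipeline is enough here, since only the tokenizer is needed
nlp = spacy.blank("en")
# Register a writable custom attribute with a default value
Token.set_extension('is_musician', default=False)
doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    # Custom attributes are set via the "_" key of attrs
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)
assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician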
```
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani
403b9cd58b
Add docs on adding to existing tokenizer rules [ci skip]
2019-02-24 18:35:19 +01:00
Ines Montani
1ea1bc98e7
Document regex utilities [ci skip]
2019-02-24 18:34:10 +01:00
Ines Montani
46ec5cdccc
Update TextCategorizer docs
2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63
Improve built-in component API docs
2019-02-24 13:11:49 +01:00
Ines Montani
383e2e1f12
Update Python versions [ci skip]
2019-02-24 11:49:45 +01:00
Ines Montani
b624cb4b89
Update v2-1.md
2019-02-24 11:49:27 +01:00
Ines Montani
250e88ef55
Fix docs example (see #2728)
2019-02-21 14:22:06 +01:00
Ines Montani
0fc908d7a5
Add note on merging speed in v2.1 (see #3300) [ci skip]
2019-02-21 12:34:18 +01:00
Ines Montani
236aa94ded
Update v2-1.md
2019-02-21 12:33:56 +01:00
Sofie
9a478b6db8
Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
...
* splitting up latin unicode interval
* removing hyphen as infix for French
* adding failing test for issue 1235
* test for issue #3002 which now works
* partial fix for issue #2070
* keep the hyphen as infix for French (as it was)
* restore french expressions with hyphen as infix (as it was)
* added succeeding unit test for Issue #2656
* Fix issue #2822 with custom Italian exception
* Fix issue #2926 by allowing numbers right before infix /
* remove duplicate
* remove xfail for Issue #2179 fixed by Matt
* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani
57ae71ea95
Add docs on serializing the pipeline (see #3289) [ci skip]
2019-02-18 14:13:29 +01:00
Ines Montani
38e4422c0d
Improve matcher example (resolves #3287)
2019-02-18 13:26:37 +01:00
Ines Montani
660cfe44c5
Fix formatting
2019-02-18 13:26:22 +01:00
Ines Montani
212ff359ef
Fix links [ci skip]
2019-02-17 22:25:50 +01:00
Ines Montani
04b4df0ec9
Remove n_threads
2019-02-17 22:25:42 +01:00
Ines Montani
e597110d31
💫 Update website (#3285)
...
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
ines
808f7ee417
Update API documentation
2017-10-03 14:27:22 +02:00
ines
3f4fd2c5d5
Update usage documentation
2017-10-03 14:26:20 +02:00
Reza Gharibi
0461b82158
Fix typos
2017-09-27 03:56:20 +03:30
Reza Gharibi
fa1844b132
Fix typo
2017-09-27 03:55:54 +03:30
Reza Gharibi
b5dd7e7cc4
Fix typo
2017-09-27 03:55:28 +03:30
Ines Montani
b8e81daccf
Fix typo (closes #1312)
2017-09-14 12:49:59 +02:00
ines
d15775c3ad
Fix typos and commands in alpha docs
2017-08-21 13:40:11 +02:00
ines
3c33003078
Port over typo corrections from #1245
2017-08-20 12:00:17 +02:00
ines
1261b01e46
Update Doc.char_span docs
2017-08-19 16:34:32 +02:00
ines
5cb0200e63
Document new Span.to_array() method
2017-08-19 12:45:28 +02:00
ines
471eed4126
Add example to Span.merge()
2017-08-19 12:45:16 +02:00
ines
404d3067b8
Document new Doc.char_span() method
2017-08-19 12:45:00 +02:00
ines
d53cbf369f
Document as_tuples kwarg on Language.pipe()
2017-08-19 12:44:50 +02:00
ines
6a37c93311
Update argument type
2017-08-19 12:44:33 +02:00
ines
4731d50220
Add break utility for long nowrap items (e.g. code)
2017-08-19 12:44:23 +02:00
ines
0aba11b64b
Update package command docs
2017-08-14 16:45:44 +02:00
ines
a29f132ffd
Change python -m spacy to spacy
...
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
Nikolai Kruglikov
08e443e083
Fix small typo in documentation
2017-08-14 12:19:04 +02:00
ines
ab8ffbaab7
Add text classification to v2 overview
2017-07-22 17:56:51 +02:00
ines
f085b88f9d
Add TextCategorizer API docs stub
2017-07-22 17:56:33 +02:00
ines
ab1a4e8b3c
Add Tensorizer API docs stub
2017-07-22 17:56:25 +02:00
ines
0fb89dd204
Add text classification usage guide template
2017-07-22 17:56:07 +02:00
ines
d05ab1b3a0
Add text classification to 101 overview and change order
2017-07-22 17:55:53 +02:00
ines
d2a7e5b8e5
Add GoldParse.cats attribute
2017-07-22 17:55:35 +02:00
ines
23d976ed00
Add Doc.cats attribute and missing v2 tag
2017-07-22 17:55:14 +02:00
Ines Montani
1ddbeddca2
Fix typo
2017-07-22 15:00:58 +02:00
Jarle Mathiesen
f20533ec0c
fix small typo
2017-06-24 12:31:33 +02:00
Savva Kolbachev
800a8faff4
Changed the capital of Lithuania to Vilnius
...
Hi,
There is a typo about the capital of Lithuania.
Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius
Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana
2017-06-12 23:27:00 +03:00
Ines Montani
57f64b9e1c
Merge pull request #1124 from v3t3a/patch-3
...
docs - Fix url error for Displacy Ent visualizer
2017-06-12 21:20:32 +02:00
Ines Montani
b2a28028cf
Merge pull request #1115 from v3t3a/patch-2
...
docs - Add read() method when opening file (Lightning tour)
2017-06-12 21:19:25 +02:00
Ines Montani
fe8d136ae0
Merge pull request #1114 from v3t3a/patch-1
...
docs - Update doc.jade (Just remove a duplicate 'doc =')
2017-06-12 21:19:02 +02:00
Vetea
eae1f7b19c
Fix url error for Displacy Ent visualizer
2017-06-12 14:30:02 +02:00
ines
49026a1346
Fix typos in example (see #1105)
2017-06-08 19:15:50 +02:00
Vetea
cc3aee1189
Add read() method when opening file
...
Add read() method when opening the file, to avoid:
```TypeError: Argument 'string' has incorrect type (expected str, got _io.TextIOWrapper)```
Test with:
spaCy : v2.0.0 Alpha
python : 3.5.2+ (default, Sep 22 2016, 12:18:14)
2017-06-08 11:27:09 +02:00
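The fix in the commit above can be sketched as follows: a function that expects a string fails when handed an open file object, and works once the file's contents are read first. The function `process` is a hypothetical stand-in for any callable that only accepts `str`; it is not part of spaCy.

```python
import os
import tempfile


def process(text):
    """Hypothetical stand-in for a callable that only accepts str."""
    if not isinstance(text, str):
        raise TypeError("expected str, got " + type(text).__name__)
    return text.split()


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf8"
) as tmp:
    tmp.write("Hello world")
    path = tmp.name

with open(path, encoding="utf8") as f:
    try:
        process(f)  # passing the file object itself raises TypeError
        raised = False
    except TypeError:
        raised = True
    tokens = process(f.read())  # passing the file's contents works

os.remove(path)
print(raised, tokens)
```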
Vetea
8e20cf6368
Update doc.jade
...
Just remove a duplicate 'doc ='
2017-06-08 10:35:58 +02:00
ines
6b799bac54
Fix formatting and details
2017-06-06 14:37:49 +02:00
ines
fd9ae0f0e0
Update v2 comparison table
2017-06-05 16:39:11 +02:00
ines
a3f9745a14
Update similarity usage guide and examples
2017-06-05 15:37:33 +02:00
ines
fd35d910b8
Update v2 docs and benchmarks
2017-06-05 14:13:38 +02:00
ines
9f55c0d4f6
Add Vectors class
2017-06-05 13:33:11 +02:00
ines
040553ca59
Update architecture and features table
2017-06-05 13:33:01 +02:00
ines
e204788c30
Add docs for util.load_model_from_path
2017-06-05 13:18:22 +02:00
ines
efc37ea3de
Update train CLI
2017-06-04 23:45:14 +02:00
ines
505d43b832
Update norms example
2017-06-04 23:33:26 +02:00
ines
f8e93b6d0a
Update norms example
2017-06-04 23:24:29 +02:00
ines
a857b2b511
Update norms example
2017-06-04 23:21:37 +02:00
ines
47d066b293
Add under construction
2017-06-04 23:17:54 +02:00
ines
e9816daa6a
Add details on syntax iterators
2017-06-04 23:16:33 +02:00
ines
990cb81556
Add info on syntax iterators
2017-06-04 21:47:22 +02:00
ines
e4eb33daf7
Add links to production use guide
2017-06-04 20:56:58 +02:00
ines
63cd539d04
Add more details on model packages and requirements.txt (see #1099)
2017-06-04 20:52:10 +02:00
ines
97ff83d163
Fix docs on model loading
2017-06-04 20:44:59 +02:00
ines
b6002db797
Add v2 label
2017-06-04 18:53:03 +02:00
ines
468ff1a7dd
Update v2 docs and add benchmarks stub
2017-06-04 15:34:28 +02:00
Matthew Honnibal
23fd6b1782
Add intro narrative for v2
2017-06-04 15:10:37 +02:00
ines
3419ecbfdd
Update docs on model shortcut links
2017-06-04 13:55:00 +02:00
ines
586e901143
Add v2 intro stub
2017-06-04 13:42:37 +02:00
ines
4f8f62d9b3
Merge branch 'v2-docs-edits' into develop
2017-06-04 13:40:58 +02:00
ines
809903dcad
Fix link and update wording
2017-06-04 13:29:20 +02:00
ines
22dd18c364
Remove redundant CPU commands
2017-06-04 13:29:13 +02:00
ines
1d6377218a
Update architecture blurb and move other info
2017-06-04 13:28:58 +02:00
ines
7a66c9f039
Fix formatting
2017-06-04 13:14:00 +02:00
Matthew Honnibal
f2c4a9f690
Edits to spacy-101 page
2017-06-04 13:10:27 +02:00
Matthew Honnibal
aca53b95e1
Link architecture blurb
2017-06-04 13:10:06 +02:00
Matthew Honnibal
64ca5123bb
Add Architecture 101 blurb
2017-06-04 13:09:19 +02:00
Matthew Honnibal
e77ed953f4
Update GPU instructions
2017-06-04 12:03:22 +02:00
ines
1d3b012e56
Update adding languages docs and add 101
2017-06-03 23:54:23 +02:00
ines
a3715a81d5
Update adding languages guide
2017-06-03 22:16:38 +02:00
ines
ec6d2bc81d
Add table of contents mixin
2017-06-03 22:16:26 +02:00
ines
9acf8686f7
Update note on compact mode issues
2017-06-03 13:31:16 +02:00
ines
b0225183c2
Update displaCy defaults
2017-06-03 13:27:06 +02:00
ines
c60431357d
Port over docs typo corrections
2017-06-03 11:31:30 +02:00
ines
c6dc2fafc0
Add Spanish and move example sentences to meta
2017-06-01 17:49:56 +02:00
ines
1bebc6392c
Add source files to pipeline components
2017-06-01 17:38:06 +02:00
ines
b577ed79ee
Move social image logic out to function and move files
2017-06-01 14:27:44 +02:00
ines
5e60b09dcd
Fix custom tokenizer example
2017-06-01 13:02:50 +02:00
ines
706cec6d58
Move annotation specs up
2017-06-01 13:02:43 +02:00
ines
8274dffad6
Update NER training draft
2017-06-01 12:51:36 +02:00
ines
04fac3f52a
Add NER training example code
2017-06-01 12:47:47 +02:00
ines
7f5e7e7320
Fix typo
2017-06-01 12:47:36 +02:00
ines
4a927154d8
Update v2 docs
2017-06-01 11:56:32 +02:00
ines
03bbb96db8
Remove outdated examples
2017-06-01 11:56:02 +02:00
ines
789e69b73f
Update training guide
2017-06-01 11:53:23 +02:00
ines
2f40d6e7e7
Add training 101
2017-06-01 11:53:16 +02:00
ines
abed463bbb
Update serialization 101
2017-06-01 11:52:58 +02:00
ines
72380c952a
Update training section in NER guide and add links
2017-06-01 11:52:49 +02:00
ines
77dca25c7f
Update Language API docs
2017-06-01 11:51:31 +02:00
ines
22b1f72870
Add spaCy 101 intro
2017-05-31 12:44:09 +02:00
ines
a18b95ca12
Update docs on testing
2017-05-31 12:43:40 +02:00
ines
981196c181
Fix typo
2017-05-31 11:34:31 +02:00
ines
f86289566a
Update new in v2 section and add note on Matcher acceptors
2017-05-30 13:53:06 +02:00
ines
ce4e45d0bb
Update 101 intro
2017-05-29 22:15:06 +02:00
ines
b5bfab8699
Add description
2017-05-29 15:27:16 +02:00
ines
687ed28340
Update processing pipelines guide
2017-05-29 14:21:00 +02:00
ines
d5992f408f
Update note on vocab consistency
2017-05-29 14:14:26 +02:00
ines
567485a818
Fix and document model loading with pipeline and overrides
2017-05-29 14:10:10 +02:00
ines
a2134951f2
Update 101 and add note on pipeline order and tensors
2017-05-29 11:45:32 +02:00
ines
17b635eaab
Update alpha docs note and fix typo
2017-05-29 11:09:24 +02:00
ines
fbe105f1eb
Add note on L in long integers in Python 2
2017-05-29 11:05:05 +02:00
ines
9d74810f6f
Update examples
2017-05-29 01:09:52 +02:00
ines
42cf414138
Update Matcher example
2017-05-29 01:09:52 +02:00
ines
00b2094dc3
Fix typos, long integers and tests
2017-05-29 01:09:52 +02:00
ines
d71c6db76e
Add missing Chainer install for GPU if building spaCy from source
2017-05-28 23:34:59 +02:00
ines
e0f9ccdaa3
Update texts and rename vectorizer to tensorizer
2017-05-28 23:26:13 +02:00
ines
606879b217
Update hash strings examples
2017-05-28 19:42:44 +02:00
ines
c7b57ea314
Update docs and change integer IDs to hash values
2017-05-28 19:25:34 +02:00
ines
738b4f7187
Add quickstart options and docs for GPU
2017-05-28 19:20:11 +02:00
ines
4c00cb8c8b
Update 101 and add community/FAQ and table of contents
2017-05-28 18:45:49 +02:00
ines
0ea31d1e31
Add under construction note to pipeline components
2017-05-28 18:44:07 +02:00
ines
8a148b6563
Fix code, links and formatting
2017-05-28 18:29:16 +02:00
ines
414193e9ba
Update docs to reflect StringStore changes
2017-05-28 18:19:11 +02:00
ines
69bda9aed7
Update text, examples, typos, wording and formatting
2017-05-28 16:41:01 +02:00
ines
f8185b8e11
Rename vocab-stringsotre to vocab
2017-05-28 16:37:14 +02:00
ines
10d05c2b92
Fix typos, wording and formatting
2017-05-28 01:30:12 +02:00
ines
eb5a8be9ad
Update language overview and add section on 'xx' lang class
2017-05-28 01:15:44 +02:00
ines
eb703f7656
Update API docs
2017-05-28 00:32:43 +02:00
ines
c1983621fb
Update util functions for model loading
2017-05-28 00:22:40 +02:00
ines
db116cbeda
Update tokenization 101 and add illustration
2017-05-28 00:22:40 +02:00
ines
b03fb2d7b0
Update 101 and usage docs
2017-05-28 00:22:40 +02:00
ines
ae11c8d60f
Add emoji sentiment to lightning tour matcher example
2017-05-27 20:02:20 +02:00
ines
22bf5f63bf
Update Matcher docs and add social media analysis example
2017-05-27 17:58:18 +02:00
ines
0d33ead507
Fix initialisation of Doc in lightning tour example
2017-05-27 17:58:06 +02:00
ines
e05bcd6aa8
Update docs to reflect flattened model meta.json
...
Don't use "setup" key and instead, keep "lang" on root level and add
"pipeline".
2017-05-27 17:57:46 +02:00
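As a rough illustration of the flattening described in the commit above, the sketch below moves `lang` and `pipeline` from a nested `"setup"` entry to the root of a meta dictionary. The old nested layout, the helper name `flatten_meta`, the model name and the pipeline component names are all assumptions made for illustration; the commit message only states that `"setup"` is no longer used and that `"lang"` and `"pipeline"` live at the root level.

```python
import json


def flatten_meta(meta):
    """Return a copy of meta without "setup", with its keys moved to the root.

    Assumes (hypothetically) that the old layout nested "lang" and
    "pipeline" under a "setup" dict.
    """
    flat = {key: value for key, value in meta.items() if key != "setup"}
    setup = meta.get("setup", {})
    for key in ("lang", "pipeline"):
        # Root-level values win if both are present.
        if key in setup and key not in flat:
            flat[key] = setup[key]
    return flat


# Hypothetical old-style meta with a nested "setup" entry.
old_meta = {
    "name": "example_model",
    "setup": {"lang": "en", "pipeline": ["tagger", "parser", "ner"]},
}
new_meta = flatten_meta(old_meta)
print(json.dumps(new_meta, indent=2))
```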