Ines Montani
350c8d25b0
Add EntityRecognizer.label property
2018-11-18 00:06:26 +01:00
Ines Montani
017bc2ef2f
Expose TextCategorizer via __all__
2018-11-18 00:06:13 +01:00
Ines Montani
b4581435f6
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-11-16 13:08:22 +01:00
Ines Montani
e2f75eb492
Fix message formatting
2018-11-16 13:08:20 +01:00
Matthew Honnibal
c89fd19f66
Hack broken pipe error for Python2
2018-11-16 02:22:05 +01:00
Matthew Honnibal
2874b8efd8
Fix tok2vec loading in spacy train
2018-11-15 23:34:54 +00:00
Matthew Honnibal
2ddd428834
Fix pretrain script
2018-11-15 23:34:35 +00:00
Matthew Honnibal
09a0227656
Temporarily add a script to load reddit
2018-11-15 23:18:35 +00:00
Matthew Honnibal
f8afaa0c1c
Fix pretrain
2018-11-15 22:46:53 +00:00
Matthew Honnibal
6af6950e46
Fix pretrain
2018-11-15 22:45:36 +00:00
Matthew Honnibal
3e7b214e57
Make pretrain script work with stream from stdin
2018-11-15 22:44:07 +00:00
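Streaming from stdin can be sketched as below. This is a minimal illustration that assumes newline-delimited JSON with a `"text"` key and uses `-` as the conventional stand-in for standard input. The helpers `iter_texts` and `_parse_lines` are hypothetical and are not the pretrain script's actual interface.

```python
import io
import json
import sys


def _parse_lines(lines):
    """Yield the "text" value of each non-blank JSON line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)["text"]


def iter_texts(loc, stdin=None):
    """Stream texts from a JSONL file, or from stdin when loc is "-".

    Hypothetical helper: as a generator it never holds the whole corpus
    in memory, so the input can be piped in from another process.
    """
    if loc == "-":
        yield from _parse_lines(stdin if stdin is not None else sys.stdin)
    else:
        with io.open(loc, encoding="utf8") as file_:
            yield from _parse_lines(file_)
```

Because the texts are consumed lazily, a large dump can be decompressed or filtered upstream and piped straight in, without an intermediate file.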
Matthew Honnibal
8fdb9bc278
💫 Add experimental ULMFit/BERT/Elmo-like pretraining (#2931)
* Add 'spacy pretrain' command
* Fix pretrain command for Python 2
* Fix pretrain command
* Fix pretrain command
2018-11-15 22:17:16 +01:00
Ines Montani
02fc73ca53
💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.
## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. The fix generates a random ID prefix so that even identical parses won't receive the same IDs. This is for consistency, even if the effect of an ID clash isn't noticeable there.
### Types of change
bug fix
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-15 11:40:10 +01:00
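The random-prefix idea can be sketched as follows, assuming each arc label is drawn along an SVG `<path>` that a `<textPath>` references by ID. The function `render_arcs`, the `ARC_TEMPLATE` markup and the `arrow-{prefix}-{i}` ID scheme are hypothetical simplifications, not displaCy's actual templates.

```python
import uuid

# Hypothetical, heavily simplified markup for one labelled arc.
ARC_TEMPLATE = (
    '<path id="{arc_id}" d="M0,0"/>'
    '<text><textPath xlink:href="#{arc_id}">{label}</textPath></text>'
)


def render_arcs(labels, id_prefix=None):
    """Return SVG fragments for a list of arc labels.

    A fresh random prefix per call means two renderings on the same page,
    even of an identical parse, never share element IDs.
    """
    prefix = id_prefix or uuid.uuid4().hex
    parts = []
    for i, label in enumerate(labels):
        arc_id = "arrow-{}-{}".format(prefix, i)
        parts.append(ARC_TEMPLATE.format(arc_id=arc_id, label=label))
    return "".join(parts)
```

Without the prefix, two notebook cells rendering `arrow-0` would both define the same ID, and the browser would resolve every `#arrow-0` reference to whichever element it found first.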
Ines Montani
e89708c3eb
💫 Allow matching non-ORTH attributes in PhraseMatcher (#2925)
* Allow matching non-orth attributes in PhraseMatcher (see #1971)
Usage: PhraseMatcher(nlp.vocab, attr='POS')
* Allow attr argument to be int
* Fix formatting
* Fix typo
2018-11-15 03:00:58 +01:00
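The effect of the `attr` argument can be illustrated without spaCy: phrase and document are compared on a single token attribute instead of the verbatim text. The sketch below is a naive, hypothetical stand-in that uses plain token dicts and a linear scan. The real PhraseMatcher operates on `Doc` objects and is implemented differently, and with spaCy itself the usage is the line quoted in the commit, `PhraseMatcher(nlp.vocab, attr='POS')`.

```python
def find_attr_matches(doc_tokens, patterns, attr="ORTH"):
    """Return (pattern_index, start, end) for every match (naive sketch).

    Tokens and pattern tokens are dicts of attribute values. Only `attr`
    is compared, so attr="POS" matches any span with the same sequence
    of part-of-speech tags, whatever the actual words are.
    """
    doc_seq = [tok[attr] for tok in doc_tokens]
    matches = []
    for p_idx, pattern in enumerate(patterns):
        key = [tok[attr] for tok in pattern]
        n = len(key)
        if n == 0:
            continue  # an empty phrase would trivially match everywhere
        for start in range(len(doc_seq) - n + 1):
            if doc_seq[start:start + n] == key:
                matches.append((p_idx, start, start + n))
    return matches
```

For example, a pattern built from "red apples" (ADJ NOUN) matches "green eggs" when `attr="POS"`, but not under the default `"ORTH"`.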
Matthew Honnibal
7ed9124a45
Fix Python2 error on example
2018-11-14 19:35:17 +01:00
Ines Montani
0d5b142c78
Fix typos and whitespace
2018-11-14 19:12:34 +01:00
Ines Montani
bd1b0e396a
Add deprecation warning for PhraseMatcher max_length
2018-11-14 19:10:46 +01:00
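A deprecation warning of this kind can be sketched with Python's standard `warnings` module. The class `PhraseMatcherSketch` and its message text are hypothetical. The sketch assumes the argument is still accepted for backwards compatibility but no longer has any effect.

```python
import warnings


class PhraseMatcherSketch(object):
    """Hypothetical stand-in showing how a deprecated argument can be flagged."""

    def __init__(self, vocab, max_length=0, attr="ORTH"):
        if max_length != 0:
            # Still accepted so existing code keeps running, but callers
            # are told the argument is deprecated (assumed to be ignored).
            warnings.warn(
                "PhraseMatcher's max_length argument is deprecated.",
                DeprecationWarning,
            )
        self.vocab = vocab
        self.attr = attr
```

Note that Python hides `DeprecationWarning` by default outside `__main__` and test runners, so users may need `-W default` or a warnings filter to see it.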
Ines Montani
64257bf3a7
Fix formatting
2018-11-14 19:10:21 +01:00
Ines Montani
b3cadd5b81
Delete _matcher2_notes.py
2018-11-14 16:19:12 +01:00
mauryaland
87ce435aff
Check if the word is in one of the regular lists specific to each POS (#2886)
2018-11-14 15:58:43 +01:00
Ines Montani
dfcc8f02af
Fix image [ci skip]
Twitter URL doesn't work on live site
2018-11-14 01:01:33 +01:00
Ines Montani
1aa91e926f
Minor formatting changes [ci skip]
2018-11-13 23:59:59 +01:00
Francisco Aranda
be99f1cac5
Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component
* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
Daniel Hershcovich
d3d419ecc0
Allow input text of length up to max_length, inclusive (#2922)
2018-11-13 16:46:29 +01:00
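The inclusive bound comes down to using `>` rather than `>=` in the length check. The helper `check_length` and its error message below are a hypothetical sketch; spaCy's own check and error differ.

```python
def check_length(text, max_length):
    """Return text unchanged, or raise if it is longer than max_length.

    Comparing with `>` (not `>=`) makes the limit inclusive: a text of
    exactly max_length characters is accepted.
    """
    if len(text) > max_length:
        raise ValueError(
            "Text of length {} exceeds max_length {}".format(len(text), max_length)
        )
    return text
```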
mikelibg
75e7d503b7
Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation
* - added contributor info
2018-11-08 14:18:25 +01:00
Matthew Honnibal
5fc98ade04
Set version to 2.1.0a2
2018-11-08 09:56:56 +01:00
Ines Montani
11db4d2f27
Add script to validate universe json [ci skip]
2018-11-06 12:50:41 +01:00
Ines Montani
a9fda638a9
Add spacy-raspberry to universe (closes #2889)
2018-11-06 12:45:50 +01:00
Ines Montani
c235ddf44f
Add spacy-js to universe [ci-skip]
2018-11-06 12:45:03 +01:00
Matthew Honnibal
09aa616182
Make pretraining script work without GPU
2018-11-04 17:09:52 +01:00
Matthew Honnibal
bc8cda818c
Improve pretrain textcat example
2018-11-04 00:17:09 +00:00
Matthew Honnibal
3e7a96f99d
Improve pretrain textcat example
2018-11-03 17:44:12 +00:00
Matthew Honnibal
c87c50af62
Rename new example
2018-11-03 13:09:46 +00:00
Matthew Honnibal
8e8ccc0f92
Work on pretraining script
2018-11-03 12:53:25 +00:00
Matthew Honnibal
ad44982f01
Fix dropout in tensorizer, update comment
2018-11-03 12:46:58 +00:00
Matthew Honnibal
0127f10ba3
Improve train tensorizer script
2018-11-03 10:54:20 +00:00
Matthew Honnibal
ba365ae1c9
Normalize gradient by number of words in tensorizer
2018-11-03 10:53:22 +00:00
Matthew Honnibal
dac3f1b280
Improve Tensorizer
2018-11-03 10:52:50 +00:00
Matthew Honnibal
baf7feae68
Add tensorizer training example
2018-11-02 23:30:06 +00:00
Matthew Honnibal
2527ba68e5
Fix tensorizer
2018-11-02 23:29:54 +00:00
Matthew Honnibal
db08b168a3
Set version to 2.0.17
2018-10-29 23:22:18 +01:00
Suraj Rajan
0bf14082a4
Added more constructs for dependency tree matcher (#2836)
2018-10-29 23:21:39 +01:00
Matthew Honnibal
e2ae25d6f5
Try setting older regex version, to align with conda
2018-10-29 13:39:00 +01:00
Matthew Honnibal
a2745d310e
Revert "Update regex version"
This reverts commit 62358dd867.
2018-10-28 16:38:56 +01:00
Matthew Honnibal
62358dd867
Update regex version
2018-10-28 16:27:50 +01:00
Matthew Honnibal
d4fa9af56f
Set version to 2.0.17.dev0
2018-10-28 16:15:26 +01:00
Matthew Honnibal
5a4aeb96b7
Add example showing a fix-up rule for space entities
2018-10-28 16:06:00 +01:00
Matthew Honnibal
b2e2bba8b0
Fix missing comma
2018-10-28 00:09:16 +02:00
Wannaphong Phatthiyaphaibun
2d2765fd8a
Change PyThaiNLP URL (#2876)
2018-10-27 14:46:07 +02:00
Matthew Honnibal
817e1fc5e5
Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so they must be hunted down.
Hunting this one down took a long time... I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 01:12:50 +02:00
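The sentinel pattern described above can be mimicked in plain Python. Everything here (`StateSketch`, its `B` and `safe_get` methods, the empty-token dict) is a hypothetical simplification of the Cython parser state, not spaCy's code. One difference is worth noting: Python's negative indexing turns the bug into a silent wrong answer (the last token), whereas in compiled code doing a raw array access, the same `-1` reads memory just before the array.

```python
class StateSketch(object):
    """Hypothetical parser state: a buffer of token indices into a sentence."""

    EMPTY_TOKEN = {"text": "", "ent_type": ""}

    def __init__(self, tokens, buffer_start):
        self.tokens = tokens
        self.buffer_start = buffer_start

    def B(self, i):
        """Index of the i-th buffer token (0-based here), or -1 if none exists."""
        idx = self.buffer_start + i
        return idx if idx < len(self.tokens) else -1

    def safe_get(self, idx):
        """Map the -1 sentinel (or any invalid index) to an empty token."""
        if idx < 0 or idx >= len(self.tokens):
            return self.EMPTY_TOKEN
        return self.tokens[idx]


tokens = [{"text": "Hello", "ent_type": ""}, {"text": "Berlin", "ent_type": "GPE"}]
state = StateSketch(tokens, buffer_start=1)
missing = state.B(1)                 # no second buffer token, so -1
print(state.safe_get(missing))       # safe: the empty token
print(state.tokens[missing]["text"])  # the bug: silently reads "Berlin"
```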