Ole Henrik Skogstrøm
c21efea9bb
Add sent property to token ( #2521 )
...
* Add sent property to token
* Refactored and cleaned up copy paste errors.
2018-07-06 15:54:15 +02:00
ines
38e07ade4c
Add test for custom tokenizer serialization ( resolves #2494 )
2018-07-06 12:40:51 +02:00
ines
c2581f9172
Tidy up tokenizer test
2018-07-06 12:40:28 +02:00
Matthew Honnibal
43dcaa473e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-07-06 12:36:42 +02:00
Matthew Honnibal
6c8d627733
Fix tokenizer deserialization
2018-07-06 12:36:33 +02:00
ines
c001d46153
Tidy up
2018-07-06 12:33:42 +02:00
Matthew Honnibal
63f5651f8d
Fix tokenizer serialization
2018-07-06 12:32:11 +02:00
Matthew Honnibal
e1569fda4e
Fix compile error in matcher
2018-07-06 12:29:23 +02:00
Matthew Honnibal
f5b2076700
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-07-06 12:23:14 +02:00
Matthew Honnibal
1a2f61725c
Fix tokenizer serialization
2018-07-06 12:23:04 +02:00
ines
9e09477b2f
Remove unused import
2018-07-06 12:18:17 +02:00
ines
26f04a6ac3
Fix Matcher tests and add test for any token with operator
2018-07-06 12:17:50 +02:00
Matthew Honnibal
f5703b7a91
Clean up unused stuff in matcher
2018-07-06 12:16:44 +02:00
Matthew Honnibal
08c362d541
Suppress compiler warning about unreachable code
2018-07-06 11:31:22 +02:00
Matthew Honnibal
8ae1bec8bf
Fix init_model
2018-07-05 14:02:06 +02:00
Matthew Honnibal
7b09a4ca49
Fix lemmatization
2018-07-05 13:56:02 +02:00
Matthew Honnibal
ec41ceb383
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-07-05 13:49:42 +02:00
Matthew Honnibal
4eb3405df7
Fix lemmatizer ordering, re Issue #1387
2018-07-05 13:49:29 +02:00
ines
63666af328
Merge branch 'master' into develop
2018-07-04 14:52:25 +02:00
ines
8feb7cfe2d
Remove model dependency from French lemmatizer tests
2018-07-04 14:46:45 +02:00
kleinay
a82c3153ad
fix issue #2452 - displacy arrow direction is always forward ( #2506 ) ( closes #2452 )
...
Refers to #2452 : fixes displacy arrow directions to match the input.
## Description
The fix simply replaces `direction is 'left'` with `direction == 'left'` to cover the case where `direction` is a `str` rather than a `unicode`.
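The difference between the two comparisons can be illustrated with a minimal sketch. The names below (`LEFT`, `is_left_by_identity`, `is_left_by_equality`) are hypothetical and are not taken from the displaCy source; the snippet only shows why an identity check on strings is fragile, while an equality check depends on the text alone.

```python
# Hypothetical illustration; not the displaCy source.
LEFT = "left"


def is_left_by_identity(direction):
    # Identity check: True only if both names refer to the same object.
    return direction is LEFT


def is_left_by_equality(direction):
    # Equality check: True whenever the text matches.
    return direction == LEFT


# Build an equal string at runtime; in CPython this is normally a distinct object.
runtime_direction = "".join(["le", "ft"])

print(is_left_by_equality(runtime_direction))  # True
print(is_left_by_identity(LEFT))               # True (same object)
# The result of the next call is implementation-dependent, which is the problem:
print(is_left_by_identity(runtime_direction))
```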
### Types of change
bug fix
## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-07-04 14:12:08 +02:00
Bùi Trung Chí
9af46b4f1b
Fix loading tokenizer with custom prefix search ( #2495 )
...
* Add contributor agreement
* Fix loading tokenizer with custom prefix search
2018-07-04 12:56:07 +02:00
Matthew Honnibal
dee8bdb900
Fix init-model for npz vectors
2018-07-04 02:29:48 +02:00
Matthew Honnibal
59d655e8d0
Fix model init from jsonl
2018-07-04 01:30:40 +02:00
Matthew Honnibal
1e38bea6e9
Save vectors init
2018-07-03 23:55:04 +02:00
Matthew Honnibal
6692833887
Fix init_model
2018-07-03 23:24:11 +02:00
Matthew Honnibal
4a38a26cb5
Fix init_model
2018-07-03 22:57:11 +02:00
Matthew Honnibal
019d09e3c3
Fix init model
2018-07-03 22:16:44 +02:00
Matthew Honnibal
2543f8c93a
Support .npz vectors in init-model command
2018-07-03 21:42:16 +02:00
Matthew Honnibal
86aad11939
Fix init_model arg
2018-07-03 17:00:42 +02:00
Matthew Honnibal
eff42d36e3
Fix init model command
2018-07-03 16:32:23 +02:00
Matthew Honnibal
97487122ea
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-07-03 15:44:37 +02:00
Matthew Honnibal
6a89faf12e
Add support for jsonl-formatted lexical attributes to init-model command.
2018-07-03 12:22:56 +02:00
Matthew Honnibal
2ec2192000
Revert #1389 : Don't overrule rules when lemma exception is present
2018-06-29 19:43:02 +02:00
Matthew Honnibal
01ace9734d
Make pipeline work on empty docs
2018-06-29 19:21:38 +02:00
Matthew Honnibal
a1b05048d0
Fix tagger when doc is empty
2018-06-29 16:05:40 +02:00
Matthew Honnibal
3786942ff1
Fix tagger when docs are empty
2018-06-29 15:13:45 +02:00
ines
526be40823
Add test for 46d8a66
2018-06-29 14:33:12 +02:00
ines
f08c871adf
Fix typo in Language.from_disk
2018-06-29 14:32:16 +02:00
Matthew Honnibal
46d8a66fef
Fix tokenizer serialization if token_match is None
2018-06-29 14:24:46 +02:00
Matthew Honnibal
e0860bcfb3
Fix bug when docs are empty
2018-06-29 13:56:29 +02:00
Matthew Honnibal
a4d2b0c293
Fix bug when docs are empty
2018-06-29 13:44:25 +02:00
Matthew Honnibal
c83fccfe2a
Fix output of best model
2018-06-25 23:05:56 +02:00
Matthew Honnibal
5a65418c40
Fix handling of unseen labels in tagger
2018-06-25 22:28:59 +02:00
Matthew Honnibal
5b56aad4c2
Fix handling of unseen labels in tagger
2018-06-25 22:24:54 +02:00
Matthew Honnibal
3aabf621a3
Fix handling of unknown tags in tagger update
2018-06-25 22:01:02 +02:00
Matthew Honnibal
69c900f003
Fix init-model if no vectors provided
2018-06-25 18:26:02 +02:00
Matthew Honnibal
664f89327a
Fix init-model if no vectors provided
2018-06-25 17:58:45 +02:00
Matthew Honnibal
c4698f5712
Don't collate model unless training succeeds
2018-06-25 16:36:42 +02:00
Ole Henrik Skogstrøm
d16cb6bee6
Accept Span to displacy render ( #2478 ) ( closes #2477 )
...
* Add Span to displacy render
* Fix span support, errors and add tests
2018-06-25 14:55:16 +02:00
Matthew Honnibal
24dfbb8a28
Fix model collation
2018-06-25 14:35:24 +02:00
Matthew Honnibal
62237755a4
Import shutil
2018-06-25 13:40:17 +02:00
Matthew Honnibal
a040fca99e
Import json into cli.train
2018-06-25 11:50:37 +02:00
Matthew Honnibal
2c703d99c2
Fix collation of best models
2018-06-25 01:21:34 +02:00
Matthew Honnibal
9d6a1c57f2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-06-24 23:40:06 +02:00
Matthew Honnibal
2c80b7c013
Collate best model after training
2018-06-24 23:39:52 +02:00
Muhammad Irfan
f33c703066
Add Urdu Language Support ( #2430 )
...
* added Urdu language support.
* added Urdu language tests.
* modified conftest.py for Urdu language support.
* added spacy contributor agreement.
2018-06-22 11:14:03 +02:00
himkt
14d9007efd
fix wrong indexing ( #2416 )
...
* fix wrong indexing
* add agreement
2018-06-19 10:20:57 +02:00
Aliia E
428bae66b5
Add Tatar Language Support ( #2444 )
...
* add Tatar lang support
* add Tatar letters
* add Tatar tests
* sign contributor agreement
* sign contributor agreement [x]
* remove comments from Language class
* remove all template comments
2018-06-19 10:17:53 +02:00
Cory Hurst
446f5ec41b
Silent keyword in info function in init ( #2459 )
...
* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue #2196
* contributor agreement
2018-06-18 12:24:21 +02:00
ines
778e5f4da3
Merge branch 'master' into develop
2018-06-11 00:38:04 +02:00
himkt
57311d5d47
replace janome with mecab in the documentation and the test ( #2415 )
...
* Add links to Reddit data (see #2401 )
* replace janome with mecab in the documentation and the test
* add the assignment
2018-06-11 00:33:13 +02:00
Nour Shalabi
a169b79092
Additions to Arabic stop words. ( #2422 )
...
* Additions to Arabic stop words.
* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
ines
a0017e4909
Merge branch 'master' into develop
2018-05-30 14:10:47 +02:00
ines
b8ef9c1000
Fix model names in conftest (see #2379 )
2018-05-30 14:10:20 +02:00
ines
4a62486340
Merge branch 'master' into develop
2018-05-30 13:01:01 +02:00
Maciej
c7d53348d7
Fix bug in CLI iob and ner converter ( #2392 ) ( fixes #2385 )
...
* issue_2385 add tests for iob_to_biluo converter function
* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter
* issue_2385 add test to fix b char bug
* add contributor agreement
* fill contributor agreement
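As a rough illustration of what a converter that accepts either IOB or BILUO tags might do, here is a minimal sketch. The function name `iob_to_biluo_sketch` is hypothetical and this is not the code from `cli.converter`; it assumes IOB2-style input in which every entity starts with `B-`.

```python
def iob_to_biluo_sketch(tags):
    """Convert IOB2 tags to BILUO; tags already in BILUO pass through unchanged."""
    out = []
    for i, tag in enumerate(tags):
        # "O", "L-" and "U-" tags need no conversion.
        if tag == "O" or tag[:2] in ("L-", "U-"):
            out.append(tag)
            continue
        prefix, label = tag[:2], tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I-/L- with the same label.
        continues = next_tag[:2] in ("I-", "L-") and next_tag[2:] == label
        if prefix == "B-":
            # A single-token entity becomes U-, otherwise B- is kept.
            out.append(tag if continues else "U-" + label)
        else:
            # Assumed to be I-: the last token of an entity becomes L-.
            out.append(tag if continues else "L-" + label)
    return out


print(iob_to_biluo_sketch(["B-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'L-PER', 'O', 'U-LOC']
```

Passing the output back through the same function leaves it unchanged, which is one way to accept both tag schemes with a single code path.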
2018-05-30 12:28:44 +02:00
ines
3c3a175018
Merge branch 'master' into develop
2018-05-28 18:37:09 +02:00
ansgar-t
9732988951
escape html in displacy.render ( #2378 ) ( closes #2361 )
...
## Description
Fix for issue #2361 :
Replace &, <, >, " with &amp;, &lt;, &gt;, &quot; before rendering the SVG.
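The replacement described above can be sketched as follows. The helper name `escape_html_sketch` is an assumption made for illustration and is not the function used in displaCy; the only point it demonstrates is the ordering, with `&` handled first.

```python
# Hypothetical helper; name and structure are assumptions, not displaCy's code.
def escape_html_sketch(text):
    # Replace "&" first so the entities added below are not escaped again.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text


print(escape_html_sketch('<b>"R&D"</b>'))
# &lt;b&gt;&quot;R&amp;D&quot;&lt;/b&gt;
```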
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361 )
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
ines
f7103babd9
Only overwrite warnings filter if set explicitly ( resolves #2369 )
...
This way, pre-defined warning filters are respected and users are still able to use the fine-grained warning settings if they like.
2018-05-26 18:44:15 +02:00
ines
330c039106
Merge branch 'master' into develop
2018-05-26 18:30:52 +02:00
James Messinger
4515e96e90
Better formatting for spacy train CLI ( #2357 )
...
* Better formatting for `spacy train` CLI
Changed to use fixed-spaces rather than tabs to align table headers and data.
### Before:
```
Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token %
0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4
1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1
2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9
```
### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```
* Added contributor file
2018-05-25 13:08:45 +02:00
Aristo Rinjuang
432ede04af
adding more words and rephrasing ( #2351 )
...
* adding more words and rephrasing
* adding a contributor
* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Jani Monoses
ec62cadf4c
Updates to Romanian support ( #2354 )
...
* Add back Romanian in conftest
* Romanian lex_attr
* More tokenizer exceptions for Romanian
* Add tests for some Romanian tokenizer exceptions
2018-05-24 11:40:00 +02:00
Matthew Honnibal
5d281cf302
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-05-22 20:50:59 +02:00
Matthew Honnibal
ce458c2428
Fix spacy requirement constraint in package template
2018-05-22 20:50:46 +02:00
Ines Montani
862da5e793
Support pipeline factories via entry points ( #2348 )
2018-05-22 18:29:45 +02:00
Matthew Honnibal
d5af38f80c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-05-21 17:42:55 +02:00
Matthew Honnibal
ee33de8652
Fix unpickling of NER parser
2018-05-21 17:42:40 +02:00
ines
f9dbcac8e4
Merge branch 'master' into develop
2018-05-21 02:29:29 +02:00
cclauss
f7dcaa1f6b
Simplify is_config() and normalize_string_keys() ( #2305 )
...
* Simplify is_config() and normalize_string_keys()
* Use __in__ to avoid the nested _ands_ and _ors_.
* Dict comprehension directly tracks with the doc string
* Keep more basic loop in normalize_string_keys
* Whitespace
2018-05-21 01:54:35 +02:00
Ines Montani
cae4457c38
💫 Add .similarity warnings for no vectors and option to exclude warnings ( #2197 )
...
* Add logic to filter out warning IDs via environment variable
Usage: SPACY_WARNING_EXCLUDE=W001,W007
* Add warnings for empty vectors
* Add warning if no word vectors are used in .similarity methods
For example, if only tensors are available in small models – should hopefully clear up some confusion around this
* Capture warnings in tests
* Rename SPACY_WARNING_EXCLUDE to SPACY_WARNING_IGNORE
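Filtering warning IDs through an environment variable, as described in the list above, could look roughly like this. It is a sketch only: the function names are hypothetical and the parsing rules (comma-separated IDs, surrounding whitespace ignored) are assumptions, not spaCy's actual implementation.

```python
import os


def ignored_warning_ids(env=None, var_name="SPACY_WARNING_IGNORE"):
    # Read a comma-separated list of warning IDs, e.g. "W001,W007".
    env = os.environ if env is None else env
    raw = env.get(var_name, "")
    return {item.strip() for item in raw.split(",") if item.strip()}


def should_show(warning_id, ignored):
    # A warning is shown unless its ID was listed in the variable.
    return warning_id not in ignored


ignored = ignored_warning_ids({"SPACY_WARNING_IGNORE": "W001,W007"})
print(should_show("W001", ignored))  # False
print(should_show("W002", ignored))  # True
```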
2018-05-21 01:22:38 +02:00
Matthew Honnibal
b096b22c20
Merge pull request #2247 from skrcode/1480
...
1480 - Implement Fast-Text vectors with subword features
2018-05-21 01:16:21 +02:00
Matthew Honnibal
f3b4f6a4ec
Merge setup.py
2018-05-20 23:21:00 +02:00
Ines Montani
d4cc736b7c
💫 Improve model downloads: check for existing install, customise pip and use requests library again ( #2346 )
...
* Go back to using requests instead of urllib (closes #2320 )
Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.
* Only download model if not installed (see #1456 )
Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.
* Pass additional options to pip when installing model (resolves #1456 )
Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:
python -m spacy download en --user
* Add CLI option to enable installing model package dependencies
* Revert "Add CLI option to enable installing model package dependencies"
This reverts commit 9336ffe695.
* Update documentation
2018-05-20 20:26:56 +02:00
Matthew Honnibal
3eb446e0a5
Require thinc 6.11.1 and prepare for release to spacy-nightly
2018-05-20 19:00:34 +02:00
Matthew Honnibal
bdc23dd8c1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-05-20 18:59:24 +02:00
ines
5401c55c75
Merge branch 'master' into develop
2018-05-20 16:49:40 +02:00
ines
b59e3b157f
Don't require attrs argument in Doc.retokenize and allow both ints and unicode ( resolves #2304 )
2018-05-20 15:15:37 +02:00
ines
5768df4f09
Add SimpleFrozenDict util to use as default function argument
2018-05-20 15:13:37 +02:00
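To illustrate why a frozen dict is useful as a default function argument (the SimpleFrozenDict commit above), here is a minimal sketch. `FrozenDictSketch` and `retokenize_sketch` are hypothetical names and this is not spaCy's `SimpleFrozenDict`; the sketch blocks only the common mutating methods.

```python
class FrozenDictSketch(dict):
    """A dict that refuses the common mutating calls (illustrative only)."""

    def _blocked(self, *args, **kwargs):
        raise TypeError("FrozenDictSketch cannot be modified")

    __setitem__ = _blocked
    __delitem__ = _blocked
    pop = _blocked
    popitem = _blocked
    clear = _blocked
    update = _blocked
    setdefault = _blocked


def retokenize_sketch(attrs=FrozenDictSketch()):
    # The shared default cannot be changed by accident, so every call
    # without arguments sees the same empty mapping.
    return dict(attrs)


print(retokenize_sketch())            # {}
print(retokenize_sketch({"LEMMA": "x"}))  # {'LEMMA': 'x'}
```

With a plain `{}` as the default, a function that mutates its argument would leak state between calls; a frozen default turns that mistake into an immediate error.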
Matthew Honnibal
7431e9c87f
Fix parser for GPU
2018-05-19 17:24:34 +00:00
Matthew Honnibal
401213fb1f
Only warn about unnamed vectors if non-zero sized.
2018-05-19 18:51:55 +02:00
Matthew Honnibal
74d5c625b3
Use rising beam update prob
2018-05-16 20:11:59 +02:00
Matthew Honnibal
544ae7f1db
Merge branch 'develop' into feature/refactor-parser
2018-05-16 02:06:49 +02:00
Matthew Honnibal
d1b27fe5aa
Revert "Improve dynamic oracle when values are missing in parse"
...
This reverts commit f56bd4736b.
2018-05-16 00:31:52 +02:00
Matthew Honnibal
83acaa0358
Add missing name attribute for parser
2018-05-15 19:01:53 +02:00
Matthew Honnibal
f328c195ca
Fix size limits in training data
2018-05-15 19:01:41 +02:00
Matthew Honnibal
8446b35ce0
Fix parser model loading
2018-05-15 18:43:46 +02:00
Matthew Honnibal
dc1a479fbd
Merge branch 'develop' into feature/refactor-parser
2018-05-15 18:39:21 +02:00
Matthew Honnibal
546dd99cdf
Merge master into develop -- mostly Arabic and website
2018-05-15 18:14:28 +02:00