Bram Vanroy
718704022a
Changes to spacy_conll in universe ( #4914 )
...
* Update information on spacy_conll
* Typo fix
2020-01-16 01:56:39 +01:00
Matthew Honnibal
1785eebfe0
Merge pull request #4909 from svlandeg/bugfix/cnn_window
...
bugfix typo conv_window
2020-01-14 11:23:14 +01:00
svlandeg
ee828d5a9a
bugfix typo conv_window
2020-01-14 09:02:58 +01:00
Sofie Van Landeghem
c70ccd543d
Friendly error warning for NEL example script ( #4881 )
...
* make model positional arg and raise error if no vectors
* small doc fixes
2020-01-14 01:51:14 +01:00
adrianeboyd
d2f3a44b42
Improve train CLI sentrec scoring ( #4892 )
...
* reorder metrics to prioritize F over P/R
* add sentrec to model metrics
2020-01-08 16:52:14 +01:00
adrianeboyd
e55fa1899a
Report length of dev dataset correctly ( #4891 )
2020-01-08 16:51:51 +01:00
adrianeboyd
e1b493ae85
Add sentrec shortcut to Language ( #4890 )
2020-01-08 16:51:24 +01:00
adrianeboyd
d24bca62f6
Add CJK to character classes ( #4884 )
...
* Add CJK character class as uncased
* Incorporate Chinese URL test case
Un-xfail Chinese URL test instance
2020-01-08 16:50:19 +01:00
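To illustrate what treating CJK as an "uncased" character class in #4884 means, here is a minimal sketch. It assumes a single Unicode block (CJK Unified Ideographs, U+4E00 to U+9FFF); the ranges actually used in spaCy may differ, and the helper names `is_cjk` and `is_uncased` are hypothetical.

```python
import re

# Assumption: only the CJK Unified Ideographs block (U+4E00 to U+9FFF).
# This is an illustrative range, not the one defined in spaCy.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def is_cjk(char: str) -> bool:
    """Return True if the single character falls in the assumed CJK range."""
    return CJK_RE.fullmatch(char) is not None


def is_uncased(char: str) -> bool:
    """A character is uncased if lowercasing and uppercasing give the same result."""
    return char.lower() == char.upper()


for ch in ["中", "a"]:
    print(ch, is_cjk(ch), is_uncased(ch))
```

Because such characters have no upper or lower form, they cannot be placed in either the uppercase or the lowercase class and need a class of their own.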
Preston Badeer
b216ff43c9
Update vectors-similarity.md ( #4889 )
...
These links are broken on the website, due to quotes around the URLs.
2020-01-08 16:49:40 +01:00
adrianeboyd
aef83e8070
Mark most Hungarian tokenizer test cases as slow ( #4883 )
...
* Mark most Hungarian tokenizer test cases as slow
Mark most Hungarian tokenizer test cases as slow to reduce the runtime
of the test suite in ordinary usage:
* for normal tests: run default tests plus 10% of the detailed tests
* for slow tests: run all tests
* Rework to mark individual tests as slow
2020-01-08 12:34:06 +01:00
Sofie Van Landeghem
7b96a5e10f
Reduce mem usage in training Entity Linker ( #4811 )
...
* move nlp processing for el pipe to batch training instead of preprocessing
* adding dev eval back in, and limit in articles instead of entities
* use pipe whenever possible
* few more small doc changes
* access dev data through generator
* tqdm description
* small fixes
* update documentation
2020-01-06 14:59:50 +01:00
Sofie Van Landeghem
6e9b61b49d
add warning in debug_data for punctuation in entities ( #4853 )
2020-01-06 14:59:28 +01:00
adrianeboyd
d652ff215d
Add trailing whitespace to multiline test text ( #4877 )
2020-01-06 14:58:59 +01:00
adrianeboyd
de69bc6509
Fix and improve URL pattern ( #4882 )
...
* match domains longer than `hostname.domain.tld` like `www.foo.co.uk`
* expand allowed characters in domain names while only matching
lowercase TLDs so that "this.That" isn't matched as a URL and can be
split on the period as an infix (relevant for at least English, German,
and Tatar)
2020-01-06 14:58:30 +01:00
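The behaviour described in #4882 can be sketched with a small standalone regex. This is an illustrative pattern only, not the one used in spaCy: it assumes a domain is one or more dot-terminated labels followed by a lowercase-only TLD, and the names `DOMAIN_RE` and `looks_like_domain` are hypothetical.

```python
import re

# Illustrative pattern only; not the regex used in spaCy.
# One or more dot-terminated labels (mixed case allowed), followed by a
# lowercase-only TLD of at least two letters.
DOMAIN_RE = re.compile(r"(?:[A-Za-z0-9-]+\.)+[a-z]{2,}")


def looks_like_domain(text: str) -> bool:
    """Return True if the whole string matches the sketch pattern."""
    return DOMAIN_RE.fullmatch(text) is not None


for candidate in ["hostname.domain.tld", "www.foo.co.uk", "this.That"]:
    print(candidate, looks_like_domain(candidate))
```

Under these assumptions `www.foo.co.uk` matches because any number of labels may precede the TLD, while `this.That` does not because the final label starts with an uppercase letter, which leaves the period available for infix splitting.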
Sofie Van Landeghem
a1b22e90cd
serialize ENT_ID ( #4852 )
...
* expand serialization test for custom token attribute
* add failing test for issue 4849
* define ENT_ID as attr and use in doc serialization
* fix few typos
2020-01-06 14:57:34 +01:00
Geoffrey Gordon Ashbrook
53929138d7
remove extra word typo ( #4875 )
...
"let you find you"
2020-01-06 12:37:42 +01:00
Ines Montani
db81604d54
Merge branch 'master' into spacy.io
2020-01-04 01:52:28 +01:00
Ines Montani
400257a802
Update index.md [ci skip]
2020-01-04 01:52:18 +01:00
Sofie Van Landeghem
581eeed98b
Warning goldparse ( #4851 )
...
* label in span not writable anymore
* Revert "label in span not writable anymore"
This reverts commit ab442338c8.
* provide more friendly error msg for parsing file
2020-01-01 13:16:48 +01:00
Ines Montani
83e0a6f3e3
Modernize plac commands for Python 3 ( #4836 )
2020-01-01 13:15:46 +01:00
Al Johri
1aa2d4dac9
stop rendering mathjax by default in displacy ( #4840 )
...
* stop rendering mathjax by default in displacy
* Replace f-string and add comment
Co-authored-by: Ines Montani <ines@ines.io>
2020-01-01 13:15:05 +01:00
Anastasiia Iurshina
db9257559c
Adds script shebang ( #4846 )
2019-12-29 14:25:05 +01:00
Anastasiia Iurshina
1830a12578
Fixes typos ( #4843 )
...
* Fixes typos
* Fixes typo
* Contributor agreement
2019-12-29 14:24:13 +01:00
Ivan Echevarria
ef13e0c038
Add n_process to Language.pipe documentation ( #4842 ) [ci skip]
...
* Add n_process to documentation
* Auto-format and add default [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-29 14:23:33 +01:00
Al Johri
fd4a7bd2b7
sign contributor agreement for AlJohri ( #4839 ) [ci skip]
2019-12-29 14:17:28 +01:00
Ines Montani
401946d480
Un-xfail passing tests
2019-12-25 18:02:20 +01:00
Ines Montani
a892821c51
More formatting changes
2019-12-25 17:59:52 +01:00
Ines Montani
c22f075509
Update pydantic version pin [ci skip]
2019-12-25 17:29:53 +01:00
Ines Montani
33a2682d60
Add better schemas and validation using Pydantic ( #4831 )
...
* Remove unicode declarations
* Remove Python 3.5 and 2.7 from CI
* Don't require pathlib
* Replace compat helpers
* Remove OrderedDict
* Use f-strings
* Set Cython compiler language level
* Fix typo
* Re-add OrderedDict for Table
* Update setup.cfg
* Revert CONTRIBUTING.md
* Add better schemas and validation using Pydantic
* Revert lookups.md
* Remove unused import
* Update spacy/schemas.py
Co-Authored-By: Sebastián Ramírez <tiangolo@gmail.com>
* Various small fixes
* Fix docstring
Co-authored-by: Sebastián Ramírez <tiangolo@gmail.com>
2019-12-25 12:39:49 +01:00
Ines Montani
db55577c45
Drop Python 2.7 and 3.5 ( #4828 )
...
* Remove unicode declarations
* Remove Python 3.5 and 2.7 from CI
* Don't require pathlib
* Replace compat helpers
* Remove OrderedDict
* Use f-strings
* Set Cython compiler language level
* Fix typo
* Re-add OrderedDict for Table
* Update setup.cfg
* Revert CONTRIBUTING.md
* Revert lookups.md
* Revert top-level.md
* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani
3431ac42de
Fix typo
2019-12-21 21:17:45 +01:00
Ines Montani
21b6d6e0a8
Fix typo
2019-12-21 21:17:31 +01:00
Ines Montani
de33b6d566
Merge branch 'master' into develop
2019-12-21 21:15:46 +01:00
Ines Montani
7c69d30de5
Tidy up and expect warning
2019-12-21 21:14:52 +01:00
Sofie Van Landeghem
732142bf28
facilitate larger training files ( #4827 )
...
* add warning for large file and change start var to long
* type for file_length
2019-12-21 21:12:19 +01:00
Ines Montani
d17e7dca9e
Fix problems caused by merge conflict
2019-12-21 19:57:41 +01:00
Ines Montani
947dba7141
Merge branch 'master' into develop
2019-12-21 19:04:43 +01:00
Ines Montani
cb4145adc7
Tidy up and auto-format
2019-12-21 19:04:17 +01:00
Ines Montani
158b98a3ef
Merge branch 'master' into develop
2019-12-21 18:55:03 +01:00
Olamilekan Wahab
a741de7cf6
Adding support for Yoruba Language ( #4614 )
...
* Adding Support for Yoruba
* test text
* Updated test string.
* Fixing encoding declaration.
* Adding encoding to stop_words.py
* Added contributor agreement and removed iranlowo.
* Added removed test files and removed iranlowo to keep project bare.
* Returned CONTRIBUTING.md to default state.
* Added deleted conftest entries
* Tidy up and auto-format
* Revert CONTRIBUTING.md
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-21 14:11:50 +01:00
Ines Montani
554fbb04b0
Merge branch 'master' into spacy.io
2019-12-21 14:10:37 +01:00
Ines Montani
1b838d1313
Divide models into core and starters [ci skip]
2019-12-21 14:10:22 +01:00
Ines Montani
0750d59e5a
Allow setting ner_missing_tag on docs_to_json
2019-12-21 13:47:21 +01:00
Ines Montani
1bb11953e8
Merge branch 'master' into spacy.io
2019-12-20 23:00:31 +01:00
Sofie Van Landeghem
8ebbb85117
Documentation for PhraseMatcher constructor ( #4826 )
...
* add max_length as argument for init PhraseMatcher
* improve error message too
2019-12-20 23:00:04 +01:00
Sofie Van Landeghem
12158c1e3a
Restore tqdm imports ( #4804 )
...
* set 4.38.0 to minimal version with color bug fix
* set imports back to proper place
* add upper range for tqdm
2019-12-16 13:12:19 +01:00
Ines Montani
ae9fac2d87
Merge branch 'master' into spacy.io
2019-12-13 15:57:49 +01:00
Ines Montani
c466e02466
Update universe [ci skip]
2019-12-13 15:57:39 +01:00
Sofie Van Landeghem
557dcf5659
NEL requires sentences to be set ( #4801 )
2019-12-13 15:55:18 +01:00
tamuhey
1707e77c5e
add char_span to Span ( #4793 )
2019-12-13 15:54:58 +01:00
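As background for #4793, the idea behind a `char_span` lookup can be sketched without spaCy: character offsets are mapped to token indices, and no span is returned when the offsets do not fall on token boundaries. The whitespace tokenizer and the helper names below are hypothetical and are not the library's implementation.

```python
from typing import List, Optional, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Whitespace tokenizer returning (start_char, end_char) for each token."""
    offsets = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def char_span(
    offsets: List[Tuple[int, int]], start_char: int, end_char: int
) -> Optional[Tuple[int, int]]:
    """Map character offsets to a (start_token, end_token) pair, end exclusive.

    Returns None if either offset does not line up with a token boundary.
    """
    starts = {start: i for i, (start, _) in enumerate(offsets)}
    ends = {end: i for i, (_, end) in enumerate(offsets)}
    if start_char not in starts or end_char not in ends:
        return None
    start_token = starts[start_char]
    end_token = ends[end_char] + 1
    if start_token >= end_token:
        return None
    return (start_token, end_token)


text = "I like New York"
offsets = token_offsets(text)
print(char_span(offsets, 7, 15))  # offsets of "New York"
print(char_span(offsets, 8, 15))  # starts inside a token
```

In this sketch the first call yields the token range covering "New York", and the second yields `None` because offset 8 falls in the middle of "New".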