Matthew Honnibal
4b123952aa
Add option for improved NER feature extraction ( #4671 )
...
* Support option of three NER features
* Expose nr_feature parser model setting
* Give feature tokens better name
* Test nr_feature=3 for NER
* Format
2019-11-19 15:03:14 +01:00
Elijah Rippeth
5ad5c4b44a
Add initial Korean support ( #4660 )
...
* add hangul and jamo char classes.
* add initial Korean lexical attributes.
* add contributor agreement
2019-11-18 12:56:07 +01:00
Ines Montani
e8b9cee6fd
Make example consistent with model ( closes #4587 ) [ci skip]
2019-11-18 12:41:48 +01:00
Ines Montani
e01a1a237f
Auto-format [ci skip]
2019-11-18 12:41:31 +01:00
adrianeboyd
62e00fd9da
Update tokenization usage docs ( #4666 )
...
Update pseudo-code and algorithm description to correspond to current
tokenizer behavior.
Add more examples for customizing tokenizers while preserving the
existing defaults.
Minor edits / clarifications.
2019-11-18 12:35:13 +01:00
Ines Montani
5adcb352e9
Adjust order of docs sections [ci skip]
2019-11-17 16:08:56 +01:00
Ines Montani
74b951fe61
Fix xpassing tests ( #4657 )
...
* Ignore internal warnings
* Un-xfail passing tests
* Skip instead of xfail
2019-11-16 20:20:53 +01:00
Ines Montani
3bd15055ce
Fix bug in Language.evaluate for components without .pipe ( #4662 )
2019-11-16 20:20:37 +01:00
adrianeboyd
bdfb696677
Fix conllu2json converter to output all sentences ( #4656 )
...
Make sure that the last batch of sentences is output if n_sents > 1.
2019-11-15 17:08:32 +01:00
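Note on bdfb696677: the commit body describes a common batching pitfall, where a loop emits a batch only once it reaches `n_sents` items and so drops the trailing partial batch. The sketch below illustrates that general pattern only. It is not the converter's actual code, and `batch_sentences` is a hypothetical name introduced for illustration.

```python
def batch_sentences(sentences, n_sents):
    """Group sentences into lists of at most n_sents items.

    Hypothetical helper for illustration; not part of spaCy.
    """
    batches = []
    current = []
    for sent in sentences:
        current.append(sent)
        if len(current) == n_sents:
            batches.append(current)
            current = []
    # Flush the remainder: without this step, the last partial batch
    # would never be output when len(sentences) % n_sents != 0.
    if current:
        batches.append(current)
    return batches
```

With five sentences and `n_sents=2`, the final flush is what preserves the fifth sentence as its own one-item batch.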
Ines Montani
d64cfce546
Remove unnecessary newline replace
2019-11-15 16:19:01 +01:00
Christoph Purschke
433748e867
Fix basic language support for Luxembourgish (by adding punctuation.py) ( #4648 )
...
* Update __init__.py
* Create punctuation.py
* Update tokenizer_exceptions.py
* Create questoph.md
* Update questoph.md
* Update test_text.py
* Update test_text.py
* Update test_text.py
* Update test_text.py
2019-11-15 16:16:47 +01:00
Ines Montani
e5b25a9cee
Update azure-pipelines.yml
2019-11-15 02:02:25 +01:00
Ines Montani
57af7c9d7f
Don't upgrade pip
2019-11-15 01:51:56 +01:00
Ines Montani
463a056e85
Merge branch 'master' of https://github.com/explosion/spaCy
2019-11-15 01:50:58 +01:00
Ines Montani
64f34d97b1
Use newer pip to try to fix wheel selection on 3.8 Windows
2019-11-15 01:50:55 +01:00
Ines Montani
e30d08410a
Add CI for Python 3.8 ( #4479 )
...
* Add 3.8 classifier
* Update azure-pipelines.yml
* Remove 3.8 warning from docs [ci skip]
2019-11-15 01:13:48 +01:00
Ines Montani
98b9d387c9
Auto-format [ci skip]
2019-11-15 00:33:44 +01:00
f11r
877971860e
Fix assert in sentencizer documentation. ( #4639 )
2019-11-13 15:24:14 +01:00
Ines Montani
9d5ff177c4
Work around Markdown rendering issue surfaced in #4600 [ci skip]
2019-11-11 17:12:08 +01:00
adrianeboyd
91f89f9693
Fix realloc in retokenizer.split() ( #4606 )
...
Always realloc to a size larger than `doc.max_length` in
`retokenizer.split()` (or cymem will throw errors).
2019-11-11 16:26:46 +01:00
adrianeboyd
f415e9b7d1
Set extensions when write_conllu() is called in UD train script ( #4618 )
...
* Set extensions when write_conllu() is called
`run_eval.py` uses the `write_conllu()` function from `ud_train.py` by
itself, so it needs to set the token extensions if necessary.
* Switch from try to if
2019-11-11 16:25:03 +01:00
adrianeboyd
0b9a5f4074
Rework Chinese language initialization and tokenization ( #4619 )
...
* Rework Chinese language initialization
* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix try_jieba_import()
* Fix zh code formatting
2019-11-11 14:23:21 +01:00
adrianeboyd
4d85f67eee
Minor updates to language example sentences ( #4608 )
...
* Add punctuation to Spanish example sentences
* Combine multilanguage examples for lang xx
* Add punctuation to nb examples
2019-11-07 22:34:58 +01:00
Priscilla de Abreu Lopes
39e79fcc86
Bugfix/dep matcher issue 4590 ( #4601 )
...
* add contributor agreement for prilopes
* add test for issue #4590
* fix on_match params for DependencyMatcher (#4590)
2019-11-07 12:01:06 +01:00
Ines Montani
09cec3e41b
Replace function registries with catalogue ( #4584 )
...
* Replace functions registries with catalogue
* Update __init__.py
* Fix test
* Revert unrelated flag [ci skip]
2019-11-07 11:45:22 +01:00
adrianeboyd
0f8678c0b1
Fix DocBin.merge() example ( #4599 )
2019-11-07 11:26:48 +01:00
walterhenry
5563c42ef5
Fixed typo: Added space between "recognize" and "various" ( #4600 )
2019-11-06 23:06:36 +01:00
Ines Montani
828ef27a32
Add warnings about 3.8 ( resolves #4593 ) [ci skip]
2019-11-05 18:30:11 +01:00
Ines Montani
fed53b1552
Update README.md
2019-11-05 18:26:47 +01:00
Ines Montani
83381018d3
Add load_from_docbin example [ci skip]
...
TODO: upload the file somewhere
2019-11-05 11:52:43 +01:00
Sofie Van Landeghem
4ec7623288
Fix conllu script ( #4579 )
...
* force extensions to avoid clash between example scripts
* fix arg order and default file encoding
* add example config for conllu script
* newline
* move extension definitions to main function
* few more encodings fixes
2019-11-04 20:31:26 +01:00
Matthew Honnibal
4e43c0ba93
Fix multiprocessing for as_tuples=True ( #4582 )
2019-11-04 20:29:03 +01:00
Ines Montani
4b95587ad4
Update universe.json [ci skip]
2019-11-04 13:55:55 +01:00
Yash Patadia
0c396aeed4
add dframcy to universe.json ( #4580 )
2019-11-04 13:53:23 +01:00
Ines Montani
3ec231f7e1
Reorganise install_requires
2019-11-04 02:39:28 +01:00
Ines Montani
cf4ec88b38
Use latest wasabi
2019-11-04 02:38:45 +01:00
Ines Montani
d82630d7c1
Revert "Update azure-pipelines.yml"
...
This reverts commit ed1060cf59.
2019-11-03 17:48:54 +01:00
Ines Montani
ed1060cf59
Update azure-pipelines.yml
2019-11-03 17:48:26 +01:00
Ines Montani
6ec119d976
Add error in debug-data if no dev docs are available (see #4575 )
2019-11-02 16:08:11 +01:00
adrianeboyd
56ad3a3988
Add LAS per dependency to Scorer ( #4560 )
2019-10-31 21:18:16 +01:00
Matthew Honnibal
de98d66f87
Set version to v2.2.2
2019-10-31 15:53:31 +01:00
Matthew Honnibal
55f2241d72
Merge branch 'master' of https://github.com/explosion/spaCy
2019-10-31 15:37:52 +01:00
Ines Montani
df4c9ae3dc
Fix formatting [ci skip]
2019-10-31 15:10:25 +01:00
Ines Montani
59358d9b71
Remove box-decoration-break from entities in displacy ( #4564 )
2019-10-31 15:09:43 +01:00
Matthew Honnibal
8b9954d1b7
Set version to v2.2.2.dev5
2019-10-31 15:06:19 +01:00
Ines Montani
2c107f02a4
Auto-format [ci skip]
2019-10-31 15:01:56 +01:00
Matthew Honnibal
e82306937e
Put Tok2Vec refactor behind feature flag ( #4563 )
...
* Add back pre-2.2.2 tok2vec
* Add simple tok2vec tests
* Add simple tok2vec tests
* Reformat
* Fix CharacterEmbed in new tok2vec
* Fix legacy tok2vec
* Resolve circular imports
* Fix test for Python 2
2019-10-31 15:01:15 +01:00
Ines Montani
828108a57f
Update README.md [ci skip]
2019-10-31 13:23:25 +01:00
Ines Montani
5e9849b60f
Auto-format [ci skip]
2019-10-30 19:27:18 +01:00
Ines Montani
afe4a428f7
Fix pipeline analysis on remove pipe ( #4557 )
...
Validate *after* component is removed, not before
2019-10-30 19:04:17 +01:00