Ines Montani
864a697e63
Merge branch 'develop' into master-tmp
2020-09-04 13:15:36 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example ( #5989 )
...
* Update suffixes example
The current example will throw `TypeError: can only concatenate list (not "tuple") to list`
* Signing Contributor Agreement
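To illustrate the error quoted above: Python refuses to concatenate a list and a tuple with `+`. The sketch below reproduces the message with hypothetical stand-in values (`default_suffixes` and `extra_suffixes` are illustrative names, not taken from the updated docs example) and shows one way to avoid it, assuming the defaults are a list.

```python
# Hypothetical stand-ins: a list of default suffix patterns and a
# user-supplied tuple of extra patterns (names are illustrative only).
default_suffixes = [r"\.", r","]
extra_suffixes = (r"-+$",)

error_message = None
try:
    # list + tuple is not supported and raises TypeError
    combined = default_suffixes + extra_suffixes
except TypeError as err:
    error_message = str(err)

# Converting one operand so both share a type avoids the error.
combined = default_suffixes + list(extra_suffixes)
print(error_message)
print(combined)
```

If the defaults were a tuple instead, the conversion would go the other way (`tuple(...)`); the point is only that both operands need the same sequence type.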
2020-08-31 12:44:56 +02:00
Shashank
450720aca2
Added support for Sanskrit language ( #5956 )
...
* Added support for Sanskrit language
* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
idoshr
b10c7bc56e
Hebrew like num ( #5952 )
...
* Update stop_words.py
Hebrew STOP WORDS
* Update stop_words.py
* contributor
* contributor
* add some common domain extensions
support human number 1K/1M....
* support human number 1K/1M....
* hebrew number tokenize
1K/1M implement in EN
* test human tokenize fix
* test
* heb like num
revert human number change
* heb like num
2020-08-24 14:30:05 +02:00
Attila Szász
669dc70822
Create tilusnet.md ( #5914 )
2020-08-12 22:46:08 +02:00
Adam Bittlingmayer
7b33b2854f
Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct ( #5910 )
...
* Add Armenian sentence-final verchaket
* Add Greek and Arabic question marks, and contributor agreement
* Check box
2020-08-12 15:36:14 +02:00
graue70
49e690bde1
Fix typos in comments ( #5904 )
...
* Fix typo in comment
* Fix typo
* Add spaCy Contributor Agreement
2020-08-12 15:35:25 +02:00
holubvl3
d16c0f2c3a
Create holubvl3 ( #5845 )
...
* Create holubvl3
* Rename holubvl3 to holubvl3.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-07-30 17:40:31 +02:00
Gustavo Zadrozny Leyendecker
90b958fd01
Fix on EntityRenderer to support line breaks (after last entity) ( closes #5838 )
2020-07-29 18:48:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file ( #5810 )
...
* fix the wrong hash url in adding-languages.md file
change the #101 url hash path to #language-data
* filled in the spaCy Contributor Agreement
filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Joshua Olson
6d4d5c074c
Mark Japanese documents as tagged. ( #5803 )
...
Mark the document as tagged before returning it to the user from the JapaneseTokenizer.
Fixes #5802
2020-07-23 08:57:01 +02:00
Ines Montani
644074b954
Merge branch 'develop' into master-tmp
2020-07-20 14:58:04 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe ( #5777 )
...
* Update universe.json
Add cov-bsv to "resources"
* Update universe.json
* add contributor agreement
2020-07-19 13:35:31 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json ( #5717 )
...
* Adding spaczz package to universe.json
* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json ( #5716 )
...
* Add texthero to universe.json
* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab ( #5701 )
...
* speed up Korean nlp 100x by stopping mecab from reloading on each doc
* add contributor agreement
* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
Sebastián Ramírez
b985cc4025
📄 Add spaCy Contributor Agreement
2020-07-01 20:57:21 +02:00
Ines Montani
414dc7ace1
Merge branch 'spacy.io' into spacy.io-develop
2020-07-01 11:47:47 +02:00
Matthias Hertel
305221f3e5
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:23 +02:00
PluieElectrique
90c7eb0e2f
Reduce memory usage of Lookup's BloomFilter ( #5606 )
...
* Reduce memory usage of Lookup's BloomFilter
* Remove extra Table update
2020-06-26 14:09:10 +02:00
Richard Liaw
0ef78bad93
contribute ( #5632 )
2020-06-23 08:53:58 +02:00
Rameshh
c34420794a
Add Nepali Language ( #5622 )
...
* added support for nepali lang
* added examples and test files
* added spacy contributor agreement
2020-06-22 10:25:46 +02:00
Karen Hambardzumyan
ff6a084e9c
Create mahnerak.md ( #5615 )
2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan
ccd7edf04b
Create myavrum.md ( #5612 )
2020-06-19 18:34:27 +02:00
Arvind Srinivasan
aa5b40fa64
Added Tamil Example Sentences ( #5583 )
...
* Added Examples for Tamil Sentences
#### Description
This PR adds example sentences for the Tamil language, which were missing as per issue #1107
#### Type of Change
This is an enhancement.
* Accepting spaCy Contributor Agreement
* Signed on my behalf as an individual
2020-06-13 15:56:26 +02:00
theudas
fa46e0bef2
Added Parameter to NEL to take n sentences into account ( #5548 )
...
* added setting for neighbour sentence in NEL
* added spaCy contributor agreement
* added multi sentence also for training
* made the try-except block smaller
2020-06-12 02:03:23 +02:00
Sofie Van Landeghem
18c6dc8093
removing label both on comment and on close
2020-06-11 14:09:40 +02:00
Jones Martins
28db7dd5d9
Add missing pronouns/determiners ( #5569 )
...
* Add missing pronouns/determiners
* Add test for missing pronouns
* Add contributor file
2020-06-10 18:47:04 +02:00
Sofie Van Landeghem
12c1965070
set delay to 7 days
2020-06-10 10:46:12 +02:00
Sofie Van Landeghem
86112d2168
update issue manager's version
2020-06-09 08:57:38 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder ( #5534 )
...
* adding spacy-universal-sentence-encoder
* update affiliation
* updated code example
2020-06-08 20:26:30 +02:00
Sofie Van Landeghem
d1799da200
bot for answered issues ( #5563 )
...
* add tiangolo's issue manager
* fix formatting
* spaces, tabs, who knows
* formatting
* I'll get this right at some point
* maybe one more space ?
2020-06-08 19:47:32 +02:00
Hiroshi Matsuda
456bf47f51
fix a bug causing mis-alignments ( #5560 )
2020-06-08 15:49:34 +02:00
Leo
7d5a89661e
contributor agreement signed ( #5525 )
2020-05-31 20:13:39 +02:00
Rajat
8b8efa1b42
update spacy universe with my project ( #5497 )
...
* added contextualSpellCheck in spacy universe meta
* removed extra formatting by code
* updated with permanent links
* run json linter used by spacy
* filled SCA
* updated the description
2020-05-25 11:30:23 +02:00
Jannis
aa53ce6996
Documentation Typo Fix ( #5492 )
...
* Fix typo
Change 'realize' to 'realise'
* Add contributor agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
...
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Kevin Lu
291b9ad7b9
Update CONTRIBUTOR_AGREEMENT.md
2020-05-19 20:29:53 -07:00
Kevin Lu
9a1a535215
Create kevinlu1248.md
2020-05-19 20:25:45 -07:00
Kevin Lu
a23b3a5a50
Update CONTRIBUTOR_AGREEMENT.md
2020-05-19 20:24:24 -07:00
Ines Montani
a41e28ceba
Merge pull request #5436 from ilivans/fix_errors_with_codes
2020-05-18 10:45:56 +02:00
Ilkyu Ju
72a25c9cef
Very minor issues in Korean example sentences ( #5446 )
...
* Add contributor agreement
* Improve ko translation of example sentences
I fixed unnatural translations and word spacing errors.
* Update osori.md
2020-05-17 13:43:34 +02:00
Ilia Ivanov
ee8fe37474
Add ilivans' contributor agreement
2020-05-14 15:59:06 +02:00
Vishnu Priya VR
9ce059dd06
Limiting noun_chunks for specific languages ( #5396 )
...
* Limiting noun_chunks for specific languages
* Limiting noun_chunks for specific languages
Contributor Agreement
* Addressing review comments
* Removed unused fixtures and imports
* Add fa_tokenizer in test suite
* Use fa_tokenizer in test
* Undo extraneous reformatting
Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>
2020-05-14 12:58:06 +02:00
Travis Hoppe
d4cc18b746
Added author information for NLPre ( #5414 )
...
* Add author links for NLPre and update category
* Add contributor statement
2020-05-08 11:28:54 +02:00
Samuel Rodríguez Medina
8602daba85
Swedish like_num ( #5371 )
...
* Sign contributor agreement.
* Add like_num functionality to Swedish.
* Update spacy/tests/lang/sv/test_lex_attrs.py
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update contributor agreement
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-29 21:25:22 +02:00
adrianeboyd
a6e521cd79
Add is_sent_end token property ( #5375 )
...
Reconstruction of the original PR #4697 by @MiniLau.
Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Louis Guitton
a27c4014f5
Add mlflow to spaCy universe ( #5352 )
...
* Add mlflow to universe
* Use mlflow black logo
2020-04-29 10:18:03 +02:00
Michael
5b5528ff2e
Add `!=3.4.*` to python_requires ( #5344 )
...
Missed in 80d554f2e2
2020-04-27 22:02:09 +02:00
Punitvara
b2b7e1f37a
This PR adds Gujarati Language class along with ( #5355 )
...
* This PR adds Gujarati Language class along with
- stop words
* Add test for gu tokenizer
2020-04-27 11:07:37 +02:00
sabiqueqb
fc91660aa2
Gh 5339 language class for malayalam ( #5342 )
...
* Initialize Malayalam Language class
* Add lex_attrs and examples for Malayalam
* Add spaCy Contributor Agreement
* Add test for ml tokenizer
2020-04-27 09:45:08 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` ( #5325 )
...
* The embedding vis. link is broken
The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?
* contributor agreement
* Update Mlawrence95.md
* Update website/docs/usage/examples.md
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
laszabine
fb73d4943a
Amend documentation to Language.evaluate ( #5319 )
...
* Specified usage of arguments to Language.evaluate
* Created contributor agreement
2020-04-16 20:00:18 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 ( #5315 )
...
* Fix 5314
* Add contributor
* Resolve requested changes
Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Sébastien Harinck
dac70f29eb
contrib: add contributor agreement for user sebastienharinck ( #5316 )
2020-04-16 11:32:09 +02:00
Paolo Arduin
1ca32d8f9c
Matcher support for Span as well as Doc ( #5113 )
...
* Matcher support for Span, as well as Doc #5056
* Removes an unused import
* Signed contributors agreement
* Code optimization and better test
* Add error message for bad Matcher call argument
* Fix merging
2020-04-15 13:51:33 +02:00
Thomas Thiebaud
1eef60c658
Add spacy_fastlang to universe ( #5271 )
...
* Add spacy_fastlang to universe
* Sign SCA
2020-04-15 13:50:46 +02:00
Paolo Arduin
8ce408d2e1
Comparison predicate handling for `!=` ( #5282 )
...
* Fix #5281
* Optim test
2020-04-14 19:14:15 +02:00
Marek Grzenkowicz
6a8a52650f
[ Closes #5292 ] Fix typo in option name "--n-save_every" ( #5293 )
...
* Sign contributor agreement for chopeen
* Fix typo in option name and close #5292
2020-04-11 23:35:01 +02:00
Umar Butler
8952effcc4
Fixed Typo in Warning ( #5284 )
...
* Fixed typo in cli warning
Fixed a typo in the warning for the provision of exactly two labels, which have not been designated as binary, to textcat.
* Created and signed contributor form
2020-04-09 15:46:15 +02:00
Leander Fiedler
b63871ceff
issue5230: added contributors agreement
2020-04-06 21:04:06 +02:00
vincent d warmerdam
f329d5663a
add "whatlies" to spaCy universe ( #5252 )
...
* Add "whatlies"
We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)
* sign contributor thing
* Added fancy gif as the image
* Update universe.json
Spelling error and spaCy clarification.
2020-04-06 11:29:30 +02:00
YohannesDatasci
beef184e53
Armenian language support ( #5246 )
...
* add Armenian language and test cases
* agreement submission
2020-04-03 13:02:18 +02:00
Michael Leichtfried
2b14997b68
Remove duplicated branch in if/else-if statement ( #5234 )
...
* Remove duplicated branch in if-elif-statement
* Add contributor agreement for leicmi
2020-04-02 14:47:42 +02:00
Jacob Lauritzen
0b76212831
Extend and fix Danish examples ( #5227 )
...
* Extend and fix Danish examples
This PR fixes two examples, adds additional examples translated from the english version, and adds punctuation.
The two changed examples are:
* "fortov" changed to "fortovet", which is more [used](https://www.google.com/search?client=firefox-b-d&sxsrf=ALeKk0143gEuPe4IbIUpzBBt-oU10OMVqA%3A1585549036477&ei=7I6BXuvJHMGOrwSqi46oCQ&q=l%C3%B8behjul+p%C3%A5+fortov&oq=l%C3%B8behjul+p%C3%A5+fortov&gs_lcp=CgZwc3ktYWIQAzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQR1DT8xZY0_MWYK_0FmgAcAZ4AIABAIgBAJIBAJgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwjr7964xsHoAhVBx4sKHaqFA5UQ4dUDCAo&uact=5 ) and more natural. The Swedish and Norwegian examples also use this version of the word.
* "stor by" changed to "storby". In Danish we have a specific noun to describe a large, metropolitan city, which is different from just describing a city as "large". In this sentence it would be much more natural to describe London as a "storby". Google even corrects a search for "London stor by" to "London storby".
* Sign contrib agreement
2020-04-02 10:42:35 +02:00
Nikhil Saldanha
4f27a24f5b
Add kannada examples ( #5162 )
...
* Add example sentences for Kannada
* sign contributor agreement
2020-03-29 13:54:42 +02:00
Tom Milligan
e904958115
Limit to cupy-cuda v8, so as not to pull in v9 automatically. ( #5194 )
2020-03-29 13:52:08 +02:00
Tiljander
e53232533b
Describing priority rules for overlapping matches ( #5197 )
...
* Describing priority rules for overlapping matches
* Create Tiljander.md
* Describing priority rules for overlapping matches
* Update website/docs/api/entityruler.md
Co-Authored-By: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
Ines Montani
3fc2309c48
Merge pull request #5174 from Baciccin/master
...
Add Ligurian language
2020-03-24 16:33:59 +01:00
Philip Gillißen
128acb9ee1
Update guerda.md
2020-03-24 10:42:30 +01:00
Philip Gillißen
5d067bcc5e
Add SCA for guerda
2020-03-24 10:42:10 +01:00
Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
...
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Alan Chan
1ae01684cf
Fill in contributor agreement
2020-03-15 03:45:20 +08:00
nihil
9cde7eb08c
add spacy_syllables to universe + sign contributor agreement
2020-03-13 18:09:42 +01:00
Himanshu Garg
27d1300bdb
Create merrcury.md
2020-03-10 15:11:07 +05:30
Mark Abraham
0345135167
Tokenizer to_disk and from_disk now ensure paths ( #5116 )
...
* Tokenizer to_disk and from_disk now ensure strings are converted to paths
Fixes #5115
* Sign contributor agreement
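As a rough illustration of the change described above, a string argument can be normalized to a `pathlib.Path` before any path methods are used on it. The helper name `ensure_path` and its behavior here are assumptions made for this sketch, not the code from this commit.

```python
from pathlib import Path


def ensure_path(path):
    """Return a Path for str input; pass any other object through unchanged.

    Hypothetical helper for illustration only.
    """
    if isinstance(path, str):
        return Path(path)
    return path


# Both call styles end up with an object that supports Path operations.
from_string = ensure_path("models/tokenizer")
from_path = ensure_path(Path("models/tokenizer"))
print(from_string == from_path)
```

With a guard like this at the top of a save or load function, callers can pass either a plain string or a `Path` without the function failing on string-only input.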
2020-03-08 13:25:56 +01:00
David Pollack
80004930ed
fix typo in svg file
2020-03-05 17:04:33 +01:00
Tom Keefe
ddf63b97a8
make idx available via to_array ( #5030 )
2020-02-22 14:13:06 +01:00
Jan Jessewitsch
c7e4fe9c5c
Fix/Improve german stop words ( #5024 )
...
* Fix german stop words
Two stop words ("einige" and "einigen") are sticking together.
Remove three nouns that may serve as stop words in a specific context (e.g. religious or news) but are not applicable for general use.
* Create Jan-711.md
2020-02-17 18:59:22 +01:00
Filip Bednárik
d4f4060bf3
Add Slovak language tools implementation ( #4943 )
...
* Add correct stopwords for Slovak language
* Add SNK Tags
* Disable formatting lint for TAGS
* Add example sentences for Slovak language
* Add slovak numerals in base form
* Add lex_attrs to sk init
* Add contributor agreement
2020-02-03 13:03:59 +01:00
Tyler Couto
9fa9d7f2cb
Fix for Issue 4665 - conllu2json ( #4953 )
...
* Fix for Issue 4665 - conllu2json
- Allowing HEAD to be an underscore
* Added contributor agreement
2020-02-03 13:01:48 +01:00
Paco Nathan
49fefb6139
Submitting PyTextRank for inclusion in the spaCy uniVerse ( #4942 )
...
* submitting PyTextRank for consideration of including in the spaCy uniVerse
* including SCA
2020-01-28 11:37:54 +01:00
Anastasiia Iurshina
1830a12578
Fixes typos ( #4843 )
...
* Fixes typos
* Fixes typo
* Contributor agreement
2019-12-29 14:24:13 +01:00
Ivan Echevarria
ef13e0c038
Add n_process to Language.pipe documentation ( #4842 ) [ci skip]
...
* Add n_process to documentation
* Auto-format and add default [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-29 14:23:33 +01:00
Al Johri
fd4a7bd2b7
sign contributor agreement for AlJohri ( #4839 ) [ci skip]
2019-12-29 14:17:28 +01:00
Olamilekan Wahab
a741de7cf6
Adding support for Yoruba Language ( #4614 )
...
* Adding Support for Yoruba
* test text
* Updated test string.
* Fixing encoding declaration.
* Adding encoding to stop_words.py
* Added contributor agreement and removed iranlowo.
* Added removed test files and removed iranlowo to keep project bare.
* Returned CONTRIBUTING.md to default state.
* Added deleted conftest entries
* Tidy up and auto-format
* Revert CONTRIBUTING.md
Co-authored-by: Ines Montani <ines@ines.io>
2019-12-21 14:11:50 +01:00
Nicolai Bjerre Pedersen
de5453cdcb
Fix link to user hooks in docs ( #4778 )
...
* Fix link to user hooks in docs
* Update mr_bjerre.md
Mistake in contributor agreement
* Apparently hard to get it right (wrong name of sca)
2019-12-06 19:17:12 +01:00
Antti Ajanki
e626a011cc
Improvements to the Finnish language data ( #4738 )
...
* Enable lex_attrs on Finnish
* Copy the Danish tokenizer rules to Finnish
Specifically, don't break hyphenated compound words
* Contributor agreement
* A new file for Finnish tokenizer rules instead of including the Danish ones
2019-12-03 12:55:28 +01:00
Matt Maybeno
c9f1e99787
Agnostic vocab array fix ( #4680 )
...
* Use get_array_module instead of numpy
* add contributor agreement
2019-11-23 14:59:52 +01:00
GuiGel
8f7ab70870
Bugfix/fix entity ruler from disk ( #4670 )
...
* fix EntityRuler from_disk bug
* add contributor file
* Test EntityRuler PhraseMatcher deserialization (#4651 )
* newline at end of file
* fix copy paste error
* serializing the EntityRuler by itself
* Add unicode declarations for Python 2 and auto-format
2019-11-21 16:26:37 +01:00
Elijah Rippeth
5ad5c4b44a
Add initial Korean support ( #4660 )
...
* add hangul and jamo char classes.
* add initial Korean lexical attributes.
* add contributor agreement
2019-11-18 12:56:07 +01:00
Christoph Purschke
433748e867
Fix basic language support for Luxembourgish (by adding punctuation.py) ( #4648 )
...
* Update __init__.py
* Create punctuation.py
* Update tokenizer_exceptions.py
* Create questoph.md
* Update questoph.md
* Update test_text.py
* Update test_text.py
* Update test_text.py
* Update test_text.py
2019-11-15 16:16:47 +01:00
Priscilla de Abreu Lopes
39e79fcc86
Bugfix/dep matcher issue 4590 ( #4601 )
...
* add contributor agreement for prilopes
* add test for issue #4590
* fix on_match params for DependencyMatcher (#4590 )
2019-11-07 12:01:06 +01:00
Neel Kamath
6c036ab57d
Add "spaCy Server" to spaCy Universe ( #4553 )
...
* Add "spaCy Server" to spaCy Universe
* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Ines Montani
1185702993
Port over contributor agreement from spacy-lookups-data [ci skip]
2019-10-25 13:06:10 +02:00
Zhuoru Lin
10d88b09bb
Bugfix/fix wikidata train entity linker ( #4509 )
...
* Fix labels_discard Nonetype iteration error
* Contributor agreement for Zhuoru Lin
* Enhance EntityLinker.predict() to handle labels_discard is None case.
2019-10-24 12:52:59 +02:00
gustavengstrom
050e2445a8
Adding noun_chunks to the Swedish language model (sv) ( #4422 )
...
* Create syntax_iterators.py
Replica of spacy/lang/fr/syntax_iterators.py
* Added import statements for SYNTAX_ITERATORS
* Create gustavengstrom.md
* Added "dobj" to list of labels in noun_chunks method and a test_noun_chunks method to the Swedish language model.
* Delete README-checkpoint.md
Co-authored-by: Gustav <gustav@davcon.se>
Co-authored-by: Ines Montani <ines@ines.io>
2019-10-21 12:57:06 +02:00
Pepe Berba
7772d5d3c5
Update `vocab.get_vector` docs to include features on Fasttext ngram ( #4464 )
...
* Update `vocab.get_vector`
* Added contrib agreement
2019-10-20 01:28:18 +02:00