Adriane Boyd
167df42cb6
Move lemmatizer is_base_form to language settings (#5663)
Move `Lemmatizer.is_base_form` to the language settings so that each
language can provide a language-specific method as
`LanguageDefaults.is_base_form`.
The existing English-specific `Lemmatizer.is_base_form` is moved to
`EnglishDefaults`.
2020-06-29 14:16:57 +02:00
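A minimal plain-Python sketch of the pattern described in the commit above. The lemmatizer no longer hard-codes an English rule. Instead, it calls whatever `is_base_form` callable the language defaults supply. The class names mirror the commit message, but the constructor signature, the `suffix_rules` mapping and the toy English rule are hypothetical stand-ins, not spaCy's actual code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hook type: (univ_pos, morphology) -> is the form already a lemma?
BaseFormCheck = Callable[[str, Dict[str, str]], bool]


class Lemmatizer:
    """Simplified rule-based lemmatizer that defers the base-form check."""

    def __init__(self, suffix_rules: Dict[str, List[List[str]]],
                 is_base_form: Optional[BaseFormCheck] = None):
        self.suffix_rules = suffix_rules
        self.is_base_form = is_base_form  # None: no language-specific shortcut

    def __call__(self, string: str, univ_pos: str,
                 morphology: Optional[Dict[str, str]] = None) -> List[str]:
        morphology = morphology or {}
        # Ask the language whether this form is already a lemma.
        if self.is_base_form is not None and self.is_base_form(univ_pos, morphology):
            return [string.lower()]
        forms = [string[: len(string) - len(old)] + new
                 for old, new in self.suffix_rules.get(univ_pos, [])
                 if old and string.endswith(old)]
        return forms or [string.lower()]


class LanguageDefaults:
    is_base_form: Optional[BaseFormCheck] = None  # default: no check

    @classmethod
    def create_lemmatizer(cls, suffix_rules):
        # Hand whatever check the (sub)class defines to the lemmatizer.
        return Lemmatizer(suffix_rules, is_base_form=cls.is_base_form)


def english_is_base_form(univ_pos: str, morphology: Dict[str, str]) -> bool:
    # Toy English rule: infinitive verbs and singular nouns are base forms.
    if univ_pos == "VERB" and morphology.get("VerbForm") == "Inf":
        return True
    return univ_pos == "NOUN" and morphology.get("Number") == "Sing"


class EnglishDefaults(LanguageDefaults):
    is_base_form = staticmethod(english_is_base_form)
```

With `rules = {"NOUN": [["s", ""]]}`, the English lemmatizer keeps `"news"` (tagged `Number=Sing`) unchanged, while the generic one strips the suffix to `"new"`. Only the language defaults differ; the lemmatizer class is shared.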
Ines Montani
c685ee734a
Fix compat for v2.x branch
2020-05-22 14:22:36 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
svlandeg
36a94c409a
failing test to reproduce overlapping spans problem
2020-05-20 23:06:03 +02:00
adrianeboyd
40e65d6f63
Fix most_similar for vectors with unused rows (#5348)
* Fix most_similar for vectors with unused rows
Address issues related to the unused rows in the vector table and
`most_similar`:
* Update `most_similar()` to search only through rows that are in use
according to `key2row`.
* Raise an error when `most_similar(n=n)` is larger than the number of
vectors in the table.
* Set and restore `_unset` correctly when vectors are added or
deserialized so that new vectors are added in the correct row.
* Set data and keys to the same length in `Vocab.prune_vectors()` to
avoid spurious entries in `key2row`.
* Fix regression test using `most_similar`
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-19 16:41:26 +02:00
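A stdlib-only sketch of the `most_similar` behaviour listed in the commit above. Only rows referenced by `key2row` are searched, and asking for more neighbours than there are rows in use raises an error. The function signature, the list-of-lists table and the string keys are assumptions for illustration. spaCy's `Vectors` class works on arrays and hash keys.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], data: List[List[float]],
                 key2row: Dict[str, int], n: int = 1) -> List[Tuple[str, float]]:
    # Only rows referenced by key2row are in use; the rest of the table is spare.
    filled = sorted(set(key2row.values()))
    if n > len(filled):
        raise ValueError(
            f"Cannot return {n} most similar vectors: only {len(filled)} rows are in use"
        )
    # Several keys may share a row; report the first key seen for each row.
    row2key: Dict[int, str] = {}
    for key, row in key2row.items():
        row2key.setdefault(row, key)
    scored = [(row2key[row], cosine(query, data[row])) for row in filled]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

With `data = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]` and `key2row = {"cat": 0, "dog": 1}`, the query `[1.0, 0.0]` returns `"dog"`. The unused third row would have been a perfect match, but it is never considered.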
Sofie Van Landeghem
cfdaf99b80
Fix passing of component configuration (#5374)
* add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument
* add fix and test for Issue 5137
2020-04-29 12:56:17 +02:00
Sofie Van Landeghem
f67343295d
Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction
* small fixes to train EL example script
* add KB creation and NEL training example scripts to example section
* update descriptions of example scripts in the documentation
* moving wiki_entity_linking folder from bin to projects
* remove test for wiki NEL functionality that is being moved
2020-04-29 12:53:53 +02:00
adrianeboyd
f8ac5b9f56
bugfix in span similarity (#5155) (#5358)
* bugfix in span similarity
* also rewrite doc.pyx for clarity
* formatting
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-27 16:51:27 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 (#5315)
* Fix 5314
* Add contributor
* Resolve requested changes
Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Leander Fiedler
d60e2d3ebf
issue5230 added unit test for dumping and loading knowledgebase
2020-04-12 09:08:41 +02:00
Leander Fiedler
d2bb649227
issue5230 filter warnings in addition to filterwarnings to prevent deprecation warnings in python35(win) setup to pop up
2020-04-10 23:21:13 +02:00
Leander Fiedler
ca2a7a44db
issue5230 store string values of warnings to remotely debug failing python35(win) setup
2020-04-10 22:26:55 +02:00
Leander Fiedler
88ca40a15d
issue5230 raise warnings as errors to remotely debug failing python35(win) setup
2020-04-10 21:45:53 +02:00
Leander Fiedler
a7bdfe42e1
issue5230 added print statement to warnings filter to remotely debug failing python35(win) setup
2020-04-10 21:14:33 +02:00
Leander Fiedler
8c1d0d628f
issue5230 writer now checks instance of loc parameter before trying to operate on it
2020-04-10 20:35:52 +02:00
lfiedler
e1e25c7e30
issue5230: added unittest test case for completion
2020-04-06 21:36:02 +02:00
Leander Fiedler
cde96f6c64
issue5230: optimized unit test a bit
2020-04-06 20:51:12 +02:00
Leander Fiedler
71cc903d65
issue5230: replaced open statements on path objects so that serialization still works and files are closed
2020-04-06 20:30:41 +02:00
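A minimal sketch of the two fixes in the issue5230 commits above: the writer checks the type of its `loc` argument before calling `Path` methods on it, and files are opened in `with` blocks so they are closed deterministically rather than at garbage collection, which is what triggers `ResourceWarning` under unittest. The helper names `write_json` and `read_json` are hypothetical and not spaCy functions.

```python
import json
from pathlib import Path
from typing import Any, Union


def write_json(loc: Union[str, Path], data: Any) -> Path:
    # Check the type of `loc` before using Path methods on it.
    if isinstance(loc, str):
        loc = Path(loc)
    # The with-block closes the handle deterministically, instead of leaving
    # it to garbage collection as `json.dump(data, loc.open("w"))` would.
    with loc.open("w", encoding="utf8") as file_:
        json.dump(data, file_)
    return loc


def read_json(loc: Union[str, Path]) -> Any:
    with Path(loc).open("r", encoding="utf8") as file_:
        return json.load(file_)
```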
Leander Fiedler
273ed452bb
issue5230: added unicode declaration at top of the file
2020-04-06 19:22:32 +02:00
Leander Fiedler
1cd975d4a5
issue5230: fixed resource warnings in language
2020-04-06 18:54:32 +02:00
Leander Fiedler
493c77462a
issue5230: test cases
covering known sources of resource warnings
2020-04-06 18:46:51 +02:00
Ines Montani
828acffc12
Tidy up and auto-format
2020-03-25 12:28:12 +01:00
Sofie Van Landeghem
1a2b8fc264
set vector of merged entity (#5085)
* merge_entities sets the vector in the vocab for the merged token
* add unit test
* import unicode_literals
* move code to _merge function
* only set vector if vocab has non-zero vectors
2020-03-06 14:45:28 +01:00
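A dict-based sketch of the idea in the commit above. When an entity span is merged into one token, the merged text gets a vector of its own, but only if the vector table is non-empty. Taking the mean of the token vectors (out-of-vocabulary tokens counted as zeros) follows how a span vector is averaged by default. The function name, the dict "vocab" and joining tokens with single spaces are assumptions for illustration.

```python
from typing import Dict, List

Vectors = Dict[str, List[float]]


def merge_entity(tokens: List[str], start: int, end: int, vectors: Vectors) -> List[str]:
    """Merge tokens[start:end] into one token and register a vector for it."""
    span = tokens[start:end]
    merged = " ".join(span)  # assumption: tokens are space-separated
    # Only set a vector if the table actually holds vectors.
    if vectors:
        width = len(next(iter(vectors.values())))
        zeros = [0.0] * width
        rows = [vectors.get(tok, zeros) for tok in span]
        # Mean of the token vectors, like a span vector; OOV tokens count as zeros.
        vectors[merged] = [sum(col) / len(rows) for col in zip(*rows)]
    return tokens[:start] + [merged] + tokens[end:]
```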
Sofie Van Landeghem
d307e9ca58
take care of global vectors in multiprocessing (#5081)
* restore load_nlp.VECTORS in the child process
* add unit test
* fix test
* remove unnecessary import
* add utf8 encoding
* import unicode_literals
2020-03-03 13:58:22 +01:00
Sofie Van Landeghem
c6b12ab02a
Bugfix/get doc (#5049)
* new (broken) unit test
* fixing get_doc method
2020-03-02 11:49:28 +01:00
svlandeg
b49a3afd0c
use clean_underscore fixture
2020-02-23 15:49:20 +01:00
svlandeg
6e717c62ed
avoid the tests interacting with each other through the global Underscore variable
2020-02-12 13:21:31 +01:00
svlandeg
7939c63886
use English instead of model
2020-02-12 12:26:27 +01:00
svlandeg
46628d8890
add some asserts
2020-02-12 12:12:52 +01:00
svlandeg
51d37033c8
remove old comment
2020-02-12 12:10:05 +01:00
svlandeg
05dedaa2cf
add unit test
2020-02-12 12:00:13 +01:00
Tyler Couto
9fa9d7f2cb
Fix for Issue 4665 - conllu2json (#4953)
* Fix for Issue 4665 - conllu2json
- Allowing HEAD to be an underscore
* Added contributor agreement
2020-02-03 13:01:48 +01:00
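A small sketch of tolerating an underscore in the CoNLL-U HEAD column, as in the conllu2json fix above. The column layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) is standard CoNLL-U. Treating both a root (`0`) and an unannotated (`_`) head as pointing to the token itself is an assumption about the converter's convention, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_conllu_sentence(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Return the words and 0-based head indices of one CoNLL-U sentence."""
    words: List[str] = []
    heads: List[int] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        id_, form, head = cols[0], cols[1], cols[6]
        if "-" in id_ or "." in id_:
            continue  # skip multiword-token ranges and empty nodes
        i = int(id_) - 1
        words.append(form)
        # A root ("0") or an unannotated head ("_") points to the token itself,
        # so an underscore no longer reaches int() and crashes the conversion.
        heads.append(int(head) - 1 if head not in ("0", "_") else i)
    return words, heads
```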
Yohei Tamura
708a4d27eb
fix nlp.evaluate (#4924) (#4925)
* new file: test_issue4924.py
* modified: spacy/gold.pyx
* modified: test_issue4924.py for python2
2020-01-20 12:17:46 +01:00
Sofie Van Landeghem
a1b22e90cd
serialize ENT_ID (#4852)
* expand serialization test for custom token attribute
* add failing test for issue 4849
* define ENT_ID as attr and use in doc serialization
* fix few typos
2020-01-06 14:57:34 +01:00
Ines Montani
3431ac42de
Fix typo
2019-12-21 21:17:45 +01:00
Ines Montani
7c69d30de5
Tidy up and expect warning
2019-12-21 21:14:52 +01:00
Ines Montani
cb4145adc7
Tidy up and auto-format
2019-12-21 19:04:17 +01:00
Sofie Van Landeghem
f9b541f9ef
More robust set entities method in KB (#4794)
* add unit test for setting entities with duplicate identifiers
* count the number of actual unique identifiers and throw duplicate warning
2019-12-13 10:45:29 +01:00
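A minimal sketch of the duplicate handling described in the KB commit above. The number of actual unique identifiers is counted, and a warning is raised when it is lower than the number of entries passed in. `TinyKB` is a hypothetical stand-in. The argument names are modelled on the knowledge base's `set_entities`, and keeping the first occurrence of each duplicate is an illustrative choice, not necessarily what spaCy does.

```python
import warnings
from typing import Dict, List, Sequence, Tuple


class TinyKB:
    """Hypothetical stand-in for a knowledge base keyed by entity ID."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[float, List[float]]] = {}

    def set_entities(self, entity_list: Sequence[str], freq_list: Sequence[float],
                     vector_list: Sequence[List[float]]) -> int:
        if not len(entity_list) == len(freq_list) == len(vector_list):
            raise ValueError("entity_list, freq_list and vector_list differ in length")
        # Count unique identifiers rather than trusting len(entity_list).
        n_unique = len(set(entity_list))
        if n_unique < len(entity_list):
            warnings.warn(
                f"{len(entity_list) - n_unique} duplicate entity IDs; keeping the first of each"
            )
        self.entries = {}
        for ent_id, freq, vector in zip(entity_list, freq_list, vector_list):
            self.entries.setdefault(ent_id, (freq, vector))
        return len(self.entries)
```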
Ines Montani
5b36dec7eb
Auto-exclude disabled when calling from_disk during load (#4708)
2019-11-25 16:01:22 +01:00
Ines Montani
5d4eede1e4
Fix test util imports
2019-11-21 16:28:29 +01:00
GuiGel
8f7ab70870
Bugfix/fix entity ruler from disk (#4670)
* fix EntityRuler from_disk bug
* add contributor file
* Test EntityRuler PhraseMatcher deserialization (#4651)
* newline at end of file
* fix copy paste error
* serializing the EntityRuler by itself
* Add unicode declarations for Python 2 and auto-format
2019-11-21 16:26:37 +01:00
Ines Montani
5bf9ab5b03
Tidy up and auto-format
2019-11-20 13:16:33 +01:00
Ines Montani
6e303de717
Auto-format
2019-11-20 13:15:24 +01:00
Ines Montani
74b951fe61
Fix xpassing tests (#4657)
* Ignore internal warnings
* Un-xfail passing tests
* Skip instead of xfail
2019-11-16 20:20:53 +01:00
Priscilla de Abreu Lopes
39e79fcc86
Bugfix/dep matcher issue 4590 (#4601)
* add contributor agreement for prilopes
* add test for issue #4590
* fix on_match params for DependencyMatcher (#4590)
2019-11-07 12:01:06 +01:00
Ines Montani
a90025b277
Fix serialization of extension attr values in DocBin (#4540)
2019-10-28 16:02:13 +01:00
Ines Montani
96bb8f2187
Add regression test for #4528 [ci skip]
2019-10-28 14:36:03 +01:00
Ines Montani
c5e41247e8
Tidy up and auto-format
2019-10-28 12:43:55 +01:00
Sofie Van Landeghem
8e7414dace
Match pop with append for training format (#4516)
* trying to fix script - not successful yet
* match pop() with extend() to avoid changing the data
* few more pop-extend fixes
* reinsert deleted print statement
* fix print statement
* add last tested version
* append instead of extend
* add in few comments
* quick fix for 4402 + unit test
* fixing number of docs (not counting cats)
* more fixes
* fix len
* print tmp file instead of using data from examples dir
* print tmp file instead of using data from examples dir (2)
2019-10-27 16:01:32 +01:00
tamuhey
fcd25db033
[#4529] fix: gold pyx (#4530)
* fix: gold pyx
* remove print
* skip test in python2
* Add unicode declarations and don't skip test on Python 2
2019-10-27 13:50:07 +01:00