* Update Makefile
For more recent python version
* updated for bsc changes
New tokenization changes
* Update test_text.py
* updating tests and requirements
* changed failed test in test/lang/ca
changed failed test in test/lang/ca
* Update .gitignore
deleted stashed changes line
* back to python 3.6 and remove transformer requirements
As per request
* Update test_exception.py
Change the test
* Update test_exception.py
Remove test print
* Update Makefile
For more recent python version
* updated for bsc changes
New tokenization changes
* updating tests and requirements
* Update requirements.txt
Removed spacy-transfromers from requirements
* Update test_exception.py
Added final punctuation to ensure consistency
* Update Makefile
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Format
* Update test to check all tokens
Co-authored-by: cayorodriguez <crodriguezp@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Accept Doc input in pipelines
Allow `Doc` input to `Language.__call__` and `Language.pipe`, which
skips `Language.make_doc` and passes the doc directly to the pipeline.
* ensure_doc helper function
* avoid running multiple processes on GPU
* Update spacy/tests/test_language.py
Co-authored-by: svlandeg <svlandeg@github.com>
* Validate pos values when creating Doc
* Add clear error when setting invalid pos
This also changes the error language slightly.
* Fix variable name
* Update spacy/tokens/doc.pyx
* Test that setting invalid pos raises an error
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* First take at StringStore/Vocab docs
Things to check:
1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?
* Update docs
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updates based on review feedback
* Minor fix
* Move example code down
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Remove two attributes marked for removal in 3.1
* Add back unused ints with changed names
* Change data_dir to _unused_object
This is still kept in the type definition, but I removed it from the
serialization code.
* Put serialization code back for now
Not sure how this interacts with old serialized models yet.
* Replace all basestring references with unicode
`basestring` was a compatability type introduced by Cython to make
dealing with utf-8 strings in Python2 easier. In Python3 it is
equivalent to the unicode (or str) type.
I replaced all references to basestring with unicode, since that was
used elsewhere, but we could also just replace them with str, which
shoudl also be equivalent.
All tests pass locally.
* Replace all references to unicode type with str
Since we only support python3 this is simpler.
* Remove all references to unicode type
This removes all references to the unicode type across the codebase and
replaces them with `str`, which makes it more drastic than the prior
commits. In order to make this work importing `unicode_literals` had to
be removed, and one explicit unicode literal also had to be removed (it
is unclear why this is necessary in Cython with language level 3, but
without doing it there were errors about implicit conversion).
When `unicode` is used as a type in comments it was also edited to be
`str`.
Additionally `coding: utf8` headers were removed from a few files.