mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-26 01:04:34 +03:00
71 lines
3.9 KiB
Plaintext
//- 💫 DOCS > USAGE > FACTS & FIGURES > OTHER LIBRARIES
p
    | Data scientists, researchers and machine learning engineers have
    | converged on Python as the language for AI. This gives developers a rich
    | ecosystem of NLP libraries to work with. Here's how we think the pieces
    | fit together.

//-+aside("Using spaCy with other libraries")
    | For details on how to use spaCy together with popular machine learning
    | libraries like TensorFlow, Keras or PyTorch, see the
    | #[+a("/usage/deep-learning") usage guide on deep learning].

+infobox
    +infobox-logos(["nltk", 80, 25, "http://nltk.org"])
    | #[+label-inline NLTK] offers some of the same functionality as spaCy.
    | Although originally developed for teaching and research, its longevity
    | and stability have resulted in a large number of industrial users. It's
    | the main alternative to spaCy for tokenization and sentence segmentation.
    | In comparison to spaCy, NLTK takes a much more "broad church" approach –
    | so it has some functions that spaCy doesn't provide, at the expense of a
    | bit more clutter to sift through. spaCy is also much more
    | performance-focussed than NLTK: where the two libraries provide the same
    | functionality, spaCy's implementation will usually be faster and more
    | accurate.

+infobox
    +infobox-logos(["gensim", 40, 40, "https://radimrehurek.com/gensim/"])
    | #[+label-inline Gensim] provides unsupervised text modelling algorithms.
    | Although Gensim isn't a runtime dependency of spaCy, we use it to train
    | word vectors. There's almost no overlap between the libraries – the two
    | work together.

+infobox
    +infobox-logos(["tensorflow", 35, 42, "https://www.tensorflow.org"], ["keras", 45, 45, "https://www.keras.io"])
    | #[+label-inline TensorFlow / Keras] are the most popular deep learning
    | libraries. spaCy provides efficient and powerful feature extraction
    | functionality that can be used as a preprocessing step for any deep
    | learning library. You can also use TensorFlow and Keras to create spaCy
    | pipeline components that add annotations to the #[code Doc] object.

+infobox
    +infobox-logos(["scikitlearn", 90, 44, "http://scikit-learn.org"])
    | #[+label-inline scikit-learn] features a number of useful NLP functions,
    | especially for solving text classification problems using linear models
    | with bag-of-words features. If you know you need exactly that, it might
    | be better to use scikit-learn's built-in pipeline directly. However, if
    | you want to extract more detailed features, using part-of-speech tags,
    | named entity labels, or string transformations, you can use spaCy as a
    | preprocessing step in your classification system. scikit-learn also
    | provides a lot of experiment management and evaluation utilities that
    | people use alongside spaCy.

+infobox
    +infobox-logos(["pytorch", 100, 48, "http://pytorch.org"], ["dynet", 80, 34, "http://dynet.readthedocs.io/"], ["chainer", 80, 43, "http://chainer.org"])
    | #[+label-inline PyTorch, DyNet and Chainer] are dynamic neural network
    | libraries, which can be much easier to work with for NLP. Outside of
    | Google, there's a general shift among NLP researchers towards DyNet and
    | PyTorch. spaCy is the front-end of choice for PyTorch's
    | #[code torchtext] extension. You can use any of these libraries to
    | create spaCy pipeline components that add annotations to the
    | #[code Doc] object.

+infobox
    +infobox-logos(["allennlp", 124, 22, "http://allennlp.org"])
    | #[+label-inline AllenNLP] is a new library designed to accelerate NLP
    | research by providing a framework that supports modern deep learning
    | workflows for cutting-edge language understanding problems. AllenNLP uses
    | spaCy as a preprocessing component. You can also use AllenNLP to develop
    | spaCy pipeline components that add annotations to the #[code Doc] object.