//- 💫 DOCS > USAGE > FACTS & FIGURES > OTHER LIBRARIES
p
| Data scientists, researchers and machine learning engineers have
| converged on Python as the language for AI. This gives developers a rich
| ecosystem of NLP libraries to work with. Here's how we think the pieces
| fit together.
+aside("Using spaCy with other libraries")
| For details on how to use spaCy together with popular machine learning
| libraries like TensorFlow, Keras or PyTorch, see the
| #[+a("/usage/deep-learning") usage guide on deep learning].
+infobox
+infobox-logos(["nltk", 80, 25, "http://nltk.org"])
| #[+label-inline NLTK] offers some of the same functionality as spaCy.
| Although originally developed for teaching and research, its longevity
| and stability have resulted in a large number of industrial users. It's
| the main alternative to spaCy for tokenization and sentence segmentation.
| In comparison to spaCy, NLTK takes a much more "broad church" approach,
| so it has some functions that spaCy doesn't provide, at the expense of a
| bit more clutter to sift through. spaCy is also much more
| performance-focussed than NLTK: where the two libraries provide the same
| functionality, spaCy's implementation will usually be faster and more
| accurate.
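p
| For a concrete side-by-side, here's a minimal sketch that segments and
| tokenizes the same text with both libraries. It assumes NLTK's "punkt"
| data and spaCy's #[code en_core_web_sm] model have been downloaded.

+code.
    # Sentence segmentation and tokenization in NLTK vs. spaCy
    import nltk
    import spacy

    text = "Autonomous cars shift insurance liability toward manufacturers. New rules are coming."

    # NLTK: separate function calls for sentences and word tokens
    nltk_sentences = nltk.sent_tokenize(text)
    nltk_tokens = [nltk.word_tokenize(sent) for sent in nltk_sentences]

    # spaCy: one pipeline call returns a Doc with sentences and tokens attached
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    spacy_tokens = [[token.text for token in sent] for sent in doc.sents]

    print(nltk_tokens)
    print(spacy_tokens)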
+infobox
+infobox-logos(["gensim", 40, 40, "https://radimrehurek.com/gensim/"])
| #[+label-inline Gensim] provides unsupervised text modelling algorithms.
| Although Gensim isn't a runtime dependency of spaCy, we use it to train
| word vectors. There's almost no overlap between the two libraries, so
| they work well together.
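p
| As a rough sketch of that workflow, the snippet below trains a tiny
| word2vec model with Gensim (using the 3.x API) and copies the resulting
| vectors into a blank spaCy vocab. The two-sentence corpus is a toy
| stand-in, so the vectors themselves are meaningless.

+code.
    # Train word vectors with Gensim and load them into spaCy (Gensim 3.x API)
    import spacy
    from gensim.models import Word2Vec

    sentences = [['autonomous', 'cars', 'shift', 'insurance', 'liability'],
                 ['manufacturers', 'face', 'new', 'rules']]

    # Toy training run - real vectors need a much larger corpus
    model = Word2Vec(sentences, size=50, min_count=1)

    # Copy the learned vectors into a spaCy vocab
    nlp = spacy.blank('en')
    for word in model.wv.index2word:
        nlp.vocab.set_vector(word, model.wv[word])

    print(nlp.vocab['cars'].vector[:5])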
+infobox
+infobox-logos(["tensorflow", 35, 42, "https://www.tensorflow.org"], ["keras", 45, 45, "https://www.keras.io"])
| #[+label-inline TensorFlow / Keras] is the most popular deep learning library.
| spaCy provides efficient and powerful feature extraction functionality
| that can be used as a pre-processing step for any deep learning library.
| You can also use TensorFlow and Keras to create spaCy pipeline components
| that add annotations to the #[code Doc] object.
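p
| The sketch below shows what such a component could look like: a toy,
| untrained Keras model is wrapped in a function that writes a score to a
| custom #[code Doc] attribute. It assumes the #[code en_core_web_md]
| model (which ships with word vectors) and uses the spaCy v2 pipeline
| API.

+code.
    # A Keras model as a spaCy pipeline component (spaCy v2-style API)
    import spacy
    from spacy.tokens import Doc
    from keras.models import Sequential
    from keras.layers import Dense

    nlp = spacy.load('en_core_web_md')  # package with word vectors

    # Toy, untrained model scoring the averaged document vector
    model = Sequential([Dense(1, activation='sigmoid',
                              input_shape=(nlp.vocab.vectors_length,))])

    Doc.set_extension('sentiment', default=None)

    def keras_sentiment(doc):
        doc._.sentiment = float(model.predict(doc.vector.reshape(1, -1))[0, 0])
        return doc

    nlp.add_pipe(keras_sentiment, last=True)  # v2: pass the function directly
    doc = nlp('This is a test sentence.')
    print(doc._.sentiment)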
+infobox
+infobox-logos(["scikitlearn", 90, 44, "http://scikit-learn.org"])
| #[+label-inline scikit-learn] features a number of useful NLP functions,
| especially for solving text classification problems using linear models
| with bag-of-words features. If you know you need exactly that, it might
| be better to use scikit-learn's built-in pipeline directly. However, if
| you want to extract more detailed features using part-of-speech tags,
| named entity labels, or string transformations, you can use spaCy as a
| pre-processing step in your classification system. scikit-learn also
| provides a lot of experiment management and evaluation utilities that
| people use alongside spaCy.
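p
| The sketch below illustrates that setup: a spaCy-based tokenizer that
| keeps only the lemmas of content words is plugged into a scikit-learn
| #[code Pipeline]. The training data is a toy stand-in, and the
| #[code en_core_web_sm] model is assumed to be installed.

+code.
    # spaCy as a preprocessing step inside a scikit-learn text pipeline
    import spacy
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    nlp = spacy.load('en_core_web_sm')

    def spacy_features(text):
        # Lemmatize and drop stop words and punctuation before vectorizing
        return [token.lemma_ for token in nlp(text)
                if not token.is_stop and not token.is_punct]

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(tokenizer=spacy_features, lowercase=False)),
        ('clf', LogisticRegression())
    ])

    # Tiny toy dataset, just to show the API end-to-end
    texts = ['I loved this film', 'I hated this film', 'great movie', 'terrible movie']
    labels = [1, 0, 1, 0]
    pipeline.fit(texts, labels)
    print(pipeline.predict(['an absolutely great film']))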
+infobox
+infobox-logos(["pytorch", 100, 48, "http://pytorch.org"], ["dynet", 80, 34, "http://dynet.readthedocs.io/"], ["chainer", 80, 43, "http://chainer.org"])
| #[+label-inline PyTorch, DyNet and Chainer] are dynamic neural network
| libraries, which can be much easier to work with for NLP. Outside of
| Google, there's a general shift among NLP researchers towards DyNet and
| PyTorch. spaCy is the front-end of choice for PyTorch's
| #[code torchtext] extension. You can use any of these libraries to
| create spaCy pipeline components that add annotations to the
| #[code Doc] object.
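p
| As a rough sketch of spaCy as a front-end, the snippet below tokenizes
| texts with a blank English pipeline, numericalizes them with a toy
| vocabulary and feeds the resulting id tensor into a PyTorch embedding
| layer. It uses plain PyTorch rather than the #[code torchtext] extension
| itself.

+code.
    # spaCy tokenization as the front-end for a PyTorch text model
    import spacy
    import torch
    import torch.nn as nn

    nlp = spacy.blank('en')  # tokenizer only, no statistical models needed

    texts = ['Autonomous cars shift insurance liability', 'New rules are coming']
    docs = list(nlp.pipe(texts))

    # Build a toy string-to-id vocabulary from the spaCy tokens (0 = padding)
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token.lower_, len(vocab) + 1)

    # Numericalize and pad to the longest document
    max_len = max(len(doc) for doc in docs)
    ids = torch.zeros(len(docs), max_len, dtype=torch.long)
    for i, doc in enumerate(docs):
        for j, token in enumerate(doc):
            ids[i, j] = vocab[token.lower_]

    # Feed the id tensor into an embedding layer
    embed = nn.Embedding(len(vocab) + 1, 32, padding_idx=0)
    print(embed(ids).shape)  # (batch, tokens, embedding dim)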
+infobox
+infobox-logos(["allennlp", 124, 22, "http://allennlp.org"])
| #[+label-inline AllenNLP] is a new library designed to accelerate NLP
| research by providing a framework that supports modern deep learning
| workflows for cutting-edge language understanding problems. AllenNLP uses
| spaCy as a preprocessing component. You can also use AllenNLP to develop
| spaCy pipeline components that add annotations to the #[code Doc] object.