Google has unveiled TF.Text (TensorFlow Text), a library for preprocessing text for language models using TensorFlow, the company's open-source machine-learning (ML) framework.
"TensorFlow provides a wide breadth of ops that greatly aid in building models from images and video. However, there are many models that begin with text and the language models built from these require some preprocessing before the text can be fed into the model," explained Robby Neale, software engineer at TensorFlow.
"TF.Text… is designed to ease this problem by providing ops to handle the preprocessing regularly found in text-based models, and other features useful for language modeling not provided by core TensorFlow," he added.
With TF.Text, users can break text apart into tokens such as words, numbers and punctuation for analysis. It can tokenize on whitespace, by Unicode script, and by predetermined sequences of word fragments, or "wordpieces," such as suffixes and prefixes, a technique Google has used before in programs such as BERT, its pretraining technique for language models. The library can be installed using pip.
The introduction of TF.Text comes just days after the beta release of TensorFlow 2.0, which trims redundant APIs, deepens Keras integration and improves the runtime for Eager Execution.
TF.Text is not the first dedicated ML library Google has introduced of late. Last month, the company released TensorFlow Graphics in an effort to bring more deep learning to graphics and 3D models.