Quickstart

Loading embeddings

Embeddings in the finalfusion, fastText, floret, and word2vec formats are loaded as follows:

import snakefusion

# Loading embeddings in finalfusion format
embeds = snakefusion.Embeddings("myembeddings.fifu")

# Or if you want to memory-map the embedding matrix:
embeds = snakefusion.Embeddings("myembeddings.fifu", mmap=True)

# fastText format
embeds = snakefusion.Embeddings.read_fasttext("myembeddings.bin")

# floret format
embeds = snakefusion.Embeddings.read_floret_text("myembeddings.floret")

# word2vec format
embeds = snakefusion.Embeddings.read_word2vec("myembeddings.w2v")
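
If a script has to accept any of these formats, one option is to choose the reader from the file suffix. The following is a minimal sketch, not part of snakefusion: `READERS`, `reader_name`, and `load_embeddings` are hypothetical helpers, and the suffixes are only the ones used in the examples above, not a convention that the library enforces.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the reader used in the examples
# above. None stands for the finalfusion format, which is read through the
# Embeddings constructor rather than a read_* method.
READERS = {
    ".fifu": None,
    ".bin": "read_fasttext",
    ".floret": "read_floret_text",
    ".w2v": "read_word2vec",
}


def reader_name(path):
    """Return the name of the reader method for `path`, or None for finalfusion."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unrecognized embedding file suffix: {suffix!r}")
    return READERS[suffix]


def load_embeddings(path):
    """Load embeddings with the reader that matches the file suffix."""
    import snakefusion  # imported lazily so reader_name works without it

    name = reader_name(path)
    if name is None:
        return snakefusion.Embeddings(path)
    return getattr(snakefusion.Embeddings, name)(path)
```

With this helper, `load_embeddings("myembeddings.w2v")` would call `snakefusion.Embeddings.read_word2vec("myembeddings.w2v")`.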

Queries

With a set of embeddings loaded, you can look up an embedding or perform similarity/analogy queries:

# Look up the embedding for 'Tübingen'
embed = embeds.embedding("Tübingen")

# Similarity query for "Tübingen"
embeds.word_similarity("Tübingen")

# Similarity query based on a vector, returning the closest embedding to
# the input vector, skipping "Tübingen".
embeds.embedding_similarity(embed, skip={"Tübingen"})

# Default analogy query (Berlin is to Deutschland as Amsterdam is to ...)
embeds.analogy("Berlin", "Deutschland", "Amsterdam")

# Analogy query that allows "Deutschland" as an answer: a False entry in
# the mask leaves the corresponding query word in the results.
embeds.analogy("Berlin", "Deutschland", "Amsterdam", mask=(True, False, True))
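
To make the role of `mask` concrete, the following sketch reimplements an analogy query over a handful of toy vectors in plain Python. It assumes the common offset formulation (the query vector is `b - a + c`, ranked by cosine similarity). It is an illustration only, not snakefusion's implementation; the function names and the vectors are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_analogy(vectors, a, b, c, mask=(True, True, True), limit=1):
    """Answer 'a is to b as c is to ?' over a dict of word -> vector."""
    # Offset query: b - a + c.
    query = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Query words whose mask entry is True are excluded from the results.
    skip = {word for word, masked in zip((a, b, c), mask) if masked}
    ranked = sorted(
        ((word, cosine(query, vec)) for word, vec in vectors.items() if word not in skip),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


# Invented toy vectors: dimensions are (German, Dutch, is-a-country).
toy_vectors = {
    "Berlin": [1.0, 0.0, 0.0],
    "Deutschland": [1.0, 0.0, 1.0],
    "Amsterdam": [0.0, 1.0, 0.0],
    "Niederlande": [0.0, 1.0, 1.0],
}

default_result = toy_analogy(toy_vectors, "Berlin", "Deutschland", "Amsterdam", limit=2)
unmasked_result = toy_analogy(
    toy_vectors, "Berlin", "Deutschland", "Amsterdam", mask=(True, False, True), limit=2
)
```

With the default mask, all three query words are filtered out, so only "Niederlande" can be returned. With `mask=(True, False, True)`, "Deutschland" becomes a candidate again and appears in the ranking.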

Low-level data structures

If you want to operate directly on the full embedding matrix, you can get a copy of this matrix through:

# Get a copy of the embedding matrix; changes to the copy do not affect
# the original matrix.
matrix = embeds.storage.matrix_copy()

You can also use the vocabulary directly:

vocab = embeds.vocab

# get a list of indices associated with "Tübingen"
vocab["Tübingen"]

# get a list of `(ngram, index)` tuples for "Tübingen"
vocab.ngram_indices("Tübingen")

# get a list of subword indices for "Tübingen"
vocab.subword_indices("Tübingen")
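
For intuition about what the n-grams of a word look like, the sketch below extracts fastText-style character n-grams from a word wrapped in `<` and `>` boundary markers. It is an assumption-laden illustration rather than the library's code: the function name is hypothetical, the 3 to 6 length range is an assumed default, and the mapping from n-grams to indices (which depends on the vocabulary type of the loaded embeddings) is left out.

```python
def bracketed_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` with boundary markers added.

    The n-gram lengths min_n..max_n are assumed defaults for this sketch;
    the actual range depends on how the embeddings were trained.
    """
    marked = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n over the marked word.
        for start in range(len(marked) - n + 1):
            ngrams.append(marked[start:start + n])
    return ngrams


# Prints the shortest n-grams first, for example "<Tü" and "Tüb".
print(bracketed_ngrams("Tübingen")[:5])
```

Embeddings with a subword vocabulary map such n-grams to rows of the embedding matrix, which is the information that `ngram_indices` and `subword_indices` expose.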

More usage examples can be found in the [examples](https://github.com/finalfusion/finalfusion-python/tree/master/examples) directory.