Natural Language Processing - Tokenization (NLP Zero to Hero, part 1) - Lake Harding Association

31 Comments found

MASTER FAZE SAITAMA

This is so complicated

Vinay

Very difficult to learn

Vibhooti Kishor

Cool

Nishal K

Thanks for making it clear; waiting for the next one

ashim karki

The legend is back

Chowa C

I've always been discouraged from learning NLP, but you've just made it a whole lot easier

Sébastien MEYER

What are the advantages of using the TF framework instead of other preprocessing methods, such as those spaCy or NLTK provide, for example? 🙂 Thank you

Sébastien MEYER

Should we keep only nouns when topic modelling? I am quite new to NLP, and it seems there is no clear, universal rule of thumb for extracting topic information. What would you advise?

Balach Khan

It's great. Waiting for the next.

Alexander Pohl

04:15 is really misleading for anyone watching this as their entry point to NLP. A "Zero to Hero" tutorial series needs to cover too many missing steps after this point (and even before it) instead of jumping straight into sequencing.
I see why these aren't included (they are not part of TensorFlow), but at the same time this sets an unrealistic standard.
In machine learning terms, I'd say this video is just mislabeled.

Patrick Jähne

Great introduction which is easy to understand. Can't wait for the next videos of this series!
But is there any way to group words while ignoring some grammar? Like: "He plays piano – I play piano", where "plays" != "play" but it is basically the same word and tense.
The part about ignoring the "!" in "dog!" is fascinating.
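Grouping "plays" with "play" is usually done with stemming or lemmatization, which the Keras Tokenizer does not perform; libraries such as NLTK (`nltk.stem.PorterStemmer`) or spaCy provide real implementations. As a toy illustration only (the `naive_stem` helper below is hypothetical, not any library's API), even a crude suffix stripper collapses such pairs:

```python
def naive_stem(word):
    # Toy suffix stripper: NOT a real stemmer, just an illustration.
    # Real stemmers handle far more cases (irregular forms, doubled
    # consonants, "-ies" plurals, etc.).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(naive_stem("plays"), naive_stem("play"), naive_stem("playing"))
# play play play
```

After stemming, "plays" and "play" map to the same string and therefore to the same token index.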

Akshay Shah

Yeah. Zero to Hero is back

Siddhartha Boppana

Too late to the party, TensorFlow!! It's not 2010. Love the video though, thanks 😎

Danylo Baibak

A question about ignoring the "!". It seems the Tokenizer doesn't include "!" because it was filtered out as punctuation. Let's assume that we want to keep punctuation and set `filters=''` for the Tokenizer. In this case, the Tokenizer is not smart enough to separate the token "dog" from the token "!".

Here's the example in Colab https://colab.research.google.com/drive/1M6Nf-WQxorf_X9z2jFnCSJ_QjrY3i5BJ
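For readers who want to see why this happens without opening the Colab: the Tokenizer's `filters` argument is just a set of characters that are replaced by spaces before the text is lowercased and split on whitespace. A simplified pure-Python sketch of that step (an approximation, not the actual Keras source):

```python
# Characters stripped by the Keras Tokenizer's default `filters` argument.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_tokenize(text, filters=DEFAULT_FILTERS):
    # Each filtered character becomes a space; then lowercase and split.
    table = str.maketrans({ch: " " for ch in filters})
    return text.lower().translate(table).split()

print(simple_tokenize("You love my dog!"))              # ['you', 'love', 'my', 'dog']
print(simple_tokenize("You love my dog!", filters=""))  # ['you', 'love', 'my', 'dog!']
```

With `filters=''` nothing is replaced, so "dog!" survives as a single whitespace-delimited token, which is exactly the behavior described above; splitting "!" off as its own token would require a different splitting strategy, such as a regex tokenizer.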

Carlos Segura

Love Tokenizers ❤️

Deepak Dakhore

Very nice

Atang Motloli

Thanks. Where is the next episode?

Learn With Milind

How many languages are supported? Or is only English supported?

renderdreality

Does NLP only process English? Could it handle another language? My question is really whether it could be used to learn a different language as a basis and go from there.

Vikrant Singh

For people who want to dig deeper into tokens: tokens are made using regex, which is why "dog!" came up as "dog".
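To make punctuation its own token instead of discarding it, a regex like the one below works. This is a sketch of the general technique, not the pattern Keras itself uses (Keras filters punctuation out rather than keeping it):

```python
import re

def regex_tokenize(text):
    # \w+ grabs runs of word characters; [^\w\s] grabs each punctuation
    # mark that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(regex_tokenize("You love my dog!"))  # ['you', 'love', 'my', 'dog', '!']
```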

Harish

Great video 👍

Albin Bajramovic

The Colab is labeled as "Course 3 – Week 1 – Lesson 1.ipynb" – where can I sign up for the full course? Thank you!

Atul Kumar A

I really like the way you explain, from level 0 to hero. It's an art!

RS

Is the link to Part 2: Sequencing – Turning Sentences into Data available?

Ken MacDonald

Can't install TensorFlow because it doesn't work with Python 3.8 🙄

Matty Mallz

Fantastic video! Very informative. Thank you for sharing TensorFlow!

Oliver Li

Can this be called a hack? Or are there reasons that Keras doesn't include this? (Notice: in "'you're" the left quote is still there, "'" is recognized as a word ("'": 11), and num_words=4 doesn't really limit the word count down to 4.)

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!',
    "Jack said, 'You're gonna love my cat!'"
]

tokenizer = Tokenizer(num_words = 4)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index)

{'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6, 'jack': 7, 'said': 8, "'you're": 9, 'gonna': 10, "'": 11}
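Both observations here are documented Keras behavior rather than bugs: the apostrophe is simply not in the Tokenizer's default `filters` string, so the leading quote survives; and `word_index` always contains the full vocabulary, with `num_words` applied only later in `texts_to_sequences`, which keeps just the tokens whose index is strictly below `num_words`. A simplified sketch of that filtering step (an approximation, not the Keras source):

```python
def to_sequences(texts, word_index, num_words):
    # Keep only tokens with index < num_words, i.e. the (num_words - 1)
    # most frequent words; everything rarer is silently dropped.
    return [[word_index[w] for w in text.split()
             if word_index.get(w, num_words) < num_words]
            for text in texts]

word_index = {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5}
print(to_sequences(['i love my dog'], word_index, num_words=4))
# [[3, 1, 2]] -- 'dog' (index 4) is dropped
```

So with `num_words=4`, only "love", "my", and "i" make it into sequences, even though `word_index` still lists all eleven tokens.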

Mohamed Zaroug

🤩😍😍🤩 Very informative, waiting for the rest

sujeesh s valath

How to detect the difference between "I love my dog" and "I love not my dog"?

刘新新

Can anyone tell me what the first-principles method teaches?

Srikrithi Bharadwaj

Thank you so much 🙏🏻, such great information.
