samuelvara/LanguageIdentification

/Data Collection
1. run create_vocab.py to create the vocabularies of individual languages
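The contents of create_vocab.py are not shown here, so the following is only a minimal sketch of how a per-language vocabulary could be built. The function name `build_vocab`, the `min_count` parameter, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def build_vocab(sentences, min_count=1):
    """Map each word to an integer id, most frequent words first (hypothetical helper)."""
    # Count lowercase, whitespace-separated tokens across all sentences.
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Sort by descending count, then alphabetically, so the ids are deterministic.
    words = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    return {word: idx for idx, word in enumerate(words)}

english_vocab = build_vocab(["the cat sat", "the dog ran"])
print(english_vocab)  # {'the': 0, 'cat': 1, 'dog': 2, 'ran': 3, 'sat': 4}
```

One such dictionary per language keeps the later merge step simple.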

/Data Collection and Data Cleaning
2. run gen_cleaned_sentences.py to generate cleaned sentences from the collected news articles.
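As an illustration of what "cleaned sentences" might mean, the sketch below splits raw article text on sentence-ending punctuation, then drops punctuation and digits and lowercases the rest. These cleaning rules and the name `clean_sentences` are assumptions; gen_cleaned_sentences.py may apply different ones.

```python
import re

def clean_sentences(text):
    """Split raw text into lowercase sentences without punctuation or digits (hypothetical helper)."""
    cleaned = []
    for raw in re.split(r"[.!?]+", text):
        # \w is Unicode-aware in Python 3, so letters of non-Latin scripts are kept.
        s = re.sub(r"[^\w\s]", " ", raw)   # drop punctuation
        s = re.sub(r"[\d_]", " ", s)       # drop digits and underscores
        s = " ".join(s.lower().split())    # lowercase and collapse whitespace
        if s:
            cleaned.append(s)
    return cleaned

print(clean_sentences("Hello, World! Is it 2020 yet?"))
# ['hello world', 'is it yet']
```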

/Data Preparation
3. run create_master_dic to merge the dictionaries of all languages into one master dictionary.
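A rough sketch of such a merge, assuming each language's dictionary maps words to ids and that the master dictionary should assign a fresh, unique id to every distinct word. Reserving id 0 for unknown words is an assumption of this sketch, and `merge_vocabs` is a hypothetical name.

```python
def merge_vocabs(vocabs):
    """Combine several word->id dicts into one dict with unique ids (hypothetical helper)."""
    master = {}
    for vocab in vocabs:
        for word in vocab:
            if word not in master:
                # Ids start at 1; 0 is kept free for unknown words (an assumption).
                master[word] = len(master) + 1
    return master

english = {"the": 0, "cat": 1}
french = {"le": 0, "chat": 1, "the": 2}
print(merge_vocabs([english, french]))
# {'the': 1, 'cat': 2, 'le': 3, 'chat': 4}
```

Words shared between languages end up with a single id, which matters because the per-language ids would otherwise collide.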

/Data Preparation
4. run encode_data.py to encode the cleaned sentences into numbers
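Encoding here presumably means replacing each word with its id from the master dictionary. A minimal sketch under that assumption follows; `encode_sentence` and the `unknown_id` fallback are illustrative names, not taken from encode_data.py.

```python
def encode_sentence(sentence, master_vocab, unknown_id=0):
    """Replace each whitespace-separated word with its id (hypothetical helper)."""
    # Words missing from the dictionary fall back to unknown_id.
    return [master_vocab.get(word, unknown_id) for word in sentence.split()]

master_vocab = {"the": 1, "cat": 2}
print(encode_sentence("the cat meows", master_vocab))  # [1, 2, 0]
```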

/Encoding Data into numbers
5. run train_test_split.py to shuffle the encoded data and split it into training and test sets
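The shuffle-and-split step could look roughly like this. The 80/20 ratio, the fixed seed, and the function name `shuffle_and_split` are assumptions chosen for the example, not values from the repository.

```python
import random

def shuffle_and_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test) lists (hypothetical helper)."""
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Each sample pairs an encoded sentence with a language label (made-up data).
samples = [([1, 2], "en"), ([3, 4], "fr"), ([1, 5], "en"), ([3, 6], "fr"), ([7, 8], "de")]
train, test = shuffle_and_split(samples)
print(len(train), len(test))  # 4 1
```

Shuffling before the split keeps sentences of one language from clustering in either set when the input is ordered by language.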

/Training the model
6. 
