My NLG Pipeline

  1. Acquisition
    1. Newspaper3k
  2. Cleaning/Parsing
    1. Regex, NLTK, TextBlob, spaCy
      1. http://www.cbs.dtu.dk/courses/27610/regular-expressions-cheat-sheet-v2.pdf
      2. Remove non-alphanumeric characters
      3. Convert to lowercase
    2. Spell-check
      1. Standardize spelling (e.g. kewl -> cool)
    3. Tokenizers
      1. Words
        1. Remove irrelevant words (e.g. URLs)
      2. Sentences?
    4. NER
      1. Players, teams, etc.
    5. Lemmatization
    6. Part-of-speech tagging?
  3. Representation
    1. One-hot encoding
      1. N-grams
    2. TF-IDF
    3. word2vec, word2tensor
      1. Low explainability
    4. GloVe, CoVe
    5. Dimensionality reduction
      1. LSI, LSA, LDA
  4. Analysis
    1. Keyword extraction
    2. Sentiment
    3. Humanness
    4. Style
    5. Classification
      1. Start with the simplest algorithm (e.g. logistic regression)
  5. Generation
    1. Encoder-Decoder
    2. Char- vs. word-based
    3. seq2seq
    4. Attention
    5. Activation function
    6. Neural networks
      1. CNN
        1. For classification
      2. RNN
        1. LSTM
      3. GAN
  6. Evaluation
    1. Accuracy
      1. Understand the mistakes
    2. Confusion matrix
    3. Most important features (words)
    4. LIME
      1. Black-box explainer
    5. Readability level
    6. BLEU, ERR
  7. Decide
    1. Work on the data or move to a more complex model
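
A rough sketch of the Cleaning/Parsing and TF-IDF steps from the outline, using only the standard library so the logic stays visible. The `SPELLING` map, the function names, and the sample sentences are assumptions made for illustration; a real pipeline would more likely lean on NLTK or spaCy for tokenization and on an existing TF-IDF implementation. The weighting used here (tf = count / document length, idf = log(N / df)) is one common variant among several.

```python
import math
import re
from collections import Counter

# Patterns for the cleaning step: URLs first, then anything that is not
# a lowercase letter, digit, or whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

# Hypothetical lookup table for the "standardize spelling" step.
SPELLING = {"kewl": "cool"}


def clean(text):
    """Drop URLs, convert to lowercase, strip non-alphanumeric characters."""
    text = URL_RE.sub(" ", text)
    text = text.lower()
    return NON_ALNUM_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace and standardize known spellings."""
    return [SPELLING.get(tok, tok) for tok in clean(text).split()]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf is the term count divided by the document length;
    idf is log(number of documents / documents containing the term).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero on empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    docs = ["The team won!", "The team lost. See http://example.com/recap"]
    for doc, w in zip(docs, tf_idf(docs)):
        print(doc, "->", w)
```

Terms that occur in every document (here "the" and "team") get a weight of zero, which is why TF-IDF tends to surface the words that distinguish one article from another.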
