Corpus language setting: check and confirm the language of your corpus
Total tokens (the size of your corpus), unique types (distinct word forms), and type-token ratio (TTR = types / tokens, a measure of lexical diversity)
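The three figures relate as TTR = types / tokens. A minimal Python sketch, assuming a simple lowercase word tokenizer (the tool's own tokenization rules may count tokens differently). The function name corpus_stats is hypothetical:

```python
import re

def corpus_stats(text: str) -> dict:
    """Return token count, type count and type-token ratio for a text."""
    # Assumed tokenizer: lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)  # distinct word forms
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}

stats = corpus_stats("The cat sat on the mat. The dog sat too.")
print(stats)  # {'tokens': 10, 'types': 7, 'ttr': 0.7}
```

Because new types appear more slowly than new tokens, TTR tends to fall as a corpus grows, so it is safest to compare TTR only between corpora or segments of similar size.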
See if your XML markup is indexed correctly
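If you have the source XML at hand, one way to check the indexing is to inventory the element and attribute names yourself and compare them with what the tool reports. A minimal sketch using Python's standard xml.etree.ElementTree; the sample markup (corpus, text, s and the id and genre attributes) and the function name markup_inventory are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def markup_inventory(xml_string):
    """Count each element name and collect the attribute names it carries."""
    root = ET.fromstring(xml_string)
    element_counts = Counter()
    attributes = {}
    for elem in root.iter():  # iter() also visits the root element
        element_counts[elem.tag] += 1
        # Updating a set with a dict adds the dict's keys (attribute names).
        attributes.setdefault(elem.tag, set()).update(elem.attrib)
    return element_counts, attributes

sample = """<corpus>
  <text id="t1" genre="news"><s>First sentence.</s><s>Second one.</s></text>
  <text id="t2" genre="blog"><s>Another sentence.</s></text>
</corpus>"""

counts, attrs = markup_inventory(sample)
print(counts)  # Counter({'s': 3, 'text': 2, 'corpus': 1})
print(attrs)
```

Namespaced elements appear in ElementTree as {namespace-uri}name, which is worth keeping in mind if the tool lists them under a prefix instead.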
Size of your corpus segments
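To cross-check segment sizes against your own XML, here is a sketch that counts word tokens per segment. It assumes segments are text elements identified by an id attribute (hypothetical defaults, adjustable through the segment_tag and id_attr parameters) and uses a simple \w+ tokenizer as a stand-in for the tool's own:

```python
import re
import xml.etree.ElementTree as ET

def segment_sizes(xml_string, segment_tag="text", id_attr="id"):
    """Return {segment id: token count} for every segment element."""
    root = ET.fromstring(xml_string)
    sizes = {}
    for seg in root.iter(segment_tag):
        # itertext() yields all text inside the segment, including nested
        # elements; joining with spaces keeps words from merging across tags.
        text = " ".join(seg.itertext())
        sizes[seg.get(id_attr)] = len(re.findall(r"\w+", text))
    return sizes

sample = """<corpus>
  <text id="t1"><s>First sentence here.</s><s>Second one.</s></text>
  <text id="t2"><s>Another sentence.</s></text>
</corpus>"""

sizes = segment_sizes(sample)
print(sizes)  # {'t1': 5, 't2': 2}
```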
Wordlist (downloadable)
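A wordlist is the corpus vocabulary ranked by frequency. A minimal sketch of building one and writing it as CSV, assuming a simple lowercase \w+ tokenizer; the column names word and frequency are illustrative and not necessarily those used in the tool's download:

```python
import csv
import io
import re
from collections import Counter

def wordlist(text):
    """Return (word, frequency) pairs, most frequent first."""
    tokens = re.findall(r"\w+", text.lower())
    # most_common() keeps first-seen order among words with equal counts.
    return Counter(tokens).most_common()

def wordlist_csv(rows):
    """Serialize wordlist rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word", "frequency"])
    writer.writerows(rows)
    return buf.getvalue()

rows = wordlist("The cat sat on the mat. The dog sat too.")
csv_text = wordlist_csv(rows)
print(rows[:2])  # [('the', 3), ('sat', 2)]
```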
Confirm your tagset definition. This is optional, but the AI usually works better if you define the tagset. You can do this by:
editing it manually
asking the AI to guess the definitions, then editing them
uploading an Excel file (a tag column followed by a definition column)
or entering it directly (one tag per line: the tag, a tab, then its definition)
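Before pasting a tagset as direct input, it can help to validate it. A minimal sketch, assuming the format is one tag per line with a tab between tag and definition; the function name parse_tagset is hypothetical and the Penn Treebank-style tags in the sample are only examples:

```python
def parse_tagset(raw):
    """Parse 'TAG<TAB>definition' lines into a dict and list problem lines."""
    tagset, problems = {}, []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        tag, sep, definition = line.partition("\t")
        tag, definition = tag.strip(), definition.strip()
        if not sep or not tag or not definition:
            problems.append(lineno)  # missing tab, tag or definition
        elif tag in tagset:
            problems.append(lineno)  # duplicate tag
        else:
            tagset[tag] = definition
    return tagset, problems

raw = "NN\tnoun, singular\nVB\tverb, base form\nJJ adjective\nNN\tduplicate noun entry"
tagset, problems = parse_tagset(raw)
print(tagset)    # {'NN': 'noun, singular', 'VB': 'verb, base form'}
print(problems)  # [3, 4]
```

Line 3 fails because a space separates the tag from its definition instead of a tab, a common slip when typing the list by hand.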
Segment your corpus by topic (e.g. sport, religion, science, or a topic you define yourself) and by sentiment (positive, negative, neutral)
BERTopic lets you define your own topic from a group of keywords. The resulting topic can then be used in Restricted query.