I want to find a relationship between the number of characters in a word and the number of such valid words in English, and also in other languages. I was considering writing a programme to send a string to an online engine and get a response saying whether it is a valid word or not (a spell check). Is there any such online programme? Alternatively, is it possible to ask how many words of length x characters are in a dictionary? I know this topic is stretching relevance, but someone might come up with a good suggestion. Regards.
Hey @Clem73188
Something simple 
You might be looking for this popular Python library.
https://pyenchant.github.io/pyenchant/
# from the docs
import enchant
d = enchant.Dict("en_US")   # load the US English dictionary
d.check("Hello")            # True for a valid word, False otherwise
Use case matters.
I think what you're asking for here is the ratio between all possible combinations of characters of some length and the number of those that are real words.
the_kings_english : every_possible_combination_of_A-Z_with_repetition
If that's all you want, you can calculate the total number of possible strings with the permutation-with-repetition formula: an alphabet of n characters gives n^r strings of length r.
For example: how many four-letter strings can you make with the 26 chars of the Latin alphabet?
26^4 = 456,976
450,000 checks is nothing for Python, even on a Raspberry Pi.
However… note that the total of all combinations shorter than 10 chars is about 5.6 trillion (26^1 + 26^2 + … + 26^9 ≈ 5.65 × 10^12).
That's… not strictly beyond Python.
But it will be hard going.
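Just to make that blow-up concrete, a quick sketch (mine, nothing assumed beyond the arithmetic above):

# strings of each length over a 26-letter alphabet, plus the running total
total = 0
for k in range(1, 10):
    total += 26 ** k
    print(f"length {k}: {26 ** k:,} strings, {total:,} cumulative")

The cumulative line for length 9 is where that 5.6 trillion figure comes from.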
If I had to brute-force this, lord have mercy, I would use Julia.
Julia was born for this.
I would download this massive dataset of all the words. Then I'd make sure I have a sneaky bit of free RAM and start running membership checks with any() (tight loops like that are Julia's heavyweight event).
# fixed up from memory -- CSV.read needs a sink type like DataFrame
using CSV, DataFrames
checkMe = "cupcake"
data = CSV.read("englishwordsA-G.csv", DataFrame)
any(==(checkMe), data[:, 1])   # exact match; contains(checkMe) would also hit "cupcakes"
I have not checked the Julia forums, but I almost guarantee you some poor and lonely English Literature student has asked for this and some maths nerd has delivered a beautiful one-liner for you to copy and paste. (And they probably posted the results too.)
Counting words
This is the way, I reckon. Just grab that massive dataset of words I mentioned above and pandas will munch it up, no problemo.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
# Untested... I'm just going off memory here.
# Assumes the CSV is a single column of words with no header row.
import pandas as pd
df = pd.read_csv('all_the_words.csv', header=None, names=['word'])
df['num_char'] = df['word'].str.len()
print(df.groupby('num_char').size())   # one row per word length: how many words in each bin
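And to get at the ratio you actually asked about (valid words of length k versus all 26^k possible strings of that length), a couple more lines on top of the same df (again untested, my sketch):

counts = df.groupby('num_char').size()
possible = 26.0 ** counts.index.to_numpy()   # 26^k possible strings for each length k
print(counts / possible)                     # fraction of length-k strings that are real words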
Anyway… that was fun 
Thanks for giving me something to think about during the Boxing Day Test match ads
Pix
Hi Pixmusix, thank you for your post - I knew there had to be someone out there wanting something to think about during the ads! Thanks for the link to the word library. I did not understand your code, but I could write some code on the Pi to read in these files, extract all legitimate words, and then count the number in each character-length bin. Would there be a similar library for other languages like Russian, Greek, German, etc.? I am doing a bit of a study on intelligent patterns in random noise. My thought is that the longer a code sequence is, the smaller the proportion of useful codes compared to all possible codes of the same length, and that this is universal. It would be interesting to compare different languages. Someone said that among protein sequences of amino acids there is about 1 useful protein per 1E+70 possible combinations of the same length. Proteins are usually about 200 to 300 amino acids long. Regards.
Enchant can do other languages. Hit the docs for more info.
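For instance (assuming the relevant dictionaries are installed on your system; pyenchant only exposes whatever your Enchant backend can see):

import enchant
print(enchant.list_languages())   # which dictionaries this machine actually has
de = enchant.Dict("de_DE")        # German, if the de_DE dictionary is installed
print(de.check("Hallo"))          # True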
Oh yeah, I think I get what you're going for.
If your search doesn't have to be exhaustive, you can use whatever tools and languages you want.
For instance, it's easy to run 10,000,000 checks per bin size and then form a statistical argument. Anything better than 3.5 standard deviations would satisfy most people.
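A minimal sketch of that sampling idea, reusing pyenchant from earlier (my own code; shrink trials while testing, since ten million dictionary calls in pure Python is a coffee break):

import random
import string
import enchant

d = enchant.Dict("en_US")

def hit_rate(length, trials=10_000_000):
    # fraction of uniformly random lowercase strings of this length the dictionary accepts
    hits = 0
    for _ in range(trials):
        candidate = ''.join(random.choices(string.ascii_lowercase, k=length))
        if d.check(candidate):
            hits += 1
    return hits / trials

for k in range(2, 9):
    print(k, hit_rate(k, trials=100_000))

Past 9 or 10 characters the hit rate gets so tiny that a random sample comes back all zeros, which is itself a data point for the pattern you're describing.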
The numbers you're talking about are hefty bois, even for computers. If your search has to be exhaustive, the quality of your algorithm and the speed of your language do matter.