The number of units basically denotes the size (length) of the internal vector states, h and c, of the LSTM, and it is a hyperparameter, right?
But does it have any kind of dependency on the input sequence length we provide during training?
Is there any way to determine an optimal value for the number of units?
Suppose in my input data the number of timesteps = max length of any sequence = 35. Now consider two scenarios: in one I take a very small number of units, say 10, and in the other a large value, say 512. How will the information flow be affected in each scenario, and how will these values eventually affect my model's performance (apart from computational complexity)?
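To make the setup concrete, here is a toy NumPy sketch of a single LSTM cell I wrote for illustration (not any library's implementation; `input_dim = 8` is an arbitrary assumption): the shapes of h and c are fixed by `units` alone, and the sequence length of 35 only determines how many times the same cell is applied.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: gates computed from current input and previous hidden state."""
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                # pre-activations, shape (4 * units,)
    # input, forget, output gates (sigmoid), candidate cell (tanh)
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * units:(k + 1) * units])) for k in range(3))
    g = np.tanh(z[3 * units:])
    c = f * c_prev + i * g                      # new cell state, shape (units,)
    h = o * np.tanh(c)                          # new hidden state, shape (units,)
    return h, c

input_dim, timesteps, units = 8, 35, 10        # sizes assumed for illustration
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * units, input_dim))
U = rng.standard_normal((4 * units, units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for t in range(timesteps):                      # timesteps only sets how often we loop
    h, c = lstm_step(rng.standard_normal(input_dim), h, c, W, U, b)

print(h.shape)  # (10,) -- h and c sizes depend only on `units`, never on timesteps
```

So with units = 10 every timestep's information has to be squeezed through a 10-dimensional state, while units = 512 gives the cell a much wider state to carry information forward.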
By taking values like 128 or 256, do we allow higher-level, or basically more detailed, information to flow through the LSTM cells?