-
Notifications
You must be signed in to change notification settings - Fork 504
enable flash attn by default #1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
This will continue to degrade the performance of other non-Nvidia platforms, so why enable it? |
Do you have some numbers for your platform(s) ? |
|
You save a tiny bit of memory. But it's more than 50% slower (22s > 33s, Z Image Turbo) on any AMD hardware I've tested, and that's a well-known fact. |
It depends on the card. My own 7600 XT gets slower on Vulkan (9.08s/it with FA and 7.88s/it without, for a 1024x1024 8-step gen), but much faster on ROCm (5.65s/it versus 10.32s/it). The memory savings may justify leaving it on by default, though (-1.7G on ROCm, -3.9G on Vulkan). @leejet , perhaps we could standardize the boolean flags, to make general usage easier, and avoid changing them when a default changes? We could e.g. consistently adopt a 'no-' prefix to turn the flags off, and always accept both forms (so in this case |
|
About the performance differences, maybe it could be a good idea to create a discussion to keep track of the performance in all the different hardware configurations (just like in llama.cpp). |
This would be great. It would be nice if we could get a set of people with varying hardware and software configurations to come back and add the numbers for the same model/settings to their table. It would also be nice to have a more dedicated benchmark tool/mode. Maybe with dummy models or something. |
To be honest, there are already quite a lot of command-line options, and adding this change would only make things more redundant. Moreover, for users who are not familiar with them, they will still need to check the documentation to figure out the option names, so it wouldn’t really save much time. |
No description provided.