Dwarden

Custom Memory Allocator for engine since b85869


Tested only with Shadows off. Sorry for the misleading Text.

Benchmark Post updated.

Edited by Humvee28


I think I will use the Windows allocator; it's giving me the best performance and is stable.

With TBB4 I sometimes get a few seconds of bad FPS before it comes back to normal... and the overall performance is not convincing.

Any suggestions or hints? I'm interested in using these allocators.


I had a go at playing the Harvest Red mission yesterday (Chernogorsk), using the latest beta and the TBB4 malloc... and was completely blown away. Performance is infinitely better than the last time I tried this particular mission (a couple of months ago), and I am finally able to increase model detail to high with negligible FPS loss. Framerates in the mission are now consistently above 30 (usually around 40-50) and even when they do drop to the low 30s it still feels smooth.

Prior to these betas, my performance in this particular mission left a lot to be desired, with framerates often dropping into the 20s and feeling quite stuttery.


Sweet - as general performance improvement indication - but I think it's not a fair comparison to compare to results from months ago in light of memory allocator.

At least IMO it says nothing about the memory allocator, just that between build X (months ago) and build Y (current build) improvements have been made,

which could also be system/setting etc related?

I suppose for memory allocator benching it would be great to compare various memory allocators, and compare with Beta version from before the adjusted memory allocators.

Edited by Sickboy


Note: the allocators are now available with full source code in the Community Wiki, each with a corresponding license. Beware: the allocators other than TBB 4 have not been updated for quite some time and would perhaps benefit from being brought up to a more recent version. If anyone wants to do this, I recommend first comparing each version provided by us against the version it is based upon to see what customizations and fixes we made in it, as it is possible they will need to be recreated in the recent version as well.

Sweet - as general performance improvement indication - but I think it's not a fair comparison to compare to results from months ago in light of memory allocator.

At least IMO it says nothing about the memory allocator, just that between build X (months ago) and build Y (current build) improvements have been made,

which could also be system/setting etc related?

I suppose for memory allocator benching it would be great to compare various memory allocators, and compare with Beta version from before the adjusted memory allocators.

My post was really just meant as a subjective comment on the overall performance improvements I'm seeing, not comparing any memory allocators as such. :)

I'll leave the scientific testing until someone defines a good process to follow (ideally automated). ;)

Sweet - as general performance improvement indication - but I think it's not a fair comparison to compare to results from months ago in light of memory allocator.

At least IMO it says nothing about the memory allocator, just that between build X (months ago) and build Y (current build) improvements have been made,

which could also be system/setting etc related?

I suppose for memory allocator benching it would be great to compare various memory allocators, and compare with Beta version from before the adjusted memory allocators.

It seems that ArmA benefits from TBB4 the most under conditions where lots of memory is required. My experience with TBB4 in Chernarus Warfare is similar. The difference between TBB3 and TBB4 is around 50 percent or more in favor of the latter. Similar scenarios in Takistan don't benefit that much. Memory usage in Takistan is usually around 1.2G, while in Chernarus it goes up to 1.5G very fast and further up to 1.8G. So the performance gain seems correlated with memory requirement, which is not that surprising because we are talking about a memory allocator here. Stability is also pretty good, no crashes here for a while. Thumbs up!

System is i7 920 | X58 Chipset | 6G 1600 MHz RAM | Samsung SSD | AMD 5870 GPU

I'll leave the scientific testing until someone defines a good process to follow (ideally automated). ;)
What about kju's benchmark suite, that seems to be excellent and already used for this purpose :D

What about kju's benchmark suite, that seems to be excellent and already used for this purpose :D

I wasn't aware that such a thing existed, but I guess I'll go look for it and try it out when I get the chance. :)


Another influencing factor could be the different memory management technologies.

Remember that the old CPUs (like mine) have an external memory controller in the northbridge,

while the new CPUs (i-series) have it integrated on the die. :)

I wasn't aware that such a thing existe
Shortcomings of forum threads in general, and of thread starters (only they can edit the first post) not being thorough enough, I'd say :D

(e.g. including recommended benchmark methods etc.)

Edited by Sickboy


I built tcmalloc_bi and got an out-of-memory error in Arma. Anyway, it was quite easy to create the DLL using Visual Studio 2010

- the project built a DLL file with a size of 184 KB, larger than the TBB versions.

Maybe I can set some optimization options in VS, though I'd have to look into that.

Anyway here's an excerpt of the arma2oa.RPT

== E:\Games\Bohemia Interactive\Expansion\beta\arma2oa.exe

== "E:\Games\Bohemia Interactive\Expansion\beta\arma2oa.exe" -nosplash -skipintro -cpucount=12 "-mod=expansion\beta;expansion\beta\expansion -malloc=TCMalloc_bi

=====================================================================

Exe timestamp: 2011/10/31 16:31:28

Current time: 2011/11/01 17:21:03

Version 1.59.85889

Allocator: E:\Games\Bohemia Interactive\Expansion\beta\dll\tcmalloc_bi.dll

Item str_disp_server_control listed twice

Cannot register unknown string STR_VERY_LARGE

...

...

Virtual memory total 4095 MB (4294836224 B)

Virtual memory free 2951 MB (3095097344 B)

Physical memory free 16251 MB (17041092608 B)

Page file free 15880 MB (16652148736 B)

Process working set 719 MB (754245632 B)

Process page file used 755 MB (792195072 B)

Longest free VM region: 2146865152 B

VM busy 1217036288 B (reserved 342503424 B, committed 874532864 B, mapped 47562752 B), free 3077799936 B

Small mapped regions: 8, size 36864 B

ErrorMessage: Out of memory (requested 3 KB).

footprint 408420352 KB.

pages 16384 KB.

... A lot of these errors listed as well

Link to 99c702d4 (Obj-224,206:724) not released

Link to 9966f292 (Obj-222,203:658) not released
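The memory lines in the RPT excerpt above all follow the same "label, n MB, (m B)" shape, so they can be checked mechanically. Below is a minimal C++ sketch of such a check. It is not a tool that ships with the game: the names MemoryLine, parseMemoryLine and isConsistent are made up for illustration, and it assumes the MB figure is the byte count divided by 2^20 and rounded down, which is what the figures in the excerpt suggest.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// One "<label> <n> MB (<m> B)" line from the RPT excerpt (hypothetical type).
struct MemoryLine {
    std::string label;        // e.g. "Virtual memory free"
    std::uint64_t megabytes;  // value printed before "MB"
    std::uint64_t bytes;      // value printed in parentheses
};

// Fills `out` and returns true when the line has the expected shape.
bool parseMemoryLine(const std::string &line, MemoryLine &out) {
    static const std::regex pattern(R"(^(.*\S)\s+(\d+) MB \((\d+) B\)\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    out.label = m[1].str();
    out.megabytes = std::stoull(m[2].str());
    out.bytes = std::stoull(m[3].str());
    return true;
}

// Assumption: MB = bytes / 2^20, rounded down.
bool isConsistent(const MemoryLine &l) {
    return l.bytes / (1024ull * 1024ull) == l.megabytes;
}
```

Fed the line "Virtual memory total 4095 MB (4294836224 B)", this reports a total just under 4 GiB, while lines without an MB figure, such as "Longest free VM region", are rejected by the parser.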

I read in another post that tcmalloc had been used previously.

My experience so far is that it's neck and neck between TBB v3 and v4...

Will try to build some of the other allocators from source given and test.

(JE Malloc from VS 2010 generates a dll around 454 kb...located in debug folder....maybe I have to set some VS options to strip it down....anyway this is fun)

---------- Post added at 06:52 PM ---------- Previous post was at 05:54 PM ----------

I got this output from VS 2010:

1>cl : Command line error D8016: '/ZI' and '/GL' command-line options are incompatible

Looks like the compiler (CL.EXE) command gets its options from some of my default VS settings?!

/c /ZI /nologo /W3 /WX- /Od /Oy- /GL /D WIN32 /D _DEBUG /D _WINDOWS /D _USRDLL /D NEDMALLOC_BI_EXPORTS /D _WINDLL /D _UNICODE /D UNICODE /Gm /RTC1 /MTd /GS /arch:SSE /fp:fast /Zc:wchar_t /Zc:forScope /Fo"Debug\\" /Fd"Debug\vc100.pdb" /Gd /TP /analyze- /errorReport:prompt

/GL Enables whole program optimization

/ZI Includes debug information in a program database compatible with Edit and Continue

Found the /ZI option and changed it to /Zi (generates complete debugging information)

And now the project build completes - dll size is 383 KB

BTW: I was looking at the export section that BI already made (it can be found in all the sources given at http://community.bistudio.com/wiki/ArmA_2:_Custom_Memory_Allocator)

extern "C" {
DLL_EXPORT size_t __stdcall MemTotalReserved() {return nedmalloc::VirtualReserved;}
DLL_EXPORT size_t __stdcall MemTotalCommitted() {return nedmalloc::VirtualReserved;}
DLL_EXPORT size_t __stdcall MemFlushCache(size_t size) {size_t before = nedmalloc::VirtualReserved;nedalloc::nedmalloc_trim(0);return before-nedmalloc::VirtualReserved;}
DLL_EXPORT void __stdcall MemFlushCacheAll() {nedalloc::nedmalloc_trim(0);}
DLL_EXPORT size_t __stdcall MemSize(void *mem) {int isforeign;return nedalloc::nedblksize(&isforeign,mem);}
DLL_EXPORT void *__stdcall MemAlloc(size_t size) {return nedalloc::nedmalloc(size);}
DLL_EXPORT void __stdcall MemFree(void *mem) {nedalloc::nedfree(mem);}
// DLL_EXPORT __stdcall void *MemResize(void *mem, size_t size) {return moz_expand(mem,size);} // TODO: consider implementing expand?

This is a nice hint for those who want to roll their own implementation -

Using BI's modified project sources I'm pretty sure that given some time it would be possible to check what to look for and possibly modify in a "3rd" party malloc implementation.

So maybe the Hoard is coming in over the horizon...
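To make the interface quoted above more concrete, here is a minimal sketch of a custom allocator DLL that simply forwards to the C runtime's malloc/free. The seven export names and their signatures are copied from the excerpt; everything else (the Header struct, the g_inUse counter, the MEM_CALL macro, and reporting the same number for reserved and committed memory) is an assumption made for illustration and is not BI's code. What the engine really expects from each function should be checked against the wiki page.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
#define MEM_CALL __stdcall
#else  // lets the sketch compile on other platforms for testing
#define DLL_EXPORT
#define MEM_CALL
#endif

namespace {
// Size header placed in front of every block so MemSize can answer later.
// The alignment keeps the pointer handed to the caller suitably aligned.
struct alignas(std::max_align_t) Header {
    std::size_t size;
};
std::atomic<std::size_t> g_inUse{0};  // bytes currently handed out
}  // namespace

extern "C" {
// Assumption: this sketch has no separate reserve/commit stages,
// so both counters report the bytes currently in use.
DLL_EXPORT std::size_t MEM_CALL MemTotalReserved() { return g_inUse.load(); }
DLL_EXPORT std::size_t MEM_CALL MemTotalCommitted() { return g_inUse.load(); }

// Nothing is cached here, so there is never anything to release.
DLL_EXPORT std::size_t MEM_CALL MemFlushCache(std::size_t) { return 0; }
DLL_EXPORT void MEM_CALL MemFlushCacheAll() {}

DLL_EXPORT std::size_t MEM_CALL MemSize(void *mem) {
    return mem ? (static_cast<Header *>(mem) - 1)->size : 0;
}

DLL_EXPORT void *MEM_CALL MemAlloc(std::size_t size) {
    if (size > SIZE_MAX - sizeof(Header)) return nullptr;  // overflow guard
    Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + size));
    if (!h) return nullptr;
    h->size = size;
    g_inUse += size;
    return h + 1;  // caller sees the memory right after the header
}

DLL_EXPORT void MEM_CALL MemFree(void *mem) {
    if (!mem) return;
    Header *h = static_cast<Header *>(mem) - 1;
    g_inUse -= h->size;
    std::free(h);
}
}  // extern "C"
```

A real third-party allocator would replace std::malloc/std::free and would normally provide its own size query, making the header unnecessary.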

Edited by DBGB
Added line -fixed some typos

(JE Malloc from VS 2010 generates a dll around 454 kb...located in debug folder....maybe I have to set some VS options to strip it down....anyway this is fun)

You should be building a release version, not a debug build. :)

Right click on the project in VS and select "configuration manager", then switch to the release configuration. Then build.

You should be building a release version, not a debug build. :)

Right click on the project in VS and select "configuration manager", then switch to the release configuration. Then build.

JEMalloc_bi dll size reduced to 58 KB from 484 KB

TCMalloc_bi dll only 36 KB from 184 KB

NedMalloc_bi is down from 383 to 80 KB

Thx ;-)

(Have no clue if the above is super optimized... or if I can set some other options I don't know about yet)

I wonder if the debug versions contained all kinds of debug symbols and other unused stuff that impacted the performance I saw when testing the different 'debug' builds.
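As a small aid for the debug-versus-release question, a DLL can report how it was compiled. This is only a sketch with a made-up helper name (buildFlavor), not part of BI's interface. It relies on _DEBUG, which appears as /D _DEBUG in the compiler command line quoted earlier and is defined by MSVC debug configurations, and on the standard NDEBUG macro that release configurations usually define.

```cpp
// Hypothetical helper: tells which configuration this file was built in.
// Anything without NDEBUG, or with _DEBUG, is treated as a debug build.
const char *buildFlavor() {
#if defined(_DEBUG) || !defined(NDEBUG)
    return "debug";
#else
    return "release";
#endif
}
```

Logging this string once at load time would make it obvious when a debug DLL has ended up in a benchmark run.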

BTW: SW license wise - is it legal to distribute the above DLLs without the source (with or without VS project files)? For instance, if somebody can't figure out how to build in VS or anywhere else, can I compile the DLL and send it / post it somewhere without risking a violation of the SW license for the given source code (GPL vs Boost license vs etc.)?

---------- Post added at 10:57 PM ---------- Previous post was at 10:37 PM ----------

Haven't tested the release builds of my malloc builds from previous post...

Got curious when I saw that BI also had provided the source code for TBB4 - That code is obviously different from what's available here :

http://threadingbuildingblocks.org/ver.php?fid=174

But could provide some insight on how to modify / export the functions from other malloc implementations when adapting it to the interface described in BI's malloc wiki.

I wonder if BI's implementation is from the latest code commit from threadingbuildingblocks.org, since there are differences.

I guess the intel http web download link could be old - maybe somewhere there is a newer repository (subversion/github link please) :-)

Well - I'm going to try to build from the TBB site. I recommend using something like BeyondCompare to adapt/modify source when doing this in windows - and you know almost nothing about programming...

Found this post "How does TBB load balance between muti-cores" on a TBB forum : http://software.intel.com/en-us/forums/showthread.php?t=86049&o=a&s=lr It reads to me as if there's better NUMA awareness using the QuickThreading paradigm - Comparison link between TBB and QT http://www.quickthreadprogramming.com/Comparative%20analysis%20between%20QuickThread%20and%20Intel%20Threading%20Building%20Blocks%20009.htm

Edited by DBGB
Added link ;-)


The available memory allocator sources are now available on GitHub!

https://github.com/sickboy/bis-memory_allocators

Feel free to fork, share, create pull requests for applying improvements, etc.

If there are improvements and license permits it, BI could be interested in including the allocator with the game.

Information:

Recommended Windows Clients

Will add to BIKI shortly.

Edited by Sickboy


I won't have time to look into messing around with any 'new' malloc implementations during the weekend. I'm traveling from tomorrow but...

But one hint regarding the game engine's interface.

It looks like TBB3, which was mentioned as being used as the default memory allocator in the engine - the interface specification from the BI wiki is taken directly from tbbmalloc.cpp - line 216 to 223

#ifdef _WIN32

#define DLL_EXPORT __declspec(dllexport)
extern "C" {
 DLL_EXPORT size_t __stdcall MemTotalCommitted() {return scalable_footprint();}
 DLL_EXPORT size_t __stdcall MemTotalReserved() {return scalable_footprint();}
 DLL_EXPORT size_t __stdcall MemFlushCache(size_t size) {return scalable_trim(size);}
 DLL_EXPORT void __stdcall  MemFlushCacheAll() {scalable_trim((size_t)-1);}
 DLL_EXPORT size_t __stdcall MemSize(void *mem) {return scalable_msize(mem);}
 DLL_EXPORT void * __stdcall MemAlloc(size_t size) {return scalable_malloc(size);}
 DLL_EXPORT void  __stdcall MemFree(void *mem) {scalable_free(mem);}

So basically, from my perspective, it's necessary to figure out for MemTotalCommitted() what type is returned and what arguments (pointer/object/struct ref) the function accepts....

  DLL_EXPORT size_t __stdcall MemTotalCommitted() {return scalable_footprint();}

Points to scalable_footprint() - which again looks like it's 'templated' and is really the 'overloaded'?? function internal_footprint, which is really MappedMemory

So it's a bit tricky for me to figure out atm how I can convert other malloc implementations' function calls to the TBB3 interface.

I know I need to look up TBB3, it seems - then figure out whether my "custom malloc" implementation has a single function or several functions combined that do what TBB3 does - figure out how I can call that/these functions, what kind of data they return, and maybe typecast them into something that TBB3 accepts.
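One interface call may have to be built from several calls of the other allocator. The nedmalloc-based MemFlushCache quoted earlier in the thread does exactly that: it reads the reserved size, trims, and returns the difference. The sketch below shows the same pattern in isolation. ToyBackend is a made-up stand-in for a third-party allocator (fixed 64-byte blocks, with freed blocks kept in a cache), and flushCache is a hypothetical adapter name; neither comes from BI's or TBB's sources.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical third-party allocator: hands out fixed-size blocks and
// keeps released blocks in a cache until trim() is called.
class ToyBackend {
public:
    static constexpr std::size_t kBlockSize = 64;

    void *alloc() {
        if (!cache_.empty()) {  // reuse a cached block first
            void *p = cache_.back();
            cache_.pop_back();
            return p;
        }
        void *p = std::malloc(kBlockSize);
        if (p) reserved_ += kBlockSize;
        return p;
    }
    void release(void *p) {
        if (p) cache_.push_back(p);
    }
    std::size_t footprint() const { return reserved_; }
    void trim() {  // give all cached blocks back to the system
        for (void *p : cache_) {
            std::free(p);
            reserved_ -= kBlockSize;
        }
        cache_.clear();
    }
    ~ToyBackend() { trim(); }  // blocks still in use stay with their owner

private:
    std::vector<void *> cache_;
    std::size_t reserved_ = 0;
};

// Adapter in the style of MemFlushCache: combines footprint() and trim()
// to report how many bytes were actually released.
std::size_t flushCache(ToyBackend &backend) {
    const std::size_t before = backend.footprint();
    backend.trim();
    return before - backend.footprint();
}
```

With a block allocated and released again, flushCache returns 64 and the footprint drops to 0; while the block is still in use it returns 0.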

But nevertheless it's fun to look into - I got some colleagues at work who have given me some directions - although they didn't really understand my motivation for creating a 'custom' dll.

Hope that the community will start to look into this as well...

I definitely need to read up on TBB3 (is that pthreads?) tonight...

It looks like TBB3, which was mentioned as being used as the default memory allocator in the engine - the interface specification from the BI wiki is taken directly from tbbmalloc.cpp - line 216 to 223

No such interface exists in TBB; this is the part which was modified by us. Remember the TBB3 sources are modified; if you want to check their original state, you need to download them directly from the TBB site (link also provided in the Community Wiki).

As for scalable_footprint, internal_footprint and MappedMemory, those are also our additions.


Suma you're right

Noob error: I hadn't downloaded the TBB3 source files (tbb30_20110427oss_src.tgz), only tbb30_20110427oss_win

So I only searched in the src dir of BI's TBB3_source and missed that this dir was missing from tbb30_20110427oss_win...

Now I can see the modifications...thx

The available memory allocator sources are now available on GitHub!

https://github.com/sickboy/bis-memory_allocators

Feel free to fork, share, create pull requests for applying improvements, etc.

If there are improvements and license permits it, BI could be interested in including the allocator with the game.

Information:

Recommended Windows Clients

Will add to BIKI shortly.

Recreated the repo with proper history by using the correct source file versions, and applied the changes now step by step.

Original -> BI diffs


Academic Discussion

NUMA_aware_heap_memory_manager_article_final.pdf

Source code based on google-perftools-0.97 is provided on the PDF's second-to-last page, incl. diff

Source code link provided here as well: http://developer.amd.com/Assets/NUMA-aware%20TCMalloc.zip

Update:

I posted a link to a comparison between TBB4 and Quickthreading in a previous post here : http://forums.bistudio.com/showpost.php?p=2049014&postcount=66 - Quickthreading is apparently building on what's described here: New NUMA Support with Windows Server 2008 R2 and Windows 7

Some MSDN example code is given here : Win7NumaSamples.zip

Edited by DBGB
Added detail about quickthreading and msdn archive


Benchmark Results (E08, Beta 85876, no Mods)

Allocator    1st Run    2nd Run    Comments (for Run 1 of 2)
tbb4         36         36         smooth
tbb3         36         35         slight Object Plopping
Windows      35         35         more Object Plopping

-------------------------------------------------------------------------

Benchmark Results (E08, Beta 85889, No Mods)

Same Results as with 85876

-------------------------------------------------------------------------

Benchmark Results (E08, Beta 86055, No Mods)

Allocator    1st Run    2nd Run    Comments (for Run 1 of 2)
tbb4         43         43         smooth
tbb3         43         43         slight Object / LOD Plopping
jemalloc     42         42         s.a.
nedmalloc    41         42         s.a.
tcmalloc     42         X          Out of Memory Error in 2nd Run
Windows      42         42         more Object / LOD Plopping

-----------------------------------------------------------------------
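For anyone who wants to repeat such runs more often, averaging per allocator is easy to automate. The sketch below is only an illustration: AllocatorResult, meanFps and beta86055 are made-up names, and it assumes the figures above are the average FPS reported by the E08 benchmark. The data is copied from the Beta 86055 results; tcmalloc has a single entry because its second run ended with an out-of-memory error.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Results of one allocator (hypothetical type for this sketch).
struct AllocatorResult {
    std::string name;
    std::vector<double> fps;  // one entry per completed run
};

// Mean FPS over the completed runs; 0 if no run finished.
double meanFps(const AllocatorResult &r) {
    if (r.fps.empty()) return 0.0;
    return std::accumulate(r.fps.begin(), r.fps.end(), 0.0) / r.fps.size();
}

// Beta 86055 figures as posted above.
std::vector<AllocatorResult> beta86055() {
    return {
        {"tbb4", {43, 43}},
        {"tbb3", {43, 43}},
        {"jemalloc", {42, 42}},
        {"nedmalloc", {41, 42}},
        {"tcmalloc", {42}},  // 2nd run crashed, so only one value
        {"Windows", {42, 42}},
    };
}
```

With only two runs per allocator and a spread of one or two FPS between them, the means say little on their own; more runs per allocator would be needed before ranking them.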

Testing Environment :

Sys :

OS : Win7-64 Home Premium

CPU : C2Q Q9650 @ 4 GHz (FSB 445)

GPU : ASUS HD 5870 @ Stock Clocks

RAM : 4 GB Mushkin 996599 @ 890 MHz ( FSB - RAM 1:1) @ 5-5-5-15

MB : DFI LP DK P45-T2RS

HDD : 2 x Seagate Barracuda 500GB @ RAID 0

PSU : Silverstone Strider Plus 750W

Drivers : all the latest (GPU, Chipset etc.)

Ingame Settings :

VD : 2000

Res : 1920 x 1080

others : all very high except VRAM (Default), AA (Normal), PP (low). Vsync on

Config : AToC = 0, GPU Rendered & Detected Frames = 1

Personal Conclusion :

Dunno what's going on, but Beta 86055 gives me better performance than 85876 and 85889.

Nothing has been changed on my system. For the best overall appearance I have to retest with the two "tbb" mallocs.

The new mallocs don't perform very well here. Tcmalloc caused a CTD with an Out of Memory error (RPT and .bidmp sent to Dwarden).


Tested these new allocators (Q6600 @ 3.65 GHz, GTX285, 4 GB RAM, WD Caviar) on benchmark E08.

My graphics settings, in order, are set to:

(Normal, Very high, High, Disabled, Very low, Very low, High, Normal, Disabled)

Tbb4 = 59 FPS

Nedmalloc = 62 FPS

Jemalloc = 56 FPS

Tcmalloc = 55 FPS
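To compare single results like these against one baseline, a relative difference is more telling than the raw numbers. The helper below is a sketch with a made-up name (percentVsBaseline); choosing Tbb4 as the baseline is an arbitrary assumption for the example.

```cpp
// Percent change of `fps` relative to `baseline` (positive means faster).
// Hypothetical helper; the caller must pass a baseline greater than zero.
double percentVsBaseline(double fps, double baseline) {
    return (fps - baseline) / baseline * 100.0;
}
```

Against Tbb4 at 59 FPS this gives roughly +5.1% for Nedmalloc (62), -5.1% for Jemalloc (56) and -6.8% for Tcmalloc (55). These are single runs, so differences of this size may well be within run-to-run variation.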

