Dwarden

Custom Memory Allocator for engine since b85869


Since Arma 2 Operation Arrowhead build 85869 (1.60 beta) it is possible to provide custom memory allocators for the game.

The memory allocator is a very important component that significantly affects both the performance and the stability of the game.

The purpose of this customization is to allow the allocator to be developed independently of the application, allowing both Bohemia Interactive and the community to fix bugs and improve performance without having to modify the core game files.

READ more @ BIKI: http://community.bistudio.com/wiki/ArmA_2:_Custom_Memory_Allocator

This is a follow-up to the memalloc testing in earlier betas: http://forums.bistudio.com/showthread.php?t=121455

Note: this thread might be moved elsewhere when the beta is over; for now, feel free to discuss this news here ...


Uhm, yeah, interesting. Would you mind translating this into a "programming for dummies" language?

Don't get me wrong, I really appreciate every improvement. It's just that it tells me nothing about whether and how I could benefit from it. Blame it on my stupidity. :D


I suggest reading this: http://en.wikipedia.org/wiki/Memory_management

and some other material about memory allocators.

Advantages of this approach:

- You may write your own allocator for the engine

- You may alter existing allocators for the engine and update them any time you see fit

Another plus is the ability to use allocators which we don't use ourselves for various reasons

(e.g. they impose licensing or rules we can't adopt, while still allowing free usage for home users, or they are too complicated, and so on).

list of some allocators for experimenting:

HOARD: http://plasma.cs.umass.edu/emery/hoard ( http://plasma.cs.umass.edu/emery/licensing-hoard )

Edited by Dwarden


Nice... I guess...

This would really be a firstmover thing AFAIBelieve.

That is, if the end user could supply a command-line argument that would enable the use of 'external' GPL/commercial malloc implementations optimized for his or her specific CPU/NUMA/RAM environment/topology.

So basically I dream that BI could let the core engine hook into whatever 'malloc' implementation the end user wanted to use...

Is this the idea ?

[Would be nice if the community or BI could supply a script that would run through a benchmark that would help the enduser choose the malloc giving the best performance]

Edited by DBGB
Update:

It's already possible, check today's beta build yourself :)

Read the BIKI page.

By default you can use either the Windows memalloc (erase all other allocator DLLs from the \dll\ directory)

or choose to use the Intel TBB 3.0 or Intel TBB 4.0 allocators (which are included).
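For anyone who would rather not delete files, the other DLLs can be moved aside instead. Below is a minimal Python sketch of that idea. The function name and the backup folder are made up for illustration, and it only shuffles files around: whether the engine then picks up a single remaining DLL on its own, or still needs a command-line option, is not something this thread confirms.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def keep_only_allocator(dll_dir: Path, backup_dir: Path, keep: Optional[str]) -> List[str]:
    """Move every *.dll except `keep` out of dll_dir into backup_dir.

    keep=None moves all of them, leaving the folder without allocator DLLs.
    Returns the file names that were moved, in sorted order.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for dll in sorted(dll_dir.glob("*.dll")):
        if keep is not None and dll.name.lower() == keep.lower():
            continue  # this is the allocator we want to leave in place
        shutil.move(str(dll), str(backup_dir / dll.name))
        moved.append(dll.name)
    return moved


if __name__ == "__main__":
    # Demo on a throwaway folder; point it at the real dll folder yourself.
    with tempfile.TemporaryDirectory() as tmp:
        dll_dir = Path(tmp) / "dll"
        dll_dir.mkdir()
        for name in ("tbb3malloc_bi.dll", "tbb4malloc_bi.dll"):
            (dll_dir / name).write_bytes(b"")
        print(keep_only_allocator(dll_dir, Path(tmp) / "dll_backup", "tbb4malloc_bi.dll"))
```

Moving the files back from the backup folder restores the original set.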

Myke wrote: Would you mind translating this into a "programming for dummies" language?

Just think you need to fill or clean a room with solid objects.

And when you put something in the room, you can't move it later.

And you need to fill it up using every little bit of free space.

And when you remove a lot of little things, you may need to find space for a huge thing.

You may invent ideas like putting big things on the right and little ones on the left.

Or put a thing of size "2" next to two things of size "1" each, tightly packed.

Or put everything everywhere without care, starting from the door.

Or put blue stuff on the floor and red stuff hanging from the ceiling.

Or put important things on the front and never-used stuff on the back.

Or lay things aligned on black tiles and other on white ones.

The important thing is to have enough free space, in a shape that fits, when that big object suddenly arrives and has to go into the room.

The room is your RAM, the objects are packs of bytes, the ideas are the allocators.
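The room picture can be played with in a few lines of Python. This is only a toy sketch of one possible "idea" (first fit: put each object into the first gap that is big enough, starting from the door). The `Room` class and its method names are invented for illustration and have nothing to do with how the engine or any real allocator is written.

```python
class Room:
    """A toy fixed-size memory 'room' managed with a first-fit policy."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}  # start offset -> length of each placed object

    def _gaps(self):
        """Yield (start, length) for every free gap, from the door onwards."""
        cursor = 0
        for start in sorted(self.blocks):
            if start > cursor:
                yield cursor, start - cursor
            cursor = start + self.blocks[start]
        if cursor < self.size:
            yield cursor, self.size - cursor

    def put(self, length):
        """Place an object in the first gap that fits; return its offset or None."""
        for start, gap in self._gaps():
            if gap >= length:
                self.blocks[start] = length
                return start
        return None  # no single gap is big enough

    def remove(self, start):
        """Take out the object that was placed at offset `start`."""
        del self.blocks[start]

    def free_total(self):
        return sum(gap for _, gap in self._gaps())

    def largest_gap(self):
        return max((gap for _, gap in self._gaps()), default=0)


if __name__ == "__main__":
    room = Room(8)
    offsets = [room.put(1) for _ in range(8)]  # fill the room with small things
    for start in offsets[1::2]:                # remove every second one
        room.remove(start)
    # Half the room is free, yet an object of size 2 does not fit anywhere.
    print(room.free_total(), room.largest_gap(), room.put(2))
```

Half the room being free while a size-2 object still cannot be placed is the fragmentation problem that different allocators try to limit in different ways.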


Browsed through the links presented above - (interesting Hoard results).

Seems some testing is imminent with this beta.

I wonder if the Intel implementation will invoke some kind of artificial throttling, or 'miss' some optimizations based on CPU architecture (I have a quad-socket quad-core AMD Opteron setup in NUMA mode)....

Intel compiler controversy: http://www.agner.org/optimize/blog/read.php?i=49

That's why it would be NiceToHave some kind of malloc plugin benchmark interface that would help the casual user to decide the best malloc option to use (maybe even provide a compile option for 'custom' implementations)

Edited by DBGB
Updated with a link:


The whole thing is a bit over my head but I'm hopeful that more skilled individuals will make good use of this. :D


Just tested with the winhoard.dll x64 downloaded from here : http://plasma.cs.umass.edu/emery/download-hoard

I just made a backup of the dll folder (with tbb3malloc_bi and tbb4malloc_bi) and kept only the winhoard.dll file there.

Is this the correct way to test, or do I still need to specify the malloc option on the command line? Please confirm.

Anyway, I tested Benchmark 2 (on Chernarus) and got a "to many virtual blocks allocated" error.

It should be noted that I had this in my GFX options (.ArmA2OAProfile):

version=2;
blood=1;
singleVoice=0;
shadingQuality=100;
shadowQuality=4;
maxSamplesPlayed=80;
anisoFilter=4;
TexQuality=3;
TexMemory=4;
...
sceneComplexity=1000000;
viewDistance=10000.001;
terrainGrid=6.25;

And this as my commandline:

Bohemia Interactive\Expansion\beta\arma2oa.exe" -nosplash -skipintro -cpucount=12 "-mod=expansion\beta;expansion\beta\expansion;@CBA;@ACE;@ACEX;@ACEX_USNavy;@ACEX_SM;@ACEX_RU

My experience: LOOOOOL - everything ran at less than one frame per second in the beginning, but later the frames and the sounds of shots began to come into sync, so it sounded like a drummer on a slave galley gradually increasing his BPM as the number of objects shown/calculated in the scene became smaller and smaller.... I think I watched the benchmark for a few minutes...thinking daaaamn...this is slowmo...but getting better...and better...and...

.....WHAM CTD with this new wonderful error message....

I'm gonna test with the other mallocs and 'regular' no ACE commandline.

And maybe a bit less ambitious viewdistance ;-)

2nd update: Funny thing happened when testing tbb4malloc_bi.dll

Note: I didn't change any gfx/cmd options -

The benchmark initially ran just as slowly as with winhoard.dll, but gradually the gunshots/shell shots began to get in sync with the framerate and got faster and faster until it actually turned into something that felt like a few frames per second...

Now the funny part....This benchmark run didn't crash - it also never ended... I alt-tabbed out to write this.

The camera just stops some time after flying over the control tower, and some AI Shilkas go crazy on the flying targets, which sometimes circle into view (or stay out of the scene, who can tell).

This is probably related to other beta (trigger) changes... but definitely a difference between the two malloc dll's so far.

3rd update

Ahh the benchmark is about to end... I was just impatient... the screen is fading to black....I'm waiting for the FPS score....will alt tab back again in a minute...10 maybe to tell result ;-) Nevermind...it must be less than 1 FPS..

Will now test the default malloc (empty dll folder)....1..2...3

4th update:

Default malloc (empty dll folder) - crashes to desktop with the same "to many virtual blocks allocated" error. It only seems to get about halfway into the benchmark.

5th update:

Reset GFX to default in options (VD=2400) and used winhoard.dll - no crash, but the benchmark still never ends or shows an FPS score - will now try without the ACE cmdline.

Haven't paid attention to any cpu core affinity issues - but let the engine use 12 out of 16 cores on my rig in every benchmark.

6th update: Wooohooo got 8 FPS with VD=2400 and winhoard.dll - will make a table - give me 20 min.

7th update: Ran both Intel/BI 'beta' memory allocators and winhoard, plus an empty dll folder, on Benchmark 2, two runs each -> all hovered at 8 or 9 FPS... default gfx options, 1600x900 + VD=2400. (Radeon 5800+, latest drivers, Server 2008 R2 x64)

So maybe I'm doing it wrong ? Or the benchmark / malloc / engine options combo won't show any big miracles.
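A small helper for tabulating runs like these might look as follows. It is only a sketch in Python: the function names are invented, and the numbers in the demo at the bottom are placeholders to show the output format, not measurements from this thread.

```python
from statistics import mean


def summarize(runs):
    """Turn {allocator name: [FPS per run]} into rows sorted by mean FPS.

    Each row is (name, number of runs, mean, min, max). Allocators with
    no completed runs are skipped.
    """
    rows = []
    for name, fps in runs.items():
        if not fps:
            continue
        rows.append((name, len(fps), mean(fps), min(fps), max(fps)))
    rows.sort(key=lambda row: row[2], reverse=True)  # best mean first
    return rows


def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'allocator':<20}{'runs':>5}{'mean':>8}{'min':>8}{'max':>8}"]
    for name, n, avg, lo, hi in rows:
        lines.append(f"{name:<20}{n:>5}{avg:>8.1f}{lo:>8.1f}{hi:>8.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only, chosen to show the layout.
    example = {"allocator_a": [10, 20], "allocator_b": [30], "crashed": []}
    print(format_table(summarize(example)))
```

With only two runs per allocator the min/max spread matters as much as the mean, so differences of a frame or so should not be read as a real ranking.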

Edited by DBGB
Details added (+ 2nd benchmark run data) + minor update


Mission designers starting to code memory allocators...

The end of the world is near...

Fun times :)

Just tested with the winhoard.dll x64 downloaded from here : http://plasma.cs.umass.edu/emery/download-hoard

I just made a backup of the dll folder (with tbb3malloc_bi and tbb4malloc_bi) and kept only the winhoard.dll file there.

Is this the correct way to test, or do I still need to specify the malloc option on the command line?

This will not work, because the winhoard.dll does not conform to the interface needed for a custom allocator as specified in the Wiki, and therefore it will not be used. It should be possible to create a custom allocator based on Hoard, but it involves a bit more work.

ok - Downloaded the windows source for winhoard -> hoard-38.zip

Looking into the BI wikipage : http://community.bistudio.com/wiki/ArmA_2:_Custom_Memory_Allocator

Take the 'MemTotalReserved()'

- Total memory reserved by the allocator (should correspond to VirtualAlloc with MEM_RESERVE)

I find this function call in the hoard source in the two header files:

mmapheap.h

mmapwrapper.h

And in a C file - sbrk.c

Since I'm such a novice at programming (anything), I'd like the community to help modify the hoard-38.zip source to conform to the DLL interface required by the game engine.

I have Visual Studio so I should be able to figure out how to make/build/compile from the sources.

But atm - it's too much work for me to figure this out on my own...I think ;-)


One thing you might want to consider: it's not possible to call a 64bit DLL from a 32bit executable.
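A quick way to check which kind of DLL you have is to read the machine type from its PE header. The sketch below does that in Python using only the documented header layout (the 4-byte offset stored at 0x3C, the "PE\0\0" signature, then the 2-byte Machine field). The function names are made up for illustration, and only the two common machine values are recognised.

```python
import struct
from pathlib import Path

# Machine field values from the PE/COFF format.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE image given its raw bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    if len(data) < pe_offset + 6:
        raise ValueError("file truncated before the Machine field")
    # The Machine field is the first 2 bytes after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def describe_dll(path) -> str:
    """Read a DLL or EXE from disk and describe its bitness."""
    return MACHINE_NAMES.get(pe_machine(Path(path).read_bytes()), "unknown")
```

Running `describe_dll` on both the game executable and a candidate allocator DLL shows whether they match before any benchmarking is attempted.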


A few questions:

  • Should we test tbb4malloc_bi.dll?
  • What is different to tbb3malloc_bi.dll?
  • Should we test some other, like the open source ones - which ones?
  • How to test? What to look for in testing?
  • Prio is stability, performance and then memory usage?

Thanks :bounce3:


Would be easier if people can try it with a good test/benchmark mission where they can see or feel the differences. Something simple and userfriendly. :)

One thing you might want to consider: it's not possible to call a 64bit DLL from a 32bit executable.

That is what got me interested when I read the title the first time (with limited knowledge about memory allocators, that is).

Props to BIS for making this move, but some more information and guidelines would be appreciated (see kju's post), even with a lot of very knowledgeable guys around here...


After some googling I've discovered that the "TBB" in TBB3/4 stands for Threading Building Blocks. Never heard of that before.

TBB4 seems to be quite new (available since the beginning of September).

TBB4: http://threadingbuildingblocks.org/whatsnew.php

I guess BI may or may not have a license for the commercial version (Intel resource link: http://software.intel.com/en-us/articles/intel-tbb/).

But apparently BI can distribute a version of TBB 3 and 4 under GPLv2 + RE, or maybe they have a license for the 'commercial' TBB.

Anyway, open source or not, I'm quite interested in (alternative) implementations that focus on optimizing code paths for multicore NUMA systems.

Dreaming again : Would be snazzy to have the core engine compiled specifically for the code-path that will give the optimal (parallel) execution flow - by the press of a button ;-) and a fallback default codepath.

Update - Maybe nedmalloc should be my focus point instead of hoard

http://www.nedprod.com/programs/portable/nedmalloc/

ArmA_2:_Custom_Memory_Allocator

tbb3malloc_bi - based on Intel TBB 3, distributed under GPL v2 + RE

tbb4malloc_bi - based on Intel TBB 4, distributed under GPL v2 + RE

jemalloc_bi - not available yet, based on JEMalloc, distributed under BSD-derived license

tcmalloc_bi - not available yet, based on TCMalloc, distributed under New BSD license

nedmalloc_bi - not available yet, based on NedMalloc, distributed under Boost Software License

customMalloc_bi - not provided, feel free to plug-in your own

It looks like BI will provide the above list, except for the last of course....

:-D So maybe I should just be patient.....

Edited by DBGB
Forget about hoard if nedmalloc claim is true


Dunno if it's related to those memory allocators, but performance dropped significantly with the latest beta. I'm on Win7 x64.

Perhaps add some useful details? http://dev-heaven.net/projects/cis/wiki/CIT#How-to-report-a-bug

Like what is significantly?

Like where/when/how do you notice this?

Like what are your startup parameters?

Like what are the mods?

...

Half of that is available by simply attaching your RPT file.

Edited by Sickboy

Dunno if it's related to those memory allocators, but performance dropped significantly with the latest beta. I'm on Win7 x64.

Yes, you are right. Performance is not as good as before.

And just one smokeshell will halve your FPS or even worse (though not related and not new, present since A1, and maybe since OFP).

And who cares about bugs like this:

http://dev-heaven.net/issues/17458

A3 will fix all of this. Or not ?

Xeno

After some googling I've discovered that the "TBB" in TBB3/4 stands for Threading Building Blocks. Never heard of that before.

TBB4 seems to be quite new (available since the beginning of September).

wonder why googling was involved when it's linked from the BIKI page listed in first thread :confused:

---------- Post added at 02:11 ---------- Previous post was at 02:08 ----------

A few questions:

  • Should we test tbb4malloc_bi.dll?
  • What is different to tbb3malloc_bi.dll?
  • Should we test some other, like the open source ones - which ones?
  • How to test? What to look for in testing?
  • Prio is stability, performance and then memory usage?

Thanks :bounce3:

as You can read in TBB4 news and details

http://threadingbuildingblocks.org/whatsnew.php

http://threadingbuildingblocks.org/features&benefits.php

the memalloc was further improved, hence I suggest you try v4 to see if it fits better (TBB3 is the default now)...

In my experience, some other projects (which I can't talk about, as they are not ours anyway) switched to v4 and it works better for them...

but this of course may differ due to many variables, and that's why the modular approach is the best of all worlds :)

---------- Post added at 02:16 ---------- Previous post was at 02:11 ----------

Update - Maybe nedmalloc should be my focus point instead of hoard

http://www.nedprod.com/programs/portable/nedmalloc/

It looks like BI will provide the above list, except for the last of course....

:-D So maybe I should just be patient.....

keep in mind what Suma posted here:

http://forums.bistudio.com/showpost.php?p=2044253&postcount=92

We have used several memory allocators written by experts, including TCMalloc, NedMalloc and jemalloc. Most of them broke down under the ArmA 2 load, causing memory corruption or crashes.

Yes, you are right. Performance is not as good as before.

And just one smokeshell will halve your FPS or even worse (though not related and not new, present since A1, and maybe since OFP).

Xeno

I don't have this. I'm running the newest beta with -malloc=tbb4malloc_bi.

View distance 3500, on Chernarus: 60 FPS in most parts. If I drop one smoke,

there is no FPS drop, maybe sometimes 1 frame less.

If I throw 9 smokes, then it goes down to 40 FPS, but that's normal, and for me the drop was smaller than in older betas.

I will test more, but for me it's running better.

Can you implement that in Take On Helicopters? It would also help to get better performance there.
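For comparing allocators with otherwise identical settings, the launch arguments can be generated rather than typed by hand. This Python sketch only builds argument lists; it relies on the -malloc= option as used in this post and on the -nosplash and -skipintro switches that appear earlier in the thread. The helper names and the bare "arma2oa.exe" path are placeholders.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def launch_args(exe: str, allocator: Optional[str] = None,
                extra: Sequence[str] = ()) -> List[str]:
    """Build an argument list, adding -malloc=<name> only when one is given."""
    args = [exe]
    if allocator:
        args.append("-malloc=" + allocator)
    args.extend(extra)
    return args


def comparison_runs(exe: str, allocators: Iterable[str],
                    extra: Sequence[str] = ()) -> Dict[str, List[str]]:
    """One argument list per allocator, all sharing the same extra switches."""
    return {name: launch_args(exe, name, extra) for name in allocators}


if __name__ == "__main__":
    runs = comparison_runs("arma2oa.exe",
                           ["tbb3malloc_bi", "tbb4malloc_bi"],
                           ["-nosplash", "-skipintro"])
    for name, args in runs.items():
        print(name, args)
```

Each list can be handed to `subprocess.run` as is; keeping the extra switches identical across runs is what makes the FPS numbers comparable.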

Yes, you are right. Performance is not as good as before.

And just one smokeshell will halve your FPS or even worse (though not related and not new, present since A1, and maybe since OFP).

And who cares about bugs like this:

http://dev-heaven.net/issues/17458

A3 will fix all of this. Or not ?

Xeno

Wow someone is in a very bad mood today. :p

wonder why googling was involved when it's linked from the BIKI page listed in first thread :confused:

I guess I just missed the fact that it was a link. :D

So I wonder which of these memory allocators was #3 - the "winner" of the malloc poll.
