Custom Memory Allocator for engine since b85869


167 replies to this topic
Thread Starter
Dwarden

    BI Developer

  • 9654 posts

  • Joined: 05-March 2002
  • Location: Brno, Czech Republic

Posted 27 October 2011 - 12:34 #1

Since Arma 2 Operation Arrowhead build 85869 (1.60 beta) it is possible to provide custom memory allocators for the game.
The memory allocator is a very important component, which significantly affects both the performance and the stability of the game.
The purpose of this customization is to allow the allocator to be developed independently of the application,
allowing both Bohemia Interactive and the community to fix bugs and improve performance without having to modify the core game files.

READ more @ BIKI: http://community.bis...emory_Allocator


This is a follow-up to the memalloc testing in earlier betas: http://forums.bistud...ad.php?t=121455

Note: this thread might be moved elsewhere when the beta is over. For now, feel free to discuss this news here ...

RealTimeChat ~ARMA2 in Your browser (w/o Java), RealTimeChat ~ARMA3 in Your browser (w/o Java),
irc.GameSurge.net/ARMA2 (external IRC clients) irc.GameSurge.net/ARMA3 (external IRC clients)
ARMA 3 Feedback Tracker: http://feedback.arma...y_view_page.php
~100k fans @STEAM ARMA 2 + ARMA 2: OA + ARMA 3: + ~2k @XFIRE A2:OA
Follow my Twitter: http://twitter.com/FoltynD or my Facebook http://facebook.com/FoltynD


-FRL-Myke

    Moderator

  • 6571 posts

  • Joined: 27-May 2007

Posted 27 October 2011 - 13:16 #2

Uhm, yeah, interesting. Would you mind translating this into a "programming for dummies" language?

Don't get me wrong, I really appreciate every improvement. It's just that it tells me nothing about whether and how I would/could benefit from it. Blame it on my stupidity. :D

Thread Starter
Dwarden

    BI Developer

  • 9654 posts

  • Joined: 05-March 2002
  • Location: Brno, Czech Republic

Posted 27 October 2011 - 13:21 #3

I suggest reading this: http://en.wikipedia....mory_management
and some other materials about memory allocators.

Advantages of this approach:
- You may write your own allocator for the engine
- You may alter existing allocators for the engine and update them any time you see fit

Another plus is the ability to use allocators which we don't use ourselves for various reasons
(e.g. they impose licensing or rules we can't adopt while still allowing free usage for home users, they are too complicated, and so on).

list of some allocators for experimenting:
HOARD: http://plasma.cs.umass.edu/emery/hoard ( http://plasma.cs.uma...licensing-hoard )

Edited by Dwarden, 27 October 2011 - 13:33.



DBGB

    Private First Class

  • Members
  • 26 posts

  • Joined: 21-November 2009

Posted 27 October 2011 - 17:53 #4

Nice... I guess...

As far as I know, this would really be a first-mover thing.

It would be great if the end user could supply a command-line argument that enables the use of 'external' GPL/commercial malloc implementations optimized for his or her specific CPU/NUMA/RAM environment/topology.

So basically I dream that BI could let the core engine hook into whatever 'malloc' implementation the end user wants to use...

Is this the idea?

[It would be nice if the community or BI could supply a script that runs through a benchmark to help the end user choose the malloc giving the best performance.]

Edited by DBGB, 27 October 2011 - 17:57.
Update:


Thread Starter
Dwarden

    BI Developer

  • 9654 posts

  • Joined: 05-March 2002
  • Location: Brno, Czech Republic

Posted 27 October 2011 - 18:09 #5

Nice... I guess...

As far as I know, this would really be a first-mover thing.

It would be great if the end user could supply a command-line argument that enables the use of 'external' GPL/commercial malloc implementations optimized for his or her specific CPU/NUMA/RAM environment/topology.

So basically I dream that BI could let the core engine hook into whatever 'malloc' implementation the end user wants to use...

Is this the idea?

[It would be nice if the community or BI could supply a script that runs through a benchmark to help the end user choose the malloc giving the best performance.]


It's already possible, check today's beta build yourself :)

Read the BIKI page.

By default you can either use the Windows memalloc (delete all other allocator DLLs from the \dll\ directory)
or choose to use the Intel TBB 3.0 or Intel TBB 4.0 allocators (which are included).
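
The file shuffling described above can be sketched as a small Python helper. This is only an illustration under assumptions: the path you pass in must point at your own game's dll folder, and the `dll_backup` folder name is a made-up placeholder, not something the BIKI prescribes. It moves DLLs aside instead of deleting them, so the original set can be restored.

```python
from pathlib import Path
import shutil


def keep_only(dll_dir, keep_name, backup_name="dll_backup"):
    """Move every *.dll except keep_name out of dll_dir into a sibling backup folder.

    keep_name=None moves all DLLs away, which (per the post above) leaves the
    game on the default Windows allocator. Returns the names that were moved.
    """
    dll_dir = Path(dll_dir)
    backup = dll_dir.parent / backup_name   # hypothetical backup location
    backup.mkdir(exist_ok=True)
    moved = []
    for dll in sorted(dll_dir.glob("*.dll")):
        if keep_name is not None and dll.name.lower() == keep_name.lower():
            continue                        # this is the allocator under test
        shutil.move(str(dll), str(backup / dll.name))
        moved.append(dll.name)
    return moved
```

For example, `keep_only(r"<your game folder>\dll", "tbb4malloc_bi.dll")` would leave only the TBB 4 allocator in place, while passing `None` as the second argument empties the folder.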



alef

    Staff Sergeant

  • Members
  • 279 posts

  • Joined: 24-October 2007

Posted 27 October 2011 - 18:25 #6

Myke wrote: Would you mind translating this into a "programming for dummies" language?


Just think you need to fill or clean a room with solid objects.
And when you put something in the room, you can't move it later.
And you need to fill it up using every little bit of space.
And when you remove a lot of little things, you may need to find space for a huge thing.

You may invent ideas like putting big things on the right and little ones on the left.
Or put things of size "2" next to two things of size "1" each, tightly packed.
Or put everything everywhere without care, starting from the door.
Or put blue stuff on the floor and red stuff hanging from the ceiling.
Or put important things at the front and never-used stuff at the back.
Or lay some things aligned on black tiles and others on white ones.

What matters is that, when that big object suddenly arrives, the room has enough free space for it, and that the space fits its shape.

The room is your RAM, the objects are packs of bytes, the ideas are the allocators.
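
To make the room analogy concrete, here is a toy model in Python. It is only a sketch of one possible "idea" (first fit: put each object in the first gap that is big enough), not how any of the allocators in this thread actually work; the `Room` class and its method names are invented for the illustration.

```python
class Room:
    """Toy first-fit allocator over a fixed number of byte slots."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}        # start offset -> length of each live block

    def _gaps(self):
        """Yield (start, length) for every free gap, in address order."""
        cursor = 0
        for start in sorted(self.blocks):
            if start > cursor:
                yield cursor, start - cursor
            cursor = start + self.blocks[start]
        if cursor < self.size:
            yield cursor, self.size - cursor

    def alloc(self, length):
        """Place a block in the first gap that fits; return its offset or None."""
        for start, gap in self._gaps():
            if gap >= length:
                self.blocks[start] = length
                return start
        return None

    def free(self, start):
        del self.blocks[start]

    def free_total(self):
        return self.size - sum(self.blocks.values())

    def largest_gap(self):
        return max((gap for _, gap in self._gaps()), default=0)


room = Room(16)
small = [room.alloc(2) for _ in range(8)]   # fill the room with 8 small objects
for offset in small[::2]:                   # remove every other one
    room.free(offset)

print(room.free_total())    # 8 slots are free in total...
print(room.largest_gap())   # ...but the biggest single gap is only 2
print(room.alloc(4))        # so a 4-slot object does not fit: None
```

Half the room is empty, yet the "huge thing" cannot be placed because objects can't be moved once they are in. That is fragmentation, and different allocators are different strategies for keeping it (and the time spent searching for gaps) under control.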

DBGB

    Private First Class

  • Members
  • 26 posts

  • Joined: 21-November 2009

Posted 27 October 2011 - 18:36 #7

Browsed through the links presented above - (interesting Hoard results).

Seems some testing is imminent with this beta.

I wonder if the Intel implementation (I have a quad-socket quad-core AMD Opteron setup in NUMA mode) will invoke some kind of artificial throttling, or 'miss' some optimizations based on CPU architecture...

Intel compiler controversy: http://www.agner.org...g/read.php?i=49

That's why it would be NiceToHave some kind of malloc plugin benchmark interface that would help the casual user to decide the best malloc option to use (maybe even provide a compile option for 'custom' implementations)

Edited by DBGB, 27 October 2011 - 18:41.
Updated with a link:


MavericK96

    Zero Cool

  • Members
  • 1883 posts

  • Joined: 03-July 2004
  • Location: Anacortes, WA

Posted 27 October 2011 - 19:01 #8

The whole thing is a bit over my head but I'm hopeful that more skilled individuals will make good use of this. :D
Core i7 4790K @ 4.7 GHz, HT on
16 GB G.Skill DDR3-2400
Asus Strix GTX 970 (OCed)
Samsung 830 Series 256 GB SSD
Windows 7 Pro x64

DBGB

    Private First Class

  • Members
  • 26 posts

  • Joined: 21-November 2009

Posted 27 October 2011 - 19:30 #9

Just tested with the winhoard.dll x64 downloaded from here : http://plasma.cs.uma.../download-hoard

I just made a backup of the dll folder (with tbb3malloc_bi and tbb4malloc_bi) only kept the winhoard.dll file there.

Is this the correct way to test, or do I still need to specify the malloc option on the command line? Please confirm.

Anyway, I tested Benchmark 2 (on Chernarus)

and got a "to many virtual blocks allocated" error.



It should be noted that I had this in my GFX options

.ArmA2OAProfile
version=2;
blood=1;
singleVoice=0;
shadingQuality=100;
shadowQuality=4;
maxSamplesPlayed=80;
anisoFilter=4;
TexQuality=3;
TexMemory=4;
...
sceneComplexity=1000000;
viewDistance=10000.001;
terrainGrid=6.25;


And this as my commandline:

Bohemia Interactive\Expansion\beta\arma2oa.exe" -nosplash -skipintro -cpucount=12 "-mod=expansion\beta;expansion\beta\expansion;@CBA;@ACE;@ACEX;@ACEX_USNavy;@ACEX_SM;@ACEX_RU

My experience: LOOOOOL. Everything ran at less than one frame per second in the beginning, but later the frames and the sounds of shots began to come into sync, so it sounded like a drummer on a slave galley gradually increasing his BPM as the number of objects shown/calculated in the scene became smaller and smaller... I think I watched the benchmark for a few minutes, thinking daaaamn... this is slowmo... but getting better... and better... and...

.....WHAM, CTD with this wonderful new error message....

I'm gonna test with the other mallocs and 'regular' no ACE commandline.


And maybe a bit less ambitious viewdistance ;-)


2nd update: A funny thing happened when testing tbb4malloc_bi.dll.

Note: I didn't change any gfx/cmd options.

The benchmark initially ran just as slowly as with winhoard.dll, but gradually the gunshots/shell shots began to get in sync with the framerate and got faster and faster until it actually reached something that felt like a few frames per second...

Now the funny part... This benchmark run didn't crash, but it also never ended... I alt-tabbed out to write this.

The camera just stops some time after flying over the control tower, and some AI Shilkas go crazy on the flying targets, which sometimes circle into view (or stay out of the scene, who can tell).

This is probably related to other beta (trigger) changes... but it is definitely a difference between the two malloc DLLs so far.


3rd update

Ahh the benchmark is about to end... I was just impatient... the screen is fading to black....I'm waiting for the FPS score....will alt tab back again in a minute...10 maybe to tell result ;-) Nevermind...it must be less than 1 FPS..

Will now test the default malloc (empty dll folder)....1..2...3

4th update:

Default malloc (empty dll folder): crashes to desktop with the same "to many virtual blocks allocated" error. It only seems to get about halfway into the benchmark.

5th update:

Reset GFX to default in options (VD=2400) and used winhoard.dll: no crash, but the benchmark still never ends or shows an FPS score. Will now try without the ACE command line.

I haven't paid attention to any CPU core affinity issues, but let the engine use 12 out of 16 cores on my rig in every benchmark.

6th update: Wooohooo, got 8 FPS with VD=2400 and winhoard.dll. Will make a table, give me 20 min.

7th update: Ran both of the Intel/BI 'beta' memallocators, the winhoard, and an empty dll folder through benchmark 2, two runs each. All hovered at 8 or 9 FPS with default gfx options, 1600x900 + VD=2400. (Radeon 5800+ latest drivers - server 2008 R2 x64)

So maybe I'm doing it wrong? Or the benchmark / malloc / engine options combo won't show any big miracles.

Edited by DBGB, 27 October 2011 - 21:16.
Details added (+ 2nd benchmark run data) + minor update


zyklone

    Gunnery Sergeant

  • Members
  • 431 posts

  • Joined: 25-November 2002

Posted 27 October 2011 - 20:33 #10

Mission designers starting to code memory allocators...
The end of the world is near...

Fun times :)

Suma

    BI Developer

  • 3707 posts

  • Joined: 27-June 2001

Posted 27 October 2011 - 20:53 #11

Just tested with the winhoard.dll x64 downloaded from here : http://plasma.cs.uma.../download-hoard

I just made a backup of the dll folder (with tbb3malloc_bi and tbb4malloc_bi) only kept the winhoard.dll file there.

Is this the correct way to test, or do I still need to specify the malloc option on the command line?


This will not work, because the winhoard.dll does not conform to the interface needed for a custom allocator as specified in the Wiki, and therefore it will not be used. It should be possible to create a custom allocator based on Hoard, but it involves a bit more work.
Ondrej Spanel, BIS Lead Programmer

DBGB

    Private First Class

  • Members
  • 26 posts

  • Joined: 21-November 2009

Posted 28 October 2011 - 09:05 #12

This will not work, because the winhoard.dll does not conform to the interface needed for a custom allocator as specified in the Wiki, and therefore it will not be used. It should be possible to create a custom allocator based on Hoard, but it involves a bit more work.


ok - Downloaded the windows source for winhoard -> hoard-38.zip

Looking into the BI wikipage : http://community.bis...emory_Allocator

Take 'MemTotalReserved()', for example:

- Total memory reserved by the allocator (should correspond to VirtualAlloc with MEM_RESERVE)

I find this function call in the hoard source in two header files:

mmapheap.h
mmapwrapper.h

and in a C file, sbrk.c.

Since I'm such a novice at programming (anything), I'd like the community to help modify the hoard-38.zip source to conform to the DLL interface required by the game engine.

I have Visual Studio, so I should be able to figure out how to build from the sources.

But atm it's too much work for me to figure this out on my own... I think ;-)
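
As an illustration of what a counter like 'MemTotalReserved()' is meant to report, here is a toy model in Python. It is not the DLL interface itself (a real allocator would be native code exporting the functions listed on the BIKI page), and it assumes an allocator that reserves address space in large chunks and hands out pieces of them. The class, its fields, and the chunk size are all made up for the example.

```python
class ChunkedArena:
    """Toy model: reserve address space in big chunks, hand out small pieces."""

    def __init__(self, chunk_size=1 << 20):
        self.chunk_size = chunk_size
        self.reserved = 0     # bytes of address space reserved so far
        self.in_use = 0       # bytes currently handed out to the caller
        self._room_left = 0   # unused bytes in the most recent chunk

    def alloc(self, nbytes):
        if nbytes > self._room_left:
            # Stand-in for a VirtualAlloc(..., MEM_RESERVE, ...) call:
            # grab enough whole chunks to satisfy the request.
            # (The toy simply abandons whatever was left in the old chunk.)
            chunks = -(-nbytes // self.chunk_size)   # ceiling division
            self.reserved += chunks * self.chunk_size
            self._room_left = chunks * self.chunk_size
        self._room_left -= nbytes
        self.in_use += nbytes

    def mem_total_reserved(self):
        """What a MemTotalReserved()-style query would return in this model."""
        return self.reserved


arena = ChunkedArena(chunk_size=1024)
arena.alloc(100)
arena.alloc(200)
print(arena.in_use, arena.mem_total_reserved())   # 300 bytes used, 1024 reserved
```

The point of the sketch is only that "reserved" counts what the allocator has claimed from the operating system, which can be much larger than what the game has actually asked for. Adapting Hoard would mean finding where it makes those reservations and keeping a running total there.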

MadDogX

    Mindless F@nb0!

  • Moderator
  • 9050 posts

  • Joined: 04-November 2002

Posted 28 October 2011 - 09:09 #13

One thing you might want to consider: it's not possible to load a 64-bit DLL into a 32-bit executable.
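
A quick way to check which kind of DLL you have is to read the Machine field in its PE header. The following Python sketch does that; the offsets and the two machine codes come from the PE/COFF file format, while the function names and the label strings are just choices made for this example.

```python
import struct

# Machine codes from the PE/COFF header: 0x014C = x86, 0x8664 = x64.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_machine(data):
    """Return the Machine field from the PE header of a DLL/EXE given as bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ header)")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def describe(path):
    """Read a file from disk and describe its bitness in words."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read())
    return MACHINE_NAMES.get(machine, hex(machine))
```

Calling `describe()` on both the game executable and the allocator DLL should give the same answer; if the executable reports 32-bit and the DLL 64-bit, the DLL cannot be loaded regardless of what interface it exports.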

Gigabyte Z97-HD3 Motherboard | Intel Core i5 4690k @ 4.5GHz | NVidia GTX 970
16GB G-Skill Ripjaws 2133MHz RAM | Kingston HyperX SSD | be Quiet! 750W PSU

.kju -PvPscene-

    Brigadier General

  • Members
  • 12275 posts

  • Joined: 20-October 2001

Posted 28 October 2011 - 09:21 #14

A few questions:
  • Should we test tbb4malloc_bi.dll?
  • How does it differ from tbb3malloc_bi.dll?
  • Should we test some others, like the open source ones, and if so which ones?
  • How should we test? What should we look for in testing?
  • Is the priority stability, then performance, then memory usage?

Thanks :bounce3:



Current active projects: None :(

Maintained/assisted projects: IFA3, Blitzkrieg


Help: Got a crash? Report it! What is the RPT log file?


NoRailgunner

    Second Lieutenant

  • Members
  • 4688 posts

  • Joined: 23-January 2007

Posted 28 October 2011 - 09:27 #15

Would be easier if people can try it with a good test/benchmark mission where they can see or feel the differences. Something simple and userfriendly. :)

PuFu

    Poly Bully

  • Members
  • 7221 posts

  • Joined: 17-February 2007

Posted 28 October 2011 - 09:57 #16

One thing you might want to consider: it's not possible to load a 64-bit DLL into a 32-bit executable.

That is what got me interested when I read the title the first time (with limited knowledge about memory allocators, that is).

Props to BIS for making this move, but some more information and guidelines would be appreciated (see kju's post), even with a lot of very knowledgeable guys around here...



MadDogX

    Mindless F@nb0!

  • Moderator
  • 9050 posts

  • Joined: 04-November 2002

Posted 28 October 2011 - 10:02 #17

After some googling I've discovered that the "TBB" in TBB3/4 stands for Threading Building Blocks. Never heard of that before.

TBB4 seems to be quite new (available since the beginning of September).


DBGB

    Private First Class

  • Members
  • 26 posts

  • Joined: 21-November 2009

Posted 28 October 2011 - 11:06 #18

After some googling I've discovered that the "TBB" in TBB3/4 stands for Threading Building Blocks. Never heard of that before.

TBB4 seems to be quite new (available since the beginning of September).


TBB4: http://threadingbuil...rg/whatsnew.php

I guess BI may or may not have a license for the commercial version (Intel resource link: http://software.inte...cles/intel-tbb/ )

But apparently BI can distribute a version of TBB 3 and 4 under GPLv2 + RE, or maybe they have a license for the 'commercial' TBB.

Anyway, open source or not, I'm quite interested in (alternative) implementations that focus on optimizing code paths for multicore NUMA systems.

Dreaming again: it would be snazzy to have the core engine compiled specifically for the code path that gives the optimal (parallel) execution flow at the press of a button ;-) with a default code path as fallback.

Update - Maybe nedmalloc should be my focus point instead of hoard

http://www.nedprod.c...able/nedmalloc/

ArmA_2:_Custom_Memory_Allocator

tbb3malloc_bi - based on Intel TBB 3, distributed under GPL v2 + RE
tbb4malloc_bi - based on Intel TBB 4, distributed under GPL v2 + RE
jemalloc_bi - not available yet, based on JEMalloc, distributed under BSD-derived license
tcmalloc_bi - not available yet, based on TCMalloc, distributed under New BSD license
nedmalloc_bi - not available yet, based on NedMalloc, distributed under Boost Software License
customMalloc_bi - not provided, feel free to plug-in your own



It looks like BI will provide the above list, except for the last one of course....

:-D So maybe I should just be patient.....

Edited by DBGB, 28 October 2011 - 12:25.
Forget about hoard if nedmalloc claim is true


mr.g-c

    Warrant Officer

  • Members
  • 2381 posts

  • Joined: 14-January 2007

Posted 28 October 2011 - 14:48 #19

Dunno if it's related to those memory allocators, but performance dropped significantly with the latest beta. I'm on Win7 x64.
Marek Spanel: [...] Every single element is well taught so that it fits together. So this is a significant change, because with ArmA 1 it was just random, really.
We made some units because we had to. There wasn't much passion from our side with the first ArmA, to be honest. This time it's different. (Videogamer.com Interview:)

Please BIS: Arma2 must become a TRUE MASTERPIECE - Not a middle-heavy catastrophe!

sickboy

    Colonel

  • Members
  • 9947 posts

  • Joined: 11-May 2005

Posted 28 October 2011 - 14:49 #20

Dunno if it's related to those memory allocators, but performance dropped significantly with the latest beta. I'm on Win7 x64.

Perhaps add some useful details? http://dev-heaven.ne...to-report-a-bug

Like what is "significantly"?
Like where/when/how do you notice this?
Like what are your startup parameters?
Like what are the mods?
...
Half of that is available by simply attaching your RPT file.

Edited by Sickboy, 28 October 2011 - 14:54.