Re: Core 2019 [message #51849 is a reply to message #51846], Sun, 09 June 2019 18:51
mirek (Messages: 11892, Registered: November 2005, Ultimate Member)
Novo wrote on Sun, 09 June 2019 17:02

If I remember correctly, some of the system allocation routines initialize allocated memory with zeros even if you do not write anything there ...


They can delay that until the moment the page is actually mapped into physical memory.

Mirek
Re: Core 2019 [message #51850 is a reply to message #51848], Sun, 09 June 2019 18:54
mirek
Looking at the peak profile, it looks like there are very few "small" blocks and most of the memory is in those 80 "huge" (meaning >64KB) blocks.

Can that be correct?

Mirek
Re: Core 2019 [message #51854 is a reply to message #51847], Sun, 09 June 2019 20:56
mirek
Novo wrote on Sun, 09 June 2019 17:15
I hacked your TIMING macro and made a similar RMEMUSE one:


There is also

int MemoryUsedKbMax();

anyway, both MemoryUsedKb and MemoryUsedKbMax have one disadvantage: they only count active blocks, so if fragmentation is high, it is not accounted for.

That said, it looks like fragmentation is the real culprit here. We seem to have 300MB of active memory and 500MB in memory fragments. It looks like stdalloc fights with that too, with a little better success.

I would like to get a list of the allocations your code is doing so that I can hopefully replicate it and investigate whether anything can be done to reduce the fragmentation. I will post temporary changes to get the log tomorrow, if you are willing to help.

Mirek
Re: Core 2019 [message #51855 is a reply to message #51850], Sun, 09 June 2019 21:39
Novo (Messages: 817, Registered: December 2006, Experienced Contributor)
mirek wrote on Sun, 09 June 2019 12:54
Looking at the peak profile, it looks like there are very few "small" blocks and most of the memory is in those 80 "huge" (meaning >64KB) blocks.

Can that be correct?

Mirek

It is hard to tell. I'm not controlling that.
Another problem is that all allocations/deallocations happen in CoWork's threads. I cannot call RDUMP(*PeakMemoryProfile()) inside CoWork because it would be called at least 181363 times ...

The app is parsing a Wikipedia XML dump. It decompresses a bz2 archive and parses chunks of XML. After that my own parser parses the Mediawiki text.
As a first pass my parser builds a list of tokens organized as a Vector<> (I'm not inserting in the middle :) ).
My parser avoids memory allocation at all possible costs. I'm calling Vector::SetCountR and reusing these vectors. When I need to deal with String I use my own non-owning string class.
Unfortunately, I cannot control memory allocation in XmlParser. I have to rely on the default allocator.

Ideally, I'd love to see the U++ allocator designed like this.
Related papers:
https://people.cs.umass.edu/~emery/pubs/berger-pldi2001.pdf
https://erdani.com/publications/cuj-2005-12.pdf
https://accu.org/content/conf2008/Alexandrescu-memory-allocation.screen.pdf

It doesn't have to be a complete implementation of everything. I would just like to be able to plug into U++'s allocator in a similar fashion and extend/tune it.

mirek wrote on Sun, 09 June 2019 14:56
I would like to get a list of the allocations your code is doing so that I can hopefully replicate it and investigate whether anything can be done to reduce the fragmentation. I will post temporary changes to get the log tomorrow, if you are willing to help.

Yes, I'm willing to help. I am even willing to implement this policy-based allocator. I just need the ability to integrate it into U++. It doesn't have to be part of U++.


Regards,
Novo
Re: Core 2019 [message #51856 is a reply to message #51855], Sun, 09 June 2019 23:11
mirek
Novo wrote on Sun, 09 June 2019 21:39
mirek wrote on Sun, 09 June 2019 12:54
Looking at the peak profile, it looks like there are very few "small" blocks and most of the memory is in those 80 "huge" (meaning >64KB) blocks.

Can that be correct?

Mirek

It is hard to tell. I'm not controlling that.
Another problem is that all allocations/deallocations happen in CoWork's threads. I cannot call RDUMP(*PeakMemoryProfile()) inside CoWork because it would be called at least 181363 times ...


Why would you want to? Peak is really peak; it is the profile at the moment of maximum memory use.

One caveat about the profile is that for small and large blocks it only covers the current thread. But our problem is with the huge blocks anyway.

Quote:

The app is parsing a Wikipedia XML dump. It decompresses a bz2 archive and parses chunks of XML. After that my own parser parses the Mediawiki text.
As a first pass my parser builds a list of tokens organized as a Vector<> (I'm not inserting in the middle :) ).
My parser avoids memory allocation at all possible costs. I'm calling Vector::SetCountR and reusing these vectors. When I need to deal with String I use my own non-owning string class.


Well, maybe there can also be interference with MemoryTryRealloc (as those Vectors grow). Perhaps you can test what happens with:

bool  MemoryTryRealloc(void *ptr, size_t& newsize) {
	return false; // (((dword)(uintptr_t)ptr) & 16) && MemoryTryRealloc__(ptr, newsize);
}


Quote:

Unfortunately, I cannot control memory allocation in XmlParser. I have to rely on the default allocator.


There are not many... BTW, are you parsing memory (XmlParser(const char *)) or streams (XmlParser(Stream& in))?

Mirek
Re: Core 2019 [message #51857 is a reply to message #51856], Sun, 09 June 2019 23:25
mirek
Here is the code for logging all huge allocations (replace in Core/hheap.cpp):


void *Heap::HugeAlloc(size_t count) // count in 4kb pages
{
	ASSERT(count);

#ifdef LSTAT
	if(count < 65536)
		hstat[count]++;
#endif

	huge_4KB_count += count;
	
	if(huge_4KB_count > huge_4KB_count_max) {
		huge_4KB_count_max = huge_4KB_count;
		if(MemoryUsedKb() > sKBLimit)
			Panic("MemoryLimitKb breached!");
		if(sPeak)
			Make(*sPeak);
	}

	if(!D::freelist[0]->next) { // initialization
		for(int i = 0; i < 2; i++)
			Dbl_Self(D::freelist[i]);
	}
		
	if(count > HPAGE) { // we are wasting 4KB to store just 4 bytes here, but this is >32MB after all..
		LTIMING("SysAlloc");
		byte *sysblk = (byte *)SysAllocRaw((count + 1) * 4096, 0);
		BlkHeader *h = (BlkHeader *)(sysblk + 4096);
		h->size = 0;
		*((size_t *)sysblk) = count;
		sys_count++;
		sys_size += 4096 * count;
		return h;
	}
	
	LTIMING("Huge Alloc");

	word wcount = (word)count;
	
	if(16 * free_4KB > huge_4KB_count) // keep number of free 4KB blocks in check
		FreeSmallEmpty(INT_MAX, int(free_4KB - huge_4KB_count / 32));
	
	for(int pass = 0; pass < 2; pass++) {
		for(int i = count >= 16; i < 2; i++) {
			BlkHeader *l = D::freelist[i];
			BlkHeader *h = l->next;
			while(h != l) {
				word sz = h->GetSize();
				if(sz >= count) {
					void *ptr = MakeAlloc(h, wcount);
					if(count > 16)
						RLOG("HugeAlloc " << asString(ptr) << ", size: " << asString(count));
					return ptr;
				}
				h = h->next;
			}
		}

		if(!FreeSmallEmpty(wcount, INT_MAX)) { // try to coalesce 4KB small free blocks back to huge storage
			void *ptr = SysAllocRaw(HPAGE * 4096, 0);
			HugePage *pg = (HugePage *)MemoryAllocPermanent(sizeof(HugePage));
			pg->page = ptr;
			pg->next = huge_pages;
			huge_pages = pg;
			AddChunk((BlkHeader *)ptr, HPAGE); // failed, add 32MB from the system
			huge_chunks++;
		}
	}
	Panic("Out of memory");
	return NULL;
}

int Heap::HugeFree(void *ptr)
{
	BlkHeader *h = (BlkHeader *)ptr;
	if(h->size == 0) {
		LTIMING("Sys Free");
		byte *sysblk = (byte *)h - 4096;
		size_t count = *((size_t *)sysblk);
		SysFreeRaw(sysblk, (count + 1) * 4096);
		huge_4KB_count -= count;
		sys_count--;
		sys_size -= 4096 * count;
		return 0;
	}
	LTIMING("Huge Free");
	if(h->GetSize() > 16)
		RLOG("HugeFree " << asString(ptr) << ", size: " << asString(h->GetSize()));
	huge_4KB_count -= h->GetSize();
	return BlkHeap::Free(h)->GetSize();
}

bool Heap::HugeTryRealloc(void *ptr, size_t count)
{
	bool b = count <= HPAGE && BlkHeap::TryRealloc(ptr, count, huge_4KB_count);
	if(b)
		RLOG("HugeRealloc " << asString(ptr) << ", size: " << asString(count));
	return b;
}


(please test with active MemoryTryRealloc)

[Updated on: Sun, 09 June 2019 23:25]


Re: Core 2019 [message #51860 is a reply to message #51857], Mon, 10 June 2019 17:27
mirek
I have tried to improve fragmentation using approximate best fit, hopefully this will help a bit... (in trunk)
Re: Core 2019 [message #51862 is a reply to message #51860], Mon, 10 June 2019 18:01
Novo
mirek wrote on Mon, 10 June 2019 11:27
I have tried to improve fragmentation using approximate best fit, hopefully this will help a bit... (in trunk)

Thanks!
mem: 400 MB, time: 230 s.
This is a huge improvement.


Regards,
Novo
Re: Core 2019 [message #51863 is a reply to message #51862], Mon, 10 June 2019 18:17
mirek
Novo wrote on Mon, 10 June 2019 18:01
mirek wrote on Mon, 10 June 2019 11:27
I have tried to improve fragmentation using approximate best fit, hopefully this will help a bit... (in trunk)

Thanks!
mem: 400 MB, time: 230 s.
This is a huge improvement.


Cool. So I guess the issue is solved and we do not need to worry about the other tests?

Mirek
Re: Core 2019 [message #51864 is a reply to message #51856], Mon, 10 June 2019 18:18
Novo
mirek wrote on Sun, 09 June 2019 17:11
BTW, are you parsing memory - XmlParser(const char *), or streams - XmlParser(Stream& in) ?

Stream. bz2::DecompressStream.
I guess that XmlParser is responsible for the fragmentation.


Regards,
Novo
Re: Core 2019 [message #51865 is a reply to message #51863], Mon, 10 June 2019 18:21
Novo
mirek wrote on Mon, 10 June 2019 12:17

Cool. So I guess the issue is solved and we do not need to worry about the other tests?

I'll try to run other tests and see what happens ...


Regards,
Novo
Re: Core 2019 [message #51866 is a reply to message #51856], Mon, 10 June 2019 18:34
Novo
mirek wrote on Sun, 09 June 2019 17:11

Well, maybe there can also be interference with MemoryTryRealloc (as those Vectors grow). Perhaps you can test what happens with:

bool  MemoryTryRealloc(void *ptr, size_t& newsize) {
	return false; // (((dword)(uintptr_t)ptr) & 16) && MemoryTryRealloc__(ptr, newsize);
}


This doesn't affect anything.


Regards,
Novo
Re: Core 2019 [message #51867 is a reply to message #51857], Mon, 10 June 2019 18:45
Novo
mirek wrote on Sun, 09 June 2019 17:25
Here is the code for logging all huge allocations (replace in Core/hheap.cpp):

This code is crashing with the latest trunk.
I guess we can stop at this point.


Regards,
Novo