• Log InLog In
  • Register
Liquid`
Team Liquid Liquipedia
EST 22:01
CET 04:01
KST 12:01
  • Home
  • Forum
  • Calendar
  • Streams
  • Liquipedia
  • Features
  • Store
  • EPT
  • TL+
  • StarCraft 2
  • Brood War
  • Smash
  • Heroes
  • Counter-Strike
  • Overwatch
  • Liquibet
  • Fantasy StarCraft
  • TLPD
  • StarCraft 2
  • Brood War
  • Blogs
Forum Sidebar
Events/Features
News
Featured News
TL.net Map Contest #21: Winners10Intel X Team Liquid Seoul event: Showmatches and Meet the Pros10[ASL20] Finals Preview: Arrival13TL.net Map Contest #21: Voting12[ASL20] Ro4 Preview: Descent11
Community News
StarCraft, SC2, HotS, WC3, Returning to Blizzcon!41$5,000+ WardiTV 2025 Championship6[BSL21] RO32 Group Stage4Weekly Cups (Oct 26-Nov 2): Liquid, Clem, Solar win; LAN in Philly2Weekly Cups (Oct 20-26): MaxPax, Clem, Creator win10
StarCraft 2
General
StarCraft, SC2, HotS, WC3, Returning to Blizzcon! Mech is the composition that needs teleportation t TL.net Map Contest #21: Winners Weekly Cups (Oct 20-26): MaxPax, Clem, Creator win RotterdaM "Serral is the GOAT, and it's not close"
Tourneys
Constellation Cup - Main Event - Stellar Fest $5,000+ WardiTV 2025 Championship Sparkling Tuna Cup - Weekly Open Tournament Merivale 8 Open - LAN - Stellar Fest Sea Duckling Open (Global, Bronze-Diamond)
Strategy
Custom Maps
Map Editor closed ?
External Content
Mutation # 498 Wheel of Misfortune|Cradle of Death Mutation # 497 Battle Haredened Mutation # 496 Endless Infection Mutation # 495 Rest In Peace
Brood War
General
BW General Discussion [ASL20] Ask the mapmakers — Drop your questions [BSL21] RO32 Group Stage BGH Auto Balance -> http://bghmmr.eu/ SnOw's ASL S20 Finals Review
Tourneys
[BSL21] RO32 Group A - Saturday 21:00 CET [ASL20] Grand Finals [Megathread] Daily Proleagues [BSL21] RO32 Group B - Sunday 21:00 CET
Strategy
Current Meta PvZ map balance How to stay on top of macro? Soma's 9 hatch build from ASL Game 2
Other Games
General Games
Stormgate/Frost Giant Megathread Nintendo Switch Thread Path of Exile Should offensive tower rushing be viable in RTS games? Dawn of War IV
Dota 2
Official 'what is Dota anymore' discussion
League of Legends
Heroes of the Storm
Simple Questions, Simple Answers Heroes of the Storm 2.0
Hearthstone
Deck construction bug Heroes of StarCraft mini-set
TL Mafia
TL Mafia Community Thread SPIRED by.ASL Mafia {211640}
Community
General
US Politics Mega-thread Russo-Ukrainian War Thread Things Aren’t Peaceful in Palestine YouTube Thread Dating: How's your luck?
Fan Clubs
White-Ra Fan Club The herO Fan Club!
Media & Entertainment
[Manga] One Piece Anime Discussion Thread Movie Discussion! Korean Music Discussion Series you have seen recently...
Sports
2024 - 2026 Football Thread Formula 1 Discussion NBA General Discussion MLB/Baseball 2023 TeamLiquid Health and Fitness Initiative For 2023
World Cup 2022
Tech Support
SC2 Client Relocalization [Change SC2 Language] Linksys AE2500 USB WIFI keeps disconnecting Computer Build, Upgrade & Buying Resource Thread
TL Community
The Automated Ban List Recent Gifted Posts
Blogs
Learning my new SC2 hotkey…
Hildegard
Coffee x Performance in Espo…
TrAiDoS
Saturation point
Uldridge
DnB/metal remix FFO Mick Go…
ImbaTosS
Reality "theory" prov…
perfectspheres
Our Last Hope in th…
KrillinFromwales
Customize Sidebar...

Website Feedback

Closed Threads



Active: 991 users

The Big Programming Thread - Page 644

Forum Index > General Forum
Post a Reply
Prev 1 642 643 644 645 646 1032 Next
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
Slithe
Profile Blog Joined February 2007
United States985 Posts
June 23 2015 22:20 GMT
#12861
What kind of operations are you doing? 10k integers isn't that big, so I'm surprised that you're running into paging issues.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 03:26:38
June 24 2015 03:25 GMT
#12862
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p
There is no one like you in the universe.
phar
Profile Joined August 2011
United States1080 Posts
Last Edited: 2015-06-24 04:00:31
June 24 2015 03:59 GMT
#12863
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up
Who after all is today speaking about the destruction of the Armenians?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-24 13:22:00
June 24 2015 13:11 GMT
#12864
--- Nuked ---
Acrofales
Profile Joined August 2010
Spain18111 Posts
Last Edited: 2015-06-24 14:41:49
June 24 2015 14:40 GMT
#12865
On June 24 2015 12:25 Blisse wrote:
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p

Well, clearly you have to think about your data structure. You don't have lists of integers, you have a dense graph with 100K nodes. At least, that's what I understand from your 100K by 100K (which makes it sound more like a matrix, but presumably that is your matrix representation of the graph?) There are many ways of representing that, the most common being a graph datastructure and a matrix (basically a 2D array), both of which have some good Java libraries for manipulating.

However, given your task, I actually doubt your graph is dense. A fully connected graph of 100K nodes has C(100K, 3) triangles, or to be precise: 166,661,666,700,000 triangles in a fully connected graph of 100K nodes. Now obviously dense does not mean fully connected, just mostly connected. Which if you store this as a text file using 1 byte per triangle (unrealistic assumption, you will use more), that is still ~166TB to say what the triangles are. Now even if it is not fully connected, it only decreases linearly for a very long time, and for a densely connected graph you can count on a lower bound of about 100K^2 triangles, or using 1 byte per triangle, ~10GB of triangles.

In a real problem, these can be the kind of sizes you are looking at. However, in real problems graphs are usually sparsely connected (small world). I did a quick google and cannot actually find a real-world problem that is represented as a dense graph.

For a sparse graph you can use different datastructures and algorithms. While you can always represent it as a dense matrix (read: a matrix where the 0s are explicitly represented), it is horribly inefficient.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 16:24:54
June 24 2015 16:20 GMT
#12866
On June 24 2015 12:59 phar wrote:
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up


600M contains() operations. We use about 7-8GBs of memory on the servers last time I checked, we might be at 5GB now.

Our input is the vertex and a space separated list of other vertices it is connected to (v1, v2) edge pair. There are 100K vertices and 10M edges.

We're not running out of memory (we probably will when we have to use the not-as-good servers, but that's for later).

The algorithm is, for each vertex, generate all size-2 ascending order combinations from the list of adjacent vertices by partitioning the list into less than the current vertex and greater than the current vertex. For each size-2 combination, check if the edge exists. If it exists, this is a unique triangle, otherwise not.

We store the left lower combinations in a 100K list of list of integers in sorted order (sorting takes <1s and gets us 15s faster due to locality). We store the right upper combinations in a 100K list of sets of integers by a hashset (so the contains operations can be as fast as possible). The triangle check is just a for loop over >600M iterations of calls to contains().

This is as efficient as we think we can be without any more assumptions about the graph layout.

I think 10M edges in a graph of 100K vertices is like 0.1% the total number of edges, which I assume can be considered dense. The reason I'm asking about this is because I believe I'm at the limits for using standard data structures in Java (only HashSets and sorted ArrayLists really), but I have no clue because I use C/C++ more. LinkedHashSets to get sorted ordering adds too much iteration overhead to get the locality benefits.
There is no one like you in the universe.
Acrofales
Profile Joined August 2010
Spain18111 Posts
June 24 2015 16:57 GMT
#12867
Sounds sparse to me. You only have 100 times as many edges as you do nodes. While you might make an argument for it being dense for a small world graph, any small world graph is a sparse graph.

If I understand your algorithm, what you do (conceptually) is:

for each node do:
neighbours = { v in node.getNeighbours() | v.id > node.id }
for each neighbour do:
seconds = { v in neighbour.getNeighbours() | v.id < neighbour.id }
for each second in seconds do:
if second in neighbours then:
add (node.id, neighbour.id, second.id) to triangles
endif
endfor
endfor


That seems like a pretty good approach. I can't think of any direct improvements.

All of these operations seem like they would be more efficient storing all the nodes in a hashmap with their id as key, and in each node you keep two lists of neighbours: those with ID > node.id and those with ID < node.id. The total number of neighbours is on average about 200 IDs long, making your entire hashmap about 100k * 200 + 100k = 20.1million integers + overhead, which should not be a problem at all. Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
June 24 2015 17:53 GMT
#12868
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...
There is no one like you in the universe.
Ropid
Profile Joined March 2009
Germany3557 Posts
June 24 2015 18:14 GMT
#12869
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?
"My goal is to replace my soul with coffee and become immortal."
spinesheath
Profile Blog Joined June 2009
Germany8679 Posts
June 24 2015 18:15 GMT
#12870
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.
If you have a good reason to disagree with the above, please tell me. Thank you.
Acrofales
Profile Joined August 2010
Spain18111 Posts
June 24 2015 18:24 GMT
#12871
On June 25 2015 02:53 Blisse wrote:
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...

But then I don't know why you are having problems with your contains() calls. On average, every node should have 200 edges (so even if it's scale-free, the average is 200), and that check should be negligible: each edge is checked at most twice, so you will have at most 2e contains calls (in practice far less). If you have a scale-free topology and are ending up doing your contains calls on the highly connected nodes all the times, switch your ordering (check the least-connected nodes first instead of the most-connected ones). In big-O complexity it makes no difference, but in practice it means you will have lots and lots and lots of small contains() calls, instead of a few large ones.

Acrofales
Profile Joined August 2010
Spain18111 Posts
Last Edited: 2015-06-24 18:47:19
June 24 2015 18:30 GMT
#12872
On June 25 2015 03:15 spinesheath wrote:
Show nested quote +
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.

Yeah. Whoops. It's O(1). And you can probably make it a faster O(1) by implementing your own map for integers, as Blisse seems to have done. However, the log n in the complexity is for checking the presence of "second" in the "neighbours" list. This is log n (assuming neighbours is sorted, as Blisse said it was), not for look up in the hashtable.
Acrofales
Profile Joined August 2010
Spain18111 Posts
Last Edited: 2015-06-24 18:38:55
June 24 2015 18:36 GMT
#12873
On June 25 2015 03:14 Ropid wrote:
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?

http://stackoverflow.com/questions/8661420/do-the-java-integer-and-double-objects-have-unnecessary-overhead

Integers have 12 bytes overhead (which, given that a 32-bit integers is 4 bytes, is quite significant). Now this seems to be referencing Java 1.3 (which is ancient), so perhaps it has changed.

Using a library (or building your own datastructure) that allows for the use of primitives might very well make a difference. However, it shouldn't be 2GB in the first place for the size of graph he is talking about. So doing more in-depth profiling of what is eating all the memory and where the performance issues are coming from seems necessary before scrapping the standard library.
phar
Profile Joined August 2011
United States1080 Posts
June 25 2015 03:52 GMT
#12874
Ok back up, I am confused. You are or aren't running out of memory? The initial post mentioned something about your data set getting paged out, yet you say you have only 10M edges, which should trivially fit in memory (like honestly 10M int is only 40MB or something). Where are you getting 8GB from??



What's the distribution of your dataset like? Is it uniform, or following a more real-world (i.e. power-law) distribution? If the latter, there's a rather large number of better algorithms you could use.
Who after all is today speaking about the destruction of the Armenians?
RoyGBiv_13
Profile Blog Joined August 2010
United States1275 Posts
June 25 2015 04:08 GMT
#12875
On June 24 2015 04:18 sabas123 wrote:
Show nested quote +
On June 23 2015 09:53 RoyGBiv_13 wrote:
A dynamically sized buffer or string can be declared on the stack, if it is small (my rule of thumb is under 100 bytes or 1 line in a file), or placed on the heap if it is large. It should always be passed by reference.

You mean if the buffer is big you want to pass it by reference right?

Nice guide btw, I enjoyed it alot.


The data has to live somewhere on the system. You can declare it on the stack in say, main(), then pass by reference to helper functions to modify/generate/use the data. Once you have the buffer placed in memory, you should almost always pass by reference, since you need the address of the buffer anyway to iterate through it.
Any sufficiently advanced technology is indistinguishable from magic
SixStrings
Profile Blog Joined August 2013
Germany2046 Posts
June 25 2015 09:45 GMT
#12876
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
June 25 2015 11:19 GMT
#12877
--- Nuked ---
Isualin
Profile Joined March 2011
Germany1903 Posts
June 25 2015 11:37 GMT
#12878
I am really interested in procedural generation of mazes and cities after playing dozens of roguelikes. I checked out /r/proceduralgeneration and this stackoverflow link. Visualization part looks really hard to me because i have no experience in shaders, 3d rendering or directx/opengl.(I work as fullstack web-developer) Can anyone recommend me some stuff to work on?
| INnoVation | The literal god TY | ByuNjwa | LRSL when? |
Acrofales
Profile Joined August 2010
Spain18111 Posts
June 25 2015 13:32 GMT
#12879
On June 25 2015 18:45 SixStrings wrote:
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?

Hey cutie,

Well shucks, I think you should probably stick to embroidery or crotcheting. But if you actually want advice, how about you start by showing respect to the community, and use a gender neutral address. If not, then address the vast majority of people here, which are almost certainly going to be guys: you are on a gaming website dedicated to Starcraft (don't remember the stats, but I believe it was about 90% male players), and in the programming thread (my CS course had about a 95% male population): combine the two populations and the only girl here is the one giving you a blowjob while she's phoning her boyfriend + Show Spoiler +
Dating how's your luck thread never forgets
.
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-25 13:47:49
June 25 2015 13:47 GMT
#12880
--- Nuked ---
Prev 1 642 643 644 645 646 1032 Next
Please log in or register to reply.
Live Events Refresh
Replay Cast
23:00
PiGosaur Cup #55
Liquipedia
BSL 21
20:00
ProLeague - RO32 Group A
Gosudark vs Kyrie
Gypsy vs OyAji
UltrA vs Radley
Dandy vs Ptak
LiquipediaDiscussion
[ Submit Event ]
Live Streams
Refresh
StarCraft 2
RuFF_SC2 129
StarCraft: Brood War
Britney 20637
Sea 1806
NaDa 68
Noble 45
Dota 2
monkeys_forever460
NeuroSwarm96
League of Legends
JimRising 604
Heroes of the Storm
Khaldor141
Other Games
tarik_tv10367
summit1g9517
ViBE44
goatrope34
Models1
Organizations
Other Games
gamesdonequick617
Counter-Strike
PGL119
StarCraft 2
Blizzard YouTube
StarCraft: Brood War
BSLTrovo
sctven
[ Show 13 non-featured ]
StarCraft 2
• Hupsaiya 171
• davetesta9
• AfreecaTV YouTube
• intothetv
• Kozan
• IndyKCrew
• LaughNgamezSOOP
• Migwel
• sooper7s
StarCraft: Brood War
• BSLYoutube
• STPLYoutube
• ZZZeroYoutube
Dota 2
• masondota21604
Upcoming Events
Sparkling Tuna Cup
6h 59m
WardiTV Korean Royale
8h 59m
LAN Event
11h 59m
ByuN vs Zoun
TBD vs TriGGeR
Clem vs TBD
IPSL
14h 59m
JDConan vs WIZARD
WolFix vs Cross
BSL 21
16h 59m
spx vs rasowy
HBO vs KameZerg
Cross vs Razz
dxtr13 vs ZZZero
Replay Cast
1d 5h
Wardi Open
1d 8h
WardiTV Korean Royale
2 days
Replay Cast
3 days
Kung Fu Cup
3 days
Classic vs Solar
herO vs Cure
Reynor vs GuMiho
ByuN vs ShoWTimE
[ Show More ]
Tenacious Turtle Tussle
3 days
The PondCast
4 days
RSL Revival
4 days
Solar vs Zoun
MaxPax vs Bunny
Kung Fu Cup
4 days
WardiTV Korean Royale
4 days
RSL Revival
5 days
Classic vs Creator
Cure vs TriGGeR
Kung Fu Cup
5 days
CranKy Ducklings
6 days
RSL Revival
6 days
herO vs Gerald
ByuN vs SHIN
Kung Fu Cup
6 days
BSL 21
6 days
Tarson vs Julia
Doodle vs OldBoy
eOnzErG vs WolFix
StRyKeR vs Aeternum
Liquipedia Results

Completed

BSL 21 Points
SC4ALL: StarCraft II
Eternal Conflict S1

Ongoing

C-Race Season 1
IPSL Winter 2025-26
KCM Race Survival 2025 Season 4
SOOP Univ League 2025
YSL S2
BSL Season 21
Stellar Fest: Constellation Cup
IEM Chengdu 2025
PGL Masters Bucharest 2025
Thunderpick World Champ.
CS Asia Championships 2025
ESL Pro League S22
StarSeries Fall 2025
FISSURE Playground #2
BLAST Open Fall 2025
BLAST Open Fall Qual

Upcoming

SLON Tour Season 2
BSL 21 Non-Korean Championship
Acropolis #4
IPSL Spring 2026
HSC XXVIII
RSL Offline Finals
WardiTV 2025
RSL Revival: Season 3
META Madness #9
BLAST Bounty Winter 2026: Closed Qualifier
eXTREMESLAND 2025
ESL Impact League Season 8
SL Budapest Major 2025
BLAST Rivals Fall 2025
TLPD

1. ByuN
2. TY
3. Dark
4. Solar
5. Stats
6. Nerchio
7. sOs
8. soO
9. INnoVation
10. Elazer
1. Rain
2. Flash
3. EffOrt
4. Last
5. Bisu
6. Soulkey
7. Mini
8. Sharp
Sidebar Settings...

Advertising | Privacy Policy | Terms Of Use | Contact Us

Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.