• Log InLog In
  • Register
Liquid`
Team Liquid Liquipedia
EDT 03:52
CEST 09:52
KST 16:52
  • Home
  • Forum
  • Calendar
  • Streams
  • Liquipedia
  • Features
  • Store
  • EPT
  • TL+
  • StarCraft 2
  • Brood War
  • Smash
  • Heroes
  • Counter-Strike
  • Overwatch
  • Liquibet
  • Fantasy StarCraft
  • TLPD
  • StarCraft 2
  • Brood War
  • Blogs
Forum Sidebar
Events/Features
News
Featured News
Code S Season 1 - RO12 Group A: Rogue, Percival, Solar, Zoun5[ASL21] Ro8 Preview Pt1: Inheritors16[ASL21] Ro16 Preview Pt2: All Star10Team Liquid Map Contest #22 - The Finalists19[ASL21] Ro16 Preview Pt1: Fresh Flow9
Community News
2026 GSL Season 1 Qualifiers25Maestros of the Game 2 announced92026 GSL Tour plans announced15Weekly Cups (April 6-12): herO doubles, "Villains" prevail1MaNa leaves Team Liquid25
StarCraft 2
General
Code S Season 1 - RO12 Group A: Rogue, Percival, Solar, Zoun Blizzard Classic Cup @ BlizzCon 2026 - $100k prize pool Team Liquid Map Contest #22 - The Finalists MaNa leaves Team Liquid Maestros of the Game 2 announced
Tourneys
WardiTV Spring Cup 2026 GSL Season 1 Qualifiers Sparkling Tuna Cup - Weekly Open Tournament INu's Battles#14 <BO.9 2Matches> GSL CK: More events planned pending crowdfunding
Strategy
Custom Maps
[D]RTS in all its shapes and glory <3 [A] Nemrods 1/4 players [M] (2) Frigid Storage
External Content
The PondCast: SC2 News & Results Mutation # 523 Firewall Mutation # 522 Flip My Base Mutation # 521 Memorable Boss
Brood War
General
JaeDong's ASL S21 Ro16 Post-Review BW General Discussion Leta's ASL S21 Ro.16 review ASL21 General Discussion [ASL21] Ro8 Preview Pt1: Inheritors
Tourneys
[ASL21] Ro8 Day 2 [BSL22] RO16 Group Stage - 02 - 10 May [Megathread] Daily Proleagues [ASL21] Ro8 Day 1
Strategy
Fighting Spirit mining rates Simple Questions, Simple Answers What's the deal with APM & what's its true value Any training maps people recommend?
Other Games
General Games
Stormgate/Frost Giant Megathread Daigo vs Menard Best of 10 Nintendo Switch Thread Dawn of War IV Diablo IV
Dota 2
The Story of Wings Gaming
League of Legends
G2 just beat GenG in First stand
Heroes of the Storm
Simple Questions, Simple Answers Heroes of the Storm 2.0
Hearthstone
Deck construction bug Heroes of StarCraft mini-set
TL Mafia
Vanilla Mini Mafia Mafia Game Mode Feedback/Ideas TL Mafia Community Thread Five o'clock TL Mafia
Community
General
US Politics Mega-thread European Politico-economics QA Mega-thread Russo-Ukrainian War Thread 3D technology/software discussion Canadian Politics Mega-thread
Fan Clubs
The IdrA Fan Club
Media & Entertainment
Anime Discussion Thread [Manga] One Piece [Req][Books] Good Fantasy/SciFi books Movie Discussion!
Sports
2024 - 2026 Football Thread Formula 1 Discussion McBoner: A hockey love story
World Cup 2022
Tech Support
streaming software Strange computer issues (software) [G] How to Block Livestream Ads
TL Community
The Automated Ban List
Blogs
Sexual Health Of Gamers
TrAiDoS
lurker extra damage testi…
StaticNine
Broowar part 2
qwaykee
Funny Nicknames
LUCKY_NOOB
Iranian anarchists: organize…
XenOsky
Customize Sidebar...

Website Feedback

Closed Threads



Active: 2135 users

The Big Programming Thread - Page 644

Forum Index > General Forum
Post a Reply
Prev 1 642 643 644 645 646 1032 Next
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
Slithe
Profile Blog Joined February 2007
United States985 Posts
June 23 2015 22:20 GMT
#12861
What kind of operations are you doing? 10k integers isn't that big, so I'm surprised that you're running into paging issues.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 03:26:38
June 24 2015 03:25 GMT
#12862
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p
There is no one like you in the universe.
phar
Profile Joined August 2011
United States1080 Posts
Last Edited: 2015-06-24 04:00:31
June 24 2015 03:59 GMT
#12863
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up
Who after all is today speaking about the destruction of the Armenians?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-24 13:22:00
June 24 2015 13:11 GMT
#12864
--- Nuked ---
Acrofales
Profile Joined August 2010
Spain18280 Posts
Last Edited: 2015-06-24 14:41:49
June 24 2015 14:40 GMT
#12865
On June 24 2015 12:25 Blisse wrote:
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p

Well, clearly you have to think about your data structure. You don't have lists of integers, you have a dense graph with 100K nodes. At least, that's what I understand from your 100K by 100K (which makes it sound more like a matrix, but presumably that is your matrix representation of the graph?) There are many ways of representing that, the most common being a graph datastructure and a matrix (basically a 2D array), both of which have some good Java libraries for manipulating.

However, given your task, I actually doubt your graph is dense. A fully connected graph of 100K nodes has C(100K, 3) triangles, or to be precise: 166,661,666,700,000 triangles in a fully connected graph of 100K nodes. Now obviously dense does not mean fully connected, just mostly connected. Which if you store this as a text file using 1 byte per triangle (unrealistic assumption, you will use more), that is still ~166TB to say what the triangles are. Now even if it is not fully connected, it only decreases linearly for a very long time, and for a densely connected graph you can count on a lower bound of about 100K^2 triangles, or using 1 byte per triangle, ~10GB of triangles.

In a real problem, these can be the kind of sizes you are looking at. However, in real problems graphs are usually sparsely connected (small world). I did a quick google and cannot actually find a real-world problem that is represented as a dense graph.

For a sparse graph you can use different datastructures and algorithms. While you can always represent it as a dense matrix (read: a matrix where the 0s are explicitly represented), it is horribly inefficient.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 16:24:54
June 24 2015 16:20 GMT
#12866
On June 24 2015 12:59 phar wrote:
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up


600M contains() operations. We use about 7-8GBs of memory on the servers last time I checked, we might be at 5GB now.

Our input is the vertex and a space separated list of other vertices it is connected to (v1, v2) edge pair. There are 100K vertices and 10M edges.

We're not running out of memory (we probably will when we have to use the not-as-good servers, but that's for later).

The algorithm is, for each vertex, generate all size-2 ascending order combinations from the list of adjacent vertices by partitioning the list into less than the current vertex and greater than the current vertex. For each size-2 combination, check if the edge exists. If it exists, this is a unique triangle, otherwise not.

We store the left lower combinations in a 100K list of list of integers in sorted order (sorting takes <1s and gets us 15s faster due to locality). We store the right upper combinations in a 100K list of sets of integers by a hashset (so the contains operations can be as fast as possible). The triangle check is just a for loop over >600M iterations of calls to contains().

This is as efficient as we think we can be without any more assumptions about the graph layout.

I think 10M edges in a graph of 100K vertices is like 0.1% the total number of edges, which I assume can be considered dense. The reason I'm asking about this is because I believe I'm at the limits for using standard data structures in Java (only HashSets and sorted ArrayLists really), but I have no clue because I use C/C++ more. LinkedHashSets to get sorted ordering adds too much iteration overhead to get the locality benefits.
There is no one like you in the universe.
Acrofales
Profile Joined August 2010
Spain18280 Posts
June 24 2015 16:57 GMT
#12867
Sounds sparse to me. You only have 100 times as many edges as you do nodes. While you might make an argument for it being dense for a small world graph, any small world graph is a sparse graph.

If I understand your algorithm, what you do (conceptually) is:

for each node do:
neighbours = { v in node.getNeighbours() | v.id > node.id }
for each neighbour do:
seconds = { v in neighbour.getNeighbours() | v.id < neighbour.id }
for each second in seconds do:
if second in neighbours then:
add (node.id, neighbour.id, second.id) to triangles
endif
endfor
endfor


That seems like a pretty good approach. I can't think of any direct improvements.

All of these operations seem like they would be more efficient storing all the nodes in a hashmap with their id as key, and in each node you keep two lists of neighbours: those with ID > node.id and those with ID < node.id. The total number of neighbours is on average about 200 IDs long, making your entire hashmap about 100k * 200 + 100k = 20.1million integers + overhead, which should not be a problem at all. Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
June 24 2015 17:53 GMT
#12868
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...
There is no one like you in the universe.
Ropid
Profile Joined March 2009
Germany3557 Posts
June 24 2015 18:14 GMT
#12869
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?
"My goal is to replace my soul with coffee and become immortal."
spinesheath
Profile Blog Joined June 2009
Germany8679 Posts
June 24 2015 18:15 GMT
#12870
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.
If you have a good reason to disagree with the above, please tell me. Thank you.
Acrofales
Profile Joined August 2010
Spain18280 Posts
June 24 2015 18:24 GMT
#12871
On June 25 2015 02:53 Blisse wrote:
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...

But then I don't know why you are having problems with your contains() calls. On average, every node should have 200 edges (so even if it's scale-free, the average is 200), and that check should be negligible: each edge is checked at most twice, so you will have at most 2e contains calls (in practice far less). If you have a scale-free topology and are ending up doing your contains calls on the highly connected nodes all the times, switch your ordering (check the least-connected nodes first instead of the most-connected ones). In big-O complexity it makes no difference, but in practice it means you will have lots and lots and lots of small contains() calls, instead of a few large ones.

Acrofales
Profile Joined August 2010
Spain18280 Posts
Last Edited: 2015-06-24 18:47:19
June 24 2015 18:30 GMT
#12872
On June 25 2015 03:15 spinesheath wrote:
Show nested quote +
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.

Yeah. Whoops. It's O(1). And you can probably make it a faster O(1) by implementing your own map for integers, as Blisse seems to have done. However, the log n in the complexity is for checking the presence of "second" in the "neighbours" list. This is log n (assuming neighbours is sorted, as Blisse said it was), not for look up in the hashtable.
Acrofales
Profile Joined August 2010
Spain18280 Posts
Last Edited: 2015-06-24 18:38:55
June 24 2015 18:36 GMT
#12873
On June 25 2015 03:14 Ropid wrote:
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?

http://stackoverflow.com/questions/8661420/do-the-java-integer-and-double-objects-have-unnecessary-overhead

Integers have 12 bytes overhead (which, given that a 32-bit integers is 4 bytes, is quite significant). Now this seems to be referencing Java 1.3 (which is ancient), so perhaps it has changed.

Using a library (or building your own datastructure) that allows for the use of primitives might very well make a difference. However, it shouldn't be 2GB in the first place for the size of graph he is talking about. So doing more in-depth profiling of what is eating all the memory and where the performance issues are coming from seems necessary before scrapping the standard library.
phar
Profile Joined August 2011
United States1080 Posts
June 25 2015 03:52 GMT
#12874
Ok back up, I am confused. You are or aren't running out of memory? The initial post mentioned something about your data set getting paged out, yet you say you have only 10M edges, which should trivially fit in memory (like honestly 10M int is only 40MB or something). Where are you getting 8GB from??



What's the distribution of your dataset like? Is it uniform, or following a more real-world (i.e. power-law) distribution? If the latter, there's a rather large number of better algorithms you could use.
Who after all is today speaking about the destruction of the Armenians?
RoyGBiv_13
Profile Blog Joined August 2010
United States1275 Posts
June 25 2015 04:08 GMT
#12875
On June 24 2015 04:18 sabas123 wrote:
Show nested quote +
On June 23 2015 09:53 RoyGBiv_13 wrote:
A dynamically sized buffer or string can be declared on the stack, if it is small (my rule of thumb is under 100 bytes or 1 line in a file), or placed on the heap if it is large. It should always be passed by reference.

You mean if the buffer is big you want to pass it by reference right?

Nice guide btw, I enjoyed it alot.


The data has to live somewhere on the system. You can declare it on the stack in say, main(), then pass by reference to helper functions to modify/generate/use the data. Once you have the buffer placed in memory, you should almost always pass by reference, since you need the address of the buffer anyway to iterate through it.
Any sufficiently advanced technology is indistinguishable from magic
SixStrings
Profile Blog Joined August 2013
Germany2046 Posts
June 25 2015 09:45 GMT
#12876
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
June 25 2015 11:19 GMT
#12877
--- Nuked ---
Isualin
Profile Joined March 2011
Germany1903 Posts
June 25 2015 11:37 GMT
#12878
I am really interested in procedural generation of mazes and cities after playing dozens of roguelikes. I checked out /r/proceduralgeneration and this stackoverflow link. Visualization part looks really hard to me because i have no experience in shaders, 3d rendering or directx/opengl.(I work as fullstack web-developer) Can anyone recommend me some stuff to work on?
| INnoVation | The literal god TY | ByuNjwa | LRSL when? |
Acrofales
Profile Joined August 2010
Spain18280 Posts
June 25 2015 13:32 GMT
#12879
On June 25 2015 18:45 SixStrings wrote:
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?

Hey cutie,

Well shucks, I think you should probably stick to embroidery or crotcheting. But if you actually want advice, how about you start by showing respect to the community, and use a gender neutral address. If not, then address the vast majority of people here, which are almost certainly going to be guys: you are on a gaming website dedicated to Starcraft (don't remember the stats, but I believe it was about 90% male players), and in the programming thread (my CS course had about a 95% male population): combine the two populations and the only girl here is the one giving you a blowjob while she's phoning her boyfriend + Show Spoiler +
Dating how's your luck thread never forgets
.
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-25 13:47:49
June 25 2015 13:47 GMT
#12880
--- Nuked ---
Prev 1 642 643 644 645 646 1032 Next
Please log in or register to reply.
Live Events Refresh
Next event in 1h 38m
[ Submit Event ]
Live Streams
Refresh
StarCraft 2
Nina 194
ProTech140
StarCraft: Brood War
Calm 8433
GuemChi 3056
Zeus 409
Hyuk 294
Dewaltoss 79
Aegong 60
sSak 54
Shinee 54
soO 48
Bale 24
[ Show more ]
Noble 19
ZergMaN 8
Counter-Strike
Stewie2K1032
shoxiejesuss843
olofmeister247
allub238
Super Smash Bros
Mew2King101
Other Games
summit1g5847
C9.Mang0380
Happy230
Livibee66
amsayoshi31
Organizations
Other Games
gamesdonequick743
Dota 2
PGL Dota 2 - Main Stream119
StarCraft 2
Blizzard YouTube
StarCraft: Brood War
BSLTrovo
[ Show 12 non-featured ]
StarCraft 2
• AfreecaTV YouTube
• intothetv
• Kozan
• IndyKCrew
• LaughNgamezSOOP
• Migwel
• sooper7s
StarCraft: Brood War
• iopq 5
• BSLYoutube
• STPLYoutube
• ZZZeroYoutube
League of Legends
• Stunt580
Upcoming Events
GSL
1h 38m
Rogue vs Percival
Zoun vs Solar
Replay Cast
16h 8m
GSL
1d 1h
Cure vs TriGGeR
ByuN vs Bunny
KCM Race Survival
1d 2h
Big Gabe
1d 4h
Replay Cast
1d 16h
Replay Cast
2 days
Escore
2 days
OSC
2 days
Replay Cast
2 days
[ Show More ]
Replay Cast
3 days
RSL Revival
3 days
IPSL
3 days
Ret vs Art_Of_Turtle
Radley vs TBD
BSL
3 days
Replay Cast
3 days
RSL Revival
4 days
uThermal 2v2 Circuit
4 days
BSL
4 days
IPSL
4 days
eOnzErG vs TBD
G5 vs Nesh
Replay Cast
5 days
Wardi Open
5 days
Afreeca Starleague
5 days
Jaedong vs Light
Monday Night Weeklies
5 days
Replay Cast
5 days
Sparkling Tuna Cup
6 days
Afreeca Starleague
6 days
Snow vs Flash
Liquipedia Results

Completed

Proleague 2026-04-28
WardiTV TLMC #16
Nations Cup 2026

Ongoing

BSL Season 22
ASL Season 21
CSL 2026 SPRING (S20)
IPSL Spring 2026
KCM Race Survival 2026 Season 2
StarCraft2 Community Team League 2026 Spring
2026 GSL S1
BLAST Rivals Spring 2026
IEM Rio 2026
PGL Bucharest 2026
Stake Ranked Episode 1
BLAST Open Spring 2026
ESL Pro League S23 Finals
ESL Pro League S23 Stage 1&2
PGL Cluj-Napoca 2026

Upcoming

Escore Tournament S2: W5
KK 2v2 League Season 1
Acropolis #4
BSL 22 Non-Korean Championship
CSLAN 4
Kung Fu Cup 2026 Grand Finals
HSC XXIX
uThermal 2v2 2026 Main Event
Maestros of the Game 2
2026 GSL S2
RSL Revival: Season 5
XSE Pro League 2026
IEM Cologne Major 2026
Stake Ranked Episode 2
CS Asia Championships 2026
IEM Atlanta 2026
Asian Champions League 2026
PGL Astana 2026
TLPD

1. ByuN
2. TY
3. Dark
4. Solar
5. Stats
6. Nerchio
7. sOs
8. soO
9. INnoVation
10. Elazer
1. Rain
2. Flash
3. EffOrt
4. Last
5. Bisu
6. Soulkey
7. Mini
8. Sharp
Sidebar Settings...

Advertising | Privacy Policy | Terms Of Use | Contact Us

Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2026 TLnet. All Rights Reserved.