• Log InLog In
  • Register
Liquid`
Team Liquid Liquipedia
EDT 00:11
CEST 06:11
KST 13:11
  • Home
  • Forum
  • Calendar
  • Streams
  • Liquipedia
  • Features
  • Store
  • EPT
  • TL+
  • StarCraft 2
  • Brood War
  • Smash
  • Heroes
  • Counter-Strike
  • Overwatch
  • Liquibet
  • Fantasy StarCraft
  • TLPD
  • StarCraft 2
  • Brood War
  • Blogs
Forum Sidebar
Events/Features
News
Featured News
[ASL19] Finals Recap: Standing Tall9HomeStory Cup 27 - Info & Preview18Classic wins Code S Season 2 (2025)16Code S RO4 & Finals Preview: herO, Rogue, Classic, GuMiho0TL Team Map Contest #5: Presented by Monster Energy6
Community News
Flash Announces Hiatus From ASL60Weekly Cups (June 23-29): Reynor in world title form?13FEL Cracov 2025 (July 27) - $8000 live event19Esports World Cup 2025 - Final Player Roster16Weekly Cups (June 16-22): Clem strikes back1
StarCraft 2
General
Program: SC2 / XSplit / OBS Scene Switcher Statistics for vetoed/disliked maps The SCII GOAT: A statistical Evaluation Weekly Cups (June 23-29): Reynor in world title form? PiG Sty Festival #5: Playoffs Preview + Groups Recap
Tourneys
FEL Cracov 2025 (July 27) - $8000 live event RSL: Revival, a new crowdfunded tournament series Korean Starcraft League Week 77 Master Swan Open (Global Bronze-Master 2) [GSL 2025] Code S: Season 2 - Semi Finals & Finals
Strategy
How did i lose this ZvP, whats the proper response Simple Questions Simple Answers
Custom Maps
[UMS] Zillion Zerglings
External Content
Mutation # 480 Moths to the Flame Mutation # 479 Worn Out Welcome Mutation # 478 Instant Karma Mutation # 477 Slow and Steady
Brood War
General
Player “Jedi” cheat on CSL ASL20 Preliminary Maps SC uni coach streams logging into betting site Flash Announces Hiatus From ASL BGH Mineral Boosts Tutorial Video
Tourneys
[BSL20] Grand Finals - Sunday 20:00 CET [Megathread] Daily Proleagues Small VOD Thread 2.0 [BSL20] GosuLeague RO16 - Tue & Wed 20:00+CET
Strategy
Simple Questions, Simple Answers I am doing this better than progamers do.
Other Games
General Games
Stormgate/Frost Giant Megathread Nintendo Switch Thread Path of Exile What do you want from future RTS games? Beyond All Reason
Dota 2
Official 'what is Dota anymore' discussion
League of Legends
Heroes of the Storm
Simple Questions, Simple Answers Heroes of the Storm 2.0
Hearthstone
Heroes of StarCraft mini-set
TL Mafia
TL Mafia Community Thread Vanilla Mini Mafia
Community
General
US Politics Mega-thread Russo-Ukrainian War Thread Trading/Investing Thread Things Aren’t Peaceful in Palestine The Games Industry And ATVI
Fan Clubs
SKT1 Classic Fan Club! Maru Fan Club
Media & Entertainment
Anime Discussion Thread [Manga] One Piece [\m/] Heavy Metal Thread
Sports
2024 - 2025 Football Thread Formula 1 Discussion NBA General Discussion TeamLiquid Health and Fitness Initiative For 2023 NHL Playoffs 2024
World Cup 2022
Tech Support
Computer Build, Upgrade & Buying Resource Thread
TL Community
Blogs
Culture Clash in Video Games…
TrAiDoS
from making sc maps to makin…
Husyelt
Blog #2
tankgirl
StarCraft improvement
iopq
Trip to the Zoo
micronesia
Customize Sidebar...

Website Feedback

Closed Threads



Active: 540 users

The Big Programming Thread - Page 644

Forum Index > General Forum
Post a Reply
Prev 1 642 643 644 645 646 1031 Next
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
Slithe
Profile Blog Joined February 2007
United States985 Posts
June 23 2015 22:20 GMT
#12861
What kind of operations are you doing? 10k integers isn't that big, so I'm surprised that you're running into paging issues.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 03:26:38
June 24 2015 03:25 GMT
#12862
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p
There is no one like you in the universe.
phar
Profile Joined August 2011
United States1080 Posts
Last Edited: 2015-06-24 04:00:31
June 24 2015 03:59 GMT
#12863
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up
Who after all is today speaking about the destruction of the Armenians?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-24 13:22:00
June 24 2015 13:11 GMT
#12864
--- Nuked ---
Acrofales
Profile Joined August 2010
Spain17970 Posts
Last Edited: 2015-06-24 14:41:49
June 24 2015 14:40 GMT
#12865
On June 24 2015 12:25 Blisse wrote:
It's an assignment, Java is required ^^

It's a 100K by 100K dense graph I need to count triangles on. Multi-threading is easy to do because I can abuse a little bit of parallelism for what to parse at what time. But I need to implement a single-threaded solution that doesn't cramp up. Currently we split the edges in half to do triangle counting, but we have to do about 600M set contains() operations on essentially sets of ~100K integers, which Java chokes on because it can't keep all the edges in memory at the same time.

I just have no clue what to do to keep all the integers in memory so when I can call contains() on different sets of size 100K and not have it cramp up. The contains() operations are incredibly fast (<5ms), but having 600M of them in serial hurts.

Been trying to partition the data so they're potentially allocated closer together but the benefits are intangible/worse most of the time. I might have to just accept this is as fast as I'm going to be able to call contains() 600M times.

I'm more of a C guy so doing this in Java is much more work :p

Well, clearly you have to think about your data structure. You don't have lists of integers, you have a dense graph with 100K nodes. At least, that's what I understand from your 100K by 100K (which makes it sound more like a matrix, but presumably that is your matrix representation of the graph?) There are many ways of representing that, the most common being a graph datastructure and a matrix (basically a 2D array), both of which have some good Java libraries for manipulating.

However, given your task, I actually doubt your graph is dense. A fully connected graph of 100K nodes has C(100K, 3) triangles, or to be precise: 166,661,666,700,000 triangles in a fully connected graph of 100K nodes. Now obviously dense does not mean fully connected, just mostly connected. Which if you store this as a text file using 1 byte per triangle (unrealistic assumption, you will use more), that is still ~166TB to say what the triangles are. Now even if it is not fully connected, it only decreases linearly for a very long time, and for a densely connected graph you can count on a lower bound of about 100K^2 triangles, or using 1 byte per triangle, ~10GB of triangles.

In a real problem, these can be the kind of sizes you are looking at. However, in real problems graphs are usually sparsely connected (small world). I did a quick google and cannot actually find a real-world problem that is represented as a dense graph.

For a sparse graph you can use different datastructures and algorithms. While you can always represent it as a dense matrix (read: a matrix where the 0s are explicitly represented), it is horribly inefficient.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
Last Edited: 2015-06-24 16:24:54
June 24 2015 16:20 GMT
#12866
On June 24 2015 12:59 phar wrote:
Did you do an implementation in C and actually get faster results?

600 million 32 bit integers is like less than 3GB of RAM.

What is the actual algorithm you're using to count triangles? What data structures are you using?

How many edges do you actually have for 100K nodes? (Depending on the real-world application this may be only around 1M total edges, which you should easily be able to fit in memory in Java on a single machine).


Have you actually verified you're running out of memory? If so:

How are you invoking java? Are you giving the JVM enough RAM? For example a lot of default jvm will stick you at 1G max heap, if you're allocating >1G of objects you'll have a bad time.

try

java -XX:+PrintFlagsFinal -version | grep HeapSize


if your max heap size is tiny, try -Xmx to bump it up


600M contains() operations. We use about 7-8GBs of memory on the servers last time I checked, we might be at 5GB now.

Our input is the vertex and a space separated list of other vertices it is connected to (v1, v2) edge pair. There are 100K vertices and 10M edges.

We're not running out of memory (we probably will when we have to use the not-as-good servers, but that's for later).

The algorithm is, for each vertex, generate all size-2 ascending order combinations from the list of adjacent vertices by partitioning the list into less than the current vertex and greater than the current vertex. For each size-2 combination, check if the edge exists. If it exists, this is a unique triangle, otherwise not.

We store the left lower combinations in a 100K list of list of integers in sorted order (sorting takes <1s and gets us 15s faster due to locality). We store the right upper combinations in a 100K list of sets of integers by a hashset (so the contains operations can be as fast as possible). The triangle check is just a for loop over >600M iterations of calls to contains().

This is as efficient as we think we can be without any more assumptions about the graph layout.

I think 10M edges in a graph of 100K vertices is like 0.1% the total number of edges, which I assume can be considered dense. The reason I'm asking about this is because I believe I'm at the limits for using standard data structures in Java (only HashSets and sorted ArrayLists really), but I have no clue because I use C/C++ more. LinkedHashSets to get sorted ordering adds too much iteration overhead to get the locality benefits.
There is no one like you in the universe.
Acrofales
Profile Joined August 2010
Spain17970 Posts
June 24 2015 16:57 GMT
#12867
Sounds sparse to me. You only have 100 times as many edges as you do nodes. While you might make an argument for it being dense for a small world graph, any small world graph is a sparse graph.

If I understand your algorithm, what you do (conceptually) is:

for each node do:
neighbours = { v in node.getNeighbours() | v.id > node.id }
for each neighbour do:
seconds = { v in neighbour.getNeighbours() | v.id < neighbour.id }
for each second in seconds do:
if second in neighbours then:
add (node.id, neighbour.id, second.id) to triangles
endif
endfor
endfor


That seems like a pretty good approach. I can't think of any direct improvements.

All of these operations seem like they would be more efficient storing all the nodes in a hashmap with their id as key, and in each node you keep two lists of neighbours: those with ID > node.id and those with ID < node.id. The total number of neighbours is on average about 200 IDs long, making your entire hashmap about 100k * 200 + 100k = 20.1million integers + overhead, which should not be a problem at all. Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.
Blisse
Profile Blog Joined July 2010
Canada3710 Posts
June 24 2015 17:53 GMT
#12868
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...
There is no one like you in the universe.
Ropid
Profile Joined March 2009
Germany3557 Posts
June 24 2015 18:14 GMT
#12869
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?
"My goal is to replace my soul with coffee and become immortal."
spinesheath
Profile Blog Joined June 2009
Germany8679 Posts
June 24 2015 18:15 GMT
#12870
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.
If you have a good reason to disagree with the above, please tell me. Thank you.
Acrofales
Profile Joined August 2010
Spain17970 Posts
June 24 2015 18:24 GMT
#12871
On June 25 2015 02:53 Blisse wrote:
Yup that's basically our algorithm, though we already do the pre-processing. We have two 2D arrays, one array of arrays for the ID < node.id, one array of sets for ID > node.id. We originally had HashMaps instead of ArrayLists but it doesn't seem to have a significant impact for the outer array, no noticeable performance difference.

Might have to consider trying to store the < and > lists in a single class though instead of two lists, which might help a bit with locality...

But then I don't know why you are having problems with your contains() calls. On average, every node should have 200 edges (so even if it's scale-free, the average is 200), and that check should be negligible: each edge is checked at most twice, so you will have at most 2e contains calls (in practice far less). If you have a scale-free topology and are ending up doing your contains calls on the highly connected nodes all the times, switch your ordering (check the least-connected nodes first instead of the most-connected ones). In big-O complexity it makes no difference, but in practice it means you will have lots and lots and lots of small contains() calls, instead of a few large ones.

Acrofales
Profile Joined August 2010
Spain17970 Posts
Last Edited: 2015-06-24 18:47:19
June 24 2015 18:30 GMT
#12872
On June 25 2015 03:15 spinesheath wrote:
Show nested quote +
On June 25 2015 01:57 Acrofales wrote:
Lookup in a hashtable is O(log n), so total time complexity for the algorithm this way is O(n * e * log n), which seems about as good as you can get it.

That doesn't seem quite right. Is it that way in Java? That would be an awful implementation of a hashtable.

If your performance issues come from cache misses because the data is too large (do you have a way to properly verify that?), you definitely should look into ways to split the data up into multiple chunks and process them individually. Even if you have to "overlap" the chunks and execute a lot more instructions because of that, it still should be significantly faster. Cache misses are very expensive in high performance computing scenarios.

Yeah. Whoops. It's O(1). And you can probably make it a faster O(1) by implementing your own map for integers, as Blisse seems to have done. However, the log n in the complexity is for checking the presence of "second" in the "neighbours" list. This is log n (assuming neighbours is sorted, as Blisse said it was), not for look up in the hashtable.
Acrofales
Profile Joined August 2010
Spain17970 Posts
Last Edited: 2015-06-24 18:38:55
June 24 2015 18:36 GMT
#12873
On June 25 2015 03:14 Ropid wrote:
Is that ArrayList<Integer>? What's the difference between Integer and int? Would the 8GB turn into 2GB or something if it could be int instead of Integer?

http://stackoverflow.com/questions/8661420/do-the-java-integer-and-double-objects-have-unnecessary-overhead

Integers have 12 bytes overhead (which, given that a 32-bit integers is 4 bytes, is quite significant). Now this seems to be referencing Java 1.3 (which is ancient), so perhaps it has changed.

Using a library (or building your own datastructure) that allows for the use of primitives might very well make a difference. However, it shouldn't be 2GB in the first place for the size of graph he is talking about. So doing more in-depth profiling of what is eating all the memory and where the performance issues are coming from seems necessary before scrapping the standard library.
phar
Profile Joined August 2011
United States1080 Posts
June 25 2015 03:52 GMT
#12874
Ok back up, I am confused. You are or aren't running out of memory? The initial post mentioned something about your data set getting paged out, yet you say you have only 10M edges, which should trivially fit in memory (like honestly 10M int is only 40MB or something). Where are you getting 8GB from??



What's the distribution of your dataset like? Is it uniform, or following a more real-world (i.e. power-law) distribution? If the latter, there's a rather large number of better algorithms you could use.
Who after all is today speaking about the destruction of the Armenians?
RoyGBiv_13
Profile Blog Joined August 2010
United States1275 Posts
June 25 2015 04:08 GMT
#12875
On June 24 2015 04:18 sabas123 wrote:
Show nested quote +
On June 23 2015 09:53 RoyGBiv_13 wrote:
A dynamically sized buffer or string can be declared on the stack, if it is small (my rule of thumb is under 100 bytes or 1 line in a file), or placed on the heap if it is large. It should always be passed by reference.

You mean if the buffer is big you want to pass it by reference right?

Nice guide btw, I enjoyed it alot.


The data has to live somewhere on the system. You can declare it on the stack in say, main(), then pass by reference to helper functions to modify/generate/use the data. Once you have the buffer placed in memory, you should almost always pass by reference, since you need the address of the buffer anyway to iterate through it.
Any sufficiently advanced technology is indistinguishable from magic
SixStrings
Profile Blog Joined August 2013
Germany2046 Posts
June 25 2015 09:45 GMT
#12876
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
June 25 2015 11:19 GMT
#12877
--- Nuked ---
Isualin
Profile Joined March 2011
Germany1903 Posts
June 25 2015 11:37 GMT
#12878
I am really interested in procedural generation of mazes and cities after playing dozens of roguelikes. I checked out /r/proceduralgeneration and this stackoverflow link. Visualization part looks really hard to me because i have no experience in shaders, 3d rendering or directx/opengl.(I work as fullstack web-developer) Can anyone recommend me some stuff to work on?
| INnoVation | The literal god TY | ByuNjwa | LRSL when? |
Acrofales
Profile Joined August 2010
Spain17970 Posts
June 25 2015 13:32 GMT
#12879
On June 25 2015 18:45 SixStrings wrote:
Hey girls,

I probably have no business posting here, but here goes:

I just finished the Khan course "Introduction to Javascript" and I enjoyed it in a brain-teaser / puzzle kind of way, so I want to get deeper into programming. There's a Python course starting in October, what could I do in the meanwhile?

Hey cutie,

Well shucks, I think you should probably stick to embroidery or crotcheting. But if you actually want advice, how about you start by showing respect to the community, and use a gender neutral address. If not, then address the vast majority of people here, which are almost certainly going to be guys: you are on a gaming website dedicated to Starcraft (don't remember the stats, but I believe it was about 90% male players), and in the programming thread (my CS course had about a 95% male population): combine the two populations and the only girl here is the one giving you a blowjob while she's phoning her boyfriend + Show Spoiler +
Dating how's your luck thread never forgets
.
Nesserev
Profile Blog Joined January 2011
Belgium2760 Posts
Last Edited: 2015-06-25 13:47:49
June 25 2015 13:47 GMT
#12880
--- Nuked ---
Prev 1 642 643 644 645 646 1031 Next
Please log in or register to reply.
Live Events Refresh
Next event in 5h 49m
[ Submit Event ]
Live Streams
Refresh
StarCraft 2
RuFF_SC2 182
ROOTCatZ 80
StarCraft: Brood War
Leta 856
Icarus 8
LuMiX 4
Dota 2
NeuroSwarm179
League of Legends
JimRising 844
Heroes of the Storm
Khaldor164
Other Games
summit1g10240
tarik_tv5806
WinterStarcraft285
ViBE224
ProTech42
SortOf0
Organizations
Other Games
BasetradeTV50
StarCraft 2
Blizzard YouTube
StarCraft: Brood War
BSLTrovo
sctven
[ Show 19 non-featured ]
StarCraft 2
• Berry_CruncH303
• Hupsaiya 79
• davetesta61
• practicex 29
• Kozan
• Migwel
• sooper7s
• AfreecaTV YouTube
• intothetv
• IndyKCrew
• LaughNgamezSOOP
StarCraft: Brood War
• Pr0nogo 3
• STPLYoutube
• ZZZeroYoutube
• BSLYoutube
League of Legends
• Lourlo777
• masondota2748
• Stunt397
Other Games
• Scarra3168
Upcoming Events
RSL Revival
5h 49m
Clem vs Classic
SHIN vs Cure
FEL
7h 49m
WardiTV European League
7h 49m
BSL: ProLeague
13h 49m
Dewalt vs Bonyth
Replay Cast
1d 19h
Sparkling Tuna Cup
2 days
WardiTV European League
2 days
The PondCast
3 days
Replay Cast
3 days
RSL Revival
4 days
[ Show More ]
Replay Cast
4 days
RSL Revival
5 days
FEL
5 days
RSL Revival
6 days
FEL
6 days
FEL
6 days
Liquipedia Results

Completed

BSL 2v2 Season 3
HSC XXVII
Heroes 10 EU

Ongoing

JPL Season 2
BSL Season 20
Acropolis #3
KCM Race Survival 2025 Season 2
CSL 17: 2025 SUMMER
Copa Latinoamericana 4
Championship of Russia 2025
RSL Revival: Season 1
Murky Cup #2
BLAST.tv Austin Major 2025
ESL Impact League Season 7
IEM Dallas 2025
PGL Astana 2025
Asian Champions League '25
BLAST Rivals Spring 2025
MESA Nomadic Masters
CCT Season 2 Global Finals
IEM Melbourne 2025

Upcoming

2025 ACS Season 2: Qualifier
CSLPRO Last Chance 2025
2025 ACS Season 2
CSLPRO Chat StarLAN 3
K-Championship
uThermal 2v2 Main Event
SEL Season 2 Championship
FEL Cracov 2025
Esports World Cup 2025
StarSeries Fall 2025
FISSURE Playground #2
BLAST Open Fall 2025
BLAST Open Fall Qual
Esports World Cup 2025
BLAST Bounty Fall 2025
BLAST Bounty Fall Qual
IEM Cologne 2025
FISSURE Playground #1
TLPD

1. ByuN
2. TY
3. Dark
4. Solar
5. Stats
6. Nerchio
7. sOs
8. soO
9. INnoVation
10. Elazer
1. Rain
2. Flash
3. EffOrt
4. Last
5. Bisu
6. Soulkey
7. Mini
8. Sharp
Sidebar Settings...

Advertising | Privacy Policy | Terms Of Use | Contact Us

Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.