The Big Programming Thread - Page 182

Prev 1 180 181 182 183 184 1031 Next

Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.

Craton

United States17247 Posts

October 26 2012 03:40 GMT

#3621

On October 25 2012 15:22 phar wrote:

Show nested quote +

On October 25 2012 08:10 Craton wrote:
Technologies involved:
Oracle 10g
Java 1.4.2
Perl

Background:
+ Show Spoiler +

I work as a contractor for a government institution (most accurately, my company is a contractor). For the work I do, we receive large data files of real data (i.e. real information about people), which we then use to modify (for testing purposes, i.e. "dirtying" up the data) or to manufacture additional test data. The data is all kept within the same division and on secure servers, so the realness of it is not an issue.

More about the files:
These are ASCII files with very large line lengths (biggest so far has been 60,000 characters a line, though there is one more type of file that I believe is worse). Additionally, these are all fixed length and padded with spaces. The meaning of these characters is derived from their position on each line. Example, someone's first name might be found at positions 500 through 550. If it's shifted at all, everything is ruined.

Because these are completely non-normalized flat files, we have over 5,000 "columns" worth of data.

Given the brief span of time these are in our system, we do not make an effort to normalize them, as they go back out the same way as they came in (just with different characters).

Oracle has a column limit of 1,000 per table. Sensible if you have a properly normalized (or even remotely normalized...) database, but this isn't. Our current implementation is basically to make multiple tables (each with ~900 columns) until all the data has been read in. There is a column that maintains the line number of each record that came in, which can then be used to join everything together when it's time to kick it back out.

Amusingly, the UTL_FILE of Oracle that is used to write tables to files has a character limit of 32767 bytes. We were never able to get around this limit, even by breaking things up and trying to do puts (over putline), flushing buffers, manually adding carriage returns, etc). Eventually, we went another route; instead of trying to output all the tables as one file, we output each related table as a separate file and then generate a Perl script which joins them (the server runs on Unix).

This works, but it isn't very elegant. There was a lot of headache involved in even getting this far, including getting permissions from the DBAs and Unix admins.

Hoping to improve on this, I wrote up a Java stored procedure that tries to accomplish the original goal: output everything at once, nice and neat. For those who don't know, you can run Java classes through Oracle as Stored Procedures.

I got this Java method working finally, but it's very slow. It grinds to a snails pace when trying to handle the fully joined resultset (i.e. 100,000 records of ~6,000 columns comprising of ~60,000 characters per record)

.

My question is simply, what, if anything, can be done to make this thing outperform my initial method. I suspect it might not be possible.

In any event, my program logic is pretty simple. It uses JDBC to grab a resultset via a SQL statement that's passed in (which is really just select * from all the tables). It then iterates each record. Within each record, it iterates each column.
Our control fields are all numerics and those get stripped out when iterating the resultset (i.e. if the column type <> CHAR/VARCHAR, go to the next field). NULL fields get padded with blanks, such that a null field in a column width of 10 would become 10 blanks. This is necessary to keep everything fixed length as intended.

Beyond that, there really isn't much logic. Any ideas?

The only thing I've come up with would be to replicate the Oracle tables and then basically update every single column programmatically to replace nulls with padded blanks (so I wouldn't have to iterate/process so much in Java) but I don't know if I'm going to get any performance increase.

Cool

Couple of clarifying questions:

1) What's the overall task you need help on here? I wasn't hugely clear on that by the time I got done reading your blurb. Do you need a better way to do the whole flat file -> X -> flat file computation? Or do you simply need the flat file -> X format (settled on oracle?), and then someone else is handling the later parts? Or are you doing only X -> flat file, where something else is handling the first part? How much of the existing implementation can you change?

Ideally we move everything to a single Java procedure that is at least as performant as the whole spitting out of multiple files and then joining via Perl. It will make things easier to use and maintain. Right now, the Java implementation SUCKS. It's orders of magnitude slower.

2) (The relevance of this depends on the answer to #1, so feel free to ignore if not applicable) Can you use technologies other than the ones listed at the start?

Unlikely. Oracle is locked in stone and the data transformations and what not are all done inside Oracle. It's just getting it out of Oracle and into an ASCII file that's the issue.

~60GiB (100k records * 60k bytes / record) is really not much data. Can you just throw more hardware at the problem (assuming we clear up ? If you can use other technologies, I'd honestly just say throw it into AWS.

Size on disk is not an issue. File compression reaches 99% and reduces things to a trivial size.

3) What state is necessary to know what computations you need to do on a given row? (i.e. is each row independent, or do you need to do operations on a single row that depend on other rows? If so, what rows?)

At the point where it's being pushed out, each row has no inter-row dependencies, but the order must be maintained. Line one of the incoming file must still be line one of the outgoing file, as with line two and so forth.

The records never affect other records, frankly. Everything is self-contained. However, line 10 on table#1 may effect line 10 on table#2 (as they are essentially 1 very large record spread across multiple tables). This has not yet been an issue, but may be in the future. We are also handling a redesign of a large system of theirs (strongly related to this), so that might be done before such a problem does arise.

4) What state is necessary to know what computations you need to do within a single row? (dependent on other entries? if so, which ones?)

As mentioned above, there can be a LOT of intermingling between variables in different columns. It's too much to list, but there is business logic for all of this. Everything from if field 1 has a value then field 2 becomes X to field 1 2 3 and 4 are translated to some new values based on some logic.

Current business process is simply to migrate data from one layout of the table to another, which basically involves creating both table structures and then inserting / updating from one side to the other. We have various ways of mapping fields from one version to another.

And finally, having been previously forced to adhere to strange data formats for the gov't, my condolences.

I find it amusing more than annoying most days.

------------------

As a related topic, anyone have experience with database modeling in JDeveloper? I would like to format the table columns like how Visio does.

Example of how Visio does it:

-------------------------
     SOME_TABLE
-------------------------
PK | PID_FIELD
FK | FID_FIELD
-------------------------
   | SOME_FIELD

Also, non-null fields are bolded.

With JDev, the PK/FKs are placed separately beneath the columns formatted as:
«PK» NAME_OF_PK (COL1, COLm, ... COLn)

-------------------------------
     SOME_TABLE
-------------------------------
PID_FIELD
FID_FIELD
SOME_FIELD
-------------------------------
«PK» SOME_TABLE_PK1 (PID_FIELD)
«FK» SOME_TABLE_FK1 (FID_FIELD)

I've only managed to hide the column names and keep the PK/FK name, which is the opposite of what I want. I haven't figured any way to format individual columns. Google is no help.

phar

United States1080 Posts

October 26 2012 05:56 GMT

#3622

On October 26 2012 12:40 Craton wrote:Size on disk is not an issue. File compression reaches 99% and reduces things to a trivial size.

Sorry, I should have been more specific. I meant RAM. 60GiB is small enough that you can cheaply hold everything in memory, at which point you can do some interesting things. However based on your other answers this appears to be completely moot - if you have to load everything into Oracle :\

If you don't HAVE to use Oracle:

+ Show Spoiler +

On October 26 2012 12:40 Craton wrote:At the point where it's being pushed out, each row has no inter-row dependencies, but the order must be maintained. Line one of the incoming file must still be line one of the outgoing file, as with line two and so forth.

The records never affect other records, frankly. Everything is self-contained. However, line 10 on table#1 may effect line 10 on table#2 (as they are essentially 1 very large record spread across multiple tables). This has not yet been an issue, but may be in the future. We are also handling a redesign of a large system of theirs (strongly related to this), so that might be done before such a problem does arise.

As mentioned above, there can be a LOT of intermingling between variables in different columns. It's too much to list, but there is business logic for all of this. Everything from if field 1 has a value then field 2 becomes X to field 1 2 3 and 4 are translated to some new values based on some logic.

Current business process is simply to migrate data from one layout of the table to another, which basically involves creating both table structures and then inserting / updating from one side to the other. We have various ways of mapping fields from one version to another.

No inter row dependencies at all?

I'm wondering why you need to load it into a DB at all. A single record is really easy to hold in memory, why not just do the entire computation with no db? If you can do this in a literally 100% streaming fashion (read in one row, output row after doing computation once), you should be able to make this roughly as fast as the I/O of reading from disk, plus time for maintaining order.

Now that I understand the entire problem end to end, it feels like an embarrassingly parallel one.

The fact that you have to maintain initial order makes it a little bit tricky, but it should still be doable.

Is the data classified at all? It may be worth your time to cook up some quick and dirty parallel implementation using AWS or what have you. Better yet, if you have your own resources (boxes), dump the entire thing into apache hadoop or this weird thing http://blog.cloudera.com/blog/2011/10/introducing-crunch/

Just write up a proof of concept that you can use to test timings on.

But, if it's locked in stone and you HAVE to load it into oracle... :\

I'd need to think more about that.

Craton

United States17247 Posts

October 26 2012 13:12 GMT

#3623

Our processing logic has many steps and we have a business need to hold onto the data at different stages in its lifecycle.

We have a very large Oracle-based tool that has been built that handles all of this.

You're hung up on the data processing and not the outputting of a table to a file. These are different matters.

Thorakh

Netherlands1788 Posts

October 26 2012 14:02 GMT

#3624

Alright, I've been trying to write a program that solves a soduko (the program works but I have a question). The sudoku grid is stored into a one-dimensional integer array (with the first row being array positions 0-8, the second 9-17, etc.). Like this:

+ Show Spoiler +

0  1  2  3  4  5  6  7  8  
9  10 11 12 13 14 15 16 17 
18 19 20 21 22 23 24 25 26 
27 28 29 30 31 32 33 34 35 
36 37 38 39 40 41 42 43 44 
45 46 47 48 49 50 51 52 53 
54 55 56 57 58 59 60 61 62 
63 64 65 66 67 68 69 70 71 
72 73 74 75 76 77 78 79 80

The program has a method that checks whether a specific move is legal:

+ Show Spoiler +

private static boolean isLegal(int[] grid, int candidateCell, int trialValue)
	{
		//Check the row of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int rowLocation = candidateCell;
		while ((rowLocation+1) % 9 != 0)
		{
			rowLocation++;
		}
		for (int i=rowLocation-8; i<=rowLocation; i++)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the column of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int columnLocation = candidateCell;
		while (columnLocation - 9 >= 0)
		{
			columnLocation -= 9;
		}
		for (int i=columnLocation; i<81; i+=9)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int boxRowLocation = candidateCell % 3;
		int boxColumnLocation = (candidateCell / 9) % 3;
		int[] box = new int[9];
		box[boxColumnLocation*3 + boxRowLocation] = candidateCell;		
		int boxLocation = boxColumnLocation*3 + boxRowLocation;
		int gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation < 9)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 2)
			{
				gridLocation += 7;
			}
			else
			{
				gridLocation++;
			}
			boxLocation++;
		}		
		boxLocation = boxColumnLocation*3 + boxRowLocation;
		gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation >= 0)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 0)
			{
				gridLocation -= 7;
			}
			else
			{
				gridLocation--;
			}
			boxLocation--;
		}
		for (int i=0; i<box.length; i++)
		{
			if (grid[box[i]] == trialValue)
			{
				return false;
			}
		}
		
		return true;
	}

The method takes three arguments, the sudoku grid (int[] grid), the array position of the cell it is testing (int candidateCell) and the value it wants to test (int trialValue). It first checks if trialValue already occurs in the row or column of candidateCell. I don't think these two blocks of code can be simplified much. What I am wondering about is the third block of code where it checks if trialValue already occurs in the 3 by 3 box that candidateCell belongs to.

I had to think long and hard about how I would approach the problem of acquiring the 9 cells that belong to the box candidateCell is part of. I solved it by taking an integer array of size 9 (representing a 3 by 3 box), calculating the position of candidateCell in that block and putting candidateCell at that position (the value of candidateCell, not the value of the cell at position candidateCell in the sudoku grid!). Then I calculate the positions of the rest of the cells in the box by going back and forth from the position of candidateCell (if the position of candidateCell in the box is 5, I calculate 6, 7 and 8 first, and then I go back from 5 to 4, 3, 2, 1 and 0). After that I replace every position in the box by it's real sudoku value and check if trialValue matches any of those values. If so, the move is illegal.

My question is if there's a better solution to find and check a box a cell belongs to (if the sudoku grid is stored into a integer array as described in the first paragraph)? My code seems really long and obfuscated in relation to the other two blocks of code (which check the row and column).

edit: it's in Java btw, and my 1500th post!

Hairy

United Kingdom1169 Posts

October 26 2012 14:13 GMT

#3625

Woops..... just fixed a really devious bug in C++ caused by passing a pointer into a function by value, instead of by reference. Long story short, the program was then messing around in memory in places it should not have been...... weirdness occurred.

Oh how much difference a '&' makes.

Deleted User 101379

4849 Posts

October 26 2012 14:15 GMT

#3626

On October 26 2012 23:02 Thorakh wrote:
Alright, I've been trying to write a program that solves a soduko (the program works but I have a question). The sudoku grid is stored into a one-dimensional integer array (with the first row being array positions 0-8, the second 9-17, etc.). The program has a method that checks whether a specific move is legal:

+ Show Spoiler +

private static boolean isLegal(int[] grid, int candidateCell, int trialValue)
	{
		//Check the row of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int rowLocation = candidateCell;
		while ((rowLocation+1) % 9 != 0)
		{
			rowLocation++;
		}
		for (int i=rowLocation-8; i<=rowLocation; i++)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the column of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int columnLocation = candidateCell;
		while (columnLocation - 9 >= 0)
		{
			columnLocation -= 9;
		}
		for (int i=columnLocation; i<81; i+=9)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int boxRowLocation = candidateCell % 3;
		int boxColumnLocation = (candidateCell / 9) % 3;
		int[] box = new int[9];
		box[boxColumnLocation*3 + boxRowLocation] = candidateCell;		
		int boxLocation = boxColumnLocation*3 + boxRowLocation;
		int gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation < 9)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 2)
			{
				gridLocation += 7;
			}
			else
			{
				gridLocation++;
			}
			boxLocation++;
		}		
		boxLocation = boxColumnLocation*3 + boxRowLocation;
		gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation >= 0)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 0)
			{
				gridLocation -= 7;
			}
			else
			{
				gridLocation--;
			}
			boxLocation--;
		}
		for (int i=0; i<box.length; i++)
		{
			if (grid[box[i]] == trialValue)
			{
				return false;
			}
		}
		
		return true;
	}

The method takes three arguments, the sudoku grid (int[] grid), the array position of the cell it is testing (int candidateCell) and the value it wants to test (int trialValue). It first checks if trialValue already occurs in the row or column of candidateCell. I don't think these two blocks of code can be simplified much. What I am wondering about is the third block of code where it checks if trialValue already occurs in the 3 by 3 box that candidateCell belongs to.

I had to think long and hard about how I would approach the problem of acquiring the 9 cells that belong to the box candidateCell is part of. I solved it by taking an integer array of size 9 (representing a 3 by 3 box), calculating the position of candidateCell in that block and putting candidateCell at that position (the value of candidateCell, not the value of the cell at position candidateCell in the sudoku grid!). Then I calculate the positions of the rest of the cells in the box by going back and forth from the position of candidateCell (if the position of candidateCell in the box is 5, I calculate 6, 7 and 8 first, and then I go back from 5 to 4, 3, 2, 1 and 0). After that I replace every position in the box by it's real sudoku value and check if trialValue matches any of those values. If so, the move is illegal.

My question is if there's a better solution to find and check a box a cell belongs to (if the sudoku grid is stored into a integer array as described in the first paragraph)? My code seems really long and obfuscated in relation to the other two blocks of code (which the row and columns).

edit: it's in Java btw, and my 1500th post!

Go with the simple solution:
Lookup tables.

Create an array of arrays and have each block represented as indexes to the original array in it.
I.e.:


0 => 0, 1, 2, 9, 10, 11, 18, 19, 20;
1 => 3, 4, 5, 12, 13, 14, 21, 22, 23;
...

You can define that as constants and then you just loop through the array for the block.

To find out in which block you are, you should be able to calculate it. I don't have a formula memorized but i threw this together, should work but you should test it first since i only ran through it in my head:


blockIndex = ((currentIndex % 9) / 3) + (3 * (currentIndex / 27))

Thorakh

Netherlands1788 Posts

October 26 2012 14:54 GMT

#3627

On October 26 2012 23:15 Morfildur wrote:

Show nested quote +

private static boolean isLegal(int[] grid, int candidateCell, int trialValue)
	{
		//Check the row of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int rowLocation = candidateCell;
		while ((rowLocation+1) % 9 != 0)
		{
			rowLocation++;
		}
		for (int i=rowLocation-8; i<=rowLocation; i++)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the column of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int columnLocation = candidateCell;
		while (columnLocation - 9 >= 0)
		{
			columnLocation -= 9;
		}
		for (int i=columnLocation; i<81; i+=9)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int boxRowLocation = candidateCell % 3;
		int boxColumnLocation = (candidateCell / 9) % 3;
		int[] box = new int[9];
		box[boxColumnLocation*3 + boxRowLocation] = candidateCell;		
		int boxLocation = boxColumnLocation*3 + boxRowLocation;
		int gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation < 9)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 2)
			{
				gridLocation += 7;
			}
			else
			{
				gridLocation++;
			}
			boxLocation++;
		}		
		boxLocation = boxColumnLocation*3 + boxRowLocation;
		gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation >= 0)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 0)
			{
				gridLocation -= 7;
			}
			else
			{
				gridLocation--;
			}
			boxLocation--;
		}
		for (int i=0; i<box.length; i++)
		{
			if (grid[box[i]] == trialValue)
			{
				return false;
			}
		}
		
		return true;
	}

The method takes three arguments, the sudoku grid (int[] grid), the array position of the cell it is testing (int candidateCell) and the value it wants to test (int trialValue). It first checks if trialValue already occurs in the row or column of candidateCell. I don't think these two blocks of code can be simplified much. What I am wondering about is the third block of code where it checks if trialValue already occurs in the 3 by 3 box that candidateCell belongs to.

I had to think long and hard about how I would approach the problem of acquiring the 9 cells that belong to the box candidateCell is part of. I solved it by taking an integer array of size 9 (representing a 3 by 3 box), calculating the position of candidateCell in that block and putting candidateCell at that position (the value of candidateCell, not the value of the cell at position candidateCell in the sudoku grid!). Then I calculate the positions of the rest of the cells in the box by going back and forth from the position of candidateCell (if the position of candidateCell in the box is 5, I calculate 6, 7 and 8 first, and then I go back from 5 to 4, 3, 2, 1 and 0). After that I replace every position in the box by it's real sudoku value and check if trialValue matches any of those values. If so, the move is illegal.

My question is if there's a better solution to find and check a box a cell belongs to (if the sudoku grid is stored into a integer array as described in the first paragraph)? My code seems really long and obfuscated in relation to the other two blocks of code (which the row and columns).

edit: it's in Java btw, and my 1500th post!

Go with the simple solution:
Lookup tables.

Create an array of arrays and have each block represented as indexes to the original array in it.
I.e.:


0 => 0, 1, 2, 9, 10, 11, 18, 19, 20;
1 => 3, 4, 5, 12, 13, 14, 21, 22, 23;
...


blockIndex = ((currentIndex % 9) / 3) + (3 * (currentIndex / 27))

Hmm, that's a lot simpler than what I was doing ^^, especially that one formula (it's correct). I'll try implementing this.

edit: was able to reduce my giant block of box calculating code to this:
+ Show Spoiler +

//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
int[][] blocks = new int[9][9];
blocks[0] = new int[] { 0,  1,  2,  9, 10, 11, 18, 19, 20};
blocks[1] = new int[] { 3,  4,  5, 12, 13, 14, 21, 22, 23};
blocks[2] = new int[] { 6,  7,  8, 15, 16, 17, 24, 25, 26};
blocks[3] = new int[] {27, 28, 29, 36, 37, 38, 45, 46, 47};
blocks[4] = new int[] {30, 31, 32, 39, 40, 41, 48, 49, 50};
blocks[5] = new int[] {33, 34, 35, 42, 43, 44, 51, 52, 53};
blocks[6] = new int[] {54, 55, 56, 63, 64, 65, 72, 73, 74};
blocks[7] = new int[] {57, 58, 59, 66, 67, 68, 75, 76, 77};
blocks[8] = new int[] {60, 61, 62, 69, 70, 71, 78, 79, 80};
int blockIndex = ((candidateCell % 9) / 3) + (3 * (candidateCell / 27));
for (int i=0; i<blocks[blockIndex].length; i++)
{
	if (grid[blocks[blockIndex][i]] == trialValue)
	{
		return false;
	}
}

and it works! Thanks. Now I'm just trying to figure out how the blockIndex formula works.

b3n3tt3

595 Posts

October 26 2012 15:04 GMT

#3628

or you could loop from 1 to 80 and put them each into the 2dimensional array

Deleted User 101379

4849 Posts

October 26 2012 15:24 GMT

#3629

On October 26 2012 23:54 Thorakh wrote:

Show nested quote +

On October 26 2012 23:15 Morfildur wrote:

private static boolean isLegal(int[] grid, int candidateCell, int trialValue)
	{
		//Check the row of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int rowLocation = candidateCell;
		while ((rowLocation+1) % 9 != 0)
		{
			rowLocation++;
		}
		for (int i=rowLocation-8; i<=rowLocation; i++)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the column of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int columnLocation = candidateCell;
		while (columnLocation - 9 >= 0)
		{
			columnLocation -= 9;
		}
		for (int i=columnLocation; i<81; i+=9)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int boxRowLocation = candidateCell % 3;
		int boxColumnLocation = (candidateCell / 9) % 3;
		int[] box = new int[9];
		box[boxColumnLocation*3 + boxRowLocation] = candidateCell;		
		int boxLocation = boxColumnLocation*3 + boxRowLocation;
		int gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation < 9)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 2)
			{
				gridLocation += 7;
			}
			else
			{
				gridLocation++;
			}
			boxLocation++;
		}		
		boxLocation = boxColumnLocation*3 + boxRowLocation;
		gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation >= 0)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 0)
			{
				gridLocation -= 7;
			}
			else
			{
				gridLocation--;
			}
			boxLocation--;
		}
		for (int i=0; i<box.length; i++)
		{
			if (grid[box[i]] == trialValue)
			{
				return false;
			}
		}
		
		return true;
	}

The method takes three arguments, the sudoku grid (int[] grid), the array position of the cell it is testing (int candidateCell) and the value it wants to test (int trialValue). It first checks if trialValue already occurs in the row or column of candidateCell. I don't think these two blocks of code can be simplified much. What I am wondering about is the third block of code where it checks if trialValue already occurs in the 3 by 3 box that candidateCell belongs to.

I had to think long and hard about how I would approach the problem of acquiring the 9 cells that belong to the box candidateCell is part of. I solved it by taking an integer array of size 9 (representing a 3 by 3 box), calculating the position of candidateCell in that block and putting candidateCell at that position (the value of candidateCell, not the value of the cell at position candidateCell in the sudoku grid!). Then I calculate the positions of the rest of the cells in the box by going back and forth from the position of candidateCell (if the position of candidateCell in the box is 5, I calculate 6, 7 and 8 first, and then I go back from 5 to 4, 3, 2, 1 and 0). After that I replace every position in the box by it's real sudoku value and check if trialValue matches any of those values. If so, the move is illegal.

My question is if there's a better solution to find and check a box a cell belongs to (if the sudoku grid is stored into a integer array as described in the first paragraph)? My code seems really long and obfuscated in relation to the other two blocks of code (which the row and columns).

edit: it's in Java btw, and my 1500th post!

Go with the simple solution:
Lookup tables.

Create an array of arrays and have each block represented as indexes to the original array in it.
I.e.:


0 => 0, 1, 2, 9, 10, 11, 18, 19, 20;
1 => 3, 4, 5, 12, 13, 14, 21, 22, 23;
...


blockIndex = ((currentIndex % 9) / 3) + (3 * (currentIndex / 27))

//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
int[][] blocks = new int[9][9];
blocks[0] = new int[] { 0,  1,  2,  9, 10, 11, 18, 19, 20};
blocks[1] = new int[] { 3,  4,  5, 12, 13, 14, 21, 22, 23};
blocks[2] = new int[] { 6,  7,  8, 15, 16, 17, 24, 25, 26};
blocks[3] = new int[] {27, 28, 29, 36, 37, 38, 45, 46, 47};
blocks[4] = new int[] {30, 31, 32, 39, 40, 41, 48, 49, 50};
blocks[5] = new int[] {33, 34, 35, 42, 43, 44, 51, 52, 53};
blocks[6] = new int[] {54, 55, 56, 63, 64, 65, 72, 73, 74};
blocks[7] = new int[] {57, 58, 59, 66, 67, 68, 75, 76, 77};
blocks[8] = new int[] {60, 61, 62, 69, 70, 71, 78, 79, 80};
int blockIndex = ((candidateCell % 9) / 3) + (3 * (candidateCell / 27));
for (int i=0; i<blocks[blockIndex].length; i++)
{
	if (grid[blocks[blockIndex][i]] == trialValue)
	{
		return false;
	}
}

and it works! Thanks. Now I'm just trying to figure out how the blockIndex formula works.

The formula is actually quite simple if you split it into the 3 steps it combines:
1. get the column:
column = (currentIndex % 9) / 3

In this step we don't care about which row it's in, so we can modulo 9 it which basically reduces it to the position in the uppermost row, i.e. to a value between 0 and 8. Dividing it by 3 we get the column of the block
0 - 2 = 0
3 - 5 = 1
6 - 8 = 2

2. get the row
row = currentIndex / 27

Every row of blocks consists of 3 * 3 * 3 = 27 numbers, so simply dividing the index by 27 gives us the current row.
0 - 26 = 0
27 - 53 = 1
54 - 80 = 2

3. the block index
We now have two numbers between 0 and 2 that we need to combine. Since each row consists of 3 columns we can multiply the row by 3 for the number of columns above the current row.
row 0: 0
row 1: 3
row 2: 6
Add that to the column from step 1 and we get the blockIndex.

Fyodor

Canada971 Posts

October 26 2012 16:27 GMT

#3630

On October 26 2012 23:13 Hairy wrote:
Woops..... just fixed a really devious bug in C++ caused by passing a pointer into a function by value, instead of by reference. Long story short, the program was then messing around in memory in places it should not have been...... weirdness occurred.

Oh how much difference a '&' makes.

Yeah I just got my book on C++ and I'm making damn sure I read the part about Pointers and References like 20 times because of the warnings in this thread.

http://www.amazon.com/Primer-5th-Edition-Stanley-Lippman/dp/0321714113

this is it. Can't say how "Correct" it is by virtue of being a noob but pedagogically I love it. They don't spend too much time explaining why some things are important but at 1000 pages I'm not sure I can blame them.

BTW, what CAN you do with a pointer or a reference lol? If I understand correctly I've been exclusively passing data by value if I've been using Java. What's the point of references and pointers, then, if Java works fine without them?

Deleted User 101379

4849 Posts

October 26 2012 16:42 GMT

#3631

On October 27 2012 01:27 Fyodor wrote:

Show nested quote +

Pointers and references are very similar in nature with very small differences. Most modern languages removed pointers completely, so they just use references but due to the language design, they don't need pointers for things where pointers were used over references in C++. Java (and C#) generally uses pass-by-reference, though i think some native stuff (integers, chars) are pass-by-value, though i don't remember exactly. The main reason for that is performance since you don't have to copy the potentially huge object but instead just give the reference to it.

With a reference, you basically change the original object while with pass-by-value you only work on a copy, i.e. the original object stays untouched and any changes inside of that method are not visible outside of it.

I tried to formulate the difference between pass-by-pointer and pass-by-reference but scrapped that part of the post because it's very much behind the scenes stuff that doesn't really influence how it's used. The biggest difference is probably that pointers can be NULL, references can't. There are some small differences but more often than not you don't actually have to worry about that. In general, always try to use references where you can and avoid pointers, it makes your life easier.

Hairy

United Kingdom1169 Posts

October 26 2012 17:20 GMT

#3632

It's all just different tools to get the job done. I don't know Java, but in C++ you can pass things to functions in two ways:

Pass by value
The function gets a local COPY of the variable, and any changes you make to that variable stay within the function. If you want to get data "out" of the function again, you need to return something. In C++ everything (I think) is passed by value by default.
Pass by reference
A variable passed by reference is basically given usage of the original variable. The function doesn't get a copy of the variable, it gets "THE" variable. If you make changes to the value of that variable, you the original variable is altered. To pass by reference, you use the '&', eg
```
void MyFunction(int& myVariable)
```

Pointers are often a subject that causes confusion - they are variables that store a MEMORY LOCATION of a value, not the value itself. For example, if you have an int x, it's value might be 10. If you assign a pointer to 'x', the value of that pointer would be the MEMORY LOCATION of where x's value is stored. If you "de-reference" the pointer, you can gain access to that value.

The best analogy I can think of is to compare it with a library. The title of each book would be a variable name, and the contents of each book would be it's value. A pointer's "value" would be the location in the library that you could go find the book's contents.

The bug I mentioned earlier was because I was incorrectly passing a pointer into my function by value, instead of by reference, and my function was altering the value (aka the memory location) of my pointer. This meant that, outside my function, my pointer hadn't been updated, and still held its original value (a memory location)! So I was then messing around with bits of memory I shouldn't have been touching, and that's where things can get mysterious and messy....

If you thought that was too easy, consider that pointers can also point at other pointers. Then it all gets a bit "inception-esque"....

Thorakh

Netherlands1788 Posts

October 26 2012 18:57 GMT

#3633

On October 27 2012 00:24 Morfildur wrote:

Show nested quote +

On October 26 2012 23:54 Thorakh wrote:

On October 26 2012 23:15 Morfildur wrote:

private static boolean isLegal(int[] grid, int candidateCell, int trialValue)
	{
		//Check the row of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int rowLocation = candidateCell;
		while ((rowLocation+1) % 9 != 0)
		{
			rowLocation++;
		}
		for (int i=rowLocation-8; i<=rowLocation; i++)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the column of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int columnLocation = candidateCell;
		while (columnLocation - 9 >= 0)
		{
			columnLocation -= 9;
		}
		for (int i=columnLocation; i<81; i+=9)
		{
			if (grid[i] == trialValue)
			{
				return false;
			}
		}
		
		//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
		int boxRowLocation = candidateCell % 3;
		int boxColumnLocation = (candidateCell / 9) % 3;
		int[] box = new int[9];
		box[boxColumnLocation*3 + boxRowLocation] = candidateCell;		
		int boxLocation = boxColumnLocation*3 + boxRowLocation;
		int gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation < 9)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 2)
			{
				gridLocation += 7;
			}
			else
			{
				gridLocation++;
			}
			boxLocation++;
		}		
		boxLocation = boxColumnLocation*3 + boxRowLocation;
		gridLocation = boxColumnLocation*3 + boxRowLocation;
		while (boxLocation >= 0)
		{
			box[boxLocation] = candidateCell - ((boxColumnLocation*3 + boxRowLocation) - gridLocation);
			if (boxLocation % 3 == 0)
			{
				gridLocation -= 7;
			}
			else
			{
				gridLocation--;
			}
			boxLocation--;
		}
		for (int i=0; i<box.length; i++)
		{
			if (grid[box[i]] == trialValue)
			{
				return false;
			}
		}
		
		return true;
	}

The method takes three arguments, the sudoku grid (int[] grid), the array position of the cell it is testing (int candidateCell) and the value it wants to test (int trialValue). It first checks if trialValue already occurs in the row or column of candidateCell. I don't think these two blocks of code can be simplified much. What I am wondering about is the third block of code where it checks if trialValue already occurs in the 3 by 3 box that candidateCell belongs to.

I had to think long and hard about how I would approach the problem of acquiring the 9 cells that belong to the box candidateCell is part of. I solved it by taking an integer array of size 9 (representing a 3 by 3 box), calculating the position of candidateCell in that block and putting candidateCell at that position (the value of candidateCell, not the value of the cell at position candidateCell in the sudoku grid!). Then I calculate the positions of the rest of the cells in the box by going back and forth from the position of candidateCell (if the position of candidateCell in the box is 5, I calculate 6, 7 and 8 first, and then I go back from 5 to 4, 3, 2, 1 and 0). After that I replace every position in the box by it's real sudoku value and check if trialValue matches any of those values. If so, the move is illegal.

My question is if there's a better solution to find and check a box a cell belongs to (if the sudoku grid is stored into a integer array as described in the first paragraph)? My code seems really long and obfuscated in relation to the other two blocks of code (which the row and columns).

edit: it's in Java btw, and my 1500th post!

Go with the simple solution:
Lookup tables.

Create an array of arrays and have each block represented as indexes to the original array in it.
I.e.:


0 => 0, 1, 2, 9, 10, 11, 18, 19, 20;
1 => 3, 4, 5, 12, 13, 14, 21, 22, 23;
...


blockIndex = ((currentIndex % 9) / 3) + (3 * (currentIndex / 27))

//Check the box of candidateCell for duplicate numbers. If a duplicate is found, the solution is illegal.
int[][] blocks = new int[9][9];
blocks[0] = new int[] { 0,  1,  2,  9, 10, 11, 18, 19, 20};
blocks[1] = new int[] { 3,  4,  5, 12, 13, 14, 21, 22, 23};
blocks[2] = new int[] { 6,  7,  8, 15, 16, 17, 24, 25, 26};
blocks[3] = new int[] {27, 28, 29, 36, 37, 38, 45, 46, 47};
blocks[4] = new int[] {30, 31, 32, 39, 40, 41, 48, 49, 50};
blocks[5] = new int[] {33, 34, 35, 42, 43, 44, 51, 52, 53};
blocks[6] = new int[] {54, 55, 56, 63, 64, 65, 72, 73, 74};
blocks[7] = new int[] {57, 58, 59, 66, 67, 68, 75, 76, 77};
blocks[8] = new int[] {60, 61, 62, 69, 70, 71, 78, 79, 80};
int blockIndex = ((candidateCell % 9) / 3) + (3 * (candidateCell / 27));
for (int i=0; i<blocks[blockIndex].length; i++)
{
	if (grid[blocks[blockIndex][i]] == trialValue)
	{
		return false;
	}
}

and it works! Thanks. Now I'm just trying to figure out how the blockIndex formula works.

Ah yes, thank you, that makes sense.

And now on to my next question, why is using layout managers in Java such a pain in the ass. I simply cannot get my components to show up with the correct size. I mostly just avoid layouts and use setLayout(null), but that obviously is not a good solution. No matter what I do (setPreferredSize(), setMaximumSize(), setMinimumSize(), setSize(), setLocation(), setBounds(), etc.) Java just says FUCK YOU and resizes the components as it sees fit. I'm getting mighty tired of wrestling with Swing to get everything to show up in the right place. Am I simply doing something wrong or is it really a pain in the ass.

Trying to get a JTable of a specific size? Not gonna happen, it resizes as it sees fit and won't even show up with setLayout(null). Changing to a GridLayout with JTextFields as components? Good luck specifying the sizes of the text fields, the layout manager doesn't pay attention to it.

I must be doing something wrong...

phar

United States1080 Posts

October 27 2012 01:37 GMT

#3634

On October 26 2012 22:12 Craton wrote:
Our processing logic has many steps and we have a business need to hold onto the data at different stages in its lifecycle.

We have a very large Oracle-based tool that has been built that handles all of this.

You're hung up on the data processing and not the outputting of a table to a file. These are different matters.

Ok, but it feels like optimizing the really short part of the problem. How much time does it take to output vs. time to input & process around oracle?

Getting data out of oracle is going to be an I/O bound problem any way you cut it. The possibility for speedup there is just not all that interesting.

That said: What raw output speed can you get from Orcale? (If you try to read & drop records as fast as possible, how fast is it?) If it's fast enough, can you read from oracle from multiple boxes at the same time? Have you tried just splitting up the work set into multiple machines (2, 5, 300?) and having them merge their results together? (Skip the stored procedure thing altogether).

Fyodor

Canada971 Posts

October 27 2012 02:36 GMT

#3635

Thanks Morfildur and Hairy. Gives me a lot to think about.

Craton

United States17247 Posts

October 27 2012 07:18 GMT

#3636

On October 27 2012 10:37 phar wrote:

Show nested quote +

I wouldn't even rate this as Alpha currently, so a lot of things are run step by step to bring things in and then modify it. The actual processing time for reading things in and then modifying them is pretty low (much lower than the time to output).

Oracle is housed on a secure server and we simply call whatever output procedure to do its thing (it dumps it to the filesystem of said secure server). Data CANNOT leave secure servers. Even moving from one server to the next requires a secure, semi-automated queuing system to move any files.

Because of the complexity of the data and what we do with it, as well as how much is already designed around Oracle, it makes no sense to try and redo everything in an environment outside of Oracle.

If you want to answer my question, then do so. Stop fighting me on existing business processes.

All I wanted to know is if I could match or better the output speed of what I have now versus a Java procedure that accomplished the same task more cleanly. Trying to point at everything EXCEPT what I asked is extremely aggravating.

supereddie

Netherlands151 Posts

October 27 2012 09:04 GMT

#3637

You should be able to figure out what is the bottleneck in your code. If there is no clear bottleneck and the CPU is not running at 100% capacity, it might be possible to use more than one thread to do the processing (maybe one thread per line, max of 10 threads). But perhaps this is not allowed when using Java stored procedures.

ArdentZeal

Germany155 Posts

October 27 2012 17:06 GMT

#3638

Hello there,

i could need some help in c++.
I have several classes (State1, State2, State3, ...) derived from the base class State.
They all have the same method, which is used sligthly differently depending on the State i am in.

So I wrote the method Automat::setCurrentState(State* _nextState); to set the States accordingly.

But if i try to call this method with one of the derived classes, i get a compiler error. Do i really have to overwrite Automat::setCurrentState(State* _nextState); with each of the 20 States or is there a simple solution i am missing?

Hairy

United Kingdom1169 Posts

October 27 2012 17:16 GMT

#3639

Something is missing, or something isn't quite right. I can't really say more without seeing the code.

Thorakh

Netherlands1788 Posts

October 27 2012 18:20 GMT

#3640

+ Show Spoiler +

private static boolean containsDuplicates(int[] array)
{
	for (int i=0; i<array.length; i++)
	{
		if (array[i] != 0)
		{
			for (int j=i+1; j<array.length; j++)
			{
				if (array[i] == array[j])
				{
					return true;
				}
			}
		}
	}
	
	return false;
}

What on earth would cause this code to hang? I'm trying to check an array for duplicate integers, ignoring zeroes. Sometimes it hangs, sometimes it doesn't O_O

Prev 1 180 181 182 183 184 1031 Next

Please or register to reply.

The Big Programming Thread - Page 182

Completed

Ongoing

Upcoming