
So far so good (Linux file creation experiment)

I ported the code to Linux and it's here beside me creating 15,000 files/second. 19 million files as we speak, only half of the RAM in use, 15 GB.

Windoze already gave up the ghost at this stage, with identical code, proving it is either the controller's Windows driver (the Linux driver being correct) or the Windows kernel. Since poolmon showed no driver leaks, I conclude that the Windows kernel indeed has a catastrophic bug.

(20 million)

My question to linuxheads:

How can I see the number of threads a process has spawned? (They are not visible in System Monitor in Gnome, and they can't be made to show in the settings either.)

Thank you and best regards,

A fellow Linuxhead
Permalink Dr. Horrorwitz 
March 5th, 2012 3:03pm
21 million, no leaks whatsoever..
Permalink Dr. Horrorwitz 
March 5th, 2012 3:03pm
49 million. Disks sound like it's raining..
Permalink Dr. Horrorwitz 
March 5th, 2012 3:42pm
Good for you! Best of luck indexing the internet on your PC and becoming a billionaire!
Permalink MS 
March 5th, 2012 3:58pm
Lol I remember folks like you.

When I said I was gonna make a Go pattern program, I got derisive comments. I made 100 grand, even when I was sick as a dog and my software was boycotted everywhere.

Then I made a shop with a turnover of a million bucks a year, ignoring comments on how everything is much harder than it seems. We have three employees now and we're pushing the competition out of the market.

What I am developing now is going to make me independently wealthy, and I will have to stop posting here, because I do not want to keep updating folks here until the product is out and you guys can start sabotaging the market. Not that that would work - it's B2B, targeted at the most cunning marketeers out there.

I've learnt my lesson.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:08pm
From what I know, you would use top to look at the threads of an application and what their status is. You'd probably have to supply some credentials...
Permalink xampl9
March 5th, 2012 4:09pm
Oh it's not a "PC".
I have one computer right now, with one controller.
The controller handles 24 3-TB disks, and 72 TB is much more than enough to index what I need.

I did my homework, see. I index only what I can sell.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:10pm
Thank you xampl, very useful.

It was exactly 20 years ago that I did C/Unix and my memory is rusty (not sure top even existed under Interactive Unix back then).
Permalink Dr. Horrorwitz 
March 5th, 2012 4:12pm
Actually I see no way of showing how many threads a process owns.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:14pm
Hey xampl - how do I get back to my Gnome desktop after pressing Ctrl + Alt + F1?
Permalink Dr. Horrorwitz 
March 5th, 2012 4:16pm
Never mind - Ctrl + Alt + F7..
Permalink Dr. Horrorwitz 
March 5th, 2012 4:18pm
Wow, Linux is cool..
Permalink Dr. Horrorwitz 
March 5th, 2012 4:18pm
72 million files.
Filesystem struggling a little. CPU values down from 100% to "heart attack" graphs.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:19pm
>> it's B2B, targeted to the most cunning marketeers out there.

And the linguists among them, no doubt. Arr, arr, arr.

I figured it's some kind of deep web search or keyword sifting or SEO thingamabob.

I find it rather extraordinary that the memory leaks you've run into in Windows are not well known. I'm sure anyone with some gumption could prove or disprove your results by writing a program to test the theory. I also don't really doubt what you're saying. It's just odd that a deep flaw like this is not generally known. And it kind of removes Windows from consideration for any kind of lengthy sustained batch operation (like internet search engine DB work).
Permalink Bored Bystander 
March 5th, 2012 4:19pm
And, MS, I have enough money to fill a rack with high-end servers and colocate it where I get 10 Gbps.

But there is no need for it. I have all I need right here. Server has enough oomph, and 200 Mbps is more than enough at the moment.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:21pm
Hi BB,

I found a couple of people (two) with the identical problem, and, like me, they tried everything to solve it, including extensive tuning of the registry to the point of bluescreens.

I can now unequivocally say that it's a Windows kernel bug and that MS is not interested in fixing it, because one of those guys had a lengthy session with them and they finally forwarded him to some MS forum, where everyone was stumped.

It's only 100 bytes leaked per GetFileAttributes (or CreateFile/CloseHandle) call, but that adds up unacceptably fast when you're doing crazy stuff like creating half a dozen files for every website out there.

I agree that Windows is not a professional OS, meaning it's only usable for games and office use, as I now know.

No way you can run any type of web indexing operation from a Windows OS. But I thank the good lord (I'm actually tipsy right now) that there is a way out. I invested a lot of time and money in this: 40 ccTLDs, worldwide trademarking (EU & US), very expensive hardware, plus a full year of coding. The crawler and all the rest are basically ready and debugged; I just need the higher layers now.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:26pm
>I find it rather extraordinary that the memory leaks
>you've run into in Windows are not well known.

I've heard it said before that it's a Bad Idea to create millions of files on Windows. I think they assume that you oughta be buying a SQL Server license at that stage.

I also heard about the "special" names that you can't give files - LPT, COM, etc.
Permalink Colm 
March 5th, 2012 4:27pm
You can easily write a program to reproduce it, but you must create DIFFERENT files.

Create a directory hierarchy with random names and random filenames (no more than 1000 files per folder or so) and watch the OS crash and burn.
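For what it's worth, a bash sketch of that kind of repro (paths and counts are made up for illustration; the original test code was presumably C):

#!/bin/bash
# Create a hierarchy of randomly named directories with ~1000 small,
# uniquely named files each, to force lots of distinct file creations.
TARGET=/mnt/test                      # assumed target mount point
for d in $(seq 1 1000); do
  dir="$TARGET/dir-$RANDOM$RANDOM"
  mkdir -p "$dir"
  for f in $(seq 1 1000); do
    echo "0123456789abcdef" > "$dir/file-$RANDOM-$f"
  done
done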
Permalink Dr. Horrorwitz 
March 5th, 2012 4:28pm
77 million files and counting :-)
Permalink Dr. Horrorwitz 
March 5th, 2012 4:28pm
Open Source for the win!
Permalink Dr. Horrorwitz 
March 5th, 2012 4:29pm
FUCK database. I don't need no stinkin' database!
Permalink Dr. Horrorwitz 
March 5th, 2012 4:29pm
My gf said I should send Reiser a postcard.
No idea whether they'll let him have it.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:30pm
She spent hours reading his story.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:30pm
read the top man page carefully ( "man top" ).

there are gazillions of options, and I think one of them shows the threads/processes in tree form.

Or maybe "ps" is the command you need. It doesn't output continually. You can easily write a shell/perl script to run it each second thouigh and send it into /tmp/mylog.txt

Then do "tail -f /tmp/mylog.txt" from another terminal to follow what is happening.

have fun!
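
Something along those lines, perhaps (process name and log path are just placeholders):

# log the thread list once a second
while true; do
  ps -L -p "$(pidof mycrawler)" >> /tmp/mylog.txt   # -L shows one line per thread (LWP)
  sleep 1
done

# and from another terminal:
tail -f /tmp/mylog.txt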
Permalink eek 
March 5th, 2012 4:35pm
See, Magny's struggling now (80 million files):

http://www.moyogo.com/Screenshot-System%20Monitor.png

Used to be all at 100%. 500 threads that all try to write files to 16 HDDs.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:35pm
ps is what you want. You can select a single process by its PID, get a tree output ... all kinds of stuff.

http://unixhelp.ed.ac.uk/CGI/man-cgi?ps
Permalink eek 
March 5th, 2012 4:36pm
Easy on the perl scripts eek - I'm an old man, an old WINDOWS man..
Permalink Dr. Horrorwitz 
March 5th, 2012 4:36pm
OK, ps, here I come :-)
Permalink Dr. Horrorwitz 
March 5th, 2012 4:37pm
do it in bash then ... purist ...!
Permalink eek 
March 5th, 2012 4:37pm
I guess it's just the kernel waiting for Mr. Reiser to get his shit together, with his Big Kernel Locks and all.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:42pm
Isn't perl some kind of demented BASIC?
Permalink Dr. Horrorwitz 
March 5th, 2012 4:43pm
I had a similar BASIC on the Acorn Atom. PRINT was abbreviated to ? etc.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:44pm
yup ... very demented ...

I used to have a bunch of logging scripts that did all this stuff where I last worked, but I don't think I have them anymore. You'll have to roll your own ...
Permalink eek 
March 5th, 2012 4:45pm
Show threads with "top -H"

Once in the monitoring loop, you can sort by any field by pressing capital O and selecting from one of the options shown.

Typing "Oo" (uppercase O, lowercase o) will sort by the image name.

Example output below shows that I have python open, and kicked off two new threads with thread.start_new_thread(lambda:sleep(200),()):

Tasks: 107 total,  1 running, 106 sleeping,  0 stopped,  0 zombie
Cpu(s):  4.7%us,  0.3%sy,  0.0%ni, 93.4%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Mem:    775572k total,    70504k used,  705068k free,    5772k buffers
Swap:  262120k total,        0k used,  262120k free,    37424k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
2667 foo    15  0 22672 3588 1964 S  0.0  0.5  0:00.58 python           
2671 foo    16  0 22672 3588 1964 S  0.0  0.5  0:00.00 python           
2672 foo    16  0 22672 3588 1964 S  0.0  0.5  0:00.00 python           
2608 foo    15  0 20368  11m 7500 S  5.6  1.5  0:04.59 xfce4-terminal   
2315 root      16  0  7736 2220 1476 S  0.0  0.3  0:00.07 console-kit-dae
Permalink CircusAttraction 
March 5th, 2012 4:48pm
Thanks, but that doesn't quite do what I need yet.

It shows my threads but I spawned 500 and I need to know how many terminated or crashed.

So I need to see just one single number: how many active threads a process owns. I'll figure it out.

Or to be able to page through a top listing.
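
(For what it's worth, that single number is available directly on Linux; the process name below is just a placeholder:)

grep Threads /proc/"$(pidof mycrawler)"/status    # e.g. "Threads:   500"
ps -o nlwp= -p "$(pidof mycrawler)"               # same count via ps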
Permalink Dr. Horrorwitz 
March 5th, 2012 4:54pm
I think I could pipe top's output to a file or something.
Permalink Dr. Horrorwitz 
March 5th, 2012 4:55pm
If you pipe either top or ps to a file, you can analyse it afterwards.
Permalink eek 
March 5th, 2012 4:57pm
Can't you build some lightweight cheap monitoring into the threads themselves?

Something no more expensive than doing some atomic operations on global counters in your process?
Permalink Attila 
March 5th, 2012 5:02pm
Yeah, actually I'm doing that already and they seem to be doing all right. Good idea, I'll make them occasionally update their status.

I piped top to a file and I see only 59 or so threads (of 500).

So that's weird. Perhaps the others are done already, because the files have been created. Anyway. Everything OK :-)
Permalink Dr. Horrorwitz 
March 5th, 2012 5:05pm
you could get them to log to a file (or files) when they are created and when they die ...

owner PID, thread ID, (start/stop), timestamp

(or whatever stuff you need)

or create a file per thread with whatever in it.
Permalink eek 
March 5th, 2012 5:14pm
It's hard to grasp the scale of such a database - a database for the entire Internet.

I have now created 100 million files, and each file contains only a 16-character string (a hash value).

They occupy 1 GB per disk, 16 GB worth of files.

Seems like a lot, until you realize it's only about 172 bytes per file..
Permalink Dr. Horrorwitz 
March 5th, 2012 5:15pm
One of the things I now need to learn is how to format disks (and unmount/mount them) with different file systems and block sizes, and run the experiment again and again until I have the best file system.
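
Presumably a cycle per candidate filesystem and block size, roughly like this (device and mount point are placeholders):

umount /mnt/test
mkfs.ext4 -b 4096 /dev/sdb1      # or mkfs.xfs, mkfs.reiserfs, ... with their own options
mount /dev/sdb1 /mnt/test
# rerun the file-creation test and record the numbers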
Permalink Dr. Horrorwitz 
March 5th, 2012 5:17pm
Good idea eek.
Permalink Dr. Horrorwitz 
March 5th, 2012 5:17pm
> gonna make a Go pattern program

Amazing.  The talent it must have taken to steal everyone else's collection of games. It's so hard to be a crook! Why don't I get any credit for being a criminal!
Permalink save the banksters 
March 6th, 2012 7:59am
Let your haters be your motivators!
Permalink MS 
March 6th, 2012 10:09am

This topic is archived. No further replies will be accepted.
