
Can a bot search a site set up with HTACCESS username/password?

I put together a site for my daughter's class so people can have access to all the usual stuff that gets misplaced (yearly calendar, reminders, blah blah blah).

Setting up HTACCESS with username/password was easy enough through my hosting company. I simply flipped the on switch and defined which root folder (and thereby all subfolders) to secure.

Feedback has been positive. But a few folks are concerned that their personal information (e.g. phone number & email) will be found by search engine bots.

Can search engine bots crawl and serve up folders & pages locked down by htaccess? If so, is there a way to create the .htaccess file to restrict bot access? Does anyone have any better way to do this with something other than .htaccess?
Permalink Curious George 
January 6th, 2006
Bots would have to log into the page the same as a person. So if your passwords are secure, then no.
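
To make that concrete, here is a minimal Python sketch, not the poster's actual setup. A small local server stands in for the .htaccess-protected folder (on the real site Apache does this check), and the username, password, realm and function names are all made up for illustration. A request without credentials, which is what a crawler would send, gets a 401; the same request with the right Authorization header gets the page.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
EXPECTED = "Basic " + base64.b64encode(b"parent:s3cret").decode("ascii")


class ProtectedHandler(BaseHTTPRequestHandler):
    """Stand-in for a password-protected folder."""

    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            # No (or wrong) credentials: challenge the client.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Class site"')
            self.end_headers()
            return
        body = b"class calendar"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(url, authorization=None):
    """Return the HTTP status code for a GET, with or without credentials."""
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    # Ignore any proxy settings so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def demo():
    """Start the stand-in server and fetch once as a bot, once as a user."""
    server = HTTPServer(("127.0.0.1", 0), ProtectedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return fetch_status(url), fetch_status(url, EXPECTED)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the two status codes as a tuple: the first for the anonymous request, the second for the authenticated one.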
Permalink Mark Warner 
January 6th, 2006
If there's an .htaccess file denying access to anyone without the username and password, then there should be no way for anyone, including a search engine to get to it.

However, it may be possible to encode the username and password in the URL. (http://username.password@domain.com or something... I forget where the @ symbol goes exactly).

I don't know if .htaccess permission works that way, but if it does, it could be possible for a simple link to get you in to the site, in which case the information could be made available to a search engine.

That's the only thing I can think of.
Permalink MarkTAW 
January 6th, 2006
External search engines have no special access, though of course if you or your provider run machine search utilities building a corpus, then you could have a problem.

You can thwart the major search engines with a simple robots.txt in the root, though some nefarious ones ignore it.
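
As a rough sketch of how a well-behaved crawler reads that file, Python's standard `urllib.robotparser` can evaluate a rule set. The file contents, the `is_allowed` helper and the example.com URL below are assumptions for illustration; a bot that ignores the file is not stopped by any of this.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/calendar.html"))
```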
Permalink Dennis Forbes 
January 6th, 2006
It is possible to encode the user name and pass into a URL, but you'd have to actually DO that in order to make it vulnerable.
Permalink Mark Warner 
January 6th, 2006
Oh, yeah absolutely. But with a corpus of users, who knows what they might do.

See the referrer thread for examples.
Permalink MarkTAW 
January 6th, 2006
I did not intentionally encode the username and password in the URL.

I have not run machine search utilities building a corpus, and I don't think my provider does, though I'll have to double check.

Therefore, I'd imagine that unless I purposely did either of the above, there's no way a bot - given the feedback so far - could possibly crawl and serve any of that info.

Right?
Permalink Curious George 
January 6th, 2006
As long as none of your users link the page from somewhere, encoding their username and password into the URL.
Permalink Mark Warner 
January 6th, 2006
Does HTTP support username/password combos in the request line? While I've seen that for FTP, I've never seen it with http and basic authentication.
Permalink Dennis Forbes 
January 6th, 2006
I just tried sending the username and password to a protected site I have via the query string and it didn't work.
Permalink Phil 
January 6th, 2006
I've seen it for HTTP, but that was such a big security issue in the past that I think newer versions of IE no longer support it. That's because you can actually put http://www.citibank.com:authentication@user.com/etc.htm and it looks so convincing that tons of people were falling for it. For all I know, Apache may not even support it anymore.

Oh, and it's http://username:password@domain.com - with a colon, I just googled it.
Permalink MarkTAW 
January 6th, 2006
Whether or not that works would be up to the hosting company, and if they know jack about security they would not have it turned on by default.

So, I'd say you are perfectly safe.
Bots can't do anything a human user can't do.
Permalink Eric Debois 
January 6th, 2006
Isn't the user:password@domain a browser feature and not part of HTTP? I just watched with Firefox and Live HTTP Headers and see an Authorization: header sent after getting access denied, but don't see the user/password anywhere else. Trying it through telnet got a 400 reply instantly. So unless it's a smart bot, it's probably not able to handle those URLs.
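
A sketch of what the client side presumably does with such a URL, in Python: the credentials are stripped out of the address and turned into a Basic `Authorization` header, so the server never sees them in the request line. The `split_credentials` helper is a hypothetical name, and this only illustrates the idea, not any particular browser's code.

```python
import base64
from urllib.parse import urlsplit


def split_credentials(url):
    """Split user:password@ out of a URL.

    Returns (url_without_credentials, authorization_header_or_None).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Rebuild the URL with only host[:port] in the network location.
    clean_url = parts._replace(netloc=host).geturl()
    if parts.username is None:
        return clean_url, None
    token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
    header = "Basic " + base64.b64encode(token).decode("ascii")
    return clean_url, header


if __name__ == "__main__":
    print(split_credentials("http://username:password@domain.com/index.html"))
```

A bot that does not perform this step just sends the raw string, which fits the 400 reply seen over telnet.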
Permalink j4b 
January 6th, 2006
Dunno. Like I said, I know it was disabled in IE a while back.

There are probably bots out there that *look* for URLs like this.. it is the internet, you know.
Permalink MarkTAW 
January 6th, 2006
I don't have SSL enabled. Does that mean anyone with a little know-how can easily get the username/password when someone logs in?
Permalink Curious George 
January 6th, 2006
They'd have to somehow get in between the computer with the user and the computer being logged in to.

I'd say you're safe on this one.
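
For a sense of what someone in that in-between position would see without SSL: Basic authentication only base64-encodes the username and password, it does not encrypt them. A minimal Python sketch, with a hypothetical `decode_basic_auth` helper and made-up credentials:

```python
import base64


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the two; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Made-up credentials, encoded the way a browser would send them.
    captured = "Basic " + base64.b64encode(b"parent:s3cret").decode("ascii")
    print(decode_basic_auth(captured))
```

So the protection against eavesdropping comes down to how hard it is to sit on the network path, not to anything in the header itself.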
Permalink MarkTAW 
January 6th, 2006

This topic was originally posted to the off-topic forum of the
Joel on Software discussion board.
