Forums >> General >> Newb question: How to regexp by yourself?

Newb question: How to regexp by yourself?



Elaborate
10 posts
Hi!
I've used regexps enough in other contexts that I should be able to do it here as well. All the same, I don't want to screw things up. So:

1: Is there a how-to guide somewhere I missed? Best practices, that sort of thing?

2: Is there a way to check that the regexp I made works before committing?

3: If I still screw up the regexp, can I edit it myself, or do I need to report it as 'bwoken'?

I was thinking of adding makingofneon.com. Looking at the source code and comparing with the other templates, I *think* the following would work:
<div id="comic">\\s*<img src="http:\\/\\/makingofneon.com\\/wp-content\\/uploads\\/\\d+\\/\\d+\\/(.*.png)"

Did I do it right? Well, I guess I assumed the comics would always be PNGs, and never a flash or GIF or something...
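One way to check a pattern like this before submitting it is to run it locally against a saved copy of the page. The sketch below is only that: it assumes the site collapses each doubled backslash into a single one before matching, and the two page fragments are made-up stand-ins, not copies of the real makingofneon.com markup. The helper name capture is hypothetical.

```python
import re

# The pattern exactly as posted, with the site's doubled backslashes.
posted = r'<div id="comic">\\s*<img src="http:\\/\\/makingofneon.com\\/wp-content\\/uploads\\/\\d+\\/\\d+\\/(.*.png)"'

# Assumption: the site turns each "\\" into a single "\" before matching.
pattern = posted.replace("\\\\", "\\")

# Hypothetical page fragments, not copied from the real site.
png_page = '<div id="comic">\n  <img src="http://makingofneon.com/wp-content/uploads/2011/05/page-12.png" alt="" />'
gif_page = png_page.replace("page-12.png", "page-12.gif")


def capture(regex, html):
    """Return the first capture group, or None if the pattern does not match."""
    match = re.search(regex, html)
    return match.group(1) if match else None


print(capture(pattern, png_page))  # the captured file name
print(capture(pattern, gif_page))  # None: the pattern only accepts PNGs
```

With these fragments the PNG page yields page-12.png and the GIF page yields no match, which confirms the PNG-only worry. Note also that the dot before png is unescaped, so it matches any character rather than a literal dot; in the doubled-backslash style used here it would presumably be written \\. instead.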
sirgarberto
49 posts
I know this doesn't answer your question, but since makingofneon has an RSS feed, you'd be better off setting the Update method to full and the Update url as http://makingofneon.com/feed/
corruption
693 posts
The staff are the ones who activate the comics for being checked for updates. If you can't figure out a regexp, we will.

What I do is find a page, normally the RSS feed, the new comic page or the archives, then find a bit of code that changes each time a new update is made and exists only there, and set that as the regexp.
In this case I checked out the RSS feed and noticed one thing I automatically check out: <lastBuildDate>
I use (.*?) to mean the bot looks to see if there is anything different there from the last time it checked.
This would make it <lastBuildDate>(.*?)</lastBuildDate>
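To see what that pattern captures, here is a small sketch run against two versions of a feed. The feed text and dates are invented for illustration, and the helper name last_output is hypothetical; it is only assumed that the site compares the captured text between checks.

```python
import re

PATTERN = r"<lastBuildDate>(.*?)</lastBuildDate>"

# Hypothetical feed snapshots taken on two different days.
feed_monday = """<rss><channel>
<title>Example feed</title>
<lastBuildDate>Mon, 02 May 2011 08:00:00 +0000</lastBuildDate>
</channel></rss>"""
feed_friday = feed_monday.replace("Mon, 02 May", "Fri, 06 May")


def last_output(feed_text):
    """Return the text between the lastBuildDate tags, or None if absent."""
    match = re.search(PATTERN, feed_text)
    return match.group(1) if match else None


old = last_output(feed_monday)
new = last_output(feed_friday)
print(old)
print(new)
print(old != new)  # a changed capture is what counts as an update
```

The non-greedy (.*?) stops at the first closing tag, so only the date itself is captured.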

EDIT: I just put it up and activated it for you.
Elaborate
10 posts
OK, thanks, and the "look for a feed first" tip will be useful for other newbies looking to contribute.
But that still doesn't really answer my main question, which is "How do I add a comic while causing the least work for anyone else?"

I use regexps near-daily, but I know there are small differences between systems and inputs; for one thing I noticed in the examples that the backslashes themselves need to be escaped, but quotation marks don't seem to need it.
So I'm resigned to the fact that the first time I use a new system, I will likely screw up. I just want to be able to discover my screw-up quickly and fix it before someone else has to clean up my mess.
And I haven't found any way to do this, searching the forums.

So what I want to know is basic things like:
*"Last output" is the current output of the regexp, right?
*"Update url" is the page that's actually searched through with the regexp, right?
*If I add a comic, can I edit its regexp afterwards if I got it wrong?
*Can I tell I got it right/wrong without waiting for the next update? Ideally before even adding the comic...
corruption
693 posts
Firstly, it is the mods and admins who activate the comic and do any editing. We can tell whether we got it right or wrong by trying to update the comic.
Normal members can suggest a comic, and even suggest a regexp, but in the end, we are the ones who have to do it.

Most sites' coding varies in different ways, so there is no one solution to everything.
Some people know the coding well, but I'm a one trick pony when it comes to this.
I just know that (.*?) is used to tell the bots where to look for changes.
For example <code>(.*?)</code> tells the bots to look for <code> and </code>.
If they are not there, at the exact web address, then the update fails.
If they are there, and what is between them is the same as before, then it is not updated.
If they are there, and what is in between them is different than before, then it is an update.
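
The three cases above can be sketched as a small function. This is only an illustration of the described behaviour, not the site's actual bot; the function name check_update and the status strings are made up.

```python
import re
from typing import Optional, Tuple


def check_update(pattern: str, page_text: str,
                 last_output: Optional[str]) -> Tuple[str, Optional[str]]:
    """Classify one check of a page as 'failed', 'unchanged' or 'updated'."""
    match = re.search(pattern, page_text)
    if match is None:
        # The surrounding tags were not found on the page at all.
        return "failed", last_output
    captured = match.group(1)
    if captured == last_output:
        # Tags found, but the text between them is the same as before.
        return "unchanged", last_output
    # Tags found and the text between them differs from the stored value.
    return "updated", captured


pattern = r"<code>(.*?)</code>"
print(check_update(pattern, "<p>no tags here</p>", "page 12"))
print(check_update(pattern, "<code>page 12</code>", "page 12"))
print(check_update(pattern, "<code>page 13</code>", "page 12"))
```

The second value returned is what would be stored as the new "last output" for the next check.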

