Hey,
I'm trying to write a script that takes an HTML file and removes everything but the local links, e.g.:
menu.html
faq.html
stuff/morestuff.html
etc.
I think the sed command is what I want, but maybe there’s a better way.
I tried sed -e 's/"//g' index.html
and this removed all of the double quotes, but I can’t make it remove all of the single quotes.
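For the single quote part, one way that might work is to wrap the sed script in double quotes instead of single quotes, so the single quote can appear literally and only the double quote needs a backslash. This is just a sketch; strip_quotes is a made-up helper name, not anything standard:

```shell
# strip_quotes (hypothetical helper): read stdin, delete every " and ' character.
# The sed script is wrapped in double quotes, so the shell passes s/["']//g to sed.
strip_quotes() {
    sed -e "s/[\"']//g"
}
```

It would be used as strip_quotes < index.html. The same thing as two separate expressions would be sed -e 's/"//g' -e "s/'//g" index.html.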
I guess how I think the program should work is: remove all double and single quotes, then remove all lines that don’t include <a and href, then remove everything that’s not between href= and >, and after that remove everything after .html.
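The steps above could be sketched roughly like this with tr, sed and grep. It assumes each <a ...> tag sits on one line, that link targets end in .html and contain no spaces, and that "local" means the href has no scheme like http: in front. local_links is a made-up function name, and this is a rough filter, not a real HTML parser:

```shell
# local_links FILE (hypothetical helper): print the local .html link targets in FILE.
local_links() {
    # Break the input at every '<' so each tag starts its own line,
    # which also handles several links on one source line.
    tr '<' '\n' < "$1" |
    # First expression: drop all double and single quotes.
    # Second expression: on lines that start with "a " and contain href=,
    # keep only the text from just after href= up to and including .html.
    sed -n -e "s/[\"']//g" \
           -e 's/^[aA][[:space:]].*[hH][rR][eE][fF]=\([^ >]*\.html\).*/\1/p' |
    # Throw away anything with a scheme (http:, ftp:, ...) or a leading //.
    grep -v -e '^[a-zA-Z][a-zA-Z0-9+.-]*:' -e '^//'
}
```

It would be called as local_links index.html. Splitting on '<' first avoids having to delete whole lines, since every tag ends up on a line of its own.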
Can anyone think of a better command or algorithm, how to do the single quote thing, or how to do something that will replace everything BUT