PICList Thread
'[EE] need code'
2009\07\02@211302 by Dr Skip

I'm sure others before me have wanted something like this, and I can't really
come up with search terms unique enough to get reasonable results, so I'm
turning to the list.

I'm looking for anything from something as simple as a Perl script to a
stand-alone Windows program that will take web pages (locally stored, perhaps)
and 'compile' them into a single document. Canon sold one at one time (Win95
era) - it would do it as whole pages, shrunk to fit (so it must have done them
as images, but they were very clear), and I think you could tell it how deep
to go, or pick individual pages. Evernote is close, but it is very limited in
formatting output or editing. I have the downloading part covered, so it can
just work locally and doesn't need any TCP/IP code.

Basically, I need to take various docs which are in a format that has a "go to
next page" link at the bottom of each page and join them into one doc for
printing or for a PDF. Now, this could get ugly too, so an added bennie would
be to pull only the text (and maybe the images linked to within the text) and
create one HTML, PDF, or text doc from them. The source HTML can be in a
directory (maybe without links to each other, or with the links ignored) so
everything in the directory gets catted together, or it could follow links
(local or on a net). I'd even be willing to piece something together at this
point with multiple passes. We've got docs from way back (and not so way back)
when it was cool to put a paragraph on one page and make the user click to go
forward. Very annoying IMHO. Now PDF docs are needed (I guess I'm not the only
one who got annoyed). I would rather not have to find someone to retype them,
or go through almost as much effort writing a custom piece of s/w for this...

Thanks in advance.
Skip

2009\07\02@212821 by John Coppens

On Thu, 02 Jul 2009 21:12:44 -0400
Dr Skip <drskip@gmail.com> wrote:

> 'compile' them into a single document.

Firefox can save ('print') pages to either PostScript or PDF. Pstools can
take PostScript files and 'bind' them together. Not sure those tools exist
under Windows, though.

John
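
A minimal sketch of that 'bind' step, using Ghostscript instead of Pstools (a
swap, not what John specified); it assumes gs is installed and on the PATH,
and the page*.pdf names are placeholders:

    import glob
    import subprocess

    # merge the per-page PDFs, in filename order, into one book.pdf
    files = sorted(glob.glob("page*.pdf"))
    subprocess.run(["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
                    "-sOutputFile=book.pdf"] + files, check=True)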

2009\07\02@213147 by Tamas Rudnai

I'm not sure exactly what you want, but there are tools for converting HTML
to PDF (for example, search for the phrase 'html2pdf' on Google). You can also
convert an entire page, with all the pics and everything, into a CHM very
easily with Internet Explorer; the CHM can then be converted to PDF, or you
can read it in a CHM viewer or in IE itself. With a few cheats you might get
Doxygen to generate PDF or PS. Or you could try opening the page with
OpenOffice and exporting it to some other format (like PDF or DOC).

Tamas
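
One hypothetical instance of the 'html2pdf' route Tamas mentions: the
third-party Python package pdfkit, a thin wrapper around the wkhtmltopdf
command-line tool (the file names here are placeholders):

    # requires: pip install pdfkit, plus the wkhtmltopdf binary on the PATH
    import pdfkit

    # render one (possibly concatenated) HTML file straight to a PDF
    pdfkit.from_file("combined.html", "combined.pdf")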




2009\07\02@214750 by Dr Skip

I've probably got thousands of pages, so I'm looking at pointing some code at a
directory with a few hundred html docs in it and getting one doc of some type
in the end. Since some 'books' contain a lot of redundant header 'stuff' on
each web page, it would be nice to drop all that in the process too, as a wish ;)

Firefox on Windows doesn't have a PDF option, and I'm looking for a
semi-automatic way to do this. If I have to do the same set of clicks and
keystrokes thousands of times (as would be needed for a page-by-page save or
print), I'd end up in the funny farm...

Skip


2009\07\03@040048 by Alan B. Pearce

>I'm looking for something as simple as a Perl script to
>something as a stand-alone windows program that will take
>web pages (locally stored perhaps) and 'compile' them
>into a single document.

You mean the way IE8 will save a web page as a .mht file, with pictures and
all in it? I don't know how compressed it is.
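
For reference, an .mht file is just a multipart/related MIME message carrying
the HTML plus its images, so one can be assembled with Python's standard
email module; page.html and figure.png are placeholder names:

    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.image import MIMEImage

    # bundle one HTML page and one image into a single .mht archive
    msg = MIMEMultipart("related")
    msg.attach(MIMEText(open("page.html").read(), "html"))
    img = MIMEImage(open("figure.png", "rb").read(), "png")
    # Content-Location is what lets <img src="figure.png"> resolve in the .mht
    img.add_header("Content-Location", "figure.png")
    msg.attach(img)
    open("page.mht", "w").write(msg.as_string())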

2009\07\03@124216 by Peter Restall


On Thu, 02 Jul 2009 21:12:44 -0400, Dr Skip wrote:

> I'm looking for something as simple as a Perl script to something as a
> stand-alone windows program that will take web pages (locally stored
> perhaps)
> and 'compile' them into a single document. Canon sold one at one time

As Tamas has mentioned, this can be done with CHMs in Windows.  I think the
HTMLHelp tool from Microsoft compiles these - but I've only ever used that
utility as part of the Sandcastle toolchain (auto-documents .NET XML commented
code), so don't know how flexible it is.  You'll need to grab it from the MS
website - there are two of them IIRC, version 1 and version 2.  Hope this
helps.

Regards,

Pete Restall

2009\07\03@184430 by Dr Skip

Alan B. Pearce wrote:
>
> You mean the way IE8 will save a web page as a .mht file, with pictures and
> all in it? I don't know how compressed it is.
>

Not sure about IE8, but Firefox with add-ons will do that, though only a page
at a time. There are other add-ons that suck down whole web sites or dirs too,
but they don't create one file with all the pages concatenated together.


I'll try to take a look at this, but I suspect it will be: view a page, save
the page, give it a name, go to the next page, do the same, and so on.

I may never be heard from again.... :-O

Optimally, I need something that will cat a directory's worth of HTML files
with some limited intelligence to strip out headers and metadata and such, so
the whole lot would end up as one readable file. Maybe a <printing> page break
between what used to be the individual pages.

Even a command-line tool would do. I'm no Perl expert, but I think Perl would
be well suited (though it's beyond my abilities these days). It would have to
incorporate a lot of HTML knowledge, though, to selectively strip out stuff as
it wrote everything into one big file as a single HTML doc...

-Skip
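
A minimal sketch of that cat-with-page-breaks idea, in Python rather than Perl
(stdlib only; the <body> regex is a crude stand-in for real HTML parsing, and
combined.html is a placeholder name):

    import glob
    import pathlib
    import re

    parts = []
    for path in sorted(glob.glob("*.html")):
        html = pathlib.Path(path).read_text(errors="ignore")
        # keep only what's inside <body>...</body>, dropping each page's
        # <head>, metadata, and headers; fall back to the whole file
        m = re.search(r"<body[^>]*>(.*?)</body>", html, re.S | re.I)
        parts.append(m.group(1) if m else html)

    # a CSS page break between what used to be the individual pages
    brk = '\n<div style="page-break-before: always"></div>\n'
    pathlib.Path("combined.html").write_text(
        "<html><body>\n" + brk.join(parts) + "\n</body></html>")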

2009\07\03@192834 by AGSCalabrese

Doing the cat sounds good.
How would you arrange the concatenation ?
     That is ......  what file goes first, second , 3rd ...........
Python or PHP might be a better choice.
Do you have any budget at all ?
Gus



2009\07\03@193320 by AGSCalabrese

I presume there will be images to include in the concatenated
documents.  How do you plan to deal with them ?
Does one find an image URL in the HTML and grab the image, put it
in a local folder with an updated name and updated link in the
concatenated HTML ?

Gus
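
Gus's image step can be sketched the same way: scan the HTML for remote src
attributes, download each image into a local folder, and rewrite the link.
Python stdlib only; the regex and the images folder name are assumptions:

    import os
    import re
    import urllib.parse
    import urllib.request

    def localize_images(html, outdir="images"):
        os.makedirs(outdir, exist_ok=True)

        def fetch(match):
            url = match.group(1)
            name = os.path.basename(urllib.parse.urlparse(url).path) or "img"
            local = os.path.join(outdir, name)
            if not os.path.exists(local):
                urllib.request.urlretrieve(url, local)  # grab the image once
            return 'src="%s"' % local  # point the tag at the local copy

        # rewrite every absolute http(s) image reference in the document
        return re.sub(r'src="(https?://[^"]+)"', fetch, html)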

2009\07\03@200024 by Dr Skip

The older tool from Canon (it was commercial ware but in the deep discount $5
bin when I got it for Win 95) did all that. Point it at a page, give it some
selection as to how deep to go and what domains (much like HTtrack or wget) or
dirs and filespecs if any, and it fetched and put it in one big doc. I don't
remember its file format in the end, but it could be printed to any printer
including pdf (if one had acrobat in those days). I never got it to work on XP
or NT, so I don't even know where it is now. Probably went to the thrift shop.

It was very useful, but I think Canon just didn't want to be in the software
biz unless it was based on a specific hard product of theirs. I also thought it
was such an obviously useful tool that there would be more like it as the web
took off.

Now I can't find anything like it, except for web spiders that will recreate
the site dir locally but won't put the pages all into one doc...

Budgeting here is an odd activity - don't ask. ;) Given that the function
seems so obvious for a tool like the one Canon had, and given the condition of
the economy, et al., it just needs to get done in one's 'extra' time...

-Skip


2009\07\04@045412 by Peter Restall


On Jul 03, 2009; 11:44pm, Dr Skip wrote:

> I'll try to take a look at this, but I suspect it will be view a page - save
> the page - give the page a name, etc, go to next page and do the same, etc.
>
> I may never be heard from again.... :-O

I pointed out the HTMLHelp compiler precisely because of this; there is a
command-line interface for HTMLHelp that the Sandcastle tool uses in its
scripts.  Basically Sandcastle takes one or more XML files of comments,
transforms them via XSLT into HTML documents (one document per page - which
is one per class/method/property, etc. - a fair few in a large project) and
then invokes the HTMLHelp tool to compile them all together into a single CHM
that can be viewed in the Windows Help Viewer (that dodgy util that pops up
when you hit F1 in an application).  Very useful for documenting .NET APIs.

AFAIK, there are two HTMLHelp versions - 1.4 (I think) is phased out in Vista,
where it's the new-fangled 2.0 (again, I think).  Sandcastle can be downloaded
from:

http://www.microsoft.com/Downloads/details.aspx?FamilyID=e82ea71d-da89-42ee-a715-696e3a4873b2&displaylang=en

But you won't want Sandcastle; scroll down the page and there's the list of
links to the required/optional software.  It looks like they've changed the
name to HTML Help Workshop, although I reckon it's the MS Help Compiler that
you're after.

I'll insert a disclaimer in case I'm leading you on a wild goose chase :)
I've only ever used this in the context of running Sandcastle, which calls
this tool in its scripts; therefore the (rather large ?) assumption is that
you should also be able to call this tool from within a simple script - even
a batch script - that will glue your HTML bits and pieces together.  But
reading your other messages, it may not be what you're after - if you're
looking for a PDF of all the pages then this tool won't do.  Although
obviously once you have generated a CHM there are other manipulations you can
do to it.

In regards to your comments about stripping out headers and the like, it may
be worthwhile looking into XSLT; but that will only work if the HTML files are
reasonably well-formed and pretty uniform (otherwise you'll end up writing as
many XSLTs as there are file variations - we'd never see you again if you
ended up doing that either :)  And if you've never used XSLTs before, then as
fantastic as they are, they're likely to turn you into a basket case as well
(another 'write-only' language) !

Without too many specifics and a fair amount of assumptions, I'd reckon an
initial stab at a batch script would be something like:

       REM fetch the pages first (the URL is a placeholder)
       wget -r -np http://example.com/docs/
       REM run each page through a stripping stylesheet (strip.xslt is hypothetical)
       for %%f in ( *.html ) do msxsl "%%f" strip.xslt -o "out\%%f"
       REM compile the transformed pages into a CHM (project.hhp lists the output files)
       hhc project.hhp

But it may be more work than you're after - if you can find an off-the-shelf
tool that does the job for a couple of quid then it would save you a great
deal of time and effort.  Unfortunately I do not know what they would call
such a tool !

Regards,

Pete Restall
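
If the XSLT route is taken, the per-file transform loop Pete describes is
short; a hypothetical sketch using the third-party lxml package, where
strip.xslt stands for the stylesheet you would still have to write for your
particular pages:

    import glob
    from lxml import etree  # pip install lxml

    # load the stripping stylesheet once, then run every page through it
    transform = etree.XSLT(etree.parse("strip.xslt"))
    for path in glob.glob("*.html"):
        doc = etree.parse(path, etree.HTMLParser())
        with open(path + ".out", "wb") as out:
            out.write(etree.tostring(transform(doc)))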

2009\07\04@101853 by M.L.

On Thu, Jul 2, 2009 at 9:12 PM, Dr Skip <drskip@gmail.com> wrote:

>
> I'm looking for something as simple as a Perl script to something as a
> stand-alone windows program that will take web pages (locally stored perhaps)
> and 'compile' them into a single document.


If the HTML is relatively simple, it sounds trivial to write a script that
appends the HTML files into a single HTML file but cuts out the extraneous
header and style info.

--
Martin K.

2009\07\04@145949 by Marechiare

> I'm looking for something as simple as a Perl script to
> something as a stand-alone windows program that will
> take web pages (locally stored perhaps) and 'compile'
> them into a single document. Canon sold one at one
> time (Win95 era) - it would do it as whole pages, and
> shrink to fit (so it must have done them as images, but
> they were very clear), and I think you could tell it how
> deep you wanted or by individual pages.

I'm afraid I don't get the main point - how is this connected to the [EE]
tag? Do you need to pack the code into some electronics? What electronics do
you target with a Perl script and a stand-alone Windows program?

Thanks

2009\07\05@112805 by Gerhard Fiedler

Dr Skip wrote:

> I'm looking for something as simple as a Perl script to something as a
> stand-alone windows program that will take web pages (locally stored
> perhaps) and 'compile' them into a single document.

I'm not sure what custom pre-processing you want to do, but just
printing all documents in a folder to PDF doesn't seem so complex.
pdfFactory for example is a PDF printer, and until you (manually) save
the document, it just accumulates everything printed to it into a single
PDF.

So get pdfFactory (they have an eval version), print a few files to it
and see whether that's what you want. Then write a simple script that
prints all files in a directory to it and you're done. Manual
intervention is needed to save the PDF when it's done printing all files
in the directory.

There are probably other PDF printers out there that work similarly.

Gerhard
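
A sketch of the 'simple script' part, assuming Windows with pdfFactory (or
any PDF printer) set as the default printer; os.startfile's 'print' verb asks
each file's associated application to print it, and the folder path is a
placeholder:

    import glob
    import os
    import time

    # queue every HTML file in the folder to the default (PDF) printer
    for path in sorted(glob.glob(r"C:\docs\*.html")):
        os.startfile(path, "print")
        time.sleep(5)  # crude pacing so the jobs arrive in order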
