I’ve been digging around for a good way to archive thread topics outside of the Bookmark feature.
The File > Print / Save as PDF method isn’t great, because code blocks don’t wrap or expand, so long code gets clipped.
Cruising around the Discourse developer forum, it seems like there’s a way to view the raw markdown,
e.g. https://scsynth.org/raw/threadNumber, but this only shows one thread at a time, with no author attribution, etc. Kind of clunky to iterate over.
Reading through the Discourse thread on how they set up printing/saving as PDF, it seems like getting info out of the platform isn’t a feature they care much about supporting, which is such a shame, but I digress. Just wondering if someone’s found a good workflow for archiving? My current workaround has been copy/pasting into text files, which is not ideal… the raw markdown seems a bit more promising, but also not great.
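For what it’s worth, here is a rough Python sketch of iterating over the raw endpoint while keeping author attribution. It is untested against this forum and rests on assumptions: that Discourse serves topic metadata at /t/<topicId>.json (with a title and a post_stream.posts list carrying post_number, username and created_at), and a single post’s markdown at /raw/<topicId>/<postNumber>. The function names are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "https://scsynth.org"  # the forum discussed in this thread


def topic_json_url(topic_id, base_url=BASE_URL):
    # Assumed Discourse route for topic metadata.
    return f"{base_url}/t/{topic_id}.json"


def raw_post_url(topic_id, post_number, base_url=BASE_URL):
    # Assumed Discourse route for one post's raw markdown.
    return f"{base_url}/raw/{topic_id}/{post_number}"


def fetch_text(url):
    # Plain HTTP GET, decoded as UTF-8.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def archive_topic(topic_id, fetch=fetch_text, base_url=BASE_URL):
    """Build one markdown document: a title, then each post under an author heading."""
    topic = json.loads(fetch(topic_json_url(topic_id, base_url)))
    sections = [f"# {topic['title']}"]
    # Assumption: for long topics this list may hold only the first batch of posts.
    for post in topic["post_stream"]["posts"]:
        raw = fetch(raw_post_url(topic_id, post["post_number"], base_url))
        heading = f"## {post['username']} ({post['created_at']})"
        sections.append(f"{heading}\n\n{raw.strip()}")
    return "\n\n".join(sections) + "\n"
```

Calling archive_topic(topicId) and writing the result to a .md file would be the whole workflow. The fetch parameter is only there so the formatting can be tried offline with canned responses.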
The best I can figure out is to append /print to a conversation URL (Saving/Downloading Threads as PDF, Markdown etc?). This gives you a relatively static, printable HTML page. You can print from there, or e.g. grab it via Pocket or some other kind of “archive-for-reading” app. I didn’t see any serious problems with code formatting, but if you DO see formatting problems, the best course is probably to edit the CSS (either in your browser’s developer tools, or in the saved files).
Ah, I didn’t think to edit the CSS when using /print, and appending /print like that is easier than cmd + p. But any code block with long lines, or more than a certain number of lines, does get cropped, for instance in Time-aware merging of two Event Pattern streams. I’ll mess with the CSS to see if there’s a relatively simple 2-3 step process.
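One possible shape for that process, sketched in Python: save the /print page as HTML, inject a small style override, then print the patched file. The CSS rules are a guess at what causes the cropping (a fixed max-height and hidden overflow on code blocks) and may need adjusting. The names inject_css and patch_file are just illustrative.

```python
import re
from pathlib import Path

# Assumed fix: let code blocks wrap long lines and grow to full height.
OVERRIDE_CSS = """
pre, pre code {
  white-space: pre-wrap !important;
  overflow-wrap: anywhere !important;
  max-height: none !important;
  overflow: visible !important;
}
"""


def inject_css(html, css=OVERRIDE_CSS):
    """Insert a <style> block just before </head>, or prepend it if there is no head."""
    style = f"<style>{css}</style>"
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        return style + html
    return html[:match.start()] + style + html[match.start():]


def patch_file(src, dst):
    # Read the saved /print page, add the override, write a new file.
    html = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(inject_css(html), encoding="utf-8")
```

So the steps would be: save the /print page, run patch_file("thread.html", "thread-wrapped.html") on it (file names are placeholders), and print the result to PDF from the browser.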
I was reading about HTTrack, but only briefly, and didn’t keep going when I saw something about JavaScript needing to be deactivated. That Jupyter notebook/BeautifulSoup scraper you mention, @bovil43810, seems promising.