Converting markdown to schelp (using pandoc?)

Hello everyone

I have this dream of writing things in markdown (the format I write all my text in, so that it can be easily converted to html or pdf using pandoc) and being able to convert a markdown document to schelp. Before I dive into this rabbit hole and try to devise a way myself, has anyone else done this before? For me at least this would make the process of writing help files (especially guides and tutorials) a bit less painful.

markdown is inherently less expressive than schelp and most other documentation formats, like javadoc and reStructuredText. to provide the full feature set of schelp syntax, you’d need to create your own special commands for instance methods, special notes, warnings, footnotes, etc., at which point you’d basically be reinventing the documentation format itself. markdown is simply not designed or equipped for this purpose. schelp is not a difficult format to learn and is already convertible to html; i’d recommend sticking with that.
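for context, a small schelp fragment showing a few constructs that plain markdown has no direct counterpart for (an illustrative sketch, not a complete document):

```
classmethods::

method:: ar
argument:: freq
Frequency in Hertz.

note::
this only exists on the server.
::

warning::
check your output levels first.
::
```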

if all you need is to convert something simple in markdown to schelp, i don’t know of any solutions but a python script could probably do the job.
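as a sketch of what such a script could look like (handling only ATX headings and code fences; everything else would need real parsing, and the function name here is made up):

```python
import re

def md_to_schelp(md, title="Untitled"):
    """Very rough markdown -> schelp sketch: headings and code fences only."""
    out = ["title::" + title, "summary::(fill me in)", "description::", ""]
    in_code = False
    for line in md.splitlines():
        if line.strip().startswith("```"):
            # toggle between opening a schelp code:: block and closing it
            out.append("::" if in_code else "code::")
            in_code = not in_code
        elif not in_code:
            m = re.match(r"^(#{2,4})\s+(.*)", line)
            if m:
                tag = {2: "section", 3: "subsection", 4: "subsubsection"}[len(m.group(1))]
                out.append(tag + "::" + m.group(2))
            else:
                out.append(line)
        else:
            out.append(line)
    return "\n".join(out)

fence = "`" * 3  # avoid literal triple backticks in this example
sample = "## Usage\n\n" + fence + "\nSinOsc.ar(440).play;\n" + fence
print(md_to_schelp(sample))
```

anything beyond this (nested lists, inline emphasis, links) is where regex starts to break down, as discussed further down the thread.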

brian

You might laugh at me, but I still write my help originally with the old WYSIWYG SC help system (html on mac; you’d need 3.5 on an old mac, I think), where you can take existing help files as templates. Then I’m quite fast at converting them to schelp with copy and paste (from old help and existing schelp files) and just a few core commands.

I got part of the way toward an Emacs org-mode exporter to scdoc syntax, but I never finished it (for lack of time).

Org-mode is more expressive than markdown and supports many varieties of tags and properties. I think it could be made to reflect scdoc metadata.
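As a rough illustration of the kind of mapping meant here (the keyword names are hypothetical, not what the exporter actually used), org metadata lines could carry the scdoc header fields, with headings mapping to section:: entries:

```
#+TITLE: MyUGen
#+SUMMARY: a demonstration unit generator
#+CATEGORIES: UGens>Demo

* Description
Some prose here.

* Examples
#+BEGIN_SRC supercollider
SinOsc.ar(440).play;
#+END_SRC
```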

If interested, I could try to track it down on a backup drive somewhere.

hjh

Although I like the format, I hate having to edit help in a different file: it’s more work, harder to keep track of, and prone to being left behind, which makes it inconsistent. I like using comments for documenting, as with Doxygen. I’m seriously considering making a Doxygen-to-SCHelpSystem converter, or something similar. Has anybody started working on it already? I’d be happy to join.

Actually, I like the idea of keeping code and docs together (how long before someone brings up literate programming? :slight_smile: Actually I’m not a huge fan of true literate programming, but that’s just me I guess).

If it’s not possible/practical to introduce a doxygen-like system, then some key combination to immediately switch from class to doc and vice versa (adding non-existent files on the fly), in combination with a live preview pane (or wysiwyg mode) for schelp docs, would be nice to have as well.

The scdoc system has a true parser, which should mean that there’s a strong relationship between the scdoc documents and, for example, the rendered HTML. If anyone is invested in improving the usability of scdoc, a REALLY good place to start would be to embed file location info in the HTML. This would allow you to do in-browser editing (e.g. via contenteditable) and then apply those changes back to the original scdoc. This doesn’t solve the problem of writing an scdoc from scratch, or making structural changes, but it makes it very easy to e.g. pre-populate a template doc (this functionality already exists) and then do all the copy editing in the browser. Before any attempts are made to introduce new layers/frameworks/tooling to the doc system, I think this sort of thing should be investigated - technically this is NOT very difficult work, but it has a really high user benefit.
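One possible shape for that location info (the attribute names here are hypothetical): the renderer tags each element with the schelp file and line it came from, so an editing layer can map DOM edits back to the source:

```
<p data-schelp-file="Classes/SinOsc.schelp"
   data-schelp-line="42"
   contenteditable="true">
  A sine wave oscillator.
</p>
```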

As a reminder also - these discussions about the docs system come up every year or so, I think with good reason, as there are some real issues here. But just so it’s not in a vacuum: basically all of these concerns and proposals (e.g. doxygen, literate programming, wysiwyg) were discussed at length when the scdoc system was created. I don’t agree with every detail, but there were strong reasons for all of the decisions that were made. It’s worth digging through the sc-dev / sc-users archives for these conversations if you’re interested in how we ended up with the system we have. I’ll post ML archive links if I can find some (or maybe someone else has an idea?)

bit of a shameless plug, but recently scvim gained support for schelp syntax. that update is not yet part of the mainline SC repo, but if you clone the repo separately (or update it to the latest master, if you have cloned the main SC repo and have it as a submodule), you can get it.

hey there,

sorry to bump this very old thread, but I was wondering if anybody ever came up with a different solution for this? i’m about to start writing documentation for a bigger project which needs the documentation in multiple formats: in the SC-IDE help viewer, as a printed zine, and in a GUI. i was planning to write everything in markdown and then output the different versions with pandoc or paged.js. does anybody have a tip? :–)

all best,
moritz

Hello, I made a converter using sclang:

I recently used it to convert Fosc’s README.md to schelp. However, since the code was originally written to convert BiLETools’ README.md, I had to modify some parts manually.

I also used ChatGPT, but its conversion was not perfect either, so I still had to make some manual adjustments.

There is also capital-G/scdoc_py on GitHub, a Python wrapper for the SCDoc parser - this should allow one to transform a md AST to a schelp AST.

Oh great, thank you both for your answers! @dscheiba From what I can tell, it’s not possible to write .schelp files with your parser, right? Do you have a hint how that could be accomplished?

@schmolmo
Could you test the following code? I gave ChatGPT, Claude, Gemini, and Grok only the task description and the reported errors, but none of them could complete the task. After I shared my initial code—the one shown above—Grok started producing something usable, although not the final version I am posting here. Before reaching that final version, I also reviewed the code myself and pointed out the incorrect parts. The whole process took about four hours. I am not sure whether that was productive, unproductive, or somewhere in between. Still, I think this could become a productive workflow in the near future.

(
var path, source, output, lines, level1Headings, processInline, convertCode;

// Set path to the file
path = "/Users/prko/Dropbox/prko/Downloads/untitled folder 2/test";
source = File(path ++ ".md", "r");
output = File(path ++ ".schelp", "w+");

// 1. Read all lines and preprocess (remove comments, convert images, remove <br>)
lines = [];
block {
	var line, trimmed, url, start, end;  // Declare all vars at the top of the block
	while {
		line = source.getLine;
		line.notNil
	} {
		// Remove <br> tags
		line = line.replace("<br>", "");
		
		// Strip whitespace only for checking (keep original for output, but trim for matching)
		trimmed = line.stripWhiteSpace;
		
		// Skip HTML comments (single/multi-line, <!-- ... -->)
		if(trimmed.beginsWith("<!--").not and: { trimmed.endsWith("-->").not }) {
			// Convert images without regex: check if starts with '![', extract url between ( and )
			if(trimmed.beginsWith("![")) {
				start = trimmed.find("(");
				if(start.notNil) {
					end = trimmed.find(")", start + 1);
					if(end.notNil) {
						url = trimmed[(start + 1) .. (end - 1)];
						
						// Replace entire line with image tag
						line = "image::" ++ url ++ "::";
					} {
						("Warning: No closing ')' found in image line: " ++ trimmed).postln;
					};
				} {
					("Warning: No '(' found in image line: " ++ trimmed).postln;
				};
			};
			
			lines = lines.add(line);
		};
	};
};
source.close;

// 2. Process Level 1 Headings and insert header tags if missing
level1Headings = lines.selectIndices({ |line| line.beginsWith("# ") });

if(level1Headings.size >= 1) {
	var titleText = lines[level1Headings[0]][2..].stripWhiteSpace;
	output << "title::" << titleText << "\n";
	output << "summary::fill this field\n"; // Fill with appropriate summary
	// output << "categories::fill this field\n"; // Fill with categories
	// output << "related::fill this field\n\n"; // Fill with related topics
	output << "description::\n";
	lines.removeAt(level1Headings[0]); // Remove the first h1 (used as title)
} {
	// Insert placeholders if no Level 1 heading
	output << "title::fill this field\n"; // Fill with document title
	output << "summary::fill this field\n"; // Fill with appropriate summary
	// output << "categories::fill this field\n"; // Fill with categories
	// output << "related::fill this field\n\n"; // Fill with related topics
	output << "description::\n";
};

// 3. Inline processing function (URLs, bold, italic, etc.) with nil/type safety and safe regex (no infinite loop)
processInline = { |str|
	var out;
	str = str ? "";  // Nil safety: default to empty string
	out = str;
	
	// Convert bold: __text__ -> strong::text:: (safe regex, single pass)
	if(out.notNil and: { out.size > 0 }) {
		var matches = out.findRegexp("__([^_]+)__");  // [^_]+ to avoid issues
		var j = 0;
		while { j < matches.size } {
			var full = matches[j][1] ? "";  // Full match
			var text = matches[j+1][1] ? "";  // Group1 (text)
			out = out.replace(full, "strong::" ++ text ++ "::");
			j = j + 2;  // Step by full/group
		};
	};
	
	// Convert italic: _text_ -> emphasis::text:: (safe regex, single pass)
	if(out.notNil and: { out.size > 0 }) {
		var matches = out.findRegexp("_([^_]+)_");  // [^_]+ to avoid issues
		var j = 0;
		while { j < matches.size } {
			var full = matches[j][1] ? "";  // Full match
			var text = matches[j+1][1] ? "";  // Group1 (text)
			out = out.replace(full, "emphasis::" ++ text ++ "::");
			j = j + 2;  // Step by full/group
		};
	};
	
	// Convert standalone URLs to link::url:: (safe regex, single pass, but skip if already in image::)
	if(out.notNil and: { out.size > 0 }) {
		var matches = out.findRegexp("https?://[^ ]+");  // Simple [^ ]+
		var j = 0;
		while { j < matches.size } {
			var full = matches[j][1] ? "";  // Full match (URL)
			if(out.contains("link::" ++ full).not and: { out.contains("image::" ++ full).not }) {  // Added check to skip image URLs
				out = out.replace(full, "link::" ++ full ++ "::");
			};
			j = j + 1;  // Step by match
		};
	};
	
	out
};

// 4. Code conversion function (existing) with nil safety
convertCode = { |source|
	source = source ? "";  // Nil safety: default to empty string
	if(source.notNil and: { "^`.*`$".matchRegexp(source) }) {
		"\ncode::\n" ++ source[1 .. source.size - 2] ++ "\n::\n"
	} {
		source
	}
};

// 5. Parse body (using iterator for safety) with additional nil checks
{
	var iter = lines.iter;
	var previous = nil;
	var current = iter.next;
	
	while { current.notNil } {
		current = current ? "";  // Nil safety for current line
		
		output << (
			case
			// Code blocks: ```supercollider or general ```
			{ current.stripWhiteSpace.beginsWith("```") } {
				var isSupercollider = current.stripWhiteSpace == "```supercollider";
				var codeLines = "";
				var codeLine = iter.next;
				while { codeLine.notNil and: { codeLine.stripWhiteSpace != "```" } } {
					codeLines = codeLines ++ codeLine ++ "\n";
					codeLine = iter.next;
				};
				"\ncode::\n" ++ codeLines ++ "::\n" // End with ::
			}
			// Reference links: [text]: url -> link::url##text::
			{ current.beginsWith("[") and: { current.contains("]: ") } } {
				var splitIndex = current.find("]: ");
				var name = current[1 .. (splitIndex - 1)];
				var url = current[(splitIndex + 3) ..].stripWhiteSpace;
				"link::" ++ url ++ "##" ++ name ++ "::\n"
			}
			{ "^=.*=$".matchRegexp(current) } {
				"title::" ++ previous ++ "\n" ++
				"summary::" ++ previous ++ "\n" ++
				"categories::" ++ previous ++ "\n" ++
				"related::" ++ previous ++ "\n\n" ++
				"description::\n"
			}
			{ "^-.*-$".matchRegexp(current) } {
				"\nsection::Introduction\n"
			}
			{ "^## ".matchRegexp(current) } {
				"\nsection::" ++ current[3..] ++ "\n"
			}
			{ "^### ".matchRegexp(current) } {
				"\nsubsection::" ++ current[4..] ++ "\n"
			}
			{ "^#### ".matchRegexp(current) } {
				"\nsubsubsection::" ++ current[5..] ++ "\n"
			}
			{ "^`.*`$".matchRegexp(current) } {
				"\ncode::\n" ++ current[1 .. current.size - 2] ++ "\n::\n"
			}
			{ previous == "" && "^\\*".matchRegexp(current) } {
				var subpartLineCurrent = iter.next;
				var text = "\nlist::\n\n##" ++ current[2..];
				
				while { subpartLineCurrent.notNil and: { "^[^#].*".matchRegexp(subpartLineCurrent) } } {
					var subpartLineNext = iter.next;
					subpartLineCurrent = if(subpartLineNext == "") {
						subpartLineCurrent ++ "\n::"
					} {
						if("^#".matchRegexp(subpartLineNext)) {
							subpartLineCurrent ++ "\n::"
						} {
							convertCode.(subpartLineCurrent)
						}
					};
					text = text ++ "\n" ++ (
						case
						{ "^\\*".matchRegexp(subpartLineCurrent) } {
							"\n##" ++ subpartLineCurrent[2..]
						}
						{
							convertCode.(subpartLineCurrent)
						}
					);
					current = subpartLineCurrent;
					subpartLineCurrent = subpartLineNext;
				};
				text ++ "\n"
			}
			{ "^##### ".matchRegexp(current) } {
				var subpartLineCurrent = iter.next;
				var text = "\ntable::\n\n##" ++ current[6..];
				while { subpartLineCurrent.notNil and: { subpartLineCurrent == "" || "^[^#].*".matchRegexp(subpartLineCurrent) } } {
					var subpartLineNext = iter.next;
					// close the table block when the next line is blank or a new heading follows
					subpartLineCurrent = if(subpartLineNext == "" or: { "^#".matchRegexp(subpartLineNext) }) {
						subpartLineCurrent ++ "\n::"
					} {
						convertCode.(subpartLineCurrent)
					};
					text = text ++ "\n" ++ subpartLineCurrent;
					current = subpartLineCurrent;
					subpartLineCurrent = subpartLineNext;
				};
				text ++ "\n::\n"
			}
			// General line: apply processInline (URLs, bold, italic, etc.)
			{
				processInline.(current) ++ "\n"
			}
		);
		previous = current;
		current = iter.next;
	};
	
	output.close;
	"Conversion finished.".postln;
}.value; // Execute the block
)

Oh right, I looked into the C++ sources of SCDoc, but it only goes one way, sorry.
You could write an AST → schelp converter if you need a native schelp document; see Representing Code · Crafting Interpreters on how to do this - but this is probably out of scope here.

I think the code is somewhat convoluted (no offense)? You should think about how markdown organizes data/information and represent that in a class; this makes it easier to reason about :slight_smile:

I think it is best to split the task into the following form:

  • Translate markdown into an AST (other languages have libraries for this, so consider those). You would need a proper parser for markdown - regex is not enough b/c markdown can be nested, and nesting is not properly capturable via regex (regexes don’t have a memory - you can read up on the limitations of regular languages). Writing a md parser is not trivial, which is why you should maybe pick a language which already has one implemented as a base.
  • Convert this AST to schelp using something like the visitor pattern from the book linked above. This isn’t too complex, at least compared to the parser.
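To make step two concrete, a minimal sketch in Python (the node classes are invented for illustration; in practice a real markdown parser library would hand you the tree):

```python
from dataclasses import dataclass, field

# Invented mini-AST; a real markdown parser would produce something richer.
@dataclass
class Text:
    value: str

@dataclass
class Heading:
    level: int
    children: list = field(default_factory=list)

@dataclass
class CodeBlock:
    source: str

@dataclass
class Document:
    children: list = field(default_factory=list)

class SchelpEmitter:
    """Visitor: dispatch on the node's class name and return schelp text."""
    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Document(self, node):
        return "\n".join(self.visit(child) for child in node.children)

    def visit_Heading(self, node):
        tags = {1: "section", 2: "subsection", 3: "subsubsection"}
        tag = tags.get(node.level, "subsubsection")
        return tag + "::" + "".join(self.visit(c) for c in node.children)

    def visit_CodeBlock(self, node):
        return "code::\n" + node.source + "\n::"

    def visit_Text(self, node):
        return node.value

doc = Document([
    Heading(1, [Text("Examples")]),
    CodeBlock("SinOsc.ar(440).play;"),
])
print(SchelpEmitter().visit(doc))
```

The emitter stays small because all the structural hard work (nesting, inline spans) is pushed into the parser that builds the tree.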

Basically you are translating one formalized language into another, which is the work of a transpiler; TypeScript is a good example of this.[1]

To convert schelp to markdown (the other way around), use the SCDoc parser in sclang (or the python interface above) and translate its AST into markdown.


My 2 cents: I found markdown not a good format for organizing multiple documents or larger documents b/c it was never designed to do this - use the right tool for the job, e.g. rST or LaTeX, which were designed w/ this in mind. Use pandoc (or something else) to transform your more precise format down to markdown if necessary. Or just copy everything manually once the final document is finished^^ Sometimes this is easier than a clever solution.


  1. I have recently written a transpiler for translating sclang code to dyngen code, see DynGen/src/Classes/DynGenTranspiler.sc at main · capital-G/DynGen · GitHub and DynGen/src/HelpSource/Classes/DynGenTranspiler.schelp at main · capital-G/DynGen · GitHub if you are interested in a transpiler written in sclang - though there are some interesting limitations on what is possible. ↩︎

The idea has made the rounds before (two or three times, I think).

This is a pandoc writer for scdoc: markdown in, schelp out, lightly tested - what I had on hand from last time. The mapping isn’t one-to-one, and I won’t pretend it is, so of course it all needs adjustments. It has barely touched real-world files and needs testing and comments to be useful in general, so the code can shift if somebody has a cleaner way to fill the gaps.

It works in the sense that it compiles, it runs, and it converts markdown files into scdoc files. But since the formats don’t really fit, we need some conventions. I don’t know much about the other attempts - maybe you guys have this covered.

(edit: to be clear, I’m not saying markdown can’t produce full schelp files. It can. The question is how: what conventions, what attribute syntax, what YAML keys. That’s the part I’d like input on.)

In theory, it could become part of pandoc, since it is a writer module like the other ones there, just not mature and still untested. For now, this project is built as a stand-alone command-line application with pandoc as a dependency. Pandoc is a big project, so it can take a while to compile.

Thanks for the feedback.

I agree that some parts of the code are convoluted, and some are probably unnecessary. I left them as they were to show how the AI had modified my original version.

The code was generated by AI through a process in which I gave it tasks, tested the results, and provided feedback based on the errors.

Of course, I’m responsible for posting it—I’m just explaining the context, not making excuses.

Since I tested it on only two .md files, I can’t guarantee that it will work on other documents without errors. To do this properly, it would make more sense to use the schelp syntax directly rather than .md files that use only a subset of the SCDoc tags.


As for the Markdown conversion, I agree that using a proper parser or other dedicated tools would make more sense. I considered that approach, but I thought working in sclang would be quicker than learning a new toolchain and setting it up.

After reading @schmolmo’s response yesterday, I thought it would be interesting to see how far AI could get with the code if I kept guiding it.

Just wanna add that your script is worth having in the open too, limits and all. Not every schelp file is a class doc with nested method/argument structure; for simple guides, a zero-install .scd you can evaluate right in your editor? That’s a virtue, not a shame. And the sketch is a useful jumping-off point for others who want to extend it in that direction. It takes some courage to post working code to a room full of strangers, and I’d rather see more of that than less - not just on this particular forum, but in general.

You got some good architectural advice: AST-based parsing is the right shape once you need to scale, and “crafting interpreters” is a good pointer for anybody who ain’t built one before. No argument on the technicals.

I did work on the pandoc-based angle from the Haskell side before, something I’ve mentioned a few years ago; happy to share it again if it’s useful for the AST-to-schelp case. But I don’t think any of these attempts are really in competition: different slices, same problem. Good to have all of it in the open.

Just an update on the pandoc writer for .schelp files. Still experimental, but it has grown beyond the narrow markdown converter it started as.

More inputs work now. Markdown, reStructuredText/RST, and org-mode are the tested ones so far, but other pandoc-supported formats can be selected with -f as well. The same metadata fields are supported: title, summary, categories, related, redirect, and keywords - but they can now be read from YAML frontmatter in Markdown, RST docinfo, or org-mode #+KEY: lines, then emitted as scdoc key:: headers. Sections named Description, ClassMethods, InstanceMethods, and Examples map to scdoc directives instead of falling back to generic section:: entries.

As before, methods and arguments can be marked explicitly with pandoc attributes — ## ar {.method}, ## freq {.argument}, but now the writer can also infer them from heading structure and section names.

See the example/ folder, README, and CHANGELOG for details. Feedback welcome as always.

Thank you very much for posting this, I’ll check it out as soon as I have some spare time! :–)

All best,
Moritz
