I.e., you must not read Lisp data at run time if it contains symbols, because that will call intern.
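A small Common Lisp sketch of the mechanism being complained about — READ interns every symbol it encounters, which is observable with FIND-SYMBOL (FRESH-SYM is an arbitrary name, assumed not to be interned yet in the image):

```lisp
;; Sketch: reading data interns symbols as a side effect.
;; Assumes FRESH-SYM has never been mentioned in this image before.
(find-symbol "FRESH-SYM")          ; => NIL, NIL          (not interned yet)
(read-from-string "(fresh-sym 1)") ; => (FRESH-SYM 1)     (just "reading data"...)
(find-symbol "FRESH-SYM")          ; => FRESH-SYM, :INTERNAL  (now permanently interned)
```

The symbol stays reachable from its package afterwards, which is exactly the leak/DoS concern the guide is raising.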
> You should avoid using a list as anything besides a container of elements of like type.
I could reduce this guide by a good 30% with "You should avoid using Lisp as anything but Go or Java".
But that could be seen as defining a macro, which you must seldom do.
This approach is one of the things that make Lisp Lisp; if it gives you an allergic reaction, use something else.
And thank dog for that.
1. No more than 17 functions handle this datum, spread among no more than three source files.
2. The structure contains no more than 8 conses.
3. In a long-running application under a typical production load, no more than 10,000 of these objects are freshly allocated in any five minute period.
4. A major software component (such as a library) can internally have at most three separate instances of such a data type, and they are not to be involved in the APIs between major software components.
Okay, that's now a target we can enforce without wishy-washy judgment calls.
Sure, you can have an exact set of rules like that, and feel free to automate enforcement of your own set of exact rules. But there are good reasons coding style guides often include things which are not exact rules; the target for them is often not automated enforcement but support for human judgment that balances multiple factors. Yes, that results in fuzzy boundaries, but that's because experience has shown there is an issue without (yet) providing sufficient basis for a quantifiable boundary, because the set of factors being balanced is complex and multidimensional. Reducing the dimensionality for simplicity of automated enforcement is easier, but not necessarily better.
So what happens when you have a datum that is used in exactly 17 functions, but you need to add another feature?
Or, what happens when people combine functions into larger ones to avoid having to define a real datatype, or...
These concerns exist for each thing.
"Don't use a list where you really have a struct" is much more concrete and quantifiable.
You have to add your feature in such a way that the resulting code meets the rules.
It's exactly the same as when you need to add something to a line that is already 79 characters long (the maximum allowed by your coding convention). Or when you have to add lines of code to a function that is already 200 lines long (the coding-style max). Or when you need another argument in some API that already has the maximum of 8. You have to step back to some extent and change more of the surrounding program, rather than just shoving your intended change into it.
> Or, what happens when people combine functions into larger ones to avoid having to define a real datatype
If the code remains under the quantified limit for function size, then it is complying with the document.
> "Don't use a list where you really have a struct" is much more concrete and quantifiable.
There are two answers to "where do you really have a struct?" One is purely opinionated, and one is objective. The objective answer is: "you really have a struct in all situations where the list isn't a variable length container of items of the same type".
So the concrete and quantifiable (and therefore the right) interpretation of the rule amounts to never using a fundamentally characteristic Lisp technique, in Lisp!
MAXIMA (née MACSYMA) is known for lists-as-objects not only being depended upon, but also getting so out of hand it’s nearly impossible to refactor now. MAXIMA at least has the excuse of being very old software.
I feel you’re interpreting this style guide as some kind of draconian law of writing Lisp code at Google, and making awful conclusions from it (e.g., “well if you’re allergic to objects as lists might as well not use Lisp” or “most of this can be summed up as writing Lisp like Java”).
I don't disagree with that; a coding standard that is not enforceable without generous judgment calls is far less useful than a rigorous one.
Make it exact, and then get everyone to stick to it.
The basic mentality was if you couldn't responsibly follow the style guidelines after working there for a year, then you should be looking for work elsewhere.
The one exception I can think of is procedures with variadic arguments.
(There is at least a coffee machine, health benefits, and a halfway ergonomic chair to sit on, hopefully.)
The lisp I use (scheme) does not have pattern-matching built-in and nobody has made a good enough case to add it as a special form.
There's your problem. Here's a nickel (toss). Get yourself a real Lisp.
Or use a non-interning reader. Clojure does exactly that: instead of clojure.core/read, you use clojure.edn/read to read data without running it as code.
According to this coding guideline, you cannot develop a .fasl format that is made of Lisp read syntax, or exploit Lisp for sophisticated, structured data formats in general.
It seems that the reader returns symbols just fine.
((juxt type identity) (clojure.edn/read-string "x"))
=> [clojure.lang.Symbol x]
The documentation for EDN says that "nil, booleans, strings, characters, and symbols are equal to values of the same type with the same edn representation." The only way two symbol values can be equal is if they are actually same symbol, I would hope.
Why is this important? Specifically, why do symbols need to be interned?
In Clojure, "Two symbols are equal if they have the same namespace and symbol name." In general, "Clojure’s = is true when comparing immutable values that represent the same value, or when comparing mutable objects that are the identical object." 
Because symbols are used to refer to things, whether or not they are mutable can be blurry. You can make symbols as immutable as you want, but as soon as you make one of those symbols a key which refers to a mutable object, such as a global environment, then effectively, the symbol appears as a gateway to something mutable, and you can't necessarily tell whether the mutability is in the symbol itself or something beyond it.
For instance, let's consider global variables. The definition of a global variable has an effect which we can inspect if we have a boundp function:

  (boundp 'x) -> nil
  (defvar x)
  (boundp 'x) -> t

That can be made to work by mutating the symbol (the global binding information can be right inside the symbol). Or it could work by keeping the symbol immutable, but mutating some hash table of bindings.

Either way, the symbol looks interned, because we have mentioned it several times, and those mentions seem to be connected. The (defvar x) has an effect on (boundp 'x), so they are referring to an x which is somehow the same.

It could work with x actually being a kind of character string, which got separately allocated three times. As long as we can't show any property of the system indicated by x to be different based on which copy of x we are using to enquire (e.g. boundp reports true for one x and false for another), then x looks interned.
With that said, I always thought symbols were interned, but that's not the case. It is true of keywords, however.
(identical? (clojure.edn/read-string "x") 'x) => false
(= (clojure.edn/read-string "x") 'x) => true
(identical? (clojure.edn/read-string ":x") :x) => true
They are, at best, cargo-culted symbols: character strings with a tag bit which says "read/print me without quotes, so I visually look like something out of Lisp".
Whether Clojure's object model and equality semantics as a whole make sense is certainly up for debate. It's highly opinionated and no silver bullet.
But once it's in place, the decision of whether to intern symbols is a trivial implementation detail.
I incorrectly assumed they were interned for seven years of using Clojure professionally, it has never made a difference, and I can't come up with a scenario where it plausibly would.
In other Lisps the detailed semantics of symbols are more important including the identity/interning thing.
Rich Hickey was a Common Lisp user before making Clojure, so there's a fair chance he knew how symbols worked there; the cargo-culting characterisation should be applied only light-heartedly :)
With this in mind, you could stop interning keywords and damn near every Clojure program would continue to work just fine - but with a noticeable slowdown.
Or, more sensibly, and to bring it back to the theme of the thread, for adding a second non-interning Keyword type which can safely be generated while deserializing user input in a long-running process, which you can use interchangeably with standard keywords, but which will be garbage-collected along with the rest of the deserialized data when you're done.
You do pay a hefty penalty here because you're hiding everything behind interfaces and abstractions. It's totally fine to not like the system, or believe it's not worth the performance hit.
But it does mean that a potentially equal but not identical symbol isn't some off-brand, low-quality replacement as GP suggests; it's just... a symbol.
Pastebin demo: https://pastebin.com/cbWiNyEL
> (eq (read) (read))
a a
T

The default test function is EQL, which uses EQ to test symbols. In Common Lisp #:a would be an uninterned symbol with the name "A".

> (find 'a '(#:a a))
A
> (find 'a '(#:a a) :test #'string-equal)
#:A

Setting the value of a symbol will work in basically all Lisps with symbols in a similar fashion to this:

> (dolist (item '(a b c a))
    (set item (if (and (boundp item) (numberp (eval item)))
                  (1+ (eval item))
                  1)))
NIL
> (mapcar 'eval '(a b c a))
(2 1 1 2)

This last example will, for example, run unchanged in Emacs Lisp and Common Lisp.
Though a typical use is in macros: macros introduce new symbols, which should never clash with any existing symbol, and to which there should be no access via the name.
Example: a macro which prints the form and its value, and returns the value. GENSYM generates a named/counted uninterned symbol.

> (defmacro debugit (form &aux (value-symbol (gensym "value")))
    `(let ((,value-symbol ,form))
       (format t "~%The value of ~a is ~a~%" ',form ,value-symbol)
       ,value-symbol))
DEBUGIT

If we look at the expanded code of an example, we can see uninterned symbols:

> (pprint (macroexpand-1 '(debugit (sin pi))))
(LET ((#:|value1093| (SIN PI)))
  (FORMAT T "~%The value of ~a is ~a~%" '(SIN PI) #:|value1093|)
  #:|value1093|)

We can also let the printer show us the identities of these symbols, labelling objects which are used multiple times in an s-expression:

> (setf *print-circle* t)
T
> (pprint (macroexpand-1 '(debugit (sin pi))))
(LET ((#2=#:|value1095| #1=(SIN PI)))
  (FORMAT T "~%The value of ~a is ~a~%" '#1# #2#)
  #2#)

Thus we can see above that it's just one uninterned symbol used in three places.

> (debugit (sin pi))
The value of (SIN PI) is 1.2246063538223773D-16
1.2246063538223773D-16
Interesting that gensym returns uninterned symbols, thanks.
Keep in mind that this is a guide from a Lisp-using company (bought by Google) which wrote two large applications partly, but significantly, in Lisp: a search engine for flight travel and an airline reservation system. Other application teams may have different rules & requirements, given that they may use Lisp in very different ways.
I know the story of ITA well, it and PG's writings are what got me interested in lisp in the first place. Which makes me feel old. But not comp.lang.lisp old, it's all relative!
Interning is used outside of Lisp. See the XInternAtom function in the X Window System:

  Atom XInternAtom(Display *display, char *atom_name, Bool only_if_exists);

or RegisterClass in Win32:

  ATOM RegisterClassA(const WNDCLASSA *lpWndClass);
It's interesting that, despite this, keywords are serialized all the time in Clojure land (e.g. in the transit format that is commonly used for frontend/backend communication).
Most json libraries will convert string keys to keywords, and they're not weak references.
An attacker can probably just send a few dozen gigabytes of random json to the average Clojure app and it's going to go down.
If the 'reader' reads an s-expression like (EMACS LISP IS A LISP DIALECT) then both occurrences of LISP are the same identical Lisp object, both are the same symbol.
If your language is doing something different, then it's not using symbols like Lisp-like languages usually do since the dawn of time.
Even AutoCAD's AutoLisp (the old one from the 1980's) has interned symbols.
How symbols work goes back all the way to the original McCarthy work, and all of its actual (not cargo-culted) descendants.
It is not "Common Lisp" elitism.
With that in mind, however, basing your critique of Clojure on the extent to which it carries the lisp tradition is bizarre. Your criticisms are born of an ignorance of the value proposition Clojure provides, which would not be terribly different even if it had eschewed lisp syntax in favor of something else.
If you actually learned Clojure, there is zero chance you'd be complaining about symbol interning. It's just so ridiculous. You'd probably still think the whole thing is a waste of time, and I'm sure you'd have a big long list of actual, meaningful complaints.
I've seen people criticize TXR for its ugly syntax once or twice here on HN (I pay close attention to lisp posts here), and I thought that was dumb at the time. I'm not interested in learning it but I'm glad you're trying something new. It's a shame to see you stoop to the same level of drive by dismissal.
But whatever. Let's flip each other's bozo bits and move on.
A remark was made somewhere that interned symbols are held with a non-weak reference. But it occurred to me that this isn't something engraved in stone. A package should be able to hold on to its interned symbols via weak references. This means that if the only reference to a symbol is from within a weak package, that symbol can be removed from the package and reclaimed by the garbage collector.
Since a package uses hash tables, and hash tables support weak keys, it's trivial to put the two together. I added an argument to make-package to specify a weak package.
In the following test, the symbols interned into package foo get reclaimed because it is weak. Those interned into bar don't get reclaimed:
(defun weak-package-test (name weak)
  (let ((wp (make-package name weak)))
    (let ((*package* wp))
      (let ((obj (read "(a b c d e f)")))
        (mapcar (op finalize @1 prinl) obj)))))

(weak-package-test "foo" t)
(sys:gc t)
(weak-package-test "bar" nil)
(sys:gc t)

$ ./txr weak-package.tl
foo:a
foo:b
foo:c
foo:d
foo:e
foo:f

I'm committing to this as a documented feature.
Clojure is a shitty substitute for CL, but that's just not what it is trying to be. It is an interesting and worthy system in its own right.
Don't use code to calculate character strings, which are then converted to symbols via INTERN. The main exceptions to this rule are structs (which generate slot-reader functions by combining the structure name and slot names).
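For illustration, the sanctioned struct case: DEFSTRUCT computes and interns accessor names by combining the structure name with each slot name, which is exactly the kind of name construction the rule otherwise forbids.

```lisp
;; DEFSTRUCT is the blessed case of computed, interned names:
;; it creates POINT-X and POINT-Y readers from "POINT" + the slot names.
(defstruct point x y)
(point-x (make-point :x 1 :y 2)) ; => 1
```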
Macrology which calculates names using code, where those names are then supposed to be explicitly referenced in code, is pretty stinky:

  (define-blob foo ...)

Here, you're supposed to Just Know that the above is referenced as blob-foo and not foo, because internally it catenates "BLOB-" onto (symbol-name 'foo), and calls intern on that. One would expect (define-blob foo ...) to produce the symbol FOO but no other symbols.
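A sketch of the kind of macro being criticized; define-blob here is hypothetical, and the computed, interned name is the stinky part:

```lisp
;; Hypothetical sketch of name-constructing macrology: the caller writes
;; (define-blob foo ...) but must Just Know to call BLOB-FOO afterwards.
(defmacro define-blob (name &body body)
  (let ((fn-name (intern (concatenate 'string "BLOB-" (symbol-name name)))))
    `(defun ,fn-name () ,@body)))

(define-blob foo 42)
(blob-foo) ; => 42, via a symbol that appears nowhere in the source
```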
That's the problem with coding guides: they try to head off certain abuses, but making a good judgment call about when some rare exception is justified is hard. So the guide misses the mark by making an approximate limitation, often erring on the safe side.
There are benefits to writing in boring, safe subsets of languages. Still, writing coding guidelines is hard.
> Not only does [INTERN] cons, it either creates a permanent symbol that won't be collected or gives access to internal symbols. This creates opportunities for memory leaks, denial of service attacks, unauthorized access to internals, clashes with other symbols.
It even has some advice on using wrappers for INTERN if you really need it.
The document has provisions for exceptions to the rules. There's discussion about using EVAL despite the fact the rule says you must not use it.
Also, "should avoid" means that you need a good reason, addressed in a comment and code review. Many examples you're probably thinking about are easily containers of elements of like type anyway (allowing mild cases of sum types and such). Though things tend to be more robust with intentionally created data types, I find.
If you want to avoid having the symbols stick around forever, you can create a temporary package with MAKE-PACKAGE and then use DELETE-PACKAGE when you don't need it anymore.
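A minimal sketch of that approach, assuming untrusted text is read under a throwaway package so any freshly interned symbols die with it (read-in-temp-package is a hypothetical helper name):

```lisp
;; Sketch: intern run-time-read symbols into a disposable package, so
;; deleting the package leaves them uninterned and garbage-collectible.
(defun read-in-temp-package (string)
  (let ((pkg (make-package (gensym "TEMP-READ") :use '())))
    (unwind-protect
        (let ((*package* pkg))
          (read-from-string string))
      (delete-package pkg))))
```

Symbols in the returned datum survive as long as the datum does, but nothing pins them in a package table forever.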
> Good-bye, code-is-data.
Could you regard that as a list of AST nodes?
You should see their C++ style guide, it basically bans most of modern C++. Unsurprisingly enough, Google C++ libraries (like Tensorflow, or GoogleTest) are some of the ugliest open source C++ libraries out there.