To get a sense of how to interact with ChatGPT to write code collaboratively, I started a small Google Apps Script project I’d been thinking would be useful to me. I built and maintain a Google Sheets add-on with a bunch of useful utility functionality (databases, combinatorics, data shaping and normalization, set operations, etc.), and I wanted to add a new capability: an easy way to build UI in Sheets that invokes macros or functions, as a no-code approach.
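To make the goal concrete, here’s a rough sketch of the kind of thing I was after, not the actual add-on code: a configuration sheet (hypothetically named “UI” here) pairing menu captions with function names, turned into a custom menu when the spreadsheet opens, so a nonprogrammer can wire menu entries to existing macros without writing any code.

```javascript
// A minimal sketch, not the add-on itself. Assumes a sheet named "UI"
// (hypothetical) where each row pairs a menu caption with the name of a
// global function to run, e.g. ["Normalize data", "normalizeData"].
function onOpen() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('UI');
  if (!sheet) return;
  var menu = SpreadsheetApp.getUi().createMenu('My Tools');
  sheet.getDataRange().getValues().forEach(function (row) {
    var caption = row[0], fnName = row[1];
    if (caption && fnName) {
      // addItem binds the caption to a global function by name, so
      // clicking the menu entry invokes that function.
      menu.addItem(caption, String(fnName));
    }
  });
  menu.addToUi();
}
```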
I described the functionality I wanted, the API, and the intent, and got a bunch of code from ChatGPT, including plenty of things I would otherwise have had to look up laboriously. It didn’t actually work. I asked ChatGPT why it didn’t work. It apologized and wrote some more code, diagnosing what it had gotten wrong. (Why didn’t it just ask itself whether its first attempt would work and move straight on to the more refined answer?)
It proposed at least one solution to a security-domain issue that I would not have thought of, and many fixes that were just plain wrong for other issues. At this point, I was pretty impressed with the ability to move quickly forward in collaboration with it. I found myself asking “Please, now can you…”, really anthropomorphizing it.
As I kept refining, receiving increasingly convoluted and often incorrect code that I had to understand in order to diagnose, and as I had it build debugging bootstrapping into the project, my opinion grew more muddled.
At this point, it seems to me to be a very prolific assistant, savant-like in some ways but generally short-sighted, one that frequently makes really stupid mistakes and writes a lot of incorrect, duplicative, poorly factored, hard-to-understand code, at least on its first pass. It is very sure of its responses and then, when something doesn’t work, equally sure of its fixes, which leaves me confused about how authoritative any of its proposals are; I don’t know what to believe and what to be skeptical of.
I do feel like there’s a real benefit to be had in this collaboration, but I need to learn to use it better than I have, perhaps changing more of its code myself by hand, which requires understanding what it’s done (that’s OK, but far from a no-hands approach). I also get the sense that it is hamstrung by not being able to run the code itself, observe whether it works, and then iterate without me. Surely that’s coming. Looping on its own results, without my having to paste in code and run it, would surely help. So would having it build in tests and diagnostics, unit tests it can check itself; something like the sketch below. Perhaps I need to ask it to do that. But I’m scared of going further down the rabbit hole!
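To be concrete about what I’d ask for, here’s a minimal sketch, assuming nothing about my actual add-on, of the kind of self-checking scaffolding ChatGPT could generate alongside its code. assertEquals, runTests, testUnionDedupes, and setUnion are all names I’ve made up for illustration; only Logger.log is a real Apps Script API here.

```javascript
// Hypothetical self-test scaffolding ChatGPT could be asked to emit with
// each batch of code. None of these helpers are real Apps Script APIs.
function assertEquals(expected, actual, label) {
  // Compare by JSON serialization; good enough for plain values and arrays.
  if (JSON.stringify(expected) !== JSON.stringify(actual)) {
    throw new Error(label + ': expected ' + JSON.stringify(expected) +
                    ', got ' + JSON.stringify(actual));
  }
}

// Example test against a hypothetical setUnion utility from the add-on.
function testUnionDedupes() {
  assertEquals([1, 2, 3], setUnion([1, 2], [2, 3]), 'setUnion dedupes');
}

function runTests() {
  var failures = [];
  [testUnionDedupes /* , ...more tests */].forEach(function (test) {
    try {
      test();
    } catch (e) {
      failures.push(e.message);
    }
  });
  // The result lands in the Apps Script execution log.
  Logger.log(failures.length ? failures.join('\n') : 'All tests passed');
}
```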
I feel like I want to collaborate with this prolific coding assistant, but I’m resentful that it generates reams and reams of code I then need to more or less understand and debug. This isn’t helped by the substandard diagnostic, error, and debugging facilities available in Apps Script, I realize.
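One small mitigation, a sketch of my own rather than anything from the project: wrap entry points so that failures at least leave a stack trace in the execution log. console.error does reach the Apps Script execution log under the V8 runtime; safeRun is a hypothetical helper name.

```javascript
// Hypothetical wrapper so a failing entry point logs where it died
// instead of surfacing only a terse error toast in Sheets.
function safeRun(fn, label) {
  try {
    return fn();
  } catch (e) {
    // console.error output appears in the Apps Script execution log.
    console.error(label + ' failed: ' + e.message + '\n' + (e.stack || ''));
    throw e; // rethrow so the caller still sees the failure
  }
}

// Usage (normalizeData is a hypothetical macro):
// safeRun(function () { normalizeData(); }, 'normalizeData');
```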
I also tried Bard for the same task with similar prompts. It was much worse. It simply hallucinated functions that didn’t exist anywhere, except perhaps in some incomplete, downvoted StackOverflow entry, and that it never implemented. I didn’t give it as much of a chance to redeem itself, though.
I’d be curious about your experience! Please chime in in the comments and tell me how to use this better. Perhaps, at the pace generative AI is improving, I just need to wait two weeks and it will do everything I want. Before even being asked.
Marc Meyer is a Silicon Valley technologist, founder (6 startups, 4 exits, 1 IPO), engineer, executive, investor, advisor, teacher, and coach. He has invested in and advised over 150 companies. He advises and works with accelerators and funds including Alchemist, 500 Startups, HBS Alumni Angels, and Berkeley SkyDeck, where he chairs the Advisor Council. He has an Executive Coaching and Advising practice helping leaders achieve their greatest potential.