That section is short and I encourage you to read it in full. Here's what it says about how removing the GIL affects existing Python code:
• Destructors and weak reference callbacks for code objects and top-level function objects are delayed until the next cyclic garbage collection due to the use of deferred reference counting.
• Destructors for some objects accessed by multiple threads may be delayed slightly due to biased reference counting. This is rare: most objects, even those accessed by multiple threads, are destroyed immediately as soon as their reference counts are zero. Two places in the Python standard library tests required gc.collect() calls to continue to pass.
That's it with respect to Python code.
Removing the GIL will require a new ABI, so existing C-API extensions will minimally need to be rebuilt and may also require other changes. Updating C-API extensions is where the majority of work will be if the PEP is accepted.
This will be an opt-in feature by building the interpreter using `--disable-gil`. The PEP is currently targeted at Python 3.13.
> existing C-API extensions will minimally need to be rebuilt and may also require other changes.
This vastly understates the work involved. Most C extensions and embeddings will require major structural changes or even rewrites. These things are everywhere and are a major reason for python's popularity. A typical financial institution, for example, will have a whole bunch of them, with a morass of python code built on top. Many of these companies took ages to transition to python 3 (and many still have pockets that haven't!). For them to remove GIL reliance from their C extensions is a much bigger ask than migrating to 3 was, and their response to being asked to do it is likely to be blunt.
The fact that only a small minority of developers involved in python directly use the C API (and fewer understand the consequences of GIL removal for them) means the issue tends to be overlooked in discussions like this, but it's the reason a GILectomy will be worse than 2 to 3. In practice it's likely to end up being either a fork or a mode you have to switch on/off, both of which would be miserable for everyone.
The GIL is part of python's success story. It made it easy to write extensions, and the extension ecosystem made the language popular. Every language doesn't have to converge to the same endpoint. Different tools are suited to different jobs. Let it be.
First, if the extension isn't being used in a multithreaded environment, nothing should change. Yes, it isn't thread safe, but it doesn't really matter in that context. And given how badly GIL Python works with threads, I doubt the majority of extensions are written for multithreaded applications.
And the ones that are written for multithreading are probably already releasing the GIL for long-running computations, so they should be written with at least a little bit of thread safety in mind. And in the worst case it shouldn't be too difficult to hack the code to include a GIL-like lock that needs to be held by library users in order to ensure thread safety, without really changing the architecture of the extension.
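Something like the following is what that "GIL-like lock" could look like; a rough sketch with made-up names (`serialized`, `legacy_ext`), not a claim about how any particular extension does it:

    import functools
    import threading

    # One process-wide lock in front of the extension, so only one thread is
    # ever inside it at a time: roughly the exclusivity the GIL used to give.
    _ext_lock = threading.Lock()

    def serialized(func):
        """Wrap an extension callable so calls to it are serialized."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with _ext_lock:
                return func(*args, **kwargs)
        return wrapper

    # Usage (hypothetical): compute = serialized(legacy_ext.compute)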
There is no way of knowing whether a Python module is used in multithreaded environments or not. The point is that C extensions that previously worked fine in multithreaded environments probably will not work fine in multithreaded environments with the GIL disabled.
I'm sure you know this, but as a general psa: threads are the lowest common denominator way of doing this kind of thing. Where possible, it is nicer to use epoll/kqueue/whatever the windows equivalent is. For a library though I totally get how annoying it would be to interface with the platform event system on each os. Indeed, if there isn't already a python library that can handle subprocesses for you on a single thread then it might be something worth building (I don't use python personally so I don't know the ecosystem).
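For what it's worth, the standard library already gets you most of the way there. A rough, Unix-flavoured sketch (the child command is made up for illustration) using selectors, which wraps epoll/kqueue/etc. behind one API:

    import os
    import selectors
    import subprocess
    import sys

    # Launch a few children and watch all their stdout pipes from one thread.
    children = [subprocess.Popen([sys.executable, "-c", "print('hi', flush=True)"],
                                 stdout=subprocess.PIPE) for _ in range(3)]

    sel = selectors.DefaultSelector()
    for child in children:
        sel.register(child.stdout, selectors.EVENT_READ, data=child)

    remaining = set(children)
    while remaining:
        for key, _ in sel.select():
            chunk = os.read(key.fd, 4096)
            if chunk:
                print("got:", chunk.decode().rstrip())
            else:                          # EOF: the child closed its stdout
                sel.unregister(key.fileobj)
                remaining.discard(key.data)

    for child in children:
        child.wait()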
> Most C extensions and embeddings will require major structural changes or even rewrites.
A lot of those important "extensions" are actually C and C++ libraries designed for parallelism in their native use which have been made available in Python via bindings (e.g. Pytorch). The cores of these libraries are fine, only the binding layers (often automatically/semi-automatically generated) may need to be updated. I suspect only the C-API extensions designed from scratch to only be extensions are going to have major problems with a no-GIL world.
> Every language doesn't have to converge to the same endpoint. Different tools are suited to different jobs.
People have decided they want to use Python frontends. Plenty of alternative languages could have won out, which would have provided native threading, faster runtimes etc, but they didn't; we are stuck with Python. The continued existence of the GIL is incredibly restrictive on how you can leverage parallelism both in Python itself and in extensions. Only in the simple cases can you make the Python<->extension boundary clean: the second you want to be able to call Python from extension code (e.g. via callbacks or inheritance) the GIL stands in your way.
For every popular, well-engineered extension, I suspect there are a dozen hacked-together ones which will break the moment the GIL guarantees disappear.
I'm sure there are a lot of hacky extensions out there (particularly hiding in proprietary codebases), but letting the entire future of Python be held hostage to the poor SWE choices of third parties who almost certainly contribute nothing back is not a sustainable path.
But it is a programming language. You do not break backwards compatibility lightly. Nobody has ever chosen Python for its runtime performance. Unfortunately, sometimes you have to accept the technical debt cannot be escaped.
> Nobody has ever chosen Python for its runtime performance.
No, they choose it for the ease of using its performant extensions. And those extensions are fundamentally limited in performance by the existence of the GIL. And the authors of those extensions (and their employers) are behind the work to get rid of it.
There are three groups here:
1. "Pure" Python users, whose code makes little/no use of extensions. GIL removal is an immediate win for these uses.
2. "Good" extension users/authors, whose code will support and benefit from no-GIL. GIL removal is an immediate win for these uses.
3. "Bad" extension users/authors, whose code doesn't/can't support no-GIL. GIL removal probably doesn't break existing code, but makes new uses potentially unsafe.
Maintaining the technical debt of the GIL indefinitely so that group 3 never needs to address its own technical debt is not a good tradeoff for groups 1 and 2 which actively want to move the language forwards.
The above argument could apply to any breaking change a platform wants to make: there are always those who don't directly consume the change, those who can adapt, and those who are fine with the status quo.
Python is free to do as it pleases, but this breaking change is going to result in a lot of churn.
> GIL removal is an immediate win for these uses.
As a minor aside, the GIL-free Python version actually comes with a performance regression, somewhere in the 5-10% range: https://peps.python.org/pep-0703/#performance . So, an immediate loss for nearly everyone.
No, this is false. Relying on the GIL isn't bad practice in a python extension. It's something you have no choice but to do. The GIL is the thread safety guarantee given to the extension by the python interpreter. It's the contract you have and you have no choice but to code to it. Removal of the GIL requires an alternative contract representing a finer-grained set of thread-safety guarantees for people to code against. Programmers who coded against the contract that existed rather than somehow divining the future and coding against a different contract that hadn't been invented yet (while also managing to make it compatible with the existing one) weren't doing something wrong.
But besides being incorrect, the idea that extensions that break on removal of the GIL are "bad" is irrelevant. The question is what the ecosystem will bear. The people who own the codebases I'm talking about are not pushing for GIL removal. They have large, working codebases they're using in production and the cost of redesigning large chunks of them would be real. GILectomy is being pushed by people with varied motives but they certainly don't speak for everyone and their case isn't going to be made successfully by casting aspersions on those with different priorities.
Relying on a published guarantee of CPython is not technical debt!
What will happen is that extensions that don't work with the new behavior will be assimilated by the proponents. Support for the new behavior will be hacked in, probably with many mistakes.
The bad results will then be marketed as progress.
it is now! just like that bill you didn't know you owed until it came in the mail, technical debt builds up regardless of whether you know it's owed.
keeping a GIL when entering and exiting C extensions seems like an obvious stepping stone.
That doesn't contradict what the GP said. Those modules are largely written in C, not Python. And that is because nobody would choose Python for its performance.
> And that is because nobody would choose Python for its performance.
Python the runtime may not be the most performant by itself, but python the ecosystem only took off because of performance hacks like those popular modules, PyPy, Cython, ... .
> Those modules are largely written in C, not Python.
Yet people choose C to implement those modules because Python is not performant enough, and isn't that a nice thought? Let's do something good for humanity and drive another stake into that old abomination's heart by making it less relevant.
What? PyPy and Cython are hardly the reason for python "taking off".
The issue with python is that the ecosystem is so big, everyone thinks their niche is the one python was "built for". Data science, web backends, sysadmin, app scripting, superclusters, embedding... They all think they're the biggest dog, when the reality is that they're more or less all the same. PyPy and Cython are probably popular in some of those niches, but python is not just about them.
The reality is that python "took off" because of two things:
- the syntax
- the ease of interfacing with other worlds, typically (but not limited to) C/C++
Removing the GIL seriously threatens the second reason, because integrations that used to be fairly easy will have to be fundamentally rearchitected. Moving from Python 2 to 3 was trivial in comparison, and it took a decade; this transition might take much longer, or never really happen, with massive losses for the ecosystem. And all for what, some purist crusade that interests only a fraction of the whole community...?
Come now, how many people compile python to use it?
How many people will be surprised / upset that they have to do that once `--disable-gil` is mainlined into the default builds on python.org?
How long before 'build it yourself and use --disable-gil' becomes 'we've enabled --disable-gil by default for Great Victory for Most Users'?
I think it's very very naive to think that the flag will remain an obscure 'use if you want to build python yourself' if the PEP is successful.
> The global interpreter lock will remain the default for CPython builds and python.org downloads.
Is a joke. It means, for now. For the work in this PEP.
I mean, you've literally got the folk from numpy saying:
> Coordinating on APIs and design decisions to control parallelism is still a major amount of work, and one of the harder challenges across the PyData ecosystem. It would have looked a lot different (better, easier) without a GIL.
That isn't 'for a few people who are vaguely interested in obscure technical stuff'; this is the entire python data ecosystem, saying "this will be the default in the future because the GIL is a massive pain in the ass".
> If it doesn't work for you, use the GIL enabled.
Sure, it can be opt-out; but you're fooling yourself if you think this is going to be 'opt-in'.
It will be the default once it's completed, sooner or later, imo.
...and when the 'default' build of python breaks c-extensions, that is a breaking change, even if technically you can build your own copy of python with different compile flags.
Long term? I doubt it. Who's going to build those packages?
Two major parallel incompatible implementations of Python, maintained at the same time, duplicating all efforts at testing, dev, build, release train, etc. while both are maintained.
Does it sound familiar?
Even ignoring the obvious parallels to Python 3000, there’s a limited amount of time people will be bothered maintaining both.
Probably in the long term the two build modes will be merged and there will only be a "nogil" build mode.
Which doesn't mean there will be no GIL. Contrary to popular belief, the goal of PEP 703 is not to remove the GIL, but only to be able to disable it at runtime if you need free threading, so it is an opt-in feature.
In the interim, it seems that Conda have volunteered to build the extensions that are frequently used in the scientific community.
Could you elaborate on how the Python ecosystem will suffer if these financial institutions get left behind? What sorts of contributions come from these users in particular?
Have you seen PIP, conda, asyncio from 3.4 to 3.8, the standard library documentation, the great 2 to 3 migration, type "annotations" and mypy, the variable scoping "rules" or the multiprocessing module?
We python users are more than ready for whatever pain we have to deal with in the future, we are used to misery we live it every day.
True Pythonistas love misery, almost as much as we love kvetching. Without the misery, we'd have nothing to talk about. Don't get me started on typing.
> Most C extensions and embeddings will require major structural changes or even rewrites.
Sam claims otherwise: "Most C API extensions don’t require any changes, and for those that do require changes, the changes are small. For example, for “nogil” Python I’m providing binary wheels for ~35 extensions that are slow or difficult to build from source and only about seven projects required code changes (PyTorch, pybind11, Cython, numpy, scikit-learn, viztracer, pyo3). For four of those projects, the code changes have already been contributed and adopted upstream. For comparison, many of those same projects also frequently require changes to support minor releases of CPython."
I'd also like to point out that the GIL does not inherently make C extensions thread safe. It only protects the interpreter itself. For example, calling in to the C API can already release the GIL as I explained in reply to /u/bjourne here:
The standard build that you get at python.org will work exactly as now (so with a GIL).
The --disable-gil build (which will most likely be available through conda or similar) will have the option to run either with the GIL or without. It will automatically detect extensions that are not compatible and switch to GIL mode, but you can override this with an environment variable.
> This vastly understates the work involved. Most C extensions and embeddings will require major structural changes or even rewrites.
Why?
Once you have a threaded Python, you can allocate one thread to run with a GIL and the other threads without it. Old extensions can access one thread with the old GIL API and new extensions can access all the things with the GIL-less API.
People who want the performance will rewrite their extension. People who don't, won't.
I don't think it is possible to activate the GIL for only one thread; it has to be activated process-wide.
But if you want free threading and you have to use an old extension that is not GIL-less, then you still can, by guarding all access to this extension with a lock and setting the PYTHONGIL variable to 0.
But you will need to compile the extension against the --disable-gil build of Python, or obtain such a build from somewhere if the extension author doesn't provide it, as the ABI of "standard" Python and "--disable-gil" Python is different.
How --disable-gil builds of Python and of extensions will be distributed is not yet totally clear. I don't think PyPI already has a clear idea of how to handle those extensions, for instance.
That's not quite good enough - people use threads to e.g. permit multiple blocking i/o operations to be in flight at once, which works perfectly well right now. So you're going to want to support multiple GIL threads if you want to be able to support existing code safely.
Sadly, you can't just trivially bolt on thread safety to code that was not designed with it in mind.
However, if you go the other way and introduce e.g. an @nogil decorator, similar to that seen in some cpython alternatives, people will have a straightforward path to opt in incrementally as they fix and verify the critical parts of their code, while preserving known working behaviour elsewhere without having to throw entire complex systems over the wall at once.
would a fork be so bad? maybe I'm off but I feel like most people have no real need for GILless and would leave it off if they were even aware it existed. Those who would need it the most are the most equipped to handle the cost of adoption.
And the probability of a given piece of C code that was written assuming a GIL-protected environment working when called from multiple threads at once is pretty low for anything not 100% purely functional.
Honestly, that’s not my biggest worry. My bigger worry is that now Python variables your extension is reading can be changed by another thread in real time. Before if you held the GIL, you knew nothing would change while you poked some Python datastructures.
That's not the implicitness I'm talking about. While it sounds like loading old C modules will just re-enable the GIL, the problem is that they will never be updated to not rely on the old Python concurrency model. All that C code was written implicitly assuming that certain blocks of code were surrounded by a GIL.
It could be a real headache for any hoped-for transition to nogil Python if lots of GIL-reliant C code is floating around where there's little hope of updating it without having to worry about subtle bugs popping up. And even if the conversion was risk-free (which I doubt), many organizations will still not want to dig into their legacy C codebases and make significant changes.
I'm the author of a Python C extension, and having the GIL gone will be a lot of work. Code currently looks like this:
* C function called with GIL held
* Extract data needed to do work
* Release GIL
* Do work
* Reacquire GIL
* Modify data, build result
* Return
As an example of the changes, a list could be passed in. I would need some form of locking while processing that list so that mutations while processing won't crash the code.
The GIL does currently result in robust code by default because data can't mutate underneath you. Without the GIL the code will appear to work, but it will be trivial for an attacker to use mutations to crash the code. Expect huge numbers of CVEs.
Nothing between Release GIL and Reacquire GIL needs to change. Depending upon your extension, possibly nothing needs to change for the other steps either. Per Sam Gross:
> Most C API extensions don’t require any changes, and for those that do require changes, the changes are small. For example, for “nogil” Python I’m providing binary wheels for ~35 extensions that are slow or difficult to build from source and only about seven projects required code changes (PyTorch, pybind11, Cython, numpy, scikit-learn, viztracer, pyo3). For four of those projects, the code changes have already been contributed and adopted upstream. For comparison, many of those same projects also frequently require changes to support minor releases of CPython.
If you're using the GIL to protect access to non-Python objects, that will need to change.
The PEP mentions a future HOWTO on how to update existing extensions. I wish that were already written.
There's disagreement among Python maintainers in the discussion thread on the PEP in how much work will be involved and I don't expect to resolve it here.
As far as I know PEP703 includes provisions to make operations on containers thread-safe. That should at least avoid most crashes.
Borrowed references can be more problematic, but it seems most cases could be fixed by replacing GetItem with FetchItem calls.
Overall, as the author of another Python C extension, I don't really think it's going to be that much of a pain. In fact, I could even get away with no changes (other than build fixes and such) if, for example, I guarantee that each "main object instance" is only accessed by one thread, and I'd still get a lot of benefits from nogil.
Whether the individual operations are thread-safe is irrelevant. For the code to be thread-safe it needs to acquire lst's lock before the if statement and release it afterwards.
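In Python terms, the shape being described is roughly this (a hedged sketch, not the snippet from the parent comment):

    import threading

    lst = [1, 2, 3]
    lst_lock = threading.Lock()

    def drain_one():
        # Even if every individual list operation is atomic, the emptiness
        # check and the pop() together form one critical section.
        with lst_lock:            # acquired before the if ...
            if lst:
                return lst.pop()
            return None           # ... released after the whole check-then-act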
Your example is not thread safe because the GIL does not protect critical sections like that. The GIL only protects the Python internal interpreter state. PyList_SetItem discards a reference to the item being replaced. If the ref count of the item becomes zero, the item's destructor is run. The destructor can release the GIL.
IOW, your example is not much different than the equivalent Python code:
The "Defining Extension Types: Tutorial" also includes this warning in an example that uses Py_XDECREF:
But this would be risky. Our type doesn’t restrict the type of the first member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the first member; or that destructor could release the Global interpreter Lock and let arbitrary code run in other threads that accesses and modifies our object.
Yes, you are absolutely right, but that doesn't change the fact that many extensions are written as if innocuous function calls like PyList_SetItem won't context switch. It mostly works fine since custom destructors releasing the GIL are extremely rare.
I dunno, I'd rather have a reliable bug than a heisenbug.
Any class with a __del__() method is going to release the GIL. That's not uncommon. My intuition is that threaded Python is rarer than classes with a __del__() method.
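A tiny pure-Python illustration of that point (hypothetical class):

    class Finalizer:
        def __del__(self):
            # Arbitrary Python runs here, in the middle of whatever operation
            # dropped the last reference to the object.
            print("destructor ran during the item assignment")

    lst = [Finalizer()]
    lst[0] = None   # replacing the item drops the last reference, __del__ runs now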
I don't know about the complexity of your extension, but the PEP provides per-container locks ("This PEP proposes using per-object locks to provide many of the same protections that the GIL provides. For example, every list, dictionary, and set will have an associated lightweight lock. All operations that modify the object must hold the object’s lock").
This is automatic via the PyList_GetItem/SetItem API, so I guess the error you're talking about is that you read a list y = [A, ...], your code reads A and copies the data to (new) A2, and then you iterate over it again and see that A != A2 because another thread has modified y?
> This is nothing like the Python 2 > 3 transition.
Well, what py2to3 promised was also not what happened, and it was by far one of the worst disasters of a migration in the history of open source, so you can't blame people for being skeptical.
Heh, is the Perl 5 to 6 migration already forgotten? I think this says something about how well it went, because at the time Perl was more popular than Python.
There was no Perl 5 to Perl 6 migration. Perl 6 was announced, a bunch of design work happened on it, and then it became a different language run by different people rather than a version of Perl. People are still writing Perl code extensively, and Perl 5 is still maintained.
I would expect Python to follow a similar language split. People were disappointed at how few breaking changes occurred in 2->3. If there were going to be a 3->4 migration, there would be a large number of proposals to correct other deficiencies in the language.
... I mean, if you want to compare: it had exactly the same problem, so it's just more evidence that this isn't something specific to Python; it's simply a terrible way to go about it.
The only difference is that they noticed it was a bad idea midway through and went back to developing Perl 5.
And I can still put "use v5.8" in a Perl script and write a script that works on a CentOS 5 instance from 2007, if I need to do something on a legacy system, and it will work the same on the latest Perl.
> because at the time Perl was more popular than Python.
I don't believe that was true at the time, as P6 took like a decade to decide on semantics, and only then did the implementations start to happen. More existing code, sure, but much of web dev moved to PHP/Ruby and many other things to Python.
And unlike Py3 it was initially also much slower than P5, making migration kinda worthless.
Maybe I am off base here, but big migration efforts feel like they would be significantly easier in a compiled language. Potentially not a fair comparison when the compiler and tooling can automatically check so much code for errors.
> Posting a Wikipedia link without commentary or context is essentially a text meme. It doesn’t invite discussion.
You literally just replied, so we are engaging in discussion; empirically, the wiki link above has done literally the opposite of what you are saying! At any rate, the intended relevance of the wiki link regarding loss aversion is the following:
When something bad happens to someone, they tend to weight it as worse than an objectively equivalent good experience; it is thus important to be aware of such a bias, in addition to using regular old critical thinking.
In this case, critical thinking could look like the following: “although this is a ‘migration’, what do we mean when we say that? and is it really the same thing as py2 to py3?”
I would tentatively propose “no” to the latter question; and would additionally propose that calling this a “migration” is not terribly useful as although it’s not incorrect, it’s insufficiently specific and seems to invite a category error.
In addition to the loss aversion bias mentioned above.
> Destructors and weak reference callbacks for code objects and top-level function objects are delayed until the next cyclic garbage collection due to the use of deferred reference counting.
this actually does "break" a lot of things, as you would be surprised how much code implicitly relies upon CPython's behavior of calling weakref callbacks immediately, as soon as an object's last reference goes away. This is why keeping test suites running on PyPy can be difficult, because PyPy defers that work until a GC pass.
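A minimal sketch of the timing difference (the --disable-gil behavior described in the comments is taken from the PEP bullet quoted above, not something I've measured):

    import gc
    import weakref

    def f():          # a top-level function object
        pass

    log = []
    r = weakref.ref(f, lambda ref: log.append("callback fired"))

    del f             # on today's GIL build the callback fires right here
    print(log)        # ['callback fired']

    # On a --disable-gil build, top-level functions use deferred reference
    # counting, so per the PEP the callback may not run until the next
    # cyclic garbage collection:
    gc.collect()
    print(log)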
> Removing the GIL will require a new ABI, so existing C-API extensions will minimally need to be rebuilt and may also require other changes. Updating C-API extensions is where the majority of work will be if the PEP is accepted.
as you note, there will be *two* versions of the Python interpreter.
that means every C extension has to be built *twice*, against both versions of the interpreter. Go look at how many files one must have available when publishing binary wheels: https://pypi.org/project/SQLAlchemy/#files . The number of files for py3.13 now doubles. It's not clear if we actually have to have both Python builds present, so would I have /opt/python3.13.0.gil and /opt/python3.13.0.nogil? If the GIL removal changes almost nothing, why have two versions of Python?
I don't see why you couldn't use updated "No GIL" extensions with the GIL interpreter. The "No GIL" interpreter mode will simply abort if you load an unsupported C extension.
It's worth highlighting that the GIL will still be available even when compiled with --disable-gil.
> The --disable-gil builds of CPython will still support optionally running with the GIL enabled at runtime (see PYTHONGIL Environment Variable and Py_mod_gil Slot).
You've conveniently described Python behavior, but lots of C code relies on the GIL implicitly and will need to add locking to be correct in a nogil world.
I'm not saying for sure this is bad! I do think it's being dishonest about the potential impact, though. Lots of critical libraries are written in C.
This is why the proposal is to, by default, reenable the GIL at runtime (and print a warning to stderr) whenever a C extension is loaded, unless that extension explicitly advertises that it does not rely on the GIL.
Dishonest? I didn't hide that fact: "Removing the GIL will require a new ABI, so existing C-API extensions will minimally need to be rebuilt and may also require other changes. Updating C-API extensions is where the majority of work will be if the PEP is accepted."
Yeah but everyone knows that is obviously the biggest issue. Nobody was really concerned about the Python code which is the part that you said will go swimmingly—that's table stakes for a change at all. We already assumed the pure Python code would upgrade easily; a change would have no chance whatsoever of being accepted otherwise. Everyone is worried exclusively about the C extensions, and always has been. This seems to have been presented as a new approach to GIL removal that fixes the problem with C extension breakage but it's just the exact same old approach we've always been considering that breaks the C extensions. No-GIL ain't done until all the C extensions run, especially NumPy.
Most of the other comments here at the time were similarly from people who clearly hadn't read the PEP saying it would break all existing Python code.
So I did my best to represent the backwards compatibility section of the PEP. I told folks to go read it and linked to it. I cited the portion relevant to Python code. There were too many bullet points for the C-API, so I summarized it with the disclaimer "Updating C-API extensions is where the majority of work will be if the PEP is accepted."
I also read the PEP discussion thread, where there was disagreement among Python maintainers over how much work would be required of C extension authors, but most of the folks stating it would be a lot of work didn't seem to have actually tried to port anything to the new API. Meanwhile Sam had asserted that:
> Most C API extensions don’t require any changes, and for those that do require changes, the changes are small. For example, for “nogil” Python I’m providing binary wheels for ~35 extensions that are slow or difficult to build from source and only about seven projects required code changes (PyTorch, pybind11, Cython, numpy, scikit-learn, viztracer, pyo3). For four of those projects, the code changes have already been contributed and adopted upstream. For comparison, many of those same projects also frequently require changes to support minor releases of CPython.
I don't think I've misrepresented anything.
> No-GIL ain't done until all the C extensions run, especially NumPy.
I think the word "minimally" makes it sound like the changes to existing libraries are "to an extremely small extent; negligibly." "At a minimum" would fit better, because the minimum of a set of things can still be very large whereas minimally implies the quantity is very small.
But it was downplayed in the comment when it was always the #1 reason for keeping the GIL. Python level changes were never a serious part of the argument.
What do you propose as an alternative? I write code in a lot of languages and I can't think of a single one where I don't have to consider the version. This applies to C, node, ruby, swift, gradle/groovy and java at least. Even bash. When developing for Android and iOS, I have to consider API versions.
Almost every third party ML model I look at seems to have different versions, different dependencies, and requires deliberate trial and error when creating container images. It's a mess.
Having interpreters and packages strewn across the machine is a nightmare. The lack of standard tooling has created a lawlessly dangerous wild west. There are no maps, no guardrails, and you have to beware of the hidden snakes. It goes against the zen of python.
As a counter example, Rust packs everything in hermetically from the start. Python4 [1] could use this as inspiration. Cargo is what package and version management should be, and other languages should adopt its lessons.
[1] Let's make a clean break from Python3 even if we don't need a new version right now.
The ML community has horrendous engineering practices. Everyone knows this. This isn’t the fault of Python, nor should Python cater to people who build shoddy scaffolding around their black boxes.
I mean, you're not entirely wrong but Python really really doesn't make it easy.
Consider R, which is filled with the same kind of people. There's one package repository and if your package doesn't build cleanly under the latest version of R, it's removed from the repo.
Don't get me wrong, this has other problems but at least it means that all packages will work with a single language version.
> I mean, you're not entirely wrong but Python really really doesn't make it easy.
That's a vast exaggeration. It is not "really really" hard to spin up a venv and specify your requirements. People just don't do it, and blame the tools for what are bad engineering practices agnostic to any language.
"Really really" not easy would be handling C, C++, etc. dependencies.
Generally that is a straightforward process of compiling, reading the error message, googling "$dist install $dirname of missing dep", running the apt-get / emerge / yum command, and then repeating the compile command. Sometimes people will depend on a rare and unbundled dep, but not that often. Worst case you need to upgrade the automake toolchain or rebuild Boost or something.
Maybe more time than getting python deps to work but more deterministic and takes less cleverness.
I work in data science in python (and the parent was about ML) and basically everything in that space has C and Fortran level dependencies and this is where Python is really really bad, so no it is not as simple as you're making out.
I really really wish it was, as then I wouldn't have had to learn Docker.
Python is a much older and generalist language than R, so yes, while it would be great to impose this kind of order on things, it’s not practical for its current extent of use.
That being said, after two decades of using Python professionally, the only real problems I've ever encountered are "package doesn't support this version for {reasons}" and "ML library is doing something undocumented and/or dumb that requires a specific Python version." The former is normally because the package author is no longer maintaining their package, and the latter is because, again, the ML community is among the absolute worst at creating solid tooling.
I don't disagree that Python's place in the ecosystem ("generalist" - i.e. load-bearing distro fossilization in everything from old binary linux distros, container layers, SIEM/SOAR products, serverless runtimes...) leads to much packaging complexity that R just doesn't have
However, Python (1991) is only 2 yrs older than R (1993)
Rust and Node (via nvm) feel good. The worst I run into is “this version of node isn’t installed” and then I just add it. And I don’t have to worry about where dependencies are being found. Python likes to grab them from all over my OS.
I use direnv and pyenv. When I cd to a repo/directory, the .envrc selects the correct Python and the directory has its own virtual environment into which I install any dependencies. I don't find that Python grabs packages from all over the OS.
pyenv works locally, no matter what the project opts to use. The only thing it needs for a project 'to be managed' is a .python-version file, which you can throw in .gitignore
It doesn't matter what you do. The vast majority of code I'm using from other people doesn't. Even my personal python methodology differs from yours.
Plus, you now have to teach and evangelize your method versus the dozens of others out there. It's crazy town.
The negative thoughts and feelings I once had for PHP are now directed mostly at Python. PHP fixed a lot of its problems over the last decade. Python has picked up considerable baggage in that time. It needs to take the time to do the same cleanup and standardization.
I was describing a workflow that works for me to someone who didn't seem to have found an effective Python workflow in hopes that it can work for them too. I work across a variety of languages and none that I've worked with doesn't have some issue that I can't complain about[1]. I personally don't find Python all that painful to work with (and I've been working with it since 1.5.2), but I understand my experience is not universal.
[1] If it's not the language, it's the dependency manager. If it's not the dependency manager, it's the error handing. If it's not the error handling, it's the build process. If it's not the build process, it's the community. If not the community, the tooling. Etc. I have some languages I like more and some less. Mostly it comes down to taste. I'm not here to apologize for or defend Python. I'm only here to describe how I use it effectively, and to correct what I thought were inaccuracies with respect to removing the GIL.
I use direnv because I work with many languages and repos and I don't want each language's version manager linked into my shell's profile. As well, direnv lets me control things besides the language version. Finally, direnv means I don't have to explicitly run any commands to set things up. I just cd to a directory.
FWIW, I don't think it's nice that rustup fetches and installs new versions without prompting, but I suppose that other users like it or get used to it. Fortunately most Rust projects work on any recently stable version.
> rustup fetches and installs new versions without prompting
I don't think that's true. rustup installs a new version only when you run `rustup update`. What the parent is talking about is pinning a particular rustc version in a rust-toolchain.toml file, which allows rustup to download that version of rustc to build that particular project/crate.
rustup will automatically download that version when you interact with that project, though, and that's what I mean. It doesn't sit right with me, comes as a surprise, but I guess it's not the biggest issue in the world.
Node does allow you to declare which Node version is supported in your package.json. The declaration is there, but there isn't any tool that reads it and switches versions accordingly. I feel it is somewhat half-assed. But it could also be caused by the fact that the entity that distributes the packages (npm) and the entity that distributes the Node binaries (various Linux repositories) aren't the same group of people. So there isn't really anyone who can do anything about it unless we get something like corepack someday (probably someone should name it 'corenode'?).
Isn't this all handled by pip typically? Even though most models don't necessarily put it in the readme, the user should be using some sort of env manager.
I mean, Java seems like a pretty good alternative? Obviously it's trivially true that programmers have to care about versions, but they've done miracles in the VM without breaking compatibility.
Pragmas seem like the correct way to have done the Python 2->3 migration. Does anyone know of some technical limitation as to why they weren't used? It is very obvious solution in hindsight, but I wasn't there.
I saw some people mentioning changes like turning the print statement into the print function. That was actually one of the most trivial changes, and you could import print_function from __future__, which worked like a pragma.
A similar problem existed with the changed division behavior (which was actually more challenging), but likewise you could enable the new behavior per file.
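For reference, those per-file opt-ins looked like this on Python 2 (they are no-ops on Python 3):

    from __future__ import print_function, division

    print("print is a function in this file")
    print(1 / 2)   # 0.5 ("true division"); plain Python 2 would print 0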
The main problem with the migration, though, was the addition of Unicode. You can't just enable it on a file-by-file basis, because once you enable the new behavior in a single file, you will start passing Unicode arguments to code in other files, and if that code wasn't adapted, it will break.
And it was even worse than that, because the problem extended to your dependencies as well. Ideally dependencies should be updated first, then your application, but since Python 2 was still supported (for a decade after Python 3 was released) there was no motivation to do it.
And if that wasn't enough, Python 2 already had Unicode support added, but that implementation was incorrect, so even if you imported unicode_literals from __future__ you potentially broke compatibility with existing Python 2 dependencies, without any guarantee that your code would work on Python 3.
IMO that particular change couldn't be done with pragmas; the core issue is that Python 3 put a clear separation between text and binary data, while Python 2 mangled them together. That was still true even when you used Unicode in Python 2.
The proper way to perform the migration IMO would be to type annotate the code. And then run mypy check in python 3 mode.
Back when Python 3 was initially conceived, the language just wasn't that widely used, and mostly by enthusiasts. Some breakage wasn't considered a big deal; it was expected users would easily update their code.
But during the time it took to design and deliver Python 3, the language exploded in popularity and reached a much wider audience, and third-party libraries like numpy became crucial. So when Python 3 was ready it was a completely different ecosystem which was much harder to migrate. But I don't think the core team really realized that before it was too late.
Asdf makes all of this pretty easy. For consulting I often need multiple versions of everything to match client projects:
Just install them with asdf and put a .tool-versions file in the project folder with the desired tooling builds.
You would not need separate packages to do that (in fact, you can't do this with separate packages because dpkg will complain if two packages provide the same file).
The PEP actually states that the nogil version would also have an environment variable allowing you to temporarily enable the GIL. Although I guess in practice they might still build separate versions.
As for managing Python library dependencies, I use poetry (https://python-poetry.org), though unfortunately both it and pipenv seem to progressively break functionality over time for some reason.
pyenv is a third party tool that makes some of this easier, notably around creating more than just a virtual env in that you also choose the Python version.
Python is not hard to deal with in this regard; I think people are just uninformed.
If you’re installing a Python package into the global site packages directory (ie, into the system Python) you might need sudo. That’s how permissions work.
I don't know the -u flag on pip; I've never used it and can't find it in the docs.
With a virtual environment sudo is not needed. Assuming you created it, and/or it is owned by you.
Virtual environments are just directories on disk. They are not complex.
I don’t use conda because it’s never felt even remotely necessary to me.
how about when you are authoring a script under your own user, but then want to schedule it for cron to run periodically?
I often find myself working under my user on a remote server, but then I want to schedule a cron job and run into all sorts of permission issues, bugs, and missing packages.
especially when multiple machines need to run the script, and I don't want to involve containers to run a simple 20-line python script.
this is why Golang is so popular - you can just scp a single binary across machines and it will just work.
Are you kidding me? The horrendous way Python does dependency management and virtual environments, and the fractured ecosystem around those, is one of its biggest pain points, often covered by core CPython developers and prominent third-party Python developers, hardly "misinformed" people.
That comic is very old. In the days of 2.x it was a little hairier, but nothing like people make it out to be.
The literal only thing you need to understand is “sys.path”. If you inspect this in a shell you will know what you’re up against. Python is all literally just directories. It’s so easy and yet people get so bent out of shape over it.
Create a venv, activate it, and use pip as normal. If you ever run into issues, look at sys.path. That’s it.
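To make that concrete, this is roughly all there is to the model (the shell commands in the comments are the usual venv workflow; exact paths vary by OS):

    # python3 -m venv .venv
    # source .venv/bin/activate        (.venv\Scripts\activate on Windows)
    # python -m pip install requests   (lands in the venv's site-packages)
    import sys

    # Everything importable comes from these directories; with a venv active,
    # its site-packages shows up here and the system-wide one normally doesn't.
    for entry in sys.path:
        print(entry)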
Which is irrelevant. We're talking about the dependencies/packaging/virtual environments situation, not whether "it's easy to be a Python developer" in general.
And you can disagree all you want, but it's simply wrong that Python's packaging/venv ecosystem is "just fine".
There are options to do things other ways, but most of the time I just use venvs and pip for everything.
Is it because people have to use venvs that people complain about it?
I’ll admit being able to install via an OS package manager, vs pip, vs anaconda etc etc can be confusing, but is any of that really Python (the language)’s fault?
I think the primary problem with removing the GIL is the performance hit from degraded garbage collection. Quoting the issue with the implication that it isn't a big deal because it's not many words doesn't make the case for me. The delay in when garbage collection happens is, to my mind, the main reason why Java sucks so much as a memory hog vs C-based languages. Process performance is in large part determined by how soon critical, lowest-latency memory can be freed for the top-level working set of objects.
It's an interesting debate (money aside). In the past 15 years, at least to my eyes, one of the most important changes in software engineering culture has been the now-systematic search for correctness. Systematic, organized, large-scale testing is one side of the coin, but the real value, and the significantly more important change to me, is guarantees/proofs: static analysis, stronger type systems, etc. So type annotations in Python, TypeScript, Rust instead of C++, etc. Tools as well: the golang race detector, Infer, Coverity, PVS-Studio, CodePeer (Ada)/SPARK, CompCert, etc.
In this particular case, I understand that removing the GIL would create potential new risks (but provide better performance), and although tooling is mentioned, it's not officially part of the plan, so it feels like a regression of sorts.
It might be worth it though, because somehow Python has ended up being the lingua franca in many applications where performance matters a great deal (ML, scientific applications, etc.). I don't think the performance benefits for something deployed at such a large scale should be understated.
Although you are almost certainly correct, I'm curious if there is a case where faster doesn't mean more power efficient, i.e. more concurrency but more power used overall.
Concurrent execution is ~always less power efficient than serial execution. I have no idea what the parent comments are talking about. For almost any work X with time to run on a single core T it will run on two cores in time larger than T/2, and on three cores in time larger than T/3, ... This is due to synchronization overhead (e.g. networking, locks, delays) and also often due to shared resources getting saturated (e.g. VRAM on a GPU).
That is unless in the serial execution you still have to pay for some resources that are unutilized.
However, if you're running a computer with a 16-core CPU, power usage doesn't scale linearly with cores. There's a lot of overhead, especially if you're talking about a laptop/desktop with display, HID, etc.
I don't think that has really been a change in the past 15 years. Well, maybe there was just a lull in people caring about correctness and robustness... But it's not like before 15 years ago everyone was using dynamically typed languages and static analysers didn't exist.
I guess you could say the world went through a dynamically typed, loosey-goosey phase and then realised that wasn't so great.
I think one thing that actually has changed is that SMT solvers have massively advanced in the last 20 years to the point that formal verification is practical. Ish anyway. It still seems to require a PhD to do software formal verification. Hardware formal verification is easy though.
PEP 703 mentions the multiprocessing module which I used recently with great success. It is essentially message passing parallelism, i.e. the actor model. I would think even with the GIL removed message passing is still the best approach for a higher-level language, like Python, to handle parallelism as it is conceptually simple and lock free.
The primary disadvantage of the multiprocessing module is it uses separate processes which means my resource heavy Python extension needs to be duplicated multiple times in memory (at least on Windows). I think the move to supporting threads means the module could leverage them instead of separate processes, avoiding the need to duplicate C extensions in memory, and thereby reducing the overall memory pressure on a system.
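A minimal sketch of that message-passing style using the stdlib multiprocessing primitives (toy workload, made-up numbers):

    from multiprocessing import Process, Queue

    def worker(inbox: Queue, outbox: Queue) -> None:
        # Workers only see data arriving on the queue, so no user-level locks.
        for item in iter(inbox.get, None):     # None is the shutdown sentinel
            outbox.put(item * item)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
        for p in procs:
            p.start()
        for n in range(100):
            inbox.put(n)
        for _ in procs:
            inbox.put(None)                    # one sentinel per worker
        results = [outbox.get() for _ in range(100)]
        for p in procs:
            p.join()
        print(sum(results))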
I agree that message passing can be a really good choice. The problem is that the overhead of passing large messages can be really substantial. In general, you'll need to hold each message in memory up to 4x (at least temporarily): the Python representation and the serialized "on the wire" representation, once each on each end. There's also the issue of `pickle` generally being a horrible serialization format and being the default.
There are ways around this, Python has a rather rudimentary and archaic shared memory subsystem[0] that can make this much more efficient.
On the other hand, the Python multiproccessing sync primitives actually work over a network with almost no code changes, so you can horizontally scale with little effort... but then you're paying the serialization and network overhead, so ymmv.
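For reference, a stdlib-only sketch of that shared-memory route (shown in one process for brevity; in practice the "attach" half would run in a worker that receives only the block's name):

    from multiprocessing import shared_memory

    shm = shared_memory.SharedMemory(create=True, size=1024)
    shm.buf[:5] = b"hello"              # write directly into the shared block

    # In another process: attach by name instead of pickling the payload.
    peer = shared_memory.SharedMemory(name=shm.name)
    print(bytes(peer.buf[:5]))          # b'hello'
    peer.close()

    shm.close()
    shm.unlink()                        # free the block once everyone is done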
I have no experience with Windows, but at least on Linux, if the forking is handled intelligently (namely, late enough that identical assets don't need to be re-written), duplicated memory should be handled by Copy on Write.
Have you tried it? You are right in general, but even if you don't write anything in Python, the CPython runtime constantly writes to the heap, so everything gets copied. You need specific support from the CPython runtime to exploit copy-on-write.
I guess when GP said "heavy ... C extensions" I was thinking code pages from libraries and structures managed by C, yes python objects will require additional consideration.
CPython still needs to update reference counts after the fork even if your python code itself doesn't modify any data, making Copy on Write less helpful than it would first appear. Monolithic memory buffers like numpy arrays can be shared efficiently via CoW, but rich data structures with many PyObjects suffer from the refcounting issue.
Multiprocessing in Python isn't exactly like forking on other languages. There's no shared Python memory, it's just a new interpreter instance, and then you have to use sharing mechanisms (shared arrays/values, pipes, queues) to transfer data explicitly.
I know, but the entire point of CoW is you don't need shared memory, as long as the fork happens after the memory was written it'll just act like an entirely independent copy transparently to the application, with the OS duplicating the backing page on the first write.
Edit: Ah I see, you're saying the library itself discards anything created from python. That sounds like an implementation problem then, I'd expect it to at least keep imports so any static code pages don't need to be recreated (the interpreter itself should be fine at least, unless it's doing something really dumb).
Multiprocessing is great. Very easy to use and with the Manager(), passing data is easy enough, though very "slow" relatively speaking. It has to serialize the data, send it to the Manager process, then send to all the processes (I believe), and then deserialize it. Perhaps this nogil update can speed that up?
My experience with the multiprocessing module is exactly opposite. It's been a few years since I last used it seriously (I still use Python a lot) mainly because of how bad it was. The main issue is precisely communication between processes: Python uses the pickle module to serialize/deserialize data, which not only adds quite a bit of performance cost, but it's very hard to debug when for some reason some object is not supported by pickle (and it's a common occurrence).
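A minimal reproduction of that class of failure (not from the parent's codebase):

    import pickle

    # Anything unpicklable can't cross the process boundary, and the error
    # tends to surface far from where the object was created.
    callback = lambda x: x + 1
    pickle.dumps(callback)   # raises pickle.PicklingError: Can't pickle <lambda>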
> If PEP 703 is accepted, Meta can commit to support […] between the acceptance of PEP 703 and the end of 2025
Meanwhile, the Steering Council is dragging their feet, even though so many people in the community have chimed in their support for nogil being a thing (in the linked thread and two others).
This is a decision with huge ramifications and the PEP was submitted only 5 months ago. Caution is welcome here, especially with PEP 684 around the corner.
> The PEP was posted five months ago, and it has been 20 months since an end-to-end working implementation (that works with a large number of extensions) was discussed on python-dev.
I would expect the SC to be active participants in the community, including the python-dev mailing list, so they should have known what this change means by that point.
There's a great talk by David Beazley (I don't remember which one) in which he basically explains that removing the GIL isn't difficult at all. After all, it's just one lock.
The issue is all the libraries and packages that were built around it. It's now been YEARS of building on top of the GIL.
So anyways, PEP 703 is a nice effort, but I doubt we (everyday mortals) can enjoy it. Meta, and big companies with a specialized team, might be able to exploit it by making sure their entire stack is GIL-free.
> Meta, and big companies with a specialized team might be able to exploit it by making sure their entire stack is GIL-free.
Not really. Most of the popular libraries will have nogil versions ready for sure (numpy, pandas, pytorch, etc). Server applications will probably benefit from being able to serve multiple requests in a real multithreaded environment without many changes (in fact, I bet a lot of HTTP libraries will see simplifications due to not needing to deal with processes for real parallelism). Even packages without changes will probably have at least a small degree of thread safety (for example, two instances of the same object should be independent and could be used in different threads) that allows users to still leverage parallelism.
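One concrete shape that per-instance independence can take, as a hedged sketch with a made-up placeholder class:

    import threading

    class Session:                  # stand-in for a non-thread-safe object
        def __init__(self):
            self.count = 0

        def do_work(self):
            self.count += 1
            return self.count

    _local = threading.local()

    def get_session() -> Session:
        # Each thread lazily creates and reuses its own Session, so the
        # non-thread-safe object is never shared across threads.
        if not hasattr(_local, "session"):
            _local.session = Session()
        return _local.session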
> So anyways, PEP 703 is a a nice effort, but I doubt we (everyday mortals) can enjoy it.
I'd suggest we can be more optimistic. There is a lot of Python that regular users write daily where we delegate orchestration to established libraries (asyncio, web frameworks, PyTorch). The GIL limits how much they can parallelize your code, and its removal will help with that.
2016. A lot has changed since then, and AFAIK numpy and most of the other popular libs with C bindings have experimental branches that are already nogil-compatible.
Over time I've changed my mind regarding the GIL. Now I think its existence actually is a good thing, since it makes a lot of things easy and robust by default.
The subinterpreters route sounds really promising since it strikes a good balance between allowing those that need same-process parallelization to be able to do so and keeping the status quo (like workers in Javascript).
I'd actually veto PEP 703 at this point, since the Python optimization team is making good progress, and this could actually set them back a bit while diverting too much energy into a low-reward avenue.
Might be worth changing the title to "no-GIL". My first thought was "What is nogil CPython?" - thinking it was some specialized fork of CPython. I guess you could say that is technically true. I was previously aware of the GIL in Python, yet it still didn't occur to me immediately.
Whatever happens, I'm hoping we won't see another 2to3-like migration. I moved from Python a long time ago due to its multi-threaded limitations (among other things) to Java and JVM, and I really appreciate the stability the JVM team tries to maintain. Things are not perfect, but I'd take stability/compatibility over perfection any day.
At least take a look at PEP-703[1]. It will answer most of your questions. Significantly, even if this proposal is accepted, the GIL would remain enabled by default.
Could not agree more. The purposeful, unnecessary breaking of stuff was maddening, and then being lectured at like you were an idiot even more so. No more u"" for you!
There is zero chance of that happening. Guido van Rossum (and other key figures in the Python community) have spoken very candidly for years about the mistakes they made in the 2-to-3 migration, and their decisions in the years since have demonstrated their commitment to not repeating those mistakes.
I was curious about these comments and did a small web search. I found a video interview from May 21, 2021 [1] and have pasted some excerpted quotations from it from Guido van Rossum below for others who are curious.
“Python 4, at this point whenever it’s mentioned in the core development team, it is very much as a joke… We’ve learned our lesson from Python 3 vs 2, and so it’s almost taboo to talk about a Python 4 in a serious sense.”
[...]
“I normally talk about that as a mistake, because Python was more successful than the core developers realised and so we should have been much more aware and supportive of transitioning from Python 2 to Python 3”
[...]
“I’m not thrilled about the idea of Python 4 and nobody in the core dev team really is – so probably there never will be a 4.0 and we’ll just keep numbering until 3.33, at least”
[...]
“We now have a strict annual release schedule, so after 3.10 will be 3.11 and after that will be 3.12, and so forth. We can go up to 3.99 before we have to add another digit. Adding another digit is not completely trivial, but still much better than going from 3 to 4."
Seeing so many comments about how hard it is to just break compatibility and upgrade is sad. Instead of just throwing our hands up and saying it’s too hard, we could adopt the model the JavaScript ecosystem has seen more of which is codemods that upgrade the code for us.
If as a community we invest in those tools and make them easier to build, the cost of upgrading goes down and the velocity of high-impact changes can increase.
TC39 has a famous "don't break the internet" mantra. JavaScript doesn't even have the leeway Python has with deprecations and feature changes: it's versionless, and deployed code automatically runs on whatever engine the browser is using.
JavaScript evolves quickly but so does Python!
(Note that your approach is exactly what they tried in the 2 to 3 transition, btw, with a special tool that didn't work too well.)
That’s not what that library was, in my opinion: it was a compatibility layer, not a rewriting tool, which is what I referenced. Having a layer in between simply prolongs the issue and creates many types of problems depending on adoption. Rewriting, on the other hand, can be applied either by the author or by the end user depending on the quality, which meaningfully allows for different results.
codemod is a syntax conversion tool, using regexes. Thread-safety isn't even semantic--it's an emergent behavior question, the same as "does this code halt?" There is no general solution.
For example, is this code thread-safe?
    int foo(int* x) {
        int z = *x;
        for (int* y = x + 1; y != 0 && *y < *(y - 1); ++y) {
            z = *y;
        }
        return z;
    }
You can't tell from static analysis of the function. It depends upon what guarantees are imposed upon the passed-in "x" value. For example, if "foo" is only referenced as a function pointer passed to "baz" (also in the library), and "baz" creates "x" and uses it in a thread-safe manner, then there's no problem. But there's no guaranteed mechanical way to determine if "baz" is indeed doing the right thing, or what changes should be made to make it so.
A fully general transformation from naive to thread-safe code seems like it would make you one of the giants of computer science alongside Knuth and Dijkstra.
I think this is where concepts from Rust and Go could come in handy. Things like Go’s race condition detection and Rust’s compiler validation approaches can be used to statically analyze code. Sure, it’s a meaningful change from how many Python devs approach the language and a challenging problem, but not insurmountable given the existing work in the field.
I'm not sure what "codemods" means. Just static analysis code changer? Python is so highly runtime dynamic I'm not sure a tool is even possible to upgrade behavior, preserving correctness and intent (bugs and all).
But I think they're referencing the litany of transpilers and repackagers which exist for js. So you can add new features and then still have it run on really old systems like internet explorer 9 if you need to.
This has problems, obviously, but in my opinion it would be preferable for Python.
My reasoning is that if you need your code to work on an older system, being able to write and use current syntax is preferable to not. The hard bifurcation that Python did with 2 to 3, and now potentially with 3 to nogil, seems to me to just break apart the ecosystem more.
That differs but is a reasonable understanding. I’m instead referring to automations that perform large scale refactoring as handled by Facebook, who would be contributing to this effort.
It sounds like what you are describing is what’s known as polyfills, which convert code into a variant that maximizes functionality across implementations, which isn’t really applicable here.
Why waste effort on speeding up the old CPython interpreter rather than working on PyPy, an actual compiler? [1] "On average, PyPy is 4.8 times faster than CPython 3.7"
Some c modules are incompatible with PyPy. Two common ones are psycopg2 (PostgreSQL driver) and pyodbc (ODBC driver). There are alternatives that work but they don't have quite as many features (psycopg2cffi and pypyodbc). Ansible also runs slower in PyPy than CPython.
I tested this with a playbook that installs a set of scripts and dependencies on a server. In all there's ~30 tasks. It takes ~1 minute using CPython 3.10 and ~1.5 minutes using PyPy 3.10 (v7.3.12). Ansible says libyaml is installed in both runtimes so that's ruled out. I wonder if it's the difference in interpreter start-up time, and it accumulates over time as Ansible starts so many Python processes as it runs its tasks.
It's hit and miss. Sometimes I get almost no improvement and sometimes I get 10x. But the slow warmup and up to 3x memory use (in my experience) really make me prefer CPython unless I have a very specific problem.
The FAANGs etc. have failed me for anything open source sponsorship related.
Their business depends on it, they made a fortune out of it, yet they donate little or nothing to it, or worse, they forked it and were reluctant to upstream anything.
Pick Amazon at random: it went from exploiting to ripping off, in my opinion, thanks to its boss, who sails on a luxury yacht and asks all his employees to be frugal. They all look similar to me as far as supporting OSS goes.
Between pytorch (Meta) and TensorFlow (Alphabet), I'd be surprised if there's much if any ML work today that doesn't rely on OSS projects from FAANG companies.
Chromium underpins nearly every major browser that isn't Safari or Firefox. And even more narrowly, V8 powers Node (and Deno).
More Google contributions: Angular, Kubernetes, gRPC, Golang.
Microsoft: C#, TypeScript, Language Server Protocol, VSCode.
Meta: React, zstd, Apache Cassandra, LLAMA.
This isn't even getting into open source patches (Google has consistently been one of the top 10 Linux kernel contributors for the last decade), systems papers that have inspired research and other systems (BigTable, Dynamo, MapReduce, Colossus, Spanner), standards work (HTTP/2 and HTTP/3 were both adapted from Google technology), security vulnerability work (Project Zero).
(The above list is definitely Google-skewed because that's what I know.)
Even stuff like UNIX, C, C++, to extend your example.
People apparently forget the amount of money that was and is injected into them.
Someone has to put the money on the table to attend ISO meetings, buy the standards, implement the features into existing C and C++ compilers, for example.
I'm talking about fundamental OSS projects that are not for-profit at all.
Many of Google's OSS projects are there to help its profits; if not, Google can stop or kill them at will. That's fine, and it's a million times better than closed code, but it's still not the same as projects like Python etc.
Yeah, I feel like there are tons of reasons to criticize these big companies, but on open source they're pretty good for the most part. Not sure what else you could expect from them.
There are also GraphQL, RocksDB and Folly. They open sourced their JS engine for React Native too.
Maybe a good criticism is that we depend too much on these companies' free stuff, lol.
A lot of these (but not all) strike me as flashy things for which you get promoted. Many people don't need any of these.
What FAANG does not do is support critical yet underappreciated Unix infrastructure. By this I mean directly donating to developers without trying to take over the projects.
The quality of FAANG code, especially in the fault-tolerant machine learning space, is questionable as well. I'm not sure why so many commenters (in other comments!) attribute near magical qualities to any output of FAANG.
Chrome, though, is a kind of mixed bag. It already has too many features; most likely the reason it underpins so many other browsers is that nobody can keep up with the rate at which Google shovels shit into there.
They should slow down. It would be nice if they could just spend a couple years working purely on stability/performance/security, but alternatively they could just stop, it would be better for the ecosystem.
In terms of companies that rip off open source and do not give back, Meta is certainly one of the more active "give back" companies. React/Native, Docusaurus, Buck, PyTorch, RocksDB...
"Giving back" might be an understatement for Meta's open source contributions. They are the bedrock of so much OSS in the last decade, it's everyone _else_ that is "giving back."
Yes, some big companies do open source projects, and Meta is a good role model. But many of those OSS projects work for the companies themselves first and the community second, though they are indeed still a million times better than closed and proprietary code.
I'm mainly referring to long-standing fundamental projects that made the rest possible, not the projects that are created and managed by the big companies themselves.
Traditionally speaking, it's because they're the most qualified to do it. Every FAANG company takes from Open Source in order to succeed, so contributing back to a FOSS codebase is the ultimate act of corporate goodwill. You are burning real-life money and engineering hours to fix a problem for everyone and ensure that it stays protected under the same Open license that let your company succeed.
There is no legal obligation per se. Open Source is a charity, but you can understand a lot about a company's motives if you compare their FOSS usage to their FOSS contributions.
It’s not a charity if it becomes or serves as part of a standard. Sort of how semis and the various suppliers/buyers of that industry wind up agreeing on standards despite being competitors in some cases. Definitely things like the Linux kernel make sense to give back to - you’re ensuring the kernel lines up with your own corporate investments in a sense. A company unilaterally needing to fork and maintain their own Linux compatible kernel would need to sink a lot of time and money to come up with something as developed as the modern kernel. And it would become more expensive, unwieldy, and akin to technical debt over time.
It’s clearly not perfect calculus since so many defect from OSS, but there’s also clearly some business sense there.
> It’s not a charity if it becomes or serves as part of a standard.
Then it just transitions to a charity that everyone relies on. It's been embarrassingly difficult for people to admit this vis-a-vis Linux and the like, but it's obvious in a situation like programming languages. We rely on the enduring charity of its authors, license-holders and distributors to write and distribute software.
"Giving back" is not an inherently cynical process. Meta has no obligation to publish these patches in the first place, much less work as a part of the community to solve a common issue. But they do, and it's just kinda part of their culture at this point. Yes, they do still gain from it; but so do their direct competitors. It's difficult to characterize as a purely business-motivated decision, because there are much greedier avenues to pursue if that's their MO.
I think this misses the point. FAANGs don’t improve OSS code out of goodwill, they do it because it benefits them. Upstreaming those improvements has very little marginal cost, and it bolsters an environment where other companies do the same. Rising tide and all that.
Why would they? Nobody except a few home users needs RAID 5/6. There's only a small handful of talented filesystem developers (never mind the filesystem, it's equally applicable to all of them), and there's always something to do which will benefit all of us. Especially for something so feature-packed and complicated.
It is all about knowing individual engineers inside these companies. Nearly any engineer in FAANG can, with a little effort, donate their own work time to an opensource project in the form of submitting patches.
The project can help in various ways. Make sure licensing stuff is in place so that whatever IP review process the company has goes smoothly. Make sure PR's are reviewed fast. Delegate responsibility ('hi, would you like to be maintainer of this feature?'). Make sure contributions are publicly recognised (ie. Public page of maintainers/biggest contributors).
Remember that most engineers inside big companies are trying to generate good content for their performance review. "Wrote and open-sourced some code" is usually one of the checkboxes.
> Nearly any engineer in FAANG can, with a little effort, donate their own work time to an opensource project in the form of submitting patches.
I know at least one of those letters that does not want you doing that without jumping through hoops and getting prior approvals. I will let you guess which one that is.
Thats the "with a little effort" bit. But as long as you have a business case (eg. "we are adding this feature for our use, but it will lower our costs if it is upstreamed rather than maintained as a private patch"), it will usually be greenlighted.
No, Meta’s contract has explicit carve-outs for Open-Source work done on personal time that are more generous than Canonical’s (I had to choose between the two 3 months ago).
Meta has its own production fork of CPython, Cinder:
This isn't legal/enforceable in California. Your own work on your own time on your own equipment is your own property. It's messier if you're building a direct competitor or using proprietary knowledge.
React alone is one of the biggest gifts Meta brought to the open source community, and it shaped the web from monolithic pages (back then PHP and jQuery were the majority) towards modular component design. Later they even introduced the functional programming paradigm into the mix.
Next.js exists too which mashed up PHP/Ruby style with React.
This is disappointing but certainly not surprising, and to some extent we as an open source community have to blame ourselves.
I know it is not a popular opinion but the AGPL is there for a reason. If you release your code under a permissive license you should not be surprised that others benefit from it - sometimes massively. That is how it is supposed to be. You can still lament that it is unjust and I feel you, because in a sense it is. If it really bothers you, the solution is easy. Next time choose the AGPL.
I've released open-source projects under an Apache license because I want others to benefit from my work. I knowingly didn't choose GPL-flavored licenses because they limit people's ability to do this.
It's stupid sour grapes comments like this that make me hate HN commenters.
Meta and Google have both contributed an enormous amount to open source (creating PyTorch, TensorFlow, Chromium, contributing to clang/LLVM, Linux kernel, HTTP standards), but you're mad because they didn't sponsor your work and a bunch of people mindlessly upvoted it.
It's easy to blame the boss, but I don't think it's the company's responsibility to give back to open source just because they use it. If you think open source is helping you, you could contribute in your own time without expectation of getting paid for it. Most open source projects work like this. And if some company needs some extra feature, they should pay market rate for it.
Assuming the company doesn't have a clause disallowing contributions to open source. Such companies should be publicly shamed if they use any open source code.
Of the many people who ask for things, only a tiny percentage are actually willing to pay for it. Take Microsoft: I had one of their employees asking me to support their Azure stuff: https://github.com/mickael-kerjean/filestash/issues/180. When I found out the dude was actually employed by Microsoft, I somehow thought he could give a hand, but no, he was just there to make some feature request ...
I had no problem with this, by the way; just saying maybe he can contribute 0.001% of his profit to those OSS projects that he depends on. It will help him build more yachts too.
> If there’s one lesson we’ve learned from the Python 2 to 3 transition, it’s that it would have been very beneficial if Python 2 and 3 code could coexist in the same Python interpreter. We blew it that time, and it set us back by about a decade.
So it's not so easy to decide to go with again 2 different versions of Python.
I understand how cautious they are.
However, this commitment from Meta, complemented by MS financing Guido, and the quality of Sam Gross's proposal make me optimistic.
> PEP-703: Concurrent collection requires write barriers (or read barriers). The author is not aware of a way to add write barriers to CPython without substantially breaking the C-API.
Jesus. Imagine a language runtime being this hamstrung.
It's three engineer-years until 2025, and it's just a person mentioning it on the discussion board. It sounds a bit weird to throw something like this out in a public forum.
Why weird? The request was made there. The discussion on the PEP is there. And the person answering is a core Python dev and part of the Python team at meta/insta.
Their team wants it, and is offering hours to help get it done. Is this not what everyone wants from businesses using OSS?
You use smaller scale locks as and when you need them, and you spend a lot of time to carefully implement core threading constructs and similar to make sure they are correct.
Most of the time in an interpreter you’re iterating through the byte code for a function, and mutating variables. It doesn’t matter if two threads are executing that same byte code, it only matters if they are mutating the same object.
That’s not to say there aren’t some operations that need coordination across all threads. For that you implement safe points so you signal that something has to be done, wait for all threads to reach a safe point, do the work, and then release the threads again. You can often be even smarter about this and avoid stopping all threads at once but instead stagger things.
Allocation and GC you can make more thread local to avoid stopping the world too often.
Bytecode vs JIT doesn't really make a difference. But taking the question to managed languages (so Python, Erlang, JS, Ruby, .NET, JVM etc) - there isn't really a GIL problem to start with in language design, there are just design choices wrt what, if anything, to do about parallelism/concurrency. For example Erlang and recent browser Javascript skip shared memory parallelism and use message passing (JS web workers and Erlang processes). Python's approach is motivated by its simple refcount-based garbage collection; other GCs have it easier.
Ruby [1], JavaScript (when running in NodeJS) and PHP are single-threaded. Perl is single-threaded as well, but it features something like what is being proposed in PEP 554 (https://peps.python.org/pep-0554/).
I know Clojure is often described as a dynamic language and that it has some concurrency primitives, but my knowledge ends here.
[1] Just like Python, Ruby does have implementations that allow multi-threading.
Ruby is not single threaded even in its default implementation, but it does have a global VM lock (GVL) that is equivalent to the GIL. Not only does it have threads, it also has fibers which allow for more cooperative scheduling.
The GIL is a conscious trade-off: it accepts some contention and lost performance in exchange for simpler code.
Without the GIL, you'd still need to use different locking/synchronization mechanisms to ensure thread safety (if required), but they'd be more nuanced and spread over the entire codebase (complicating the backend).
I think, comparing it to something like Node.js, the way they work around the issue is that almost all their code is written in JavaScript itself. Python has a different model, where a very large amount of the code is written in C and Python itself is just the glue.
A huge amount of NodeJS is written in native code. JS is single threaded. Async work doesn’t work without native code. JS doesn’t need a GIL because it only has one thread.
10 years late. I remember being criticized for avoiding the GIL in some CPython C code in 2012 and I think I stopped contributing to CPython after that. There are (or were back then), curmudgeons trying to hang on to their "safe layer over 90s style C process" Python.
The GIL problem won't be solved by throwing engineer hours at it (not even Meta engineers). Fundamentally every single piece of python code ever written will have to stop and now worry about potential race conditions with innocent things like accessing a dictionary item or incrementing a value. It's a massive education and legacy code problem at least on the same scale as the python 2 to 3 migration.
I honestly don't think the community is ready for this change and don't expect to ever see stock CPython drop the GIL--perhaps there will be a flag to selectively operate a Python process without the GIL (and leave the onus on you to completely test and validate that your code works with potentially new and spooky behaviour).
That is a misconception. The GIL protects the internal state of Python. It doesn't make all Python multi-threaded code "safe".
PEP 703 still preserves many of the current behaviors, such as thread-safe access to and writing of dictionaries.
Correct. Python multi-threaded code is not magically thread-safe. What the GIL did do is make C functions atomic, like dictionary and list methods, since they're implemented by C functions. The reason is that the GIL is released and re-acquired while executing Python code (every 100 ops if memory serves), but isn't released by most C functions, so they are called with the global lock held, which makes them atomic. You can still have a thread switch between any two Python op codes. e.g. foo += 1 may or may not be atomic depending on how many op codes it compiles to (IIRC, in CPython, it is not atomic.)
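To make that concrete, here is a minimal sketch of my own (not from the PEP): disassembling an in-place increment shows it is several bytecode instructions, so a thread switch can land between the load and the store.

    import dis
    import threading

    counter = 0

    def increment():
        global counter
        for _ in range(100_000):
            counter += 1   # load, add, store: several opcodes, not one atomic step

    dis.dis(increment)     # shows the separate load / add / store instructions

    threads = [threading.Thread(target=increment) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Even with the GIL this can print less than 400000: the interpreter may
    # switch threads between opcodes, it just can't corrupt the dict/list internals.
    print(counter)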
One of the reasons I love using gevent (https://www.gevent.org) is that it's a way of introducing concurrency that does preserve these atomicity guarantees! Broadly speaking, the only time your code will be interrupted by something else in the same process is if some function call yields control back to the event loop - so if your block is entirely made up of code you control that doesn't do any I/O, you can be sure it will run atomically. And you get it for free without needing to rewrite synchronous code into asyncio.
This does make me wonder, though, if gevent will survive PEP 703's massive changes to internal Python assumptions. That said, gevent does work on pypy, so there's some history with it being flexible enough to port. Hopefully it won't be left behind, otherwise codebases using gevent will see this as a Python 2 -> 3 level apocalypse!
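For anyone unfamiliar with that model, a minimal sketch of the cooperative behavior being described (assuming gevent is installed; the function names are just for illustration):

    import gevent

    counter = 0

    def work():
        global counter
        for _ in range(100_000):
            counter += 1      # no I/O and no yield here, so no other greenlet can interleave
        gevent.sleep(0)       # explicit yield point back to the event loop

    jobs = [gevent.spawn(work) for _ in range(4)]
    gevent.joinall(jobs)
    print(counter)            # deterministically 400000, with no locks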
Ah - it's more that you wouldn't typically need to run multiple threads in the same process to handle concurrent requests. For instance, gunicorn used with gevent workers will typically fork processes to ensure all cores are used, but wouldn't require multiple threads per process - gevent would handle concurrency on a single OS thread within each process.
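For example, that setup is typically started with something like the following (where `myapp:app` is a placeholder for your WSGI application):

    gunicorn --workers 4 --worker-class gevent myapp:app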
As soon as you leave python, that C extension can give up the GIL and your other python threads can start running.
The GIL makes python code have cooperative threading. It does not protect from e.g. your thread's view of state mutating when you make a database call.
I also believe it is best practice, not a requirement, to avoid mutating data without holding the GIL in extension code - but I have mucked with a lot of different extension APIs, so I might be confused.
No, I wouldn’t call it cooperative threading, it’s still preemptive in that any Python thread can be switched with another at any instruction. That’s the same behavior as the operating system with the same potential for race conditions (except python instructions are higher level than machine code instructions.)
While C extensions can release the GIL, that only makes sense if they do enough work that a Python thread could get some things done in the meanwhile, and it wouldn’t be surprising to the caller. Obviously the C thread can’t interact with the Python world after the GIL has been released.
Having worked on moving a proprietary language from its own green threaded VM to the JVM, and working on TruffleRuby I’m saddened to see this FUD still being trotted out. The GIL and similar mechanisms do not make your code thread safe. It _might_ save you from a small set of concurrency bugs, but they are fewer than you might think, and mostly it will just make intermittent existing issues that little bit more obvious when you move to real threads. Occasionally we would need to fix something in a core library or add a mutex, but those bugs could often be seen in a stress test with green threads or Ruby’s GVL.
My guess is the GIL or smaller mutexes will be needed for C extensions and a few other areas, but it’s also likely that could be moved to an opt in mechanism over time.
I don't get this. Just because you don't have the GIL doesn't mean that your previously single-threaded code is now multithreaded and stepping on itself.
From previous discussions, it's my understanding that the C integration is going to be the cause of the issues.
From a Python perspective it wouldn't necessarily be a big change, but everything can branch out to C, and there you're going to get in trouble with shared memory.
The most significant impact of this change is that python threads can run at the same time. Before, calls to C API were essentially cooperative multithreading yield points - even if you have 100 python threads, they can't run concurrently under the GIL. 99 are blocked waiting on the GIL.
C extensions have always been able to use real threading. But now there is no GIL to synchronize with the python interpreter on ingress/egress from the extension.
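A quick way to see that distinction today (a sketch of my own; hashlib's C implementation releases the GIL for large buffers, so it scales across threads while the pure-Python loop does not):

    import hashlib
    import threading
    import time

    DATA = b"x" * (64 * 1024 * 1024)   # 64 MiB hashed per call

    def pure_python():
        total = 0
        for i in range(5_000_000):     # holds the GIL the whole time
            total += i

    def c_extension():
        hashlib.sha256(DATA).hexdigest()   # C code drops the GIL while hashing

    def timed(target, n=4):
        threads = [threading.Thread(target=target) for _ in range(n)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.perf_counter() - start

    print("pure Python, 4 threads:    ", timed(pure_python))
    print("GIL-releasing C, 4 threads:", timed(c_extension))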
No, but it means that your previous multithreaded code is no longer automatically prevented from stepping all over itself by having multiple threads accessing the same data:
GIL removal doesn't mean "make all of this lockless" it means "replace the GIL with fine-grained locking". So those problems are still solved for Python code. The three issues are the amount of work it takes to do right, the performance cost for single-threaded code, and the CAPI.
I'm simultaneously scared and enlightened seeing all these comments acting as if the GIL is some magic "makes your code thread/concurrency safe" panacea. I always saw it as a shim/hack to make CPython specifically easier to implement, not something that inherently makes code more thread safe or atomic. It's just more work to do things "the right way" across application boundaries, but from my understanding this PEP is Meta committing to do that work.
Removing the lock creates problems in existing code in practice. This is an ecosystem that has less focus on standards and more on "CPython is the reference implementation".
What non-transparent GIL specific behavior are developers relying on exactly?
When I say GIL specific behavior, I mean "python code that specifically requires a GLOBAL interpreter lock to function properly"
Not something that simply requires atomic access or any of the guarantees that the GIL has thus far provided, but, specifically, code that requires GIL-like behavior above any CPython implementation details that could be implemented with more fine-grained concurrency assurances?
I've seen some really cursed Python in my days, like checking against `locals()` to see if a variable was defined, a la JavaScript's 'foo in window' syntax (but I suppose more portable), but I can't recall anything specifically caring about a global interpreter lock (instead of what the GIL has semantically provided, which is much different).
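For readers who haven't seen it, that cursed pattern looks something like this (illustrative only; `expensive_call` is a made-up stand-in):

    def expensive_call():
        return 42    # stand-in for some real work

    def handler(flag):
        if flag:
            result = expensive_call()
        if "result" in locals():   # "was the variable defined?" a la JS's 'foo in window'
            return result
        return None

    print(handler(True), handler(False))   # 42 None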
> What non-transparent GIL specific behavior are developers relying on exactly?
They are relying on behavior of a single environment. We similarly see a lot of issues moving threaded code to a new operating system in C/C++, because the underlying implementation details (cooperative threading, m:n hybrid threading, cpu-bound scheduling, I/O and signaling behavior) will hide bugs which came from mistakes and assumptions.
Finally. You don't even have to read anything to work this out -- if things like dictionary access were no longer atomic, that would imply that threaded code without locks could crash the interpreter, which isn't going to happen.
> GIL removal doesn't mean "make all of this lockless"
Literally speaking, that's exactly what "removal" means. As far as I can tell, GP was wondering why there's so much discussion about replacement, since simply removing the GIL wouldn't break single-threaded code.
It is solvable by making the hard decision to move to Python 4 with no backward compatibility. The two core issues imo in Python are the GIL and the environment hell, and both simply can’t be solved while still keeping the 3 moniker. We’re in a field of constant workarounds and duct tape because we try to please everyone too much.
Python tried that (version 2 to 3) and both the community and dev team were traumatized by the effects enough they've publicly said it'll never happen again.
That means they didn't learn from it at all. The problem with Python 2 to python 3 is that it lost backwards compatibility because of very silly reasons like turning the print statement into a function. The vast majority of the problems could have been avoided by not making pointless changes with dubious benefits.
I seriously doubt anyone had problems fixing print as a statement. 2to3 fixed it...
I'll admit that, yes, changing string to bytes and unicode to string was a bit annoying, but the change itself wasn't fundamentally 'of dubious benefit', it did have benefits, and related to this, the only major issue was that you couldn't, for a long time, have code that worked in both where it came to literals. The biggest problem here was the implicit conversion from 2, that I agree needed to go.
Most of the other things can be trivially fixed automatically, or at least detected automatically, but without type hinting, it wasn't really easy to fix the automatic conversion.
There were other changes that were a bit tricky, but the majority of issues stemmed from the str/bytes change.
> perhaps there will be a flag to selectively operate a python process without the GIL (and leave the onous on you to completely test and validate your code works with potentially new and spooky behaviour).
Worked for Ruby. The original interpreter, MRI, has a GIL too. Rubinius and JRuby added multi-threading with limited amounts of pain, and people fixed libraries over the years. Sometimes just sprinkling lock blocks around particular FFI calls, or only doing them from a dedicated thread, will do the job.
Getting rid of the GIL would warrant the release of Python 4.0 for me, except the Python project shouldn't be supporting two different branches for as long as they supported 2.7.
I imagine there would need to be some kind of annotation to enable the GIL for a method and all of the code it calls, including libraries, so performant Python can take advantage of the lack of a GIL but old code doesn't break. Then all you need to do to maintain compatibility is to annotate your main() and your code should remain compatible for a while.
After all, the referenced PEP explicitly calls for making the GIL optional, not for removing it completely.
>Python project shouldn't be supporting two different branches for as long as they supported 2.7.
That wasn't the problem. The problem was not giving people the ability to make their code python 3 compatible while they were still stuck with python 2. The python 3 interpreter should have had a python 2 mode that gives you warnings.
Then just mark the extensions that are compatible with the GIL. And also have a switch that disables the GIL, controllable with an environment variable or launch option.
The GIL only protected you during any of those operations, so you can still switch threads waiting between LOAD_FAST and STORE_FAST and have a race.
There are a lot of things to be worried about with the GIL conversion of new race conditions that could happen, but there's already too much misinformation out there about the GIL, let's not spread this one even further.
> Fundamentally every single piece of python code ever written will have to stop and now worry about potential race conditions with innocent things like accessing a dictionary item or incrementing a value
Not really, just make those operations atomic or have automatic locking
It will be optional. If you don't want to worry about it, leave the option set to ON; those who can worry about it will have the option to set it to OFF.
The problem is, code that uses a lot of 3rd-party libraries and throws the ON switch for nogil will suddenly depend on all those libraries' maintainers having worried about this.
We can set it up such that if a module imports modules that don't support nogil, nogil will be automatically disabled for them too.
So, library designers will be under pressure to update their libraries to enable support. We could also have code tools that detect patterns that aren't GIL safe and throw out loud warnings
Why should library authors be put under pressure because someone else chose the wrong tool for the job, and is now trying to push that externality on to the community?
Python is a single-threaded language. That’s part of its DNA. The community has already been through one traumatic transition in recent history and the appetite for another one is low.
Library authors should not update their libraries to support multithreading, rather the people who want that should be forced to rewrite their code in a language that is more suitable for the problem they want to solve.
I disagree that single threadedness is in its DNA. It's an implementation detail of CPython. There are other implementations which don't have a GIL even today.
Would removing the GIL be a big change for CPython? Yes. But IMO it's worth it
> Python is a single-threaded language. That’s part of its DNA.
If the changes proposed in this PEP go through, that will no longer be the case. So library authors pretty much will have to either update, or see their modules wither the same way they would if they weren't updated as new Python 3.x versions come along.
> rather the people who want that should be forced to rewrite their code in a language that is more suitable for the problem they want to solve
The vast majority of the Python developer community WANT Python to support true multithreading and being able to solve these problems. We expend inordinate amounts of time and skill trying to make Python work around the GIL, eg. by utilizing multiprocessing.
We want Python to stay relevant long-term. In an age of abundant multi-core platforms, and workloads that can utilize them, the GIL is a major obstacle to that desire.
This is wrong, that ship sailed 10+ years ago. Python is used for almost everything, better get used to it.
This kind of garbage logic led to the horrible Ruby/Python/Javascript with C extensions split that makes their ecosystems very brittle, versus Java/C# where it is expected that things are fast enough without C and package management is much easier.
It's true that python is used for "almost everything", but it's only true because it plays nicely with C.
I understand the desire/demand for general purpose tools. The thing is, there are always tradeoffs. Acknowledging the tradeoffs and designing more specialized tools that work well together isn't necessarily garbage logic.
> Python is a single-threaded language. That’s part of its DNA.
Boooo. Maybe if all you do is ops scripts, but for those of us in data science this couldn't be further from the truth imo.
I'm not saying the syntax is easy for asyncio, or anywhere near as nice as golang or even kotlin is for concurrency, but it's definitely workable in a concurrent environment.
Yeah, I'm not a full-time developer but even for my bits of scripting I'd be wary of another big change in Python.
Personally I'd much rather see this effort go towards a new language which "feels" like Python but adopts more of the development experience of Go and Rust. From my tinkering it seems like Nim might already be that language, in which case what is needed is investment in its package ecosystem.
The problem with Python 2 to python 3 is that python 3 was essentially a new python 2 esque language instead of just being a major version bump.
If python 4 was no GIL python, then both Python and C would remain unchanged as a language.
Cheers. That does sound like it would impair actually being useful on multicore, multithreading cpus (which of course has been all of them for basically the last two decades)
To be honest... we already saw that the migration from Python 2 to Python 3 was a complete shitshow, but the value Python brings is so great the ecosystem survived.
I would gladly accept a Python 3 to Python 4 breaking change where all the necessary bits are changed to get real multi-threading (no-gil, gilectomy or whatever).
It's crazy how much work we put into trying to speed up these scripting languages. It's the same with Ruby. Shopify spends tons of dev resources on making JIT and stuff.
What do you find so crazy about it? Dynamically typed languages like these can be fast. It has already been demonstrated with Common Lisp and, more recently, JavaScript. Not only that, but LuaJIT has shown that you don't need impossibly complicated techniques or man-decades of effort to get very good performance. In Python's case, the main obstacle to improving the implementation isn't any feature of the language itself, but the giant ball and chain that is the C API.
Almost every single data structure access in python, even reading a dictionary item, at some level depends on the GIL for true correctness right now. It is very deeply embedded in cpython.
I think it’s the other way around? I haven’t run into any ML code that could be multithreaded that wasn’t written in C++, but have often run into server tasks that could use a polling thread, etc.
All the ML code is written in lower level languages and that’s very unlikely to change, GIL or no.
Yeah, you're right - even though CUDA is async, doing any preprocessing (in Python) can be harder if you don't have shared memory (the start-up latency hit of multiprocessing is not a problem in this context). I've only ever encountered "embarrassingly parallel" data-feeding problems, where the memory overhead of multiprocessing was small, but I could see other situations. Comment retracted.
The editorialized title "Meta pledges Three-Year sponsorship for Python if GIL removal is accepted" seems misleading at best. The actual wording was:
> support in the form of three engineer-years [...] between the acceptance of PEP 703 and the end of 2025
It seems much more like them pledging engineer time from Meta employees to work on this project than sponsorship, not a monetary sponsorship like the title implies.
That also depends on the scale of the monetary sponsorship, though. If I heard something like "Meta/Google/Whoever is the main sponsor of Python for the next 3 years", I'd assume (perhaps incorrectly) much more money per year than it takes to hire a single engineer. On the other hand, someone just saying "sure, we'll do the silver tier at your next few conferences" is worth a whole lot less than one engineer.
Regardless, the previous commenter's point stands that the title could be a lot more informative.
Yes, and the comment is to request help essentially:
"it would be great if Meta or another tech company could spare some engineers with established CPython internals experience to help the core dev team with this work."
It sounds like Facebook is volunteering 3 person-years which is nothing resembling a 3 year sponsorship. We're talking at least an order of magnitude difference. 3 person-years sounds like one team at Facebook working on the GIL for 3 months.
It's until the end of 2025, about 2.5 years from now. The comment was by GVR, who is presumably adding up all the pledged time to estimate whether there's enough in total.
Do you have JavaScript disabled? It is a deep link into a very, very long forum thread. Instead of pagination, comments outside of a certain window are loaded as you scroll.
Should take you to the first post, which is an overview of the proposal problem and potential solutions.
Mobile Chrome renders the page just fine. No idea about what other WebView you're using, but JS definitely needs to be enabled to load additional content.
In short, the GIL is the global interpreter lock, something that is innocuous in single-threaded software but a bane for performance in a true multithreading environment. I'd recommend reading up on it elsewhere if you're not familiar with it, as the post elides a lot of detail (the audience of the forum is, after all, people already familiar with Python internals).
"nogil", therefore, is a shorthand for CPython running without the global interpreter lock. The crux of the proposal is a discussion on how to achieve it without making large sacrifices to reliability or performance in single threaded scenarios.
JavaScript enabled. Like I said, first site that ever behaves like this; three quarters of the Internet would be broken if I didn't have JS. (And Firefox and archive.is also run JS afaik)
> No idea about what other WebView you're using,
Android default, so also chrome-like, just not the proprietary Google Chrome browser
> "nogil", therefore, is a shorthand for CPython running without the global interpreter lock.
Ooh, I should have realised what nogil means. If this had been in the headline with capitalisation like noGIL, I wouldn't even have needed to click to understand what it is Facebook wants to work on. Thanks!
The GIL is Python’s global interpreter lock, which bottlenecks performance in multithreaded applications by letting only one thread execute Python bytecode at a time. They’re working on removing it, but doing so exposes race conditions if programmers aren’t aware of the change.
Well, "put your money where your mouth is" would be more apposite. I commend them for volunteering to contribute in this way. A rising tide and all that.
Isn't this just how most open source software with enterprise customers operate?
Big companies pay (with money or developer time) in order to gain support and to steer the product to add features they want. The open source product lives on, developers get paid for their work, big co gets features they want, and everyone lives a happy life.
This is potentially problematic. It creates a conflict of interest: will the GIL proposal be accepted based on technical merits and potential benefits to the Python community overall, or just because the Python foundation wanted money? The problems that Meta faces in production are not necessarily the problems faced by every Python developer. In particular, Meta might value performance more than backwards-compatibility, while other Python users might feel differently. If Meta were to offer unconditional support, that would clearly be good for Python; attaching conditions to their support makes the situation trickier to navigate.
And it's 3 years of engineer time to work on this project, so I don't get the bribery insinuation. Engineer time to work on a specific project is only valuable if you consider the specific project valuable.
You could maybe make the argument that they're trying to swing priorities if this is considered valuable but less valuable than other things, but given the PSF doesn't actually employ anyone who it can direct to work on specific projects anyway, how is this any different from the already existing prioritisation process of "things that somebody is interested enough to go through the PEP process and then work on"?
TBH, having CPython internals experience is quite a qualification. They would probably contribute at least some of their time in the no-PEP-703 case, so the delta is even less.
Because the whole point of the thread is "It creates a conflict of interest because money" which it doesn't. Time costs money, sure, but time is not equal to money. There are many differences. Otherwise where is my money from HN.
Woah, are you kidding? I would have died to have people dedicated to OSS I worked on! Time is money. We get paid for our _time_ at the core of what we do as a profession. This should be plainly obvious.
The alternative is that Meta puts no money into salaries and benefits for up to three individuals' time on this project, or Meta puts in no money at all.
Meta is paying, one way or another, for this project.
> Meta is paying, one way or another, for this project.
Did I deny this?
To put it in an extremely clear example: time for someone to punch you != time for someone to work for you, while money is the same across the board. I hope you don't disagree with that.
What is Meta's end game with Python? Do they genuinely just want a stable build without GIL because of some cranky in-house code piles, or is this the first step towards a bigger mission to wrangle an entire ecosystem?
Calling it a "donation" is even too generous. They use python and would save a lot of money running it (less severs) if it could run multi-threaded. Implementing it themselves and maintaining a fork may be cheaper in the short term but due to ongoing maintenance and third-party extension incompatibility it would likely cost them more in the long run. So they are effectively investing 3 engineer/years of resources to A) ensure the project gets done and B) help it get done faster.
I don't see any charity here, this is just a smart business move by them. When you run as many servers as Meta does being able to run 1/4 as many instances is huge RAM and therefore money saving. It won't take long to add up to 3 engineer/years.
I agree charity is not the right way to frame it. But if this was a pure money decision they would save time and effort forking the language and improving it as closed source.
They have the expertise to run a language fork and because of the scale they run at they would easily save more money doing so. Meta contributing to OSS is not an economic decision. It is about branding, trying to attract technically strong individuals (who tend to contribute to OSS or want to), and sticking to their open culture.
> they would save time and effort forking the language and improving it as closed source.
Actually, experience has shown the opposite to be true. Time and time again even large organisations find their fork falling behind the community version and the pain of merging new versions from upstream grows and grows
The GIL has been a major pain point when training machine learning models, where every library is Python-first. The data preprocessing phase must happen quickly enough and in parallel with the GPU operations. The simplest option is doing data processing in several threads other than the one driving the GPU operations, but that is not possible with the GIL. TensorFlow and PyTorch each have their own workarounds for this problem, but many difficulties remain. Removing the GIL will be greatly beneficial to this domain at least.
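For context, the usual PyTorch-side workaround looks roughly like this (a sketch assuming PyTorch is installed; the dataset is a made-up stand-in for real decoding/augmentation work):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class FakeImageDataset(Dataset):
        # Stand-in: a real __getitem__ would decode and augment a sample (CPU-heavy work).
        def __len__(self):
            return 10_000
        def __getitem__(self, idx):
            return torch.randn(3, 224, 224), idx % 10

    if __name__ == "__main__":
        # num_workers > 0 sidesteps the GIL by spawning worker *processes*; with
        # free threading this preprocessing could plausibly run in threads instead,
        # avoiding the serialization and startup costs of multiprocessing.
        loader = DataLoader(FakeImageDataset(), batch_size=32, num_workers=4)
        for images, labels in loader:
            pass   # the training step driving the GPU would go here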
Python is closer to C than we all like to imagine, in more than one way. Lots of access to low-level libraries. Long history of writing modules in C. And little in actual memory safety protection more than relying on programmers to please not write bugs in their C or ctypes code.
Philosophically, Python is close to C in the sense that low-abstraction interfaces to system functionality (file descriptors, memory) exist. Numpy allows access to uninitialized memory and you can manipulate how numpy uses memory (mmap, shared memory, sharing memory between objects). Ctypes allows access to malloc and dereferencing dangling pointers. This is not exactly normal Python, but it's inside the realm of the possible in Python.
All of this makes me worried that the challenge of "free threading" is quite large. I wonder if the subinterpreters solution offers a safer route. At least it's not pulling off the GIL band-aid all in one go.
On the other hand, allowing access to free threading would be a straight-line continuation of this culture: just make that feature available. ("we're writing system code in a high level language.")
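A tiny sketch of the kind of low-level access being described (my own example; assumes a typical Unix libc is findable via ctypes):

    import ctypes
    import ctypes.util

    # Load the C runtime and call malloc/free directly from Python.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.malloc.restype = ctypes.c_void_p
    libc.free.argtypes = [ctypes.c_void_p]

    buf = libc.malloc(16)                                # raw, uninitialized heap memory
    view = ctypes.cast(buf, ctypes.POINTER(ctypes.c_ubyte))
    print([view[i] for i in range(16)])                  # whatever bytes happened to be there

    libc.free(buf)
    # Reading through `view` after the free would be a use-after-free;
    # nothing in Python stops you, exactly as described above.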
They committed three person-years, not three years. This project would need dozens of person-years over a three year period.
Also a comment on Github is not a binding support contract. Meta executives could deprioritize this project at any time... hell the person making the commitment could already be laid off, we have no clue. As someone who worked in big tech for a long time, trust me - it needs to be in writing with an exec signature.
I can't even imagine how many engineering years it would require to fix just the most used libraries that expect the GIL to exist. I assume 3 years wouldn't even cover the actual interpreter; throw in the library piece and how is this useful?
I can’t imagine you’ve read the proposal with a comment like this. The interpreter is already patched (twice in the proposal, for two different versions of Python), and Sam Gross has personally already patched many commonly used Python libraries. Here’s numpy patched, a mess of C and Fortran written for high performance code: https://github.com/colesbury/numpy/commits/v1.24.0-nogil
20+ years later, and billions of lines of Python code later, the GIL discussion is still there. Kind of an amazing mess of a problem… mainly shows the priority of Python being a duct-tape type solution from the start.
The latest decade in software engineering indicates that multi-processing is much preferable to multi-threading. It's the architecture Edge, Firefox, and Chrome use. So the question is what use cases multi-threading addresses that are not already covered by multi-processing? The memory overhead that multiple processes cause is relatively insignificant on recent machines with 32GB+ of RAM.
Browsers use multiple processes for security and reliability, not as an alternative to multi-threading. They extensively use multi-threading for performance (as does a lot of other modern software).
I don't think anyone is unilaterally going to move to multiprocess as a programming paradigm. Web browsers have their own reasons for using Multi-process, including security from memory isolation, resilience against crashes in the client side code, killing individual tabs, and so on.
I'd wager the vast majority of programs being written that need concurrency or parallelism still use multi threading or SIMD.
If anything, we've seen that the memory overhead of threads is unacceptably high, that's why everything these days uses thread pools / green threads / async. Multiprocessing is a niche - way too much memory usage, way too high communication cost.
Tons of use cases! There are lots of concurrent algorithms that use locking and shared memory instead of message-passing. Matrix multiplication or tree search come to mind.
Parallel blocked n^3 matrix multiplication doesn't use any synchronization primitives (short of join). The processes read from the same input matrices, but write to non-overlapping regions of memory. This is easily accomplished using posix shared memory. Indeed, this is how it is done when the matrices to be multiplied are too large to be handled by single worker nodes.
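A rough sketch of that pattern using the standard library and numpy (my own illustration; each worker writes only to its own rows of C, so join() is the only synchronization):

    import numpy as np
    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory

    N, WORKERS = 512, 4

    def worker(a_name, b_name, c_name, row_start, row_end):
        # Attach to the shared buffers: no copies, no locks.
        a_shm, b_shm, c_shm = SharedMemory(a_name), SharedMemory(b_name), SharedMemory(c_name)
        A = np.ndarray((N, N), dtype=np.float64, buffer=a_shm.buf)
        B = np.ndarray((N, N), dtype=np.float64, buffer=b_shm.buf)
        C = np.ndarray((N, N), dtype=np.float64, buffer=c_shm.buf)
        C[row_start:row_end] = A[row_start:row_end] @ B   # writes a disjoint block of rows
        a_shm.close(); b_shm.close(); c_shm.close()

    if __name__ == "__main__":
        nbytes = N * N * 8
        shms = [SharedMemory(create=True, size=nbytes) for _ in range(3)]
        A = np.ndarray((N, N), dtype=np.float64, buffer=shms[0].buf); A[:] = np.random.rand(N, N)
        B = np.ndarray((N, N), dtype=np.float64, buffer=shms[1].buf); B[:] = np.random.rand(N, N)
        C = np.ndarray((N, N), dtype=np.float64, buffer=shms[2].buf)

        step = N // WORKERS
        procs = [Process(target=worker,
                         args=(shms[0].name, shms[1].name, shms[2].name, i * step, (i + 1) * step))
                 for i in range(WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()                      # the only synchronization point

        assert np.allclose(C, A @ B)
        for s in shms:
            s.close()
            s.unlink()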
I think at this point it is better for Meta to ditch Python and invest in a new programming language that mimics Python word for word but without the GIL. Google did it with their Carbon for C++, and Go for C/D/C++. Microsoft even wrestled with Java for .NET, and at one point their J++ 2 decades ago. Meta should create their own. Maybe dump their resources into Modular. Python at this point is basically a C++ with a hodgepodge of features and codebase. Starting clean, like Mozilla did with Rust, would help programming as a whole more than investing in a niche subset feature of Python over which they don't have any control at all (I am speaking from a corporate point of view). Python is like Perl: it will continue to have its place even decades later, whether generic or niche for AI. But the rest of us may be moving on to something more specialized and cleaner like Rust, Ruby, Modular, etc.
Perl never had anywhere near the wealth of libraries and community support Python enjoys today. The language is used nearly everywhere, from utility scripts to build pipelines, desktop applications, webservice backends, all the way up to scientific computing and AI.
Python shines because of its ecosystem and community support, which I'd guess (without having any numbers to back this up) may very well be unmatched among programming languages today. Maybe Java and C/C++ come somewhat close, but I doubt even these behemoths would stand a chance in a 1:1 comparison of the sheer size and versatility of what for Python is just a quick `pip install` away.
Any language that wants to replace Python, even internally at a company, will either have to deal with the fallout of leaving all that behind, OR be compatible with Python to an extent that it doesn't matter. And that's a really tall order.
I think the problem there is even a company like Meta wants to rest on the shoulders of the giant Python community. It makes sense for Meta to try to get parts of Cinder into Python.
why not just use an existing language? ATen (pytorch’s C++ tensor library) is written in C++ anyway, so why not use that?
i really don’t understand the trend with launching a new language every time it seems convenient; it may have the advantage of being made at {company}, but this is also its great downside: it exists, is maintained, and used only with relation to the project/company that prompted its creation
for example, swift is only used for swiftui and some other apple specific stuff, and no sane person is going to write an unrelated serious project in it
Agreed, they should port their AI stacks to a functional language. It looks like these efforts are designed to prop up Python for the scientific stack.
Most of this work has been on Javascript, which is single-threaded and less mutable than Python, but PyPy has also made some significant contributions. PyPy has a GIL.
There is also work on Ruby. Ruby is even more mutable than Python, but also has a GIL. Work on Ruby mirrors that on Javascript.
https://peps.python.org/pep-0703/#backwards-compatibility
This is nothing like the Python 2 > 3 transition.