Tag: textual analysis

“With a unified model for a large number of languages, we run the risk of being mediocre for each language, which makes the problem challenging. Moreover, it’s difficult to get human-annotated data for many of the languages. Although SynthText has been helpful as a way to bootstrap training, it’s not yet a replacement for human-annotated data sets. We are therefore exploring ways to bridge the domain gap between our synthetic engine and real-world distribution of text on images”.

Source: Rosetta: Understanding text in images and videos with machine learning – Facebook Code

“Franco Moretti, founder of the Stanford Literary Lab, which applies data analysis to the study of fiction, argues that certain books survive through the choices of ordinary readers, a process something like evolution: ‘Literary history is shaped by the fact that readers select a literary work, keeping it alive across the generations, because they like some of its prominent traits.’
What traits make Austen special, and can they be measured with data? Can literary genius be graphed?”

Source: The Word Choices That Explain Why Jane Austen Endures – NYTimes.com

“Against the scourge of troll comments, Google’s artificial intelligence fails miserably. This is because the system is still unable to analyze sentences properly, especially when they contain (deliberate) typos or other workarounds.”

This reveals a surprising misunderstanding of AI, and of machine learning in particular. The authors understand the principle well enough to game it and get attention, but not well enough to see that their criticism is entirely unfounded and that their argument is misleading. Machine learning works by learning from examples, which is why it cannot adapt instantly to a change in practices like this one. But it will adapt very quickly, a point this kind of critique does not seriously consider. Besides looking like idiots, people who used this kind of trick would be detected even more easily, because deliberate misspellings are much easier to identify than the deeper problems that were addressed more seriously by another study: If Only AI Could Save Us from Ourselves, by David Auerbach.

Source: L’IA anti-trolls de Google se fait berner par de simples coquilles (“Google’s anti-troll AI is fooled by simple typos”) – Tech – Numerama

“October 30th, 2010 marks the day that my sister Amy and I founded Meta on a mission to unlock scientific knowledge and accelerate the pace of discovery. In six years, through the hands and minds of our talented team of engineers and scientists, we figured out how to use artificial intelligence to analyze new scientific knowledge as it’s published – along with the majority of what has been written, throughout modern history. Those efforts have led us to today. I am excited to announce that Meta will be joining the Chan Zuckerberg Initiative to bring what we have built to the entire scientific community, toward their goal to cure, prevent, or manage all diseases by the end of the century.”

Source: Meta – AI for Science

“Automatic text analysis lends itself well to deep learning, which can process large amounts of data efficiently,” explains Yoshua Bengio, director of MILA. “Thanks to the Druide Fund, we will be able to increase our research budget for text analysis by about 20%.”

Source: Druide donne un million de dollars à l’Université de Montréal (“Druide gives one million dollars to the Université de Montréal”) | Druide

To train Google’s artificial Q&A brain, Orr and company also use old news stories, where machines start to see how headlines serve as short summaries of the longer articles that follow. But for now, the company still needs its team of PhD linguists. They not only demonstrate sentence compression, but actually label parts of speech in ways that help neural nets understand how human language works. Spanning about 100 PhD linguists across the globe, the Pygmalion team produces what Orr calls “the gold data,” …

Source: Google’s Hand-fed AI Now Gives Answers, Not Just Search Results | WIRED

If AI learns language sufficiently well, it will also learn cultural associations that are offensive, objectionable, or harmful. At a high level, bias is meaning. “Debiasing” these machine models, while intriguing and technically interesting, necessarily harms meaning.

Source: Language necessarily contains human biases, and so will machines trained on language corpora

Romeo and Juliet

Are Shakespeare’s tragedies all structured in the same way? Are the characters isolated, clustered in groups, or all connected to one another?

Source: Network visualization: mapping Shakespeare’s tragedies (Martin Grandjean).
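Grandjean’s question (isolated, grouped, or all connected?) maps onto two standard graph measures: connected components and density. The sketch below is a minimal illustration under stated assumptions and does not claim to reproduce Grandjean’s method: it assumes two characters are linked when they appear in the same scene, it uses invented toy scene lists rather than data from any play, and the function names `build_network`, `connected_groups` and `density` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(scenes):
    """Link two characters whenever they share a scene (assumed criterion).

    Returns the set of characters and a dict mapping each sorted
    (a, b) pair to the number of scenes they share (the edge weight).
    """
    nodes = set()
    edges = defaultdict(int)
    for cast in scenes:
        members = sorted(set(cast))  # dedupe, and sort so (a, b) keys are canonical
        nodes.update(members)
        for a, b in combinations(members, 2):
            edges[(a, b)] += 1
    return nodes, dict(edges)


def connected_groups(nodes, edges):
    """Split the characters into connected components with an iterative DFS."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


def density(nodes, edges):
    """Share of possible character pairs that actually meet (0 = no links, 1 = everyone meets everyone)."""
    n = len(nodes)
    return 0.0 if n < 2 else len(edges) / (n * (n - 1) / 2)


# Invented toy data: each inner list is the cast of one hypothetical scene.
TOY_SCENES = [["A", "B", "C"], ["A", "D", "E"], ["D", "E"], ["F", "A"], ["G"]]

if __name__ == "__main__":
    nodes, edges = build_network(TOY_SCENES)
    print(connected_groups(nodes, edges))
    print(round(density(nodes, edges), 3))
```

In this toy example, "G" forms a group on its own while the other six characters are connected, and only a third of all possible pairs actually meet. A single component with a density near 1 would suggest an ensemble where everyone meets everyone; several components or a low density point to isolated or clustered characters.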

This is the social network of all six films combined.

Source: The Star Wars social network

A new paper published in PLoS ONE outlines some of the major problems with the corpus of scanned books that powers Google Ngram. “It’s so beguiling, so powerful,” says Peter Sheridan Dodds, an applied mathematician at the University of Vermont who co-authored the paper. “But I think there’s a misrepresentation of what people should expect from this corpus right now.” Here are some of the problems.

Source: The Pitfalls of Using Google Ngram to Study Language | WIRED


© 2019 no-Flux

Theme by Anders Noren